Age | Commit message (Collapse) | Author |
|
instructions use. This doesn't change any functionality except that long
constant expressions of these operations will now magically start working.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@12840 91177308-0d34-0410-b5e6-96231b3b80d8
|
|
fld QWORD PTR [%ESP + 4]
fadd QWORD PTR [.CPItest_add_0]
instead of:
fld QWORD PTR [%ESP + 4]
fld QWORD PTR [.CPItest_add_0]
faddp %ST(1)
I also intend to do this for mul & div, but it appears that I have to
refactor a bit of code before I can do so.
This is tested by: test/Regression/CodeGen/X86/fp_constant_op.llx
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@12839 91177308-0d34-0410-b5e6-96231b3b80d8
|
|
1. If an incoming argument is dead, don't load it from the stack
2. Do not code gen noop copies at all (ie, cast int -> uint), not even to
a move. This should reduce register pressure for allocators that are
unable to coallesce away these copies in some cases.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@12835 91177308-0d34-0410-b5e6-96231b3b80d8
|
|
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@12815 91177308-0d34-0410-b5e6-96231b3b80d8
|
|
is listed first and the address is listed second.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@12795 91177308-0d34-0410-b5e6-96231b3b80d8
|
|
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@12787 91177308-0d34-0410-b5e6-96231b3b80d8
|
|
InstSelectSimple.cpp:
Change the checks for proper I/O port address size into an exit() instead
of an assertion. Assertions aren't used in Release builds, and handling
this error should be graceful (not that this counts as graceful, but it's
more graceful).
Modified the generation of the IN/OUT instructions to have 0 arguments.
X86InstrInfo.td:
Added the OpSize attribute to the 16 bit IN and OUT instructions.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@12786 91177308-0d34-0410-b5e6-96231b3b80d8
|
|
I/O port instructions on x86. The specific code sequence is tailored to
the parameters and return value of the intrinsic call.
Added the ability for implicit defintions to be printed in the Instruction
Printer.
Added the ability for RawFrm instruction to print implict uses and
defintions with correct comma output. This required adjustment to some
methods so that a leading comma would or would not be printed.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@12782 91177308-0d34-0410-b5e6-96231b3b80d8
|
|
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@12711 91177308-0d34-0410-b5e6-96231b3b80d8
|
|
Enable folding of long seteq/setne comparisons into branches and select instructions
Implement unfolded long relational comparisons against a constants a bit more efficiently
Folding comparisons changes code that looks like this:
mov %EAX, DWORD PTR [%ESP + 4]
mov %EDX, DWORD PTR [%ESP + 8]
mov %ECX, %EAX
or %ECX, %EDX
sete %CL
test %CL, %CL
je .LBB2 # PC rel: F
into code that looks like this:
mov %EAX, DWORD PTR [%ESP + 4]
mov %EDX, DWORD PTR [%ESP + 8]
mov %ECX, %EAX
or %ECX, %EDX
jne .LBB2 # PC rel: F
This speeds up 186.crafty by 6% with llc-ls.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@12702 91177308-0d34-0410-b5e6-96231b3b80d8
|
|
comparing a long against zero got us this:
sub %ESP, 8
mov DWORD PTR [%ESP + 4], %ESI
mov DWORD PTR [%ESP], %EDI
mov %EAX, DWORD PTR [%ESP + 12]
mov %EDX, DWORD PTR [%ESP + 16]
mov %ECX, 0
mov %ESI, 0
mov %EDI, %EAX
xor %EDI, %ECX
mov %ECX, %EDX
xor %ECX, %ESI
or %EDI, %ECX
sete %CL
test %CL, %CL
je .LBB2 # PC rel: F
Now it gets us this:
mov %EAX, DWORD PTR [%ESP + 4]
mov %EDX, DWORD PTR [%ESP + 8]
mov %ECX, %EAX
or %ECX, %EDX
sete %CL
test %CL, %CL
je .LBB2 # PC rel: F
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@12696 91177308-0d34-0410-b5e6-96231b3b80d8
|
|
immediate. For
example, multiplying X*(1 + (1LL << 32)) now produces:
test:
mov %ECX, DWORD PTR [%ESP + 4]
mov %EDX, DWORD PTR [%ESP + 8]
mov %EAX, %ECX
add %EDX, %ECX
ret
[[[Note to Alkis: why isn't linear scan generating this code?? This might be a
problem with your intervals being too conservative:
test:
mov %EAX, DWORD PTR [%ESP + 4]
mov %EDX, DWORD PTR [%ESP + 8]
add %EDX, %EAX
ret
end note]]]
Whereas GCC produces this:
T:
sub %esp, 12
mov %edx, DWORD PTR [%esp+16]
mov DWORD PTR [%esp+8], %edi
mov %ecx, DWORD PTR [%esp+20]
xor %edi, %edi
mov DWORD PTR [%esp], %ebx
mov %ebx, %edi
mov %eax, %edx
mov DWORD PTR [%esp+4], %esi
add %ebx, %edx
mov %edi, DWORD PTR [%esp+8]
lea %edx, [%ecx+%ebx]
mov %esi, DWORD PTR [%esp+4]
mov %ebx, DWORD PTR [%esp]
add %esp, 12
ret
I'm not sure example what GCC is smoking here, but it looks like it has just
confused itself with a bunch of stack slots or something. The intel compiler
is better, but still not good:
T:
movl 4(%esp), %edx #2.11
movl 8(%esp), %eax #2.11
lea (%eax,%edx), %ecx #3.12
movl $1, %eax #3.12
mull %edx #3.12
addl %ecx, %edx #3.12
ret #3.12
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@12693 91177308-0d34-0410-b5e6-96231b3b80d8
|
|
long %test(long %X) {
%Y = mul long %X, 123
ret long %Y
}
we used to generate:
test:
sub %ESP, 12
mov DWORD PTR [%ESP + 8], %ESI
mov DWORD PTR [%ESP + 4], %EDI
mov DWORD PTR [%ESP], %EBX
mov %ECX, DWORD PTR [%ESP + 16]
mov %ESI, DWORD PTR [%ESP + 20]
mov %EDI, 123
mov %EBX, 0
mov %EAX, %ECX
mul %EDI
imul %ESI, %EDI
add %ESI, %EDX
imul %ECX, %EBX
add %ESI, %ECX
mov %EDX, %ESI
mov %EBX, DWORD PTR [%ESP]
mov %EDI, DWORD PTR [%ESP + 4]
mov %ESI, DWORD PTR [%ESP + 8]
add %ESP, 12
ret
Now we emit:
test:
mov %EAX, DWORD PTR [%ESP + 4]
mov %ECX, DWORD PTR [%ESP + 8]
mov %EDX, 123
mul %EDX
imul %ECX, %ECX, 123
add %ECX, %EDX
mov %EDX, %ECX
ret
Which, incidently, is substantially nicer than what GCC manages:
T:
sub %esp, 8
mov %eax, 123
mov DWORD PTR [%esp], %ebx
mov %ebx, DWORD PTR [%esp+16]
mov DWORD PTR [%esp+4], %esi
mov %esi, DWORD PTR [%esp+12]
imul %ecx, %ebx, 123
mov %ebx, DWORD PTR [%esp]
mul %esi
mov %esi, DWORD PTR [%esp+4]
add %esp, 8
lea %edx, [%ecx+%edx]
ret
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@12692 91177308-0d34-0410-b5e6-96231b3b80d8
|
|
On this testcase:
long %test(long %X) {
%Y = shr long %X, ubyte 32
ret long %Y
}
instead of:
t:
mov %EAX, DWORD PTR [%ESP + 4]
mov %EAX, DWORD PTR [%ESP + 8]
sar %EAX, 0
mov %EDX, 0
ret
we now emit:
test:
mov %EAX, DWORD PTR [%ESP + 4]
mov %EAX, DWORD PTR [%ESP + 8]
mov %EDX, 0
ret
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@12688 91177308-0d34-0410-b5e6-96231b3b80d8
|
|
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@12687 91177308-0d34-0410-b5e6-96231b3b80d8
|
|
For example, on this instruction:
call void %test(long 1234)
Instead of this:
mov %EAX, 1234
mov %ECX, 0
mov DWORD PTR [%ESP], %EAX
mov DWORD PTR [%ESP + 4], %ECX
call test
We now emit this:
mov DWORD PTR [%ESP], 1234
mov DWORD PTR [%ESP + 4], 0
call test
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@12686 91177308-0d34-0410-b5e6-96231b3b80d8
|
|
of the words of the constant is zeros. For example:
Y = and long X, 1234
now generates:
Yl = and Xl, 1234
Yh = 0
instead of:
Yl = and Xl, 1234
Yh = and Xh, 0
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@12685 91177308-0d34-0410-b5e6-96231b3b80d8
|
|
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@12684 91177308-0d34-0410-b5e6-96231b3b80d8
|
|
This allows us to handle code like 'add long %X, 123456789012' more efficiently.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@12683 91177308-0d34-0410-b5e6-96231b3b80d8
|
|
long %test(long %X) {
%Y = sub long 0, %X
ret long %Y
}
We used to generate:
test:
sub %ESP, 4
mov DWORD PTR [%ESP], %ESI
mov %ECX, DWORD PTR [%ESP + 8]
mov %ESI, DWORD PTR [%ESP + 12]
mov %EAX, 0
mov %EDX, 0
sub %EAX, %ECX
sbb %EDX, %ESI
mov %ESI, DWORD PTR [%ESP]
add %ESP, 4
ret
Now we generate:
test:
mov %EAX, DWORD PTR [%ESP + 4]
mov %EDX, DWORD PTR [%ESP + 8]
neg %EAX
adc %EDX, 0
neg %EDX
ret
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@12681 91177308-0d34-0410-b5e6-96231b3b80d8
|
|
to eliminate
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@12680 91177308-0d34-0410-b5e6-96231b3b80d8
|
|
* In promote32, if we can just promote a constant value, do so instead of
promoting a constant dynamically.
* In visitReturn inst, actually USE the promote32 argument that takes a
Value*
The end result of this is that we now generate this:
test:
mov %EAX, 0
ret
instead of...
test:
mov %AX, 0
movzx %EAX, %AX
ret
for:
ushort %test() {
ret ushort 0
}
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@12679 91177308-0d34-0410-b5e6-96231b3b80d8
|
|
types and can have arbitrary 32- and 64-bit integer types indexing into
sequential types.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@12653 91177308-0d34-0410-b5e6-96231b3b80d8
|
|
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@12615 91177308-0d34-0410-b5e6-96231b3b80d8
|
|
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@12610 91177308-0d34-0410-b5e6-96231b3b80d8
|
|
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@12579 91177308-0d34-0410-b5e6-96231b3b80d8
|
|
the X86 does not support a full set of fp cmove instructions, so we can't always
fold the condition into the select. :( Yuck.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@12577 91177308-0d34-0410-b5e6-96231b3b80d8
|
|
using our broad selection of movcc instructions. :)
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@12560 91177308-0d34-0410-b5e6-96231b3b80d8
|
|
support
folding compares into the select yet.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@12553 91177308-0d34-0410-b5e6-96231b3b80d8
|
|
an incoming value from a block, the selector would evaluate the constant
at the TOP of the block instead of at the end of the block. This made the
live range for the constant span the entire block, increasing register
pressure needlessly.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@12542 91177308-0d34-0410-b5e6-96231b3b80d8
|
|
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@12500 91177308-0d34-0410-b5e6-96231b3b80d8
|
|
folding load instructions into other instructions across free instruction
boundaries. Perhaps this will also fix the other strange failures?
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@12494 91177308-0d34-0410-b5e6-96231b3b80d8
|
|
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@12357 91177308-0d34-0410-b5e6-96231b3b80d8
|
|
Intrinsic::va*. This avoid conflicting with macros in the stdlib.h file.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@12356 91177308-0d34-0410-b5e6-96231b3b80d8
|
|
testcase like this:
int %test(int* %P, int %A) {
%Pv = load int* %P
%B = add int %A, %Pv
ret int %B
}
We now generate:
test:
mov %ECX, DWORD PTR [%ESP + 4]
mov %EAX, DWORD PTR [%ESP + 8]
add %EAX, DWORD PTR [%ECX]
ret
Instead of:
test:
mov %EAX, DWORD PTR [%ESP + 4]
mov %ECX, DWORD PTR [%ESP + 8]
mov %EAX, DWORD PTR [%EAX]
add %EAX, %ECX
ret
... saving one instruction, and often a register. Note that there are a lot
of other instructions that could use this, but they aren't handled. I'm not
really interested in adding them, but mul/div and all of the FP instructions
could be supported as well if someone wanted to add them.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@12204 91177308-0d34-0410-b5e6-96231b3b80d8
|
|
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@12203 91177308-0d34-0410-b5e6-96231b3b80d8
|
|
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@12064 91177308-0d34-0410-b5e6-96231b3b80d8
|
|
of generating this code:
mov %EAX, 4
mov DWORD PTR [%ESP], %EAX
mov %AX, 123
movsx %EAX, %AX
mov DWORD PTR [%ESP + 4], %EAX
call Y
we now generate:
mov DWORD PTR [%ESP], 4
mov DWORD PTR [%ESP + 4], 123
call Y
Which hurts the eyes less. :)
Considering that register pressure around call sites is already high (with all
of the callee clobber registers n stuff), this may help a lot.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@12028 91177308-0d34-0410-b5e6-96231b3b80d8
|
|
to function calls, we would emit dead code, like this:
int Y(int, short, double);
int X() {
Y(4, 123, 4);
}
--- Old
X:
sub %ESP, 20
mov %EAX, 4
mov DWORD PTR [%ESP], %EAX
*** mov %AX, 123
mov %AX, 123
movsx %EAX, %AX
mov DWORD PTR [%ESP + 4], %EAX
fld QWORD PTR [.CPIX_0]
fstp QWORD PTR [%ESP + 8]
call Y
mov %EAX, 0
# IMPLICIT_USE %EAX %ESP
add %ESP, 20
ret
Now we emit:
X:
sub %ESP, 20
mov %EAX, 4
mov DWORD PTR [%ESP], %EAX
mov %AX, 123
movsx %EAX, %AX
mov DWORD PTR [%ESP + 4], %EAX
fld QWORD PTR [.CPIX_0]
fstp QWORD PTR [%ESP + 8]
call Y
mov %EAX, 0
# IMPLICIT_USE %EAX %ESP
add %ESP, 20
ret
Next up, eliminate the mov AX and movsx entirely!
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@12026 91177308-0d34-0410-b5e6-96231b3b80d8
|
|
their names more decriptive. A name consists of the base name, a
default operand size followed by a character per operand with an
optional special size. For example:
ADD8rr -> add, 8-bit register, 8-bit register
IMUL16rmi -> imul, 16-bit register, 16-bit memory, 16-bit immediate
IMUL16rmi8 -> imul, 16-bit register, 16-bit memory, 8-bit immediate
MOVSX32rm16 -> movsx, 32-bit register, 16-bit memory
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@11995 91177308-0d34-0410-b5e6-96231b3b80d8
|
|
Replace uses of addZImm with addImm.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@11992 91177308-0d34-0410-b5e6-96231b3b80d8
|
|
block loops.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@11990 91177308-0d34-0410-b5e6-96231b3b80d8
|
|
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@11984 91177308-0d34-0410-b5e6-96231b3b80d8
|
|
to denote this fact.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@11972 91177308-0d34-0410-b5e6-96231b3b80d8
|
|
denote this fact.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@11971 91177308-0d34-0410-b5e6-96231b3b80d8
|
|
consistent with the rest and also pepare for the addition of their
memory operand variants.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@11902 91177308-0d34-0410-b5e6-96231b3b80d8
|
|
MRegisterInfo::is{Physical,Virtual}Register. Apply appropriate fixes
to relevant files.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@11882 91177308-0d34-0410-b5e6-96231b3b80d8
|
|
bugs. Thanks Brian!
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@11859 91177308-0d34-0410-b5e6-96231b3b80d8
|
|
into X86
scaled indexes. This allows us to compile GEP's like this:
int* %test([10 x { int, { int } }]* %X, int %Idx) {
%Idx = cast int %Idx to long
%X = getelementptr [10 x { int, { int } }]* %X, long 0, long %Idx, ubyte 1, ubyte 0
ret int* %X
}
Into a single address computation:
test:
mov %EAX, DWORD PTR [%ESP + 4]
mov %ECX, DWORD PTR [%ESP + 8]
lea %EAX, DWORD PTR [%EAX + 8*%ECX + 4]
ret
Before it generated:
test:
mov %EAX, DWORD PTR [%ESP + 4]
mov %ECX, DWORD PTR [%ESP + 8]
shl %ECX, 3
add %EAX, %ECX
lea %EAX, DWORD PTR [%EAX + 4]
ret
This is useful for things like int/float/double arrays, as the indexing can be folded into
the loads&stores, reducing register pressure and decreasing the pressure on the decode unit.
With these changes, I expect our performance on 256.bzip2 and gzip to improve a lot. On
bzip2 for example, we go from this:
10665 asm-printer - Number of machine instrs printed
40 ra-local - Number of loads/stores folded into instructions
1708 ra-local - Number of loads added
1532 ra-local - Number of stores added
1354 twoaddressinstruction - Number of instructions added
1354 twoaddressinstruction - Number of two-address instructions
2794 x86-peephole - Number of peephole optimization performed
to this:
9873 asm-printer - Number of machine instrs printed
41 ra-local - Number of loads/stores folded into instructions
1710 ra-local - Number of loads added
1521 ra-local - Number of stores added
789 twoaddressinstruction - Number of instructions added
789 twoaddressinstruction - Number of two-address instructions
2142 x86-peephole - Number of peephole optimization performed
... and these types of instructions are often in tight loops.
Linear scan is also helped, but not as much. It goes from:
8787 asm-printer - Number of machine instrs printed
2389 liveintervals - Number of identity moves eliminated after coalescing
2288 liveintervals - Number of interval joins performed
3522 liveintervals - Number of intervals after coalescing
5810 liveintervals - Number of original intervals
700 spiller - Number of loads added
487 spiller - Number of stores added
303 spiller - Number of register spills
1354 twoaddressinstruction - Number of instructions added
1354 twoaddressinstruction - Number of two-address instructions
363 x86-peephole - Number of peephole optimization performed
to:
7982 asm-printer - Number of machine instrs printed
1759 liveintervals - Number of identity moves eliminated after coalescing
1658 liveintervals - Number of interval joins performed
3282 liveintervals - Number of intervals after coalescing
4940 liveintervals - Number of original intervals
635 spiller - Number of loads added
452 spiller - Number of stores added
288 spiller - Number of register spills
789 twoaddressinstruction - Number of instructions added
789 twoaddressinstruction - Number of two-address instructions
258 x86-peephole - Number of peephole optimization performed
Though I'm not complaining about the drop in the number of intervals. :)
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@11820 91177308-0d34-0410-b5e6-96231b3b80d8
|
|
MachineInstr
to do analysis.
*** FOLD getelementptr instructions into loads and stores when possible,
making use of some of the crazy X86 addressing modes.
For example, the following C++ program fragment:
struct complex {
double re, im;
complex(double r, double i) : re(r), im(i) {}
};
inline complex operator+(const complex& a, const complex& b) {
return complex(a.re+b.re, a.im+b.im);
}
complex addone(const complex& arg) {
return arg + complex(1,0);
}
Used to be compiled to:
_Z6addoneRK7complex:
mov %EAX, DWORD PTR [%ESP + 4]
mov %ECX, DWORD PTR [%ESP + 8]
*** mov %EDX, %ECX
fld QWORD PTR [%EDX]
fld1
faddp %ST(1)
*** add %ECX, 8
fld QWORD PTR [%ECX]
fldz
faddp %ST(1)
*** mov %ECX, %EAX
fxch %ST(1)
fstp QWORD PTR [%ECX]
*** add %EAX, 8
fstp QWORD PTR [%EAX]
ret
Now it is compiled to:
_Z6addoneRK7complex:
mov %EAX, DWORD PTR [%ESP + 4]
mov %ECX, DWORD PTR [%ESP + 8]
fld QWORD PTR [%ECX]
fld1
faddp %ST(1)
fld QWORD PTR [%ECX + 8]
fldz
faddp %ST(1)
fxch %ST(1)
fstp QWORD PTR [%EAX]
fstp QWORD PTR [%EAX + 8]
ret
Other programs should see similar improvements, across the board. Note that
in addition to reducing instruction count, this also reduces register pressure
a lot, always a good thing on X86. :)
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@11819 91177308-0d34-0410-b5e6-96231b3b80d8
|