git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@177611 91177308-0d34-0410-b5e6-96231b3b80d8
It's not yet clear if these instructions need a more careful model.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@177599 91177308-0d34-0410-b5e6-96231b3b80d8
This is used for all the expensive system instructions.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@177598 91177308-0d34-0410-b5e6-96231b3b80d8
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@177592 91177308-0d34-0410-b5e6-96231b3b80d8
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@177591 91177308-0d34-0410-b5e6-96231b3b80d8
- After moving the logic that recognizes vector shifts with a scalar
  (uniform) amount from DAG combining into DAG lowering, we declare all
  vector shifts as custom-lowered, even vector shifts that are legal on AVX.
  As a result, the cost model needs special tuning to identify these legal
  cases (see the sketch below).
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@177586 91177308-0d34-0410-b5e6-96231b3b80d8
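To illustrate the distinction the cost model now has to make, here is a
standalone C++ sketch using SSE intrinsics (illustrative only, not the
cost-model code; the function names are made up): a shift by a uniform
amount maps to a single instruction, while per-element shift amounts have no
single instruction before AVX2's vpsllvd.

#include <emmintrin.h>   // SSE2

// Uniform shift amount: legal, a single (v)pslld.
__m128i shl_uniform(__m128i v) {
  return _mm_slli_epi32(v, 3);   // every 32-bit lane shifted by 3
}

// Per-element shift amounts: no single instruction before AVX2, so this is
// where custom lowering (and a higher cost) kicks in.
__m128i shl_per_element(__m128i v, __m128i amounts) {
  alignas(16) int vals[4], amts[4];
  _mm_store_si128((__m128i *)vals, v);
  _mm_store_si128((__m128i *)amts, amounts);
  for (int i = 0; i < 4; ++i)
    vals[i] <<= amts[i];
  return _mm_load_si128((const __m128i *)vals);
}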
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@177540 91177308-0d34-0410-b5e6-96231b3b80d8
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@177539 91177308-0d34-0410-b5e6-96231b3b80d8
- Move SRA/SRL/SHL lowering support from DAG combining to DAG lowering to
  support 256-bit integer vectors on AVX, where (unlike AVX2) these shifts
  are not natively available (see the sketch below).
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@177478 91177308-0d34-0410-b5e6-96231b3b80d8
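Conceptually, the lowering this enables splits a 256-bit integer shift into
two 128-bit halves on AVX. A standalone sketch with intrinsics (illustrative
only, not the actual X86ISelLowering code):

#include <immintrin.h>   // AVX

// Shift eight 32-bit lanes left by a uniform amount on AVX1, which has no
// 256-bit integer shift instructions: split into two 128-bit halves, shift
// each with SSE2, and reassemble.
__m256i shl_v8i32_avx1(__m256i v, int amt) {
  __m128i lo = _mm256_castsi256_si128(v);
  __m128i hi = _mm256_extractf128_si256(v, 1);
  lo = _mm_slli_epi32(lo, amt);
  hi = _mm_slli_epi32(hi, amt);
  return _mm256_insertf128_si256(_mm256_castsi128_si256(lo), hi, 1);
}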
- Prepare to move logic from DAG combining into DAG lowering. There's no
functionality change.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@177477 91177308-0d34-0410-b5e6-96231b3b80d8
- no functionality change
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@177476 91177308-0d34-0410-b5e6-96231b3b80d8
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@177461 91177308-0d34-0410-b5e6-96231b3b80d8
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@177460 91177308-0d34-0410-b5e6-96231b3b80d8
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@177459 91177308-0d34-0410-b5e6-96231b3b80d8
logic as a QOI cleanup. No functional change. Tests already in place.
rdar://13456414
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@177446 91177308-0d34-0410-b5e6-96231b3b80d8
Add a new WriteZero SchedWrite type for the common dependency-breaking
instructions that clear a register.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@177442 91177308-0d34-0410-b5e6-96231b3b80d8
an X86Operand, but also performs a Sema lookup and adds the sizing directive
when appropriate. Use this when parsing a bracketed statement. This is
necessary to get the instruction matching correct as well. Test case coming
on clang side.
rdar://13455408
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@177439 91177308-0d34-0410-b5e6-96231b3b80d8
def : Pat<(load (i64 (X86Wrapper tglobaltlsaddr :$dst))),
(MOV64rm tglobaltlsaddr :$dst)>;
This pattern is invalid because the MOV64rm instruction expects a
source operand of type "i64mem", which is a subclass of X86MemOperand
and thus actually consists of five MI operands, but the Pat provides
only a single MI operand ("tglobaltlsaddr" matches an SDnode of
type ISD::TargetGlobalTLSAddress and provides a single output).
Thus, if the pattern were ever matched, subsequent uses of the MOV64rm
instruction pattern would access uninitialized memory. In addition,
with the TableGen patch I'm about to check in, this would actually be
reported as a build-time error.
Fortunately, the pattern does in fact never match, for at least two
independent reasons.
First, the code generator actually never generates a pattern of the
form (load (X86Wrapper (tglobaltlsaddr))). For most combinations of
TLS and code models, (tglobaltlsaddr) represents just an offset that
needs to be added to some base register, so it is never directly
dereferenced. The only exception is the initial-exec model, where
(tglobaltlsaddr) refers to the (pc-relative) address of a GOT slot,
which *is* in fact directly dereferenced: but in that case, the
X86WrapperRIP node is used, not X86Wrapper, so the Pat doesn't match.
Second, even if some patterns along those lines *were* ever generated,
we should not need an extra Pat pattern to match it. Instead, the
original MOV64rm instruction pattern ought to match directly, since
it uses an "addr" operand, which is implemented via the SelectAddr
C++ routine; this routine is supposed to accept the full range of
input DAGs that may be implemented by a single mov instruction,
including those cases involving ISD::TargetGlobalTLSAddress (and
actually does so e.g. in the initial-exec case as above).
To avoid build breaks (due to the above-mentioned error) after the
TableGen patch is checked in, I'm removing this Pat here.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@177426 91177308-0d34-0410-b5e6-96231b3b80d8
Patch by Ahmad, Muhammad T <muhammad.t.ahmad@intel.com>
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@177421 91177308-0d34-0410-b5e6-96231b3b80d8
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@177418 91177308-0d34-0410-b5e6-96231b3b80d8
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@177417 91177308-0d34-0410-b5e6-96231b3b80d8
logic as a QOI cleanup.
rdar://13445327
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@177413 91177308-0d34-0410-b5e6-96231b3b80d8
parsed one. Test case coming shortly.
rdar://13446980
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@177347 91177308-0d34-0410-b5e6-96231b3b80d8
We hitch a ride with the existing OpndItins class that was used to add
instruction itinerary classes in the many multiclasses in this file.
Use the link provided by the X86FoldableSchedWrite.Folded to find the
right SchedWrite for folded loads.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@177326 91177308-0d34-0410-b5e6-96231b3b80d8
This new-style scheduling information is going to replace the
instruction itineraries.
This also serves as a test case for Andy's fix in r177317.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@177323 91177308-0d34-0410-b5e6-96231b3b80d8
MinGW is almost completely compatible with MSVC, with the exception of the _tls_array global not being available.
Patch by David Nadlinger!
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@177257 91177308-0d34-0410-b5e6-96231b3b80d8
immediate.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@177243 91177308-0d34-0410-b5e6-96231b3b80d8
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@177242 91177308-0d34-0410-b5e6-96231b3b80d8
Previously we weren't skipping the VVVV encoded register. Based on patch by Michael Liao.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@177221 91177308-0d34-0410-b5e6-96231b3b80d8
Since almost all X86 instructions can fold loads, use a multiclass to
define register/memory pairs of SchedWrites.
An X86FoldableSchedWrite represents the register version of an
instruction. It holds a reference to the SchedWrite to use when the
instruction folds a load.
This will be used inside multiclasses that define rr and rm instruction
versions together.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@177210 91177308-0d34-0410-b5e6-96231b3b80d8
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@177135 91177308-0d34-0410-b5e6-96231b3b80d8
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@177130 91177308-0d34-0410-b5e6-96231b3b80d8
The new InstrSchedModel is easier to use than the instruction
itineraries. It will be used to model instruction latency and throughput
in modern Intel microarchitectures like Sandy Bridge.
InstrSchedModel should be able to coexist with instruction itinerary
classes, but for cleanliness we should switch the Atom processor model
to the new InstrSchedModel as well.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@177122 91177308-0d34-0410-b5e6-96231b3b80d8
the win64 calling convention.
rdar://13423768
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@177113 91177308-0d34-0410-b5e6-96231b3b80d8
from r177014.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@177015 91177308-0d34-0410-b5e6-96231b3b80d8
set. The VEX.B was being calculated from the wrong operand. Fixes at least some portion of PR14185.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@177014 91177308-0d34-0410-b5e6-96231b3b80d8
register to register moves should be switched from using the MRMSrcReg form to the MRMDestReg form if the source register is a 64-bit extended register and the destination register is not. This allows the instruction to be encoded using the 2-byte VEX form instead of the 3-byte VEX form. The GNU assembler has similar behavior.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@177011 91177308-0d34-0410-b5e6-96231b3b80d8
- Fix a typo in type checking
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@177010 91177308-0d34-0410-b5e6-96231b3b80d8
rdar://13318048
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@176828 91177308-0d34-0410-b5e6-96231b3b80d8
LegalizeDAG.cpp uses the value type of the comparison operands when checking
the legality of BR_CC, so DAGCombiner should do the same.
v2:
- Expand more BR_CC value types for NVPTX
v3:
- Expand correct BR_CC value types for Hexagon, Mips, and XCore.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@176694 91177308-0d34-0410-b5e6-96231b3b80d8
That can usually be lowered efficiently and is common in Sandy Bridge code.
It would be nice to do this in DAGCombiner but we can't insert arbitrary
BUILD_VECTORs this late.
Fixes PR15462.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@176634 91177308-0d34-0410-b5e6-96231b3b80d8
- PHI nodes should be replaced/updated after lowering CMOV into branches,
  because the operand in the PHI node coming from 'mainMBB' changes.
- Add EFLAGS to the live-ins before lowering the 2nd CMOV. This is necessary
  because we reuse the EFLAGS generated before the 1st lowered CMOV, which
  won't clobber EFLAGS; however, we need to specify that explicitly.
- '-attr=-cmov' test cases are added.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@176598 91177308-0d34-0410-b5e6-96231b3b80d8
- Clear the 'mayStore' flag when loading from the atomic variable before the
  spin loop.
- Clear the kill flag on the registers forming the address of that atomic
  variable, as they go from a single use to multiple uses.
- Don't use a physical register as a live-in register in a BB (neither entry
  nor landing pad); copy it into a virtual register instead.
(Patch by Cameron Zwarich)
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@176538 91177308-0d34-0410-b5e6-96231b3b80d8
one-byte NOPs. If the processor actually executes those NOPs, as it sometimes
does with aligned bundling, this can have a performance impact. In
micro-benchmarks run on my one machine, a 15-byte NOP followed by twelve
one-byte NOPs is about 20% worse than a 15-byte NOP followed by a 12-byte NOP.
This patch changes NOP emission to emit as many 15-byte NOPs (the maximum) as
possible, followed by at most one shorter NOP.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@176464 91177308-0d34-0410-b5e6-96231b3b80d8
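A minimal standalone sketch of the padding split described above
(illustrative only; the actual change is in the X86 assembler backend's NOP
emission, and the function name here is made up):

#include <cstdio>
#include <vector>

// Split `count` bytes of padding into as many 15-byte NOPs (the longest
// x86 NOP encoding) as possible, followed by at most one shorter NOP.
std::vector<int> splitNopPadding(int count) {
  std::vector<int> sizes;
  const int MaxNopBytes = 15;
  while (count >= MaxNopBytes) {
    sizes.push_back(MaxNopBytes);
    count -= MaxNopBytes;
  }
  if (count > 0)
    sizes.push_back(count);   // the single shorter NOP, if any
  return sizes;
}

int main() {
  for (int s : splitNopPadding(27))   // 27 bytes of padding -> 15 + 12
    std::printf("%d-byte NOP\n", s);
}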
* Only apply the divide bypass optimization when not optimizing for size.
* Fixed a bug caused by emitting the constant 0 with type Int32; the
  dividend's type is now used to generate the constant instead.
* For Atom x86-64, apply the divide bypass to use 16-bit divides instead of
  64-bit divides when the operand values are small enough (see the sketch
  below).
* Added lit tests for the 64-bit divide bypass.
Patch by Tyler Nowicki!
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@176442 91177308-0d34-0410-b5e6-96231b3b80d8
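A minimal sketch of the runtime check the bypass inserts, assuming an
unsigned 64-bit divide (illustrative only, not the actual LLVM transform;
the function name is made up):

#include <cstdint>

// If both operands of a 64-bit divide fit in 16 bits, a much cheaper
// 16-bit divide produces the same result; otherwise fall back.
uint64_t bypassedUDiv64(uint64_t a, uint64_t b) {
  if (((a | b) >> 16) == 0)
    return (uint64_t)((uint16_t)a / (uint16_t)b);   // fast narrow divide
  return a / b;                                     // full 64-bit divide
}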
This matters, for example, in the following matrix multiply:
int **mmult(int rows, int cols, int **m1, int **m2, int **m3) {
  int i, j, k, val;
  for (i=0; i<rows; i++) {
    for (j=0; j<cols; j++) {
      val = 0;
      for (k=0; k<cols; k++) {
        val += m1[i][k] * m2[k][j];
      }
      m3[i][j] = val;
    }
  }
  return(m3);
}
Taken from the test-suite benchmark Shootout.
We estimate the cost of the multiply to be 2, while we actually generate 9
instructions for it and end up quite a bit slower than the scalar version
(48% on my machine).
Also, properly differentiate between AVX1 and AVX2. On AVX1 we still split
the vector into two 128-bit halves and handle the subvector multiplies as
above with 9 instructions.
Only on AVX2 do we have a cost of 9 for v4i64.
I changed the test case in test/Transforms/LoopVectorize/X86/avx1.ll to use
an add instead of a mul, because with a mul we now no longer vectorize. I
did verify that the mul would indeed be more expensive when vectorized, using
3 kernels:
  for (i ...)
    r += a[i] * 3;
  for (i ...)
    m1[i] = m1[i] * 3; // This matches the test case in avx1.ll
and a matrix multiply.
In each case the vectorized version was considerably slower.
radar://13304919
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@176403 91177308-0d34-0410-b5e6-96231b3b80d8
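For context on why the vectorized multiply is so expensive: SSE/AVX/AVX2
have no native packed 64-bit multiply, so each 64-bit element multiply is
synthesized from 32x32->64 multiplies, roughly as in this standalone sketch
(illustrative only):

#include <emmintrin.h>   // SSE2

// Per 64-bit lane: a*b = lo(a)*lo(b) + ((lo(a)*hi(b) + hi(a)*lo(b)) << 32),
// computed modulo 2^64 with 32x32->64 multiplies.
__m128i mul_v2i64(__m128i a, __m128i b) {
  __m128i a_hi  = _mm_srli_epi64(a, 32);
  __m128i b_hi  = _mm_srli_epi64(b, 32);
  __m128i lo    = _mm_mul_epu32(a, b);      // lo(a) * lo(b)
  __m128i t1    = _mm_mul_epu32(a_hi, b);   // hi(a) * lo(b)
  __m128i t2    = _mm_mul_epu32(a, b_hi);   // lo(a) * hi(b)
  __m128i cross = _mm_add_epi64(t1, t2);
  return _mm_add_epi64(lo, _mm_slli_epi64(cross, 32));
}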
- ISD::SHL/SRL/SRA must have either both scalar or both vector operands,
  but TLI.getShiftAmountTy() so far only returns a scalar type. As a
  result, backend logic that relies on that invariant breaks.
- Rename the original TLI.getShiftAmountTy() to
  TLI.getScalarShiftAmountTy() and re-define TLI.getShiftAmountTy() to
  return the target-specified scalar type or the same vector type as the
  1st operand (see the sketch below).
- Fix most TICG logic that assumes TLI.getShiftAmountTy() returns a simple
  scalar type.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@176364 91177308-0d34-0410-b5e6-96231b3b80d8
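A simplified sketch of the new contract (hypothetical types, not LLVM's
actual TargetLowering API): the shift-amount type now follows the shape of
the first operand.

struct SimpleVT {
  bool IsVector;
  unsigned ScalarBits;
  unsigned NumElts;   // 1 for scalars
};

// Target-specified scalar shift-amount type (i8 on x86, for example).
SimpleVT getScalarShiftAmountTy() { return {false, 8, 1}; }

// A vector shift now takes a shift amount of the same vector type as the
// first operand; a scalar shift keeps the target's scalar type.
SimpleVT getShiftAmountTy(const SimpleVT &LHSTy) {
  return LHSTy.IsVector ? LHSTy : getScalarShiftAmountTy();
}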
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@176341 91177308-0d34-0410-b5e6-96231b3b80d8
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@176270 91177308-0d34-0410-b5e6-96231b3b80d8
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@176171 91177308-0d34-0410-b5e6-96231b3b80d8