Age | Commit message (Collapse) | Author |
|
Rather than just splitting the input type and hoping for the best, apply
a bit more cleverness. Just splitting the types until the source is
legal often leads to an illegal result time, which is then widened and a
scalarization step is introduced which leads to truly horrible code
generation. With the loop vectorizer, these sorts of operations are much
more common, and so it's worth extra effort to do them well.
Add a legalization hook for the operands of a TRUNCATE node, which will
be encountered after the result type has been legalized, but if the
operand type is still illegal. If simple splitting of both types
ends up with the result type of each half still being legal, just
do that (v16i16 -> v16i8 on ARM, for example). If, however, that would
result in an illegal result type (v8i32 -> v8i8 on ARM, for example),
we can get more clever with power-two vectors. Specifically,
split the input type, but also widen the result element size, then
concatenate the halves and truncate again. For example on ARM,
To perform a "%res = v8i8 trunc v8i32 %in" we transform to:
%inlo = v4i32 extract_subvector %in, 0
%inhi = v4i32 extract_subvector %in, 4
%lo16 = v4i16 trunc v4i32 %inlo
%hi16 = v4i16 trunc v4i32 %inhi
%in16 = v8i16 concat_vectors v4i16 %lo16, v4i16 %hi16
%res = v8i8 trunc v8i16 %in16
This allows instruction selection to generate three VMOVN instructions
instead of a sequences of moves, stores and loads.
Update the ARMTargetTransformInfo to take this improved legalization
into account.
Consider the simplified IR:
define <16 x i8> @test1(<16 x i32>* %ap) {
%a = load <16 x i32>* %ap
%tmp = trunc <16 x i32> %a to <16 x i8>
ret <16 x i8> %tmp
}
define <8 x i8> @test2(<8 x i32>* %ap) {
%a = load <8 x i32>* %ap
%tmp = trunc <8 x i32> %a to <8 x i8>
ret <8 x i8> %tmp
}
Previously, we would generate the truly hideous:
.syntax unified
.section __TEXT,__text,regular,pure_instructions
.globl _test1
.align 2
_test1: @ @test1
@ BB#0:
push {r7}
mov r7, sp
sub sp, sp, #20
bic sp, sp, #7
add r1, r0, #48
add r2, r0, #32
vld1.64 {d24, d25}, [r0:128]
vld1.64 {d16, d17}, [r1:128]
vld1.64 {d18, d19}, [r2:128]
add r1, r0, #16
vmovn.i32 d22, q8
vld1.64 {d16, d17}, [r1:128]
vmovn.i32 d20, q9
vmovn.i32 d18, q12
vmov.u16 r0, d22[3]
strb r0, [sp, #15]
vmov.u16 r0, d22[2]
strb r0, [sp, #14]
vmov.u16 r0, d22[1]
strb r0, [sp, #13]
vmov.u16 r0, d22[0]
vmovn.i32 d16, q8
strb r0, [sp, #12]
vmov.u16 r0, d20[3]
strb r0, [sp, #11]
vmov.u16 r0, d20[2]
strb r0, [sp, #10]
vmov.u16 r0, d20[1]
strb r0, [sp, #9]
vmov.u16 r0, d20[0]
strb r0, [sp, #8]
vmov.u16 r0, d18[3]
strb r0, [sp, #3]
vmov.u16 r0, d18[2]
strb r0, [sp, #2]
vmov.u16 r0, d18[1]
strb r0, [sp, #1]
vmov.u16 r0, d18[0]
strb r0, [sp]
vmov.u16 r0, d16[3]
strb r0, [sp, #7]
vmov.u16 r0, d16[2]
strb r0, [sp, #6]
vmov.u16 r0, d16[1]
strb r0, [sp, #5]
vmov.u16 r0, d16[0]
strb r0, [sp, #4]
vldmia sp, {d16, d17}
vmov r0, r1, d16
vmov r2, r3, d17
mov sp, r7
pop {r7}
bx lr
.globl _test2
.align 2
_test2: @ @test2
@ BB#0:
push {r7}
mov r7, sp
sub sp, sp, #12
bic sp, sp, #7
vld1.64 {d16, d17}, [r0:128]
add r0, r0, #16
vld1.64 {d20, d21}, [r0:128]
vmovn.i32 d18, q8
vmov.u16 r0, d18[3]
vmovn.i32 d16, q10
strb r0, [sp, #3]
vmov.u16 r0, d18[2]
strb r0, [sp, #2]
vmov.u16 r0, d18[1]
strb r0, [sp, #1]
vmov.u16 r0, d18[0]
strb r0, [sp]
vmov.u16 r0, d16[3]
strb r0, [sp, #7]
vmov.u16 r0, d16[2]
strb r0, [sp, #6]
vmov.u16 r0, d16[1]
strb r0, [sp, #5]
vmov.u16 r0, d16[0]
strb r0, [sp, #4]
ldm sp, {r0, r1}
mov sp, r7
pop {r7}
bx lr
Now, however, we generate the much more straightforward:
.syntax unified
.section __TEXT,__text,regular,pure_instructions
.globl _test1
.align 2
_test1: @ @test1
@ BB#0:
add r1, r0, #48
add r2, r0, #32
vld1.64 {d20, d21}, [r0:128]
vld1.64 {d16, d17}, [r1:128]
add r1, r0, #16
vld1.64 {d18, d19}, [r2:128]
vld1.64 {d22, d23}, [r1:128]
vmovn.i32 d17, q8
vmovn.i32 d16, q9
vmovn.i32 d18, q10
vmovn.i32 d19, q11
vmovn.i16 d17, q8
vmovn.i16 d16, q9
vmov r0, r1, d16
vmov r2, r3, d17
bx lr
.globl _test2
.align 2
_test2: @ @test2
@ BB#0:
vld1.64 {d16, d17}, [r0:128]
add r0, r0, #16
vld1.64 {d18, d19}, [r0:128]
vmovn.i32 d16, q8
vmovn.i32 d17, q9
vmovn.i16 d16, q8
vmov r0, r1, d16
bx lr
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@179989 91177308-0d34-0410-b5e6-96231b3b80d8
|
|
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@179939 91177308-0d34-0410-b5e6-96231b3b80d8
|
|
For this we need to use a libcall. Previously LLVM didn't implement
libcall support for frem, so I've added it in the usual
straightforward manner. A test case from the bug report is included.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@178639 91177308-0d34-0410-b5e6-96231b3b80d8
|
|
case of order propagation during isel.
Thanks Owen for the suggestion!
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@177525 91177308-0d34-0410-b5e6-96231b3b80d8
|
|
A node's ordering is only propagated during legalization if (a) the new node does
not have an ordering (is not a CSE'd node), or (b) the new node has an ordering
that is higher than the node being legalized.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@177465 91177308-0d34-0410-b5e6-96231b3b80d8
|
|
Just scalarize the element and rebuild a vector of the result type
from that.
rdar://13281568
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@176614 91177308-0d34-0410-b5e6-96231b3b80d8
|
|
with an initial number of elements, instead of DenseMap, which has
zero initial elements, in order to avoid the copying of elements
when the size changes and to avoid allocating space every time
LegalizeTypes is run. This patch will not affect the memory footprint,
because DenseMap will increase the element size to 64
when the first element is added.
Patch by Wan Xiaofei.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@173448 91177308-0d34-0410-b5e6-96231b3b80d8
|
|
fp128 is almost but not quite completely illegal as a type on AArch64. As a
result it needs to have a register class (for argument passing mainly), but all
operations need to be lowered to runtime calls. Currently there's no way for
targets to do this (without duplicating code), as the relevant functions are
hidden in SelectionDAG. This patch changes that.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@171971 91177308-0d34-0410-b5e6-96231b3b80d8
|
|
missed in the first pass because the script didn't yet handle include
guards.
Note that the script is now able to handle all of these headers without
manual edits. =]
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@169224 91177308-0d34-0410-b5e6-96231b3b80d8
|
|
If we need to split the operand of a VSELECT, it must be the mask operand. We
split the entire VSELECT operand with EXTRACT_SUBVECTOR.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@168883 91177308-0d34-0410-b5e6-96231b3b80d8
|
|
- Due to the current matching vector elements constraints in ISD::FP_EXTEND,
rounding from v2f32 to v2f64 is scalarized. Add a customized v2f32 widening
to convert it into a target-specific X86ISD::VFPEXT to work around this
constraints. This patch also reverts a previous attempt to fix this issue by
recovering the scalarized ISD::FP_EXTEND pattern and thus significantly
reduces the overhead of supporting non-power-2 vector FP extend.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@165625 91177308-0d34-0410-b5e6-96231b3b80d8
|
|
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@162893 91177308-0d34-0410-b5e6-96231b3b80d8
|
|
wider than the output element type. Make sure to trunc them if needed.
Together with Michael Kuperstein <michael.m.kuperstein@intel.com>
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@160235 91177308-0d34-0410-b5e6-96231b3b80d8
|
|
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@159092 91177308-0d34-0410-b5e6-96231b3b80d8
|
|
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@156909 91177308-0d34-0410-b5e6-96231b3b80d8
|
|
Instead of passing listener pointers to RAUW, let SelectionDAG itself
keep a linked list of interested listeners.
This makes it possible to have multiple listeners active at once, like
RAUWUpdateListener was already doing. It also makes it possible to
register listeners up the call stack without controlling all RAUW calls
below.
DAGUpdateListener uses an RAII pattern to add itself to the SelectionDAG
list of active listeners.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@155248 91177308-0d34-0410-b5e6-96231b3b80d8
|
|
would crash if it encountered a 1 element VSELECT. Solution is slightly more complicated than just creating a SELET as we have to mask or sign extend the vector condition if it had different boolean contents from the scalar condition. Fixes <rdar://problem/11178095>
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@153976 91177308-0d34-0410-b5e6-96231b3b80d8
|
|
type.
2. Fix a typo in CONCAT_VECTORS which exposed the bug in #1.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@142648 91177308-0d34-0410-b5e6-96231b3b80d8
|
|
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@142488 91177308-0d34-0410-b5e6-96231b3b80d8
|
|
DecomposeMERGE_VALUES to "know" that results are legalized in
a particular order, by passing it the number of the result
being legalized (the type legalization core provides this, it
just needs to be passed on).
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@140373 91177308-0d34-0410-b5e6-96231b3b80d8
|
|
integer-promotion of CONCAT_VECTORS.
Test: test/CodeGen/X86/widen_shuffle-1.ll
This patch fixes the above tests (when running in with -promote-elements).
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@140372 91177308-0d34-0410-b5e6-96231b3b80d8
|
|
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@139851 91177308-0d34-0410-b5e6-96231b3b80d8
|
|
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@139692 91177308-0d34-0410-b5e6-96231b3b80d8
|
|
with a vector condition); such selects become VSELECT codegen nodes.
This patch also removes VSETCC codegen nodes, unifying them with SETCC
nodes (codegen was actually often using SETCC for vector SETCC already).
This ensures that various DAG combiner optimizations kick in for vector
comparisons. Passes dragonegg bootstrap with no testsuite regressions
(nightly testsuite as well as "make check-all"). Patch mostly by
Nadav Rotem.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@139159 91177308-0d34-0410-b5e6-96231b3b80d8
|
|
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@138887 91177308-0d34-0410-b5e6-96231b3b80d8
|
|
by Micah Villmow. (No testcase because the issue only showed up in an out-of-tree backend.)
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@138877 91177308-0d34-0410-b5e6-96231b3b80d8
|
|
implements 64-bit atomic load/store for ARM.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@138872 91177308-0d34-0410-b5e6-96231b3b80d8
|
|
is to use this for architectures that have a native FMA instruction.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@134742 91177308-0d34-0410-b5e6-96231b3b80d8
|
|
calls if we haven't been able to lower them any
other way.
Fixes rdar://9090077 and rdar://9210061
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@133288 91177308-0d34-0410-b5e6-96231b3b80d8
|
|
GetDemandBits (which must operate on the vector element type).
Fix the a usage of getZeroExtendInReg which must also be done on scalar types.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@133052 91177308-0d34-0410-b5e6-96231b3b80d8
|
|
legalize SDNodes such as BUILD_VECTOR, EXTRACT_VECTOR_ELT, etc.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@132689 91177308-0d34-0410-b5e6-96231b3b80d8
|
|
the TargetLowering enum.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@132418 91177308-0d34-0410-b5e6-96231b3b80d8
|
|
LegalizeTypeAction.
This patch does not change the behavior of the type legalizer. The codegen
produces the same code.
This infrastructural change is needed in order to enable complex decisions
for vector types (needed by the vector-select patch).
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@132263 91177308-0d34-0410-b5e6-96231b3b80d8
|
|
code in one place. Re-apply 131534 and fix the multi-step promotion of integers.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@132217 91177308-0d34-0410-b5e6-96231b3b80d8
|
|
Original log entry:
Refactor getActionType and getTypeToTransformTo ; place all of the 'decision'
code in one place.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@131536 91177308-0d34-0410-b5e6-96231b3b80d8
|
|
code in one place.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@131534 91177308-0d34-0410-b5e6-96231b3b80d8
|
|
Also cleaning up some duplicated code while I'm here.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@128176 91177308-0d34-0410-b5e6-96231b3b80d8
|
|
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@126964 91177308-0d34-0410-b5e6-96231b3b80d8
|
|
vector fp conversions.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@125482 91177308-0d34-0410-b5e6-96231b3b80d8
|
|
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@123909 91177308-0d34-0410-b5e6-96231b3b80d8
|
|
to add/sub by doing the normal operation and then checking for overflow
afterwards. This generally relies on the DAG handling the later invalid
operations as well.
Fixes the 64-bit part of rdar://8622122 and rdar://8774702.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@123908 91177308-0d34-0410-b5e6-96231b3b80d8
|
|
typed atomics. This will lower exclusively to libcalls at the moment.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@122979 91177308-0d34-0410-b5e6-96231b3b80d8
|
|
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@119990 91177308-0d34-0410-b5e6-96231b3b80d8
|
|
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@112171 91177308-0d34-0410-b5e6-96231b3b80d8
|
|
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@112104 91177308-0d34-0410-b5e6-96231b3b80d8
|
|
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@111982 91177308-0d34-0410-b5e6-96231b3b80d8
|
|
expansion is the same as that used by LegalizeDAG.
The resulting code sucks in terms of performance/codesize on x86-32 for a
64-bit operation; I haven't looked into whether different expansions might be
better in general.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@105378 91177308-0d34-0410-b5e6-96231b3b80d8
|
|
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@105283 91177308-0d34-0410-b5e6-96231b3b80d8
|
|
to LLVM_LIBRARY_VISIBILITY and introduce LLVM_GLOBAL_VISIBILITY, which is
the opposite, for future use by dragonegg.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@103495 91177308-0d34-0410-b5e6-96231b3b80d8
|
|
const_casts, and it reinforces the design of the Target classes being
immutable.
SelectionDAGISel::IsLegalToFold is now a static member function, because
PIC16 uses it in an unconventional way. There is more room for API
cleanup here.
And PIC16's AsmPrinter no longer uses TargetLowering.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@101635 91177308-0d34-0410-b5e6-96231b3b80d8
|