path: root/test/Analysis/CostModel
2013-04-29  TBAA: remove !tbaa from testing cases if not used.  (Manman Ren)
This will make it easier to turn on struct-path aware TBAA since the metadata format will change. git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@180743 91177308-0d34-0410-b5e6-96231b3b80d8
2013-04-25  ARM cost model: Integer div and rem are lowered to a function call  (Arnold Schwaighofer)
Reflect this in the cost model. I observed this in MiBench/consumer-lame. radar://13354716 git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@180576 91177308-0d34-0410-b5e6-96231b3b80d8
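As an illustration of how such a change is typically pinned down in this directory, a minimal cost-model test might look like the sketch below. The RUN-line triple/CPU and the value names are assumptions for this example rather than the actual test from the commit, and the exact reported costs depend on the subtarget, so the checks only require that a numeric cost is printed.

  ; RUN: opt < %s -cost-model -analyze -mtriple=thumbv7-apple-ios6.0.0 -mcpu=swift | FileCheck %s

  define <4 x i32> @vdivrem(<4 x i32> %a, <4 x i32> %b) {
    ; Integer division is lowered to a library call, so the cost model should
    ; report a high (scalarized) cost instead of treating it as one instruction.
    ; CHECK: cost of {{[0-9]+}} {{.*}} sdiv
    %d = sdiv <4 x i32> %a, %b
    ; CHECK: cost of {{[0-9]+}} {{.*}} urem
    %r = urem <4 x i32> %a, %b
    %s = add <4 x i32> %d, %r
    ret <4 x i32> %s
  }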
2013-04-21  Legalize vector truncates by parts rather than just splitting.  (Jim Grosbach)
Rather than just splitting the input type and hoping for the best, apply a bit more cleverness. Just splitting the types until the source is legal often leads to an illegal result type, which is then widened, and a scalarization step is introduced, which leads to truly horrible code generation. With the loop vectorizer, these sorts of operations are much more common, and so it's worth extra effort to do them well.

Add a legalization hook for the operands of a TRUNCATE node, which will be encountered after the result type has been legalized, but while the operand type is still illegal. If simple splitting of both types ends up with the result type of each half still being legal, just do that (v16i16 -> v16i8 on ARM, for example). If, however, that would result in an illegal result type (v8i32 -> v8i8 on ARM, for example), we can get more clever with power-of-two vectors. Specifically, split the input type, but also widen the result element size, then concatenate the halves and truncate again. For example, on ARM, to perform "%res = v8i8 trunc v8i32 %in" we transform to:

  %inlo = v4i32 extract_subvector %in, 0
  %inhi = v4i32 extract_subvector %in, 4
  %lo16 = v4i16 trunc v4i32 %inlo
  %hi16 = v4i16 trunc v4i32 %inhi
  %in16 = v8i16 concat_vectors v4i16 %lo16, v4i16 %hi16
  %res = v8i8 trunc v8i16 %in16

This allows instruction selection to generate three VMOVN instructions instead of a sequence of moves, stores and loads. Update the ARMTargetTransformInfo to take this improved legalization into account. Consider the simplified IR:

  define <16 x i8> @test1(<16 x i32>* %ap) {
    %a = load <16 x i32>* %ap
    %tmp = trunc <16 x i32> %a to <16 x i8>
    ret <16 x i8> %tmp
  }

  define <8 x i8> @test2(<8 x i32>* %ap) {
    %a = load <8 x i32>* %ap
    %tmp = trunc <8 x i32> %a to <8 x i8>
    ret <8 x i8> %tmp
  }

Previously, we would generate the truly hideous:

  .syntax unified
  .section __TEXT,__text,regular,pure_instructions
  .globl _test1
  .align 2
  _test1:               @ @test1
  @ BB#0:
    push {r7}
    mov r7, sp
    sub sp, sp, #20
    bic sp, sp, #7
    add r1, r0, #48
    add r2, r0, #32
    vld1.64 {d24, d25}, [r0:128]
    vld1.64 {d16, d17}, [r1:128]
    vld1.64 {d18, d19}, [r2:128]
    add r1, r0, #16
    vmovn.i32 d22, q8
    vld1.64 {d16, d17}, [r1:128]
    vmovn.i32 d20, q9
    vmovn.i32 d18, q12
    vmov.u16 r0, d22[3]
    strb r0, [sp, #15]
    vmov.u16 r0, d22[2]
    strb r0, [sp, #14]
    vmov.u16 r0, d22[1]
    strb r0, [sp, #13]
    vmov.u16 r0, d22[0]
    vmovn.i32 d16, q8
    strb r0, [sp, #12]
    vmov.u16 r0, d20[3]
    strb r0, [sp, #11]
    vmov.u16 r0, d20[2]
    strb r0, [sp, #10]
    vmov.u16 r0, d20[1]
    strb r0, [sp, #9]
    vmov.u16 r0, d20[0]
    strb r0, [sp, #8]
    vmov.u16 r0, d18[3]
    strb r0, [sp, #3]
    vmov.u16 r0, d18[2]
    strb r0, [sp, #2]
    vmov.u16 r0, d18[1]
    strb r0, [sp, #1]
    vmov.u16 r0, d18[0]
    strb r0, [sp]
    vmov.u16 r0, d16[3]
    strb r0, [sp, #7]
    vmov.u16 r0, d16[2]
    strb r0, [sp, #6]
    vmov.u16 r0, d16[1]
    strb r0, [sp, #5]
    vmov.u16 r0, d16[0]
    strb r0, [sp, #4]
    vldmia sp, {d16, d17}
    vmov r0, r1, d16
    vmov r2, r3, d17
    mov sp, r7
    pop {r7}
    bx lr

  .globl _test2
  .align 2
  _test2:               @ @test2
  @ BB#0:
    push {r7}
    mov r7, sp
    sub sp, sp, #12
    bic sp, sp, #7
    vld1.64 {d16, d17}, [r0:128]
    add r0, r0, #16
    vld1.64 {d20, d21}, [r0:128]
    vmovn.i32 d18, q8
    vmov.u16 r0, d18[3]
    vmovn.i32 d16, q10
    strb r0, [sp, #3]
    vmov.u16 r0, d18[2]
    strb r0, [sp, #2]
    vmov.u16 r0, d18[1]
    strb r0, [sp, #1]
    vmov.u16 r0, d18[0]
    strb r0, [sp]
    vmov.u16 r0, d16[3]
    strb r0, [sp, #7]
    vmov.u16 r0, d16[2]
    strb r0, [sp, #6]
    vmov.u16 r0, d16[1]
    strb r0, [sp, #5]
    vmov.u16 r0, d16[0]
    strb r0, [sp, #4]
    ldm sp, {r0, r1}
    mov sp, r7
    pop {r7}
    bx lr

Now, however, we generate the much more straightforward:

  .syntax unified
  .section __TEXT,__text,regular,pure_instructions
  .globl _test1
  .align 2
  _test1:               @ @test1
  @ BB#0:
    add r1, r0, #48
    add r2, r0, #32
    vld1.64 {d20, d21}, [r0:128]
    vld1.64 {d16, d17}, [r1:128]
    add r1, r0, #16
    vld1.64 {d18, d19}, [r2:128]
    vld1.64 {d22, d23}, [r1:128]
    vmovn.i32 d17, q8
    vmovn.i32 d16, q9
    vmovn.i32 d18, q10
    vmovn.i32 d19, q11
    vmovn.i16 d17, q8
    vmovn.i16 d16, q9
    vmov r0, r1, d16
    vmov r2, r3, d17
    bx lr

  .globl _test2
  .align 2
  _test2:               @ @test2
  @ BB#0:
    vld1.64 {d16, d17}, [r0:128]
    add r0, r0, #16
    vld1.64 {d18, d19}, [r0:128]
    vmovn.i32 d16, q8
    vmovn.i32 d17, q9
    vmovn.i16 d16, q8
    vmov r0, r1, d16
    bx lr

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@179989 91177308-0d34-0410-b5e6-96231b3b80d8
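A hedged sketch of a cost-model check for the case discussed above; the triple/CPU flags and value names are illustrative assumptions, not the test added by this commit.

  ; RUN: opt < %s -cost-model -analyze -mtriple=thumbv7-apple-ios6.0.0 -mcpu=swift | FileCheck %s

  define <8 x i8> @trunc_v8i32_v8i8(<8 x i32> %in) {
    ; With the by-parts legalization this maps to a short vmovn sequence, so the
    ; reported cost should be small rather than a scalarized estimate.
    ; CHECK: cost of {{[0-9]+}} {{.*}} trunc
    %res = trunc <8 x i32> %in to <8 x i8>
    ret <8 x i8> %res
  }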
2013-04-17  X86 cost model: Exit before calling getSimpleVT on non-simple VTs  (Arnold Schwaighofer)
getSimpleVT can only handle simple value types. radar://13676022 git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@179714 91177308-0d34-0410-b5e6-96231b3b80d8
2013-04-12  CostModel: increase the default cost of supported floating point operations from 1 to 2.  (Nadav Rotem)
Fixed a few tests that changed because the cost of one insert plus a vector operation on two doubles is now lower than two scalar operations on doubles. git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@179413 91177308-0d34-0410-b5e6-96231b3b80d8
2013-04-08  X86 cost model: Model cost for uitofp and sitofp on SSE2  (Arnold Schwaighofer)
The costs are overfitted so that I can still use the legalization factor. For example, the following kernel has about half the throughput when vectorized compared to unvectorized when compiled with SSE2. Before this patch we would vectorize it.

  unsigned short A[1024];
  double B[1024];

  void f() {
    int i;
    for (i = 0; i < 1024; ++i) {
      B[i] = (double) A[i];
    }
  }

radar://13599001 git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@179033 91177308-0d34-0410-b5e6-96231b3b80d8
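The vector form of the conversion in that kernel can be queried directly; a minimal sketch follows (triple and value names are assumptions, and the exact cost depends on the subtarget):

  ; RUN: opt < %s -cost-model -analyze -mtriple=x86_64-apple-macosx10.8.0 | FileCheck %s

  define <2 x double> @conv_v2i16_v2f64(<2 x i16> %v) {
    ; On a plain SSE2 target this conversion expands into several instructions,
    ; so it should be reported as expensive enough to keep the vectorizer away.
    ; CHECK: cost of {{[0-9]+}} {{.*}} uitofp
    %r = uitofp <2 x i16> %v to <2 x double>
    ret <2 x double> %r
  }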
2013-04-07  TargetLowering: Fix getTypeConversion handling of extended vector types  (Arnold Schwaighofer)
The code in getTypeConversion attempts to promote the vector element type before it tries to split or widen the vector. After it failed to find a legal vector type by promoting, it would continue using the promoted vector element type, thereby missing legal split vector types. For example, the type v32i32, which has a legal split of 4 x v3i32 on x86/sse2, would be transformed to v32i256 and from there successively split to v16i256, v8i256, v1i256, and then finally end up as an i64 type. By resetting the vector element type to the original vector element type that existed before the promotion, the code will attempt to split the vector type to smaller vector widths of the same element type. git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@178999 91177308-0d34-0410-b5e6-96231b3b80d8
2013-04-04  X86 cost model: Differentiate cost for vector shifts of constants  (Arnold Schwaighofer)
SSE2 has efficient support for shifts by a scalar. My previous change, which made shifts expensive, did not take this into account and marked all shifts as expensive. This would prevent vectorization from happening where it is actually beneficial. With this change we differentiate between shifts by a constant amount and other shifts. radar://13576547 git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@178808 91177308-0d34-0410-b5e6-96231b3b80d8
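The distinction can be expressed as two cost queries; a minimal sketch (flags and names are assumptions, exact values depend on the subtarget):

  ; RUN: opt < %s -cost-model -analyze -mtriple=x86_64-apple-macosx10.8.0 | FileCheck %s

  define <8 x i16> @shift_by_constant(<8 x i16> %a) {
    ; A uniform constant shift amount maps to a single SSE2 shift, so it is cheap.
    ; CHECK: cost of {{[0-9]+}} {{.*}} shl
    %s = shl <8 x i16> %a, <i16 3, i16 3, i16 3, i16 3, i16 3, i16 3, i16 3, i16 3>
    ret <8 x i16> %s
  }

  define <8 x i16> @shift_by_variable(<8 x i16> %a, <8 x i16> %b) {
    ; A per-element variable shift has no direct SSE2 instruction and should be
    ; reported as considerably more expensive.
    ; CHECK: cost of {{[0-9]+}} {{.*}} shl
    %s = shl <8 x i16> %a, %b
    ret <8 x i16> %s
  }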
2013-04-03  X86 cost model: Vector shifts are expensive in most cases  (Arnold Schwaighofer)
The default logic does not correctly identify the costs of these shifts because they are marked as custom on x86. In some cases, where the shift amount is a scalar, we would be able to generate better code. Unfortunately, when this is the case the value (the splat) will get hoisted out of the loop, thereby making it invisible to ISel. radar://13130673 radar://13537826 git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@178703 91177308-0d34-0410-b5e6-96231b3b80d8
2013-04-01  X86TTI: Add accurate costs for itofp operations, based on the actual instruction counts.  (Benjamin Kramer)
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@178459 91177308-0d34-0410-b5e6-96231b3b80d8
2013-03-20  Correct cost model for vector shift on AVX2  (Michael Liao)
After moving the logic that recognizes a vector shift with a scalar amount from DAG combining into DAG lowering, we declare all vector shifts as custom-lowered even though vector shifts on AVX are legal. As a result, the cost model needs special tuning to identify these legal cases. git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@177586 91177308-0d34-0410-b5e6-96231b3b80d8
2013-03-19  Optimize sext <4 x i8> and <4 x i16> to <4 x i64>.  (Nadav Rotem)
Patch by Ahmad, Muhammad T <muhammad.t.ahmad@intel.com> git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@177421 91177308-0d34-0410-b5e6-96231b3b80d8
2013-03-19  Improve long vector sext/zext lowering on ARM  (Renato Golin)
The ARM backend currently has poor codegen for long sext/zext operations, such as v8i8 -> v8i32. This patch addresses this by performing a custom expansion in ARMISelLowering. It also adds/changes the cost of such lowering in ARMTTI. This partially addresses PR14867. Patch by Pete Couperus git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@177380 91177308-0d34-0410-b5e6-96231b3b80d8
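A minimal cost-model sketch for one of these long extensions follows (the triple/CPU and value names are assumptions for illustration):

  ; RUN: opt < %s -cost-model -analyze -mtriple=thumbv7-apple-ios6.0.0 -mcpu=cortex-a9 | FileCheck %s

  define <8 x i32> @sext_v8i8_v8i32(<8 x i8> %v) {
    ; With the custom expansion this lowers to a short chain of vmovl
    ; instructions, and the ARMTTI cost should reflect that.
    ; CHECK: cost of {{[0-9]+}} {{.*}} sext
    %r = sext <8 x i8> %v to <8 x i32>
    ret <8 x i32> %r
  }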
2013-03-18  ARM cost model: Make some vector integer to float casts cheaper  (Arnold Schwaighofer)
The default logic marks them as too expensive. For example, before this patch we estimated:

  cost of 16 for instruction: %r = uitofp <4 x i16> %v0 to <4 x float>

while this translates to just:

  vmovl.u16 q8, d16
  vcvt.f32.u32 q8, q8

All other costs are left to the values assigned by the fallback logic. These costs are mostly reasonable in the sense that they get progressively more expensive as the emitted instruction sequences get longer. radar://13445992 git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@177334 91177308-0d34-0410-b5e6-96231b3b80d8
2013-03-18  ARM cost model: Correct cost for some cheap float to integer conversions  (Arnold Schwaighofer)
Fix the cost of some "cheap" cast instructions. Before this patch we used to estimate, for example:

  cost of 16 for instruction: %r = fptoui <4 x float> %v0 to <4 x i16>

while we would emit:

  vcvt.s32.f32 q8, q8
  vmovn.i32 d16, q8
  vuzp.8 d16, d17

All other costs are left to the values assigned by the fallback logic. These costs are mostly reasonable in the sense that they get progressively more expensive as the emitted instruction sequences get longer. radar://13434072 git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@177333 91177308-0d34-0410-b5e6-96231b3b80d8
2013-03-15  ARM cost model: Fix costs for some vector selects  (Arnold Schwaighofer)
I was too pessimistic in r177105. Vector selects that fit into a legal register type lower just fine. I was misled by the code fragment that I was using. The stores/loads that I saw in those cases came from lowering the conditional off an address. Changing the code fragment to:

  %T0_3 = type <8 x i18>
  %T1_3 = type <8 x i1>

  define void @func_blend3(%T0_3* %loadaddr, %T0_3* %loadaddr2,
                           %T1_3* %blend, %T0_3* %storeaddr) {
    %v0 = load %T0_3* %loadaddr
    %v1 = load %T0_3* %loadaddr2
  ==> FROM:
    ;%c = load %T1_3* %blend
  ==> TO:
    %c = icmp slt %T0_3 %v0, %v1
  ==> USE:
    %r = select %T1_3 %c, %T0_3 %v0, %T0_3 %v1
    store %T0_3 %r, %T0_3* %storeaddr
    ret void
  }

revealed this mistake. radar://13403975 git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@177170 91177308-0d34-0410-b5e6-96231b3b80d8
2013-03-15  ARM cost model: Fix cost of fptrunc and fpext instructions  (Arnold Schwaighofer)
Vector fptrunc and fpext instructions simply get split into scalar instructions. radar://13192358 git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@177159 91177308-0d34-0410-b5e6-96231b3b80d8
2013-03-14  ARM cost model: Increase cost of some vector selects we do terribly on  (Arnold Schwaighofer)
By terrible I mean we store/load from the stack. This matters on PAQp8 in _Z5trainPsS_ii (which is inlined into Mixer::update) where we decide to vectorize a loop with a VF of 8 resulting in a 25% degradation on a cortex-a8.

  LV: Found an estimated cost of 2 for VF 8 For instruction: icmp slt i32
  LV: Found an estimated cost of 2 for VF 8 For instruction: select i1, i32, i32

The bug that tracks the CodeGen part is PR14868. radar://13403975 git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@177105 91177308-0d34-0410-b5e6-96231b3b80d8
2013-03-12  ARM cost model: Increase the cost for vector casts that use the stack  (Arnold Schwaighofer)
Increase the cost of v8/v16-i8 to v8/v16-i32 casts and truncates as the backend currently lowers those using stack accesses. This was responsible for a significant degradation on MultiSource/Benchmarks/Trimaran/enc-pc1/enc-pc1 where we vectorize one loop to a vector factor of 16. After this patch we select a vector factor of 4 which will generate reasonable code.

  unsigned char cle[32];

  void test(short c) {
    unsigned short compte;
    for (compte = 0; compte <= 31; compte++) {
      cle[compte] = cle[compte] ^ c;
    }
  }

radar://13220512 git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@176898 91177308-0d34-0410-b5e6-96231b3b80d8
2013-03-02  X86 cost model: Adjust cost for custom lowered vector multiplies  (Arnold Schwaighofer)
This matters, for example, in the following matrix multiply:

  int **mmult(int rows, int cols, int **m1, int **m2, int **m3) {
    int i, j, k, val;
    for (i=0; i<rows; i++) {
      for (j=0; j<cols; j++) {
        val = 0;
        for (k=0; k<cols; k++) {
          val += m1[i][k] * m2[k][j];
        }
        m3[i][j] = val;
      }
    }
    return(m3);
  }

Taken from the test-suite benchmark Shootout. We estimate the cost of the multiply to be 2 while we generate 9 instructions for it and end up being quite a bit slower than the scalar version (48% on my machine). Also, properly differentiate between avx1 and avx2. On avx-1 we still split the vector into two 128-bit halves and handle the subvector muls like above with 9 instructions. Only on avx-2 will we have a cost of 9 for v4i64. I changed the test case in test/Transforms/LoopVectorize/X86/avx1.ll to use an add instead of a mul because with a mul we now no longer vectorize. I did verify that the mul would indeed be more expensive when vectorized, with 3 kernels:

  for (i ...) r += a[i] * 3;
  for (i ...) m1[i] = m1[i] * 3; // This matches the test case in avx1.ll
  and a matrix multiply.

In each case the vectorized version was considerably slower. radar://13304919 git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@176403 91177308-0d34-0410-b5e6-96231b3b80d8
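The AVX1/AVX2 split can be checked with two RUN lines over the same IR; this is only a sketch (the CPU names, check prefixes and value names are assumptions, and the exact values may differ):

  ; RUN: opt < %s -cost-model -analyze -mtriple=x86_64-apple-macosx10.8.0 -mcpu=corei7-avx | FileCheck %s --check-prefix=AVX1
  ; RUN: opt < %s -cost-model -analyze -mtriple=x86_64-apple-macosx10.8.0 -mcpu=core-avx2 | FileCheck %s --check-prefix=AVX2

  define <4 x i64> @mul_v4i64(<4 x i64> %a, <4 x i64> %b) {
    ; On AVX1 the multiply is split into two 128-bit halves and each half is
    ; emulated, so its cost should come out higher than the AVX2 cost.
    ; AVX1: cost of {{[0-9]+}} {{.*}} mul
    ; AVX2: cost of {{[0-9]+}} {{.*}} mul
    %m = mul <4 x i64> %a, %b
    ret <4 x i64> %m
  }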
2013-02-28  Cost model support for lowered math builtins.  (Benjamin Kramer)
We make the cost for calling libm functions extremely high, as emitting the calls is expensive and causes spills (on x86), so performance suffers. We still vectorize important calls like ceilf and friends on SSE4.1, and fabs. Differential Revision: http://llvm-reviews.chandlerc.com/D466 git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@176287 91177308-0d34-0410-b5e6-96231b3b80d8
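A sketch of how the cheap/expensive split might be checked (the intrinsic choice, CPU and value names are assumptions for illustration):

  ; RUN: opt < %s -cost-model -analyze -mtriple=x86_64-apple-macosx10.8.0 -mcpu=corei7 | FileCheck %s

  declare <4 x float> @llvm.ceil.v4f32(<4 x float>)
  declare <4 x float> @llvm.sin.v4f32(<4 x float>)

  define <4 x float> @math(<4 x float> %a) {
    ; ceil has a direct SSE4.1 lowering (roundps), so it stays cheap.
    ; CHECK: cost of {{[0-9]+}} {{.*}} @llvm.ceil
    %c = call <4 x float> @llvm.ceil.v4f32(<4 x float> %a)
    ; sin falls back to scalarized libm calls and should be costed very high.
    ; CHECK: cost of {{[0-9]+}} {{.*}} @llvm.sin
    %s = call <4 x float> @llvm.sin.v4f32(<4 x float> %c)
    ret <4 x float> %s
  }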
2013-02-20  I optimized the following patterns:  (Elena Demikhovsky)
  sext <4 x i1> to <4 x i64>
  sext <4 x i8> to <4 x i64>
  sext <4 x i16> to <4 x i64>

I'm running Combine on SIGN_EXTEND_IN_REG and reverting SEXT patterns:

  (sext_in_reg (v4i64 anyext (v4i32 x )), ExtraVT) ->
      (v4i64 sext (v4i32 sext_in_reg (v4i32 x , ExtraVT)))

The sext_in_reg (v4i32 x) may be lowered to shl+sar operations. The "sar" does not exist as a 64-bit operation, so lowering sext_in_reg (v4i64 x) has no vector solution. I also added the cost of these operations to the AVX cost table. git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@175619 91177308-0d34-0410-b5e6-96231b3b80d8
2013-02-12  ARM cost model: Add vector reverse shuffle costs  (Arnold Schwaighofer)
A reverse shuffle is lowered to a vrev and possibly a vext instruction (quad word). radar://13171406 git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@174933 91177308-0d34-0410-b5e6-96231b3b80d8
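For reference, a reverse shuffle in IR is a shufflevector with a reversed mask; a minimal hedged sketch (triple/CPU and names are assumptions):

  ; RUN: opt < %s -cost-model -analyze -mtriple=thumbv7-apple-ios6.0.0 -mcpu=swift | FileCheck %s

  define <8 x i16> @reverse_v8i16(<8 x i16> %a) {
    ; A quad-word reverse lowers to vrev plus vext, so the cost is a small constant.
    ; CHECK: cost of {{[0-9]+}} {{.*}} shufflevector
    %r = shufflevector <8 x i16> %a, <8 x i16> undef, <8 x i32> <i32 7, i32 6, i32 5, i32 4, i32 3, i32 2, i32 1, i32 0>
    ret <8 x i16> %r
  }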
2013-02-08  Refine fix to bug 15041.  (Bill Schmidt)
Thanks to help from Nadav and Hal, I have a more reasonable (and even correct!) approach. This specifically penalizes the insertelement and extractelement operations for the performance hit that will occur on PowerPC processors. git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@174725 91177308-0d34-0410-b5e6-96231b3b80d8
2013-02-08  ARM cost model: Address computation in vector mem ops not free  (Arnold Schwaighofer)
Adds a function to target transform info to query for the cost of address computation. The cost model analysis pass now also queries this interface. The code in LoopVectorize adds the cost of address computation as part of the memory instruction cost calculation. Only there do we know whether the instruction will be scalarized or not. Increase the penalty for inserting into D registers on Swift. This becomes necessary because we now always assume that address computation has a cost, and three is a value that more closely reflects this architecture. radar://13097204 git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@174713 91177308-0d34-0410-b5e6-96231b3b80d8
2013-02-07  ARM cost model: Add costs for vector selects  (Arnold Schwaighofer)
Vector selects are cheap on NEON. They get lowered to a vbsl instruction. radar://13158753 git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@174631 91177308-0d34-0410-b5e6-96231b3b80d8
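A minimal sketch of the corresponding cost query (flags and value names are assumptions):

  ; RUN: opt < %s -cost-model -analyze -mtriple=thumbv7-apple-ios6.0.0 -mcpu=swift | FileCheck %s

  define <4 x i32> @select_v4i32(<4 x i1> %c, <4 x i32> %a, <4 x i32> %b) {
    ; A legal-sized vector select maps to a single vbsl, so the cost is low.
    ; CHECK: cost of {{[0-9]+}} {{.*}} select
    %s = select <4 x i1> %c, <4 x i32> %a, <4 x i32> %b
    ret <4 x i32> %s
  }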
2013-02-05  ARM cost model: Cost for scalar integer casts and floating point conversions  (Arnold Schwaighofer)
Also adds some costs for vector integer to float conversions. git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@174371 91177308-0d34-0410-b5e6-96231b3b80d8
2013-02-04  ARM cost model: Penalize insertelement into D subregisters  (Arnold Schwaighofer)
Swift has a renaming dependency if we load into D subregisters. We don't have a way of distinguishing between insertelement operations of values from loads and other values. Therefore, we are pessimistic for now (The performance problem showed up in example 14 of gcc-loops). radar://13096933 git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@174300 91177308-0d34-0410-b5e6-96231b3b80d8
2013-01-25  Initial implementation of PPCTargetTransformInfo  (Hal Finkel)
This provides a place to add customized operation cost information and control some other target-specific IR-level transformations. The only non-trivial logic in this checkin assigns a higher cost to unaligned loads and stores (covered by the included test case). git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@173520 91177308-0d34-0410-b5e6-96231b3b80d8
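A sketch of an unaligned-access check along these lines (the triple, CPU and value names are assumptions, not necessarily the included test case):

  ; RUN: opt < %s -cost-model -analyze -mtriple=powerpc64-unknown-linux-gnu -mcpu=g5 | FileCheck %s

  define <4 x float> @unaligned_load(<4 x float>* %p) {
    ; An unaligned vector load is penalized relative to a naturally aligned one.
    ; CHECK: cost of {{[0-9]+}} {{.*}} load
    %v = load <4 x float>* %p, align 1
    ret <4 x float> %v
  }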
2013-01-01  Make opt grab the triple from the module and use it to initialize the target machine.  (Nadav Rotem)
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@171341 91177308-0d34-0410-b5e6-96231b3b80d8
2012-12-23  We are not ready to estimate the cost of integer expansions based on the number of parts.  (Nadav Rotem)
This test is too noisy. git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@170999 91177308-0d34-0410-b5e6-96231b3b80d8
2012-12-21  Improve the X86 cost model for loads and stores.  (Nadav Rotem)
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@170830 91177308-0d34-0410-b5e6-96231b3b80d8
2012-12-18  Reverse order of checking SSE level when calculating compare cost, so we check AVX2 before AVX.  (Jakub Staszak)
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@170464 91177308-0d34-0410-b5e6-96231b3b80d8
2012-12-05  Cost Model: change the default cost of control flow instructions (br / ret / ...) to zero.  (Nadav Rotem)
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@169423 91177308-0d34-0410-b5e6-96231b3b80d8
2012-11-06  CostModel: add another known vector trunc optimization.  (Nadav Rotem)
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@167488 91177308-0d34-0410-b5e6-96231b3b80d8
2012-11-06  Cost Model: add tables for some avx type-conversion hacks.  (Nadav Rotem)
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@167480 91177308-0d34-0410-b5e6-96231b3b80d8
2012-11-05  CostModel: Add tables for the common x86 compares.  (Nadav Rotem)
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@167421 91177308-0d34-0410-b5e6-96231b3b80d8
2012-11-05  Cost Model: Improve the accuracy of the zext/sext/trunc vector cost estimation.  (Nadav Rotem)
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@167412 91177308-0d34-0410-b5e6-96231b3b80d8
2012-11-05  Cost Model: Normalize the insert/extract index when splitting types  (Nadav Rotem)
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@167402 91177308-0d34-0410-b5e6-96231b3b80d8
2012-11-05  Cost Model: teach the cost model about expanding integers.  (Nadav Rotem)
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@167401 91177308-0d34-0410-b5e6-96231b3b80d8
2012-11-05  Implement the cost of abnormal x86 instruction lowering as a table.  (Nadav Rotem)
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@167395 91177308-0d34-0410-b5e6-96231b3b80d8
2012-11-03  X86 CostModel: Add support for some of the common arithmetic instructions for SSE4, AVX and AVX2.  (Nadav Rotem)
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@167347 91177308-0d34-0410-b5e6-96231b3b80d8
2012-11-02  Add a stub for the x86 cost model impl. Implement a basic cost rule for inserting/extracting from XMM registers.  (Nadav Rotem)
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@167333 91177308-0d34-0410-b5e6-96231b3b80d8
2012-11-02  CostModel: add support for Vector Insert and Extract.  (Nadav Rotem)
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@167329 91177308-0d34-0410-b5e6-96231b3b80d8
2012-11-02  Add a cost model analysis that allows us to estimate the cost of IR-level instructions.  (Nadav Rotem)
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@167324 91177308-0d34-0410-b5e6-96231b3b80d8
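Every entry in this directory follows the same pattern: run the analysis over a small IR file and FileCheck the printed per-instruction estimates. A minimal hedged sketch (the triple and value names are assumptions):

  ; RUN: opt < %s -cost-model -analyze -mtriple=x86_64-apple-macosx10.8.0 | FileCheck %s

  define i32 @simple(i32* %p, i32 %a, i32 %b) {
    ; Each IR instruction gets an estimated cost printed by the pass.
    ; CHECK: cost of {{[0-9]+}} {{.*}} add
    %s = add i32 %a, %b
    ; CHECK: cost of {{[0-9]+}} {{.*}} load
    %v = load i32* %p
    %r = add i32 %s, %v
    ret i32 %r
  }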