Age | Commit message (Collapse) | Author |
|
If a switch instruction has a case for every possible value of its type,
with the same successor, SimplifyCFG would replace it with an icmp ult,
but the computation of the bound overflows in that case, which inverts
the test.
Patch by Jed Davis!
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@179587 91177308-0d34-0410-b5e6-96231b3b80d8
|
|
Two return types are not equivalent if one is a pointer and the other is an
integral. This is because we cannot bitcast a pointer to an integral value.
PR15185
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@179569 91177308-0d34-0410-b5e6-96231b3b80d8
|
|
vector-gather sequence out of loops.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@179562 91177308-0d34-0410-b5e6-96231b3b80d8
|
|
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@179542 91177308-0d34-0410-b5e6-96231b3b80d8
|
|
-fslp-vectorize run the slp-vectorizer.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@179508 91177308-0d34-0410-b5e6-96231b3b80d8
|
|
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@179505 91177308-0d34-0410-b5e6-96231b3b80d8
|
|
instructions.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@179504 91177308-0d34-0410-b5e6-96231b3b80d8
|
|
One performs: (X == 13 | X == 14) -> X-13 <u 2
The other: (A == C1 || A == C2) -> (A & ~(C1 ^ C2)) == C1
The problem is that there are certain values of C1 and C2 that
trigger both transforms but the first one blocks out the second,
this generates suboptimal code.
Reordering the transforms should be better in every case and
allows us to do interesting stuff like turn:
%shr = lshr i32 %X, 4
%and = and i32 %shr, 15
%add = add i32 %and, -14
%tobool = icmp ne i32 %add, 0
into:
%and = and i32 %X, 240
%tobool = icmp ne i32 %and, 224
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@179493 91177308-0d34-0410-b5e6-96231b3b80d8
|
|
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@179483 91177308-0d34-0410-b5e6-96231b3b80d8
|
|
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@179479 91177308-0d34-0410-b5e6-96231b3b80d8
|
|
and add the cost of extracting values from the roots of the tree.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@179475 91177308-0d34-0410-b5e6-96231b3b80d8
|
|
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@179470 91177308-0d34-0410-b5e6-96231b3b80d8
|
|
There is a Constant with non-constant operands: blockaddress.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@179460 91177308-0d34-0410-b5e6-96231b3b80d8
|
|
This is basically the same fix in three different places. We use a set to avoid
walking the whole tree of a big ConstantExprs multiple times.
For example: (select cmp, (add big_expr 1), (add big_expr 2))
We don't want to visit big_expr twice here, it may consist of thousands of
nodes.
The testcase exercises this by creating an insanely large ConstantExprs out of
a loop. It's questionable if the optimizer should ever create those, but this
can be triggered with real C code. Fixes PR15714.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@179458 91177308-0d34-0410-b5e6-96231b3b80d8
|
|
Fixes PR15737.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@179417 91177308-0d34-0410-b5e6-96231b3b80d8
|
|
perform a preliminary traversal of the graph to collect values with multiple users and check where the users came from.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@179414 91177308-0d34-0410-b5e6-96231b3b80d8
|
|
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@179412 91177308-0d34-0410-b5e6-96231b3b80d8
|
|
The transform will execute like so:
(A & ~B) == 0 --> (A & B) != 0
(A & ~B) != 0 --> (A & B) == 0
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@179386 91177308-0d34-0410-b5e6-96231b3b80d8
|
|
Don't classify idiv/udiv as a reduction operation. Integer division is lossy.
For example : (1 / 2) * 4 != 4/2.
Example:
int a[] = { 2, 5, 2, 2}
int x = 80;
for()
x /= a[i];
Scalar:
x /= 2 // = 40
x /= 5 // = 8
x /= 2 // = 4
x /= 2 // = 2
Vectorized:
<80, 1> / <2,5> //= <40,0>
<40, 0> / <2,2> //= <20,0>
20*0 = 0
radar://13640654
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@179381 91177308-0d34-0410-b5e6-96231b3b80d8
|
|
Allows LLVM to optimize sequences like the following:
%add = add nsw i32 %x, 1
%cmp = icmp sgt i32 %add, %y
into:
%cmp = icmp sge i32 %x, %y
as well as:
%add1 = add nsw i32 %x, 20
%add2 = add nsw i32 %y, 57
%cmp = icmp sge i32 %add1, %add2
into:
%add = add nsw i32 %y, 37
%cmp = icmp sle i32 %cmp, %x
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@179316 91177308-0d34-0410-b5e6-96231b3b80d8
|
|
When trying to collapse sequences of insertelement/extractelement
instructions into single shuffle instructions, there is one specific
case where the Instruction Combiner wrongly updates the resulting
Mask of shuffle indexes.
The problem is in function CollectShuffleElments.
If we have a sequence of insert/extract element instructions
like the one below:
%tmp1 = extractelement <4 x float> %LHS, i32 0
%tmp2 = insertelement <4 x float> %RHS, float %tmp1, i32 1
%tmp3 = extractelement <4 x float> %RHS, i32 2
%tmp4 = insertelement <4 x float> %tmp2, float %tmp3, i32 3
Where:
. %RHS will have a mask of [4,5,6,7]
. %LHS will have a mask of [0,1,2,3]
The Mask of shuffle indexes is wrongly computed to [4,1,6,7]
instead of [4,0,6,7].
When analyzing %tmp2 in order to compute the Mask for the
resulting shuffle instruction, the algorithm forgets to update
the mask index at position 1 with the index associated to the
element extracted from %LHS by instruction %tmp1.
Patch by Andrea DiBiagio!
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@179291 91177308-0d34-0410-b5e6-96231b3b80d8
|
|
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@179280 91177308-0d34-0410-b5e6-96231b3b80d8
|
|
expose it in the header file.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@179272 91177308-0d34-0410-b5e6-96231b3b80d8
|
|
function calls when we check if it is safe to sink instructions.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@179207 91177308-0d34-0410-b5e6-96231b3b80d8
|
|
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@179206 91177308-0d34-0410-b5e6-96231b3b80d8
|
|
rather than checking if the source and destination have the same number of
arguments and copying the attributes over directly.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@179169 91177308-0d34-0410-b5e6-96231b3b80d8
|
|
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@179132 91177308-0d34-0410-b5e6-96231b3b80d8
|
|
This commit adds the infrastructure for performing bottom-up SLP vectorization (and other optimizations) on parallel computations.
The infrastructure has three potential users:
1. The loop vectorizer needs to be able to vectorize AOS data structures such as (sum += A[i] + A[i+1]).
2. The BB-vectorizer needs this infrastructure for bottom-up SLP vectorization, because bottom-up vectorization is faster to compute.
3. A loop-roller needs to be able to analyze consecutive chains and roll them into a loop, in order to reduce code size. A loop roller does not need to create vector instructions, and this infrastructure separates the chain analysis from the vectorization.
This patch also includes a simple (100 LOC) bottom up SLP vectorizer that uses the infrastructure, and can vectorize this code:
void SAXPY(int *x, int *y, int a, int i) {
x[i] = a * x[i] + y[i];
x[i+1] = a * x[i+1] + y[i+1];
x[i+2] = a * x[i+2] + y[i+2];
x[i+3] = a * x[i+3] + y[i+3];
}
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@179117 91177308-0d34-0410-b5e6-96231b3b80d8
|
|
invalidation in Reassociate.
I brazenly think this change is slightly simpler than r178793 because:
- no "state" in functor
- "OpndPtrs[i]" looks simpler than "&Opnds[OpndIndices[i]]"
While I can reproduce the probelm in Valgrind, it is rather difficult to come up
a standalone testing case. The reason is that when an iterator is invalidated,
the stale invalidated elements are not yet clobbered by nonsense data, so the
optimizer can still proceed successfully.
Thank Benjamin for fixing this bug and generously providing the test case.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@179062 91177308-0d34-0410-b5e6-96231b3b80d8
|
|
The fix for PR14972 in r177055 introduced a real think-o in the *store*
side, likely because I was much more focused on the load side. While we
can arbitrarily widen (or narrow) a loaded value, we can't arbitrarily
widen a value to be stored, as that changes the width of memory access!
Lock down the code path in the store rewriting which would do this to
only handle the intended circumstance.
All of the existing tests continue to pass, and I've added a test from
the PR.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@178974 91177308-0d34-0410-b5e6-96231b3b80d8
|
|
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@178932 91177308-0d34-0410-b5e6-96231b3b80d8
|
|
This is the counterpart to commit r160637, except it performs the action
in the bottomup portion of the data flow analysis.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@178922 91177308-0d34-0410-b5e6-96231b3b80d8
|
|
The normal dataflow sequence in the ARC optimizer consists of the following
states:
Retain -> CanRelease -> Use -> Release
The optimizer before this patch stored the uses that determine the lifetime of
the retainable object pointer when it bottom up hits a retain or when top down
it hits a release. This is correct for an imprecise lifetime scenario since what
we are trying to do is remove retains/releases while making sure that no
``CanRelease'' (which is usually a call) deallocates the given pointer before we
get to the ``Use'' (since that would cause a segfault).
If we are considering the precise lifetime scenario though, this is not
correct. In such a situation, we *DO* care about the previous sequence, but
additionally, we wish to track the uses resulting from the following incomplete
sequences:
Retain -> CanRelease -> Release (TopDown)
Retain <- Use <- Release (BottomUp)
*NOTE* This patch looks large but the most of it consists of updating
test cases. Additionally this fix exposed an additional bug. I removed
the test case that expressed said bug and will recommit it with the fix
in a little bit.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@178921 91177308-0d34-0410-b5e6-96231b3b80d8
|
|
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@178915 91177308-0d34-0410-b5e6-96231b3b80d8
|
|
index.
This optimization is unstable at this moment; it
1) block us on a very important application
2) PR15200
3) test6 and test7 in test/Transforms/ScalarRepl/dynamic-vector-gep.ll
(the CHECK command compare the output against wrong result)
I personally believe this optimization should not have any impact on the
autovectorized code, as auto-vectorizer is supposed to put gather/scatter
in a "right" way. Although in theory downstream optimizaters might reveal
some gather/scatter optimization opportunities, the chance is quite slim.
For the hand-crafted vectorizing code, in term of redundancy elimination,
load-CSE, copy-propagation and DSE can collectively achieve the same result,
but in much simpler way. On the other hand, these optimizers are able to
improve the code in a incremental way; in contrast, SROA is sort of all-or-none
approach. However, SROA might slighly win in stack size, as it tries to figure
out a stretch of memory tightenly cover the area accessed by the dynamic index.
rdar://13174884
PR15200
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@178912 91177308-0d34-0410-b5e6-96231b3b80d8
|
|
VisitInstructionsBottomUp.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@178895 91177308-0d34-0410-b5e6-96231b3b80d8
|
|
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@178893 91177308-0d34-0410-b5e6-96231b3b80d8
|
|
Pass down the fact that an operand is going to be a vector of constants.
This should bring the performance of MultiSource/Benchmarks/PAQ8p/paq8p on x86
back. It had degraded to scalar performance due to my pervious shift cost change
that made all shifts expensive on x86.
radar://13576547
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@178809 91177308-0d34-0410-b5e6-96231b3b80d8
|
|
OpndPtrs stored pointers into the Opnd vector that became invalid when the
vector grows. Store indices instead. Sadly I only have a large testcase that
only triggers under valgrind, so I didn't include it.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@178793 91177308-0d34-0410-b5e6-96231b3b80d8
|
|
ObjCARCOpt::OptimizeReturns.
Now ObjCARCOpt::OptimizeReturns is easy to read and reason about.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@178715 91177308-0d34-0410-b5e6-96231b3b80d8
|
|
ObjCARCOpt::OptimizeReturns.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@178714 91177308-0d34-0410-b5e6-96231b3b80d8
|
|
Cleaned up trailing whitespace and added extra slashes in front of a
function level comment so that it follow the convention of having 3
slashes.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@178712 91177308-0d34-0410-b5e6-96231b3b80d8
|
|
HasSafePathToPredecessorCall.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@178710 91177308-0d34-0410-b5e6-96231b3b80d8
|
|
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@178709 91177308-0d34-0410-b5e6-96231b3b80d8
|
|
conditional macros that no-op in Release mode instead of #ifdef sections of the code.
This is to follow the example of the DEBUG macro.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@178705 91177308-0d34-0410-b5e6-96231b3b80d8
|
|
objc_autoreleaseReturnValue.
The semantics of ARC implies that a pointer passed into an objc_autorelease
must live until some point (potentially down the stack) where an
autorelease pool is popped. On the other hand, an
objc_autoreleaseReturnValue just signifies that the object must live
until the end of the given function at least.
Thus objc_autorelease is stronger than objc_autoreleaseReturnValue in
terms of the semantics of ARC* implying that performing the given
strength reduction without any knowledge of how this relates to
the autorelease pool pop that is further up the stack violates the
semantics of ARC.
*Even though objc_autoreleaseReturnValue if you know that no RV
optimization will occur is more computationally expensive.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@178612 91177308-0d34-0410-b5e6-96231b3b80d8
|
|
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@178605 91177308-0d34-0410-b5e6-96231b3b80d8
|
|
The iterator could be invalidated when it's recursively deleting a whole bunch
of constant expressions in a constant initializer.
Note: This was only reproducible if `opt' was run on a `.bc' file. If `opt' was
run on a `.ll' file, it wouldn't crash. This is why the test first pushes the
`.ll' file through `llvm-as' before feeding it to `opt'.
PR15440
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@178531 91177308-0d34-0410-b5e6-96231b3b80d8
|
|
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@178484 91177308-0d34-0410-b5e6-96231b3b80d8
|
|
rule 1: (x | c1) ^ c2 => (x & ~c1) ^ (c1^c2),
only useful when c1=c2
rule 2: (x & c1) ^ (x & c2) = (x & (c1^c2))
rule 3: (x | c1) ^ (x | c2) = (x & c3) ^ c3 where c3 = c1 ^ c2
rule 4: (x | c1) ^ (x & c2) => (x & c3) ^ c1, where c3 = ~c1 ^ c2
It reduces an application's size (in terms of # of instructions) by 8.9%.
Reviwed by Pete Cooper. Thanks a lot!
rdar://13212115
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@178409 91177308-0d34-0410-b5e6-96231b3b80d8
|