<feed xmlns='http://www.w3.org/2005/Atom'>
<title>llvm/test/Analysis/CostModel, branch master</title>
<subtitle>http://llvm.org</subtitle>
<id>https://git.amat.us/llvm/atom/test/Analysis/CostModel?h=master</id>
<link rel='self' href='https://git.amat.us/llvm/atom/test/Analysis/CostModel?h=master'/>
<link rel='alternate' type='text/html' href='https://git.amat.us/llvm/'/>
<updated>2013-04-29T22:42:01Z</updated>
<entry>
<title>TBAA: remove !tbaa from testing cases if not used.</title>
<updated>2013-04-29T22:42:01Z</updated>
<author>
<name>Manman Ren</name>
<email>mren@apple.com</email>
</author>
<published>2013-04-29T22:42:01Z</published>
<link rel='alternate' type='text/html' href='https://git.amat.us/llvm/commit/?id=e78d832097fc4df6f624150017c54c7a3189cd19'/>
<id>urn:sha1:e78d832097fc4df6f624150017c54c7a3189cd19</id>
<content type='text'>
This will make it easier to turn on struct-path aware TBAA since the metadata
format will change.


git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@180743 91177308-0d34-0410-b5e6-96231b3b80d8
</content>
</entry>
<entry>
<title>ARM cost model: Integer div and rem are lowered to a function call</title>
<updated>2013-04-25T21:16:18Z</updated>
<author>
<name>Arnold Schwaighofer</name>
<email>aschwaighofer@apple.com</email>
</author>
<published>2013-04-25T21:16:18Z</published>
<link rel='alternate' type='text/html' href='https://git.amat.us/llvm/commit/?id=45c9e0b412495c2d660918b0e964529bcb5e05b8'/>
<id>urn:sha1:45c9e0b412495c2d660918b0e964529bcb5e05b8</id>
<content type='text'>
Reflect this in the cost model. I observed this in MiBench/consumer-lame.
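
Here is a minimal sketch of what this means for the cost of a div/rem, assuming a core without a hardware divider. Every scalar div/rem pays the price of a call. A vector div/rem is scalarized into one call per lane, plus the extracts and inserts around it. The function name and both cost parameters are hypothetical placeholders, not the values ARMTargetTransformInfo uses.

```python
def div_rem_cost(num_elts, call_cost, extract_insert_cost):
    """Hypothetical cost of an integer div/rem lowered to a library call.

    num_elts: 1 for a scalar operation, N for an N-lane vector.
    call_cost: placeholder cost of one call to the runtime division routine.
    extract_insert_cost: placeholder cost of moving one lane out and back in.
    """
    if num_elts == 1:
        return call_cost
    # Vectors are scalarized: one call per lane, plus lane extract/insert.
    return num_elts * (call_cost + extract_insert_cost)
```

With an illustrative call cost of 20 and a lane overhead of 2, a scalar div costs 20 and a four-lane vector div costs 88. Under this model the vector form is never cheaper than the scalar calls it replaces.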

radar://13354716

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@180576 91177308-0d34-0410-b5e6-96231b3b80d8
</content>
</entry>
<entry>
<title>Legalize vector truncates by parts rather than just splitting.</title>
<updated>2013-04-21T23:47:41Z</updated>
<author>
<name>Jim Grosbach</name>
<email>grosbach@apple.com</email>
</author>
<published>2013-04-21T23:47:41Z</published>
<link rel='alternate' type='text/html' href='https://git.amat.us/llvm/commit/?id=0cb1019e9cd41237408eae09623eb9a34a4cbe0c'/>
<id>urn:sha1:0cb1019e9cd41237408eae09623eb9a34a4cbe0c</id>
<content type='text'>
Rather than just splitting the input type and hoping for the best, apply
a bit more cleverness. Just splitting the types until the source is
legal often leads to an illegal result type, which is then widened and a
scalarization step is introduced which leads to truly horrible code
generation. With the loop vectorizer, these sorts of operations are much
more common, and so it's worth extra effort to do them well.

Add a legalization hook for the operands of a TRUNCATE node. The hook is
encountered after the result type has been legalized, but while the
operand type is still illegal. If simply splitting both types leaves the
result type of each half legal, just do that (v16i16 -&gt; v16i8 on ARM,
for example). If, however, that would result in an illegal result type
(v8i32 -&gt; v8i8 on ARM, for example), we can be more clever with
power-of-two vectors. Specifically, split the input type, but also widen
the result element size, then concatenate the halves and truncate again.
For example, on ARM, to perform "%res = v8i8 trunc v8i32 %in" we
transform it to:
  %inlo = v4i32 extract_subvector %in, 0
  %inhi = v4i32 extract_subvector %in, 4
  %lo16 = v4i16 trunc v4i32 %inlo
  %hi16 = v4i16 trunc v4i32 %inhi
  %in16 = v8i16 concat_vectors v4i16 %lo16, v4i16 %hi16
  %res = v8i8 trunc v8i16 %in16

This allows instruction selection to generate three VMOVN instructions
instead of a sequence of moves, stores and loads.
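
The rewrite is sound because truncating each lane to i16 and then to i8 keeps exactly the low 8 bits that a direct i32 to i8 truncate keeps. The following minimal Python model of the ARM example checks this. Vectors are lists of unsigned lane values, and the helper names are hypothetical: they only mirror the DAG nodes above.

```python
def trunc(vec, bits):
    # TRUNCATE: keep only the low `bits` bits of every lane.
    return [x % 2 ** bits for x in vec]


def extract_subvector(vec, start, n):
    # EXTRACT_SUBVECTOR: n consecutive lanes starting at `start`.
    return vec[start:start + n]


def concat_vectors(lo, hi):
    # CONCAT_VECTORS: lo occupies the low lanes, hi the high lanes.
    return lo + hi


def legalize_v8i32_to_v8i8(v):
    # Split the illegal v8i32 operand into two legal v4i32 halves.
    inlo = extract_subvector(v, 0, 4)
    inhi = extract_subvector(v, 4, 4)
    # Truncate each half to v4i16 (one VMOVN.I32 each).
    lo16 = trunc(inlo, 16)
    hi16 = trunc(inhi, 16)
    # Concatenate into a legal v8i16 and truncate once more (one VMOVN.I16).
    in16 = concat_vectors(lo16, hi16)
    return trunc(in16, 8)


v = [0x12345678, 0xFFFFFFFF, 0, 255, 256, 65535, 65536, 0x80000001]
assert legalize_v8i32_to_v8i8(v) == trunc(v, 8)
print(legalize_v8i32_to_v8i8(v))  # [120, 255, 0, 255, 0, 255, 0, 1]
```

The three trunc calls on legal types correspond to the two vmovn.i32 and one vmovn.i16 instructions in the new _test2 output below.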

Update the ARMTargetTransformInfo to take this improved legalization
into account.

Consider the simplified IR:

define &lt;16 x i8&gt; @test1(&lt;16 x i32&gt;* %ap) {
  %a = load &lt;16 x i32&gt;* %ap
  %tmp = trunc &lt;16 x i32&gt; %a to &lt;16 x i8&gt;
  ret &lt;16 x i8&gt; %tmp
}

define &lt;8 x i8&gt; @test2(&lt;8 x i32&gt;* %ap) {
  %a = load &lt;8 x i32&gt;* %ap
  %tmp = trunc &lt;8 x i32&gt; %a to &lt;8 x i8&gt;
  ret &lt;8 x i8&gt; %tmp
}

Previously, we would generate the truly hideous:
	.syntax unified
	.section	__TEXT,__text,regular,pure_instructions
	.globl	_test1
	.align	2
_test1:                                 @ @test1
@ BB#0:
	push	{r7}
	mov	r7, sp
	sub	sp, sp, #20
	bic	sp, sp, #7
	add	r1, r0, #48
	add	r2, r0, #32
	vld1.64	{d24, d25}, [r0:128]
	vld1.64	{d16, d17}, [r1:128]
	vld1.64	{d18, d19}, [r2:128]
	add	r1, r0, #16
	vmovn.i32	d22, q8
	vld1.64	{d16, d17}, [r1:128]
	vmovn.i32	d20, q9
	vmovn.i32	d18, q12
	vmov.u16	r0, d22[3]
	strb	r0, [sp, #15]
	vmov.u16	r0, d22[2]
	strb	r0, [sp, #14]
	vmov.u16	r0, d22[1]
	strb	r0, [sp, #13]
	vmov.u16	r0, d22[0]
	vmovn.i32	d16, q8
	strb	r0, [sp, #12]
	vmov.u16	r0, d20[3]
	strb	r0, [sp, #11]
	vmov.u16	r0, d20[2]
	strb	r0, [sp, #10]
	vmov.u16	r0, d20[1]
	strb	r0, [sp, #9]
	vmov.u16	r0, d20[0]
	strb	r0, [sp, #8]
	vmov.u16	r0, d18[3]
	strb	r0, [sp, #3]
	vmov.u16	r0, d18[2]
	strb	r0, [sp, #2]
	vmov.u16	r0, d18[1]
	strb	r0, [sp, #1]
	vmov.u16	r0, d18[0]
	strb	r0, [sp]
	vmov.u16	r0, d16[3]
	strb	r0, [sp, #7]
	vmov.u16	r0, d16[2]
	strb	r0, [sp, #6]
	vmov.u16	r0, d16[1]
	strb	r0, [sp, #5]
	vmov.u16	r0, d16[0]
	strb	r0, [sp, #4]
	vldmia	sp, {d16, d17}
	vmov	r0, r1, d16
	vmov	r2, r3, d17
	mov	sp, r7
	pop	{r7}
	bx	lr

	.globl	_test2
	.align	2
_test2:                                 @ @test2
@ BB#0:
	push	{r7}
	mov	r7, sp
	sub	sp, sp, #12
	bic	sp, sp, #7
	vld1.64	{d16, d17}, [r0:128]
	add	r0, r0, #16
	vld1.64	{d20, d21}, [r0:128]
	vmovn.i32	d18, q8
	vmov.u16	r0, d18[3]
	vmovn.i32	d16, q10
	strb	r0, [sp, #3]
	vmov.u16	r0, d18[2]
	strb	r0, [sp, #2]
	vmov.u16	r0, d18[1]
	strb	r0, [sp, #1]
	vmov.u16	r0, d18[0]
	strb	r0, [sp]
	vmov.u16	r0, d16[3]
	strb	r0, [sp, #7]
	vmov.u16	r0, d16[2]
	strb	r0, [sp, #6]
	vmov.u16	r0, d16[1]
	strb	r0, [sp, #5]
	vmov.u16	r0, d16[0]
	strb	r0, [sp, #4]
	ldm	sp, {r0, r1}
	mov	sp, r7
	pop	{r7}
	bx	lr

Now, however, we generate the much more straightforward:
	.syntax unified
	.section	__TEXT,__text,regular,pure_instructions
	.globl	_test1
	.align	2
_test1:                                 @ @test1
@ BB#0:
	add	r1, r0, #48
	add	r2, r0, #32
	vld1.64	{d20, d21}, [r0:128]
	vld1.64	{d16, d17}, [r1:128]
	add	r1, r0, #16
	vld1.64	{d18, d19}, [r2:128]
	vld1.64	{d22, d23}, [r1:128]
	vmovn.i32	d17, q8
	vmovn.i32	d16, q9
	vmovn.i32	d18, q10
	vmovn.i32	d19, q11
	vmovn.i16	d17, q8
	vmovn.i16	d16, q9
	vmov	r0, r1, d16
	vmov	r2, r3, d17
	bx	lr

	.globl	_test2
	.align	2
_test2:                                 @ @test2
@ BB#0:
	vld1.64	{d16, d17}, [r0:128]
	add	r0, r0, #16
	vld1.64	{d18, d19}, [r0:128]
	vmovn.i32	d16, q8
	vmovn.i32	d17, q9
	vmovn.i16	d16, q8
	vmov	r0, r1, d16
	bx	lr

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@179989 91177308-0d34-0410-b5e6-96231b3b80d8
</content>
</entry>
<entry>
<title>X86 cost model: Exit before calling getSimpleVT on non-simple VTs</title>
<updated>2013-04-17T20:04:53Z</updated>
<author>
<name>Arnold Schwaighofer</name>
<email>aschwaighofer@apple.com</email>
</author>
<published>2013-04-17T20:04:53Z</published>
<link rel='alternate' type='text/html' href='https://git.amat.us/llvm/commit/?id=9c63f0d687cf1130ee2e76a6fdc87d71ae9d3961'/>
<id>urn:sha1:9c63f0d687cf1130ee2e76a6fdc87d71ae9d3961</id>
<content type='text'>
getSimpleVT can only handle simple value types.
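
The fix follows a common pattern: guard the type before converting it. In LLVM, EVT::getSimpleVT() asserts that the type is simple. The minimal Python sketch below uses a hypothetical `ValueType` class as a stand-in for EVT, and the cost table is likewise hypothetical.

```python
class ValueType:
    """Hypothetical stand-in for an EVT: either a simple (machine) value type
    or an extended type such as a very wide vector."""

    def __init__(self, name, simple):
        self.name = name
        self.simple = simple

    def is_simple(self):
        return self.simple

    def get_simple_vt(self):
        # Like EVT::getSimpleVT(), only valid on simple value types.
        if not self.simple:
            raise ValueError(self.name + " is not a simple value type")
        return self.name


def table_cost(table, vt, default_cost):
    # Exit early: an extended type can never appear in a table keyed by
    # simple types, so fall back to the default cost instead of converting.
    if not vt.is_simple():
        return default_cost
    return table.get(vt.get_simple_vt(), default_cost)
```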

radar://13676022

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@179714 91177308-0d34-0410-b5e6-96231b3b80d8
</content>
</entry>
<entry>
<title>CostModel: increase the default cost of supported floating point operations from 1 to 2. Fixed a few tests that changed because now the cost of one insert + a vector operation on two doubles is lower than two scalar operations on doubles.</title>
<updated>2013-04-12T21:15:03Z</updated>
<author>
<name>Nadav Rotem</name>
<email>nrotem@apple.com</email>
</author>
<published>2013-04-12T21:15:03Z</published>
<link rel='alternate' type='text/html' href='https://git.amat.us/llvm/commit/?id=9eb366acba65b5779d2129db3a6fb6a0414572d4'/>
<id>urn:sha1:9eb366acba65b5779d2129db3a6fb6a0414572d4</id>
<content type='text'>
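The arithmetic behind the test changes can be sketched as follows. The sketch makes two hypothetical assumptions: inserting a scalar into a vector costs 1, and the legal v2f64 operation costs the same as a scalar floating point operation.

```python
INSERT_COST = 1  # assumption: cost of one insertelement


def two_scalar_ops(fp_op_cost):
    # Two independent scalar double operations.
    return 2 * fp_op_cost


def insert_plus_vector_op(fp_op_cost):
    # Build the v2f64 operand with one insert, then do one vector operation.
    return INSERT_COST + fp_op_cost


# Old default cost of 1: a tie, so the vector form is not cheaper.
assert two_scalar_ops(1) == 2 and insert_plus_vector_op(1) == 2
# New default cost of 2: the vector form wins, 3 against 4.
assert two_scalar_ops(2) == 4 and insert_plus_vector_op(2) == 3
```
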
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@179413 91177308-0d34-0410-b5e6-96231b3b80d8
</content>
</entry>
<entry>
<title>X86 cost model: Model cost for uitofp and sitofp on SSE2</title>
<updated>2013-04-08T18:05:48Z</updated>
<author>
<name>Arnold Schwaighofer</name>
<email>aschwaighofer@apple.com</email>
</author>
<published>2013-04-08T18:05:48Z</published>
<link rel='alternate' type='text/html' href='https://git.amat.us/llvm/commit/?id=813456527e73f0c1468514c523c6258d360bcd91'/>
<id>urn:sha1:813456527e73f0c1468514c523c6258d360bcd91</id>
<content type='text'>
The costs are overfitted so that I can still use the legalization factor.

For example, when compiled for SSE2, the vectorized version of the following
kernel has about half the throughput of the unvectorized one. Before this
patch we would vectorize it.

unsigned short A[1024];
double B[1024];
void f() {
  int i;
  for (i = 0; i &lt; 1024; ++i) {
    B[i] = (double) A[i];
  }
}
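
The sketch below shows the cost computation that the legalization factor feeds into. It assumes 128-bit SSE2 registers and assumes that an illegal vector type is simply split into as many registers as it needs. The per-part table entry is a parameter, not one of the actual X86 values. Which operand's type the real code legalizes is not stated here, so the sketch simply applies the factor to the type it is given.

```python
XMM_BITS = 128  # width of an SSE2 vector register


def legalization_factor(num_elts, elt_bits):
    # Number of 128-bit registers the vector is split into (ceiling
    # division; a narrower vector still occupies one register here).
    total_bits = num_elts * elt_bits
    return (total_bits + XMM_BITS - 1) // XMM_BITS


def conversion_cost(num_elts, elt_bits, entry_cost):
    # The table entry describes one legal part, so the total cost scales
    # with the legalization factor.
    return legalization_factor(num_elts, elt_bits) * entry_cost
```

For example, a v4f64 value occupies two registers, so `conversion_cost(4, 64, c)` is `2 * c` for any entry `c`. This is why the table entries have to be tuned with the factor in mind.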

radar://13599001

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@179033 91177308-0d34-0410-b5e6-96231b3b80d8
</content>
</entry>
<entry>
<title>TargetLowering: Fix getTypeConversion handling of extended vector types</title>
<updated>2013-04-07T20:22:56Z</updated>
<author>
<name>Arnold Schwaighofer</name>
<email>aschwaighofer@apple.com</email>
</author>
<published>2013-04-07T20:22:56Z</published>
<link rel='alternate' type='text/html' href='https://git.amat.us/llvm/commit/?id=cd3d60c4505efad809a3d8b4ba9aed315568f8d8'/>
<id>urn:sha1:cd3d60c4505efad809a3d8b4ba9aed315568f8d8</id>
<content type='text'>
The code in getTypeConversion attempts to promote the vector element type
before it tries to split or widen the vector.
If promotion failed to find a legal vector type, the code continued to use
the promoted vector element type, and so missed legal split vector types.
For example, the type v32i32, which has a legal split of 8 x v4i32 on
x86/sse2, would be transformed to v32i256. From there it would be
successively split to v16i256, v8i256, v4i256, v2i256 and v1i256, and it
would finally end up as an i64 type.
By resetting the vector element type to the original element type that
existed before the promotion, the code will attempt to split the vector type
into smaller vector widths of the same element type.
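
The following toy Python model shows the search before and after the fix. Types are (number of elements, element bits) pairs, and the legal set is the SSE2 128-bit integer vector types. The 256-bit promotion limit and the function name are simplifying assumptions, not the actual TargetLowering code.

```python
# SSE2 integer vector types held in one 128-bit register: v16i8, v8i16, v4i32, v2i64.
LEGAL = {(16, 8), (8, 16), (4, 32), (2, 64)}
MAX_PROMOTED_BITS = 256  # assumption: stop promoting at i256


def legalize_vector(num_elts, elt_bits, reset_element_type=True):
    """Return (parts, num_elts, elt_bits) for the legal part type, or None
    if no legal vector type is found (the vector would be scalarized)."""
    if (num_elts, elt_bits) in LEGAL:
        return (1, num_elts, elt_bits)
    # 1. Try promoting the element type: i32 -> i64 -> i128 -> i256.
    bits = elt_bits
    while MAX_PROMOTED_BITS > bits:
        bits *= 2
        if (num_elts, bits) in LEGAL:
            return (1, num_elts, bits)
    # 2. Promotion failed. The fix resets to the original element type;
    #    the old code kept splitting with the promoted one.
    if reset_element_type:
        bits = elt_bits
    # 3. Split the vector in halves until a legal type appears.
    parts, n = 1, num_elts
    while n > 1:
        n //= 2
        parts *= 2
        if (n, bits) in LEGAL:
            return (parts, n, bits)
    return None


print(legalize_vector(32, 32))                            # (8, 4, 32), i.e. 8 x v4i32
print(legalize_vector(32, 32, reset_element_type=False))  # None
```

With the reset, v32i32 lands on 8 x v4i32. Without it, the search only visits v16i256 down to v1i256 and finds nothing, which this model reports as None.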

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@178999 91177308-0d34-0410-b5e6-96231b3b80d8
</content>
</entry>
<entry>
<title>X86 cost model: Differentiate cost for vector shifts of constants</title>
<updated>2013-04-04T23:26:24Z</updated>
<author>
<name>Arnold Schwaighofer</name>
<email>aschwaighofer@apple.com</email>
</author>
<published>2013-04-04T23:26:24Z</published>
<link rel='alternate' type='text/html' href='https://git.amat.us/llvm/commit/?id=2537f3c6597bc1b8eb14c76c8f8e7046be41c9ba'/>
<id>urn:sha1:2537f3c6597bc1b8eb14c76c8f8e7046be41c9ba</id>
<content type='text'>
SSE2 has efficient support for shifts by a scalar. My previous change, which
made shifts expensive, did not take this into account and marked all shifts
as expensive. This would prevent vectorization in cases where it is actually
beneficial.

With this change we differentiate between shifts by constants and other shifts.
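
Here is a minimal sketch of the distinction. It assumes that "shift by a constant" means the same constant amount in every lane, which is the case that maps onto a single SSE2 shift-by-count instruction such as PSLLD. It also assumes that every other shift keeps the expensive cost. Both cost values are parameters rather than the real X86 numbers.

```python
def classify_shift_amount(amounts):
    """amounts: the per-lane shift amounts, with None for a lane whose
    amount is not a compile-time constant."""
    if any(a is None for a in amounts):
        return "variable"
    if len(set(amounts)) == 1:
        return "uniform constant"
    return "non-uniform constant"


def vector_shift_cost(amounts, cheap_cost, expensive_cost):
    # Only a uniform constant amount gets the cheap cost in this sketch.
    if classify_shift_amount(amounts) == "uniform constant":
        return cheap_cost
    return expensive_cost
```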

radar://13576547

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@178808 91177308-0d34-0410-b5e6-96231b3b80d8
</content>
</entry>
<entry>
<title>X86 cost model: Vector shifts are expensive in most cases</title>
<updated>2013-04-03T21:46:05Z</updated>
<author>
<name>Arnold Schwaighofer</name>
<email>aschwaighofer@apple.com</email>
</author>
<published>2013-04-03T21:46:05Z</published>
<link rel='alternate' type='text/html' href='https://git.amat.us/llvm/commit/?id=6b6050b229976a2f53184f6d6857e6f445a869d0'/>
<id>urn:sha1:6b6050b229976a2f53184f6d6857e6f445a869d0</id>
<content type='text'>
The default logic does not correctly identify the costs of shifts, because they
are marked as custom on x86.

In some cases, where the shift amount is a scalar, we would be able to generate
better code. Unfortunately, in those cases the value (the splat) gets hoisted
out of the loop. Because ISel only sees one basic block at a time, this makes
the splat invisible to it.
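
The following toy model shows why the hoisted splat is a problem. Each basic block is a dictionary from value names to their defining operation, and ISel inspects only the block it is selecting. The block contents and value names are hypothetical.

```python
def isel_sees_splat(block, value_name):
    # ISel can only look at definitions inside the current block; a value
    # defined elsewhere is an opaque vector register to it.
    op = block.get(value_name)
    return op is not None and op[0] == "splat"


# Before hoisting: the splat of the scalar shift amount sits in the loop body.
loop_body = {
    "amt.splat": ("splat", "amt"),
    "res": ("shl", "x", "amt.splat"),
}
# After hoisting: the loop-invariant splat has moved to the preheader.
preheader = {"amt.splat": ("splat", "amt")}
hoisted_loop_body = {"res": ("shl", "x", "amt.splat")}

print(isel_sees_splat(loop_body, "amt.splat"))          # True
print(isel_sees_splat(hoisted_loop_body, "amt.splat"))  # False
```

Once the splat has moved to the preheader, selecting the shl in the loop body cannot tell that its amount is uniform.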

radar://13130673
radar://13537826

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@178703 91177308-0d34-0410-b5e6-96231b3b80d8
</content>
</entry>
<entry>
<title>X86TTI: Add accurate costs for itofp operations, based on the actual instruction counts.</title>
<updated>2013-04-01T10:23:49Z</updated>
<author>
<name>Benjamin Kramer</name>
<email>benny.kra@googlemail.com</email>
</author>
<published>2013-04-01T10:23:49Z</published>
<link rel='alternate' type='text/html' href='https://git.amat.us/llvm/commit/?id=13497b3aa7589fc4f9e924f850a7e5151e9ddd2f'/>
<id>urn:sha1:13497b3aa7589fc4f9e924f850a7e5151e9ddd2f</id>
<content type='text'>
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@178459 91177308-0d34-0410-b5e6-96231b3b80d8
</content>
</entry>
</feed>
