Age | Commit message (Collapse) | Author | |
---|---|---|---|
2012-12-22 | X86: Turn mul of <4 x i32> into pmuludq when no SSE4.1 is available. | Benjamin Kramer | |
pmuludq is slow, but it turns out that all the unpacking and packing of the scalarized mul is even slower. 10% speedup on loop-vectorized paq8p. git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@170985 91177308-0d34-0410-b5e6-96231b3b80d8 |