riscv-gnu-toolchain/llvm.git

author	Sanjay Patel <spatel@rotateright.com>	2015-12-24 21:17:56 +0000
committer	Sanjay Patel <spatel@rotateright.com>	2015-12-24 21:17:56 +0000
commit	ae945e7927e3a38cc3a44e829bc158c1ce5602ad (patch)
tree	8a2c332a77ed5ce4586989dfe8f17bf37d53e6f1 /llvm/lib/Target/WebAssembly/WebAssemblyRegStackify.cpp
parent	376c06c2b9622bbcb6ffa04b5caad011e4330d89 (diff)

[InstCombine] transform more extract/insert pairs into shuffles (PR2109)

This is an extension of the shuffle combining from r203229:
http://reviews.llvm.org/rL203229

The idea is to widen a short input vector with undef elements so the
existing shuffle transform for extract/insert can kick in.

The motivation is to finally solve PR2109:
https://llvm.org/bugs/show_bug.cgi?id=2109

For that example, the IR becomes:

  %1 = bitcast <2 x i32>* %P to <2 x float>*
  %ld1 = load <2 x float>, <2 x float>* %1, align 8
  %2 = shufflevector <2 x float> %ld1, <2 x float> undef, <4 x i32> <i32 0, i32 1, i32 undef, i32 undef>
  %i2 = shufflevector <4 x float> %A, <4 x float> %2, <4 x i32> <i32 0, i32 1, i32 4, i32 5>
  ret <4 x float> %i2

And x86 SSE output improves from:

  movq    (%rdi), %xmm1         ## xmm1 = mem[0],zero
  movdqa  %xmm1, %xmm2
  shufps  $229, %xmm2, %xmm2    ## xmm2 = xmm2[1,1,2,3]
  shufps  $48, %xmm0, %xmm1     ## xmm1 = xmm1[0,0],xmm0[3,0]
  shufps  $132, %xmm1, %xmm0    ## xmm0 = xmm0[0,1],xmm1[0,2]
  shufps  $32, %xmm0, %xmm2     ## xmm2 = xmm2[0,0],xmm0[2,0]
  shufps  $36, %xmm2, %xmm0     ## xmm0 = xmm0[0,1],xmm2[2,0]
  retq

To the almost optimal:

  movhpd  (%rdi), %xmm0

Note: There's a tension in the existing transform related to generating
arbitrary shufflevector masks. We avoid that in other places in
InstCombine because we're scared that codegen can't handle strange masks,
but it looks like we're ok with producing those here. I purposely chose
weird insert/extract indexes for the regression tests to see the effect
in these cases. For PowerPC+Altivec, AArch64, and X86+SSE/AVX, I think the
codegen is equal or better for these examples.

Differential Revision: http://reviews.llvm.org/D15096

llvm-svn: 256394
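The two-step idea in the message (widen the short vector with undef lanes, then fold the extract/insert pairs into one two-input shuffle) can be modeled outside LLVM with a small C++17 sketch. This is not InstCombine's code and uses no LLVM API: `Lane`, `Vec`, `kUndef`, `shuffle`, `widen`, `maskFromInserts`, and `pr2109Example` are hypothetical names, an undef lane is modeled as `std::nullopt`, and an undef mask element is modeled as `-1`. The example assumes, as the IR above shows, that lanes 0 and 1 of the loaded `<2 x float>` end up in lanes 2 and 3 of `%A`.

```cpp
#include <cassert>
#include <cstddef>
#include <optional>
#include <stdexcept>
#include <utility>
#include <vector>

// Hypothetical model of an IR vector: each lane is a float or undef (nullopt).
using Lane = std::optional<float>;
using Vec = std::vector<Lane>;
constexpr int kUndef = -1;  // stands in for an "i32 undef" mask element

// Models shufflevector: mask index m < n picks a[m], n <= m < 2n picks b[m - n].
Vec shuffle(const Vec& a, const Vec& b, const std::vector<int>& mask) {
  if (a.size() != b.size())
    throw std::invalid_argument("shufflevector operands must have equal width");
  const int n = static_cast<int>(a.size());
  Vec out;
  out.reserve(mask.size());
  for (int m : mask) {
    if (m == kUndef)
      out.push_back(std::nullopt);
    else if (m < n)
      out.push_back(a[static_cast<std::size_t>(m)]);
    else
      out.push_back(b[static_cast<std::size_t>(m - n)]);
  }
  return out;
}

// Step 1 (the widening): keep the narrow vector's lanes and pad with undef,
// like the <i32 0, i32 1, i32 undef, i32 undef> shuffle in the IR above.
Vec widen(const Vec& narrow, std::size_t wideLen) {
  std::vector<int> mask;
  for (std::size_t i = 0; i < wideLen; ++i)
    mask.push_back(i < narrow.size() ? static_cast<int>(i) : kUndef);
  Vec undefs(narrow.size(), std::nullopt);  // the "<2 x float> undef" operand
  return shuffle(narrow, undefs, mask);
}

// Step 2 (simplified extract/insert folding): start from the identity mask on
// the destination and point each inserted lane at the second operand.
// Each pair is {insertIndex, extractIndex}.
std::vector<int> maskFromInserts(std::size_t n,
                                 const std::vector<std::pair<int, int>>& inserts) {
  std::vector<int> mask(n);
  for (std::size_t i = 0; i < n; ++i) mask[i] = static_cast<int>(i);
  for (const auto& [insertIdx, extractIdx] : inserts) {
    assert(insertIdx >= 0 && static_cast<std::size_t>(insertIdx) < n);
    mask[static_cast<std::size_t>(insertIdx)] = static_cast<int>(n) + extractIdx;
  }
  return mask;
}

// The PR2109 shape with made-up lane values.
Vec pr2109Example() {
  Vec a = {0.f, 1.f, 2.f, 3.f};  // stands in for %A
  Vec ld1 = {10.f, 11.f};        // stands in for %ld1
  Vec wide = widen(ld1, 4);      // {10, 11, undef, undef}
  std::vector<int> mask = maskFromInserts(4, {{2, 0}, {3, 1}});  // {0, 1, 4, 5}
  return shuffle(a, wide, mask);  // {0, 1, 10, 11}
}
```

The widening matters because a two-input shuffle needs both operands to have the same width; once the `<2 x float>` is padded to four lanes, the inserted elements can be addressed as indices 4 and 5 of the second operand, which is exactly the `<i32 0, i32 1, i32 4, i32 5>` mask in the IR.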

Diffstat (limited to 'llvm/lib/Target/WebAssembly/WebAssemblyRegStackify.cpp')

0 files changed, 0 insertions, 0 deletions
