author Rohit Aggarwal <44664450+rohitaggarwal007@users.noreply.github.com> 2025-05-06 12:55:10 +0530
committer GitHub <noreply@github.com> 2025-05-06 15:25:10 +0800
commit fdbc30a383973d89d738283e733ba0db98df6a77 (patch)
tree cbbed79edac6093dbb033e769331a1fb2c1f15d1 /flang/lib/Frontend/CompilerInvocation.cpp
parent 7aabf47522625e227433cc9603e0b6858c5dd66d (diff)
[X86][DAGCombiner][SelectionDAG] - Fold Zext Build Vector to Bitcast of widen Build Vector (#135010)

I am working on a problem in which a kernel is SLP-vectorized, which leads to the generation of insertelement instructions followed by a zext. After lowering, the assembly looks like this:

    vmovd %r9d, %xmm0
    vpinsrb $1, (%rdi,%rsi), %xmm0, %xmm0
    vpinsrb $2, (%rdi,%rsi,2), %xmm0, %xmm0
    vpinsrb $3, (%rdi,%rcx), %xmm0, %xmm0
    vpinsrb $4, (%rdi,%rsi,4), %xmm0, %xmm0
    vpinsrb $5, (%rdi,%rax), %xmm0, %xmm0
    vpinsrb $6, (%rdi,%rcx,2), %xmm0, %xmm0
    vpinsrb $7, (%rdi,%r8), %xmm0, %xmm0
    vpmovzxbw %xmm0, %xmm0 # xmm0 = xmm0[0],zero,xmm0[1],zero,xmm0[2],zero,xmm0[3],zero,xmm0[4],zero,xmm0[5],zero,xmm0[6],zero,xmm0[7],zero
    vpmaddwd (%rdx), %xmm0, %xmm0

After all the vpinsrb instructions, xmm0 looks like this:

    xmm0 = xmm0[0],xmm0[1],xmm0[2],xmm0[3],xmm0[4],xmm0[5],xmm0[6],xmm0[7],zero,zero,zero,zero,zero,zero,zero,zero

Here vpmovzxbw performs the extension from i8 to i16. It is an expensive operation, and I want to remove it.

Optimization

Place each value in its final location while inserting, so that the zext can be avoided. During lowering, we can write a custom lowerOperation for the zero_extend_vector_inreg opcode and use it to override the current default operation in the legalization step.

The proposed changes are shown below:

    vpinsrb $2, (%rdi,%rsi), %xmm0, %xmm0
    vpinsrb $4, (%rdi,%rsi,2), %xmm0, %xmm0
    vpinsrb $6, (%rdi,%rcx), %xmm0, %xmm0
    vpinsrb $8, (%rdi,%rsi,4), %xmm0, %xmm0
    vpinsrb $a, (%rdi,%rax), %xmm0, %xmm0
    vpinsrb $c, (%rdi,%rcx,2), %xmm0, %xmm0
    vpinsrb $e, (%rdi,%r8), %xmm0, %xmm0
    # xmm0 = xmm0[0],zero,xmm0[1],zero,xmm0[2],zero,xmm0[3],zero,xmm0[4],zero,xmm0[5],zero,xmm0[6],zero,xmm0[7],zero
    vpmaddwd (%rdx), %xmm0, %xmm0

More details are in the Discourse topic:
https://discourse.llvm.org/t/improve-the-gathering-of-the-elements-so-that-unwanted-ext-operations-can-be-avoided/85443

---------

Co-authored-by: Rohit Aggarwal <Rohit.Aggarwal@amd.com>

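The equivalence this fold relies on can be checked with a small model. The sketch below is not LLVM code: it represents a 128-bit register as 16 bytes, and the function names are hypothetical, chosen only for illustration. It assumes little-endian lane layout, as on x86, and compares zero-extending eight packed bytes with reinterpreting a vector in which byte i was written to byte lane 2*i and the odd lanes were left zero.

```python
import struct

def zext_of_packed_bytes(values):
    """Model of the original sequence: bytes go into lanes 0..7, then each is
    zero-extended from i8 to i16 (the vpmovzxbw step)."""
    packed = bytes(values) + bytes(8)      # 8 inserted bytes, upper 8 lanes zero
    return [packed[i] for i in range(8)]   # zero-extension keeps the numeric value

def bitcast_of_widened_build_vector(values):
    """Model of the proposed sequence: byte i goes to byte lane 2*i, odd lanes
    stay zero, and the 16 bytes are reinterpreted as 8 little-endian i16."""
    lanes = bytearray(16)                  # start from an all-zero register
    for i, v in enumerate(values):
        lanes[2 * i] = v                   # insert positions 0, 2, 4, ..., 14
    return list(struct.unpack("<8H", bytes(lanes)))

sample = [0x11, 0x80, 0xFF, 0x00, 0x01, 0x02, 0x03, 0x04]
assert zext_of_packed_bytes(sample) == bitcast_of_widened_build_vector(sample)
```

Because each zero odd lane is the high byte of its 16-bit element, the reinterpretation yields the same values as the explicit extension, which is why the separate extend instruction becomes unnecessary.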
Diffstat (limited to 'flang/lib/Frontend/CompilerInvocation.cpp')
0 files changed, 0 insertions, 0 deletions