diff options
author | Rohit Aggarwal <44664450+rohitaggarwal007@users.noreply.github.com> | 2025-05-06 12:55:10 +0530 |
---|---|---|
committer | GitHub <noreply@github.com> | 2025-05-06 15:25:10 +0800 |
commit | fdbc30a383973d89d738283e733ba0db98df6a77 (patch) | |
tree | cbbed79edac6093dbb033e769331a1fb2c1f15d1 /flang/lib/Frontend/CompilerInvocation.cpp | |
parent | 7aabf47522625e227433cc9603e0b6858c5dd66d (diff) | |
download | llvm-fdbc30a383973d89d738283e733ba0db98df6a77.zip llvm-fdbc30a383973d89d738283e733ba0db98df6a77.tar.gz llvm-fdbc30a383973d89d738283e733ba0db98df6a77.tar.bz2 |
[X86][DAGCombiner][SelectionDAG] - Fold Zext Build Vector to Bitcast of widen Build Vector (#135010)
I am working on a problem in which a kernel is SLP vectorized and lead
to generation of insertelements followed by zext. On lowering, the
assembly looks like below:
vmovd %r9d, %xmm0
vpinsrb $1, (%rdi,%rsi), %xmm0, %xmm0
vpinsrb $2, (%rdi,%rsi,2), %xmm0, %xmm0
vpinsrb $3, (%rdi,%rcx), %xmm0, %xmm0
vpinsrb $4, (%rdi,%rsi,4), %xmm0, %xmm0
vpinsrb $5, (%rdi,%rax), %xmm0, %xmm0
vpinsrb $6, (%rdi,%rcx,2), %xmm0, %xmm0
vpinsrb $7, (%rdi,%r8), %xmm0, %xmm0
vpmovzxbw %xmm0, %xmm0 # xmm0 =
xmm0[0],zero,xmm0[1],zero,xmm0[2],zero,xmm0[3],zero,xmm0[4],zero,xmm0[5],zero,xmm0[6],zero,xmm0[7],zero
vpmaddwd (%rdx), %xmm0, %xmm0
After all the insrb, xmm0 looks like
xmm0=xmm0[0],xmm0[1],xmm0[2],xmm0[3],xmm0[4],xmm0[5],xmm0[6],xmm0[7],zero,zero,zero,zero,zero,zero,zero,zero
Here vpmovzxbw perform the extension of i8 to i16. But it is expensive
operation and I want to remove it.
Optimization
Place the value in correct location while inserting so that zext can be
avoid.
While lowering, we can write a custom lowerOperation for
zero_extend_vector_inreg opcode. We can override the current default
operation with my custom in the legalization step.
The changes proposed are state below:
vpinsrb $2, (%rdi,%rsi), %xmm0, %xmm0
vpinsrb $4, (%rdi,%rsi,2), %xmm0, %xmm0
vpinsrb $6, (%rdi,%rcx), %xmm0, %xmm0
vpinsrb $8, (%rdi,%rsi,4), %xmm0, %xmm0
vpinsrb $a, (%rdi,%rax), %xmm0, %xmm0
vpinsrb $c, (%rdi,%rcx,2), %xmm0, %xmm0
vpinsrb $e, (%rdi,%r8), %xmm0, %xmm0 # xmm0 =
xmm0[0],zero,xmm0[1],zero,xmm0[2],zero,xmm0[3],zero,xmm0[4],zero,xmm0[5],zero,xmm0[6],zero,xmm0[7],zero
vpmaddwd (%rdx), %xmm0, %xmm0
More details in the discourse topic
[https://discourse.llvm.org/t/improve-the-gathering-of-the-elements-so-that-unwanted-ext-operations-can-be-avoided/85443](url)
---------
Co-authored-by: Rohit Aggarwal <Rohit.Aggarwal@amd.com>
Diffstat (limited to 'flang/lib/Frontend/CompilerInvocation.cpp')
0 files changed, 0 insertions, 0 deletions