aboutsummaryrefslogtreecommitdiff
path: root/llvm/lib/Target/RISCV/Disassembler/RISCVDisassembler.cpp
diff options
context:
space:
mode:
authorLuke Lau <luke@igalia.com>2024-12-12 13:41:27 +0800
committerGitHub <noreply@github.com>2024-12-12 13:41:27 +0800
commit088db868f3370ffe01c9750f75732679efecd1fe (patch)
treea423b032af93b8654b3393311faac595311f0c98 /llvm/lib/Target/RISCV/Disassembler/RISCVDisassembler.cpp
parentb26fe5b7e9833b7813459c6a0dc4577b350754f1 (diff)
downloadllvm-088db868f3370ffe01c9750f75732679efecd1fe.zip
llvm-088db868f3370ffe01c9750f75732679efecd1fe.tar.gz
llvm-088db868f3370ffe01c9750f75732679efecd1fe.tar.bz2
[RISCV] Merge shuffle sources if lanes are disjoint (#119401)
In x264, there's a few kernels with shuffles like this: %41 = add nsw <16 x i32> %39, %40 %42 = sub nsw <16 x i32> %39, %40 %43 = shufflevector <16 x i32> %41, <16 x i32> %42, <16 x i32> <i32 11, i32 15, i32 7, i32 3, i32 26, i32 30, i32 22, i32 18, i32 9, i32 13, i32 5, i32 1, i32 24, i32 28, i32 20, i32 16> Because this is a complex two-source shuffle, this will get lowered as two vrgather.vvs that are blended together. vadd.vv v20, v16, v12 vsub.vv v12, v16, v12 vrgatherei16.vv v24, v20, v10 vrgatherei16.vv v24, v12, v16, v0.t However the indices coming from each source are disjoint, so we can blend the two together and perform a single source shuffle instead: %41 = add nsw <16 x i32> %39, %40 %42 = sub nsw <16 x i32> %39, %40 %43 = select <0,0,0,0,1,1,1,1,0,0,0,0,1,1,1,1> %41, %42 %44 = shufflevector <16 x i32> %43, <16 x i32> poison, <16 x i32> <i32 11, i32 15, i32 7, i32 3, i32 10, i32 14, i32 6, i32 2, i32 9, i32 13, i32 5, i32 1, i32 8, i32 12, i32 4, i32 0> The select will likely get merged into the preceding instruction, and then we only have to do one vrgather.vv: vadd.vv v20, v16, v12 vsub.vv v20, v16, v12, v0.t vrgatherei16.vv v24, v20, v10 This patch bails if either of the sources are a broadcast/splat/identity shuffle, since that will usually already have some sort of cheaper lowering. This improves performance on 525.x264_r by 4.12% with -O3 -flto -march=rva22u64_v on the spacemit-x60.
Diffstat (limited to 'llvm/lib/Target/RISCV/Disassembler/RISCVDisassembler.cpp')
0 files changed, 0 insertions, 0 deletions