path: root/clang/unittests/Frontend/CompilerInvocationTest.cpp
author: Sushant Gokhale <sgokhale@nvidia.com> 2024-11-13 11:10:49 +0530
committer: GitHub <noreply@github.com> 2024-11-13 11:10:49 +0530
commit: 9991ea28fcd308d5bd357358710e5344e26b46e1
tree: a8a2b09b6654f72cf7336007c07bcd6721cbcb0e /clang/unittests/Frontend/CompilerInvocationTest.cpp
parent: 95554cbd7717e7d1925f475540a70603bcb3a224
[CostModel][AArch64] Make extractelement, with fmul user, free whenever possible (#111479)

In the case of Neon, if there exists an extractelement from lane != 0 such that

1. the extractelement does not necessitate a move from vector_reg -> GPR,
2. the extractelement result feeds into an fmul, and
3. the other operand of the fmul is a scalar or an extractelement from lane 0 or a lane equivalent to 0,

then the extractelement can be merged with the fmul in the backend and incurs no cost. For example:

```
define double @foo(<2 x double> %a) {
  %1 = extractelement <2 x double> %a, i32 0   ; lane 0
  %2 = extractelement <2 x double> %a, i32 1   ; lane 1, candidate for merging
  %res = fmul double %1, %2
  ret double %res
}
```

`%2` and `%res` can be merged in the backend to generate:

`fmul d0, d0, v0.d[1]`

The change was tested with SPEC FP (C/C++) on Neoverse-v2.

**Compile time impact**: None

**Performance impact**: A 1.3-1.7% uplift is observed on the lbm benchmark with -flto, depending on the config.
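The three conditions in the commit message can be sketched as a small predicate. This is a minimal illustration, not the code from #111479, and it does not use LLVM's TargetTransformInfo API: `OtherOperand`, `isLaneEquivalentToZero`, `qualifiesForIndexedFMulFold` and `kNeonRegBits` are hypothetical names. It also assumes that "a lane equivalent to 0" means the first lane of one of the 128-bit Neon registers a wider vector is split into, and it takes "needs a vector -> GPR move" and "user is an fmul" as precomputed inputs.

```cpp
#include <cassert>

// Hypothetical, simplified description of the fmul operand that is NOT the
// extractelement being costed. Illustrative only; not an LLVM type.
struct OtherOperand {
  bool isExtract;  // true if it is itself an extractelement result
  unsigned lane;   // extracted lane index; ignored when isExtract is false
};

// Width of one Neon vector register in bits.
constexpr unsigned kNeonRegBits = 128;

// Assumption: a lane is "equivalent to 0" when it is the first lane of one of
// the 128-bit registers that a wider vector is split into.
bool isLaneEquivalentToZero(unsigned lane, unsigned eltBits) {
  assert(eltBits > 0 && eltBits <= kNeonRegBits && "unsupported element size");
  unsigned eltsPerReg = kNeonRegBits / eltBits;
  return lane % eltsPerReg == 0;
}

// Returns true when an extractelement from a non-zero lane meets the three
// conditions from the commit message, i.e. it could be folded into a
// lane-indexed fmul such as `fmul d0, d0, v0.d[1]`.
bool qualifiesForIndexedFMulFold(unsigned extractLane, unsigned eltBits,
                                 bool needsMoveToGPR, bool userIsFMul,
                                 const OtherOperand &other) {
  // Extracts from lane 0 (or an equivalent lane) are outside this rule.
  if (isLaneEquivalentToZero(extractLane, eltBits))
    return false;
  // 1. The extract must not require a vector -> GPR move.
  if (needsMoveToGPR)
    return false;
  // 2. The extract result must feed an fmul.
  if (!userIsFMul)
    return false;
  // 3. The other fmul operand must be a scalar, or an extract from lane 0 or
  //    an equivalent lane, because the by-element fmul form accepts only one
  //    lane-indexed source operand.
  return !other.isExtract || isLaneEquivalentToZero(other.lane, eltBits);
}
```

With `eltBits = 64`, the `@foo` example corresponds to `qualifiesForIndexedFMulFold(1, 64, false, true, OtherOperand{true, 0})`, which returns true. If the other operand were also an extract from lane 1, the call returns false, since only one source of the by-element fmul can carry a lane index.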
Diffstat (limited to 'clang/unittests/Frontend/CompilerInvocationTest.cpp')
0 files changed, 0 insertions, 0 deletions