rocket-tools/riscv-gnu-toolchain/llvm.git

author	Sushant Gokhale <sgokhale@nvidia.com>	2024-11-13 11:10:49 +0530
committer	GitHub <noreply@github.com>	2024-11-13 11:10:49 +0530
commit	9991ea28fcd308d5bd357358710e5344e26b46e1 (patch)
tree	a8a2b09b6654f72cf7336007c07bcd6721cbcb0e /clang/unittests/Frontend/CompilerInvocationTest.cpp
parent	95554cbd7717e7d1925f475540a70603bcb3a224 (diff)
download	llvm-9991ea28fcd308d5bd357358710e5344e26b46e1.zip llvm-9991ea28fcd308d5bd357358710e5344e26b46e1.tar.gz llvm-9991ea28fcd308d5bd357358710e5344e26b46e1.tar.bz2

[CostModel][AArch64] Make extractelement, with fmul user, free whenever possible (#111479)

In case of Neon, if there exists extractelement from lane != 0 such that

1. extractelement does not necessitate a move from vector_reg -> GPR
2. extractelement result feeds into fmul
3. Other operand of fmul is a scalar or extractelement from lane 0 or lane equivalent to 0

then the extractelement can be merged with fmul in the backend and it incurs no cost.

e.g.

```
define double @foo(<2 x double> %a) {
  %1 = extractelement <2 x double> %a, i32 0
  %2 = extractelement <2 x double> %a, i32 1
  %res = fmul double %1, %2
  ret double %res
}
```

`%2` and `%res` can be merged in the backend to generate:
`fmul d0, d0, v0.d[1]`

The change was tested with SPEC FP(C/C++) on Neoverse-v2.

**Compile time impact**: None
**Performance impact**: Observing 1.3-1.7% uplift on lbm benchmark with -flto depending upon the config.
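The three conditions in the commit message can be sketched as a small standalone predicate. This is only a toy model for illustration, not the code from the patch: the `Operand` struct, its fields, and both function names are hypothetical, and "lane equivalent to 0" is left as a caller-supplied flag because the message does not define it.

```cpp
// Toy model of the rule described in the commit message. All names here are
// hypothetical; this is not LLVM's AArch64 cost-model code.
enum class OperandKind { Scalar, ExtractElement };

struct Operand {
    OperandKind kind;
    unsigned lane;             // lane index; ignored for Scalar
    bool laneEquivalentToZero; // assumption: the caller decides what counts as "equivalent to 0"
    bool needsMoveToGPR;       // true if the value must be moved vector_reg -> GPR
};

// Condition 3: the other fmul operand is a scalar, or an extractelement
// from lane 0 (or from a lane the caller marked as equivalent to 0).
bool isScalarOrLaneZero(const Operand &op) {
    if (op.kind == OperandKind::Scalar)
        return true;
    return op.lane == 0 || op.laneEquivalentToZero;
}

// `extract` and `other` are assumed to be the two operands of the same fmul,
// which is how this sketch encodes condition 2.
bool isExtractFreeForFMul(const Operand &extract, const Operand &other) {
    if (extract.kind != OperandKind::ExtractElement)
        return false;
    if (extract.lane == 0)
        return false; // the rule only concerns extracts from lane != 0
    if (extract.needsMoveToGPR)
        return false; // condition 1 fails
    return isScalarOrLaneZero(other); // condition 3
}
```

For the `@foo` example, `%2` (lane 1) paired with `%1` (lane 0) satisfies the predicate, which matches the merged `fmul d0, d0, v0.d[1]` form. A `false` result here only means this particular rule does not apply; it says nothing about what cost the real model would assign.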

Diffstat (limited to 'clang/unittests/Frontend/CompilerInvocationTest.cpp')

0 files changed, 0 insertions, 0 deletions
