author | Sushant Gokhale <sgokhale@nvidia.com> | 2024-11-13 11:10:49 +0530 |
---|---|---|
committer | GitHub <noreply@github.com> | 2024-11-13 11:10:49 +0530 |
commit | 9991ea28fcd308d5bd357358710e5344e26b46e1 (patch) | |
tree | a8a2b09b6654f72cf7336007c07bcd6721cbcb0e /clang/unittests/Frontend/CompilerInvocationTest.cpp | |
parent | 95554cbd7717e7d1925f475540a70603bcb3a224 (diff) | |
[CostModel][AArch64] Make extractelement, with fmul user, free whenever possible (#111479)
For Neon, an extractelement from a lane other than 0 can be merged with an
fmul in the backend, and so incurs no cost, if all of the following hold:
1. The extractelement does not necessitate a move from vector_reg -> GPR.
2. The extractelement result feeds into an fmul.
3. The other operand of the fmul is a scalar, or an extractelement from lane 0
or a lane equivalent to 0.
e.g.
```
define double @foo(<2 x double> %a) {
  %1 = extractelement <2 x double> %a, i32 0
  %2 = extractelement <2 x double> %a, i32 1
  %res = fmul double %1, %2
  ret double %res
}
```
`%2` and `%res` can be merged in the backend to generate:
`fmul d0, d0, v0.d[1]`
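The three conditions above can be sketched as a standalone predicate. This is only an illustration, not the code from the patch: the `Operand` struct, `laneEquivalentToZero`, and `extractIsFree` are hypothetical names, and the sketch assumes that a lane is "equivalent to 0" when its bit offset is a multiple of a 128-bit Neon register width.

```cpp
// Hypothetical, simplified description of one fmul operand.
// This is not an LLVM type; it exists only for this sketch.
struct Operand {
  bool isExtract;     // true if the operand is an extractelement result
  unsigned lane;      // lane index (meaningful only when isExtract)
  unsigned eltBits;   // element width in bits
  bool needsGPRMove;  // true if the extract forces a vector_reg -> GPR move
};

// Assumption: Neon vector registers are 128 bits wide, and a lane whose bit
// offset is a multiple of that width is treated as equivalent to lane 0.
constexpr unsigned kNeonRegBits = 128;

bool laneEquivalentToZero(unsigned lane, unsigned eltBits) {
  return (lane * eltBits) % kNeonRegBits == 0;
}

// Returns true if `ext`, one operand of an fmul, could be folded into the
// fmul (and so modelled as free), given the fmul's other operand `other`.
bool extractIsFree(const Operand &ext, const Operand &other) {
  // This sketch only models an extractelement from a lane other than 0.
  if (!ext.isExtract || ext.lane == 0)
    return false;
  // Condition 1: no move from a vector register to a GPR.
  if (ext.needsGPRMove)
    return false;
  // Condition 2 is implied: `ext` is already known to feed the fmul.
  // Condition 3: the other operand is a scalar, or an extract from lane 0
  // or a lane equivalent to 0.
  if (!other.isExtract)
    return true;
  return laneEquivalentToZero(other.lane, other.eltBits);
}
```

For the `@foo` example, `%2` (lane 1 of `<2 x double>`) paired with `%1` (lane 0) satisfies all three conditions, so the sketch reports the extract as free.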
The change was tested with SPEC FP (C/C++) on Neoverse-v2.
**Compile time impact**: None
**Performance impact**: A 1.3-1.7% uplift was observed on the lbm benchmark with -flto, depending on the configuration.