diff options
author | Kyrylo Tkachov <kyrylo.tkachov@arm.com> | 2023-06-06 09:54:41 +0100 |
---|---|---|
committer | Kyrylo Tkachov <kyrylo.tkachov@arm.com> | 2023-06-06 09:54:41 +0100 |
commit | b327cbe8f4eefc91ee2bea49a1da7128adf30281 (patch) | |
tree | 02d04fdafaa0319b916927bdc61e7ce47e3e8f01 /gcc/expr.cc | |
parent | 84eec2916fa68cd2e2b3a2cf764f2ba595cce843 (diff) | |
download | gcc-b327cbe8f4eefc91ee2bea49a1da7128adf30281.zip gcc-b327cbe8f4eefc91ee2bea49a1da7128adf30281.tar.gz gcc-b327cbe8f4eefc91ee2bea49a1da7128adf30281.tar.bz2 |
aarch64: Improve representation of ADDLV instructions
We've received requests to optimise the attached intrinsics testcase.
We currently generate:
foo_1:
uaddlp v0.4s, v0.8h
uaddlv d31, v0.4s
fmov x0, d31
ret
foo_2:
uaddlp v0.4s, v0.8h
addv s31, v0.4s
fmov w0, s31
ret
foo_3:
saddlp v0.4s, v0.8h
addv s31, v0.4s
fmov w0, s31
ret
The widening pair-wise addition addlp instructions can be omitted if we're just doing an ADDV afterwards.
Making this optimisation would be quite simple if we had a standard RTL PLUS vector reduction code.
As we don't, we can use UNSPEC_ADDV as a stand in.
This patch expresses the SADDLV and UADDLV instructions as an UNSPEC_ADDV over a widened input, thus removing
the need for separate UNSPEC_SADDLV and UNSPEC_UADDLV codes.
To optimise the testcases involved we add two splitters that match a vector addition where all participating elements
are taken and widened from the same vector and then fed into an UNSPEC_ADDV. In that case we can just remove the
vector PLUS and just emit the simple RTL for SADDLV/UADDLV.
Bootstrapped and tested on aarch64-none-linux-gnu.
gcc/ChangeLog:
* config/aarch64/aarch64-protos.h (aarch64_parallel_select_half_p):
Define prototype.
(aarch64_pars_overlap_p): Likewise.
* config/aarch64/aarch64-simd.md (aarch64_<su>addlv<mode>):
Express in terms of UNSPEC_ADDV.
(*aarch64_<su>addlv<VDQV_L:mode>_ze<GPI:mode>): Likewise.
(*aarch64_<su>addlv<mode>_reduction): Define.
(*aarch64_uaddlv<mode>_reduction_2): Likewise.
* config/aarch64/aarch64.cc (aarch64_parallel_select_half_p): Define.
(aarch64_pars_overlap_p): Likewise.
* config/aarch64/iterators.md (UNSPEC_SADDLV, UNSPEC_UADDLV): Delete.
(VQUADW): New mode attribute.
(VWIDE2X_S): Likewise.
(USADDLV): Delete.
(su): Delete handling of UNSPEC_SADDLV, UNSPEC_UADDLV.
* config/aarch64/predicates.md (vect_par_cnst_select_half): Define.
gcc/testsuite/ChangeLog:
* gcc.target/aarch64/simd/addlv_1.c: New test.
Diffstat (limited to 'gcc/expr.cc')
0 files changed, 0 insertions, 0 deletions