author | Jennifer Schmitz <jschmitz@nvidia.com> | 2024-10-01 08:01:13 -0700 |
---|---|---|
committer | Jennifer Schmitz <jschmitz@nvidia.com> | 2024-10-24 09:06:20 +0200 |
commit | fc40202c1ac5d585bb236cdaf3a3968927e970a0 (patch) | |
tree | d4f4797ee2b488065aafabe21df16efdabcad912 /gcc/gcc-urlifier.cc | |
parent | 90e38c4ffad086a82635e8ea9bf0e7e9e02f1ff7 (diff) | |
SVE intrinsics: Fold division and multiplication by -1 to neg
Because a neg instruction has lower latency and higher throughput than
sdiv and mul, svdiv and svmul by -1 can be folded to svneg. For svdiv,
this is already implemented at the RTL level; for svmul, the
optimization was still missing.
This patch implements folding to svneg for both operations using the
gimple_folder. For svdiv, the transform is applied if the divisor is -1.
svmul is folded if either of the operands is -1. The fold distinguishes
between predication types because svneg_m takes 3 arguments
(argument 0 holds the values for the inactive lanes), while svneg_x and
svneg_z take only 2 arguments.
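The predication handling described above can be illustrated with a scalar model in plain C. This is only a sketch, not the code from the patch: the helper names (`neg_wrap`, `mul_wrap`, `model_mul_m`, `model_neg_m`, `model_mul_z`, `model_neg_z`, `check_folds_agree`) are hypothetical, vector lanes are modeled as array elements, and the model assumes the ACLE convention that inactive lanes of a merging (`_m`) multiply keep the value of the first operand while inactive lanes of a zeroing (`_z`) operation are zero. Only multiplication is modeled, and wrapping arithmetic is done on unsigned values to avoid signed-overflow undefined behavior in C.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Wrapping negation, computed in unsigned arithmetic.  */
static int32_t neg_wrap (int32_t x)
{
  return (int32_t) (0u - (uint32_t) x);
}

/* Wrapping multiplication, computed in unsigned arithmetic.  */
static int32_t mul_wrap (int32_t a, int32_t b)
{
  return (int32_t) ((uint32_t) a * (uint32_t) b);
}

/* Model of a merging multiply: inactive lanes keep op1 (assumption).  */
static void model_mul_m (const bool *pg, const int32_t *op1, int32_t op2,
                         int32_t *out, size_t n)
{
  for (size_t i = 0; i < n; i++)
    out[i] = pg[i] ? mul_wrap (op1[i], op2) : op1[i];
}

/* Model of a merging negate: 3 arguments, argument 0 supplies the
   values for the inactive lanes.  */
static void model_neg_m (const int32_t *inactive, const bool *pg,
                         const int32_t *op, int32_t *out, size_t n)
{
  for (size_t i = 0; i < n; i++)
    out[i] = pg[i] ? neg_wrap (op[i]) : inactive[i];
}

/* Model of a zeroing multiply: inactive lanes are zero.  */
static void model_mul_z (const bool *pg, const int32_t *op1, int32_t op2,
                         int32_t *out, size_t n)
{
  for (size_t i = 0; i < n; i++)
    out[i] = pg[i] ? mul_wrap (op1[i], op2) : 0;
}

/* Model of a zeroing negate: only 2 arguments are needed.  */
static void model_neg_z (const bool *pg, const int32_t *op,
                         int32_t *out, size_t n)
{
  for (size_t i = 0; i < n; i++)
    out[i] = pg[i] ? neg_wrap (op[i]) : 0;
}

/* Check that multiplying by -1 and negating agree lane by lane.  */
static void check_folds_agree (void)
{
  const bool pg[4] = { true, false, true, true };
  const int32_t a[4] = { 7, 9, INT32_MIN, -3 };
  int32_t mul[4], neg[4];

  model_mul_m (pg, a, -1, mul, 4);
  model_neg_m (a, pg, a, neg, 4);   /* op1 doubles as the inactive value.  */
  for (size_t i = 0; i < 4; i++)
    assert (mul[i] == neg[i]);

  model_mul_z (pg, a, -1, mul, 4);
  model_neg_z (pg, a, neg, 4);
  for (size_t i = 0; i < 4; i++)
    assert (mul[i] == neg[i]);
}
```

Calling `check_folds_agree ()` exercises both models. In the merging case the negate has to be handed the original first operand as its extra argument so that inactive lanes come out the same, which is why the `_m` form needs separate treatment from `_x` and `_z`.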
Tests were added or adjusted to check the produced assembly, and runtime
tests were added to check correctness.
The patch was bootstrapped and regtested on aarch64-linux-gnu with no regressions.
OK for mainline?
Signed-off-by: Jennifer Schmitz <jschmitz@nvidia.com>
gcc/
* config/aarch64/aarch64-sve-builtins-base.cc (svdiv_impl::fold):
Fold division by -1 to svneg.
(svmul_impl::fold): Fold multiplication by -1 to svneg.
gcc/testsuite/
* gcc.target/aarch64/sve/acle/asm/div_s32.c: New test.
* gcc.target/aarch64/sve/acle/asm/mul_s16.c: Adjust expected outcome.
* gcc.target/aarch64/sve/acle/asm/mul_s32.c: New test.
* gcc.target/aarch64/sve/acle/asm/mul_s64.c: Adjust expected outcome.
* gcc.target/aarch64/sve/acle/asm/mul_s8.c: Likewise.
* gcc.target/aarch64/sve/div_const_run.c: New test.
* gcc.target/aarch64/sve/mul_const_run.c: Likewise.