author    Tamar Christina <tamar.christina@arm.com>    2021-12-02 14:39:22 +0000
committer Tamar Christina <tamar.christina@arm.com>    2021-12-02 14:39:43 +0000
commit    9b8830b6f3920b3ec6b9013230c687dc250bb6e9 (patch)
tree      1e5af8440fa2c7ff97be56d2b10d7304084f38dc /gcc/ada
parent    d47393d0b4d0d498795c4ae1353e6c156c1c4500 (diff)
AArch64: Optimize right shift rounding narrowing
This patch optimizes rounding shift right narrow instructions into rounding
add narrow high instructions where one vector operand is zero, when the shift
amount is half the width of the original input type.

i.e.
uint32x4_t foo (uint64x2_t a, uint64x2_t b)
{
return vrshrn_high_n_u64 (vrshrn_n_u64 (a, 32), b, 32);
}
now generates:
foo:
        movi    v3.4s, 0
        raddhn  v0.2s, v0.2d, v3.2d
        raddhn2 v0.4s, v1.2d, v3.2d
        ret
instead of:
foo:
rshrn v0.2s, v0.2d, 32
rshrn2 v0.4s, v1.2d, 32
ret
On Arm cores this transformation is an improvement in both latency and
throughput.

Because a vector zero operand is needed, I added a new helper,
aarch64_gen_shareable_zero, which creates the zero in V4SI mode and then takes
a subreg of the zero to the desired type. This allows CSE to share all the
zero constants.
gcc/ChangeLog:
* config/aarch64/aarch64-protos.h (aarch64_gen_shareable_zero): New.
* config/aarch64/aarch64-simd.md (aarch64_rshrn<mode>,
aarch64_rshrn2<mode>): Generate rounding half-ing add when appropriate.
* config/aarch64/aarch64.c (aarch64_gen_shareable_zero): New.
gcc/testsuite/ChangeLog:
* gcc.target/aarch64/advsimd-intrinsics/shrn-1.c: New test.
* gcc.target/aarch64/advsimd-intrinsics/shrn-2.c: New test.
* gcc.target/aarch64/advsimd-intrinsics/shrn-3.c: New test.
* gcc.target/aarch64/advsimd-intrinsics/shrn-4.c: New test.