author    Tamar Christina <tamar.christina@arm.com>  2021-12-02 14:39:22 +0000
committer Tamar Christina <tamar.christina@arm.com>  2021-12-02 14:39:43 +0000
commit    9b8830b6f3920b3ec6b9013230c687dc250bb6e9 (patch)
tree      1e5af8440fa2c7ff97be56d2b10d7304084f38dc /gcc/ada
parent    d47393d0b4d0d498795c4ae1353e6c156c1c4500 (diff)
AArch64: Optimize right shift rounding narrowing
This optimizes right shift rounding narrow instructions into a rounding add narrow high, where one vector is zero, when the shift amount is half the width of the original input type. i.e.

    uint32x4_t foo (uint64x2_t a, uint64x2_t b)
    {
      return vrshrn_high_n_u64 (vrshrn_n_u64 (a, 32), b, 32);
    }

now generates:

    foo:
            movi    v3.4s, 0
            raddhn  v0.2s, v2.2d, v3.2d
            raddhn2 v0.4s, v2.2d, v3.2d

instead of:

    foo:
            rshrn   v0.2s, v0.2d, 32
            rshrn2  v0.4s, v1.2d, 32
            ret

On Arm cores this is an improvement in both latency and throughput. Because a vector zero is needed, I created a new method aarch64_gen_shareable_zero that creates zeros using V4SI and then takes a subreg of the zero to the desired type. This allows CSE to share all the zero constants.

gcc/ChangeLog:

	* config/aarch64/aarch64-protos.h (aarch64_gen_shareable_zero): New.
	* config/aarch64/aarch64-simd.md (aarch64_rshrn<mode>,
	aarch64_rshrn2<mode>): Generate rounding add narrow high when
	appropriate.
	* config/aarch64/aarch64.c (aarch64_gen_shareable_zero): New.

gcc/testsuite/ChangeLog:

	* gcc.target/aarch64/advsimd-intrinsics/shrn-1.c: New test.
	* gcc.target/aarch64/advsimd-intrinsics/shrn-2.c: New test.
	* gcc.target/aarch64/advsimd-intrinsics/shrn-3.c: New test.
	* gcc.target/aarch64/advsimd-intrinsics/shrn-4.c: New test.
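The rewrite rests on an arithmetic identity: a rounding shift right narrow by half the element width adds a bias of 2^(n-1) before shifting by n, which is exactly what a rounding add narrow high does when its second operand is zero. A minimal scalar sketch of one 64-bit lane, in Python (the helper names `rshrn64` and `raddhn64` are illustrative models of the instruction semantics, not GCC or ACLE APIs):

```python
# Scalar model of the rewrite: for 64-bit lanes narrowed to 32 bits,
# a rounding shift right narrow by 32 (rshrn #32) produces the same
# result as a rounding add narrow high (raddhn) with a zero operand.
# rshrn64/raddhn64 are illustrative names, not real GCC/ACLE entry points.

MASK32 = (1 << 32) - 1

def rshrn64(x: int, n: int) -> int:
    """Rounding shift right narrow: add 2**(n-1), shift by n, keep 32 bits."""
    return ((x + (1 << (n - 1))) >> n) & MASK32

def raddhn64(a: int, b: int) -> int:
    """Rounding add narrow high: 64-bit sum plus rounding constant 2**31,
    wrapped to 64 bits, then the high 32 bits of the result."""
    return (((a + b + (1 << 31)) & ((1 << 64) - 1)) >> 32) & MASK32

# The two agree on every lane value when the shift is half the width:
for x in (0, 1, 0x7FFFFFFF, 0x80000000, 0xDEADBEEFCAFEF00D, (1 << 64) - 1):
    assert rshrn64(x, 32) == raddhn64(x, 0)
```

In other words, the zero vector contributes nothing to the sum; the raddhn rounding constant alone supplies the same 2^31 bias that rshrn adds before shifting, which is why the transformation only fires when the shift amount equals half the input element width.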
Diffstat (limited to 'gcc/ada')
0 files changed, 0 insertions, 0 deletions