author     Kyrylo Tkachov <kyrylo.tkachov@arm.com>   2021-01-22 14:16:30 +0000
committer  Kyrylo Tkachov <kyrylo.tkachov@arm.com>   2021-01-28 11:42:20 +0000
commit     fdb904a1822c38db5d69a50878b21041c476f045 (patch)
tree       069b5ee9963928cb913c37021fa5d5e7a43b9824 /gcc/go
parent     f7a6d314e7f7eeb6240a4f62511c189c90ef300c (diff)
aarch64: Reimplement vshrn_n* intrinsics using builtins
This patch reimplements the vshrn_n* intrinsics to use RTL builtins.
These perform a narrowing right shift.
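Concretely, the arm_neon.h wrappers now just forward to the generated builtin. A minimal
sketch of the expected shape for vshrn_n_s16 (the builtin name __builtin_aarch64_shrnv8hi
is my assumption, following GCC's usual <name><mode> naming for the "shrn" entry):

/* Sketch only: builtin name assumed from the "shrn" builtin entry plus the
   V8HI mode suffix.  */
__extension__ extern __inline int8x8_t
__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
vshrn_n_s16 (int16x8_t __a, const int __b)
{
  return __builtin_aarch64_shrnv8hi (__a, __b);
}

The unsigned variants presumably do the same through casts to and from the signed vector
types, since the builtin is generated for the signed modes.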
Although the intrinsic generates the half-width mode (e.g. V8HI ->
V8QI), the new pattern generates a full 128-bit mode (V8HI -> V16QI)
by representing the fill-with-zeroes semantics of the SHRN instruction.
The narrower (V8QI) result is extracted with a lowpart subreg.
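To make the zero-fill semantics concrete, here is a sketch in terms of public intrinsics
of the 128-bit value the little-endian pattern describes (shrn_value_model is just an
illustrative name, not part of the patch):

#include <arm_neon.h>

/* Illustration only: for a V8HI -> V8QI narrowing shift the pattern models
   the narrowed elements in the low 64 bits with the upper 64 bits zeroed;
   the intrinsic's uint8x8_t result is the lowpart of that value.  */
uint8x16_t
shrn_value_model (uint16x8_t in)
{
  uint8x8_t narrowed = vshrn_n_u16 (in, 7);
  return vcombine_u8 (narrowed, vdup_n_u8 (0));
}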
I found this allows the RTL optimisers to do a better job at optimising
away redundant moves in frequently-occurring SHRN+SHRN2 pairs, like in:
#include <arm_neon.h>

uint8x16_t
foo (uint16x8_t in1, uint16x8_t in2)
{
  uint8x8_t tmp = vshrn_n_u16 (in2, 7);              /* SHRN: narrow in2 into a 64-bit result */
  uint8x16_t tmp2 = vshrn_high_n_u16 (tmp, in1, 4);  /* SHRN2: narrow in1 into the upper half */
  return tmp2;
}
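As far as I can tell, the improvement comes from the lowpart extraction being a plain
subreg of a full 128-bit value: the RTL passes can then see the whole Q-register dataflow
through the SHRN+SHRN2 pair and place the intermediate narrow result directly in the
register the SHRN2 fills, rather than forcing it through a separate move.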
gcc/ChangeLog:
* config/aarch64/aarch64-simd-builtins.def (shrn): Define
builtin.
* config/aarch64/aarch64-simd.md (aarch64_shrn<mode>_insn_le):
Define.
(aarch64_shrn<mode>_insn_be): Likewise.
(aarch64_shrn<mode>): Likewise.
* config/aarch64/arm_neon.h (vshrn_n_s16): Reimplement using
builtins.
(vshrn_n_s32): Likewise.
(vshrn_n_s64): Likewise.
(vshrn_n_u16): Likewise.
(vshrn_n_u32): Likewise.
(vshrn_n_u64): Likewise.
* config/aarch64/iterators.md (vn_mode): New mode attribute.