author     Kyrylo Tkachov <kyrylo.tkachov@arm.com>  2021-01-22 14:16:30 +0000
committer  Kyrylo Tkachov <kyrylo.tkachov@arm.com>  2021-01-28 11:42:20 +0000
commit     fdb904a1822c38db5d69a50878b21041c476f045 (patch)
tree       069b5ee9963928cb913c37021fa5d5e7a43b9824
parent     f7a6d314e7f7eeb6240a4f62511c189c90ef300c (diff)
aarch64: Reimplement vshrn_n* intrinsics using builtins
This patch reimplements the vshrn_n* intrinsics to use RTL builtins.
These perform a narrowing right shift.

Although the intrinsic generates the half-width mode (e.g. V8HI -> V8QI),
the new pattern generates a full 128-bit mode (V8HI -> V16QI) by
representing the fill-with-zeroes semantics of the SHRN instruction.
The narrower (V8QI) result is extracted with a lowpart subreg.
I found this allows the RTL optimisers to do a better job at optimising
redundant moves away in frequently-occurring SHRN+SHRN2 pairs, like in:

uint8x16_t
foo (uint16x8_t in1, uint16x8_t in2)
{
  uint8x8_t tmp = vshrn_n_u16 (in2, 7);
  uint8x16_t tmp2 = vshrn_high_n_u16 (tmp, in1, 4);
  return tmp2;
}

gcc/ChangeLog:

	* config/aarch64/aarch64-simd-builtins.def (shrn): Define builtin.
	* config/aarch64/aarch64-simd.md (aarch64_shrn<mode>_insn_le): Define.
	(aarch64_shrn<mode>_insn_be): Likewise.
	(aarch64_shrn<mode>): Likewise.
	* config/aarch64/arm_neon.h (vshrn_n_s16): Reimplement using builtins.
	(vshrn_n_s32): Likewise.
	(vshrn_n_s64): Likewise.
	(vshrn_n_u16): Likewise.
	(vshrn_n_u32): Likewise.
	(vshrn_n_u64): Likewise.
	* config/aarch64/iterators.md (vn_mode): New mode attribute.
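
For reference, a minimal sketch of how one of the reimplemented intrinsics
could look in arm_neon.h under this scheme. The builtin name
__builtin_aarch64_shrnv8hi is inferred here from the shrn entry in
aarch64-simd-builtins.def and the V8HI input mode; the exact spelling and
the casts in the actual patch may differ.

/* Sketch only: signed variant calls the builtin directly.  */
__extension__ extern __inline int8x8_t
__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
vshrn_n_s16 (int16x8_t __a, const int __b)
{
  return __builtin_aarch64_shrnv8hi (__a, __b);
}

/* Sketch only: unsigned variant assumed to reuse the signed builtin
   with casts, following the usual arm_neon.h convention.  */
__extension__ extern __inline uint8x8_t
__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
vshrn_n_u16 (uint16x8_t __a, const int __b)
{
  return (uint8x8_t) __builtin_aarch64_shrnv8hi ((int16x8_t) __a, __b);
}

Because the underlying pattern produces the SHRN result in a full 128-bit
register with the upper half zeroed, a following vshrn_high_n_u16 can write
the upper half in place, which is what lets the RTL optimisers drop the
intermediate moves in the foo example above.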