author     Kyrylo Tkachov <kyrylo.tkachov@arm.com>   2021-01-22 14:16:30 +0000
committer  Kyrylo Tkachov <kyrylo.tkachov@arm.com>   2021-01-28 11:42:20 +0000
commit     fdb904a1822c38db5d69a50878b21041c476f045 (patch)
tree       069b5ee9963928cb913c37021fa5d5e7a43b9824 /gcc/go
parent     f7a6d314e7f7eeb6240a4f62511c189c90ef300c (diff)
aarch64: Reimplement vshrn_n* intrinsics using builtins
This patch reimplements the vshrn_n* intrinsics to use RTL builtins.
These perform a narrowing right shift.
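Concretely, the arm_neon.h wrappers now just forward to the generated builtin. A minimal
sketch of the expected shape for vshrn_n_s16 (the builtin name __builtin_aarch64_shrnv8hi
is my assumption, following GCC's usual <name><mode> naming for the "shrn" entry):

/* Sketch only: builtin name assumed from the "shrn" builtin entry plus the
   V8HI mode suffix.  */
__extension__ extern __inline int8x8_t
__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
vshrn_n_s16 (int16x8_t __a, const int __b)
{
  return __builtin_aarch64_shrnv8hi (__a, __b);
}

The unsigned variants presumably do the same through casts to and from the signed vector
types, since the builtin is generated for the signed modes.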
Although the intrinsic generates the half-width mode (e.g. V8HI ->
V8QI), the new pattern generates a full 128-bit mode (V8HI -> V16QI)
by representing the fill-with-zeroes semantics of the SHRN instruction.
The narrower (V8QI) result is extracted with a lowpart subreg.
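To make the zero-fill semantics concrete, here is a sketch in terms of public intrinsics
of the 128-bit value the little-endian pattern describes (shrn_value_model is just an
illustrative name, not part of the patch):

#include <arm_neon.h>

/* Illustration only: for a V8HI -> V8QI narrowing shift the pattern models
   the narrowed elements in the low 64 bits with the upper 64 bits zeroed;
   the intrinsic's uint8x8_t result is the lowpart of that value.  */
uint8x16_t
shrn_value_model (uint16x8_t in)
{
  uint8x8_t narrowed = vshrn_n_u16 (in, 7);
  return vcombine_u8 (narrowed, vdup_n_u8 (0));
}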
I found this allows the RTL optimisers to do a better job at optimising
away redundant moves in frequently-occurring SHRN+SHRN2 pairs, like in:
#include <arm_neon.h>

uint8x16_t
foo (uint16x8_t in1, uint16x8_t in2)
{
  uint8x8_t tmp = vshrn_n_u16 (in2, 7);              /* SHRN: narrow in2 into a 64-bit result */
  uint8x16_t tmp2 = vshrn_high_n_u16 (tmp, in1, 4);  /* SHRN2: narrow in1 into the upper half */
  return tmp2;
}
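As far as I can tell, the improvement comes from the lowpart extraction being a plain
subreg of a full 128-bit value: the RTL passes can then see the whole Q-register dataflow
through the SHRN+SHRN2 pair and place the intermediate narrow result directly in the
register the SHRN2 fills, rather than forcing it through a separate move.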
gcc/ChangeLog:
* config/aarch64/aarch64-simd-builtins.def (shrn): Define
builtin.
* config/aarch64/aarch64-simd.md (aarch64_shrn<mode>_insn_le):
Define.
(aarch64_shrn<mode>_insn_be): Likewise.
(aarch64_shrn<mode>): Likewise.
* config/aarch64/arm_neon.h (vshrn_n_s16): Reimplement using
builtins.
(vshrn_n_s32): Likewise.
(vshrn_n_s64): Likewise.
(vshrn_n_u16): Likewise.
(vshrn_n_u32): Likewise.
(vshrn_n_u64): Likewise.
* config/aarch64/iterators.md (vn_mode): New mode attribute.