commit 48f8d1d48f2c7c2bc724dee979bcf56957f233cb
parent 52cd1cd1b67b10a6d58612bafaded6e8e3a303a1
tree d1996f8ec847cae706cacb82558ed745f9f8b713
author Kyrylo Tkachov <kyrylo.tkachov@arm.com> 2021-01-13 12:48:57 +0000
committer Kyrylo Tkachov <kyrylo.tkachov@arm.com> 2021-01-14 08:36:19 +0000
aarch64: Reimplement vmovn/vmovl intrinsics with builtins instead of __builtin_convertvector
Turns out __builtin_convertvector is not as good a fit for the widening
and narrowing intrinsics as I had hoped. During the veclower phase we
lower most of it to bitfield operations and hope DCE cleans it back up
into vector pack/unpack and extend operations. I received reports that
in more complex cases GCC fails to do that, and we're left with many
vector extract operations that clutter the output.
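
For context, here is roughly how a widening intrinsic can be expressed
with __builtin_convertvector (a sketch; the function name widen_s8 is
illustrative, not the arm_neon.h source):

    #include <arm_neon.h>

    /* Sign-extend each 8-bit lane to 16 bits.  Ideally this compiles
       to a single SXTL instruction, but veclower may first expand it
       into per-lane bitfield operations that DCE has to clean up.  */
    int16x8_t
    widen_s8 (int8x8_t a)
    {
      return __builtin_convertvector (a, int16x8_t);
    }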
I think veclower can be improved on that front, but for GCC 10 I'd like
to just implement these intrinsics with good old RTL builtins rather
than inline asm.
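
With the RTL builtins, the arm_neon.h definitions reduce to a single
builtin call that maps directly onto the new define_insn patterns. A
sketch for vmovl_s8 follows; the builtin name __builtin_aarch64_sxtlv8hi
is inferred from the sxtl entry and the V8HI result mode, so treat it
as an assumption rather than a quote of the patch:

    __extension__ extern __inline int16x8_t
    __attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
    vmovl_s8 (int8x8_t __a)
    {
      /* Assumed name: the sxtl builtin instantiated for V8HI.  It
         expands via the aarch64_<su>xtl<mode> pattern straight to a
         single SXTL instruction, with no veclower involvement.  */
      return __builtin_aarch64_sxtlv8hi (__a);
    }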
gcc/
	* config/aarch64/aarch64-simd.md (aarch64_<su>xtl<mode>): Define.
	(aarch64_xtn<mode>): Likewise.
	* config/aarch64/aarch64-simd-builtins.def (sxtl, uxtl, xtn):
	Define builtins.
	* config/aarch64/arm_neon.h (vmovl_s8): Reimplement using builtin.
(vmovl_s16): Likewise.
(vmovl_s32): Likewise.
(vmovl_u8): Likewise.
(vmovl_u16): Likewise.
(vmovl_u32): Likewise.
(vmovn_s16): Likewise.
(vmovn_s32): Likewise.
(vmovn_s64): Likewise.
(vmovn_u16): Likewise.
(vmovn_u32): Likewise.
(vmovn_u64): Likewise.
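
For reference, a small self-contained example of the affected
intrinsics; with this change the widening call should assemble to SXTL
and the narrowing call to XTN:

    #include <arm_neon.h>
    #include <stdio.h>

    int
    main (void)
    {
      int8_t in[8] = { -3, -2, -1, 0, 1, 2, 3, 100 };
      int8x8_t a = vld1_s8 (in);

      int16x8_t wide = vmovl_s8 (a);      /* widen: 8x8 -> 16x8 (SXTL) */
      int8x8_t narrow = vmovn_s16 (wide); /* narrow back: XTN */

      int8_t out[8];
      vst1_s8 (out, narrow);
      /* Round-tripping 8-bit data through vmovl/vmovn is lossless,
         so this prints the original values.  */
      for (int i = 0; i < 8; i++)
        printf ("%d ", out[i]);
      printf ("\n");
      return 0;
    }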