path: root/gcc/tree-data-ref.c
author    Kyrylo Tkachov <kyrylo.tkachov@arm.com>  2021-01-13 12:48:57 +0000
committer Kyrylo Tkachov <kyrylo.tkachov@arm.com>  2021-01-14 08:36:19 +0000
commit    48f8d1d48f2c7c2bc724dee979bcf56957f233cb (patch)
tree      d1996f8ec847cae706cacb82558ed745f9f8b713 /gcc/tree-data-ref.c
parent    52cd1cd1b67b10a6d58612bafaded6e8e3a303a1 (diff)
aarch64: Reimplement vmovn/vmovl intrinsics with builtins instead of __builtin_convertvector
Turns out __builtin_convertvector is not as good a fit for the widening and narrowing intrinsics as I had hoped. During the veclower phase we lower most of it to bitfield operations and hope DCE cleans it back up into vector pack/unpack and extend operations. I received reports that in more complex cases GCC fails to do that and we're left with many vector extract operations that clutter the output.

I think veclower can be improved on that front, but for GCC 11 I'd like to just implement these intrinsics with good old RTL builtins rather than inline asm.

gcc/
	* config/aarch64/aarch64-simd.md (aarch64_<su>xtl<mode>): Define.
	(aarch64_xtn<mode>): Likewise.
	* config/aarch64/aarch64-simd-builtins.def (sxtl, uxtl, xtn):
	Define builtins.
	* config/aarch64/arm_neon.h (vmovl_s8): Reimplement using builtin.
	(vmovl_s16): Likewise.
	(vmovl_s32): Likewise.
	(vmovl_u8): Likewise.
	(vmovl_u16): Likewise.
	(vmovl_u32): Likewise.
	(vmovn_s16): Likewise.
	(vmovn_s32): Likewise.
	(vmovn_s64): Likewise.
	(vmovn_u16): Likewise.
	(vmovn_u32): Likewise.
	(vmovn_u64): Likewise.
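For illustration, a minimal sketch of what the reimplemented intrinsics look like in arm_neon.h after this patch. The vector types (int8x8_t, int16x8_t) are arm_neon.h's own typedefs, and the exact builtin names (__builtin_aarch64_sxtlv8hi, __builtin_aarch64_xtnv8hi) are assumptions based on the usual <name><mode> naming scheme applied to the sxtl/xtn builtins listed in the ChangeLog above:

	/* Widening move: vmovl_s8 now expands through the new sxtl RTL
	   builtin (an SXTL instruction) instead of __builtin_convertvector.  */
	__extension__ extern __inline int16x8_t
	__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
	vmovl_s8 (int8x8_t __a)
	{
	  return __builtin_aarch64_sxtlv8hi (__a);
	}

	/* Narrowing move: vmovn_s16 expands through the new xtn builtin
	   (an XTN instruction); the builtin name is assumed to be keyed on
	   the wide source mode (v8hi).  */
	__extension__ extern __inline int8x8_t
	__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
	vmovn_s16 (int16x8_t __a)
	{
	  return __builtin_aarch64_xtnv8hi (__a);
	}

Because the intrinsics map one-to-one onto named RTL patterns, the compiler emits the pack/unpack instructions directly and no longer relies on DCE to clean up the bitfield operations produced by veclower.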
Diffstat (limited to 'gcc/tree-data-ref.c')
0 files changed, 0 insertions, 0 deletions