commit 48f8d1d48f2c7c2bc724dee979bcf56957f233cb
parent 52cd1cd1b67b10a6d58612bafaded6e8e3a303a1
tree d1996f8ec847cae706cacb82558ed745f9f8b713
author Kyrylo Tkachov <kyrylo.tkachov@arm.com> 2021-01-13 12:48:57 +0000
committer Kyrylo Tkachov <kyrylo.tkachov@arm.com> 2021-01-14 08:36:19 +0000
aarch64: Reimplement vmovn/vmovl intrinsics with builtins instead of __builtin_convertvector
Turns out __builtin_convertvector is not as good a fit for the widening
and narrowing intrinsics as I had hoped. During the veclower phase we
lower most of it to bitfield operations and hope DCE cleans it back up
into vector pack/unpack and extend operations. I received reports that
in more complex cases GCC fails to do that, and we're left with many
vector extract operations that clutter the output.
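
For context, here is roughly how a widening intrinsic can be expressed
with __builtin_convertvector (a sketch; the function name widen_s8 is
illustrative, not the arm_neon.h source):

    #include <arm_neon.h>

    /* Sign-extend each 8-bit lane to 16 bits.  Ideally this compiles
       to a single SXTL instruction, but veclower may first expand it
       into per-lane bitfield operations that DCE has to clean up.  */
    int16x8_t
    widen_s8 (int8x8_t a)
    {
      return __builtin_convertvector (a, int16x8_t);
    }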
I think veclower can be improved on that front, but for GCC 10 I'd like
to just implement these intrinsics with good old RTL builtins rather
than inline asm.
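
With the RTL builtins, the arm_neon.h definitions reduce to a single
builtin call that maps directly onto the new define_insn patterns. A
sketch for vmovl_s8 follows; the builtin name __builtin_aarch64_sxtlv8hi
is inferred from the sxtl entry and the V8HI result mode, so treat it
as an assumption rather than a quote of the patch:

    __extension__ extern __inline int16x8_t
    __attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
    vmovl_s8 (int8x8_t __a)
    {
      /* Assumed name: the sxtl builtin instantiated for V8HI.  It
         expands via the aarch64_<su>xtl<mode> pattern straight to a
         single SXTL instruction, with no veclower involvement.  */
      return __builtin_aarch64_sxtlv8hi (__a);
    }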
gcc/
	* config/aarch64/aarch64-simd.md (aarch64_<su>xtl<mode>): Define.
	(aarch64_xtn<mode>): Likewise.
	* config/aarch64/aarch64-simd-builtins.def (sxtl, uxtl, xtn):
	Define builtins.
	* config/aarch64/arm_neon.h (vmovl_s8): Reimplement using builtin.
(vmovl_s16): Likewise.
(vmovl_s32): Likewise.
(vmovl_u8): Likewise.
(vmovl_u16): Likewise.
(vmovl_u32): Likewise.
(vmovn_s16): Likewise.
(vmovn_s32): Likewise.
(vmovn_s64): Likewise.
(vmovn_u16): Likewise.
(vmovn_u32): Likewise.
(vmovn_u64): Likewise.
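
For reference, a small self-contained example of the affected
intrinsics; with this change the widening call should assemble to SXTL
and the narrowing call to XTN:

    #include <arm_neon.h>
    #include <stdio.h>

    int
    main (void)
    {
      int8_t in[8] = { -3, -2, -1, 0, 1, 2, 3, 100 };
      int8x8_t a = vld1_s8 (in);

      int16x8_t wide = vmovl_s8 (a);      /* widen: 8x8 -> 16x8 (SXTL) */
      int8x8_t narrow = vmovn_s16 (wide); /* narrow back: XTN */

      int8_t out[8];
      vst1_s8 (out, narrow);
      /* Round-tripping 8-bit data through vmovl/vmovn is lossless,
         so this prints the original values.  */
      for (int i = 0; i < 8; i++)
        printf ("%d ", out[i]);
      printf ("\n");
      return 0;
    }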