author | Kyrylo Tkachov <kyrylo.tkachov@arm.com> | 2021-02-05 08:14:07 +0000 |
---|---|---|
committer | Kyrylo Tkachov <kyrylo.tkachov@arm.com> | 2021-02-05 08:14:07 +0000 |
commit | b6e7a7498732b83df61443c211b8d69454ad0b22 (patch) | |
tree | 59812c9e666a19680aef716fccaff066146bff71 /gcc/tree-vectorizer.h | |
parent | 072f20c555907cce38a424da47b6c1baa8330169 (diff) | |
download | gcc-b6e7a7498732b83df61443c211b8d69454ad0b22.zip gcc-b6e7a7498732b83df61443c211b8d69454ad0b22.tar.gz gcc-b6e7a7498732b83df61443c211b8d69454ad0b22.tar.bz2 |
aarch64: Reimplement vget_low* intrinsics
We can do better on the vget_low* intrinsics.
Currently they reinterpret their argument into a V2DI vector and extract the low "lane",
reinterpreting that back into the shorter vector.
This is functionally correct and generates a sequence of subregs and a vec_select that, by itself,
gets optimised away eventually.
However, it's bad when we want to use the result in other SIMD operations:
the subreg-vec_select-subreg combo blocks many combine patterns.
This patch reimplements them to emit a proper low vec_select from the start.
It generates much cleaner RTL and allows for more aggressive combinations, particularly
with the patterns that Jonathan has been pushing lately.
gcc/ChangeLog:
* config/aarch64/aarch64-simd-builtins.def (get_low): Define builtin.
* config/aarch64/aarch64-simd.md (aarch64_get_low<mode>): Define.
* config/aarch64/arm_neon.h (__GET_LOW): Delete.
(vget_low_f16): Reimplement using new builtin.
(vget_low_f32): Likewise.
(vget_low_f64): Likewise.
(vget_low_p8): Likewise.
(vget_low_p16): Likewise.
(vget_low_p64): Likewise.
(vget_low_s8): Likewise.
(vget_low_s16): Likewise.
(vget_low_s32): Likewise.
(vget_low_s64): Likewise.
(vget_low_u8): Likewise.
(vget_low_u16): Likewise.
(vget_low_u32): Likewise.
(vget_low_u64): Likewise.