riscv-gnu-toolchain/gcc.git

author	Xi Ruoyao <xry111@xry111.site>	2023-11-19 06:12:22 +0800
committer	Xi Ruoyao <xry111@xry111.site>	2023-11-22 17:06:06 +0800
commit	fce367810149580da1bb0cb0c3cd4fb00b968f1c (patch)
tree	fffa1a55cac6ab33726ff62d809fccdb1c665c6f /gcc/expr.cc
parent	bd17d00a4bdee34876cc97bdf9a1f2316e0a6790 (diff)
download	gcc-fce367810149580da1bb0cb0c3cd4fb00b968f1c.zip gcc-fce367810149580da1bb0cb0c3cd4fb00b968f1c.tar.gz gcc-fce367810149580da1bb0cb0c3cd4fb00b968f1c.tar.bz2

LoongArch: Optimize LSX vector shuffle on floating-point vector

The vec_perm expander was wrongly defined.  GCC internals documentation says:

    Operand 3 is the "selector".  It is an integral mode vector of the
    same width and number of elements as mode M.

But we made operand 3 the same mode as the shuffled vectors, so it would
be an FP mode vector if the shuffled vectors are in FP mode.

With this mistake, the generic code manages to work around it, but it
ends up creating some very nasty code for a simple
__builtin_shuffle (a, b, c) where a and b are V4SF and c is V4SI:

    la.local  $r12,.LANCHOR0
    la.local  $r13,.LANCHOR1
    vld       $vr1,$r12,48
    vslli.w   $vr1,$vr1,2
    vld       $vr2,$r12,16
    vld       $vr0,$r13,0
    vld       $vr3,$r13,16
    vshuf.b   $vr0,$vr1,$vr1,$vr0
    vld       $vr1,$r12,32
    vadd.b    $vr0,$vr0,$vr3
    vandi.b   $vr0,$vr0,31
    vshuf.b   $vr0,$vr1,$vr2,$vr0
    vst       $vr0,$r12,0
    jr        $r1

This is obviously stupid.  Fix the expander definition and adjust
loongarch_expand_vec_perm to handle it correctly.

gcc/ChangeLog:

	* config/loongarch/lsx.md (vec_perm<mode:LSX>): Make the
	selector VIMODE.
	* config/loongarch/loongarch.cc (loongarch_expand_vec_perm):
	Use the mode of the selector (instead of the shuffled vector)
	for truncating it.  Operate on subregs in the selector mode if
	the shuffled vector has a different mode (i.e. it's a
	floating-point vector).

gcc/testsuite/ChangeLog:

	* gcc.target/loongarch/vect-shuf-fp.c: New test.
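
To make the selector requirement concrete, here is a minimal C sketch using GCC's generic vector extension. The function names perm_ref, perm_bytes, perm_builtin and v4sf_equal are hypothetical and do not come from the patch or its new test. perm_ref spells out the two-input permute semantics with an integer V4SI-like selector for V4SF-like data. perm_bytes emulates the same permute at byte granularity (scale each element index by 4, add 0..3, mask to 31). That appears to be what the vslli.w/vshuf.b/vadd.b/vandi.b sequence above computes, but the .LANCHOR constants are not shown, so this reading of the assembly is an inference.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* GCC generic vector types: 4 x float (like V4SF) and 4 x int32_t (like V4SI).
   Same width (16 bytes) and same element count, as vec_perm requires. */
typedef float v4sf __attribute__((vector_size(16)));
typedef int32_t v4si __attribute__((vector_size(16)));

/* Element-level reference: each selector element picks one of the 8
   elements of the concatenation {a, b}; only the low 3 bits count. */
v4sf perm_ref(v4sf a, v4sf b, v4si sel)
{
    v4sf r;
    for (int i = 0; i < 4; i++) {
        int idx = sel[i] & 7;
        r[i] = idx < 4 ? a[idx] : b[idx - 4];
    }
    return r;
}

/* Byte-level emulation of the same permute: element index idx becomes the
   byte indices 4*idx .. 4*idx+3 into the 32 bytes of {a, b}.  Computing
   ((sel << 2) + k) & 31 mirrors the shift/add/mask steps inferred from the
   assembly above (hypothetical reading; the constants are not shown). */
v4sf perm_bytes(v4sf a, v4sf b, v4si sel)
{
    uint8_t src[32], bsel[16], dst[16];
    memcpy(src, &a, sizeof a);
    memcpy(src + 16, &b, sizeof b);
    for (int i = 0; i < 4; i++)
        for (int k = 0; k < 4; k++)
            bsel[4 * i + k] = (uint8_t)((((uint32_t)sel[i] << 2) + k) & 31);
    for (int j = 0; j < 16; j++) {
        assert(bsel[j] < 32); /* always addresses one of the 32 source bytes */
        dst[j] = src[bsel[j]];
    }
    v4sf r;
    memcpy(&r, dst, sizeof r);
    return r;
}

/* Exact element-wise comparison (the permute copies values unchanged). */
int v4sf_equal(v4sf x, v4sf y)
{
    for (int i = 0; i < 4; i++)
        if (x[i] != y[i])
            return 0;
    return 1;
}

#if defined(__GNUC__) && !defined(__clang__)
/* GCC only: __builtin_shuffle with a variable integer mask; on targets
   that provide vec_perm for the mode, this is the kind of call the
   expander is used for. */
v4sf perm_builtin(v4sf a, v4sf b, v4si sel)
{
    return __builtin_shuffle(a, b, sel);
}
#endif
```

For example, with a = {1, 2, 3, 4}, b = {5, 6, 7, 8} and sel = {7, 0, 12, -3}, both perm_ref and perm_bytes return {8, 1, 5, 6}: only the low three bits of each selector element matter, so 12 selects b[0] and -3 selects b[1]. The sketch models only the semantics, not the LSX instructions; with the expander fixed, GCC can presumably operate on the selector in its integer mode directly instead of taking the byte-level detour.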
