From b1d1e2b54c6b9cf13f021176ba37d24cc4dc2fe1 Mon Sep 17 00:00:00 2001 From: Jakub Jelinek Date: Wed, 13 Jan 2021 11:28:48 +0100 Subject: i386, expand: Optimize also 256-bit and 512-bit permutatations as vpmovzx if possible [PR95905] The following patch implements what I've talked about, i.e. to no longer force operands of vec_perm_const into registers in the generic code, but let each of the (currently 8) targets force it into registers individually, giving the targets better control on if it does that and when and allowing them to do something special with some particular operands. And then defines the define_insn_and_split for the 256-bit and 512-bit permutations into vpmovzx* (only the bw, wd and dq cases, in theory we could add define_insn_and_split patterns also for the bd, bq and wq). 2021-01-13 Jakub Jelinek PR target/95905 * optabs.c (expand_vec_perm_const): Don't force v0 and v1 into registers before calling targetm.vectorize.vec_perm_const, only after that. * config/i386/i386-expand.c (ix86_vectorize_vec_perm_const): Handle two argument permutation when one operand is zero vector and only after that force operands into registers. * config/i386/sse.md (*avx2_zero_extendv16qiv16hi2_1): New define_insn_and_split pattern. (*avx512bw_zero_extendv32qiv32hi2_1): Likewise. (*avx512f_zero_extendv16hiv16si2_1): Likewise. (*avx2_zero_extendv8hiv8si2_1): Likewise. (*avx512f_zero_extendv8siv8di2_1): Likewise. (*avx2_zero_extendv4siv4di2_1): Likewise. * config/mips/mips.c (mips_vectorize_vec_perm_const): Force operands into registers. * config/arm/arm.c (arm_vectorize_vec_perm_const): Likewise. * config/sparc/sparc.c (sparc_vectorize_vec_perm_const): Likewise. * config/ia64/ia64.c (ia64_vectorize_vec_perm_const): Likewise. * config/aarch64/aarch64.c (aarch64_vectorize_vec_perm_const): Likewise. * config/rs6000/rs6000.c (rs6000_vectorize_vec_perm_const): Likewise. * config/gcn/gcn.c (gcn_vectorize_vec_perm_const): Likewise. Use std::swap. * gcc.target/i386/pr95905-2.c: Use scan-assembler-times instead of scan-assembler. Add tests with zero vector as first __builtin_shuffle operand. * gcc.target/i386/pr95905-3.c: New test. * gcc.target/i386/pr95905-4.c: New test. --- gcc/config/sparc/sparc.c | 6 ++++++ 1 file changed, 6 insertions(+) (limited to 'gcc/config/sparc') diff --git a/gcc/config/sparc/sparc.c b/gcc/config/sparc/sparc.c index 8a5a269..f355793 100644 --- a/gcc/config/sparc/sparc.c +++ b/gcc/config/sparc/sparc.c @@ -12942,6 +12942,12 @@ sparc_vectorize_vec_perm_const (machine_mode vmode, rtx target, rtx op0, if (vmode != V8QImode) return false; + rtx nop0 = force_reg (vmode, op0); + if (op0 == op1) + op1 = nop0; + op0 = nop0; + op1 = force_reg (vmode, op1); + unsigned int i, mask; for (i = mask = 0; i < 8; ++i) mask |= (sel[i] & 0xf) << (28 - i*4); -- cgit v1.1