author | Jakub Jelinek <jakub@redhat.com> | 2021-01-13 11:28:48 +0100 |
---|---|---|
committer | Jakub Jelinek <jakub@redhat.com> | 2021-01-13 11:36:38 +0100 |
commit | b1d1e2b54c6b9cf13f021176ba37d24cc4dc2fe1 (patch) | |
tree | 722a890359eaccb0530f254bceed153fdac46ff0 /gcc/optabs.c | |
parent | 7875e8dc831f30eec7203e090a209efe4c01a27d (diff) | |
download | gcc-b1d1e2b54c6b9cf13f021176ba37d24cc4dc2fe1.zip gcc-b1d1e2b54c6b9cf13f021176ba37d24cc4dc2fe1.tar.gz gcc-b1d1e2b54c6b9cf13f021176ba37d24cc4dc2fe1.tar.bz2 |
i386, expand: Optimize also 256-bit and 512-bit permutations as vpmovzx if possible [PR95905]
The following patch implements what I've talked about: the generic code no
longer forces the operands of vec_perm_const into registers, and instead each
of the (currently 8) targets forces them into registers itself. This gives
the targets better control over whether and when that happens, and allows
them to do something special with some particular operands.
The patch then adds define_insn_and_split patterns that turn the 256-bit and
512-bit permutations into vpmovzx* (only the bw, wd and dq cases; in theory we
could also add define_insn_and_split patterns for bd, bq and wq).
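To illustrate why such a permutation can become a zero extension, here is a minimal scalar sketch in portable C. It is not GCC code: `perm2`, `zext_low_half`, `interleave_is_zext` and `NELT` are hypothetical names chosen for this example, and the model assumes little-endian element order. It mimics the bw case for a 256-bit vector (32 byte elements widened to 16 halfword elements) and checks that interleaving a vector with a zero vector gives the same bytes as zero extending its low half, whether the zero vector is the first or the second operand.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define NELT 32  /* byte elements in one 256-bit vector (hypothetical model) */

/* Scalar model of a two-operand constant permutation: selector values
   0..NELT-1 pick from v0, NELT..2*NELT-1 pick from v1.  */
static void
perm2 (uint8_t *dst, const uint8_t *v0, const uint8_t *v1,
       const uint8_t *sel)
{
  for (size_t i = 0; i < NELT; i++)
    {
      assert (sel[i] < 2 * NELT);
      dst[i] = sel[i] < NELT ? v0[sel[i]] : v1[sel[i] - NELT];
    }
}

/* Reference result: zero extend the low NELT/2 bytes of x to 16-bit
   elements, stored in little-endian byte order.  */
static void
zext_low_half (uint8_t *dst, const uint8_t *x)
{
  for (size_t i = 0; i < NELT / 2; i++)
    {
      dst[2 * i] = x[i];    /* low byte of the widened element */
      dst[2 * i + 1] = 0;   /* high byte is zero */
    }
}

/* Return 1 if interleaving x with a zero vector matches the zero
   extension, with the zero vector as either operand.  */
int
interleave_is_zext (const uint8_t *x)
{
  uint8_t zero[NELT] = { 0 };
  uint8_t sel_x_first[NELT], sel_zero_first[NELT];
  uint8_t a[NELT], b[NELT], ref[NELT];

  for (size_t i = 0; i < NELT / 2; i++)
    {
      /* perm2 (x, zero): take x[i], then an element of zero.  */
      sel_x_first[2 * i] = (uint8_t) i;
      sel_x_first[2 * i + 1] = (uint8_t) (NELT + i);
      /* perm2 (zero, x): the same result with the operands swapped.  */
      sel_zero_first[2 * i] = (uint8_t) (NELT + i);
      sel_zero_first[2 * i + 1] = (uint8_t) i;
    }

  perm2 (a, x, zero, sel_x_first);
  perm2 (b, zero, x, sel_zero_first);
  zext_low_half (ref, x);
  return memcmp (a, ref, NELT) == 0 && memcmp (b, ref, NELT) == 0;
}
```

The sketch only demonstrates the equivalence of the two operations on data. Recognizing the zero operand is presumably easier while it is still a constant rather than a register, which is consistent with the patch deferring the register forcing to the target hook.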
2021-01-13 Jakub Jelinek <jakub@redhat.com>
PR target/95905
* optabs.c (expand_vec_perm_const): Don't force v0 and v1 into
registers before calling targetm.vectorize.vec_perm_const, only after
that.
* config/i386/i386-expand.c (ix86_vectorize_vec_perm_const): Handle
two argument permutation when one operand is zero vector and only
after that force operands into registers.
* config/i386/sse.md (*avx2_zero_extendv16qiv16hi2_1): New
define_insn_and_split pattern.
(*avx512bw_zero_extendv32qiv32hi2_1): Likewise.
(*avx512f_zero_extendv16hiv16si2_1): Likewise.
(*avx2_zero_extendv8hiv8si2_1): Likewise.
(*avx512f_zero_extendv8siv8di2_1): Likewise.
(*avx2_zero_extendv4siv4di2_1): Likewise.
* config/mips/mips.c (mips_vectorize_vec_perm_const): Force operands
into registers.
* config/arm/arm.c (arm_vectorize_vec_perm_const): Likewise.
* config/sparc/sparc.c (sparc_vectorize_vec_perm_const): Likewise.
* config/ia64/ia64.c (ia64_vectorize_vec_perm_const): Likewise.
* config/aarch64/aarch64.c (aarch64_vectorize_vec_perm_const): Likewise.
* config/rs6000/rs6000.c (rs6000_vectorize_vec_perm_const): Likewise.
* config/gcn/gcn.c (gcn_vectorize_vec_perm_const): Likewise. Use std::swap.
* gcc.target/i386/pr95905-2.c: Use scan-assembler-times instead of
scan-assembler. Add tests with zero vector as first __builtin_shuffle
operand.
* gcc.target/i386/pr95905-3.c: New test.
* gcc.target/i386/pr95905-4.c: New test.
Diffstat (limited to 'gcc/optabs.c')
-rw-r--r-- | gcc/optabs.c | 8 |
1 files changed, 5 insertions, 3 deletions
diff --git a/gcc/optabs.c b/gcc/optabs.c
index 6f671fd..f4614a3 100644
--- a/gcc/optabs.c
+++ b/gcc/optabs.c
@@ -6070,11 +6070,8 @@ expand_vec_perm_const (machine_mode mode, rtx v0, rtx v1,
 
   if (targetm.vectorize.vec_perm_const != NULL)
     {
-      v0 = force_reg (mode, v0);
       if (single_arg_p)
         v1 = v0;
-      else
-        v1 = force_reg (mode, v1);
 
       if (targetm.vectorize.vec_perm_const (mode, target, v0, v1, indices))
         return target;
@@ -6095,6 +6092,11 @@ expand_vec_perm_const (machine_mode mode, rtx v0, rtx v1,
       return gen_lowpart (mode, target_qi);
     }
 
+  v0 = force_reg (mode, v0);
+  if (single_arg_p)
+    v1 = v0;
+  v1 = force_reg (mode, v1);
+
   /* Otherwise expand as a fully variable permuation. */
 
   /* The optabs are only defined for selectors with the same width