about summary refs log tree commit diff
path: root/gcc
diff options
context:
space:
mode:
authorRichard Biener <rguenther@suse.de>2023-02-10 11:07:30 +0100
committerRichard Biener <rguenther@suse.de>2023-02-10 12:21:41 +0100
commitdc87e1391c55c666c7ff39d4f0dea87666f25468 (patch)
tree38e33d8ef90915dfaf3e9ec58344e3dd0c395a3d /gcc
parent2a37a4a3cbfaecb6c7666109353bb4d5c97b0702 (diff)
downloadgcc-dc87e1391c55c666c7ff39d4f0dea87666f25468.zip
gcc-dc87e1391c55c666c7ff39d4f0dea87666f25468.tar.gz
gcc-dc87e1391c55c666c7ff39d4f0dea87666f25468.tar.bz2
tree-optimization/108724 - vectorized code getting piecewise expanded
This fixes an oversight from when the hard limits on using generic vectors for the vectorizer were removed to enable both SLP and BB vectorization to use them. The vectorizer relies on vector lowering to expand plus, minus and negate to bit operations, but vector lowering has a hard limit on the minimum number of elements per work item below which it performs elementwise operations. Vectorizer costs for the testcase at hand work out to vectorize a loop with just two work items per vector, and that causes elementwise expansion and spilling. The fix for now is to reinstate the hard limit, matching what vector lowering does. For the future, the way to go is to emit the lowered sequence directly from the vectorizer instead. PR tree-optimization/108724 * tree-vect-stmts.cc (vectorizable_operation): Avoid using word_mode vectors when vector lowering will decompose them to elementwise operations. * gcc.target/i386/pr108724.c: New testcase.
Diffstat (limited to 'gcc')
-rw-r--r--gcc/testsuite/gcc.target/i386/pr108724.c15
-rw-r--r--gcc/tree-vect-stmts.cc14
2 files changed, 29 insertions, 0 deletions
diff --git a/gcc/testsuite/gcc.target/i386/pr108724.c b/gcc/testsuite/gcc.target/i386/pr108724.c
new file mode 100644
index 0000000..c4e0e91
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/pr108724.c
@@ -0,0 +1,15 @@
+/* { dg-do compile } */
+/* { dg-options "-O3 -mno-sse" } */
+
+int a[16], b[16], c[16];
+void foo()
+{
+ for (int i = 0; i < 16; i++) {
+ a[i] = b[i] + c[i];
+ }
+}
+
+/* When this is vectorized this shouldn't be expanded piecewise again
+ which will result in spilling for the upper half access. */
+
+/* { dg-final { scan-assembler-not "\\\[er\\\]sp" } } */
diff --git a/gcc/tree-vect-stmts.cc b/gcc/tree-vect-stmts.cc
index c86249a..09b5af6 100644
--- a/gcc/tree-vect-stmts.cc
+++ b/gcc/tree-vect-stmts.cc
@@ -6315,6 +6315,20 @@ vectorizable_operation (vec_info *vinfo,
return false;
}
+ /* ??? We should instead expand the operations here, instead of
+ relying on vector lowering which has this hard cap on the number
+ of vector elements below it performs elementwise operations. */
+ if (using_emulated_vectors_p
+ && (code == PLUS_EXPR || code == MINUS_EXPR || code == NEGATE_EXPR)
+ && ((BITS_PER_WORD / vector_element_bits (vectype)) < 4
+ || maybe_lt (nunits_out, 4U)))
+ {
+ if (dump_enabled_p ())
+ dump_printf (MSG_NOTE, "not using word mode for +- and less than "
+ "four vector elements\n");
+ return false;
+ }
+
int reduc_idx = STMT_VINFO_REDUC_IDX (stmt_info);
vec_loop_masks *masks = (loop_vinfo ? &LOOP_VINFO_MASKS (loop_vinfo) : NULL);
internal_fn cond_fn = get_conditional_internal_fn (code);