author     Richard Biener <rguenther@suse.de>  2024-06-14 07:54:15 +0200
committer  Richard Biener <rguenther@suse.de>  2024-06-14 09:04:18 +0200
commit     e575b5c56137b12d402d9fb39291fe20985067b7 (patch)
tree       45d3739620f56b1635da7e0b20cc2bfd8e933d1c
parent     d3fae2bea034edb001cd45d1d86c5ceef146899b (diff)
Fix fallout of peeling for gap improvements
The following hopefully addresses an observed bootstrap issue on
aarch64 where maybe-uninit diagnostics occur.  It also fixes bogus
napkin math of mine, where I confused the rounded-up size of a single
access with the rounded-up size of the group accessed in a single
scalar iteration.  So the following puts in a correctness check,
leaving a set of cases where peeling for gaps is judged insufficient.
That could be rectified by splitting the last load into multiple ones,
but I'm leaving this for a followup; better to quickly fix the
reported wrong-code.
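To make the corrected math concrete, here is a minimal standalone
sketch; the function name, the plain `unsigned` parameters, and the
omission of the vector_vector_composition_type check are illustrative
simplifications, not the vectorizer's actual interface:

```c++
#include <cstdio>

/* Smallest power of two >= x; a stand-in for GCC's 1 << ceil_log2 (x).  */
static unsigned
next_pow2 (unsigned x)
{
  unsigned p = 1;
  while (p < x)
    p <<= 1;
  return p;
}

/* Hypothetical model of the fixed check.  Recompute the number of
   elements remaining in the last vector from non-poly values instead
   of re-using the poly-int 'remain', then accept a shortened
   power-of-two access only if it stays within what one peeled scalar
   iteration (group_size further elements) can cover.  */
static bool
partial_access_ok (unsigned group_size, unsigned cvf,
		   unsigned gap, unsigned cnunits)
{
  unsigned cremain = (group_size * cvf - gap) % cnunits;
  unsigned cpart_size = next_pow2 (cremain);
  if (cpart_size == cnunits)
    return true;   /* Full-vector access, nothing is shortened.  */
  /* The shortened load still reads cpart_size elements; reject it when
     the cremain valid elements plus one scalar iteration's group_size
     elements do not cover that many.  */
  return cremain + group_size >= cpart_size;
}
```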
* tree-vect-stmts.cc (get_group_load_store_type): Do not
re-use the poly-int remain but re-compute it with non-poly
values.  Verify the shortened load is small enough to be
covered by a single scalar gap iteration before accepting it.
* gcc.dg/vect/pr115385.c: Enable AVX2 if available.
-rw-r--r--  gcc/testsuite/gcc.dg/vect/pr115385.c |  1
-rw-r--r--  gcc/tree-vect-stmts.cc               | 12
2 files changed, 8 insertions, 5 deletions
diff --git a/gcc/testsuite/gcc.dg/vect/pr115385.c b/gcc/testsuite/gcc.dg/vect/pr115385.c
index a18cd66..baea0b2 100644
--- a/gcc/testsuite/gcc.dg/vect/pr115385.c
+++ b/gcc/testsuite/gcc.dg/vect/pr115385.c
@@ -1,4 +1,5 @@
 /* { dg-require-effective-target mmap } */
+/* { dg-additional-options "-mavx2" { target avx2_runtime } } */
 
 #include <sys/mman.h>
 #include <stdio.h>
diff --git a/gcc/tree-vect-stmts.cc b/gcc/tree-vect-stmts.cc
index e32d440..ca60526 100644
--- a/gcc/tree-vect-stmts.cc
+++ b/gcc/tree-vect-stmts.cc
@@ -2148,15 +2148,17 @@ get_group_load_store_type (vec_info *vinfo, stmt_vec_info stmt_info,
 	    {
 	      /* But peeling a single scalar iteration is enough if
 		 we can use the next power-of-two sized partial
-		 access.  */
+		 access and that is sufficiently small to be covered
+		 by the single scalar iteration.  */
 	      unsigned HOST_WIDE_INT cnunits, cvf, cremain, cpart_size;
 	      if (!nunits.is_constant (&cnunits)
 		  || !LOOP_VINFO_VECT_FACTOR (loop_vinfo).is_constant (&cvf)
-		  || ((cremain = remain.to_constant (), true)
+		  || (((cremain = (group_size * cvf - gap) % cnunits), true)
 		      && ((cpart_size = (1 << ceil_log2 (cremain))) != cnunits)
-		      && vector_vector_composition_type
-			   (vectype, cnunits / cpart_size,
-			    &half_vtype) == NULL_TREE))
+		      && (cremain + group_size < cpart_size
+			  || vector_vector_composition_type
+			       (vectype, cnunits / cpart_size,
+				&half_vtype) == NULL_TREE)))
 		{
 		  if (dump_enabled_p ())
 		    dump_printf_loc (MSG_MISSED_OPTIMIZATION, vect_location,
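Continuing the sketch above with made-up parameters (not values from
the PR testcase), a driver showing one accepted and one rejected
configuration:

```c++
int
main ()
{
  /* cremain = (8*1 - 6) % 8 = 2, cpart_size = 2, and 2 + 8 >= 2:
     the shortened access would be acceptable (modulo the vector
     composition check the sketch leaves out).  */
  printf ("%d\n", partial_access_ok (8, 1, 6, 8));    /* prints 1 */

  /* cremain = (7*4 - 6) % 64 = 22, cpart_size = 32, and 22 + 7 < 32:
     the power-of-two access would read past what a single peeled
     scalar iteration covers, so it has to be rejected.  */
  printf ("%d\n", partial_access_ok (7, 4, 6, 64));   /* prints 0 */
  return 0;
}
```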