tree-optimization/115372 - failed store-lanes in some cases

The gcc.target/riscv/rvv/autovec/struct/struct_vect-4.c testcase shows that we sometimes fail to use store-lanes even though it should be profitable. We're currently relying on vect_slp_prefer_store_lanes_p at the point we run into the first SLP discovery mismatch, where only limited information is available.

For the case at hand we have 3, 5 or 7 lanes of VnDImode [2, 2] vectors with the first mismatch at lane 2, so the new group size is 1. The heuristic says that might be an OK split given the rest is a multiple of the vector lanes. We then continue discovery, but in the end the mismatches result in uniformly single-lane SLP instances. We can handle those via interleaving, but they are of course prime candidates for store-lanes.

The following patch re-assesses the decision with that extra knowledge, now relying only on whether the target supports store-lanes for the given group size.

	PR tree-optimization/115372
	* tree-vect-slp.cc (vect_build_slp_instance): Compute the number of lanes of the RHS sub-graphs feeding the store, if uniform, and if it is uniformly one, use store-lanes if the target supports that.
author: Richard Biener <rguenther@suse.de> 2024-09-20 15:07:24 +0200
committer: Richard Biener <rguenth@gcc.gnu.org> 2024-09-24 10:17:36 +0200
commit: f594008dcced0ebb86908f3d7602fcf943e05bc7 (patch)
tree: 82428a4cae926407a2b887064ce4455e93cc8515
parent: 618871ff09c07817f7ce9b2bd7338cd3299ad8f5 (diff)
download: gcc-f594008dcced0ebb86908f3d7602fcf943e05bc7.zip
gcc-f594008dcced0ebb86908f3d7602fcf943e05bc7.tar.gz
gcc-f594008dcced0ebb86908f3d7602fcf943e05bc7.tar.bz2
1 file changed, 18 insertions, 0 deletions
diff --git a/gcc/tree-vect-slp.cc b/gcc/tree-vect-slp.cc
index ab49bb0..f5b47e4 100644
--- a/gcc/tree-vect-slp.cc
+++ b/gcc/tree-vect-slp.cc
@@ -3957,6 +3957,7 @@ vect_build_slp_instance (vec_info *vinfo,
 	  /* Calculate the unrolling factor based on the smallest type.  */
 	  poly_uint64 unrolling_factor = 1;
 
+	  unsigned int rhs_common_nlanes = 0;
 	  unsigned int start = 0, end = i;
 	  while (start < group_size)
 	    {
@@ -3978,6 +3979,10 @@ vect_build_slp_instance (vec_info *vinfo,
 					     calculate_unrolling_factor
 					       (max_nunits, end - start));
 		  rhs_nodes.safe_push (node);
+		  if (start == 0)
+		    rhs_common_nlanes = SLP_TREE_LANES (node);
+		  else if (rhs_common_nlanes != SLP_TREE_LANES (node))
+		    rhs_common_nlanes = 0;
 		  start = end;
 		  if (want_store_lanes || force_single_lane)
 		    end = start + 1;
@@ -4015,6 +4020,19 @@ vect_build_slp_instance (vec_info *vinfo,
 		}
 	    }
 
+	  /* Now re-assess whether we want store lanes in case the
+	     discovery ended up producing all single-lane RHSs.  */
+	  if (rhs_common_nlanes == 1
+	      && ! STMT_VINFO_GATHER_SCATTER_P (stmt_info)
+	      && ! STMT_VINFO_STRIDED_P (stmt_info)
+	      && compare_step_with_zero (vinfo, stmt_info) > 0
+	      && (vect_store_lanes_supported (SLP_TREE_VECTYPE (rhs_nodes[0]),
+					      group_size,
+					      SLP_TREE_CHILDREN
+						(rhs_nodes[0]).length () != 1)
+		  != IFN_LAST))
+	    want_store_lanes = true;
+
 	  /* Now we assume we can build the root SLP node from all stores.  */
 	  if (want_store_lanes)
 	    {
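
The lane-count tracking and the re-assessment added by this patch can be modelled in isolation. The following is a minimal standalone sketch, not GCC code: `common_nlanes`, `reassess_store_lanes` and `StoreProps` are hypothetical names introduced for illustration. The vector of lane counts stands in for `SLP_TREE_LANES (node)` of each RHS sub-graph, `step_sign` stands in for the result of `compare_step_with_zero`, and `target_supports` stands in for `vect_store_lanes_supported (...) != IFN_LAST`.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical model: returns the lane count shared by all RHS sub-graphs,
// or 0 if they differ (or if there are none).  Mirrors how the patch
// maintains rhs_common_nlanes while pushing nodes onto rhs_nodes.
unsigned common_nlanes(const std::vector<unsigned> &rhs_lanes)
{
  unsigned common = 0;
  for (std::size_t i = 0; i < rhs_lanes.size(); ++i)
    {
      assert(rhs_lanes[i] > 0);   // an SLP node always has at least one lane
      if (i == 0)
        common = rhs_lanes[i];    // the first sub-graph seeds the value
      else if (common != rhs_lanes[i])
        common = 0;               // mismatch: there is no uniform count
    }
  return common;
}

// Hypothetical summary of the store properties the patch checks.
struct StoreProps
{
  bool gather_scatter;   // models STMT_VINFO_GATHER_SCATTER_P
  bool strided;          // models STMT_VINFO_STRIDED_P
  int step_sign;         // models compare_step_with_zero: > 0 is a positive step
  bool target_supports;  // models vect_store_lanes_supported != IFN_LAST
};

// Re-assess the earlier decision: switch to store-lanes when every RHS
// sub-graph ended up single-lane and the store qualifies; otherwise keep
// whatever was decided before.
bool reassess_store_lanes(bool want_store_lanes,
                          const std::vector<unsigned> &rhs_lanes,
                          const StoreProps &props)
{
  if (common_nlanes(rhs_lanes) == 1
      && !props.gather_scatter
      && !props.strided
      && props.step_sign > 0
      && props.target_supports)
    return true;
  return want_store_lanes;
}
```

In this model the check can only turn store-lanes on, never off, which matches the patch: `want_store_lanes` is set to true when the condition holds and is otherwise left untouched.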