vect, aarch64: Extend SVE vs Advanced SIMD costing decisions in vect_better_loop_vinfo_p

While experimenting with some backend costs for Advanced SIMD and SVE I hit many cases where GCC would pick SVE for VLA auto-vectorisation even when the backend very clearly presented cheaper costs for Advanced SIMD. For a simple float addition loop the SVE costs were: vec.c:9:21: note: Cost model analysis: Vector inside of loop cost: 28 Vector prologue cost: 2 Vector epilogue cost: 0 Scalar iteration cost: 10 Scalar outside cost: 0 Vector outside cost: 2 prologue iterations: 0 epilogue iterations: 0 Minimum number of vector iterations: 1 Calculated minimum iters for profitability: 4 and for Advanced SIMD (Neon) they're: vec.c:9:21: note: Cost model analysis: Vector inside of loop cost: 11 Vector prologue cost: 0 Vector epilogue cost: 0 Scalar iteration cost: 10 Scalar outside cost: 0 Vector outside cost: 0 prologue iterations: 0 epilogue iterations: 0 Calculated minimum iters for profitability: 0 vec.c:9:21: note: Runtime profitability threshold = 4 yet the SVE one was always picked. With guidance from Richard this seems to be due to the vinfo comparisons in vect_better_loop_vinfo_p, in particular the part with the big comment explaining the estimated_rel_new * 2 <= estimated_rel_old heuristic. This patch extends the comparisons by introducing a three-way estimate kind for poly_int values that the backend can distinguish. This allows vect_better_loop_vinfo_p to ask for minimum, maximum and likely estimates and pick Advanced SIMD overs SVE when it is clearly cheaper. gcc/ * target.h (enum poly_value_estimate_kind): Define. (estimated_poly_value): Take an estimate kind argument. * target.def (estimated_poly_value): Update definition for the above. * doc/tm.texi: Regenerate. * targhooks.c (estimated_poly_value): Update prototype. * tree-vect-loop.c (vect_better_loop_vinfo_p): Use min, max and likely estimates of VF to pick between vinfos. * config/aarch64/aarch64.c (aarch64_cmp_autovec_modes): Use estimated_poly_value instead of aarch64_estimated_poly_value. (aarch64_estimated_poly_value): Take a kind argument and handle it.
author: Kyrylo Tkachov <kyrylo.tkachov@arm.com> 2020-12-17 18:02:37 +0000
committer: Kyrylo Tkachov <kyrylo.tkachov@arm.com> 2020-12-17 18:04:21 +0000
commit: 64432b680eab0bddbe9a4ad4798457cf6a14ad60 (patch)
tree: c88d7613be91dc9c472634980e0de2f9a8897c37 /gcc/tree-vect-loop.c
parent: 2d7a40fa60fb8b9870cfd053a37fc67404353ee2 (diff)
download: gcc-64432b680eab0bddbe9a4ad4798457cf6a14ad60.zip
gcc-64432b680eab0bddbe9a4ad4798457cf6a14ad60.tar.gz
gcc-64432b680eab0bddbe9a4ad4798457cf6a14ad60.tar.bz2
1 files changed, 49 insertions, 36 deletions
diff --git a/gcc/tree-vect-loop.c b/gcc/tree-vect-loop.c
index 52757ad..688538a 100644
--- a/gcc/tree-vect-loop.c
+++ b/gcc/tree-vect-loop.c
@@ -2773,43 +2773,56 @@ vect_better_loop_vinfo_p (loop_vec_info new_loop_vinfo,
 
   /* Check whether the (fractional) cost per scalar iteration is lower
      or higher: new_inside_cost / new_vf vs. old_inside_cost / old_vf.  */
-  poly_widest_int rel_new = (new_loop_vinfo->vec_inside_cost
-			     * poly_widest_int (old_vf));
-  poly_widest_int rel_old = (old_loop_vinfo->vec_inside_cost
-			     * poly_widest_int (new_vf));
-  if (maybe_lt (rel_old, rel_new))
-    {
-      /* When old_loop_vinfo uses a variable vectorization factor,
-	 we know that it has a lower cost for at least one runtime VF.
-	 However, we don't know how likely that VF is.
-
-	 One option would be to compare the costs for the estimated VFs.
-	 The problem is that that can put too much pressure on the cost
-	 model.  E.g. if the estimated VF is also the lowest possible VF,
-	 and if old_loop_vinfo is 1 unit worse than new_loop_vinfo
-	 for the estimated VF, we'd then choose new_loop_vinfo even
-	 though (a) new_loop_vinfo might not actually be better than
-	 old_loop_vinfo for that VF and (b) it would be significantly
-	 worse at larger VFs.
-
-	 Here we go for a hacky compromise: pick new_loop_vinfo if it is
-	 no more expensive than old_loop_vinfo even after doubling the
-	 estimated old_loop_vinfo VF.  For all but trivial loops, this
-	 ensures that we only pick new_loop_vinfo if it is significantly
-	 better than old_loop_vinfo at the estimated VF.  */
-      if (rel_new.is_constant ())
-	return false;
-
-      HOST_WIDE_INT new_estimated_vf = estimated_poly_value (new_vf);
-      HOST_WIDE_INT old_estimated_vf = estimated_poly_value (old_vf);
-      widest_int estimated_rel_new = (new_loop_vinfo->vec_inside_cost
-				      * widest_int (old_estimated_vf));
-      widest_int estimated_rel_old = (old_loop_vinfo->vec_inside_cost
-				      * widest_int (new_estimated_vf));
-      return estimated_rel_new * 2 <= estimated_rel_old;
-    }
-  if (known_lt (rel_new, rel_old))
+  poly_int64 rel_new = new_loop_vinfo->vec_inside_cost * old_vf;
+  poly_int64 rel_old = old_loop_vinfo->vec_inside_cost * new_vf;
+
+  HOST_WIDE_INT est_rel_new_min
+    = estimated_poly_value (rel_new, POLY_VALUE_MIN);
+  HOST_WIDE_INT est_rel_new_max
+    = estimated_poly_value (rel_new, POLY_VALUE_MAX);
+
+  HOST_WIDE_INT est_rel_old_min
+    = estimated_poly_value (rel_old, POLY_VALUE_MIN);
+  HOST_WIDE_INT est_rel_old_max
+    = estimated_poly_value (rel_old, POLY_VALUE_MAX);
+
+  /* Check first if we can make out an unambigous total order from the minimum
+     and maximum estimates.  */
+  if (est_rel_new_min < est_rel_old_min
+      && est_rel_new_max < est_rel_old_max)
     return true;
+  else if (est_rel_old_min < est_rel_new_min
+	   && est_rel_old_max < est_rel_new_max)
+    return false;
+  /* When old_loop_vinfo uses a variable vectorization factor,
+     we know that it has a lower cost for at least one runtime VF.
+     However, we don't know how likely that VF is.
+
+     One option would be to compare the costs for the estimated VFs.
+     The problem is that that can put too much pressure on the cost
+     model.  E.g. if the estimated VF is also the lowest possible VF,
+     and if old_loop_vinfo is 1 unit worse than new_loop_vinfo
+     for the estimated VF, we'd then choose new_loop_vinfo even
+     though (a) new_loop_vinfo might not actually be better than
+     old_loop_vinfo for that VF and (b) it would be significantly
+     worse at larger VFs.
+
+     Here we go for a hacky compromise: pick new_loop_vinfo if it is
+     no more expensive than old_loop_vinfo even after doubling the
+     estimated old_loop_vinfo VF.  For all but trivial loops, this
+     ensures that we only pick new_loop_vinfo if it is significantly
+     better than old_loop_vinfo at the estimated VF.  */
+
+  if (est_rel_old_min != est_rel_new_min
+      || est_rel_old_max != est_rel_new_max)
+    {
+      HOST_WIDE_INT est_rel_new_likely
+	= estimated_poly_value (rel_new, POLY_VALUE_LIKELY);
+      HOST_WIDE_INT est_rel_old_likely
+	= estimated_poly_value (rel_old, POLY_VALUE_LIKELY);
+
+      return est_rel_new_likely * 2 <= est_rel_old_likely;
+    }
 
   /* If there's nothing to choose between the loop bodies, see whether
      there's a difference in the prologue and epilogue costs.  */
author	Kyrylo Tkachov <kyrylo.tkachov@arm.com>	2020-12-17 18:02:37 +0000
committer	Kyrylo Tkachov <kyrylo.tkachov@arm.com>	2020-12-17 18:04:21 +0000
commit	64432b680eab0bddbe9a4ad4798457cf6a14ad60 (patch)
tree	c88d7613be91dc9c472634980e0de2f9a8897c37 /gcc/tree-vect-loop.c
parent	2d7a40fa60fb8b9870cfd053a37fc67404353ee2 (diff)
download	gcc-64432b680eab0bddbe9a4ad4798457cf6a14ad60.zip gcc-64432b680eab0bddbe9a4ad4798457cf6a14ad60.tar.gz gcc-64432b680eab0bddbe9a4ad4798457cf6a14ad60.tar.bz2