From 64432b680eab0bddbe9a4ad4798457cf6a14ad60 Mon Sep 17 00:00:00 2001
From: Kyrylo Tkachov <kyrylo.tkachov@arm.com>
Date: Thu, 17 Dec 2020 18:02:37 +0000
Subject: vect, aarch64: Extend SVE vs Advanced SIMD costing decisions in
 vect_better_loop_vinfo_p

While experimenting with some backend costs for Advanced SIMD and SVE I
hit many cases where GCC would pick SVE for VLA auto-vectorisation even when
the backend very clearly presented cheaper costs for Advanced SIMD.
For a simple float addition loop the SVE costs were:

vec.c:9:21: note:  Cost model analysis:
  Vector inside of loop cost: 28
  Vector prologue cost: 2
  Vector epilogue cost: 0
  Scalar iteration cost: 10
  Scalar outside cost: 0
  Vector outside cost: 2
  prologue iterations: 0
  epilogue iterations: 0
  Minimum number of vector iterations: 1
  Calculated minimum iters for profitability: 4

and for Advanced SIMD (Neon) they were:

vec.c:9:21: note:  Cost model analysis:
  Vector inside of loop cost: 11
  Vector prologue cost: 0
  Vector epilogue cost: 0
  Scalar iteration cost: 10
  Scalar outside cost: 0
  Vector outside cost: 0
  prologue iterations: 0
  epilogue iterations: 0
  Calculated minimum iters for profitability: 0
vec.c:9:21: note:    Runtime profitability threshold = 4

yet the SVE one was always picked. With guidance from Richard this seems
to be due to the vinfo comparisons in vect_better_loop_vinfo_p, in
particular the part with the big comment explaining the
estimated_rel_new * 2 <= estimated_rel_old heuristic.

This patch extends the comparisons by introducing a three-way estimate
kind for poly_int values that the backend can distinguish.
This allows vect_better_loop_vinfo_p to ask for minimum, maximum and
likely estimates and pick Advanced SIMD over SVE when it is clearly cheaper.

gcc/
	* target.h (enum poly_value_estimate_kind): Define.
	(estimated_poly_value): Take an estimate kind argument.
	* target.def (estimated_poly_value): Update definition for the
	above.
	* doc/tm.texi: Regenerate.
	* targhooks.c (estimated_poly_value): Update prototype.
	* tree-vect-loop.c (vect_better_loop_vinfo_p): Use min, max and
	likely estimates of VF to pick between vinfos.
	* config/aarch64/aarch64.c (aarch64_cmp_autovec_modes): Use
	estimated_poly_value instead of aarch64_estimated_poly_value.
	(aarch64_estimated_poly_value): Take a kind argument and handle
	it.
---
 gcc/target.h | 12 ++++++++++--
 1 file changed, 10 insertions(+), 2 deletions(-)

(limited to 'gcc/target.h')
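
Note (not part of the commit): the following is a minimal, self-contained
sketch of how a caller might use a three-way estimate to decide that one
candidate is "clearly cheaper" than another. It is not the code in
tree-vect-loop.c. Only the three enumerator names come from this patch;
toy_poly, x_range, toy_estimate and clearly_cheaper are hypothetical
names, and the sketch assumes a single runtime unknown x with
non-negative coefficients.

```cpp
#include <cassert>
#include <cstdint>

// The three kinds mirror the enum added to target.h by this patch.
enum poly_value_estimate_kind
{
  POLY_VALUE_MIN,
  POLY_VALUE_MAX,
  POLY_VALUE_LIKELY
};

// Hypothetical stand-in for a two-coefficient poly_int64: c0 + c1 * x,
// where x is only known at run time.
struct toy_poly
{
  int64_t c0;
  int64_t c1;
};

// Hypothetical description of what is assumed about x: its smallest and
// largest possible values and a tuning-specific guess.
struct x_range
{
  int64_t min_x;
  int64_t max_x;
  int64_t likely_x;
};

// Evaluate P at the x selected by KIND.  As in the patched
// estimated_poly_value, the kind defaults to POLY_VALUE_LIKELY.
int64_t
toy_estimate (toy_poly p, const x_range &r,
	      poly_value_estimate_kind kind = POLY_VALUE_LIKELY)
{
  assert (r.min_x <= r.max_x);
  int64_t x = r.likely_x;
  if (kind == POLY_VALUE_MIN)
    x = r.min_x;
  else if (kind == POLY_VALUE_MAX)
    x = r.max_x;
  return p.c0 + p.c1 * x;
}

// Return true only if A is below B at both ends of the range.  Because
// both values are linear in x, that implies A is below B for every x in
// the range, so no guess about the likely x is needed.
bool
clearly_cheaper (toy_poly a, toy_poly b, const x_range &r)
{
  return (toy_estimate (a, r, POLY_VALUE_MIN)
	  < toy_estimate (b, r, POLY_VALUE_MIN)
	  && toy_estimate (a, r, POLY_VALUE_MAX)
	     < toy_estimate (b, r, POLY_VALUE_MAX));
}
```

As a usage illustration with the inside-of-loop costs quoted above (11
for Advanced SIMD, 28 for SVE), and assuming a VF of 4 for Advanced SIMD
and 4 + 4x for SVE (these VFs are an assumption, not stated in the commit
message), the cross-multiplied costs are 11 * (4 + 4x) = 44 + 44x and
28 * 4 = 112. clearly_cheaper ({44, 44}, {112, 0}, range) holds when x is
limited to 0..1, but not when x may reach 15, in which case a caller would
need to fall back to some other heuristic.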

diff --git a/gcc/target.h b/gcc/target.h
index 9601880..68ef519 100644
--- a/gcc/target.h
+++ b/gcc/target.h
@@ -252,6 +252,13 @@ enum type_context_kind {
   TCTX_CAPTURE_BY_COPY
 };
 
+enum poly_value_estimate_kind
+{
+  POLY_VALUE_MIN,
+  POLY_VALUE_MAX,
+  POLY_VALUE_LIKELY
+};
+
 extern bool verify_type_context (location_t, type_context_kind, const_tree,
 				 bool = false);
 
@@ -272,12 +279,13 @@ extern struct gcc_target targetm;
    provides a rough guess.  */
 
 static inline HOST_WIDE_INT
-estimated_poly_value (poly_int64 x)
+estimated_poly_value (poly_int64 x,
+		      poly_value_estimate_kind kind = POLY_VALUE_LIKELY)
 {
   if (NUM_POLY_INT_COEFFS == 1)
     return x.coeffs[0];
   else
-    return targetm.estimated_poly_value (x);
+    return targetm.estimated_poly_value (x, kind);
 }
 
 #ifdef GCC_TM_H
-- 
cgit v1.1