vect, aarch64: Extend SVE vs Advanced SIMD costing decisions in vect_better_loop_vinfo_p

While experimenting with some backend costs for Advanced SIMD and SVE I
hit many cases where GCC would pick SVE for VLA auto-vectorisation even when
the backend very clearly presented cheaper costs for Advanced SIMD.
For a simple float addition loop the SVE costs were:

vec.c:9:21: note: Cost model analysis:
  Vector inside of loop cost: 28
  Vector prologue cost: 2
  Vector epilogue cost: 0
  Scalar iteration cost: 10
  Scalar outside cost: 0
  Vector outside cost: 2
  prologue iterations: 0
  epilogue iterations: 0
  Minimum number of vector iterations: 1
  Calculated minimum iters for profitability: 4

and for Advanced SIMD (Neon) they're:

vec.c:9:21: note: Cost model analysis:
  Vector inside of loop cost: 11
  Vector prologue cost: 0
  Vector epilogue cost: 0
  Scalar iteration cost: 10
  Scalar outside cost: 0
  Vector outside cost: 0
  prologue iterations: 0
  epilogue iterations: 0
  Calculated minimum iters for profitability: 0
vec.c:9:21: note: Runtime profitability threshold = 4

yet the SVE one was always picked. With guidance from Richard this seems
to be due to the vinfo comparisons in vect_better_loop_vinfo_p, in
particular the part with the big comment explaining the
estimated_rel_new * 2 <= estimated_rel_old heuristic.

This patch extends the comparisons by introducing a three-way estimate
kind for poly_int values that the backend can distinguish.
This allows vect_better_loop_vinfo_p to ask for minimum, maximum and
likely estimates and pick Advanced SIMD over SVE when it is clearly cheaper.

gcc/
	* target.h (enum poly_value_estimate_kind): Define.
	(estimated_poly_value): Take an estimate kind argument.
	* target.def (estimated_poly_value): Update definition for the above.
	* doc/tm.texi: Regenerate.
	* targhooks.c (estimated_poly_value): Update prototype.
	* tree-vect-loop.c (vect_better_loop_vinfo_p): Use min, max and likely
	estimates of VF to pick between vinfos.
	* config/aarch64/aarch64.c (aarch64_cmp_autovec_modes): Use
	estimated_poly_value instead of aarch64_estimated_poly_value.
	(aarch64_estimated_poly_value): Take a kind argument and handle it.
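
The two cost dumps only become comparable once each inside-of-loop cost is divided by the number of scalar elements handled per vector iteration (the VF). The following is a rough sketch of that normalisation, not the code in vect_better_loop_vinfo_p. The helper name is hypothetical, the Advanced SIMD VF of 4 (four floats in a 128-bit vector) is an assumption, and prologue and outside costs are ignored.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper (not a GCC function): true if loop A is cheaper than
// loop B per scalar element, i.e. cost_a / vf_a < cost_b / vf_b.  The
// comparison is cross-multiplied so that it stays in integer arithmetic.
bool cheaper_per_element(int64_t cost_a, int64_t vf_a,
                         int64_t cost_b, int64_t vf_b)
{
  assert(vf_a > 0 && vf_b > 0);
  return cost_a * vf_b < cost_b * vf_a;
}

// "Vector inside of loop cost" values taken from the two dumps.
const int64_t sve_inside_cost = 28;
const int64_t advsimd_inside_cost = 11;

// Assumption: four 32-bit floats per 128-bit Advanced SIMD vector.
const int64_t advsimd_vf = 4;
```

Under these assumptions Advanced SIMD is cheaper per element when the SVE VF turns out to be 4 or 8, and SVE only becomes cheaper from a VF of 12 upwards, so the answer depends on which estimate of the SVE VF is used.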
author: Kyrylo Tkachov <kyrylo.tkachov@arm.com> 2020-12-17 18:02:37 +0000
committer: Kyrylo Tkachov <kyrylo.tkachov@arm.com> 2020-12-17 18:04:21 +0000
commit: 64432b680eab0bddbe9a4ad4798457cf6a14ad60 (patch)
tree: c88d7613be91dc9c472634980e0de2f9a8897c37 /gcc/doc
parent: 2d7a40fa60fb8b9870cfd053a37fc67404353ee2 (diff)
download: gcc-64432b680eab0bddbe9a4ad4798457cf6a14ad60.zip
gcc-64432b680eab0bddbe9a4ad4798457cf6a14ad60.tar.gz
gcc-64432b680eab0bddbe9a4ad4798457cf6a14ad60.tar.bz2
1 files changed, 5 insertions, 2 deletions
diff --git a/gcc/doc/tm.texi b/gcc/doc/tm.texi
index d9b855c..900d584 100644
--- a/gcc/doc/tm.texi
+++ b/gcc/doc/tm.texi
@@ -7005,9 +7005,12 @@ delay slot branches filled using the basic filler is often still desirable
 as the delay slot can hide a pipeline bubble.
 @end deftypefn
 
-@deftypefn {Target Hook} HOST_WIDE_INT TARGET_ESTIMATED_POLY_VALUE (poly_int64 @var{val})
+@deftypefn {Target Hook} HOST_WIDE_INT TARGET_ESTIMATED_POLY_VALUE (poly_int64 @var{val}, poly_value_estimate_kind @var{kind})
 Return an estimate of the runtime value of @var{val}, for use in
-things like cost calculations or profiling frequencies.  The default
+things like cost calculations or profiling frequencies.  @var{kind} is used
+to ask for the minimum, maximum, and likely estimates of the value through
+the @code{POLY_VALUE_MIN}, @code{POLY_VALUE_MAX} and
+@code{POLY_VALUE_LIKELY} values.  The default
 implementation returns the lowest possible value of @var{val}.
 @end deftypefn
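
As an illustration of what the three kinds described in the hook documentation might mean for a target, here is a minimal standalone sketch. Only the enum name and its three enumerators come from the patch. The `toy_poly` and `toy_target` types and the `toy_estimated_poly_value` function are hypothetical stand-ins, not GCC's `poly_int64` or the aarch64 implementation, and the value is assumed to have the form `base + step * x` with a non-negative `step`.

```cpp
#include <cassert>
#include <cstdint>

// Enum name and enumerators as documented for the hook.
enum poly_value_estimate_kind
{
  POLY_VALUE_MIN,
  POLY_VALUE_MAX,
  POLY_VALUE_LIKELY
};

// Hypothetical stand-in for a poly_int64: value = base + step * x, where
// x >= 0 is only known at run time.
struct toy_poly
{
  int64_t base;
  int64_t step;
};

// Hypothetical target description: the range x can take on this target and
// the value of x the tuning expects.
struct toy_target
{
  int64_t min_x;
  int64_t max_x;
  int64_t likely_x;
};

// Toy analogue of the hook: pick x according to KIND and evaluate VAL.
int64_t toy_estimated_poly_value(toy_poly val, poly_value_estimate_kind kind,
                                 const toy_target &target)
{
  assert(val.step >= 0);
  assert(target.min_x <= target.likely_x && target.likely_x <= target.max_x);

  int64_t x = target.likely_x;
  if (kind == POLY_VALUE_MIN)
    x = target.min_x;
  else if (kind == POLY_VALUE_MAX)
    x = target.max_x;
  return val.base + val.step * x;
}
```

For example, a float VF modelled as `toy_poly{4, 4}` on a toy target with `x` ranging from 0 to 15 and a likely `x` of 1 would give estimates of 4, 64 and 8 for the minimum, maximum and likely kinds respectively. A caller can then compare two loop candidates under each estimate instead of relying on a single number.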