author | Kyrylo Tkachov <kyrylo.tkachov@arm.com> | 2020-12-17 18:02:37 +0000 |
---|---|---|
committer | Kyrylo Tkachov <kyrylo.tkachov@arm.com> | 2020-12-17 18:04:21 +0000 |
commit | 64432b680eab0bddbe9a4ad4798457cf6a14ad60 (patch) | |
tree | c88d7613be91dc9c472634980e0de2f9a8897c37 /gcc/doc | |
parent | 2d7a40fa60fb8b9870cfd053a37fc67404353ee2 (diff) | |
vect, aarch64: Extend SVE vs Advanced SIMD costing decisions in vect_better_loop_vinfo_p
While experimenting with some backend costs for Advanced SIMD and SVE I
hit many cases where GCC would pick SVE for VLA auto-vectorisation even when
the backend very clearly presented cheaper costs for Advanced SIMD.
For a simple float addition loop the SVE costs were:
vec.c:9:21: note: Cost model analysis:
Vector inside of loop cost: 28
Vector prologue cost: 2
Vector epilogue cost: 0
Scalar iteration cost: 10
Scalar outside cost: 0
Vector outside cost: 2
prologue iterations: 0
epilogue iterations: 0
Minimum number of vector iterations: 1
Calculated minimum iters for profitability: 4
and for Advanced SIMD (Neon) they're:
vec.c:9:21: note: Cost model analysis:
Vector inside of loop cost: 11
Vector prologue cost: 0
Vector epilogue cost: 0
Scalar iteration cost: 10
Scalar outside cost: 0
Vector outside cost: 0
prologue iterations: 0
epilogue iterations: 0
Calculated minimum iters for profitability: 0
vec.c:9:21: note: Runtime profitability threshold = 4
yet the SVE one was always picked. With guidance from Richard this seems
to be due to the vinfo comparisons in vect_better_loop_vinfo_p, in
particular the part with the big comment explaining the
estimated_rel_new * 2 <= estimated_rel_old heuristic.
This patch extends the comparisons by introducing a three-way estimate
kind for poly_int values that the backend can distinguish.
This allows vect_better_loop_vinfo_p to ask for minimum, maximum and
likely estimates and pick Advanced SIMD over SVE when it is clearly cheaper.
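As a rough illustration of the comparison described above, here is a minimal C++ sketch. It is not the code from this patch: `LoopCost` and `new_is_better` are hypothetical names, the decision order (check minimum and maximum first, then fall back to the doubling heuristic on the likely estimate) is an assumption, and the VF values used in the note below are assumed (4 floats for a 128-bit Advanced SIMD vector, 4 to 64 floats for SVE). Only the inside-of-loop costs 28 and 11 come from the dumps above.

```cpp
#include <cstdint>

// Hypothetical stand-in for the data a loop_vec_info would provide: the
// "Vector inside of loop cost" plus the vectorization factor (VF) as seen
// through each of the three estimate kinds.
struct LoopCost {
  std::int64_t inside_cost;
  std::int64_t vf_min;
  std::int64_t vf_likely;
  std::int64_t vf_max;
};

// Decide whether new_l should replace old_l. Cost per scalar iteration is
// inside_cost / VF; new_cost / new_vf is compared with old_cost / old_vf by
// cross-multiplying, so rel_new = new_cost * old_vf and
// rel_old = old_cost * new_vf, and the smaller value wins.
// Simplification: each estimate kind is applied to both VFs independently,
// which is exact when at most one of the two VFs is variable.
constexpr bool new_is_better(const LoopCost &new_l, const LoopCost &old_l) {
  const std::int64_t rel_new_min = new_l.inside_cost * old_l.vf_min;
  const std::int64_t rel_old_min = old_l.inside_cost * new_l.vf_min;
  const std::int64_t rel_new_max = new_l.inside_cost * old_l.vf_max;
  const std::int64_t rel_old_max = old_l.inside_cost * new_l.vf_max;

  // Unambiguous order: new is cheaper (or dearer) at both extremes.
  if (rel_new_min < rel_old_min && rel_new_max < rel_old_max)
    return true;
  if (rel_new_min > rel_old_min && rel_new_max > rel_old_max)
    return false;

  // Ambiguous: require new to stay no more expensive even after doubling
  // its relative cost at the likely estimate.
  const std::int64_t rel_new_likely = new_l.inside_cost * old_l.vf_likely;
  const std::int64_t rel_old_likely = old_l.inside_cost * new_l.vf_likely;
  return rel_new_likely * 2 <= rel_old_likely;
}
```

With the assumed VFs, `new_is_better(LoopCost{11, 4, 4, 4}, LoopCost{28, 4, 4, 64})` compares 44 against 112 at the minimum and 704 against 112 at the maximum, so the order is ambiguous; at the likely estimate 44 * 2 = 88 <= 112, so Advanced SIMD is preferred. If the likely SVE VF were 8 instead, 88 * 2 = 176 > 112 and the SVE loop would be kept.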
gcc/
* target.h (enum poly_value_estimate_kind): Define.
(estimated_poly_value): Take an estimate kind argument.
* target.def (estimated_poly_value): Update definition for the
above.
* doc/tm.texi: Regenerate.
* targhooks.c (estimated_poly_value): Update prototype.
* tree-vect-loop.c (vect_better_loop_vinfo_p): Use min, max and
likely estimates of VF to pick between vinfos.
* config/aarch64/aarch64.c (aarch64_cmp_autovec_modes): Use
estimated_poly_value instead of aarch64_estimated_poly_value.
(aarch64_estimated_poly_value): Take a kind argument and handle
it.
Diffstat (limited to 'gcc/doc')
-rw-r--r-- | gcc/doc/tm.texi | 7 |
1 files changed, 5 insertions, 2 deletions
diff --git a/gcc/doc/tm.texi b/gcc/doc/tm.texi
index d9b855c..900d584 100644
--- a/gcc/doc/tm.texi
+++ b/gcc/doc/tm.texi
@@ -7005,9 +7005,12 @@ delay slot branches filled using the basic filler is often still desirable
 as the delay slot can hide a pipeline bubble.
 @end deftypefn
 
-@deftypefn {Target Hook} HOST_WIDE_INT TARGET_ESTIMATED_POLY_VALUE (poly_int64 @var{val})
+@deftypefn {Target Hook} HOST_WIDE_INT TARGET_ESTIMATED_POLY_VALUE (poly_int64 @var{val}, poly_value_estimate_kind @var{kind})
 Return an estimate of the runtime value of @var{val}, for use in
-things like cost calculations or profiling frequencies. The default
+things like cost calculations or profiling frequencies. @var{kind} is used
+to ask for the minimum, maximum, and likely estimates of the value through
+the @code{POLY_VALUE_MIN}, @code{POLY_VALUE_MAX} and
+@code{POLY_VALUE_LIKELY} values. The default
 implementation returns the lowest possible value of @var{val}.
 @end deftypefn
 
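To make the documented hook contract concrete, here is a small self-contained C++ sketch. The enumerator names follow the documentation text above; everything else is a simplified assumption: `ToyPoly` is a hypothetical two-coefficient stand-in for `poly_int64`, and `default_estimate` and `target_estimate` (with its `likely_x` and `max_x` parameters) are illustrative names, not GCC functions.

```cpp
#include <cstdint>

// Enumerator names as listed in the documentation above.
enum poly_value_estimate_kind {
  POLY_VALUE_MIN,
  POLY_VALUE_MAX,
  POLY_VALUE_LIKELY
};

// Hypothetical stand-in for poly_int64: the value c0 + c1 * x, where
// x >= 0 is only known at run time.
struct ToyPoly {
  std::int64_t c0;
  std::int64_t c1;
};

// Mirrors the documented default: the lowest possible value (x == 0),
// regardless of which kind was requested.
constexpr std::int64_t default_estimate(ToyPoly val, poly_value_estimate_kind) {
  return val.c0;
}

// Hypothetical target override: assumes the target knows an upper bound
// max_x and a most likely value likely_x for the runtime unknown x.
constexpr std::int64_t target_estimate(ToyPoly val, poly_value_estimate_kind kind,
                                       std::int64_t likely_x, std::int64_t max_x) {
  switch (kind) {
    case POLY_VALUE_MIN:
      return val.c0;
    case POLY_VALUE_MAX:
      return val.c0 + val.c1 * max_x;
    case POLY_VALUE_LIKELY:
      return val.c0 + val.c1 * likely_x;
  }
  return val.c0;  // Unreachable for valid kinds; keeps all paths returning.
}
```

For a value of `4 + 4x` with an assumed bound of `x <= 15`, `target_estimate(ToyPoly{4, 4}, POLY_VALUE_MAX, 0, 15)` gives 64, while the default-style estimate stays at 4 for every kind.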