author | Kyrylo Tkachov <kyrylo.tkachov@arm.com> | 2020-12-17 18:02:37 +0000 |
---|---|---|
committer | Kyrylo Tkachov <kyrylo.tkachov@arm.com> | 2020-12-17 18:04:21 +0000 |
commit | 64432b680eab0bddbe9a4ad4798457cf6a14ad60 (patch) | |
tree | c88d7613be91dc9c472634980e0de2f9a8897c37 /gcc/tree-vect-loop.c | |
parent | 2d7a40fa60fb8b9870cfd053a37fc67404353ee2 (diff) | |
vect, aarch64: Extend SVE vs Advanced SIMD costing decisions in vect_better_loop_vinfo_p
While experimenting with some backend costs for Advanced SIMD and SVE I
hit many cases where GCC would pick SVE for VLA auto-vectorisation even when
the backend very clearly presented cheaper costs for Advanced SIMD.
For a simple float addition loop the SVE costs were:
vec.c:9:21: note: Cost model analysis:
Vector inside of loop cost: 28
Vector prologue cost: 2
Vector epilogue cost: 0
Scalar iteration cost: 10
Scalar outside cost: 0
Vector outside cost: 2
prologue iterations: 0
epilogue iterations: 0
Minimum number of vector iterations: 1
Calculated minimum iters for profitability: 4
and for Advanced SIMD (Neon) they're:
vec.c:9:21: note: Cost model analysis:
Vector inside of loop cost: 11
Vector prologue cost: 0
Vector epilogue cost: 0
Scalar iteration cost: 10
Scalar outside cost: 0
Vector outside cost: 0
prologue iterations: 0
epilogue iterations: 0
Calculated minimum iters for profitability: 0
vec.c:9:21: note: Runtime profitability threshold = 4
yet the SVE one was always picked. With guidance from Richard this seems
to be due to the vinfo comparisons in vect_better_loop_vinfo_p, in
particular the part with the big comment explaining the
estimated_rel_new * 2 <= estimated_rel_old heuristic.
This patch extends the comparisons by introducing a three-way estimate
kind for poly_int values that the backend can distinguish.
This allows vect_better_loop_vinfo_p to ask for minimum, maximum and
likely estimates and pick Advanced SIMD over SVE when it is clearly cheaper.
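
The three-way comparison described above can be sketched outside GCC roughly as follows. This is only an illustrative model, not the code from the patch: `toy_poly`, `toy_kind`, `toy_estimate`, `toy_better_body_p`, the `TOY_X_MAX` bound and the `likely_x` parameter are all hypothetical stand-ins for `poly_int64`, `poly_value_estimate_kind` and the target's `estimated_poly_value` hook, and the model assumes the runtime-dependent coefficient is non-negative.

```c
/* Toy stand-in for a degree-1 poly_int64: value = c0 + c1 * x, where
   x >= 0 is known only at run time.  Assumes c1 >= 0.  */
typedef struct { long c0; long c1; } toy_poly;

/* Hypothetical analogue of the MIN / MAX / LIKELY estimate kinds.  */
typedef enum { TOY_MIN, TOY_MAX, TOY_LIKELY } toy_kind;

/* Hypothetical upper bound on x, chosen only for illustration.  */
#define TOY_X_MAX 15

/* Evaluate V at the x implied by KIND; LIKELY_X models the tuning guess.  */
static long
toy_estimate (toy_poly v, toy_kind kind, long likely_x)
{
  long x = 0;                       /* TOY_MIN: smallest runtime value.  */
  if (kind == TOY_MAX)
    x = TOY_X_MAX;
  else if (kind == TOY_LIKELY)
    x = likely_x;
  return v.c0 + v.c1 * x;
}

/* Return 1 to prefer the new candidate, 0 to keep the old one, and -1
   when the loop bodies tie and other costs would have to decide.  */
static int
toy_better_body_p (long new_cost, toy_poly new_vf,
                   long old_cost, toy_poly old_vf, long likely_x)
{
  /* Cross-multiply so that cost / VF can be compared without division.  */
  toy_poly rel_new = { new_cost * old_vf.c0, new_cost * old_vf.c1 };
  toy_poly rel_old = { old_cost * new_vf.c0, old_cost * new_vf.c1 };

  long new_min = toy_estimate (rel_new, TOY_MIN, likely_x);
  long new_max = toy_estimate (rel_new, TOY_MAX, likely_x);
  long old_min = toy_estimate (rel_old, TOY_MIN, likely_x);
  long old_max = toy_estimate (rel_old, TOY_MAX, likely_x);

  /* One candidate is cheaper at both ends of the range.  */
  if (new_min < old_min && new_max < old_max)
    return 1;
  if (old_min < new_min && old_max < new_max)
    return 0;

  /* Ambiguous order: require new to be at least 2x cheaper at the
     likely estimate.  */
  if (old_min != new_min || old_max != new_max)
    {
      long new_likely = toy_estimate (rel_new, TOY_LIKELY, likely_x);
      long old_likely = toy_estimate (rel_old, TOY_LIKELY, likely_x);
      return new_likely * 2 <= old_likely;
    }

  return -1;
}
```

As a usage example under assumed vectorization factors (the commit message does not state them), take a fixed-length candidate with inside cost 11 and VF 4 as "new" and a variable-length candidate with inside cost 28 and VF 4 + 4x as "old". `toy_better_body_p (11, (toy_poly){4, 0}, 28, (toy_poly){4, 4}, 0)` returns 1 in this model, because the minimum and maximum estimates disagree on the order and the likely estimate then favours the fixed-length candidate by more than a factor of two. With `likely_x` set to 1 the same call returns 0.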
gcc/
* target.h (enum poly_value_estimate_kind): Define.
(estimated_poly_value): Take an estimate kind argument.
* target.def (estimated_poly_value): Update definition for the
above.
* doc/tm.texi: Regenerate.
* targhooks.c (estimated_poly_value): Update prototype.
* tree-vect-loop.c (vect_better_loop_vinfo_p): Use min, max and
likely estimates of VF to pick between vinfos.
* config/aarch64/aarch64.c (aarch64_cmp_autovec_modes): Use
estimated_poly_value instead of aarch64_estimated_poly_value.
(aarch64_estimated_poly_value): Take a kind argument and handle
it.
Diffstat (limited to 'gcc/tree-vect-loop.c')
-rw-r--r-- | gcc/tree-vect-loop.c | 85 |
1 files changed, 49 insertions, 36 deletions
diff --git a/gcc/tree-vect-loop.c b/gcc/tree-vect-loop.c
index 52757ad..688538a 100644
--- a/gcc/tree-vect-loop.c
+++ b/gcc/tree-vect-loop.c
@@ -2773,43 +2773,56 @@ vect_better_loop_vinfo_p (loop_vec_info new_loop_vinfo,
 
   /* Check whether the (fractional) cost per scalar iteration is lower
      or higher: new_inside_cost / new_vf vs. old_inside_cost / old_vf.  */
-  poly_widest_int rel_new = (new_loop_vinfo->vec_inside_cost
-                             * poly_widest_int (old_vf));
-  poly_widest_int rel_old = (old_loop_vinfo->vec_inside_cost
-                             * poly_widest_int (new_vf));
-  if (maybe_lt (rel_old, rel_new))
-    {
-      /* When old_loop_vinfo uses a variable vectorization factor,
-         we know that it has a lower cost for at least one runtime VF.
-         However, we don't know how likely that VF is.
-
-         One option would be to compare the costs for the estimated VFs.
-         The problem is that that can put too much pressure on the cost
-         model.  E.g. if the estimated VF is also the lowest possible VF,
-         and if old_loop_vinfo is 1 unit worse than new_loop_vinfo
-         for the estimated VF, we'd then choose new_loop_vinfo even
-         though (a) new_loop_vinfo might not actually be better than
-         old_loop_vinfo for that VF and (b) it would be significantly
-         worse at larger VFs.
-
-         Here we go for a hacky compromise: pick new_loop_vinfo if it is
-         no more expensive than old_loop_vinfo even after doubling the
-         estimated old_loop_vinfo VF.  For all but trivial loops, this
-         ensures that we only pick new_loop_vinfo if it is significantly
-         better than old_loop_vinfo at the estimated VF.  */
-      if (rel_new.is_constant ())
-        return false;
-
-      HOST_WIDE_INT new_estimated_vf = estimated_poly_value (new_vf);
-      HOST_WIDE_INT old_estimated_vf = estimated_poly_value (old_vf);
-      widest_int estimated_rel_new = (new_loop_vinfo->vec_inside_cost
-                                      * widest_int (old_estimated_vf));
-      widest_int estimated_rel_old = (old_loop_vinfo->vec_inside_cost
-                                      * widest_int (new_estimated_vf));
-      return estimated_rel_new * 2 <= estimated_rel_old;
-    }
-  if (known_lt (rel_new, rel_old))
+  poly_int64 rel_new = new_loop_vinfo->vec_inside_cost * old_vf;
+  poly_int64 rel_old = old_loop_vinfo->vec_inside_cost * new_vf;
+
+  HOST_WIDE_INT est_rel_new_min
+    = estimated_poly_value (rel_new, POLY_VALUE_MIN);
+  HOST_WIDE_INT est_rel_new_max
+    = estimated_poly_value (rel_new, POLY_VALUE_MAX);
+
+  HOST_WIDE_INT est_rel_old_min
+    = estimated_poly_value (rel_old, POLY_VALUE_MIN);
+  HOST_WIDE_INT est_rel_old_max
+    = estimated_poly_value (rel_old, POLY_VALUE_MAX);
+
+  /* Check first if we can make out an unambigous total order from the minimum
+     and maximum estimates.  */
+  if (est_rel_new_min < est_rel_old_min
+      && est_rel_new_max < est_rel_old_max)
     return true;
+  else if (est_rel_old_min < est_rel_new_min
+           && est_rel_old_max < est_rel_new_max)
+    return false;
+  /* When old_loop_vinfo uses a variable vectorization factor,
+     we know that it has a lower cost for at least one runtime VF.
+     However, we don't know how likely that VF is.
+
+     One option would be to compare the costs for the estimated VFs.
+     The problem is that that can put too much pressure on the cost
+     model.  E.g. if the estimated VF is also the lowest possible VF,
+     and if old_loop_vinfo is 1 unit worse than new_loop_vinfo
+     for the estimated VF, we'd then choose new_loop_vinfo even
+     though (a) new_loop_vinfo might not actually be better than
+     old_loop_vinfo for that VF and (b) it would be significantly
+     worse at larger VFs.
+
+     Here we go for a hacky compromise: pick new_loop_vinfo if it is
+     no more expensive than old_loop_vinfo even after doubling the
+     estimated old_loop_vinfo VF.  For all but trivial loops, this
+     ensures that we only pick new_loop_vinfo if it is significantly
+     better than old_loop_vinfo at the estimated VF.  */
+
+  if (est_rel_old_min != est_rel_new_min
+      || est_rel_old_max != est_rel_new_max)
+    {
+      HOST_WIDE_INT est_rel_new_likely
+        = estimated_poly_value (rel_new, POLY_VALUE_LIKELY);
+      HOST_WIDE_INT est_rel_old_likely
+        = estimated_poly_value (rel_old, POLY_VALUE_LIKELY);
+
+      return est_rel_new_likely * 2 <= est_rel_old_likely;
+    }
 
   /* If there's nothing to choose between the loop bodies, see whether
      there's a difference in the prologue and epilogue costs.  */