author | Richard Sandiford <richard.sandiford@arm.com> | 2020-09-23 12:29:40 +0100 |
---|---|---|
committer | Richard Sandiford <richard.sandiford@arm.com> | 2020-09-23 12:29:40 +0100 |
commit | 4452a7660b224ff310d246bc7f8c612669c8cd98 | |
tree | eecf6ab863ceca04f5a7f854c470e2828f6eb38b /gcc/tree-vect-loop-manip.c | |
parent | 02b5377b3766804059b7824330d33d0e1cef2e5b | |
vect: Fix epilogue loop handling of partial vectors

This patch fixes the fallout that Kewen reported on Power after
the recent change to avoid unnecessary use of partial vectors.

As Kewen said, the problem is that vect_analyze_loop_2 doesn't
know how many epilogue iterations there will be, and so it
cannot make a final decision about whether the number of
iterations forces an epilogue loop to use partial vectors.

This is similar to the current situation for peeling: we don't know
during initial analysis whether an epilogue loop will itself require
peeling.  Instead we decide that during vect_do_peeling, where the
final number of epilogue loop iterations is known.

The patch takes a similar approach for the decision about whether
to use partial vectors.  As the comments in the patch say, the
idea is that vect_analyze_loop_2 should make peeling and
partial-vector decisions based on the assumption that the loop_vinfo
will be used as the main loop, while vect_do_peeling should make them
in the knowledge that the loop_vinfo will be used as an epilogue loop.
This allows the same analysis to be used for both cases, which we
rely on for implementing VECT_COMPARE_COSTS; see the big comment
in vect_analyze_loop for details.
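As a rough illustration of the two-phase idea described above, the following stand-alone C sketch models a loop_vinfo with three flags and one decision routine that can be rerun once the real iteration count is known. The struct, its fields, and decide_partial_vectors_and_peeling are hypothetical simplifications made up for this example; they are not GCC's data structures, and the body is not that of vect_determine_partial_vectors_and_peeling.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical, much-simplified stand-in for a loop_vec_info.  */
struct model_vinfo
{
  bool can_use_partial_vectors;  /* "can": the loop and target allow it.  */
  bool using_partial_vectors;    /* "will": the final decision.  */
  bool peeling_for_niter;        /* A scalar tail loop is needed instead.  */
};

/* Decide how to handle the iterations left over after dividing NITERS
   by VF.  Assumes VF is nonzero.  Only the "can" flag is an input; the
   other two flags are recomputed on every call.  */
static void
decide_partial_vectors_and_peeling (struct model_vinfo *vinfo,
                                    uint64_t niters, uint64_t vf)
{
  bool leftover = (niters % vf) != 0;

  vinfo->using_partial_vectors
    = vinfo->can_use_partial_vectors && leftover;
  vinfo->peeling_for_niter
    = leftover && !vinfo->using_partial_vectors;
}
```

In this model the routine would be called once during analysis with the main loop's iteration count and again during peeling with the epilogue's count; only the "can" flag carries over between the two calls.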

I hope the patch makes the (mostly preexisting) structure a bit
more obvious.  It isn't what anyone would design from scratch,
but that's the nature of working with a mature vector framework.

Arranging things this way means that vect_verify_full_masking
and vect_verify_loop_lens now become part of the “can” rather
than “will” test for partial vectors.

Also, while splitting out the logic that handles epilogues with
a constant number of iterations, I added a check to make sure that
we don't try to use partial vectors to vectorise an epilogue loop
that runs for only a single scalar iteration.  This required some
changes to the Power tests.
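The single-iteration check can be pictured with the small C sketch below. It mirrors the comparison `const_niters <= gap_niters + 1` from the patch, but the function name epilogue_has_enough_iterations and the plain integer types are assumptions made for illustration; the real code works on a loop_vec_info and goes on to install the iteration count.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical model of the early-out in vect_update_epilogue_niters.
   CONST_NITERS is the known number of epilogue iterations and GAP_NITERS
   the number of iterations peeled for gaps.  Return false when at most
   one scalar iteration would remain, which also guarantees that
   CONST_NITERS - 1 cannot wrap around.  */
static bool
epilogue_has_enough_iterations (uint64_t const_niters,
                                unsigned int gap_niters)
{
  return const_niters > (uint64_t) gap_niters + 1;
}
```

With no peeling for gaps, an epilogue of one iteration is rejected and an epilogue of two is accepted; with one iteration peeled for gaps, the threshold moves up by one.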

gcc/
	* tree-vectorizer.h (determine_peel_for_niter): Delete in favor of...
	(vect_determine_partial_vectors_and_peeling): ...this new function.
	* tree-vect-loop-manip.c (vect_update_epilogue_niters): New function.
	Reject using vector epilogue loops for single iterations.  Install
	the constant number of epilogue loop iterations in the associated
	loop_vinfo.  Rely on vect_determine_partial_vectors_and_peeling
	to do the main part of the test.
	(vect_do_peeling): Use vect_update_epilogue_niters to handle
	epilogue loops with a known number of iterations.  Skip recomputing
	the number of iterations later in that case.  Otherwise, use
	vect_determine_partial_vectors_and_peeling to decide whether the
	epilogue loop needs to use partial vectors or peeling.
	* tree-vect-loop.c (_loop_vec_info::_loop_vec_info): Set the
	default can_use_partial_vectors_p to false if partial-vector-usage=0.
	(determine_peel_for_niter): Remove in favor of...
	(vect_determine_partial_vectors_and_peeling): ...this new function,
	split out from...
	(vect_analyze_loop_2): ...here.  Reflect the vect_verify_full_masking
	and vect_verify_loop_lens results in CAN_USE_PARTIAL_VECTORS_P
	rather than USING_PARTIAL_VECTORS_P.

gcc/testsuite/
	* gcc.target/powerpc/p9-vec-length-epil-1.c: Do not expect the
	single-iteration epilogues of the 64-bit loops to be vectorized.
	* gcc.target/powerpc/p9-vec-length-epil-7.c: Likewise.
	* gcc.target/powerpc/p9-vec-length-epil-8.c: Likewise.

Diffstat (limited to 'gcc/tree-vect-loop-manip.c')
-rw-r--r-- | gcc/tree-vect-loop-manip.c | 83 |
1 files changed, 57 insertions, 26 deletions
diff --git a/gcc/tree-vect-loop-manip.c b/gcc/tree-vect-loop-manip.c
index 47cfa6f..7cf00e6 100644
--- a/gcc/tree-vect-loop-manip.c
+++ b/gcc/tree-vect-loop-manip.c
@@ -2386,6 +2386,34 @@ slpeel_update_phi_nodes_for_lcssa (class loop *epilog)
     rename_use_op (PHI_ARG_DEF_PTR_FROM_EDGE (gsi.phi (), e));
 }
 
+/* EPILOGUE_VINFO is an epilogue loop that we now know would need to
+   iterate exactly CONST_NITERS times.  Make a final decision about
+   whether the epilogue loop should be used, returning true if so.  */
+
+static bool
+vect_update_epilogue_niters (loop_vec_info epilogue_vinfo,
+                             unsigned HOST_WIDE_INT const_niters)
+{
+  /* Avoid wrap-around when computing const_niters - 1.  Also reject
+     using an epilogue loop for a single scalar iteration, even if
+     we could in principle implement that using partial vectors.  */
+  unsigned int gap_niters = LOOP_VINFO_PEELING_FOR_GAPS (epilogue_vinfo);
+  if (const_niters <= gap_niters + 1)
+    return false;
+
+  /* Install the number of iterations.  */
+  tree niters_type = TREE_TYPE (LOOP_VINFO_NITERS (epilogue_vinfo));
+  tree niters_tree = build_int_cst (niters_type, const_niters);
+  tree nitersm1_tree = build_int_cst (niters_type, const_niters - 1);
+
+  LOOP_VINFO_NITERS (epilogue_vinfo) = niters_tree;
+  LOOP_VINFO_NITERSM1 (epilogue_vinfo) = nitersm1_tree;
+
+  /* Decide what to do if the number of epilogue iterations is not
+     a multiple of the epilogue loop's vectorization factor.  */
+  return vect_determine_partial_vectors_and_peeling (epilogue_vinfo, true);
+}
+
 /* Function vect_do_peeling.
 
    Input:
@@ -2493,6 +2521,7 @@ vect_do_peeling (loop_vec_info loop_vinfo, tree niters, tree nitersm1,
   int estimated_vf;
   int prolog_peeling = 0;
   bool vect_epilogues = loop_vinfo->epilogue_vinfos.length () > 0;
+  bool vect_epilogues_updated_niters = false;
   /* We currently do not support prolog peeling if the target alignment is not
      known at compile time.  'vect_gen_prolog_loop_niters' depends on the
      target alignment being constant.  */
@@ -2601,8 +2630,7 @@ vect_do_peeling (loop_vec_info loop_vinfo, tree niters, tree nitersm1,
   if (vect_epilogues
       && LOOP_VINFO_NITERS_KNOWN_P (loop_vinfo)
       && prolog_peeling >= 0
-      && known_eq (vf, lowest_vf)
-      && !LOOP_VINFO_USING_PARTIAL_VECTORS_P (epilogue_vinfo))
+      && known_eq (vf, lowest_vf))
     {
       unsigned HOST_WIDE_INT eiters
         = (LOOP_VINFO_INT_NITERS (loop_vinfo)
@@ -2612,13 +2640,7 @@ vect_do_peeling (loop_vec_info loop_vinfo, tree niters, tree nitersm1,
       eiters
         = eiters % lowest_vf + LOOP_VINFO_PEELING_FOR_GAPS (loop_vinfo);
 
-      unsigned int ratio;
-      unsigned int epilogue_gaps
-        = LOOP_VINFO_PEELING_FOR_GAPS (epilogue_vinfo);
-      while (!(constant_multiple_p
-               (GET_MODE_SIZE (loop_vinfo->vector_mode),
-                GET_MODE_SIZE (epilogue_vinfo->vector_mode), &ratio)
-               && eiters >= lowest_vf / ratio + epilogue_gaps))
+      while (!vect_update_epilogue_niters (epilogue_vinfo, eiters))
         {
           delete epilogue_vinfo;
           epilogue_vinfo = NULL;
@@ -2629,8 +2651,8 @@ vect_do_peeling (loop_vec_info loop_vinfo, tree niters, tree nitersm1,
             }
           epilogue_vinfo = loop_vinfo->epilogue_vinfos[0];
           loop_vinfo->epilogue_vinfos.ordered_remove (0);
-          epilogue_gaps = LOOP_VINFO_PEELING_FOR_GAPS (epilogue_vinfo);
         }
+      vect_epilogues_updated_niters = true;
     }
   /* Prolog loop may be skipped.  */
   bool skip_prolog = (prolog_peeling != 0);
@@ -2928,7 +2950,9 @@ vect_do_peeling (loop_vec_info loop_vinfo, tree niters, tree nitersm1,
              skip_e edge.  */
           if (skip_vector)
             {
-              gcc_assert (update_e != NULL && skip_e != NULL);
+              gcc_assert (update_e != NULL
+                          && skip_e != NULL
+                          && !vect_epilogues_updated_niters);
               gphi *new_phi = create_phi_node (make_ssa_name (TREE_TYPE (niters)),
                                                update_e->dest);
               tree new_ssa = make_ssa_name (TREE_TYPE (niters));
@@ -2953,25 +2977,32 @@ vect_do_peeling (loop_vec_info loop_vinfo, tree niters, tree nitersm1,
               niters = PHI_RESULT (new_phi);
             }
 
-          /* Subtract the number of iterations performed by the vectorized loop
-             from the number of total iterations.  */
-          tree epilogue_niters = fold_build2 (MINUS_EXPR, TREE_TYPE (niters),
-                                              before_loop_niters,
-                                              niters);
-
-          LOOP_VINFO_NITERS (epilogue_vinfo) = epilogue_niters;
-          LOOP_VINFO_NITERSM1 (epilogue_vinfo)
-            = fold_build2 (MINUS_EXPR, TREE_TYPE (epilogue_niters),
-                           epilogue_niters,
-                           build_one_cst (TREE_TYPE (epilogue_niters)));
-
           /* Set ADVANCE to the number of iterations performed by the previous
              loop and its prologue.  */
           *advance = niters;
 
-          /* Redo the peeling for niter analysis as the NITERs and alignment
-             may have been updated to take the main loop into account.  */
-          determine_peel_for_niter (epilogue_vinfo);
+          if (!vect_epilogues_updated_niters)
+            {
+              /* Subtract the number of iterations performed by the vectorized loop
+                 from the number of total iterations.  */
+              tree epilogue_niters = fold_build2 (MINUS_EXPR, TREE_TYPE (niters),
+                                                  before_loop_niters,
+                                                  niters);
+
+              LOOP_VINFO_NITERS (epilogue_vinfo) = epilogue_niters;
+              LOOP_VINFO_NITERSM1 (epilogue_vinfo)
+                = fold_build2 (MINUS_EXPR, TREE_TYPE (epilogue_niters),
+                               epilogue_niters,
+                               build_one_cst (TREE_TYPE (epilogue_niters)));
+
+              /* Decide what to do if the number of epilogue iterations is not
+                 a multiple of the epilogue loop's vectorization factor.
+                 We should have rejected the loop during the analysis phase
+                 if this fails.  */
+              if (!vect_determine_partial_vectors_and_peeling (epilogue_vinfo,
+                                                               true))
+                gcc_unreachable ();
+            }
         }
 
   adjust_vec.release ();