diff options
author | Richard Sandiford <richard.sandiford@arm.com> | 2021-10-04 23:55:43 +0100 |
---|---|---|
committer | Richard Sandiford <richard.sandiford@arm.com> | 2021-10-08 13:17:47 +0100 |
commit | 0ee3dc6052361290c92bba492cc0a9e556b31055 (patch) | |
tree | b191e3bb94fe67445ee327752388ceef80c7197e /gcc/tree-ssa-loop-manip.c | |
parent | 816da497dfba541aabdbe08a03e6e9988a7d9573 (diff) | |
download | gcc-0ee3dc6052361290c92bba492cc0a9e556b31055.zip gcc-0ee3dc6052361290c92bba492cc0a9e556b31055.tar.gz gcc-0ee3dc6052361290c92bba492cc0a9e556b31055.tar.bz2 |
loop: Fix profile updates after unrolling [PR102385]
In g:62acc72a957b5614 I'd stopped the unroller from using
an epilogue loop in cases where the iteration count was
known to be a multiple of the unroll factor. The epilogue
and non-epilogue cases still shared this (preexisting) code
to update the edge frequencies:
basic_block exit_bb = single_pred (loop->latch);
new_exit = find_edge (exit_bb, rest);
new_exit->probability = profile_probability::always ()
.apply_scale (1, new_est_niter + 1);
[etc]
But of course (in hindsight) that only makes sense for the
epilogue case, where we've already moved the main loop's exit edge
to be a sibling of the latch edge. For the non-epilogue case,
the exit edge stays (and needs to stay) in its original position.
I don't really understand what the code is trying to do for
the epilogue case. It has:
/* Ensure that the frequencies in the loop match the new estimated
number of iterations, and change the probability of the new
exit edge. */
profile_count freq_h = loop->header->count;
profile_count freq_e = (loop_preheader_edge (loop))->count ();
if (freq_h.nonzero_p ())
{
...
scale_loop_frequencies (loop, freq_e.probability_in (freq_h));
}
Here, freq_e.probability_in (freq_h) is freq_e / freq_h, so for the
header block, this has the effect of:
new header count = freq_h * (freq_e / freq_h)
i.e. we say that the header executes exactly as often as the
preheader edge, which would only make sense if the loop never
iterates. Also, after setting the probability of the nonexit edge
(correctly) to new_est_niter / (new_est_niter + 1), the code does:
scale_bbs_frequencies (&loop->latch, 1, prob);
for this new probability. I think that only makes sense if the
nonexit edge was previously unconditional (100%). But the code
carefully preserved the probability of the original exit edge
when creating the new one.
All I'm trying to do here though is fix the mess I created
and get the probabilities right for the non-epilogue case.
Things are simpler there since we don't have to worry about
loop versioning. Hopefully the comments explain the approach.
The function's current interface implies that it can cope with
multiple exit edges and that the function only needs the iteration
count relative to one of those edges in order to work correctly.
In practice that's not the case: it assumes there is exactly one
exit edge and all current callers also ensure that the exit test
dominates the latch. I think the function is easier to follow
if we remove the implied generality.
gcc/
PR tree-optimization/102385
* predict.h (change_edge_frequency): Declare.
* predict.c (change_edge_frequency): New function.
* tree-ssa-loop-manip.h (tree_transform_and_unroll_loop): Remove
edge argument.
(tree_unroll_loop): Likewise.
* gimple-loop-jam.c (tree_loop_unroll_and_jam): Update accordingly.
* tree-predcom.c (pcom_worker::tree_predictive_commoning_loop):
Likewise.
* tree-ssa-loop-prefetch.c (loop_prefetch_arrays): Likewise.
* tree-ssa-loop-manip.c (tree_unroll_loop): Likewise.
(tree_transform_and_unroll_loop): Likewise. Use single_dom_exit
to retrieve the exit edges. Make all the old profile update code
conditional on !single_loop_p -- the case it was written for --
and use a different approach for the single-loop case.
gcc/testsuite/
* gcc.dg/pr102385.c: New test.
Diffstat (limited to 'gcc/tree-ssa-loop-manip.c')
-rw-r--r-- | gcc/tree-ssa-loop-manip.c | 111 |
1 files changed, 83 insertions, 28 deletions
diff --git a/gcc/tree-ssa-loop-manip.c b/gcc/tree-ssa-loop-manip.c index c7a2f67..2d576bf 100644 --- a/gcc/tree-ssa-loop-manip.c +++ b/gcc/tree-ssa-loop-manip.c @@ -1182,8 +1182,9 @@ niter_for_unrolled_loop (class loop *loop, unsigned factor) return new_est_niter; } -/* Unroll LOOP FACTOR times. DESC describes number of iterations of LOOP. - EXIT is the exit of the loop to that DESC corresponds. +/* Unroll LOOP FACTOR times. LOOP is known to have a single exit edge + whose source block dominates the latch. DESC describes the number of + iterations of LOOP. If N is number of iterations of the loop and MAY_BE_ZERO is the condition under that loop exits in the first iteration even if N != 0, @@ -1241,7 +1242,7 @@ niter_for_unrolled_loop (class loop *loop, unsigned factor) void tree_transform_and_unroll_loop (class loop *loop, unsigned factor, - edge exit, class tree_niter_desc *desc, + class tree_niter_desc *desc, transform_callback transform, void *data) { @@ -1265,10 +1266,11 @@ tree_transform_and_unroll_loop (class loop *loop, unsigned factor, gcond *exit_if = nullptr; class loop *new_loop = nullptr; - basic_block rest; edge new_exit; if (!single_loop_p) { + edge exit = single_dom_exit (loop); + /* The values for scales should keep profile consistent, and somewhat close to correct. @@ -1291,7 +1293,7 @@ tree_transform_and_unroll_loop (class loop *loop, unsigned factor, /* Prepare the cfg and update the phi nodes. Move the loop exit to the loop latch (and make its condition dummy, for the moment). */ - rest = loop_preheader_edge (new_loop)->src; + basic_block rest = loop_preheader_edge (new_loop)->src; edge precond_edge = single_pred_edge (rest); split_edge (loop_latch_edge (loop)); basic_block exit_bb = single_pred (loop->latch); @@ -1373,10 +1375,7 @@ tree_transform_and_unroll_loop (class loop *loop, unsigned factor, remove_path (exit); } else - { - new_exit = exit; - rest = exit->dest; - } + new_exit = single_dom_exit (loop); /* Transform the loop. */ if (transform) @@ -1401,6 +1400,7 @@ tree_transform_and_unroll_loop (class loop *loop, unsigned factor, } update_ssa (TODO_update_ssa); + new_exit = single_dom_exit (loop); if (!single_loop_p) { /* Ensure that the frequencies in the loop match the new estimated @@ -1417,29 +1417,24 @@ tree_transform_and_unroll_loop (class loop *loop, unsigned factor, freq_e = freq_e.force_nonzero (); scale_loop_frequencies (loop, freq_e.probability_in (freq_h)); } - } - basic_block exit_bb = single_pred (loop->latch); - new_exit = find_edge (exit_bb, rest); - new_exit->probability = profile_probability::always () - .apply_scale (1, new_est_niter + 1); + basic_block rest = new_exit->dest; + new_exit->probability = profile_probability::always () + .apply_scale (1, new_est_niter + 1); - if (!single_loop_p) - rest->count += new_exit->count (); + rest->count += new_exit->count (); - edge new_nonexit = single_pred_edge (loop->latch); - profile_probability prob = new_nonexit->probability; - new_nonexit->probability = new_exit->probability.invert (); - prob = new_nonexit->probability / prob; - if (prob.initialized_p ()) - scale_bbs_frequencies (&loop->latch, 1, prob); + edge new_nonexit = single_pred_edge (loop->latch); + profile_probability prob = new_nonexit->probability; + new_nonexit->probability = new_exit->probability.invert (); + prob = new_nonexit->probability / prob; + if (prob.initialized_p ()) + scale_bbs_frequencies (&loop->latch, 1, prob); - if (!single_loop_p) - { /* Finally create the new counter for number of iterations and add the new exit instruction. */ tree ctr_before, ctr_after; - gimple_stmt_iterator bsi = gsi_last_nondebug_bb (exit_bb); + gimple_stmt_iterator bsi = gsi_last_nondebug_bb (new_exit->src); exit_if = as_a <gcond *> (gsi_stmt (bsi)); create_iv (exit_base, exit_step, NULL_TREE, loop, &bsi, false, &ctr_before, &ctr_after); @@ -1448,6 +1443,67 @@ tree_transform_and_unroll_loop (class loop *loop, unsigned factor, gimple_cond_set_rhs (exit_if, exit_bound); update_stmt (exit_if); } + else + { + /* gimple_duplicate_loop_to_header_edge has adjusted the loop body's + original profile counts in line with the unroll factor. However, + the old counts might not have been consistent with the old + iteration count. + + Therefore, if the iteration count is known exactly, make sure that the + profile counts of the loop header (and any other blocks that might be + executed in the final iteration) are consistent with the combination + of (a) the incoming profile count and (b) the new iteration count. */ + profile_count in_count = loop_preheader_edge (loop)->count (); + profile_count old_header_count = loop->header->count; + if (in_count.nonzero_p () + && old_header_count.nonzero_p () + && TREE_CODE (desc->niter) == INTEGER_CST) + { + /* The + 1 converts latch counts to iteration counts. */ + profile_count new_header_count + = (in_count.apply_scale (new_est_niter + 1, 1)); + basic_block *body = get_loop_body (loop); + scale_bbs_frequencies_profile_count (body, loop->num_nodes, + new_header_count, + old_header_count); + free (body); + } + + /* gimple_duplicate_loop_to_header_edge discarded FACTOR - 1 + exit edges and adjusted the loop body's profile counts for the + new probabilities of the remaining non-exit edges. However, + the remaining exit edge still has the same probability as it + did before, even though it is now more likely. + + Therefore, all blocks executed after a failed exit test now have + a profile count that is too high, and the sum of the profile counts + for the header's incoming edges is greater than the profile count + of the header itself. + + Adjust the profile counts of all code in the loop body after + the exit test so that the sum of the counts on entry to the + header agree. */ + profile_count old_latch_count = loop_latch_edge (loop)->count (); + profile_count new_latch_count = loop->header->count - in_count; + if (old_latch_count.nonzero_p () && new_latch_count.nonzero_p ()) + scale_dominated_blocks_in_loop (loop, new_exit->src, new_latch_count, + old_latch_count); + + /* Set the probability of the exit edge based on NEW_EST_NITER + (which estimates latch counts rather than iteration counts). + Update the probabilities of other edges to match. + + If the profile counts are large enough to give the required + precision, the updates above will have made + + e->dest->count / e->src->count ~= new e->probability + + for every outgoing edge e of NEW_EXIT->src. */ + profile_probability new_exit_prob = profile_probability::always () + .apply_scale (1, new_est_niter + 1); + change_edge_frequency (new_exit, new_exit_prob); + } checking_verify_flow_info (); checking_verify_loop_structure (); @@ -1461,10 +1517,9 @@ tree_transform_and_unroll_loop (class loop *loop, unsigned factor, void tree_unroll_loop (class loop *loop, unsigned factor, - edge exit, class tree_niter_desc *desc) + class tree_niter_desc *desc) { - tree_transform_and_unroll_loop (loop, factor, exit, desc, - NULL, NULL); + tree_transform_and_unroll_loop (loop, factor, desc, NULL, NULL); } /* Rewrite the phi node at position PSI in function of the main |