diff options
author | Jan Hubicka <jh@suse.cz> | 2023-08-03 22:47:55 +0200 |
---|---|---|
committer | Jan Hubicka <jh@suse.cz> | 2023-08-03 22:47:55 +0200 |
commit | d6ac3aae2a32869d9d37f3bc7783d9bbad27d72b (patch) | |
tree | 1dbc8e4a296ff280d8916550d6219fc6e1b3d077 /gcc | |
parent | 93236ad9e8fa9208e754e8806dc369e1a79dbdf7 (diff) | |
download | gcc-d6ac3aae2a32869d9d37f3bc7783d9bbad27d72b.zip gcc-d6ac3aae2a32869d9d37f3bc7783d9bbad27d72b.tar.gz gcc-d6ac3aae2a32869d9d37f3bc7783d9bbad27d72b.tar.bz2 |
Update loop iteration estimates after splitting
Hmmer's internal function has 4 loops. The following is the profile at start:
loop 1:
estimate 472
iterations by profile: 473.497707 (reliable) count in:84821 (precise, freq 0.9979)
loop 2:
estimate 99
iterations by profile: 100.000000 (reliable) count in:39848881 (precise, freq 468.8104)
loop 3:
estimate 99
iterations by profile: 100.000000 (reliable) count in:39848881 (precise, freq 468.8104)
loop 4:
estimate 100
iterations by profile: 100.999596 (reliable) execution count:84167 (precise, freq 0.9902)
So the first loops is outer loop and second/third loops are nesed. Fourth loop is not critical.
Precise iteraiton counts are unknown (473 and 100 comes from profile)
Nested loop has following form:
for (k = 1; k <= M; k++) {
mc[k] = mpp[k-1] + tpmm[k-1];
if ((sc = ip[k-1] + tpim[k-1]) > mc[k]) mc[k] = sc;
if ((sc = dpp[k-1] + tpdm[k-1]) > mc[k]) mc[k] = sc;
if ((sc = xmb + bp[k]) > mc[k]) mc[k] = sc;
mc[k] += ms[k];
if (mc[k] < -INFTY) mc[k] = -INFTY;
dc[k] = dc[k-1] + tpdd[k-1];
if ((sc = mc[k-1] + tpmd[k-1]) > dc[k]) dc[k] = sc;
if (dc[k] < -INFTY) dc[k] = -INFTY;
if (k < M) {
ic[k] = mpp[k] + tpmi[k];
if ((sc = ip[k] + tpii[k]) > ic[k]) ic[k] = sc;
ic[k] += is[k];
if (ic[k] < -INFTY) ic[k] = -INFTY;
}
We do quite some belly dancing here.
1) loop-ch slightly misupdates profile, so the estimates of 99
does not match profile setimate of 100.
2) loops-split splits on if (k < M) and produces two loops.
It fails to notice that the second loop never iterates.
It used to misupdate profile a lot which later caused internal
loop to become cold. This is fixed now.
3) loop-dist introduces runtime aliasing checks for both loops
4) tree vectorizer vectorizes some of the copies of the loop produces
and drops expected iteration counts
5) loop peeling peels the loops with expected low iteration counts
6) complete loop unrolling kills some loops in prologues/epilogues.
We end up with quite many loops and run out of registers:
iterations by profile: 5.312499 (unreliable, maybe flat)
this is vectorized internal loops after loop peeling
iterations by profile: 0.009495 (unreliable, maybe flat)
iterations by profile: 0.009495 (unreliable, maybe flat)
iterations by profile: 0.009495 (unreliable, maybe flat)
iterations by profile: 0.009495 (unreliable, maybe flat)
Those are all versioned/peeled and vectorized variants of the loop never looping
iterations by profile: 100.000008 (unreliable)
iterations by profile: 100.000000 (unreliable)
Those are variants with failed aliasing checks
iterations by profile: 9.662853 (unreliable, maybe flat)
iterations by profile: 4.646072 (unreliable)
iterations by profile: 100.000007 (unreliable)
iterations by profile: 5.312500 (unreliable)
iterations by profile: 473.497707 (reliable)
This is loop 1
iterations by profile: 100.999596 (reliable)
This is the loop 4.
This patch fixes loop iteration estimate update after loop split so we get:
iterations by profile: 5.312499 (unreliable, maybe flat) entry count:12742188 (guessed, freq 149.9081)
This is remainder of the peeled vectorized loop 2. It misses estimate that is correct since after peeling it 6 times it is essentially
impossible to tell what the remaining loop profile is (without histograms)
iterations by profile: 0.009496 (unreliable, maybe flat) entry count:374801 (guessed, freq 4.4094)
Peeled split part of loop 2 (one that never loops). We ought to work this out
but at least w
estimate 99
iterations by profile: 100.000008 (unreliable) entry count:3945039 (guessed, freq 46.4122)
estimate 99
iterations by profile: 100.000000 (unreliable) entry count:35505353 (guessed, freq 417.7100)
estimate 99
iterations by profile: 9.662853 (unreliable, maybe flat) entry count:35505353 (guessed, freq 417.7100)
Profile here mismatches estimate - I will need to work out why.
estimate 5
iterations by profile: 4.646072 (unreliable) entry count:31954818 (guessed, freq 375.9390)
This is vectorized but not peeled loop 3
estimate 99
iterations by profile: 100.000007 (unreliable) entry count:7101070 (guessed, freq 83.5420)
Unvectorized variant of loop 3
estimate 5
iterations by profile: 5.312500 (unreliable) entry count:25563855 (guessed, freq 300.7512)
Another vectorized variant of loop 3
estimate 472
iterations by profile: 473.497707 (reliable) entry count:84821 (precise, freq 0.9979)
Outer loop
estimate 100
iterations by profile: 100.999596 (reliable) entry count:84167 (precise, freq 0.9902)
loop 4, not vectorized/peeled
So there is still work to do on this testcase, but with the patch we prevent 3 useless loops.
Bootstrapped/regtested x86_64-linux, plan to commit it later today.
gcc/ChangeLog:
* tree-ssa-loop-split.cc (split_loop): Update estimated iteration counts.
Diffstat (limited to 'gcc')
-rw-r--r-- | gcc/tree-ssa-loop-split.cc | 34 |
1 files changed, 32 insertions, 2 deletions
diff --git a/gcc/tree-ssa-loop-split.cc b/gcc/tree-ssa-loop-split.cc index 923e49e..2f7918c 100644 --- a/gcc/tree-ssa-loop-split.cc +++ b/gcc/tree-ssa-loop-split.cc @@ -43,6 +43,7 @@ along with GCC; see the file COPYING3. If not see #include "gimplify-me.h" #include "print-tree.h" #include "value-query.h" +#include "sreal.h" /* This file implements two kinds of loop splitting. @@ -691,6 +692,37 @@ split_loop (class loop *loop1) = loop1_prob.invert (); fix_loop_bb_probability (loop1, loop2, true_edge, false_edge); + /* If conditional we split on has reliable profilea nd both + preconditionals of loop1 and loop2 are constant true, we can + only redistribute the iteration counts to the split loops. + + If the conditionals we insert before loop1 or loop2 are non-trivial + they increase expected loop count, so account this accordingly. + If we do not know the probability of split conditional, avoid + reudcing loop estimates, since we do not really know how they are + split between of the two new loops. Keep orignal estimate since + it is likely better then completely dropping it. + + TODO: If we know that onle of the new loops has constant + number of iterations, we can do better. We could also update + upper bounds. */ + if (loop1->any_estimate + && wi::fits_shwi_p (loop1->nb_iterations_estimate)) + { + sreal scale = true_edge->probability.reliable_p () + ? true_edge->probability.to_sreal () : (sreal)1; + sreal scale2 = false_edge->probability.reliable_p () + ? false_edge->probability.to_sreal () : (sreal)1; + /* +1 to get header interations rather than latch iterations and then + -1 to convert back. */ + loop1->nb_iterations_estimate + = MAX ((((sreal)loop1->nb_iterations_estimate.to_shwi () + 1) * scale + / loop1_prob.to_sreal ()).to_nearest_int () - 1, 0); + loop2->nb_iterations_estimate + = MAX ((((sreal)loop2->nb_iterations_estimate.to_shwi () + 1) * scale2 + / profile_probability::very_likely ().to_sreal ()) + .to_nearest_int () - 1, 0); + } update_loop_exit_probability_scale_dom_bbs (loop1); update_loop_exit_probability_scale_dom_bbs (loop2); @@ -711,8 +743,6 @@ split_loop (class loop *loop1) tree guard_next = PHI_ARG_DEF_FROM_EDGE (phi, loop_latch_edge (loop1)); patch_loop_exit (loop1, guard_stmt, guard_next, newend, initial_true); - /* TODO: Update any_esitmate and upper bounds. */ - /* Finally patch out the two copies of the condition to be always true/false (or opposite). */ gcond *force_true = as_a<gcond *> (*gsi_last_bb (bbs[i])); |