diff options
author | Roger Sayle <roger@nextmovesoftware.com> | 2021-12-01 19:58:40 +0000 |
---|---|---|
committer | Roger Sayle <roger@nextmovesoftware.com> | 2021-12-01 19:58:40 +0000 |
commit | de3e5aae6c4b540e808c822c1e878b0a3304d09c (patch) | |
tree | cb1cd979d6b4ccc24d28a9fca6bbda3041b5345d /gcc/tree-ssa-loop-niter.c | |
parent | 5b1ef8b9db964ec2375df29a73d2b1651afe7ea9 (diff) | |
download | gcc-de3e5aae6c4b540e808c822c1e878b0a3304d09c.zip gcc-de3e5aae6c4b540e808c822c1e878b0a3304d09c.tar.gz gcc-de3e5aae6c4b540e808c822c1e878b0a3304d09c.tar.bz2 |
Final value replacement improvements for until-wrap loops.
This middle-end patch is inspired by the Richard Beiner's until-wrap
loop example in PR tree-optimization/101145.
unsigned foo(unsigned val, unsigned start)
{
unsigned cnt = 0;
for (unsigned i = start; i > val; ++i)
cnt++;
return cnt;
}
For this loop, the tree optimizers currently generate:
unsigned int foo (unsigned int val, unsigned int start)
{
unsigned int cnt;
unsigned int _1;
unsigned int _5;
<bb 2> [local count: 118111600]:
if (start_3(D) > val_4(D))
goto <bb 3>; [89.00%]
else
goto <bb 4>; [11.00%]
<bb 3> [local count: 105119324]:
_1 = start_3(D) + 1;
_5 = -start_3(D);
cnt_2 = _1 > val_4(D) ? _5 : 1;
<bb 4> [local count: 118111600]:
# cnt_11 = PHI <cnt_2(3), 0(2)>
return cnt_11;
}
or perhaps slightly easier to read:
if (start > val) {
cnt = (start+1) > val ? -start : 1;
} else cnt = 0;
In this snippet, if we know start > val, then (start+1) > val
unless start+1 overflows, i.e. (start+1) == 0 and start == ~0.
We can use this (loop header) context to simplify the ternary
expression to "(start != -1) ? -start : 1", which with a little
help from match.pd can be folded to -start. Hence the optimal
final value replacement should be:
cnt = (start > val) ? -start : 0;
Or as now generated by this patch:
unsigned int foo (unsigned int val, unsigned int start)
{
unsigned int cnt;
<bb 2> [local count: 118111600]:
if (start_3(D) > val_4(D))
goto <bb 3>; [89.00%]
else
goto <bb 4>; [11.00%]
<bb 3> [local count: 105119324]:
cnt_2 = -start_3(D);
<bb 4> [local count: 118111600]:
# cnt_11 = PHI <cnt_2(3), 0(2)>
return cnt_11;
}
We can also improve until-wrap loops that don't have a (suitable) loop
header, as determined by simplify_using_initial_conditions.
unsigned bar(unsigned val, unsigned start)
{
unsigned cnt = 0;
unsigned i = start;
do {
cnt++;
i++;
} while (i > val);
return cnt;
}
which is currently optimized to:
unsigned int foo (unsigned int val, unsigned int start)
{
unsigned int cnt;
unsigned int _9;
unsigned int _10;
<bb 2> [local count: 118111600]:
_9 = start_4(D) + 1;
_10 = -start_4(D);
cnt_3 = val_7(D) < _9 ? _10 : 1;
return cnt_3;
}
Here we have "val < (start+1) ? -start : 1", which again with the
help of match.pd can be slightly simplified to "val <= start ? -start : 1"
when dealing with unsigned types, because at the complicating value where
start == ~0, we fortunately have -start == 1, hence it doesn't matter
whether the second or third operand of the ternary operator is returned.
To summarize, this patch (in addition to tweaking may_be_zero in
number_of_iterations_until_wrap) adds three new constant folding
transforms to match.pd.
X != C1 ? -X : C2 simplifies to -X when -C1 == C2.
which is the generalized form of the simplification above.
X != C1 ? ~X : C2 simplifies to ~X when ~C1 == C2.
which is the BIT_NOT_EXPR analog of the NEGATE_EXPR case.
and the "until-wrap final value replacement without context":
(X + 1) > Y ? -X : 1 simplifies to X >= Y ? -X : 1 when
X is unsigned, as when X + 1 overflows, X is -1, so -X == 1.
2021-12-01 Roger Sayle <roger@nextmovesoftware.com>
Richard Biener <rguenther@suse.de>
gcc/ChangeLog
* tree-ssa-loop-niter.c (number_of_iterations_until_wrap):
Check if simplify_using_initial_conditions allows us to
simplify the expression for may_be_zero.
* match.pd (X != C ? -X : -C -> -X): New transform.
(X != C ? ~X : ~C -> ~X): Likewise.
((X+1) > Y ? -X : 1 -> X >= Y ? -X : 1): Likewise.
gcc/testsuite/ChangeLog
* gcc.dg/fold-condneg-1.c: New test case.
* gcc.dg/fold-condneg-2.c: New test case.
* gcc.dg/fold-condnot-1.c: New test case.
* gcc.dg/pr101145-1.c: New test case.
* gcc.dg/pr101145-2.c: New test case.
Diffstat (limited to 'gcc/tree-ssa-loop-niter.c')
-rw-r--r-- | gcc/tree-ssa-loop-niter.c | 19 |
1 files changed, 18 insertions, 1 deletions
diff --git a/gcc/tree-ssa-loop-niter.c b/gcc/tree-ssa-loop-niter.c index 7510940..06954e4 100644 --- a/gcc/tree-ssa-loop-niter.c +++ b/gcc/tree-ssa-loop-niter.c @@ -1478,7 +1478,7 @@ assert_loop_rolls_lt (tree type, affine_iv *iv0, affine_iv *iv1, The number of iterations is stored to NITER. */ static bool -number_of_iterations_until_wrap (class loop *, tree type, affine_iv *iv0, +number_of_iterations_until_wrap (class loop *loop, tree type, affine_iv *iv0, affine_iv *iv1, class tree_niter_desc *niter) { tree niter_type = unsigned_type_for (type); @@ -1506,6 +1506,23 @@ number_of_iterations_until_wrap (class loop *, tree type, affine_iv *iv0, num = fold_build2 (MINUS_EXPR, niter_type, wide_int_to_tree (type, max), iv1->base); + + /* When base has the form iv + 1, if we know iv >= n, then iv + 1 < n + only when iv + 1 overflows, i.e. when iv == TYPE_VALUE_MAX. */ + if (sgn == UNSIGNED + && integer_onep (step) + && TREE_CODE (iv1->base) == PLUS_EXPR + && integer_onep (TREE_OPERAND (iv1->base, 1))) + { + tree cond = fold_build2 (GE_EXPR, boolean_type_node, + TREE_OPERAND (iv1->base, 0), iv0->base); + cond = simplify_using_initial_conditions (loop, cond); + if (integer_onep (cond)) + may_be_zero = fold_build2 (EQ_EXPR, boolean_type_node, + TREE_OPERAND (iv1->base, 0), + TYPE_MAX_VALUE (type)); + } + high = max; if (TREE_CODE (iv1->base) == INTEGER_CST) low = wi::to_wide (iv1->base) - 1; |