|
This adds a simple match pattern for this case.
I noticed it in a couple of different places:
once while I was looking at the code generation of a parser, and
again while I was looking at locations where bitwise_inverted_equal_p
should be used more.
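A minimal C sketch of the equivalence the pattern exploits (illustrative only; the names `before`/`after` are mine, and the exact form match.pd emits may differ), assuming `a` is a 0/1 value:
```
/* For a in {0, 1}: -(T)a is all-ones when a is 1 and zero when a is 0,
   so xoring b with it yields ~b or b respectively.  */
unsigned before (_Bool a, unsigned b) { return a ? ~b : b; }
unsigned after  (_Bool a, unsigned b) { return b ^ -(unsigned) a; }
```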
Committed as approved after a bootstrap and test on x86_64-linux-gnu with no regressions.
PR tree-optimization/110937
PR tree-optimization/100798
gcc/ChangeLog:
* match.pd (`a ? ~b : b`): Handle this
case.
gcc/testsuite/ChangeLog:
* gcc.dg/tree-ssa/bool-14.c: New test.
* gcc.dg/tree-ssa/bool-15.c: New test.
* gcc.dg/tree-ssa/phi-opt-33.c: New test.
* gcc.dg/tree-ssa/20030709-2.c: Update testcase
so `a ? -1 : 0` is not used to hit the match
pattern.
|
|
I noticed while looking into the code generation of bitmap_single_bit_set_p
that sometimes:
```
if (uns > 1)
return 0;
return uns == 1;
```
would not optimize down to just:
```
return uns == 1;
```
In this case, VRP likes to change `a == 1` into `(bool)a` if
a has a range of [0,1] due to the `a <= 1` side of the branch.
We might end up with similar code even without VRP;
in the case of builtin-sprintf-warn-23.c (and Wrestrict.c), we had:
```
if (s < 0 || 1 < s)
s = 0;
```
which is the same as `s = ((unsigned)s) <= 1 ? s : 0;`,
so we should be able to catch that also.
This adds two patterns to catch `(uns <= 1) & uns` and
`(uns > 1) ? 0 : uns` and convert them into
`(convert) uns == 1`.
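A hedged sketch of the two shapes this now catches (function names are mine):
```
/* For unsigned uns, the only value that is both <= 1 and nonzero is 1,
   so both of these reduce to uns == 1.  */
int f (unsigned uns) { return (uns > 1) ? 0 : uns; }
int g (unsigned uns) { return (uns <= 1) & uns; }
```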
OK? Bootstrapped and tested on x86_64-linux-gnu with no regressions.
PR tree-optimization/109959
gcc/ChangeLog:
* match.pd (`(a > 1) ? 0 : (cast)a`, `(a <= 1) & (cast)a`):
New patterns.
gcc/testsuite/ChangeLog:
* gcc.dg/tree-ssa/builtin-sprintf-warn-23.c: Remove xfail.
* c-c++-common/Wrestrict.c: Update test and remove some xfail.
* gcc.dg/tree-ssa/cmpeq-1.c: New test.
* gcc.dg/tree-ssa/cmpeq-2.c: New test.
* gcc.dg/tree-ssa/cmpeq-3.c: New test.
|
|
Since we already had the infrastructure to optimize
`(x == 0) && (x > y)` to false for integer types,
this extends the same to pointer types as indirectly
requested by PR 96695.
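A sketch of the kind of code this now folds (pointer relational comparisons are treated as unsigned at the GIMPLE level, with NULL as the minimum value; the example is mine):
```
/* If x is NULL it is the minimum pointer value and cannot compare
   greater than y, so the whole expression folds to 0.  */
int f (int *x, int *y) { return x == 0 && x > y; }
```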
OK? Bootstrapped and tested on x86_64-linux-gnu with no regressions.
gcc/ChangeLog:
PR tree-optimization/96695
* match.pd (min_value, max_value): Extend to
pointer types too.
gcc/testsuite/ChangeLog:
PR tree-optimization/96695
* gcc.dg/pr96695-1.c: New test.
* gcc.dg/pr96695-10.c: New test.
* gcc.dg/pr96695-11.c: New test.
* gcc.dg/pr96695-12.c: New test.
* gcc.dg/pr96695-2.c: New test.
* gcc.dg/pr96695-3.c: New test.
* gcc.dg/pr96695-4.c: New test.
* gcc.dg/pr96695-5.c: New test.
* gcc.dg/pr96695-6.c: New test.
* gcc.dg/pr96695-7.c: New test.
* gcc.dg/pr96695-8.c: New test.
* gcc.dg/pr96695-9.c: New test.
|
|
The following adjusts the shift simplification patterns to avoid
touching arithmetic right shifts of possibly negative values with
out-of-bound shift amounts. While simplifying those to zero isn't
wrong, it violates the principle of least surprise.
PR tree-optimization/110838
* match.pd (([rl]shift @0 out-of-bounds) -> zero): Restrict
the arithmetic right-shift case to non-negative operands.
|
|
This changes gimple_bitwise_inverted_equal_p to use two different match patterns:
one to match bit_not wrapped with a possible nop_convert, and one to match a comparison
also wrapped with a possible nop_convert. This is to avoid being recursive.
OK? Bootstrapped and tested on x86_64-linux-gnu with no regressions.
gcc/ChangeLog:
PR tree-optimization/110874
* gimple-match-head.cc (gimple_bit_not_with_nop): New declaration.
(gimple_maybe_cmp): Likewise.
(gimple_bitwise_inverted_equal_p): Rewrite to use gimple_bit_not_with_nop
and gimple_maybe_cmp instead of being recursive.
* match.pd (bit_not_with_nop): New match pattern.
(maybe_cmp): Likewise.
gcc/testsuite/ChangeLog:
PR tree-optimization/110874
* gcc.c-torture/compile/pr110874-a.c: New test.
|
|
Canonicalizes (signed x << c) >> c into the lowest
precision(type) - c bits of x if those bits have a mode precision or a
precision of 1. Also combines this rule with (unsigned x << c) >> c -> x &
((unsigned)-1 >> c) to avoid a duplicate pattern.
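A concrete instance, assuming 32-bit int (illustration mine):
```
/* precision(int) - c == 8, a mode precision, so the shift pair is a
   sign extension of the low 8 bits.  */
int before (int x) { return (x << 24) >> 24; }
int after  (int x) { return (signed char) x; }
```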
PR middle-end/101955
* match.pd ((signed x << c) >> c): New canonicalization.
* gcc.dg/pr101955.c: New test.
|
|
As Jakub noticed in https://gcc.gnu.org/pipermail/gcc-patches/2023-August/626039.html
what I did was not totally correct because it sometimes chose the wrong type.
So to get back to what the original code did, while keeping the use of bitwise_inverted_equal_p,
we just need to check that the types of the two captures are the same.
This also adds a testcase for the problem Jakub found.
Committed as obvious after a bootstrap and test.
gcc/ChangeLog:
* match.pd (`~X & X`): Check that the types match.
(`~x | x`, `~x ^ x`): Likewise.
gcc/testsuite/ChangeLog:
* gcc.c-torture/execute/20230802-1.c: New test.
|
|
This is a simple patch to move these 2 patterns over to use
bitwise_inverted_equal_p. It also allows us to remove 2 other patterns
which were used on comparisons as they are now handled by
the original pattern.
OK? Bootstrapped and tested on x86_64-linux-gnu with no regressions.
gcc/ChangeLog:
* match.pd (`~X & X`, `~X | X`): Move over to
use bitwise_inverted_equal_p, removing :c as bitwise_inverted_equal_p
handles that already.
Remove range test simplifications to true/false as they
are now handled by these patterns.
|
|
In this case we were not removing a conversion to a bigger size
followed by a conversion back to the same size (or smaller) when the
signedness did not match.
For an example:
```
signed char _1;
...
_1 = *a_4(D);
b_5 = (short unsigned int) _1;
_2 = (unsigned char) b_5;
```
The inner cast is not needed and can be removed, but was not.
The match pattern for removing the extra cast is overly
complex, so I decided to add a new case rather than trying
to modify the current if statement here.
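A source-level view of the GIMPLE above (hypothetical function names):
```
/* Widening to 16 bits and then truncating to 8 never uses the extra
   bits, so the intermediate cast (and its signedness) is irrelevant.  */
unsigned char before (signed char x) { return (unsigned char)(unsigned short) x; }
unsigned char after  (signed char x) { return (unsigned char) x; }
```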
Committed as approved. Bootstrapped and tested on x86_64-linux-gnu with no regressions.
gcc/ChangeLog:
PR tree-optimization/93044
* match.pd (nested int casts): A truncation (to the same size or smaller)
can always remove the inner cast.
gcc/testsuite/ChangeLog:
PR tree-optimization/93044
* gcc.dg/tree-ssa/cast-1.c: New test.
* gcc.dg/tree-ssa/cast-2.c: New test.
|
|
Even though these are done by combine_comparisons, we can add them to match
to allow simplifications during match rather than just during reassoc/ifcombine.
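Two of the four cases, as C sketches (names mine):
```
/* "not equal and at most equal" is "less than";
   "equal or less than" is "less than or equal".  */
int f (int a, int b) { return (a != b) & (a <= b); }  /* -> a < b */
int g (int a, int b) { return (a == b) | (a < b); }   /* -> a <= b */
```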
OK? Bootstrapped and tested on x86_64-linux-gnu with no regressions.
gcc/ChangeLog:
PR tree-optimization/106164
* match.pd (`a != b & a <= b`, `a != b & a >= b`,
`a == b | a < b`, `a == b | a > b`): Handle these cases
too.
gcc/testsuite/ChangeLog:
PR tree-optimization/106164
* gcc.dg/tree-ssa/cmpbit-2.c: New test.
|
|
I noticed that there are patterns that optimize
`(X CMP1 CST1) AND/IOR (X CMP2 CST2)` and we can easily extend
them to support `(X CMP1 Y) AND/IOR (X CMP2 Y)` by treating the
two Y operands as comparing equal. This allows this kind of optimization
for integral and pointer types (which have the same semantics).
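For example (a sketch under the assumption that both sides compare against the same Y):
```
/* The two comparisons against the same y combine into one.  */
int f (int x, int y) { return (x <= y) & (x >= y); }  /* -> x == y */
int g (int x, int y) { return (x < y) | (x > y); }    /* -> x != y */
```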
OK? Bootstrapped and tested on x86_64-linux-gnu with no regressions.
gcc/ChangeLog:
PR tree-optimization/106164
* match.pd: Extend the `(X CMP1 CST1) AND/IOR (X CMP2 CST2)`
patterns to support `(X CMP1 Y) AND/IOR (X CMP2 Y)`.
gcc/testsuite/ChangeLog:
PR tree-optimization/106164
* gcc.dg/tree-ssa/cmpbit-1.c: New test.
|
|
comparisons
This is a new version of the patch.
Instead of doing the matching of the inverted comparison directly inside
match, this creates a new function (bitwise_inverted_equal_p) to do it.
It is very similar to bitwise_equal_p, which was added in r14-2751-g2a3556376c69a1fb,
but instead it checks `expr1 == ~expr2`. A follow-on patch will
use this function in other patterns where we try to match `@0` and `(bit_not @0)`.
Changed the name from bitwise_not_equal_p to bitwise_inverted_equal_p.
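The pattern this first converts, as a C sketch (names mine):
```
/* (~x | y) & x distributes to (~x & x) | (y & x), i.e. x & y; the new
   helper also lets `~x` be matched when written as an inverted
   comparison.  */
unsigned before (unsigned x, unsigned y) { return (~x | y) & x; }
unsigned after  (unsigned x, unsigned y) { return x & y; }
```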
Committed as approved after a bootstrap and test on x86_64-linux-gnu with no regressions.
PR tree-optimization/100864
gcc/ChangeLog:
* generic-match-head.cc (bitwise_inverted_equal_p): New function.
* gimple-match-head.cc (bitwise_inverted_equal_p): New macro.
(gimple_bitwise_inverted_equal_p): New function.
* match.pd ((~x | y) & x): Use bitwise_inverted_equal_p
instead of direct matching bit_not.
gcc/testsuite/ChangeLog:
* gcc.dg/tree-ssa/bitops-3.c: New test.
|
|
The following fixes the lack of simplification of a vector shift
by an out-of-bounds shift value. For scalars this is done both
by CCP and VRP, but vectors are not handled there. This results
in the PR91838 differences in outcome depending on whether a vector
shift ISA is available and thus whether vector lowering does or does not
expose scalar shifts here.
The following adds a match.pd pattern to catch uniform out-of-bound
shifts, simplifying them to zero when not sanitizing shift amounts.
PR tree-optimization/91838
* gimple-match-head.cc: Include attribs.h and asan.h.
* generic-match-head.cc: Likewise.
* match.pd (([rl]shift @0 out-of-bounds) -> zero): New pattern.
|
|
Adds a simplification for (~X | Y) ^ X to be folded into ~(X & Y).
Also adds the macro bitwise_equal_p for generic and gimple which
returns true iff EXPR1 and EXPR2 have the same value. This helps
to reduce the number of nop_converts necessary to match the pattern.
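A per-bit check of the identity (example mine):
```
/* If a bit of X is 1: (0 | Y) ^ 1 == ~Y == ~(1 & Y).
   If a bit of X is 0: (1 | Y) ^ 0 == 1  == ~(0 & Y).  */
unsigned before (unsigned x, unsigned y) { return (~x | y) ^ x; }
unsigned after  (unsigned x, unsigned y) { return ~(x & y); }
```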
PR middle-end/109986
gcc/ChangeLog:
* generic-match-head.cc (bitwise_equal_p): New macro.
* gimple-match-head.cc (bitwise_equal_p): New macro.
(gimple_nop_convert): Declare.
(gimple_bitwise_equal_p): Helper for bitwise_equal_p.
* match.pd ((~X | Y) ^ X -> ~(X & Y)): New simplification.
gcc/testsuite/ChangeLog:
* gcc.c-torture/execute/pr109986.c: New test.
* gcc.dg/tree-ssa/pr109986.c: New test.
Co-authored-by: Jakub Jelinek <jakub@redhat.com>
|
|
This adds a simple match pattern to simplify
`max<max<a,b>,a>` to `max<a,b>`. Reassociation handles
this already (r0-77700-ge969dbde29bfd396259357) but it
seems like we should be able to handle this even before
reassociation.
This fixes part of PR tree-optimization/80574, but more
work is needed to fix it the rest of the way. The original
testcase there is fixed, but it is the RTL level that fixes
it the rest of the way.
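The shape being simplified, as a C sketch (macro and names mine):
```
#define MAX(a, b) ((a) > (b) ? (a) : (b))
/* The outer max is redundant because max(a,b) is already >= a.  */
int before (int a, int b) { return MAX (MAX (a, b), a); }
int after  (int a, int b) { return MAX (a, b); }
```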
OK? Bootstrapped and tested on x86_64-linux-gnu.
gcc/ChangeLog:
* match.pd (minmax<minmax<a,b>,a>->minmax<a,b>): New
transformation.
gcc/testsuite/ChangeLog:
* gcc.dg/tree-ssa/reassoc-12.c: Disable all of
the passes that enable match-and-simplify.
* gcc.dg/tree-ssa/minmax-23.c: New test.
|
|
So I had missed/forgotten that EQ_EXPR could have a non-boolean
type for generic when I implemented r14-2556-g0407ae8a7732d9.
This patch adds a check for a one-bit precision integral type,
which fixes the problem.
OK? Bootstrapped and tested on x86_64-linux-gnu with no regressions.
PR tree-optimization/110726
gcc/ChangeLog:
* match.pd ((a|b)&(a==b),a|(a==b),(a&b)|(a==b)):
Add checks to make sure the type is a one-bit
precision integral type.
gcc/testsuite/ChangeLog:
* gcc.c-torture/execute/bitops-1.c: New test.
|
|
This adds the boolean version of some of the simplifications
that were added with r8-4395-ge268a77b59cb78.
They are the following:
(a | b) & (a == b) --> a & b
a | (a == b) --> a | (b ^ 1)
(a & b) | (a == b) --> a == b
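A minimal sketch that exhaustively verifies the three identities over boolean inputs:
```
int main (void)
{
  for (int a = 0; a <= 1; a++)
    for (int b = 0; b <= 1; b++)
      {
        if (((a | b) & (a == b)) != (a & b))
          __builtin_abort ();
        if ((a | (a == b)) != (a | (b ^ 1)))
          __builtin_abort ();
        if (((a & b) | (a == b)) != (a == b))
          __builtin_abort ();
      }
  return 0;
}
```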
OK? Bootstrapped and tested on x86_64-linux-gnu with no regressions.
gcc/ChangeLog:
PR tree-optimization/95923
* match.pd ((a|b)&(a==b),a|(a==b),(a&b)|(a==b)): New transformation.
gcc/testsuite/ChangeLog:
PR tree-optimization/95923
* gcc.dg/tree-ssa/bitops-2.c: New test.
* gcc.dg/tree-ssa/bool-checks-1.c: New test.
|
|
I had messed up the case where the outer operator is `==`.
The check for the resulting comparison should have been `==` and not `!=`.
This patch fixes that and adds a full runtime testcase covering
all cases to make sure it works.
OK? Bootstrapped and tested on x86_64-linux-gnu with no regressions.
gcc/ChangeLog:
PR tree-optimization/110666
* match.pd (A NEEQ (A NEEQ CST)): Fix Outer EQ case.
gcc/testsuite/ChangeLog:
PR tree-optimization/110666
* gcc.c-torture/execute/pr110666-1.c: New test.
|
|
This fixes part of PR 110293, for the outer comparison case
being `!=` or `==`. In turn PR 110539 is able to be optimized
again as the if statement for `(a&1) == ((a & 1) != 0)` gets optimized
to `false` early enough to allow FRE/DOM to do a CSE for memory store/load.
OK? Bootstrapped and tested on x86_64-linux with no regressions.
gcc/ChangeLog:
PR tree-optimization/110293
PR tree-optimization/110539
* match.pd: Expand the `x != (typeof x)(x == 0)`
pattern to handle the cases where the inner and outer comparisons
are either `!=` or `==` and handle constants other
than 0.
gcc/testsuite/ChangeLog:
* gcc.dg/tree-ssa/pr110293-1.c: New test.
* gcc.dg/tree-ssa/pr110539-1.c: New test.
* gcc.dg/tree-ssa/pr110539-2.c: New test.
* gcc.dg/tree-ssa/pr110539-3.c: New test.
* gcc.dg/tree-ssa/pr110539-4.c: New test.
|
|
The problem here is we might produce some values outside the type's
min/max range (and/or its valid values, e.g. for signed booleans). The fix is to
use an integer type which has the same precision and signedness
as the original type.
Note two_value_replacement in phiopt had the same issue in previous
versions; though I don't know if a problem will show up there.
OK? Bootstrapped and tested on x86_64-linux-gnu.
gcc/ChangeLog:
PR tree-optimization/110487
* match.pd (a !=/== CST1 ? CST2 : CST3): Always
build a nonstandard integer type and use that.
|
|
This fixes the first part of this bug, where `a ? -1 : 0`
would put a value of 1 into a signed boolean value.
It fixes the problem by casting to an integer type of
the same size/signedness before doing the negation and
then casting to the type of the expression.
OK? Bootstrapped and tested on x86_64.
gcc/ChangeLog:
* match.pd (a?-1:0): Cast to an integer type
rather than the original type before the negation.
(a?0:-1): Likewise.
|
|
When trying to associate (v + INT_MAX) + INT_MAX we are using
the TREE_OVERFLOW bit to check for correctness. That isn't
working for VECTOR_CSTs and it can't in general when one considers
VL vectors. It looks like it should work for COMPLEX_CSTs but
I didn't try to single out _Complex int in this change.
The following makes sure that for vectors we use the fallback of
using unsigned arithmetic when associating the above to
v + (INT_MAX + INT_MAX).
PR middle-end/110495
* tree.h (TREE_OVERFLOW): Do not mention VECTOR_CSTs
since we do not set TREE_OVERFLOW on those since the
introduction of VL vectors.
* match.pd (x +- CST +- CST): For VECTOR_CST do not look
at TREE_OVERFLOW to determine validity of association.
* gcc.dg/tree-ssa/addadd-2.c: Amend.
* gcc.dg/tree-ssa/forwprop-27.c: Adjust.
|
|
The following guards a match.pd pattern that wasn't supposed to
apply to vectors and thus runs into TYPE_PRECISION checking. For
vector support the constant case is lacking and the pattern would
have missing optab support checking for the result operation.
PR middle-end/110461
* match.pd (bitop (convert@2 @0) (convert?@3 @1)): Disable
for VECTOR_TYPE_P.
* gcc.dg/pr110461.c: New testcase.
|
|
This patch changes TYPE_MODE into element_mode in a match.pd
simplification. As the simplification can be also called with vector
types real_can_shorten_arithmetic would ICE in REAL_MODE_FORMAT which
expects a scalar mode. Therefore, use element_mode instead of
TYPE_MODE.
Additionally, check if the target supports the resulting operation. One
target that supports e.g. a float addition but not a _Float16 addition
is the RISC-V vector extension Zvfhmin.
gcc/ChangeLog:
* match.pd: Use element_mode and check if target supports
operation with new type.
|
|
gcc/ChangeLog:
PR tree-optimization/110280
* match.pd (vec_perm_expr(v, v, mask) -> v): Explicitly build the vector
using build_vector_from_val with the element of the input operand, and
the mask's type if the operand's and mask's types don't match.
gcc/testsuite/ChangeLog:
PR tree-optimization/110280
* gcc.target/aarch64/sve/pr110280.c: New test.
|
|
The simplification (outertype)((innertype0)a+(innertype1)b) to
((newtype)a+(newtype)b) ends up using TYPE_PRECISION to check
whether it can elide a conversion but in some paths there can
be VECTOR_TYPEs where this instead compares the number of lanes.
The following fixes the missed optimizations and uses
element_precision in those places.
* match.pd ((outertype)((innertype0)a+(innertype1)b)
-> ((newtype)a+(newtype)b)): Use element_precision
where appropriate.
|
|
fold_binary tries to transform (double)float1 CMP (double)float2
into float1 CMP float2 but ends up using TYPE_PRECISION on the
argument types. For vector types that compares the number of
lanes, which should always be equal (so it's harmless in that it
cannot generate wrong code). The following instead properly
uses element_precision.
The same happens in the corresponding match.pd pattern.
* fold-const.cc (fold_binary_loc): Use element_precision
when trying (double)float1 CMP (double)float2 to
float1 CMP float2 simplification.
* match.pd: Likewise.
|
|
The following adds two patterns simplifying comparisons:
uns < (typeof uns)(uns != 0) is always false, and x != (typeof x)(x == 0)
is always true.
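Quick case analysis as a C sketch (names mine):
```
/* uns < (uns != 0): for uns == 0 this is 0 < 0; for uns >= 1 the
   right-hand side is 1, so it is always false.
   x != (x == 0): for x == 0 this is 0 != 1; for x != 0 the right-hand
   side is 0, so it is always true.  */
int f (unsigned uns) { return uns < (unsigned) (uns != 0); }  /* -> 0 */
int g (int x)        { return x != (x == 0); }                /* -> 1 */
```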
PR tree-optimization/110278
* match.pd (uns < (typeof uns)(uns != 0) -> false): New.
(x != (typeof x)(x == 0) -> true): Likewise.
|
|
The following makes sure we optimize x != 0 using range info
via tree_expr_nonzero_p via match.pd.
PR tree-optimization/110269
* fold-const.cc (fold_binary_loc): Merge x != 0 folding
with tree_expr_nonzero_p ...
* match.pd (cmp (convert? addr@0) integer_zerop): With this
pattern.
* gcc.dg/tree-ssa/pr110269.c: New testcase.
|
|
This adds plus to the op list of the `(zero_one == 0) ? y : z <op> y` patterns,
which currently have bit_ior and bit_xor.
This shows up in GCC now after the boolization work that Uroš has been doing.
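A sketch of the new plus case (names mine; `a & 1` stands in for any zero_one_valued_p operand):
```
/* z is added exactly when the zero/one value is 1, so the conditional
   becomes a branchless multiply-add.  */
int before (int a, int y, int z) { return (a & 1) == 0 ? y : z + y; }
int after  (int a, int y, int z) { return z * (a & 1) + y; }
```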
OK? Bootstrapped and tested on x86_64-linux-gnu with no regressions.
PR tree-optimization/97711
PR tree-optimization/110155
gcc/ChangeLog:
* match.pd ((zero_one == 0) ? y : z <op> y): Add plus to the op.
((zero_one != 0) ? z <op> y : y): Likewise.
gcc/testsuite/ChangeLog:
* gcc.dg/tree-ssa/branchless-cond-add-2.c: New test.
* gcc.dg/tree-ssa/branchless-cond-add.c: New test.
|
|
rather than `(-zero_one) & z`
Since there is a pattern to convert `(-zero_one) & z` into `zero_one * z` already,
it is better if we don't do a secondary transformation. This reduces the extra
statements produced by match-and-simplify on the gimple level too.
gcc/ChangeLog:
* match.pd ((zero_one ==/!= 0) ? y : z <op> y): Use
multiply rather than negation/bit_and.
|
|
This allows unsigned types if the inner type, where the negation is
located, has precision greater than or equal to that of the outer type.
branchless-cond.c needs to be updated since we now change it to
use a multiply rather than still having (-a)&c in there.
OK? Bootstrapped and tested on x86_64-linux-gnu.
gcc/ChangeLog:
* match.pd (`X & -Y -> X * Y`): Allow for truncation
and the same type for unsigned types.
gcc/testsuite/ChangeLog:
* gcc.dg/tree-ssa/branchless-cond.c: Update testcase.
|
|
So for the attached testcase, we assumed that zero_one_valued_p would
match only values in the range [0,1], but currently zero_one_valued_p also
matches signed 1-bit integers.
This changes it not to match those, which fixes the 2 new testcases at
all optimization levels.
OK? Bootstrapped and tested on x86_64-linux-gnu with no regressions.
Note the GCC 13 patch will be slightly different due to the changes
made to zero_one_valued_p.
PR tree-optimization/110165
PR tree-optimization/110166
gcc/ChangeLog:
* match.pd (zero_one_valued_p): Don't accept
signed 1-bit integers.
gcc/testsuite/ChangeLog:
* gcc.c-torture/execute/pr110165-1.c: New test.
* gcc.c-torture/execute/pr110166-1.c: New test.
|
|
When folding two conversions in a row we use TYPE_PRECISION but
that's invalid for VECTOR_TYPE. The following fixes this by
using element_precision instead.
* match.pd (two conversions in a row): Use element_precision
to DTRT for VECTOR_TYPE.
|
|
The patterns match more than just `a & 1` so change the comment
for these two patterns to say that.
Committed as obvious after a bootstrap/test on x86_64-linux-gnu.
gcc/ChangeLog:
* match.pd: Fix comment for the
`(zero_one ==/!= 0) ? y : z <op> y` patterns.
|
|
Recently zero_one_valued_p was changed to handle integer_zerop
case specially, because tree_nonzero_bits (@0) == 1 only returns
true for non-constant values with range [0, 1] or constant 1,
constant 0 has tree_nonzero_bits (integer_zero_node) == 0.
The following patch reverts that change and instead checks
that tree_nonzero_bits is <= 1U.
2023-06-07 Jakub Jelinek <jakub@redhat.com>
* match.pd (zero_one_valued_p): Don't handle integer_zerop specially,
instead compare tree_nonzero_bits <= 1U rather than just == 1.
|
|
I noticed while looking at a code generation issue that forwprop
was not handling `-a == 0` for unsigned types, and I was confused
why it was not.
r6-1814-g66e1cacf608045 removed these from fold because they
were supposed to be already handled by the match.pd patterns
but it was missed that the match.pd patterns checked
TYPE_OVERFLOW_UNDEFINED while fold didn't do that for NE/EQ.
This patch removes the restriction on NE/EQ on TYPE_OVERFLOW_UNDEFINED.
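The identity, as a C sketch (name mine):
```
/* Negation is a bijection in modular arithmetic, so -a == 0 exactly
   when a == 0; EQ/NE need no overflow assumption.  */
int f (unsigned a) { return -a == 0; }   /* -> a == 0 */
```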
OK? Bootstrapped and tested on x86_64-linux-gnu.
gcc/ChangeLog:
PR tree-optimization/110134
* match.pd (-A CMP -B -> B CMP A): Allow EQ/NE for all integer
types.
(-A CMP CST -> A CMP (-CST)): Likewise.
gcc/testsuite/ChangeLog:
PR tree-optimization/110134
* gcc.dg/tree-ssa/negneq-1.c: New test.
* gcc.dg/tree-ssa/negneq-2.c: New test.
* gcc.dg/tree-ssa/negneq-3.c: New test.
* gcc.dg/tree-ssa/negneq-4.c: New test.
|
|
are constant
This adds match patterns for boolean values
that optimize `a ? onezero : 0` to `a & onezero` and
`a ? 1 : onezero` to `a | onezero`.
This was reported a few times, and I thought I would finally
add the match patterns for this.
This hits a few times in GCC itself too.
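The two transformations, as C sketches (names mine):
```
/* Both operands are known to be 0 or 1, so the conditional is just
   a bit operation.  */
_Bool f (_Bool a, _Bool onezero) { return a ? onezero : 0; }  /* -> a & onezero */
_Bool g (_Bool a, _Bool onezero) { return a ? 1 : onezero; }  /* -> a | onezero */
```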
Notes on the testcases:
* phi-opt-2.c: This is now optimized to `a & b` in phiopt rather than ifcombine.
* phi-opt-25b.c: The test part that was failing was parity, which now gets the `x & y` treatment.
* ssa-thread-21.c: There is no longer a threading opportunity, so phiopt needs to be disabled.
Note PR 109957 is filed for the now-missing optimization in that testcase too.
gcc/ChangeLog:
PR tree-optimization/89263
PR tree-optimization/99069
PR tree-optimization/20083
PR tree-optimization/94898
* match.pd: Add patterns to optimize `a ? onezero : onezero` when
one of the operands is constant.
gcc/testsuite/ChangeLog:
* gcc.dg/tree-ssa/phi-opt-2.c: Adjust the testcase.
* gcc.dg/tree-ssa/phi-opt-25b.c: Adjust the testcase.
* gcc.dg/tree-ssa/ssa-thread-21.c: Disable phiopt.
* gcc.dg/tree-ssa/phi-opt-27.c: New test.
* gcc.dg/tree-ssa/phi-opt-28.c: New test.
* gcc.dg/tree-ssa/phi-opt-29.c: New test.
* gcc.dg/tree-ssa/phi-opt-30.c: New test.
* gcc.dg/tree-ssa/phi-opt-31.c: New test.
* gcc.dg/tree-ssa/phi-opt-32.c: New test.
|
|
While working on `bool0 ? bool1 : bool2` I noticed that
zero_one_valued_p does not match the constant zero,
since in that case tree_nonzero_bits returns 0, which
is different from 1.
OK? Bootstrapped and tested on x86_64-linux-gnu with no regressions.
gcc/ChangeLog:
* match.pd (zero_one_valued_p): Match 0 integer constant
too.
|
|
This patch adds to match the support that was implemented for PR 87913 in phiopt.
It implements it by adding support to minmax_from_comparison for the check.
It uses the range information, if available, which allows producing a MIN/MAX expression
when comparing against the lower/upper bound of the range instead of the lower/upper
bound of the type.
minmax-22.c is the new testcase, which tests the ranges part.
OK? Bootstrapped and tested on x86_64-linux-gnu with no regressions.
gcc/ChangeLog:
* fold-const.cc (minmax_from_comparison): Add support for NE_EXPR.
* match.pd ((cond (cmp (convert1? x) c1) (convert2? x) c2) pattern):
Add ne as a possible cmp.
((a CMP b) ? minmax<a, c> : minmax<b, c> pattern): Likewise.
gcc/testsuite/ChangeLog:
* gcc.dg/tree-ssa/minmax-22.c: New test.
|
|
This moves the `a <= CST1 ? MAX<a, CST2> : a` optimization
from phiopt to match. It just adds a new pattern to match.pd.
There is one more change needed before being able to remove
minmax_replacement from phiopt.
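One shape the new pattern can catch, with illustrative constants (example mine):
```
#define MAX(a, b) ((a) > (b) ? (a) : (b))
/* For a > 5 the else-arm already satisfies a > 3, so both arms are
   MAX (a, 3); this holds whenever CST2 <= CST1.  */
int before (int a) { return a <= 5 ? MAX (a, 3) : a; }
int after  (int a) { return MAX (a, 3); }
```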
A few notes on the testsuite changes:
* phi-opt-5.c is now able to be optimized at phiopt1, so remove
the xfail.
* pr66726-4.c can be optimized during fold before phiopt1,
so the scanning needs to change.
* pr66726-5.c currently needs two phiopt passes to optimize
to the right thing; it needed two phiopt passes before too, the cast
from int to unsigned char being the reason.
* pr66726-6.c is what the original pr66726-4.c was testing
before fold was able to optimize it.
OK? Bootstrapped and tested on x86_64-linux-gnu.
gcc/ChangeLog:
* match.pd (`(a CMP CST1) ? max<a,CST2> : a`): New
pattern.
gcc/testsuite/ChangeLog:
* gcc.dg/tree-ssa/phi-opt-5.c: Remove last xfail.
* gcc.dg/tree-ssa/pr66726-4.c: Change how scanning
works.
* gcc.dg/tree-ssa/pr66726-5.c: New test.
* gcc.dg/tree-ssa/pr66726-6.c: New test.
|
|
The Ada compiler gives a bogus warning:
storage_offset1.ads:16:52: warning: Constraint_Error will be raised at run
time [enabled by default]
Ironically enough, this occurs because of an intermediate conversion to an
unsigned type which is supposed to hide overflows but is counter-productive
for constants because TREE_OVERFLOW is always set for them, so it ends up
setting a bogus TREE_OVERFLOW when converting back to the original type.
The fix simply redirects INTEGER_CSTs to the other, direct path without the
intermediate conversion to the unsigned type.
gcc/
* match.pd ((T)P - (T)(P + A) -> -(T) A): Avoid artificial overflow
on constants.
gcc/testsuite/
* gnat.dg/specs/storage_offset1.ads: New test.
|
|
PR middle-end/109840 is a regression introduced by my recent patch to
fold popcount(bswap(x)) as popcount(x). When the bswap and the popcount
have the same precision, everything works fine, but this optimization also
allowed a zero-extension between the two. The oversight is that we need
to be strict with type conversions, both to avoid accidentally changing
the argument type to popcount, and also to reflect the effects of
argument/return-value promotion in the call to bswap, so this zero extension
needs to be preserved/explicit in the optimized form.
Interestingly, match.pd should (in theory) be able to narrow calls to
popcount and parity, removing a zero-extension from its argument, but
that is an independent optimization, that needs to check IFN_ support.
Many thanks to Andrew Pinski for his help/fixes with these transformations.
2023-05-24 Roger Sayle <roger@nextmovesoftware.com>
gcc/ChangeLog
PR middle-end/109840
* match.pd <popcount optimizations>: Preserve zero-extension when
optimizing popcount((T)bswap(x)) and popcount((T)rotate(x,y)) as
popcount((T)x), so the popcount's argument keeps the same type.
<parity optimizations>: Likewise preserve extensions when
simplifying parity((T)bswap(x)) and parity((T)rotate(x,y)) as
parity((T)x), so that the parity's argument type is the same.
gcc/testsuite/ChangeLog
PR middle-end/109840
* gcc.dg/fold-parity-8.c: New test.
* gcc.dg/fold-popcount-11.c: Likewise.
|
|
On the following testcase we hang, because POLY_INT_CST is CONSTANT_CLASS_P,
but BIT_AND_EXPR with it and INTEGER_CST doesn't simplify and the
(x | CST1) & CST2 -> (x & CST2) | (CST1 & CST2)
simplification actually relies on the (CST1 & CST2) simplification,
otherwise it is a deoptimization, trading 2 ops for 3 and furthermore
running into
/* Given a bit-wise operation CODE applied to ARG0 and ARG1, see if both
operands are another bit-wise operation with a common input. If so,
distribute the bit operations to save an operation and possibly two if
constants are involved. For example, convert
(A | B) & (A | C) into A | (B & C)
Further simplification will occur if B and C are constants. */
simplification which simplifies that
(x & CST2) | (CST1 & CST2) back to
CST2 & (x | CST1).
I went through all other places I could find where we have a simplification
with 2 CONSTANT_CLASS_P operands and perform some operation on those two,
while the other spots aren't that severe (just trade 2 operations for
another 2 if the two constants don't simplify, rather than as in the above
case trading 2 ops for 3), I still think all those spots really intend
to optimize only if the 2 constants simplify.
So, the following patch adds to those a ! modifier to ensure that,
even at GENERIC that modifier means !EXPR_P which is exactly what we want
IMHO.
2023-05-21 Jakub Jelinek <jakub@redhat.com>
PR tree-optimization/109505
* match.pd ((x | CST1) & CST2 -> (x & CST2) | (CST1 & CST2),
Combine successive equal operations with constants,
(A +- CST1) +- CST2 -> A + CST3, (CST1 - A) +- CST2 -> CST3 - A,
CST1 - (CST2 - A) -> CST3 + A): Use ! on ops with 2 CONSTANT_CLASS_P
operands.
* gcc.target/aarch64/sve/pr109505.c: New test.
|
|
This is version 2 of https://gcc.gnu.org/pipermail/gcc-patches/2021-August/577394.html
which does not depend on adding gimple_truth_valued_p at this point.
Instead it will use zero_one_valued_p, which is already used for mult simplifications,
to make sure that we only have [0,1] rather than making the mistake of maybe having [-1,0]
as the range for signed bools.
This shows up in a few places in GCC itself, but only at -O1; we miss the min/max conversion
because of PR 107888 (which I will be testing separately).
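The correspondence, as C sketches (macros and names mine):
```
#define MIN(a, b) ((a) < (b) ? (a) : (b))
#define MAX(a, b) ((a) > (b) ? (a) : (b))
/* For values known to be 0 or 1, min is AND and max is IOR.  */
int fmin (_Bool a, _Bool b) { return MIN (a, b); }  /* -> a & b */
int fmax (_Bool a, _Bool b) { return MAX (a, b); }  /* -> a | b */
```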
OK? Bootstrapped and tested on x86_64-linux-gnu with no regressions.
Thanks,
Andrew Pinski
PR tree-optimization/109424
gcc/ChangeLog:
* match.pd: Add patterns for min/max of zero_one_valued
values to `&`/`|`.
gcc/testsuite/ChangeLog:
* gcc.dg/tree-ssa/bool-12.c: New test.
* gcc.dg/tree-ssa/bool-13.c: New test.
* gcc.dg/tree-ssa/minmax-20.c: New test.
* gcc.dg/tree-ssa/minmax-21.c: New test.
|
|
This adds a simple pattern to match.pd for `signbit(x) ? x : -x`
into abs<x>. This can be done for all types, even ones that honor
signed zeros and NaNs, because both signbit and negation only
look at/touch the sign bit of those types, and neither
operation traps.
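A sketch of the `signbit(x) == 0` variant (names mine):
```
/* If the sign bit is clear, x is returned as-is; if it is set, negation
   flips just the sign bit (even for -0.0 and NaNs), so the result is
   the absolute value and nothing can trap.  */
double before (double x) { return __builtin_signbit (x) == 0 ? x : -x; }
double after  (double x) { return __builtin_fabs (x); }
```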
OK? Bootstrapped and tested on x86_64-linux-gnu with no regressions.
PR tree-optimization/109829
gcc/ChangeLog:
* match.pd: Add pattern for `signbit(x) !=/== 0 ? x : -x`.
gcc/testsuite/ChangeLog:
* gcc.dg/tree-ssa/abs-3.c: New test.
* gcc.dg/tree-ssa/abs-4.c: New test.
|
|
After r14-673-gc0dd80e4c4c3, there was a check in the match
patterns which was checking that the type is unsigned, but
instead of using the type, the patch used the expression.
This adds the needed TREE_TYPE so we get the correct answer and don't ICE.
Committed as obvious after a bootstrap/test on x86_64-linux-gnu.
PR tree-optimization/109834
gcc/ChangeLog:
* match.pd (popcount(bswap(x))->popcount(x)): Fix up unsigned type checking.
(popcount(rotate(x,y))->popcount(x)): Likewise.
gcc/testsuite/ChangeLog:
* gcc.c-torture/compile/pr109834-1.c: New test.
* gcc.dg/tree-ssa/pr109834-1.c: New test.
|
|
The following adds another variant of address difference simplification.
The utility ptr_difference_const only handles constant differences
(we also cannot code generate anything else), so exposing a possible
POINTER_PLUS_EXPR in the match and computing the difference on the
base only makes it possible to handle one case of a variable offset.
This simplifies
(unsigned long) &MEM <char[3]> [(void *)&str + 2B] - (unsigned long) (&str + (_69 + 1))
down to (1 - (unsigned long) _69) during niter analysis, allowing
ranger to eliminate a condition later and avoiding a bogus
-Wstringop-overflow diagnostic for the testcase in the PR.
PR tree-optimization/109791
* match.pd (minus (convert ADDR_EXPR@0) (convert (pointer_plus @1 @2))):
New pattern.
(minus (convert (pointer_plus @1 @2)) (convert ADDR_EXPR@0)):
Likewise.
|
|
When using SWAR (SIMD within a register) techniques, a comparison operation
within such a register can be made by using a combination of shifts,
bitwise AND, and multiplication. If code using this scheme is
vectorized, then there is potential to replace all these operations
with a single vector comparison by reinterpreting the vector types to
match the width of the SWAR register.
For example, for the test function packed_cmp_16_32, the original
generated code is:
```
ldr q0, [x0]
add w1, w1, 1
ushr v0.4s, v0.4s, 15
and v0.16b, v0.16b, v2.16b
shl v1.4s, v0.4s, 16
sub v0.4s, v1.4s, v0.4s
str q0, [x0], 16
cmp w2, w1
bhi .L20
```
With this pattern the above can be optimized to:
```
ldr q0, [x0]
add w1, w1, 1
cmlt v0.8h, v0.8h, #0
str q0, [x0], 16
cmp w2, w1
bhi .L20
```
The effect is similar for x86-64.
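A hedged sketch of the scalar SWAR idiom behind such loops (an assumed shape, not the exact testcase):
```
#include <stdint.h>

/* For each 16-bit lane of x, produce 0xffff if the lane is negative and
   0 otherwise.  t gathers the two sign bits at bit positions 0 and 16;
   (t << 16) - t then smears each bit across its own lane.  */
uint32_t packed_cmp (uint32_t x)
{
  uint32_t t = (x >> 15) & 0x00010001u;
  return (t << 16) - t;
}
```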
Bootstrapped and reg-tested for x86 and aarch64.
gcc/ChangeLog:
* match.pd: Simplify vector shift + bit_and + multiply.
gcc/testsuite/ChangeLog:
* gcc.target/aarch64/swar_to_vec_cmp.c: New test.
Signed-off-by: Manolis Tsamis <manolis.tsamis@vrull.eu>
Signed-off-by: Philipp Tomsich <philipp.tomsich@vrull.eu>
|
|
This patch teaches match.pd to simplify popcount(X&Y)+popcount(X|Y) as
popcount(X)+popcount(Y), and the related simplifications that
popcount(X)+popcount(Y)-popcount(X&Y) is popcount(X|Y). As surprising
as it might seem, this idiom is common in cheminformatics codes
(for Tanimoto coefficient calculations).
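A C sketch of the identity (names mine):
```
/* Each bit position contributes equally to both sides: a bit set in
   both x and y counts twice, a bit set in exactly one counts once.  */
int lhs (unsigned x, unsigned y)
{ return __builtin_popcount (x & y) + __builtin_popcount (x | y); }
int rhs (unsigned x, unsigned y)
{ return __builtin_popcount (x) + __builtin_popcount (y); }
```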
2023-05-11 Roger Sayle <roger@nextmovesoftware.com>
gcc/ChangeLog
* match.pd <popcount optimizations>: Simplify popcount(X|Y) +
popcount(X&Y) as popcount(X)+popcount(Y). Likewise, simplify
popcount(X)+popcount(Y)-popcount(X&Y) as popcount(X|Y), and
vice versa.
gcc/testsuite/ChangeLog
* gcc.dg/fold-popcount-8.c: New test case.
* gcc.dg/fold-popcount-9.c: Likewise.
* gcc.dg/fold-popcount-10.c: Likewise.
|