path: root/gcc/match.pd
2023-08-09  MATCH: [PR110937/PR100798] (a ? ~b : b) should be optimized to b ^ -(a)  (Andrew Pinski; 1 file: +14/-0)

This adds a simple match pattern for this case. I noticed it in a
couple of different places: once while looking at code generation of a
parser, and also while looking at locations where
bitwise_inverted_equal_p should be used more.

Committed as approved after a bootstrap and test on x86_64-linux-gnu
with no regressions.

	PR tree-optimization/110937
	PR tree-optimization/100798

gcc/ChangeLog:

	* match.pd (`a ? ~b : b`): Handle this case.

gcc/testsuite/ChangeLog:

	* gcc.dg/tree-ssa/bool-14.c: New test.
	* gcc.dg/tree-ssa/bool-15.c: New test.
	* gcc.dg/tree-ssa/phi-opt-33.c: New test.
	* gcc.dg/tree-ssa/20030709-2.c: Update testcase so `a ? -1 : 0`
	is not used to hit the match pattern.

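As an aside, the identity is easy to check exhaustively for the boolean
case. A minimal sketch of ours (not from the commit): since -1 is
all-ones in two's complement, `b ^ -a` is `~b` when a is 1 and `b` when
a is 0.

```
#include <assert.h>

/* Illustrative check of the transformation above: for a in {0,1},
   a ? ~b : b  equals  b ^ -a, because -1 is all-ones.  */
int main (void)
{
  for (int a = 0; a <= 1; a++)
    for (int b = -4; b <= 4; b++)
      assert ((a ? ~b : b) == (b ^ -a));
  return 0;
}
```
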
2023-08-07  MATCH: [PR109959] `(uns <= 1) & uns` could be optimized to `uns == 1`  (Andrew Pinski; 1 file: +20/-0)

I noticed while looking into some code generation of
bitmap_single_bit_set_p that sometimes:
```
if (uns > 1)
  return 0;
return uns == 1;
```
would not optimize down to just:
```
return uns == 1;
```
In this case, VRP likes to change `a == 1` into `(bool)a` if a has a
range of [0,1] due to the `a <= 1` side of the branch.

We might end up with this similar code even without VRP; in the case of
builtin-sprintf-warn-23.c (and Wrestrict.c), we had:
```
if (s < 0 || 1 < s)
  s = 0;
```
which is the same as `s = ((unsigned)s) <= 1 ? s : 0;`, so we should be
able to catch that also.

This adds 2 patterns to catch `(uns <= 1) & uns` and
`(uns > 1) ? 0 : uns` and convert those into `(convert) uns == 1`.

OK? Bootstrapped and tested on x86_64-linux-gnu with no regressions.

	PR tree-optimization/109959

gcc/ChangeLog:

	* match.pd (`(a > 1) ? 0 : (cast)a`, `(a <= 1) & (cast)a`):
	New patterns.

gcc/testsuite/ChangeLog:

	* gcc.dg/tree-ssa/builtin-sprintf-warn-23.c: Remove xfail.
	* c-c++-common/Wrestrict.c: Update test and remove some xfail.
	* gcc.dg/tree-ssa/cmpeq-1.c: New test.
	* gcc.dg/tree-ssa/cmpeq-2.c: New test.
	* gcc.dg/tree-ssa/cmpeq-3.c: New test.

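A quick sanity check of the equivalence (our example, not the commit's
testcases): for unsigned values, `(uns <= 1) & uns` is nonzero exactly
when uns is 1.

```
#include <assert.h>

/* (uns <= 1) & uns is 1 & 0 for uns == 0, 1 & 1 for uns == 1, and
   0 & uns for uns > 1 -- so it equals (uns == 1).  */
int main (void)
{
  for (unsigned uns = 0; uns < 8; uns++)
    assert (((uns <= 1) & uns) == (uns == 1));
  return 0;
}
```
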
2023-08-07  MATCH: Extend min_value/max_value to pointer types  (Andrew Pinski; 1 file: +4/-2)

Since we already had the infrastructure to optimize
`(x == 0) && (x > y)` to false for integer types, this extends the same
to pointer types, as indirectly requested by PR 96695.

OK? Bootstrapped and tested on x86_64-linux-gnu with no regressions.

gcc/ChangeLog:

	PR tree-optimization/96695
	* match.pd (min_value, max_value): Extend to pointer types too.

gcc/testsuite/ChangeLog:

	PR tree-optimization/96695
	* gcc.dg/pr96695-1.c: New test.
	* gcc.dg/pr96695-10.c: New test.
	* gcc.dg/pr96695-11.c: New test.
	* gcc.dg/pr96695-12.c: New test.
	* gcc.dg/pr96695-2.c: New test.
	* gcc.dg/pr96695-3.c: New test.
	* gcc.dg/pr96695-4.c: New test.
	* gcc.dg/pr96695-5.c: New test.
	* gcc.dg/pr96695-6.c: New test.
	* gcc.dg/pr96695-7.c: New test.
	* gcc.dg/pr96695-8.c: New test.
	* gcc.dg/pr96695-9.c: New test.

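A sketch of the folding opportunity as the middle end sees it (our
example, not one of the pr96695-*.c tests; GIMPLE treats pointer
comparisons as unsigned): a null pointer is the minimum value of a
pointer type, so the conjunction can never be true.

```
/* If x == 0 then x holds the minimum pointer value, so x > y is
   impossible; the expression should now fold to 0 for pointers
   just as it does for integers.  */
int min_value_check (const char *x, const char *y)
{
  return x == 0 && x > y;
}
```
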
2023-08-04  tree-optimization/110838 - less aggressively fold out-of-bound shifts  (Richard Biener; 1 file: +4/-0)

The following adjusts the shift simplification patterns to avoid
touching arithmetic right shifts of possibly negative values when the
shift amount is out of bounds. While simplifying those to zero isn't
wrong, it violates the principle of least surprise.

	PR tree-optimization/110838
	* match.pd (([rl]shift @0 out-of-bounds) -> zero): Restrict
	the arithmetic right-shift case to non-negative operands.

2023-08-04  Fix PR 110874: infinite loop in gimple_bitwise_inverted_equal_p with fre  (Andrew Pinski; 1 file: +17/-0)

This changes gimple_bitwise_inverted_equal_p to use 2 different match
patterns to try to match bit_not wrapped with a possible nop_convert
and a comparison also wrapped with a possible nop_convert. This is to
avoid being recursive.

OK? Bootstrapped and tested on x86_64-linux-gnu with no regressions.

gcc/ChangeLog:

	PR tree-optimization/110874
	* gimple-match-head.cc (gimple_bit_not_with_nop): New
	declaration.
	(gimple_maybe_cmp): Likewise.
	(gimple_bitwise_inverted_equal_p): Rewrite to use
	gimple_bit_not_with_nop and gimple_maybe_cmp instead of
	being recursive.
	* match.pd (bit_not_with_nop): New match pattern.
	(maybe_cmp): Likewise.

gcc/testsuite/ChangeLog:

	PR tree-optimization/110874
	* gcc.c-torture/compile/pr110874-a.c: New test.

2023-08-04  match.pd: Canonicalize (signed x << c) >> c [PR101955]  (Drew Ross; 1 file: +14/-6)

Canonicalizes (signed x << c) >> c into the lowest precision(type) - c
bits of x, if those bits have a mode precision or a precision of 1.
Also combines this rule with (unsigned x << c) >> c ->
x & ((unsigned)-1 >> c) to prevent a duplicate pattern.

	PR middle-end/101955
	* match.pd ((signed x << c) >> c): New canonicalization.
	* gcc.dg/pr101955.c: New test.

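Concretely (our example, assuming 32-bit int and GCC's arithmetic right
shift): with c = 24 the pattern leaves the low 8 bits sign-extended,
which behaves like a cast through signed char.

```
#include <assert.h>

/* (x << 24) >> 24 on a 32-bit signed int sign-extends the low
   8 bits, i.e. it acts like (signed char) x.  The unsigned shift
   avoids left-shifting a negative value.  */
int main (void)
{
  for (int x = -300; x <= 300; x++)
    assert ((((int) ((unsigned) x << 24)) >> 24) == (signed char) x);
  return 0;
}
```
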
2023-08-02  Fix `~X & X` and `~X | X` patterns  (Andrew Pinski; 1 file: +4/-2)

As Jakub noticed in
https://gcc.gnu.org/pipermail/gcc-patches/2023-August/626039.html,
what I did was not totally correct because it sometimes chose the
wrong type. So, to get back to what the original code did while
keeping the use of bitwise_inverted_equal_p, we just need to check
that the types of the two captures are the same. Also adds a testcase
for the problem Jakub found.

Committed as obvious after a bootstrap and test.

gcc/ChangeLog:

	* match.pd (`~X & X`): Check that the types match.
	(`~x | x`, `~x ^ x`): Likewise.

gcc/testsuite/ChangeLog:

	* gcc.c-torture/execute/20230802-1.c: New test.

2023-08-02  Move `~X & X` and `~X | X` over to use bitwise_inverted_equal_p  (Andrew Pinski; 1 file: +6/-22)

This is a simple patch to move these 2 patterns over to use
bitwise_inverted_equal_p. It also allows us to remove 2 other patterns
which were used on comparisons, as they are now handled by the
original pattern.

OK? Bootstrapped and tested on x86_64-linux-gnu with no regressions.

gcc/ChangeLog:

	* match.pd (`~X & X`, `~X | X`): Move over to use
	bitwise_inverted_equal_p, removing :c as
	bitwise_inverted_equal_p handles that already.
	Remove range test simplifications to true/false as they are
	now handled by these patterns.

2023-08-01  Fix PR 93044: extra cast is not removed  (Andrew Pinski; 1 file: +10/-0)

In this case we are not removing a convert to a bigger size and back
to the same size (or smaller) if the signedness does not match. For
example:
```
signed char _1;
...
_1 = *a_4(D);
b_5 = (short unsigned int) _1;
_2 = (unsigned char) b_5;
```
The inner cast is not needed and can be removed, but it was not. The
match pattern for removing the extra cast is overly complex, so I
decided to add a new case rather than trying to modify the current if
statement here.

Committed as approved. Bootstrapped and tested on x86_64-linux-gnu
with no regressions.

gcc/ChangeLog:

	PR tree-optimization/93044
	* match.pd (nested int casts): A truncation (to the same size
	or smaller) can always remove the inner cast.

gcc/testsuite/ChangeLog:

	PR tree-optimization/93044
	* gcc.dg/tree-ssa/cast-1.c: New test.
	* gcc.dg/tree-ssa/cast-2.c: New test.

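The equivalence behind the new case, checked exhaustively (our example,
not one of the new tests): the widening cast in the middle cannot
affect a final truncation to the same or smaller size.

```
#include <assert.h>

/* Casting signed char -> unsigned short -> unsigned char gives the
   same bits as casting signed char -> unsigned char directly, so
   the inner cast is removable regardless of signedness.  */
int main (void)
{
  for (int i = -128; i <= 127; i++)
    {
      signed char c = (signed char) i;
      assert ((unsigned char) (unsigned short) c == (unsigned char) c);
    }
  return 0;
}
```
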
2023-07-31  MATCH: Add `a == b | a cmp b` and `a != b & a cmp b` simplifications  (Andrew Pinski; 1 file: +30/-2)

Even though these are done by combine_comparisons, we can add them to
match to allow simplifications during match rather than just during
reassoc/ifcombine.

OK? Bootstrapped and tested on x86_64-linux-gnu with no regressions.

gcc/ChangeLog:

	PR tree-optimization/106164
	* match.pd (`a != b & a <= b`, `a != b & a >= b`,
	`a == b | a < b`, `a == b | a > b`): Handle these cases too.

gcc/testsuite/ChangeLog:

	PR tree-optimization/106164
	* gcc.dg/tree-ssa/cmpbit-2.c: New test.

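Two of the new simplifications, verified over a small range (our
example): `a == b | a < b` collapses to `a <= b`, and
`a != b & a <= b` to `a < b`.

```
#include <assert.h>

/* Exhaustive check of two of the combined-comparison identities.  */
int main (void)
{
  for (int a = -2; a <= 2; a++)
    for (int b = -2; b <= 2; b++)
      {
        assert (((a == b) | (a < b)) == (a <= b));
        assert (((a != b) & (a <= b)) == (a < b));
      }
  return 0;
}
```
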
2023-07-31  MATCH: PR 106164: Optimize `(X CMP1 Y) AND/IOR (X CMP2 Y)`  (Andrew Pinski; 1 file: +52/-14)

I noticed that there are patterns that optimize
`(X CMP1 CST1) AND/IOR (X CMP2 CST2)`, and we can easily extend them to
support `(X CMP1 Y) AND/IOR (X CMP2 Y)` by saying the operands compare
equal. This allows this kind of optimization for integral and pointer
types (which have the same semantics).

OK? Bootstrapped and tested on x86_64-linux-gnu with no regressions.

gcc/ChangeLog:

	PR tree-optimization/106164
	* match.pd: Extend the `(X CMP1 CST1) AND/IOR (X CMP2 CST2)`
	patterns to support `(X CMP1 Y) AND/IOR (X CMP2 Y)`.

gcc/testsuite/ChangeLog:

	PR tree-optimization/106164
	* gcc.dg/tree-ssa/cmpbit-1.c: New test.

2023-07-31  tree-optimization: [PR100864] `(a&!b) | b` is not optimized to `a | b` for comparisons  (Andrew Pinski; 1 file: +3/-2)

This is a new version of the patch. Instead of doing the matching of
the inverted comparison directly inside match, it creates a new
function (bitwise_inverted_equal_p) to do it. It is very similar to
bitwise_equal_p, which was added in r14-2751-g2a3556376c69a1fb, but
instead it says `expr1 == ~expr2`. A follow-on patch will use this
function in other patterns where we try to match `@0` and
`(bit_not @0)`.

Changed the name bitwise_not_equal_p to bitwise_inverted_equal_p.

Committed as approved after a bootstrap and test on x86_64-linux-gnu
with no regressions.

	PR tree-optimization/100864

gcc/ChangeLog:

	* generic-match-head.cc (bitwise_inverted_equal_p): New
	function.
	* gimple-match-head.cc (bitwise_inverted_equal_p): New macro.
	(gimple_bitwise_inverted_equal_p): New function.
	* match.pd ((~x | y) & x): Use bitwise_inverted_equal_p
	instead of directly matching bit_not.

gcc/testsuite/ChangeLog:

	* gcc.dg/tree-ssa/bitops-3.c: New test.

2023-07-27  tree-optimization/91838 - fix FAIL of g++.dg/opt/pr91838.C  (Richard Biener; 1 file: +10/-0)

The following fixes the lack of simplification of a vector shift by an
out-of-bounds shift value. For scalars this is done both by CCP and
VRP, but vectors are not handled there. This results in PR91838
differences in outcome dependent on whether a vector shift ISA is
available and thus vector lowering does or does not expose scalar
shifts here.

The following adds a match.pd pattern to catch uniform out-of-bound
shifts, simplifying them to zero when not sanitizing shift amounts.

	PR tree-optimization/91838
	* gimple-match-head.cc: Include attribs.h and asan.h.
	* generic-match-head.cc: Likewise.
	* match.pd (([rl]shift @0 out-of-bounds) -> zero): New pattern.

2023-07-24  match.pd: Implement missed optimization (~X | Y) ^ X -> ~(X & Y) [PR109986]  (Drew Ross; 1 file: +6/-0)

Adds a simplification for (~X | Y) ^ X to be folded into ~(X & Y).
Also adds the macro bitwise_equal_p for generic and gimple, which
returns true iff EXPR1 and EXPR2 have the same value. This helps to
reduce the number of nop_converts necessary to match the pattern.

	PR middle-end/109986

gcc/ChangeLog:

	* generic-match-head.cc (bitwise_equal_p): New macro.
	* gimple-match-head.cc (bitwise_equal_p): New macro.
	(gimple_nop_convert): Declare.
	(gimple_bitwise_equal_p): Helper for bitwise_equal_p.
	* match.pd ((~X | Y) ^ X -> ~(X & Y)): New simplification.

gcc/testsuite/ChangeLog:

	* gcc.c-torture/execute/pr109986.c: New test.
	* gcc.dg/tree-ssa/pr109986.c: New test.

Co-authored-by: Jakub Jelinek <jakub@redhat.com>

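The identity itself is easy to confirm bit by bit (our example, not the
commit's testcases):

```
#include <assert.h>

/* Per bit: (~x | y) ^ x is 0 only when x and y are both 1, which
   is exactly ~(x & y).  Check the low 4 bits exhaustively.  */
int main (void)
{
  for (unsigned x = 0; x < 16; x++)
    for (unsigned y = 0; y < 16; y++)
      assert ((((~x | y) ^ x) & 0xfu) == (~(x & y) & 0xfu));
  return 0;
}
```
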
2023-07-21  MATCH: Add Max<Max<a,b>,a> -> Max<a,b> simplification  (Andrew Pinski; 1 file: +5/-1)

This adds a simple match pattern to simplify `max<max<a,b>,a>` to
`max<a,b>`. Reassociation handles this already
(r0-77700-ge969dbde29bfd396259357), but it seems like we should be able
to handle this even before reassociation.

This fixes part of PR tree-optimization/80574, but more work is needed
to fix it the rest of the way: the original testcase there is fixed,
but the RTL level is what fixes it the rest of the way.

OK? Bootstrapped and tested on x86_64-linux-gnu.

gcc/ChangeLog:

	* match.pd (minmax<minmax<a,b>,a>->minmax<a,b>): New
	transformation.

gcc/testsuite/ChangeLog:

	* gcc.dg/tree-ssa/reassoc-12.c: Disable all of the passes that
	enable match-and-simplify.
	* gcc.dg/tree-ssa/minmax-23.c: New test.

2023-07-19  Fix PR110726: a | (a == b) can sometimes produce wrong code  (Andrew Pinski; 1 file: +9/-3)

So I had missed/forgotten that EQ_EXPR could have a non-boolean type
for GENERIC when I implemented r14-2556-g0407ae8a7732d9. This patch
adds a check for a one-bit precision integral type, which fixes the
problem.

OK? Bootstrapped and tested on x86_64-linux-gnu with no regressions.

	PR tree-optimization/110726

gcc/ChangeLog:

	* match.pd ((a|b)&(a==b),a|(a==b),(a&b)|(a==b)): Add checks to
	make sure the type is a one-bit precision integral type.

gcc/testsuite/ChangeLog:

	* gcc.c-torture/execute/bitops-1.c: New test.

2023-07-17  PR 95923: More (boolean) bitop simplifications in match.pd  (Andrew Pinski; 1 file: +15/-0)

This adds the boolean version of some of the simplifications that were
added with r8-4395-ge268a77b59cb78. Those are the following:

	(a | b) & (a == b) --> a & b
	a | (a == b)       --> a | (b ^ 1)
	(a & b) | (a == b) --> a == b

OK? Bootstrapped and tested on x86_64-linux-gnu with no regressions.

gcc/ChangeLog:

	PR tree-optimization/95923
	* match.pd ((a|b)&(a==b),a|(a==b),(a&b)|(a==b)): New
	transformation.

gcc/testsuite/ChangeLog:

	PR tree-optimization/95923
	* gcc.dg/tree-ssa/bitops-2.c: New test.
	* gcc.dg/tree-ssa/bool-checks-1.c: New test.

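All three boolean variants can be checked over the four input
combinations (our example; the commit's tests are bitops-2.c and
bool-checks-1.c):

```
#include <assert.h>

/* The three simplifications, valid for a, b in {0, 1}.  */
int main (void)
{
  for (int a = 0; a <= 1; a++)
    for (int b = 0; b <= 1; b++)
      {
        assert (((a | b) & (a == b)) == (a & b));
        assert ((a | (a == b)) == (a | (b ^ 1)));
        assert (((a & b) | (a == b)) == (a == b));
      }
  return 0;
}
```
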
2023-07-16  Fix PR 110666: `(a != 2) == a` produces wrong code  (Andrew Pinski; 1 file: +20/-14)

I had messed up the case where the outer operator is `==`. The check on
the result should have been for `==` and not `!=`. This patch fixes
that and adds a full runtime testcase for all cases to make sure it
works.

OK? Bootstrapped and tested on x86_64-linux-gnu with no regressions.

gcc/ChangeLog:

	PR tree-optimization/110666
	* match.pd (A NEEQ (A NEEQ CST)): Fix the outer EQ case.

gcc/testsuite/ChangeLog:

	PR tree-optimization/110666
	* gcc.c-torture/execute/pr110666-1.c: New test.

2023-07-13  Fix part of PR 110293: `A NEEQ (A NEEQ CST)` part  (Andrew Pinski; 1 file: +35/-4)

This fixes part of PR 110293 for the case where the outer comparison is
`!=` or `==`. In turn, PR 110539 is able to be optimized again, as the
if statement for `(a&1) == ((a & 1) != 0)` gets optimized to `false`
early enough to allow FRE/DOM to do a CSE for the memory store/load.

OK? Bootstrapped and tested on x86_64-linux with no regressions.

gcc/ChangeLog:

	PR tree-optimization/110293
	PR tree-optimization/110539
	* match.pd: Expand the `x != (typeof x)(x == 0)` pattern to
	handle where the inner and outer comparisons are either `!=`
	or `==` and handle other constants than 0.

gcc/testsuite/ChangeLog:

	* gcc.dg/tree-ssa/pr110293-1.c: New test.
	* gcc.dg/tree-ssa/pr110539-1.c: New test.
	* gcc.dg/tree-ssa/pr110539-2.c: New test.
	* gcc.dg/tree-ssa/pr110539-3.c: New test.
	* gcc.dg/tree-ssa/pr110539-4.c: New test.

2023-07-04  PR 110487: `(a !=/== CST1 ? CST2 : CST3)` pattern for type safety  (Andrew Pinski; 1 file: +8/-16)

The problem here is we might produce some values outside the type's
min/max (and/or valid values, e.g. signed booleans). The fix is to use
an integer type which has the same precision and signedness as the
original type.

Note two_value_replacement in phiopt had the same issue in previous
versions, though I don't know if a problem will show up there.

OK? Bootstrapped and tested on x86_64-linux-gnu.

gcc/ChangeLog:

	PR tree-optimization/110487
	* match.pd (a !=/== CST1 ? CST2 : CST3): Always build a
	nonstandard integer and use that.

2023-07-04  Fix PR 110487: invalid signed boolean value  (Andrew Pinski; 1 file: +20/-2)

This fixes the first part of this bug, where `a ? -1 : 0` would
introduce a value of 1 into a signed boolean. It fixes the problem by
casting to an integer type of the same size/signedness before doing
the negation, and then casting to the type of the expression.

OK? Bootstrapped and tested on x86_64.

gcc/ChangeLog:

	* match.pd (a?-1:0): Cast to an integer type rather than the
	original type before the negation.
	(a?0:-1): Likewise.

2023-07-04  middle-end/110495 - avoid associating constants with (VL) vectors  (Richard Biener; 1 file: +15/-13)

When trying to associate (v + INT_MAX) + INT_MAX we are using the
TREE_OVERFLOW bit to check for correctness. That isn't working for
VECTOR_CSTs, and it can't in general when one considers VL vectors. It
looks like it should work for COMPLEX_CSTs, but I didn't try to single
out _Complex int in this change.

The following makes sure that for vectors we use the fallback of using
unsigned arithmetic when associating the above to
v + (INT_MAX + INT_MAX).

	PR middle-end/110495
	* tree.h (TREE_OVERFLOW): Do not mention VECTOR_CSTs since we
	do not set TREE_OVERFLOW on those since the introduction of
	VL vectors.
	* match.pd (x +- CST +- CST): For VECTOR_CST do not look at
	TREE_OVERFLOW to determine validity of association.
	* gcc.dg/tree-ssa/addadd-2.c: Amend.
	* gcc.dg/tree-ssa/forwprop-27.c: Adjust.

2023-06-29  middle-end/110461 - pattern applying wrongly to vectors  (Richard Biener; 1 file: +1/-0)

The following guards a match.pd pattern that wasn't supposed to apply
to vectors and thus runs into TYPE_PRECISION checking. For vector
support the constant case is lacking, and the pattern would have
missing optab support checking for the result operation.

	PR middle-end/110461
	* match.pd (bitop (convert@2 @0) (convert?@3 @1)): Disable
	for VECTOR_TYPE_P.
	* gcc.dg/pr110461.c: New testcase.

2023-06-27  match.pd: Use element_mode instead of TYPE_MODE.  (Robin Dapp; 1 file: +4/-2)

This patch changes TYPE_MODE into element_mode in a match.pd
simplification. As the simplification can also be called with vector
types, real_can_shorten_arithmetic would ICE in REAL_MODE_FORMAT,
which expects a scalar mode. Therefore, use element_mode instead of
TYPE_MODE.

Additionally, check if the target supports the resulting operation.
One target that supports e.g. a float addition but not a _Float16
addition is the RISC-V vector extension Zvfhmin.

gcc/ChangeLog:

	* match.pd: Use element_mode and check if target supports
	operation with new type.

2023-06-23  [aarch64/match.pd] Fix ICE observed in PR110280.  (Prathamesh Kulkarni; 1 file: +8/-1)

gcc/ChangeLog:

	PR tree-optimization/110280
	* match.pd (vec_perm_expr(v, v, mask) -> v): Explicitly build
	vector using build_vector_from_val with the element of input
	operand, and mask's type if operand and mask's types don't
	match.

gcc/testsuite/ChangeLog:

	PR tree-optimization/110280
	* gcc.target/aarch64/sve/pr110280.c: New test.

2023-06-23  Use element_precision for match.pd arith conversion optimization  (Richard Biener; 1 file: +4/-4)

The simplification (outertype)((innertype0)a+(innertype1)b) to
((newtype)a+(newtype)b) ends up using TYPE_PRECISION to check whether
it can elide a conversion, but in some paths there can be VECTOR_TYPEs
where this instead compares the number of lanes. The following fixes
the missed optimizations and uses element_precision in those places.

	* match.pd ((outertype)((innertype0)a+(innertype1)b) ->
	((newtype)a+(newtype)b)): Use element_precision where
	appropriate.

2023-06-23  Bogus and missed folding on vector compares  (Richard Biener; 1 file: +2/-2)

fold_binary tries to transform (double)float1 CMP (double)float2 into
float1 CMP float2 but ends up using TYPE_PRECISION on the argument
types. For vector types that compares the number of lanes, which
should always be equal (so it's harmless as to not generating wrong
code). The following instead properly uses element_precision.

The same happens in the corresponding match.pd pattern.

	* fold-const.cc (fold_binary_loc): Use element_precision when
	trying (double)float1 CMP (double)float2 to float1 CMP float2
	simplification.
	* match.pd: Likewise.

2023-06-16  tree-optimization/110278 - uns < (typeof uns)(uns != 0) is always false  (Richard Biener; 1 file: +11/-0)

The following adds two patterns simplifying comparisons:
uns < (typeof uns)(uns != 0) is always false, and
x != (typeof x)(x == 0) is always true.

	PR tree-optimization/110278
	* match.pd (uns < (typeof uns)(uns != 0) -> false): New.
	(x != (typeof x)(x == 0) -> true): Likewise.

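Why both hold (our worked example): if uns is 0 the first compares
0 < 0; otherwise it compares uns < 1 with uns >= 1. For the second,
x == 0 gives 0 != 1, and x != 0 gives x != 0.

```
#include <assert.h>

/* uns < (uns != 0) is always false; x != (x == 0) is always true.  */
int main (void)
{
  for (unsigned uns = 0; uns < 8; uns++)
    assert (!(uns < (unsigned) (uns != 0)));
  for (int x = -3; x <= 3; x++)
    assert (x != (int) (x == 0));
  return 0;
}
```
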
2023-06-16  tree-optimization/110269 - restore missed condition folding  (Richard Biener; 1 file: +2/-2)

The following makes sure we optimize x != 0 using range info via
tree_expr_nonzero_p via match.pd.

	PR tree-optimization/110269
	* fold-const.cc (fold_binary_loc): Merge x != 0 folding with
	tree_expr_nonzero_p ...
	* match.pd (cmp (convert? addr@0) integer_zerop): With this
	pattern.
	* gcc.dg/tree-ssa/pr110269.c: New testcase.

2023-06-09  Add Plus to the op list of `(zero_one == 0) ? y : z <op> y` pattern  (Andrew Pinski; 1 file: +2/-2)

This adds plus to the op list of the `(zero_one == 0) ? y : z <op> y`
patterns, which currently have bit_ior and bit_xor. This shows up now
in GCC after the boolization work that Uroš has been doing.

OK? Bootstrapped and tested on x86_64-linux-gnu with no regressions.

	PR tree-optimization/97711
	PR tree-optimization/110155

gcc/ChangeLog:

	* match.pd ((zero_one == 0) ? y : z <op> y): Add plus to the op.
	((zero_one != 0) ? z <op> y : y): Likewise.

gcc/testsuite/ChangeLog:

	* gcc.dg/tree-ssa/branchless-cond-add-2.c: New test.
	* gcc.dg/tree-ssa/branchless-cond-add.c: New test.

2023-06-09  Change the `(zero_one ==/!= 0) ? y : z <op> y` patterns to use multiply rather than `(-zero_one) & z`  (Andrew Pinski; 1 file: +4/-4)

Since there is a pattern to convert `(-zero_one) & z` into
`zero_one * z` already, it is better if we don't do a secondary
transformation. This reduces the extra statements produced by
match-and-simplify on the gimple level too.

gcc/ChangeLog:

	* match.pd ((zero_one ==/!= 0) ? y : z <op> y): Use multiply
	rather than negation/bit_and.

2023-06-09  MATCH: Allow unsigned types for `X & -Y -> X * Y` pattern  (Andrew Pinski; 1 file: +4/-1)

This allows unsigned types if the inner type, where the negation is
located, has precision greater than or equal to that of the outer
type. branchless-cond.c needs to be updated since we now change it to
use a multiply rather than still having (-a)&c in there.

OK? Bootstrapped and tested on x86_64-linux-gnu.

gcc/ChangeLog:

	* match.pd (`X & -Y -> X * Y`): Allow for truncation and the
	same type for unsigned types.

gcc/testsuite/ChangeLog:

	* gcc.dg/tree-ssa/branchless-cond.c: Update testcase.

2023-06-09  MATCH: Fix zero_one_valued_p not to match signed 1 bit integers  (Andrew Pinski; 1 file: +10/-3)

So for the attached testcase, we assumed that zero_one_valued_p would
match only the range [0,1], but currently zero_one_valued_p also
matches signed 1-bit integers. This changes it to no longer match
those, fixing the 2 new testcases at all optimization levels.

OK? Bootstrapped and tested on x86_64-linux-gnu with no regressions.

Note the GCC 13 patch will be slightly different due to the changes
made to zero_one_valued_p.

	PR tree-optimization/110165
	PR tree-optimization/110166

gcc/ChangeLog:

	* match.pd (zero_one_valued_p): Don't accept signed 1-bit
	integers.

gcc/testsuite/ChangeLog:

	* gcc.c-torture/execute/pr110165-1.c: New test.
	* gcc.c-torture/execute/pr110166-1.c: New test.

2023-06-09  middle-end/110182 - TYPE_PRECISION on VECTOR_TYPE causes wrong-code  (Richard Biener; 1 file: +3/-3)

When folding two conversions in a row we use TYPE_PRECISION, but
that's invalid for VECTOR_TYPE. The following fixes this by using
element_precision instead.

	* match.pd (two conversions in a row): Use element_precision
	to DTRT for VECTOR_TYPE.

2023-06-07  MATCH: Fix comment for `(zero_one ==/!= 0) ? y : z <op> y` patterns  (Andrew Pinski; 1 file: +2/-2)

The patterns match more than just `a & 1`, so change the comment for
these two patterns to say that.

Committed as obvious after a bootstrap/test on x86_64-linux-gnu.

gcc/ChangeLog:

	* match.pd: Fix comment for the
	`(zero_one ==/!= 0) ? y : z <op> y` patterns.

2023-06-07  match.pd: Improve zero_one_valued_p  (Jakub Jelinek; 1 file: +2/-5)

Recently zero_one_valued_p was changed to handle the integer_zerop
case specially, because tree_nonzero_bits (@0) == 1 only returns true
for non-constant values with range [0, 1] or constant 1; constant 0
has tree_nonzero_bits (integer_zero_node) == 0. The following patch
reverts that change and instead checks that tree_nonzero_bits is
<= 1U.

2023-06-07  Jakub Jelinek  <jakub@redhat.com>

	* match.pd (zero_one_valued_p): Don't handle integer_zerop
	specially, instead compare tree_nonzero_bits <= 1U rather
	than just == 1.

2023-06-06  For the `-A CMP -B -> B CMP A` pattern allow EQ/NE for all integer types  (Andrew Pinski; 1 file: +6/-2)

I noticed while looking at a code generation issue that forwprop was
not handling `-a == 0` for unsigned types, and I was confused why it
was not. r6-1814-g66e1cacf608045 removed these from fold because they
were supposed to be handled by the match.pd patterns already, but it
was missed that the match.pd patterns checked TYPE_OVERFLOW_UNDEFINED,
while fold didn't do that for NE/EQ. This patch removes the
restriction on NE/EQ from TYPE_OVERFLOW_UNDEFINED.

OK? Bootstrapped and tested on x86_64-linux-gnu.

gcc/ChangeLog:

	PR tree-optimization/110134
	* match.pd (-A CMP -B -> B CMP A): Allow EQ/NE for all
	integer types.
	(-A CMP CST -> B CMP (-CST)): Likewise.

gcc/testsuite/ChangeLog:

	PR tree-optimization/110134
	* gcc.dg/tree-ssa/negneq-1.c: New test.
	* gcc.dg/tree-ssa/negneq-2.c: New test.
	* gcc.dg/tree-ssa/negneq-3.c: New test.
	* gcc.dg/tree-ssa/negneq-4.c: New test.

2023-06-06  Add match patterns for `a ? onezero : onezero` where one of the two operands is constant  (Andrew Pinski; 1 file: +18/-0)

This adds match patterns, for boolean values, that optimize
`a ? onezero : 0` to `a & onezero` and `a ? 1 : onezero` to
`a | onezero`. This was reported a few times, and I thought I would
finally add the match patterns for this. This hits a few times in GCC
itself too.

Notes on the testcases:
* phi-opt-2.c: This is now optimized to `a & b` in phiopt rather than
  ifcombine.
* phi-opt-25b.c: The test part that was failing was parity, which now
  gets the `x & y` treatment.
* ssa-thread-21.c: There is no longer a threading opportunity, so we
  need to disable phiopt. Note PR 109957 is filed for the now-missing
  optimization in that testcase too.

gcc/ChangeLog:

	PR tree-optimization/89263
	PR tree-optimization/99069
	PR tree-optimization/20083
	PR tree-optimization/94898
	* match.pd: Add patterns to optimize `a ? onezero : onezero`
	where one of the operands is constant.

gcc/testsuite/ChangeLog:

	* gcc.dg/tree-ssa/phi-opt-2.c: Adjust the testcase.
	* gcc.dg/tree-ssa/phi-opt-25b.c: Adjust the testcase.
	* gcc.dg/tree-ssa/ssa-thread-21.c: Disable phiopt.
	* gcc.dg/tree-ssa/phi-opt-27.c: New test.
	* gcc.dg/tree-ssa/phi-opt-28.c: New test.
	* gcc.dg/tree-ssa/phi-opt-29.c: New test.
	* gcc.dg/tree-ssa/phi-opt-30.c: New test.
	* gcc.dg/tree-ssa/phi-opt-31.c: New test.
	* gcc.dg/tree-ssa/phi-opt-32.c: New test.

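The two boolean identities behind the patterns (our example; see the
new phi-opt-*.c tests in the commit for the real coverage):

```
#include <assert.h>

/* For a and onezero both in {0, 1}:
   a ? onezero : 0  ==  a & onezero
   a ? 1 : onezero  ==  a | onezero  */
int main (void)
{
  for (int a = 0; a <= 1; a++)
    for (int onezero = 0; onezero <= 1; onezero++)
      {
        assert ((a ? onezero : 0) == (a & onezero));
        assert ((a ? 1 : onezero) == (a | onezero));
      }
  return 0;
}
```
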
2023-06-06  Match: zero_one_valued_p should match 0 constants too  (Andrew Pinski; 1 file: +5/-0)

While working on `bool0 ? bool1 : bool2`, I noticed that
zero_one_valued_p does not match the constant zero, as in that case
tree_nonzero_bits will return 0, and that is different from 1.

OK? Bootstrapped and tested on x86_64-linux-gnu with no regressions.

gcc/ChangeLog:

	* match.pd (zero_one_valued_p): Match 0 integer constant too.

2023-05-30  Add a != MIN/MAX_VALUE_CST ? CST-+1 : a to minmax_from_comparison  (Andrew Pinski; 1 file: +2/-2)

This patch adds to match the support that was implemented for PR 87913
in phiopt. It implements it by adding support to minmax_from_comparison
for the check. It uses the range information if available, which allows
producing a MIN/MAX expression when comparing against the lower/upper
bound of the range instead of the lower/upper bound of the type.

minmax-20.c is the new testcase which tests the ranges part.

OK? Bootstrapped and tested on x86_64-linux-gnu with no regressions.

gcc/ChangeLog:

	* fold-const.cc (minmax_from_comparison): Add support for
	NE_EXPR.
	* match.pd ((cond (cmp (convert1? x) c1) (convert2? x) c2)
	pattern): Add ne as a possible cmp.
	((a CMP b) ? minmax<a, c> : minmax<b, c> pattern): Likewise.

gcc/testsuite/ChangeLog:

	* gcc.dg/tree-ssa/minmax-22.c: New test.

2023-05-30  MATCH: Move `a <= CST1 ? MAX<a, CST2> : a` optimization to match  (Andrew Pinski; 1 file: +18/-0)

This moves the `a <= CST1 ? MAX<a, CST2> : a` optimization from phiopt
to match. It just adds a new pattern to match.pd. There is one more
change needed before being able to remove minmax_replacement from
phiopt.

A few notes on the testsuite changes:
* phi-opt-5.c is now able to optimize at phiopt1, so remove the xfail.
* pr66726-4.c can be optimized during fold before phiopt1, so the
  scanning needs to change.
* pr66726-5.c currently needs two phiopt passes to optimize to the
  right thing; it needed 2 phiopt passes before too, and the cast from
  int to unsigned char is the reason.
* pr66726-6.c is what the original pr66726-4.c was testing before fold
  was able to optimize it.

OK? Bootstrapped and tested on x86_64-linux-gnu.

gcc/ChangeLog:

	* match.pd (`(a CMP CST1) ? max<a,CST2> : a`): New pattern.

gcc/testsuite/ChangeLog:

	* gcc.dg/tree-ssa/phi-opt-5.c: Remove last xfail.
	* gcc.dg/tree-ssa/pr66726-4.c: Change how scanning works.
	* gcc.dg/tree-ssa/pr66726-5.c: New test.
	* gcc.dg/tree-ssa/pr66726-6.c: New test.

2023-05-29  Fix artificial overflow during GENERIC folding  (Eric Botcazou; 1 file: +9/-0)

The Ada compiler gives a bogus warning:

storage_offset1.ads:16:52: warning: Constraint_Error will be raised at
run time [enabled by default]

Ironically enough, this occurs because of an intermediate conversion
to an unsigned type, which is supposed to hide overflows but is
counter-productive for constants, because TREE_OVERFLOW is always set
for them, so it ends up setting a bogus TREE_OVERFLOW when converting
back to the original type.

The fix simply redirects INTEGER_CSTs to the other, direct path
without the intermediate conversion to the unsigned type.

gcc/
	* match.pd ((T)P - (T)(P + A) -> -(T) A): Avoid artificial
	overflow on constants.

gcc/testsuite/
	* gnat.dg/specs/storage_offset1.ads: New test.

2023-05-24  PR middle-end/109840: Preserve popcount/parity type in match.pd.  (Roger Sayle; 1 file: +17/-10)

PR middle-end/109840 is a regression introduced by my recent patch to
fold popcount(bswap(x)) as popcount(x). When the bswap and the
popcount have the same precision, everything works fine, but this
optimization also allowed a zero-extension between the two. The
oversight is that we need to be strict with type conversions, both to
avoid accidentally changing the argument type to popcount, and also to
reflect the effects of argument/return-value promotion in the call to
bswap, so this zero extension needs to be preserved/explicit in the
optimized form.

Interestingly, match.pd should (in theory) be able to narrow calls to
popcount and parity, removing a zero-extension from its argument, but
that is an independent optimization that needs to check IFN_ support.

Many thanks to Andrew Pinski for his help/fixes with these
transformations.

2023-05-24  Roger Sayle  <roger@nextmovesoftware.com>

gcc/ChangeLog
	PR middle-end/109840
	* match.pd <popcount optimizations>: Preserve zero-extension
	when optimizing popcount((T)bswap(x)) and
	popcount((T)rotate(x,y)) as popcount((T)x), so the popcount's
	argument keeps the same type.
	<parity optimizations>: Likewise preserve extensions when
	simplifying parity((T)bswap(x)) and parity((T)rotate(x,y)) as
	parity((T)x), so that the parity's argument type is the same.

gcc/testsuite/ChangeLog
	PR middle-end/109840
	* gcc.dg/fold-parity-8.c: New test.
	* gcc.dg/fold-popcount-11.c: Likewise.

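The underlying fact being exploited, in a small example of ours (not
one of the commit's tests): byte-swapping only permutes bits, so the
population count is unchanged; the commit's point is that any
zero-extension between the two calls must stay attached to the
argument in the simplified form.

```
#include <assert.h>

/* bswap permutes bytes, so the popcount is unchanged when the
   precisions match; 0x12345678 has 13 set bits either way.  */
int main (void)
{
  unsigned x = 0x12345678u;
  assert (__builtin_popcount (__builtin_bswap32 (x))
          == __builtin_popcount (x));
  return 0;
}
```
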
2023-05-21  match.pd: Ensure (op CONSTANT_CLASS_P CONSTANT_CLASS_P) is simplified [PR109505]  (Jakub Jelinek; 1 file: +10/-10)

On the following testcase we hang, because POLY_INT_CST is
CONSTANT_CLASS_P, but BIT_AND_EXPR with it and INTEGER_CST doesn't
simplify, and the (x | CST1) & CST2 -> (x & CST2) | (CST1 & CST2)
simplification actually relies on the (CST1 & CST2) simplification;
otherwise it is a deoptimization, trading 2 ops for 3 and furthermore
running into the

/* Given a bit-wise operation CODE applied to ARG0 and ARG1, see if
   both operands are another bit-wise operation with a common input.
   If so, distribute the bit operations to save an operation and
   possibly two if constants are involved.  For example, convert
	(A | B) & (A | C) into A | (B & C)
   Further simplification will occur if B and C are constants.  */

simplification, which simplifies that (x & CST2) | (CST1 & CST2) back
to CST2 & (x | CST1).

I went through all other places I could find where we have a
simplification with 2 CONSTANT_CLASS_P operands and perform some
operation on those two. While the other spots aren't that severe (they
just trade 2 operations for another 2 if the two constants don't
simplify, rather than, as in the above case, trading 2 ops for 3), I
still think all those spots really intend to optimize only if the 2
constants simplify. So, the following patch adds a ! modifier to those
to ensure that; even at GENERIC, that modifier means !EXPR_P, which is
exactly what we want IMHO.

2023-05-21  Jakub Jelinek  <jakub@redhat.com>

	PR tree-optimization/109505
	* match.pd ((x | CST1) & CST2 -> (x & CST2) | (CST1 & CST2),
	Combine successive equal operations with constants,
	(A +- CST1) +- CST2 -> A + CST3,
	(CST1 - A) +- CST2 -> CST3 - A,
	CST1 - (CST2 - A) -> CST3 + A): Use ! on ops with 2
	CONSTANT_CLASS_P operands.
	* gcc.target/aarch64/sve/pr109505.c: New test.

2023-05-16  MATCH: [PR109424] Simplify min/max of boolean arguments  (Andrew Pinski; 1 file: +8/-0)

This is version 2 of
https://gcc.gnu.org/pipermail/gcc-patches/2021-August/577394.html,
which does not depend on adding gimple_truth_valued_p at this point.
Instead it will use zero_one_valued_p, which is already used for mult
simplifications, to make sure that we only have [0,1] rather than
making the mistake of maybe having [-1,0] as the range for signed
bools.

This shows up in a few places in GCC itself, but only at -O1; we miss
the min/max conversion because of PR 107888 (which I will be testing
separately).

OK? Bootstrapped and tested on x86_64-linux-gnu with no regressions.

Thanks,
Andrew Pinski

	PR tree-optimization/109424

gcc/ChangeLog:

	* match.pd: Add patterns for min/max of zero_one_valued
	values to `&`/`|`.

gcc/testsuite/ChangeLog:

	* gcc.dg/tree-ssa/bool-12.c: New test.
	* gcc.dg/tree-ssa/bool-13.c: New test.
	* gcc.dg/tree-ssa/minmax-20.c: New test.
	* gcc.dg/tree-ssa/minmax-21.c: New test.

2023-05-14  MATCH: Add pattern for `signbit(x) ? x : -x` into abs (and swapped)  (Andrew Pinski; 1 file: +10/-0)

This adds a simple pattern to match.pd for `signbit(x) ? x : -x` into
abs<x>. This can be done for all types, even ones that honor signed
zeros and NaNs, because both signbit and negation only look at/touch
the sign bit of those types, and neither traps.

OK? Bootstrapped and tested on x86_64-linux-gnu with no regressions.

	PR tree-optimization/109829

gcc/ChangeLog:

	* match.pd: Add pattern for `signbit(x) !=/== 0 ? x : -x`.

gcc/testsuite/ChangeLog:

	* gcc.dg/tree-ssa/abs-3.c: New test.
	* gcc.dg/tree-ssa/abs-4.c: New test.

2023-05-12  MATCH: Fix PR 109834, ICE with popcount combined with bswap  (Andrew Pinski; 1 file: +2/-2)

After r14-673-gc0dd80e4c4c3, there was a check in the match patterns
which was checking that the type is unsigned, but instead of using the
type, the patch used the expression. This adds the needed TREE_TYPE so
we get the correct answer and don't ICE.

Committed as obvious after a bootstrap/test on x86_64-linux-gnu.

	PR tree-optimization/109834

gcc/ChangeLog:

	* match.pd (popcount(bswap(x))->popcount(x)): Fix up unsigned
	type checking.
	(popcount(rotate(x,y))->popcount(x)): Likewise.

gcc/testsuite/ChangeLog:

	* gcc.c-torture/compile/pr109834-1.c: New test.
	* gcc.dg/tree-ssa/pr109834-1.c: New test.

2023-05-12  tree-optimization/109791 - simplify (unsigned)&foo - (unsigned)(&foo + o)  (Richard Biener; 1 file: +12/-0)

The following adds another variant of address difference
simplification. The utility ptr_difference_const only handles constant
differences (we also cannot code generate anything else), so exposing
a possible POINTER_PLUS_EXPR in the match and computing the difference
on the base only makes it possible to handle one case of a variable
offset. This simplifies

	(unsigned long) &MEM <char[3]> [(void *)&str + 2B]
	- (unsigned long) (&str + (_69 + 1))

down to (1 - (unsigned long) _69) during niter analysis, allowing
ranger to eliminate a condition later and avoiding a bogus
-Wstringop-overflow diagnostic for the testcase in the PR.

	PR tree-optimization/109791
	* match.pd (minus (convert ADDR_EXPR@0)
	(convert (pointer_plus @1 @2))): New pattern.
	(minus (convert (pointer_plus @1 @2))
	(convert ADDR_EXPR@0)): Likewise.

2023-05-11  aarch64: convert vector shift + bitwise and + multiply to vector compare  (mtsamis; 1 file: +61/-0)

When using SWAR (SIMD within a register) techniques, a comparison
operation within such a register can be made by using a combination of
shifts, bitwise and, and multiplication. If code using this scheme is
vectorized, then there is potential to replace all these operations
with a single vector comparison, by reinterpreting the vector types to
match the width of the SWAR register.

For example, for the test function packed_cmp_16_32, the original
generated code is:

	ldr     q0, [x0]
	add     w1, w1, 1
	ushr    v0.4s, v0.4s, 15
	and     v0.16b, v0.16b, v2.16b
	shl     v1.4s, v0.4s, 16
	sub     v0.4s, v1.4s, v0.4s
	str     q0, [x0], 16
	cmp     w2, w1
	bhi     .L20

With this pattern the above can be optimized to:

	ldr     q0, [x0]
	add     w1, w1, 1
	cmlt    v0.8h, v0.8h, #0
	str     q0, [x0], 16
	cmp     w2, w1
	bhi     .L20

The effect is similar for x86-64.

Bootstrapped and reg-tested for x86 and aarch64.

gcc/ChangeLog:

	* match.pd: simplify vector shift + bit_and + multiply.

gcc/testsuite/ChangeLog:

	* gcc.target/aarch64/swar_to_vec_cmp.c: New test.

Signed-off-by: Manolis Tsamis <manolis.tsamis@vrull.eu>
Signed-off-by: Philipp Tomsich <philipp.tomsich@vrull.eu>

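A scalar sketch of the SWAR idiom being recognized (our reconstruction,
not the testcase itself, though the commit names packed_cmp_16_32):
shift each lane's sign bit down, mask one bit per lane, and multiply to
broadcast it across the lane, which is what a per-lane "less than zero"
compare produces.

```
#include <assert.h>
#include <stdint.h>

/* Two 16-bit lanes packed in a 32-bit word: produce 0xffff in each
   lane whose sign bit was set, mirroring cmlt ... #0 on vectors.  */
static uint32_t packed_cmp_16_32 (uint32_t a)
{
  return ((a >> 15) & 0x10001u) * 0xffffu;
}

int main (void)
{
  assert (packed_cmp_16_32 (0x00008000u) == 0x0000ffffu); /* low lane negative */
  assert (packed_cmp_16_32 (0x80000000u) == 0xffff0000u); /* high lane negative */
  assert (packed_cmp_16_32 (0x7fff7fffu) == 0u);          /* neither negative */
  return 0;
}
```
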
2023-05-11  match.pd: Simplify popcount(X&Y)+popcount(X|Y) as popcount(X)+popcount(Y)  (Roger Sayle; 1 file: +19/-0)

This patch teaches match.pd to simplify popcount(X&Y)+popcount(X|Y) as
popcount(X)+popcount(Y), and the related simplification that
popcount(X)+popcount(Y)-popcount(X&Y) is popcount(X|Y). As surprising
as it might seem, this idiom is common in cheminformatics codes (for
Tanimoto coefficient calculations).

2023-05-11  Roger Sayle  <roger@nextmovesoftware.com>

gcc/ChangeLog
	* match.pd <popcount optimizations>: Simplify popcount(X|Y) +
	popcount(X&Y) as popcount(X)+popcount(Y). Likewise, simplify
	popcount(X)+popcount(Y)-popcount(X&Y) as popcount(X|Y), and
	vice versa.

gcc/testsuite/ChangeLog
	* gcc.dg/fold-popcount-8.c: New test case.
	* gcc.dg/fold-popcount-9.c: Likewise.
	* gcc.dg/fold-popcount-10.c: Likewise.

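The identity is a counting argument (our check, separate from the
commit's tests): a bit set in both x and y is counted twice on each
side, and a bit set in exactly one of them is counted once on each
side.

```
#include <assert.h>

/* popcount(x & y) + popcount(x | y) == popcount(x) + popcount(y).  */
int main (void)
{
  for (unsigned x = 0; x < 16; x++)
    for (unsigned y = 0; y < 16; y++)
      assert (__builtin_popcount (x & y) + __builtin_popcount (x | y)
              == __builtin_popcount (x) + __builtin_popcount (y));
  return 0;
}
```
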