path: root/gcc/match.pd
Age | Commit message | Author | Files | Lines
2023-06-23[aarch64/match.pd] Fix ICE observed in PR110280.Prathamesh Kulkarni1-1/+8
gcc/ChangeLog: PR tree-optimization/110280 * match.pd (vec_perm_expr(v, v, mask) -> v): Explicitly build vector using build_vector_from_val with the element of input operand, and mask's type if operand and mask's types don't match. gcc/testsuite/ChangeLog: PR tree-optimization/110280 * gcc.target/aarch64/sve/pr110280.c: New test.
2023-06-23Use element_precision for match.pd arith conversion optimizationRichard Biener1-4/+4
The simplification (outertype)((innertype0)a+(innertype1)b) to ((newtype)a+(newtype)b) ends up using TYPE_PRECISION to check whether it can elide a conversion but in some paths there can be VECTOR_TYPEs where this instead compares the number of lanes. The following fixes the missed optimizations and uses element_precision in those places. * match.pd ((outertype)((innertype0)a+(innertype1)b) -> ((newtype)a+(newtype)b)): Use element_precision where appropriate.
2023-06-23Bogus and missed folding on vector comparesRichard Biener1-2/+2
fold_binary tries to transform (double)float1 CMP (double)float2 into float1 CMP float2 but ends up using TYPE_PRECISION on the argument types. For vector types that compares the number of lanes, which should always be equal (so it's harmless in that it does not generate wrong code). The following instead properly uses element_precision. The same happens in the corresponding match.pd pattern. * fold-const.cc (fold_binary_loc): Use element_precision when trying (double)float1 CMP (double)float2 to float1 CMP float2 simplification. * match.pd: Likewise.
2023-06-16tree-optimization/110278 - uns < (typeof uns)(uns != 0) is always falseRichard Biener1-0/+11
The following adds two patterns simplifying comparisons, uns < (typeof uns)(uns != 0) is always false and x != (typeof x)(x == 0) is always true. PR tree-optimization/110278 * match.pd (uns < (typeof uns)(uns != 0) -> false): New. (x != (typeof x)(x == 0) -> true): Likewise.
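An illustrative C sketch (not from the commit) of what the two new patterns fold; with them in place both results become compile-time constants:
  /* For unsigned u, (unsigned)(u != 0) is at most 1 and never exceeds u,
     and x can never equal (int)(x == 0), so both results are constant.  */
  int cmp1 (unsigned u) { return u < (unsigned) (u != 0); }  /* always 0 */
  int cmp2 (int x)      { return x != (int) (x == 0); }      /* always 1 */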
2023-06-16tree-optimization/110269 - restore missed condition foldingRichard Biener1-2/+2
The following makes sure we optimize x != 0 using range info via tree_expr_nonzero_p via match.pd. PR tree-optimization/110269 * fold-const.cc (fold_binary_loc): Merge x != 0 folding with tree_expr_nonzero_p ... * match.pd (cmp (convert? addr@0) integer_zerop): With this pattern. * gcc.dg/tree-ssa/pr110269.c: New testcase.
2023-06-09Add Plus to the op list of `(zero_one == 0) ? y : z <op> y` patternAndrew Pinski1-2/+2
This adds plus to the op list of `(zero_one == 0) ? y : z <op> y` patterns which currently has bit_ior and bit_xor. This shows up now in GCC after the boolization work that Uroš has been doing. OK? Bootstrapped and tested on x86_64-linux-gnu with no regressions. PR tree-optimization/97711 PR tree-optimization/110155 gcc/ChangeLog: * match.pd ((zero_one == 0) ? y : z <op> y): Add plus to the op. ((zero_one != 0) ? z <op> y : y): Likewise. gcc/testsuite/ChangeLog: * gcc.dg/tree-ssa/branchless-cond-add-2.c: New test. * gcc.dg/tree-ssa/branchless-cond-add.c: New test.
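An illustrative sketch (not from the commit) of the newly covered plus case; zero_one is assumed to hold only 0 or 1:
  /* Illustration: with plus added to the op list, the conditional can be
     rewritten branchlessly, roughly as  y + zero_one * z  (given the
     companion change that uses multiply instead of negation/bit_and).  */
  int f (int zero_one, int y, int z)
  {
    return (zero_one == 0) ? y : z + y;
  }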
2023-06-09Change the `(zero_one ==/!= 0) ? y : z <op> y` patterns to use multiply rather than `(-zero_one) & z`Andrew Pinski1-4/+4
Since there is a pattern to convert `(-zero_one) & z` into `zero_one * z` already, it is better if we don't do a secondary transformation. This reduces the extra statements produced by match-and-simplify on the gimple level too. gcc/ChangeLog: * match.pd ((zero_one ==/!= 0) ? y : z <op> y): Use multiply rather than negation/bit_and.
2023-06-09MATCH: Allow unsigned types for `X & -Y -> X * Y` patternAndrew Pinski1-1/+4
This allows unsigned types if the inner type, where the negation is located, has precision greater than or equal to that of the outer type. branchless-cond.c needs to be updated since now we change it to use a multiply rather than still having (-a)&c in there. OK? Bootstrapped and tested on x86_64-linux-gnu. gcc/ChangeLog: * match.pd (`X & -Y -> X * Y`): Allow for truncation and the same type for unsigned types. gcc/testsuite/ChangeLog: * gcc.dg/tree-ssa/branchless-cond.c: Update testcase.
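A rough sketch (not from the commit) of the newly allowed unsigned case, where the negation happens in a wider unsigned type and is then truncated:
  /* Illustration: b is assumed to be 0 or 1.  The inner type of the
     negation (unsigned int) has at least the precision of the outer type
     (unsigned char), so  x & (unsigned char) -b  can become
     x * (unsigned char) b.  */
  unsigned char f (unsigned char x, unsigned int b)
  {
    return x & (unsigned char) -b;
  }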
2023-06-09MATCH: Fix zero_one_valued_p not to match signed 1 bit integersAndrew Pinski1-3/+10
So for the attached testcase, we assumed that zero_one_valued_p would only match values in the range [0,1], but currently zero_one_valued_p also matches signed 1-bit integers. This changes it not to match those and fixes the 2 new testcases at all optimization levels. OK? Bootstrapped and tested on x86_64-linux-gnu with no regressions. Note the GCC 13 patch will be slightly different due to the changes made to zero_one_valued_p. PR tree-optimization/110165 PR tree-optimization/110166 gcc/ChangeLog: * match.pd (zero_one_valued_p): Don't accept signed 1-bit integers. gcc/testsuite/ChangeLog: * gcc.c-torture/execute/pr110165-1.c: New test. * gcc.c-torture/execute/pr110166-1.c: New test.
2023-06-09middle-end/110182 - TYPE_PRECISION on VECTOR_TYPE causes wrong-codeRichard Biener1-3/+3
When folding two conversions in a row we use TYPE_PRECISION but that's invalid for VECTOR_TYPE. The following fixes this by using element_precision instead. * match.pd (two conversions in a row): Use element_precision to DTRT for VECTOR_TYPE.
2023-06-07MATCH: Fix comment for `(zero_one ==/!= 0) ? y : z <op> y` patternsAndrew Pinski1-2/+2
The patterns match more than just `a & 1` so change the comment for these two patterns to say that. Committed as obvious after a bootstrap/test on x86_64-linux-gnu. gcc/ChangeLog: * match.pd: Fix comment for the `(zero_one ==/!= 0) ? y : z <op> y` patterns.
2023-06-07match.pd: Improve zero_one_valued_pJakub Jelinek1-5/+2
Recently zero_one_valued_p was changed to handle integer_zerop case specially, because tree_nonzero_bits (@0) == 1 only returns true for non-constant values with range [0, 1] or constant 1, constant 0 has tree_nonzero_bits (integer_zero_node) == 0. The following patch reverts that change and instead checks that tree_nonzero_bits is <= 1U. 2023-06-07 Jakub Jelinek <jakub@redhat.com> * match.pd (zero_one_valued_p): Don't handle integer_zerop specially, instead compare tree_nonzero_bits <= 1U rather than just == 1.
2023-06-06For the `-A CMP -B -> B CMP A` pattern allow EQ/NE for all integer typesAndrew Pinski1-2/+6
I noticed while looking at some code generation issue, that forwprop was not handling `-a == 0` for unsigned types and I was confused why it was not. r6-1814-g66e1cacf608045 removed these from fold because they were supposed to be already handled by the match.pd patterns but it was missed that the match.pd patterns checked TYPE_OVERFLOW_UNDEFINED while fold didn't do that for NE/EQ. This patch removes the restriction on NE/EQ on TYPE_OVERFLOW_UNDEFINED. OK? Bootstrapped and tested on x86_64-linux-gnu. gcc/ChangeLog: PR tree-optimization/110134 * match.pd (-A CMP -B -> B CMP A): Allow EQ/NE for all integer types. (-A CMP CST -> B CMP (-CST)): Likewise. gcc/testsuite/ChangeLog: PR tree-optimization/110134 * gcc.dg/tree-ssa/negneq-1.c: New test. * gcc.dg/tree-ssa/negneq-2.c: New test. * gcc.dg/tree-ssa/negneq-3.c: New test. * gcc.dg/tree-ssa/negneq-4.c: New test.
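An illustrative sketch (not from the commit) of the unsigned EQ/NE forms that were previously missed:
  /* Illustration: these now fold even for wrapping unsigned types.  */
  int f1 (unsigned a)             { return -a == 0; }    /* -> a == 0 */
  int f2 (unsigned a, unsigned b) { return -a != -b; }   /* -> b != a */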
2023-06-06Add match patterns for `a ? onezero : onezero` where one of the two operands are constantAndrew Pinski1-0/+18
This adds match patterns for boolean values that optimize `a ? onezero : 0` to `a & onezero` and `a ? 1 : onezero` to `a | onezero`. This was reported a few times and I thought I would finally add the match patterns for this. This hits a few times in GCC itself too. Notes on the testcases: * phi-opt-2.c: This is now optimized to `a & b` in phiopt rather than ifcombine. * phi-opt-25b.c: The test part that was failing was parity, which now gets `x & y` treatment. * ssa-thread-21.c: there is no longer a threading opportunity, so we need to disable phiopt. Note PR 109957 is filed for the now missing optimization in that testcase too. gcc/ChangeLog: PR tree-optimization/89263 PR tree-optimization/99069 PR tree-optimization/20083 PR tree-optimization/94898 * match.pd: Add patterns to optimize `a ? onezero : onezero` when one of the operands is constant. gcc/testsuite/ChangeLog: * gcc.dg/tree-ssa/phi-opt-2.c: Adjust the testcase. * gcc.dg/tree-ssa/phi-opt-25b.c: Adjust the testcase. * gcc.dg/tree-ssa/ssa-thread-21.c: Disable phiopt. * gcc.dg/tree-ssa/phi-opt-27.c: New test. * gcc.dg/tree-ssa/phi-opt-28.c: New test. * gcc.dg/tree-ssa/phi-opt-29.c: New test. * gcc.dg/tree-ssa/phi-opt-30.c: New test. * gcc.dg/tree-ssa/phi-opt-31.c: New test. * gcc.dg/tree-ssa/phi-opt-32.c: New test.
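An illustrative sketch (not from the commit) of the boolean case these patterns target:
  /* Illustration: for values already known to be 0 or 1, the conditional
     forms collapse to plain bit operations.  */
  _Bool f1 (_Bool a, _Bool b) { return a ? b : 0; }   /* -> a & b */
  _Bool f2 (_Bool a, _Bool b) { return a ? 1 : b; }   /* -> a | b */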
2023-06-06Match: zero_one_valued_p should match 0 constants tooAndrew Pinski1-0/+5
While working on `bool0 ? bool1 : bool2` I noticed that zero_one_valued_p does not match on the constant zero as in that case tree_nonzero_bits will return 0 and that is different from 1. OK? Bootstrapped and tested on x86_64-linux-gnu with no regressions. gcc/ChangeLog: * match.pd (zero_one_valued_p): Match 0 integer constant too.
2023-05-30Add a != MIN/MAX_VALUE_CST ? CST-+1 : a to minmax_from_comparisonAndrew Pinski1-2/+2
This patch adds support in match for the transformation that was implemented for PR 87913 in phiopt. It implements it by adding support to minmax_from_comparison for the check. It uses the range information if available, which allows producing a MIN/MAX expression when comparing against the lower/upper bound of the range instead of the lower/upper bound of the type. minmax-20.c is the new testcase which tests the ranges part. OK? Bootstrapped and tested on x86_64-linux-gnu with no regressions. gcc/ChangeLog: * fold-const.cc (minmax_from_comparison): Add support for NE_EXPR. * match.pd ((cond (cmp (convert1? x) c1) (convert2? x) c2) pattern): Add ne as a possible cmp. ((a CMP b) ? minmax<a, c> : minmax<b, c> pattern): Likewise. gcc/testsuite/ChangeLog: * gcc.dg/tree-ssa/minmax-22.c: New test.
2023-05-30MATCH: Move `a <= CST1 ? MAX<a, CST2> : a` optimization to matchAndrew Pinski1-0/+18
This moves the `a <= CST1 ? MAX<a, CST2> : a` optimization from phiopt to match. It just adds a new pattern to match.pd. There is one more change needed before being able to remove minmax_replacement from phiopt. A few notes on the testsuite changes: * phi-opt-5.c is now able to optimize at phiopt1 so remove the xfail. * pr66726-4.c can be optimized during fold before phiopt1 so need to change the scanning. * pr66726-5.c needs two phiopt passes currently to optimize to the right thing, it needed 2 phiopt passes before, the cast from int to unsigned char is the reason. * pr66726-6.c is what the original pr66726-4.c was testing before the fold was able to optimize it. OK? Bootstrapped and tested on x86_64-linux-gnu. gcc/ChangeLog: * match.pd (`(a CMP CST1) ? max<a,CST2> : a`): New pattern. gcc/testsuite/ChangeLog: * gcc.dg/tree-ssa/phi-opt-5.c: Remove last xfail. * gcc.dg/tree-ssa/pr66726-4.c: Change how scanning works. * gcc.dg/tree-ssa/pr66726-5.c: New test. * gcc.dg/tree-ssa/pr66726-6.c: New test.
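An illustrative sketch (not from the commit; the constants are arbitrary, chosen so that CST2 <= CST1):
  /* Illustration:  a <= 42 ? MAX (a, 13) : a  is just  MAX (a, 13),
     because for a > 42 the max is a anyway.  */
  int f (int a)
  {
    return a <= 42 ? (a > 13 ? a : 13) : a;
  }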
2023-05-29Fix artificial overflow during GENERIC foldingEric Botcazou1-0/+9
The Ada compiler gives a bogus warning: storage_offset1.ads:16:52: warning: Constraint_Error will be raised at run time [enabled by default] Ironically enough, this occurs because of an intermediate conversion to an unsigned type which is supposed to hide overflows but is counter-productive for constants because TREE_OVERFLOW is always set for them, so it ends up setting a bogus TREE_OVERFLOW when converting back to the original type. The fix simply redirects INTEGER_CSTs to the other, direct path without the intermediate conversion to the unsigned type. gcc/ * match.pd ((T)P - (T)(P + A) -> -(T) A): Avoid artificial overflow on constants. gcc/testsuite/ * gnat.dg/specs/storage_offset1.ads: New test.
2023-05-24PR middle-end/109840: Preserve popcount/parity type in match.pd.Roger Sayle1-10/+17
PR middle-end/109840 is a regression introduced by my recent patch to fold popcount(bswap(x)) as popcount(x). When the bswap and the popcount have the same precision, everything works fine, but this optimization also allowed a zero-extension between the two. The oversight is that we need to be strict with type conversions, both to avoid accidentally changing the argument type to popcount, and also to reflect the effects of argument/return-value promotion in the call to bswap, so this zero extension needs to be preserved/explicit in the optimized form. Interestingly, match.pd should (in theory) be able to narrow calls to popcount and parity, removing a zero-extension from its argument, but that is an independent optimization, that needs to check IFN_ support. Many thanks to Andrew Pinski for his help/fixes with these transformations. 2023-05-24 Roger Sayle <roger@nextmovesoftware.com> gcc/ChangeLog PR middle-end/109840 * match.pd <popcount optimizations>: Preserve zero-extension when optimizing popcount((T)bswap(x)) and popcount((T)rotate(x,y)) as popcount((T)x), so the popcount's argument keeps the same type. <parity optimizations>: Likewise preserve extensions when simplifying parity((T)bswap(x)) and parity((T)rotate(x,y)) as parity((T)x), so that the parity's argument type is the same. gcc/testsuite/ChangeLog PR middle-end/109840 * gcc.dg/fold-parity-8.c: New test. * gcc.dg/fold-popcount-11.c: Likewise.
2023-05-21match.pd: Ensure (op CONSTANT_CLASS_P CONSTANT_CLASS_P) is simplified [PR109505]Jakub Jelinek1-10/+10
On the following testcase we hang, because POLY_INT_CST is CONSTANT_CLASS_P, but BIT_AND_EXPR with it and INTEGER_CST doesn't simplify and the (x | CST1) & CST2 -> (x & CST2) | (CST1 & CST2) simplification actually relies on the (CST1 & CST2) simplification, otherwise it is a deoptimization, trading 2 ops for 3 and furthermore running into /* Given a bit-wise operation CODE applied to ARG0 and ARG1, see if both operands are another bit-wise operation with a common input. If so, distribute the bit operations to save an operation and possibly two if constants are involved. For example, convert (A | B) & (A | C) into A | (B & C) Further simplification will occur if B and C are constants. */ simplification which simplifies that (x & CST2) | (CST1 & CST2) back to CST2 & (x | CST1). I went through all other places I could find where we have a simplification with 2 CONSTANT_CLASS_P operands and perform some operation on those two, while the other spots aren't that severe (just trade 2 operations for another 2 if the two constants don't simplify, rather than as in the above case trading 2 ops for 3), I still think all those spots really intend to optimize only if the 2 constants simplify. So, the following patch adds to those a ! modifier to ensure that, even at GENERIC that modifier means !EXPR_P which is exactly what we want IMHO. 2023-05-21 Jakub Jelinek <jakub@redhat.com> PR tree-optimization/109505 * match.pd ((x | CST1) & CST2 -> (x & CST2) | (CST1 & CST2), Combine successive equal operations with constants, (A +- CST1) +- CST2 -> A + CST3, (CST1 - A) +- CST2 -> CST3 - A, CST1 - (CST2 - A) -> CST3 + A): Use ! on ops with 2 CONSTANT_CLASS_P operands. * gcc.target/aarch64/sve/pr109505.c: New test.
2023-05-16MATCH: [PR109424] Simplify min/max of boolean argumentsAndrew Pinski1-0/+8
This is version 2 of https://gcc.gnu.org/pipermail/gcc-patches/2021-August/577394.html which does not depend on adding gimple_truth_valued_p at this point. Instead it will use zero_one_valued_p, which is already used for mult simplifications, to make sure that we only have [0,1] rather than having the mistake of maybe having [-1,0] as the range for signed bools. This shows up in a few places in GCC itself but only at -O1; we miss the min/max conversion because of PR 107888 (which I will be testing separately). OK? Bootstrapped and tested on x86_64-linux-gnu with no regressions. Thanks, Andrew Pinski PR tree-optimization/109424 gcc/ChangeLog: * match.pd: Add patterns for min/max of zero_one_valued values to `&`/`|`. gcc/testsuite/ChangeLog: * gcc.dg/tree-ssa/bool-12.c: New test. * gcc.dg/tree-ssa/bool-13.c: New test. * gcc.dg/tree-ssa/minmax-20.c: New test. * gcc.dg/tree-ssa/minmax-21.c: New test.
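An illustrative sketch (not from the commit) of why the patterns hold for [0,1] values:
  /* Illustration: for operands known to be 0 or 1,
     MIN (a, b) == (a & b) and MAX (a, b) == (a | b).  */
  _Bool bmin (_Bool a, _Bool b) { return a < b ? a : b; }   /* -> a & b */
  _Bool bmax (_Bool a, _Bool b) { return a > b ? a : b; }   /* -> a | b */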
2023-05-14MATCH: Add pattern for `signbit(x) ? x : -x` into abs (and swapped)Andrew Pinski1-0/+10
This adds a simple pattern to match.pd for `signbit(x) ? x : -x` into abs<x>. This can be done for all types, even ones that honor signed zeros and NaNs, because both signbit and negation only look at/touch the sign bit of those types and do not trap either. OK? Bootstrapped and tested on x86_64-linux-gnu with no regressions. PR tree-optimization/109829 gcc/ChangeLog: * match.pd: Add pattern for `signbit(x) !=/== 0 ? x : -x`. gcc/testsuite/ChangeLog: * gcc.dg/tree-ssa/abs-3.c: New test. * gcc.dg/tree-ssa/abs-4.c: New test.
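An illustrative sketch (not from the commit) of the `signbit(x) == 0` form of the pattern:
  #include <math.h>
  /* Illustration: if the sign bit is clear return x, otherwise return -x;
     that is the absolute value, and only the sign bit is inspected or
     changed, so signed zeros and NaNs are handled fine.  */
  double my_abs (double x)
  {
    return signbit (x) == 0 ? x : -x;   /* -> ABS_EXPR <x> */
  }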
2023-05-12MATCH: Fix PR 109834, ICE with popcount combined with bswapAndrew Pinski1-2/+2
After r14-673-gc0dd80e4c4c3, there was a check in the match patterns which was checking that the type is unsigned, but instead of using the type, the patch used the expression. This adds the needed TREE_TYPE so we get the correct answer and don't ICE. Committed as obvious after a bootstrap/test on x86_64-linux-gnu. PR tree-optimization/109834 gcc/ChangeLog: * match.pd (popcount(bswap(x))->popcount(x)): Fix up unsigned type checking. (popcount(rotate(x,y))->popcount(x)): Likewise. gcc/testsuite/ChangeLog: * gcc.c-torture/compile/pr109834-1.c: New test. * gcc.dg/tree-ssa/pr109834-1.c: New test.
2023-05-12tree-optimization/109791 - simplify (unsigned)&foo - (unsigned)(&foo + o)Richard Biener1-0/+12
The following adds another variant of address difference simplification. The utility ptr_difference_const only handles constant differences (we also cannot code generate anything else), so exposing a possible POINTER_PLUS_EXPR in the match and computing the difference on the base only makes it possible to handle one case of a variable offset. This simplifies (unsigned long) &MEM <char[3]> [(void *)&str + 2B] - (unsigned long) (&str + (_69 + 1)) down to (1 - (unsigned long) _69) during niter analysis, allowing ranger to eliminate a condition later and avoiding a bogus -Wstringop-overflow diagnostic for the testcase in the PR. PR tree-optimization/109791 * match.pd (minus (convert ADDR_EXPR@0) (convert (pointer_plus @1 @2))): New pattern. (minus (convert (pointer_plus @1 @2)) (convert ADDR_EXPR@0)): Likewise.
2023-05-11aarch64: convert vector shift + bitwise and + multiply to vector comparemtsamis1-0/+61
When using SWAR (SIMD in a register) techniques a comparison operation within such a register can be made by using a combination of shifts, bitwise and and multiplication. If code using this scheme is vectorized then there is potential to replace all these operations with a single vector comparison, by reinterpreting the vector types to match the width of the SWAR register. For example, for the test function packed_cmp_16_32, the original generated code is: ldr q0, [x0] add w1, w1, 1 ushr v0.4s, v0.4s, 15 and v0.16b, v0.16b, v2.16b shl v1.4s, v0.4s, 16 sub v0.4s, v1.4s, v0.4s str q0, [x0], 16 cmp w2, w1 bhi .L20 with this pattern the above can be optimized to: ldr q0, [x0] add w1, w1, 1 cmlt v0.8h, v0.8h, #0 str q0, [x0], 16 cmp w2, w1 bhi .L20 The effect is similar for x86-64. Bootstrapped and reg-tested for x86 and aarch64. gcc/ChangeLog: * match.pd: simplify vector shift + bit_and + multiply. gcc/testsuite/ChangeLog: * gcc.target/aarch64/swar_to_vec_cmp.c: New test. Signed-off-by: Manolis Tsamis <manolis.tsamis@vrull.eu> Signed-off-by: Philipp Tomsich <philipp.tomsich@vrull.eu>
2023-05-11match.pd: Simplify popcount(X&Y)+popcount(X|Y) as popcount(X)+popcount(Y)Roger Sayle1-0/+19
This patch teaches match.pd to simplify popcount(X&Y)+popcount(X|Y) as popcount(X)+popcount(Y), and the related simplifications that popcount(X)+popcount(Y)-popcount(X&Y) is popcount(X|Y). As surprising as it might seem, this idiom is common in cheminformatics codes (for Tanimoto coefficient calculations). 2023-05-11 Roger Sayle <roger@nextmovesoftware.com> gcc/ChangeLog * match.pd <popcount optimizations>: Simplify popcount(X|Y) + popcount(X&Y) as popcount(X)+popcount(Y). Likewise, simplify popcount(X)+popcount(Y)-popcount(X&Y) as popcount(X|Y), and vice versa. gcc/testsuite/ChangeLog * gcc.dg/fold-popcount-8.c: New test case. * gcc.dg/fold-popcount-9.c: Likewise. * gcc.dg/fold-popcount-10.c: Likewise.
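An illustrative sketch (not from the commit) of the identity, using the GCC builtin:
  /* Illustration: at each bit position, (x & y) and (x | y) together have
     exactly as many set bits as x and y do, so f1 and f2 compute the same
     value and the second form needs one popcount less.  */
  int f1 (unsigned x, unsigned y)
  { return __builtin_popcount (x & y) + __builtin_popcount (x | y); }
  int f2 (unsigned x, unsigned y)
  { return __builtin_popcount (x) + __builtin_popcount (y); }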
2023-05-11match.pd: Simplify popcount/parity of bswap/rotate.Roger Sayle1-0/+50
This is the latest iteration of my patch from August 2020 https://gcc.gnu.org/pipermail/gcc-patches/2020-August/552391.html incorporating feedback and suggestions from reviewers. This patch to match.pd optimizes away bit permutation operations, specifically bswap and rotate, in calls to popcount and parity. 2023-05-11 Roger Sayle <roger@nextmovesoftware.com> gcc/ChangeLog * match.pd <popcount optimizations>: Simplify popcount(bswap(x)) as popcount(x). Simplify popcount(rotate(x,y)) as popcount(x). <parity optimizations>: Simplify parity(bswap(x)) as parity(x). Simplify parity(rotate(x,y)) as parity(x). gcc/testsuite/ChangeLog * gcc.dg/fold-parity-6.c: New test. * gcc.dg/fold-parity-7.c: Likewise. * gcc.dg/fold-popcount-6.c: Likewise. * gcc.dg/fold-popcount-7.c: Likewise.
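An illustrative sketch (not from the commit) of the bswap case; a byte swap (or rotate) only permutes bits, so it cannot change the population count or the parity:
  #include <stdint.h>
  /* Illustration, assuming 32-bit unsigned int so the precisions match.  */
  int f (uint32_t x)
  {
    return __builtin_popcount (__builtin_bswap32 (x));   /* -> popcount (x) */
  }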
2023-05-05MATCH: Add ABSU<a> == 0 to a == 0 simplificationAndrew Pinski1-5/+6
There is already an `ABS<a> == 0` to `a == 0` pattern, this just extends that to ABSU too. OK? Bootstrapped and tested on x86_64-linux-gnu with no regressions. PR tree-optimization/109722 gcc/ChangeLog: * match.pd: Extend the `ABS<a> == 0` pattern to cover `ABSU<a> == 0` too. gcc/testsuite/ChangeLog: * gcc.dg/tree-ssa/abs-1.c: New test.
2023-05-02MATCH: Port CLRSB part of builtin_zero_patternAndrew Pinski1-0/+8
This ports the clrsb builtin part of builtin_zero_pattern to match.pd. A simple pattern to port. OK? Bootstrapped and tested on x86_64-linux-gnu with no regressions. gcc/ChangeLog: * match.pd (a != 0 ? CLRSB(a) : CST -> CLRSB(a)): New pattern.
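An illustrative sketch (not from the commit), assuming 32-bit int so that __builtin_clrsb (0) == 31:
  /* Illustration: the guard and the constant just replicate what clrsb
     already returns for a zero input, so the conditional is redundant.  */
  int f (int a)
  {
    return a != 0 ? __builtin_clrsb (a) : 31;   /* -> __builtin_clrsb (a) */
  }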
2023-05-02tree-optimization: [PR109702] MATCH: Fix a ? func(a) : N patternsAndrew Pinski1-8/+8
I accidentally messed up these patterns so the comparison against 0 and the arguments were not matching up when they needed to be. I committed this as obvious after a bootstrap/test on x86_64-linux-gnu. PR tree-optimization/109702 gcc/ChangeLog: * match.pd: Fix "a != 0 ? FUNC(a) : CST" patterns for FUNC of POPCOUNT BSWAP FFS PARITY CLZ and CTZ. gcc/testsuite/ChangeLog: * gcc.dg/tree-ssa/phi-opt-25b.c: New test.
2023-04-30MATCH: add some of what phiopt's builtin_zero_pattern didAndrew Pinski1-2/+39
This adds the patterns for POPCOUNT BSWAP FFS PARITY CLZ and CTZ. For "a != 0 ? FUNC(a) : CST". CLRSB, CLRSBL, and CLRSBLL will be moved next. Note this is not enough to remove cond_removal_in_builtin_zero_pattern as we need to handle the case where there is an NOP_CONVERT inside the conditional to move out of the condition inside match_simplify_replacement. OK? Bootstrapped and tested on x86_64-linux-gnu. gcc/ChangeLog: * match.pd: Add patterns for "a != 0 ? FUNC(a) : CST" for FUNC of POPCOUNT BSWAP FFS PARITY CLZ and CTZ.
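Two illustrative instances (not from the commit); the constant must match what the function yields for a zero argument:
  /* Illustration: popcount (0) == 0 and bswap (0) == 0, so the zero guard
     is redundant.  The CLZ/CTZ cases additionally depend on the
     zero-input value the target defines.  */
  int f1 (unsigned a)      { return a != 0 ? __builtin_popcount (a) : 0; }
  unsigned f2 (unsigned a) { return a != 0 ? __builtin_bswap32 (a) : 0; }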
2023-04-28PHIOPT: Move two_value_replacement to match.pdAndrew Pinski1-0/+94
This patch converts the two_value_replacement function into a match.pd pattern. It is a direct translation with only one minor change: it does not check for the {0,+-1} case, as that is handled earlier in match.pd so there is no reason to do the extra check for it. OK? Bootstrapped and tested on x86_64-linux-gnu with no regressions. gcc/ChangeLog: PR tree-optimization/100958 * tree-ssa-phiopt.cc (two_value_replacement): Remove. (pass_phiopt::execute): Don't call two_value_replacement. * match.pd (a !=/== CST1 ? CST2 : CST3): Add pattern to handle what two_value_replacement did.
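An illustrative sketch (not from the commit) of the kind of input this handles, where a is known to take only two values:
  /* Illustration: a is forced into the two-value set {0, 4}, so
     a == 0 ? 5 : 7  can be rewritten as arithmetic on a
     (roughly  5 + (a >> 1)  here) instead of a conditional.  */
  int f (int a)
  {
    a &= 4;                      /* a is now 0 or 4 */
    return a == 0 ? 5 : 7;
  }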
2023-04-28MATCH: Add patterns from phiopt's minmax_replacementAndrew Pinski1-0/+16
This adds a few patterns from phiopt's minmax_replacement for (A CMP B) ? MIN/MAX<A, C> : MIN/MAX <B, C> . It is progress to remove minmax_replacement from phiopt. There are still some more cases dealing with constants on the edges (0/INT_MAX) to handle in match. OK? Bootstrapped and tested on x86_64-linux-gnu with no regressions. gcc/ChangeLog: * match.pd: Add patterns for "(A CMP B) ? MIN/MAX<A, C> : MIN/MAX <B, C>". gcc/testsuite/ChangeLog: * gcc.dg/tree-ssa/minmax-16.c: Update testcase slightly. * gcc.dg/tree-ssa/split-path-1.c: Also disable tree-loop-if-convert as that now does the combining.
2023-04-28MATCH: Factor out code that for min max detection with constantsAndrew Pinski1-28/+1
This factors out some of the code from the min/max detection from match.pd into a function so it can be reused in other places. This is mainly used to detect the conversions of >= to > which causes the integer values to be changed by one. Changes since v1: * factor out the checks for INTEGER_CSTs so it is more obvious. OK? Bootstrapped and tested on x86_64-linux-gnu. gcc/ChangeLog: * match.pd: Factor out the deciding the min/max from the "(cond (cmp (convert1? x) c1) (convert2? x) c2)" pattern to ... * fold-const.cc (minmax_from_comparison): this new function. * fold-const.h (minmax_from_comparison): New prototype.
2023-04-27wrong GIMPLE from (bit_field_ref CTOR ..) simplificationRichard Biener1-2/+7
When we simplify a BIT_FIELD_REF of a CTOR like { _1, _2, _3, _4 } and attempt to produce (view converted) { _1, _2 } for a selected subset we fail to realize this cannot be done from match.pd since we have no way to write the resulting CTOR "operation" and the built CTOR { _1, _2 } isn't a GIMPLE value. This kind of simplification has to be done in forwprop (or would need a match.pd syntax extension) where we can split out the CTOR to a separate stmt. The following disables this particular simplification when we are simplifying GIMPLE. With enhanced IL checking this otherwise causes ICEs in the testsuite from vectorized code. * match.pd (BIT_FIELD_REF CONSTRUCTOR@0 @1 @2): Do not create a CTOR operand in the result when simplifying GIMPLE.
2023-04-26Remove some uses of deprecated irange API.Aldy Hernandez1-5/+5
gcc/ChangeLog: * builtins.cc (expand_builtin_strnlen): Rewrite deprecated irange API uses to new API. * gimple-predicate-analysis.cc (find_var_cmp_const): Same. * internal-fn.cc (get_min_precision): Same. * match.pd: Same. * tree-affine.cc (expr_to_aff_combination): Same. * tree-data-ref.cc (dr_step_indicator): Same. * tree-dfa.cc (get_ref_base_and_extent): Same. * tree-scalar-evolution.cc (iv_can_overflow_p): Same. * tree-ssa-phiopt.cc (two_value_replacement): Same. * tree-ssa-pre.cc (insert_into_preds_of_block): Same. * tree-ssa-reassoc.cc (optimize_range_tests_to_bit_test): Same. * tree-ssa-strlen.cc (compare_nonzero_chars): Same. * tree-switch-conversion.cc (bit_test_cluster::emit): Same. * tree-vect-patterns.cc (vect_recog_divmod_pattern): Same. * tree.cc (get_range_pos_neg): Same.
2023-04-22match.pd: Fix fneg/fadd optimization [PR109583]Jakub Jelinek1-1/+2
The following testcase ICEs on x86, foo function since my r14-22 improvement, but bar already since r13-4122. The problem is the same, in the if expression related_vector_mode is called and that starts with gcc_assert (VECTOR_MODE_P (vector_mode)); but nothing in the fneg/fadd match.pd pattern actually checks if the VEC_PERM type has VECTOR_MODE_P (vec_mode). In this case it has BLKmode and so it ICEs. The following patch makes sure we don't ICE on it. 2023-04-22 Jakub Jelinek <jakub@redhat.com> PR tree-optimization/109583 * match.pd (fneg/fadd simplify): Don't call related_vector_mode if vec_mode is not VECTOR_MODE_P. * gcc.dg/pr109583.c: New test.
2023-04-18match.pd: Improve fneg/fadd optimization [PR109240]Jakub Jelinek1-54/+67
match.pd has, mostly for AArch64, an optimization in which it optimizes certain forms of __builtin_shuffle of x + y and x - y vectors into fneg using twice as wide element type so that every other sign is changed, followed by fadd. The following patch extends that optimization, so that it can handle other forms as well, using the same fneg but fsub instead of fadd. As the plus is commutative and minus is not and I want to handle vec_perm with plus minus and minus plus order preferably in one pattern, I had to do the matching operand checks by hand. 2023-04-18 Jakub Jelinek <jakub@redhat.com> PR tree-optimization/109240 * match.pd (fneg/fadd): Rewrite such that it handles both plus as first vec_perm operand and minus as second using fneg/fadd and minus as first vec_perm operand and plus as second using fneg/fsub. * gcc.target/aarch64/simd/addsub_2.c: New test. * gcc.target/aarch64/sve/addsub_2.c: New test.
2023-04-04sanitizer: missing signed integer overflow errors [PR109107]Marek Polacek1-3/+3
Here we're failing to detect a signed overflow with -O because match.pd, since r8-1516, transforms c = (a + 1) - (int) (short int) b; into c = (int) ((unsigned int) a + 4294946117); wrongly eliding the overflow. This kind of problem is usually avoided by using TYPE_OVERFLOW_SANITIZED in the appropriate place. The first match.pd hunk in the patch fixes it. I've constructed a testcase for each of the surrounding cases as well. Then I noticed that fold_binary_loc/associate has the same problem, so I've added a TYPE_OVERFLOW_SANITIZED there as well (it may be too coarse, sorry). Then I found yet another problem, but instead of fixing it now I've opened 109134. I could probably go on and find a dozen more. PR sanitizer/109107 gcc/ChangeLog: * fold-const.cc (fold_binary_loc): Use TYPE_OVERFLOW_SANITIZED when associating. * match.pd: Use TYPE_OVERFLOW_SANITIZED. gcc/testsuite/ChangeLog: * c-c++-common/ubsan/pr109107-1.c: New test. * c-c++-common/ubsan/pr109107-2.c: New test. * c-c++-common/ubsan/pr109107-3.c: New test. * c-c++-common/ubsan/pr109107-4.c: New test.
2023-03-26match.pd: Fix up fneg/fadd simplification [PR109230]Jakub Jelinek1-0/+1
The following testcase is miscompiled on aarch64-linux. match.pd has a simplification for addsub, where it negates one of the vectors in twice as large floating point element vector (effectively negating every other element) and then doing addition. But a requirement for that is that the permutation picks the right elements, in particular 0, nelts+1, 2, nelts+3, 4, nelts+5, ... The pattern tests this with sel.series_p (0, 2, 0, 2) check, which as documented verifies that the even elements of the permutation mask are identity, but doesn't say anything about the others. The following patch fixes it by also checking that the odd elements start at nelts + 1 with the same step of 2. 2023-03-26 Jakub Jelinek <jakub@redhat.com> PR tree-optimization/109230 * match.pd (fneg/fadd simplify): Verify also odd permutation indexes. * gcc.dg/pr109230.c: New test.
2023-02-18Remove #if GIMPLE around 1 - a patternAndrew Pinski1-2/+0
This removes the "#if GIMPLE" around the "1 - a" pattern as ssa_name_has_boolean_range (get_range_query) works when cfun is a nullptr. OK? Bootstrapped and tested on x86_64-linux-gnu with no regressions. gcc/ChangeLog: * match.pd: Remove #if GIMPLE around the "1 - a" pattern
2023-02-14Simplify "1 - bool_val" to "bool_val ^ 1"Andrew Pinski1-0/+13
For bool values, it is easier to deal with xor 1 rather than having 1 - a. This is because we are more likely to simplify the xor further in many cases. This is a special case of (MASK - b) where MASK is a power of 2 minus 1 and b <= MASK, but only for bool ranges ([0,1]) as that is the main case where the difference comes into play. Note this is enabled for gimple folding only as the ranges are only known while doing gimple folding and cfun is not always set when fold is called. OK? Bootstrapped and tested on x86_64-linux-gnu with no regressions. gcc/ChangeLog: PR tree-optimization/108355 PR tree-optimization/96921 * match.pd: Add pattern for "1 - bool_val". gcc/testsuite/ChangeLog: PR tree-optimization/108355 PR tree-optimization/96921 * gcc.dg/tree-ssa/bool-minus-1.c: New test. * gcc.dg/tree-ssa/bool-minus-2.c: New test. * gcc.dg/tree-ssa/pr108354-1.c: New test.
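The basic case, as an illustrative sketch (not from the commit):
  /* Illustration: for a bool (range [0,1]), 1 - b and b ^ 1 are the same
     value, and the xor form combines better with later simplifications.  */
  _Bool f (_Bool b)
  {
    return 1 - b;    /* -> b ^ 1 */
  }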
2023-02-09match.pd: Simplify BFR of insert when extracting exactly all inserted bits [PR108688]Jakub Jelinek1-3/+6
[PR108688] On Thu, Feb 09, 2023 at 09:16:17AM +0100, Richard Biener via Gcc-patches wrote: > Hmm. Can we handle the case of the extraction exactly covering the > insertion separately then and simplify to plain @1? I was suggesting that in the PR. Here it is as an incremental patch on top of Andrew's patch. On the newly added testcase the ifcvt-folding difference without/with the incremental patch is: --- pr108688.c.171t.ifcvt_ 2023-02-09 10:47:30.169916845 +0100 +++ pr108688.c.171t.ifcvt 2023-02-09 10:48:44.942793453 +0100 @@ -25,6 +25,8 @@ Number of blocks in CFG: 11 Number of blocks to update: 5 ( 45%) +Applying pattern match.pd:7487, gimple-match.cc:243200 +Applying pattern match.pd:3987, gimple-match.cc:75423 Matching expression match.pd:1677, gimple-match.cc:209 Applying pattern match.pd:1733, gimple-match.cc:109481 Matching expression match.pd:2393, gimple-match.cc:852 @@ -70,7 +72,6 @@ void foo () signed char _29; <unnamed-signed:7> _30; unsigned int ivtmp_33; - <unnamed-signed:7> _ifc__35; unsigned char _ifc__37; unsigned char _ifc__38; unsigned char _ifc__39; @@ -91,8 +92,7 @@ void foo () _2 = (<unnamed-signed:7>) a.0_1; _ifc__38 = u.D.2741; _ifc__39 = BIT_INSERT_EXPR <_ifc__38, _2, 0 (7 bits)>; - _ifc__35 = BIT_FIELD_REF <_ifc__39, 7, 0>; - _4 = (signed char) _ifc__35; + _4 = (signed char) _2; b.1_5 = b; _6 = (signed char) b.1_5; _7 = _4 ^ _6; 2023-02-09 Jakub Jelinek <jakub@redhat.com> PR tree-optimization/108688 * match.pd (bit_field_ref [bit_insert]): Simplify BIT_FIELD_REF of BIT_INSERT_EXPR extracting exactly all inserted bits even when without mode precision. Formatting fixes. * gcc.c-torture/compile/pr108688-1.c: Add PR number as comment. * gcc.dg/pr108688.c: New test.
2023-02-09match.pd: When simplifying BFR of an insert, require a mode precision integral type [PR108688]Andrew Pinski1-1/+3
The same problem as PR 88739 has crept in, but this time in match.pd when simplifying a bit_field_ref of a bit_insert. That is, we are generating a BIT_FIELD_REF of a non-mode-precision integral type. PR tree-optimization/108688 * match.pd (bit_field_ref [bit_insert]): Avoid generating BIT_FIELD_REFs of non-mode-precision integral operands. * gcc.c-torture/compile/pr108688-1.c: New test.
2023-01-27Add support for conditional xorsign [PR96373]Richard Sandiford1-0/+14
This patch is an optimisation, but it's also a prerequisite for fixing PR96373 without regressing vect-xorsign_exec.c. Currently the vectoriser vectorises: for (i = 0; i < N; i++) r[i] = a[i] * __builtin_copysignf (1.0f, b[i]); as two unconditional operations (copysign and mult). tree-ssa-math-opts.cc later combines them into an "xorsign" function. This works for both Advanced SIMD and SVE. However, with the fix for PR96373, the vectoriser will instead generate a conditional multiplication (IFN_COND_MUL). Something then needs to fold copysign & IFN_COND_MUL to the equivalent of a conditional xorsign. Three obvious options were: (1) Extend tree-ssa-math-opts.cc. (2) Do the fold in match.pd. (3) Leave it to rtl combine. I'm against (3), because this isn't a target-specific optimisation. (1) would be possible, but would involve open-coding a lot of what match.pd does for us. And, in contrast to doing the current tree-ssa-math-opts.cc optimisation in match.pd, there should be no danger of (2) happening too early. If we have an IFN_COND_MUL then we're already past the stage of simplifying the original source code. There was also a choice between adding a conditional xorsign ifn and simply open-coding the xorsign. The latter seems simpler, and means less boiler-plate for target-specific code. The signed_or_unsigned_type_for change is needed to make sure that we stay in "SVE space" when doing the optimisation on 128-bit fixed-length SVE. gcc/ PR tree-optimization/96373 * tree.h (sign_mask_for): Declare. * tree.cc (sign_mask_for): New function. (signed_or_unsigned_type_for): For vector types, try to use the related_int_vector_mode. * genmatch.cc (commutative_op): Handle conditional internal functions. * match.pd: Fold an IFN_COND_MUL+copysign into an IFN_COND_XOR+and. gcc/testsuite/ PR tree-optimization/96373 * gcc.target/aarch64/sve/cond_xorsign_1.c: New test. * gcc.target/aarch64/sve/cond_xorsign_2.c: Likewise.
2023-01-12PR tree-optimization/92342: Optimize b & -(a==c) in match.pdRoger Sayle1-3/+13
This patch is an update/tweak of Andrew Pinski's two patches for PR tree-optimization/92342, which were originally posted in November: https://gcc.gnu.org/pipermail/gcc-patches/2021-November/585111.html https://gcc.gnu.org/pipermail/gcc-patches/2021-November/585112.html Technically, the first of those was approved by Richard Biener, though never committed, and my first thought was to simply push it for Andrew, but the review of the second piece expressed concerns over comparisons in non-integral modes, where the result may not be zero-one valued. Indeed both transformations misbehave in the presence of vector mode comparisons (these transformations are already implemented for vec_cond elsewhere in match.pd), so my minor contribution is to limit these new transformations to scalars, by testing that both the operands and results are INTEGRAL_TYPE_P. 2023-01-12 Andrew Pinski <apinski@marvell.com> Roger Sayle <roger@nextmovesoftware.com> gcc/ChangeLog: PR tree-optimization/92342 * match.pd ((m1 CMP m2) * d -> (m1 CMP m2) ? d : 0): Use tcc_comparison and :c for the multiply. (b & -(a CMP c) -> (a CMP c)?b:0): New pattern. gcc/testsuite/ChangeLog: PR tree-optimization/92342 * gcc.dg/tree-ssa/andnegcmp-1.c: New test. * gcc.dg/tree-ssa/andnegcmp-2.c: New test. * gcc.dg/tree-ssa/multcmp-1.c: New test. * gcc.dg/tree-ssa/multcmp-2.c: New test.
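An illustrative scalar sketch (not from the commit); the scalar restriction is exactly why the operands and result must be INTEGRAL_TYPE_P:
  /* Illustration: a scalar comparison result is 0 or 1, so multiplying by
     it, or AND-ing with its negation, is a select against zero.  */
  int f1 (int m1, int m2, int d) { return (m1 < m2) * d; }   /* -> m1 < m2 ? d : 0 */
  int f2 (int a, int c, int b)   { return b & -(a == c); }   /* -> a == c ? b : 0 */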
2023-01-02Update copyright years.Jakub Jelinek1-1/+1
2022-12-21Fix PR 105532: match.pd patterns calling tree_nonzero_bits with vector typesAndrew Pinski1-11/+14
Even though this PR was reported with an ubsan issue, the problem is tree_nonzero_bits is being called with an expression which is a vector type. This fixes three patterns I noticed which does that. And adds a testcase for one of the patterns. OK? Bootstrapped and tested on x86_64-linux-gnu with no regressions gcc/ChangeLog: PR tree-optimization/105532 * match.pd (~(X >> Y) -> ~X >> Y): Check if it is an integral type before calling tree_nonzero_bits. (popcount(X) + popcount(Y)): Likewise. (popcount(X&C1)): Likewise. gcc/testsuite/ChangeLog: * gcc.c-torture/compile/vector-shift-1.c: New test.
2022-12-12middle-end: simplify complex if expressions where comparisons are inverse of one another.Tamar Christina1-0/+55
This optimizes the following sequence ((a < b) & c) | ((a >= b) & d) into (a < b ? c : d) & 1 for scalar, and on vector we can omit the & 1. Also recognizes (-(a < b) & c) | (-(a >= b) & d) into a < b ? c : d. This changes the code generation from zoo2: cmp w0, w1 cset w0, lt cset w1, ge and w0, w0, w2 and w1, w1, w3 orr w0, w0, w1 ret into cmp w0, w1 csel w0, w2, w3, lt and w0, w0, 1 ret and significantly reduces the number of selects we have to do in the vector code. gcc/ChangeLog: * match.pd: Add new rule. gcc/testsuite/ChangeLog: * gcc.target/aarch64/if-compare_1.c: New test. * gcc.target/aarch64/if-compare_2.c: New test.
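An illustrative C sketch (not from the commit; its own testcases are the authoritative versions) of the scalar shape described above:
  /* Illustration: the two comparison results are 0 or 1 and mutually
     exclusive, so the whole expression selects bit 0 of either c or d.  */
  int f (int a, int b, int c, int d)
  {
    return ((a < b) & c) | ((a >= b) & d);   /* -> (a < b ? c : d) & 1 */
  }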
2022-12-12Revert parts of ADDR_EXPR/CONSTRUCTOR treatment change in match.pdRichard Biener1-7/+15
This reverts the part that substitutes from the definition of an SSA name to the capture, thus ADDR_EXPR@0 eventually yielding &y_1->a[i_2] instead of _3. That's because I didn't think of how to deal with substituting @0 in the result pattern. So the following re-instantiates the SSA def CONSTRUCTOR handling and in the ADDR_EXPR helpers used by match.pd handles SSA names defined to ADDR_EXPRs transparently. * genmatch.cc (dt_simplify::gen): Revert last change. * match.pd: Revert simplification of CONSTUCTOR leaf handling. (&x cmp SSA_NAME): Handle ADDR_EXPR in SSA defs. * fold-const.cc (split_address_to_core_and_offset): Handle ADDR_EXPRs in SSA defs. (address_compare): Likewise.