|
gcc/ChangeLog:
PR tree-optimization/110280
* match.pd (vec_perm_expr(v, v, mask) -> v): Explicitly build the vector
using build_vector_from_val with the element of the input operand and
the mask's type if the operand's and mask's types don't match.
gcc/testsuite/ChangeLog:
PR tree-optimization/110280
* gcc.target/aarch64/sve/pr110280.c: New test.
|
|
The simplification (outertype)((innertype0)a+(innertype1)b) to
((newtype)a+(newtype)b) ends up using TYPE_PRECISION to check
whether it can elide a conversion but in some paths there can
be VECTOR_TYPEs where this instead compares the number of lanes.
The following fixes the missed optimizations and uses
element_precision in those places.
* match.pd ((outertype)((innertype0)a+(innertype1)b)
-> ((newtype)a+(newtype)b)): Use element_precision
where appropriate.
|
|
fold_binary tries to transform (double)float1 CMP (double)float2
into float1 CMP float2 but ends up using TYPE_PRECISION on the
argument types. For vector types that compares the number of
lanes, which should always be equal (so it is harmless in that it
does not generate wrong code). The following instead properly
uses element_precision.
The same happens in the corresponding match.pd pattern.
* fold-const.cc (fold_binary_loc): Use element_precision
when trying (double)float1 CMP (double)float2 to
float1 CMP float2 simplification.
* match.pd: Likewise.
|
|
The following adds two patterns simplifying comparisons,
uns < (typeof uns)(uns != 0) is always false and x != (typeof x)(x == 0)
is always true.
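For illustration, a minimal sketch (not the committed testcase) of what the two new patterns fold:
  int f (unsigned x) { return x < (unsigned) (x != 0); }  /* always 0 */
  int g (int x)      { return x != (int) (x == 0); }      /* always 1 */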
PR tree-optimization/110278
* match.pd (uns < (typeof uns)(uns != 0) -> false): New.
(x != (typeof x)(x == 0) -> true): Likewise.
|
|
The following makes sure we optimize x != 0 using range info
through tree_expr_nonzero_p in match.pd.
PR tree-optimization/110269
* fold-const.cc (fold_binary_loc): Merge x != 0 folding
with tree_expr_nonzero_p ...
* match.pd (cmp (convert? addr@0) integer_zerop): With this
pattern.
* gcc.dg/tree-ssa/pr110269.c: New testcase.
|
|
This adds plus to the op list of `(zero_one == 0) ? y : z <op> y` patterns
which currently has bit_ior and bit_xor.
This shows up now in GCC after the boolization work that Uroš has been doing.
OK? Bootstrapped and tested on x86_64-linux-gnu with no regressions.
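For illustration, a rough sketch (not the committed testcase) of the newly handled plus form:
  int f (int y, int z, unsigned b)
  {
    int zero_one = b & 1;
    return zero_one == 0 ? y : z + y;   /* now folds to roughly z * zero_one + y */
  }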
PR tree-optimization/97711
PR tree-optimization/110155
gcc/ChangeLog:
* match.pd ((zero_one == 0) ? y : z <op> y): Add plus to the op.
((zero_one != 0) ? z <op> y : y): Likewise.
gcc/testsuite/ChangeLog:
* gcc.dg/tree-ssa/branchless-cond-add-2.c: New test.
* gcc.dg/tree-ssa/branchless-cond-add.c: New test.
|
|
rather than `(-zero_one) & z`
Since there is a pattern to convert `(-zero_one) & z` into `zero_one * z` already,
it is better if we don't do a secondary transformation. This reduces the extra
statements produced by match-and-simplify on the gimple level too.
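For illustration (a sketch, not the testsuite source), the pattern result changes like this:
  int f (int y, int z, unsigned b)
  {
    int zero_one = b & 1;
    return zero_one == 0 ? y : z ^ y;   /* result is now (zero_one * z) ^ y
                                           rather than ((-zero_one) & z) ^ y */
  }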
gcc/ChangeLog:
* match.pd ((zero_one ==/!= 0) ? y : z <op> y): Use
multiply rather than negation/bit_and.
|
|
This allows unsigned types if the inner type, where the negation is
located, has precision greater than or equal to that of the outer type.
branchless-cond.c needs to be updated since we now produce
a multiply rather than keeping (-a)&c in there.
OK? Bootstrapped and tested on x86_64-linux-gnu.
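A sketch of the unsigned, same-type case that is now allowed (not the updated testcase):
  unsigned f (unsigned x, int y)
  {
    unsigned zero_one = y != 0;   /* a [0,1] value */
    return x & -zero_one;         /* now folds to x * zero_one */
  }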
gcc/ChangeLog:
* match.pd (`X & -Y -> X * Y`): Allow for truncation
and the same type for unsigned types.
gcc/testsuite/ChangeLog:
* gcc.dg/tree-ssa/branchless-cond.c: Update testcase.
|
|
For the attached testcase, we assumed that zero_one_valued_p would
only match values in the range [0,1], but currently zero_one_valued_p
also matches signed 1-bit integers.
This changes it not to match those and fixes the 2 new testcases at
all optimization levels.
OK? Bootstrapped and tested on x86_64-linux-gnu with no regressions.
Note the GCC 13 patch will be slightly different due to the changes
made to zero_one_valued_p.
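For reference, a signed 1-bit bitfield is the problematic case: its two values
are 0 and -1, not 0 and 1, e.g.:
  struct s { signed int f : 1; };   /* s.f is 0 or -1, so it is not a [0,1] value */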
PR tree-optimization/110165
PR tree-optimization/110166
gcc/ChangeLog:
* match.pd (zero_one_valued_p): Don't accept
signed 1-bit integers.
gcc/testsuite/ChangeLog:
* gcc.c-torture/execute/pr110165-1.c: New test.
* gcc.c-torture/execute/pr110166-1.c: New test.
|
|
When folding two conversions in a row we use TYPE_PRECISION but
that's invalid for VECTOR_TYPE. The following fixes this by
using element_precision instead.
* match.pd (two conversions in a row): Use element_precision
to DTRT for VECTOR_TYPE.
|
|
The patterns match more than just `a & 1` so change the comment
for these two patterns to say that.
Committed as obvious after a bootstrap/test on x86_64-linux-gnu.
gcc/ChangeLog:
* match.pd: Fix comment for the
`(zero_one ==/!= 0) ? y : z <op> y` patterns.
|
|
Recently zero_one_valued_p was changed to handle integer_zerop
case specially, because tree_nonzero_bits (@0) == 1 only returns
true for non-constant values with range [0, 1] or constant 1;
constant 0 has tree_nonzero_bits (integer_zero_node) == 0.
The following patch reverts that change and instead checks
that tree_nonzero_bits is <= 1U.
2023-06-07 Jakub Jelinek <jakub@redhat.com>
* match.pd (zero_one_valued_p): Don't handle integer_zerop specially,
instead compare tree_nonzero_bits <= 1U rather than just == 1.
|
|
While looking at a code generation issue, I noticed that forwprop
was not handling `-a == 0` for unsigned types and I was confused about
why it was not.
r6-1814-g66e1cacf608045 removed these from fold because they
were supposed to be already handled by the match.pd patterns
but it was missed that the match.pd patterns checked
TYPE_OVERFLOW_UNDEFINED while fold didn't do that for NE/EQ.
This patch removes the TYPE_OVERFLOW_UNDEFINED restriction for NE/EQ.
OK? Bootstrapped and tested on x86_64-linux-gnu.
gcc/ChangeLog:
PR tree-optimization/110134
* match.pd (-A CMP -B -> B CMP A): Allow EQ/NE for all integer
types.
(-A CMP CST -> B CMP (-CST)): Likewise.
gcc/testsuite/ChangeLog:
PR tree-optimization/110134
* gcc.dg/tree-ssa/negneq-1.c: New test.
* gcc.dg/tree-ssa/negneq-2.c: New test.
* gcc.dg/tree-ssa/negneq-3.c: New test.
* gcc.dg/tree-ssa/negneq-4.c: New test.
|
|
are constant
This adds match patterns, for boolean values, that optimize
`a ? onezero : 0` to `a & onezero` and
`a ? 1 : onezero` to `a | onezero`.
This was reported a few times and I thought I would finally
add the match pattern for this.
This hits a few times in GCC itself too.
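For illustration, a minimal sketch (not one of the new testcases) of the two forms:
  _Bool f (_Bool a, _Bool onezero) { return a ? onezero : 0; }   /* -> a & onezero */
  _Bool g (_Bool a, _Bool onezero) { return a ? 1 : onezero; }   /* -> a | onezero */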
Notes on the testcases:
* phi-opt-2.c: This now is optimized to `a & b` in phiopt rather than ifcombine
* phi-opt-25b.c: The test part that was failing was parity which now gets `x & y` treatment.
* ssa-thread-21.c: there is no longer a threading opportunity, so need to disable phiopt.
Note PR 109957 is filed for the now missing optimization in that testcase too.
gcc/ChangeLog:
PR tree-optimization/89263
PR tree-optimization/99069
PR tree-optimization/20083
PR tree-optimization/94898
* match.pd: Add patterns to optimize `a ? onezero : onezero` where
one of the operands is constant.
gcc/testsuite/ChangeLog:
* gcc.dg/tree-ssa/phi-opt-2.c: Adjust the testcase.
* gcc.dg/tree-ssa/phi-opt-25b.c: Adjust the testcase.
* gcc.dg/tree-ssa/ssa-thread-21.c: Disable phiopt.
* gcc.dg/tree-ssa/phi-opt-27.c: New test.
* gcc.dg/tree-ssa/phi-opt-28.c: New test.
* gcc.dg/tree-ssa/phi-opt-29.c: New test.
* gcc.dg/tree-ssa/phi-opt-30.c: New test.
* gcc.dg/tree-ssa/phi-opt-31.c: New test.
* gcc.dg/tree-ssa/phi-opt-32.c: New test.
|
|
While working on `bool0 ? bool1 : bool2` I noticed that
zero_one_valued_p does not match on the constant zero
as in that case tree_nonzero_bits will return 0 and
that is different from 1.
OK? Bootstrapped and tested on x86_64-linux-gnu with no regressions.
gcc/ChangeLog:
* match.pd (zero_one_valued_p): Match 0 integer constant
too.
|
|
This patch adds to match.pd the support that was implemented for PR 87913 in phiopt.
It implements it by adding support to minmax_from_comparison for the check.
It uses range information, if available, which allows producing a MIN/MAX expression
when comparing against the lower/upper bound of the range instead of the lower/upper
bound of the type.
minmax-22.c is the new testcase which tests the ranges part.
OK? Bootstrapped and tested on x86_64-linux-gnu with no regressions.
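A hedged sketch (hypothetical, not necessarily the committed testcase) of what
the range information enables:
  int f (int num)
  {
    if (num < 3) __builtin_unreachable ();   /* num's range is [3, +INF] */
    return num != 3 ? num : 3;               /* num != 3 acts like num > 3 here,
                                                so this can become MAX_EXPR <num, 3> */
  }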
gcc/ChangeLog:
* fold-const.cc (minmax_from_comparison): Add support for NE_EXPR.
* match.pd ((cond (cmp (convert1? x) c1) (convert2? x) c2) pattern):
Add ne as a possible cmp.
((a CMP b) ? minmax<a, c> : minmax<b, c> pattern): Likewise.
gcc/testsuite/ChangeLog:
* gcc.dg/tree-ssa/minmax-22.c: New test.
|
|
This moves the `a <= CST1 ? MAX<a, CST2> : a` optimization
from phiopt to match. It just adds a new pattern to match.pd.
There is one more change needed before being able to remove
minmax_replacement from phiopt.
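A small sketch (with hypothetical constants) of the moved transformation:
  int f (int a)
  {
    int t = a > 3 ? a : 3;    /* MAX_EXPR <a, 3> */
    return a <= 5 ? t : a;    /* folds to MAX_EXPR <a, 3>, since 3 <= 5 */
  }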
A few notes on the testsuite changes:
* phi-opt-5.c is now able to optimize at phiopt1 so remove
the xfail.
* pr66726-4.c can be optimized during fold before phiopt1,
so the scanning needs to change.
* pr66726-5.c currently needs two phiopt passes to optimize
to the right thing; it needed 2 phiopt passes before too, and the cast
from int to unsigned char is the reason.
* pr66726-6.c is what the original pr66726-4.c was testing
before the fold was able to optimize it.
OK? Bootstrapped and tested on x86_64-linux-gnu.
gcc/ChangeLog:
* match.pd (`(a CMP CST1) ? max<a,CST2> : a`): New
pattern.
gcc/testsuite/ChangeLog:
* gcc.dg/tree-ssa/phi-opt-5.c: Remove last xfail.
* gcc.dg/tree-ssa/pr66726-4.c: Change how scanning
works.
* gcc.dg/tree-ssa/pr66726-5.c: New test.
* gcc.dg/tree-ssa/pr66726-6.c: New test.
|
|
The Ada compiler gives a bogus warning:
storage_offset1.ads:16:52: warning: Constraint_Error will be raised at run
time [enabled by default]
Ironically enough, this occurs because of an intermediate conversion to an
unsigned type which is supposed to hide overflows but is counter-productive
for constants because TREE_OVERFLOW is always set for them, so it ends up
setting a bogus TREE_OVERFLOW when converting back to the original type.
The fix simply redirects INTEGER_CSTs to the other, direct path without the
intermediate conversion to the unsigned type.
gcc/
* match.pd ((T)P - (T)(P + A) -> -(T) A): Avoid artificial overflow
on constants.
gcc/testsuite/
* gnat.dg/specs/storage_offset1.ads: New test.
|
|
PR middle-end/109840 is a regression introduced by my recent patch to
fold popcount(bswap(x)) as popcount(x). When the bswap and the popcount
have the same precision, everything works fine, but this optimization also
allowed a zero-extension between the two. The oversight is that we need
to be strict with type conversions, both to avoid accidentally changing
the argument type to popcount, and also to reflect the effects of
argument/return-value promotion in the call to bswap, so this zero extension
needs to be preserved/explicit in the optimized form.
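A sketch of the shape involved (hypothetical, not the committed testcase):
  int f (unsigned short x)
  {
    return __builtin_popcount ((unsigned int) __builtin_bswap16 (x));
    /* still folds to a popcount of x, but the zero-extension of the 16-bit
       value to unsigned int stays explicit in the folded form */
  }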
Interestingly, match.pd should (in theory) be able to narrow calls to
popcount and parity, removing a zero-extension from their argument, but
that is an independent optimization that needs to check IFN_ support.
Many thanks to Andrew Pinski for his help/fixes with these transformations.
2023-05-24 Roger Sayle <roger@nextmovesoftware.com>
gcc/ChangeLog
PR middle-end/109840
* match.pd <popcount optimizations>: Preserve zero-extension when
optimizing popcount((T)bswap(x)) and popcount((T)rotate(x,y)) as
popcount((T)x), so the popcount's argument keeps the same type.
<parity optimizations>: Likewise preserve extensions when
simplifying parity((T)bswap(x)) and parity((T)rotate(x,y)) as
parity((T)x), so that the parity's argument type is the same.
gcc/testsuite/ChangeLog
PR middle-end/109840
* gcc.dg/fold-parity-8.c: New test.
* gcc.dg/fold-popcount-11.c: Likewise.
|
|
On the following testcase we hang, because POLY_INT_CST is CONSTANT_CLASS_P,
but BIT_AND_EXPR with it and INTEGER_CST doesn't simplify and the
(x | CST1) & CST2 -> (x & CST2) | (CST1 & CST2)
simplification actually relies on the (CST1 & CST2) simplification,
otherwise it is a deoptimization, trading 2 ops for 3 and furthermore
running into
/* Given a bit-wise operation CODE applied to ARG0 and ARG1, see if both
operands are another bit-wise operation with a common input. If so,
distribute the bit operations to save an operation and possibly two if
constants are involved. For example, convert
(A | B) & (A | C) into A | (B & C)
Further simplification will occur if B and C are constants. */
simplification which simplifies that
(x & CST2) | (CST1 & CST2) back to
CST2 & (x | CST1).
I went through all other places I could find where we have a simplification
with 2 CONSTANT_CLASS_P operands and perform some operation on those two.
While the other spots aren't that severe (they just trade 2 operations for
another 2 if the two constants don't simplify, rather than, as in the above
case, trading 2 ops for 3), I still think all those spots really intend
to optimize only if the 2 constants simplify.
So, the following patch adds to those a ! modifier to ensure that,
even at GENERIC that modifier means !EXPR_P which is exactly what we want
IMHO.
2023-05-21 Jakub Jelinek <jakub@redhat.com>
PR tree-optimization/109505
* match.pd ((x | CST1) & CST2 -> (x & CST2) | (CST1 & CST2),
Combine successive equal operations with constants,
(A +- CST1) +- CST2 -> A + CST3, (CST1 - A) +- CST2 -> CST3 - A,
CST1 - (CST2 - A) -> CST3 + A): Use ! on ops with 2 CONSTANT_CLASS_P
operands.
* gcc.target/aarch64/sve/pr109505.c: New test.
|
|
This is version 2 of https://gcc.gnu.org/pipermail/gcc-patches/2021-August/577394.html
which does not depend on adding gimple_truth_valued_p at this point.
Instead it will use zero_one_valued_p, which is already used for mult simplifications,
to make sure that we only have [0,1] rather than making the mistake of maybe having [-1,0]
as the range for signed bools.
This shows up in a few places in GCC itself but only at -O1; we miss the min/max conversion
because of PR 107888 (which I will be testing separately).
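For illustration, a minimal sketch (not the committed testcases) of the idea,
assuming the conditionals are recognized as MIN_EXPR/MAX_EXPR first:
  _Bool f (_Bool a, _Bool b) { return a < b ? a : b; }   /* MIN -> a & b */
  _Bool g (_Bool a, _Bool b) { return a > b ? a : b; }   /* MAX -> a | b */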
OK? Bootstrapped and tested on x86_64-linux-gnu with no regressions.
Thanks,
Andrew Pinski
PR tree-optimization/109424
gcc/ChangeLog:
* match.pd: Add patterns for min/max of zero_one_valued
values to `&`/`|`.
gcc/testsuite/ChangeLog:
* gcc.dg/tree-ssa/bool-12.c: New test.
* gcc.dg/tree-ssa/bool-13.c: New test.
* gcc.dg/tree-ssa/minmax-20.c: New test.
* gcc.dg/tree-ssa/minmax-21.c: New test.
|
|
This adds a simple pattern to match.pd for `signbit(x) ? x : -x`
into abs<x>. This can be done for all types, even ones that honor
signed zeros and NaNs, because both signbit and negation only
look at/touch the sign bit of those types and neither traps.
OK? Bootstrapped and tested on x86_64-linux-gnu with no regressions.
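For illustration, a minimal sketch (not the committed testcase):
  float f (float x) { return __builtin_signbit (x) == 0 ? x : -x; }   /* -> ABS_EXPR <x> */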
PR tree-optimization/109829
gcc/ChangeLog:
* match.pd: Add pattern for `signbit(x) !=/== 0 ? x : -x`.
gcc/testsuite/ChangeLog:
* gcc.dg/tree-ssa/abs-3.c: New test.
* gcc.dg/tree-ssa/abs-4.c: New test.
|
|
After r14-673-gc0dd80e4c4c3, there was a check in the match
patterns which was checking that the type is unsigned, but
instead of using the type, the patch used the expression.
This adds the needed TREE_TYPE so we get the correct answer and don't ICE.
Committed as obvious after a bootstrap/test on x86_64-linux-gnu.
PR tree-optimization/109834
gcc/ChangeLog:
* match.pd (popcount(bswap(x))->popcount(x)): Fix up unsigned type checking.
(popcount(rotate(x,y))->popcount(x)): Likewise.
gcc/testsuite/ChangeLog:
* gcc.c-torture/compile/pr109834-1.c: New test.
* gcc.dg/tree-ssa/pr109834-1.c: New test.
|
|
The following adds another variant of address difference simplification.
The utility ptr_difference_const only handles constant differences
(we also cannot code generate anything else), so exposing a possible
POINTER_PLUS_EXPR in the match and computing the difference on the
base only makes it possible to handle one case of a variable offset.
This simplifies
(unsigned long) &MEM <char[3]> [(void *)&str + 2B] - (unsigned long) (&str + (_69 + 1))
down to (1 - (unsigned long) _69) during niter analysis, allowing
ranger to eliminate a condition later and avoiding a bogus
-Wstringop-overflow diagnostic for the testcase in the PR.
PR tree-optimization/109791
* match.pd (minus (convert ADDR_EXPR@0) (convert (pointer_plus @1 @2))):
New pattern.
(minus (convert (pointer_plus @1 @2)) (convert ADDR_EXPR@0)):
Likewise.
|
|
When using SWAR (SIMD in a register) techniques, a comparison operation
within such a register can be made by using a combination of shifts,
bitwise AND and multiplication.
vectorized then there is potential to replace all these operations
with a single vector comparison, by reinterpreting the vector types to
match the width of the SWAR register.
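As a rough scalar sketch of the SWAR idiom being targeted (not the actual
packed_cmp_16_32 source), each 16-bit lane of a 32-bit word is expanded into an
all-ones/all-zeros mask based on its sign bit:
  unsigned t = (x >> 15) & 0x00010001u;   /* extract the two lane sign bits */
  unsigned r = (t << 16) - t;             /* i.e. t * 0xffff: widen each bit to a lane mask */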
For example, for the test function packed_cmp_16_32, the original
generated code is:
ldr q0, [x0]
add w1, w1, 1
ushr v0.4s, v0.4s, 15
and v0.16b, v0.16b, v2.16b
shl v1.4s, v0.4s, 16
sub v0.4s, v1.4s, v0.4s
str q0, [x0], 16
cmp w2, w1
bhi .L20
With this pattern the above can be optimized to:
ldr q0, [x0]
add w1, w1, 1
cmlt v0.8h, v0.8h, #0
str q0, [x0], 16
cmp w2, w1
bhi .L20
The effect is similar for x86-64.
Bootstrapped and reg-tested for x86 and aarch64.
gcc/ChangeLog:
* match.pd: Simplify vector shift + bit_and + multiply.
gcc/testsuite/ChangeLog:
* gcc.target/aarch64/swar_to_vec_cmp.c: New test.
Signed-off-by: Manolis Tsamis <manolis.tsamis@vrull.eu>
Signed-off-by: Philipp Tomsich <philipp.tomsich@vrull.eu>
|
|
This patch teaches match.pd to simplify popcount(X&Y)+popcount(X|Y) as
popcount(X)+popcount(Y), and the related simplifications that
popcount(X)+popcount(Y)-popcount(X&Y) is popcount(X|Y). As surprising
as it might seem, this idiom is common in cheminformatics codes
(for Tanimoto coefficient calculations).
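For illustration, a minimal sketch (not the committed testcases):
  int f (unsigned x, unsigned y)
  {
    return __builtin_popcount (x & y) + __builtin_popcount (x | y);
    /* simplifies to __builtin_popcount (x) + __builtin_popcount (y) */
  }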
2023-05-11 Roger Sayle <roger@nextmovesoftware.com>
gcc/ChangeLog
* match.pd <popcount optimizations>: Simplify popcount(X|Y) +
popcount(X&Y) as popcount(X)+popcount(Y). Likewise, simplify
popcount(X)+popcount(Y)-popcount(X&Y) as popcount(X|Y), and
vice versa.
gcc/testsuite/ChangeLog
* gcc.dg/fold-popcount-8.c: New test case.
* gcc.dg/fold-popcount-9.c: Likewise.
* gcc.dg/fold-popcount-10.c: Likewise.
|
|
This is the latest iteration of my patch from August 2020
https://gcc.gnu.org/pipermail/gcc-patches/2020-August/552391.html
incorporating feedback and suggestions from reviewers.
This patch to match.pd optimizes away bit permutation operations,
specifically bswap and rotate, in calls to popcount and parity.
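For illustration, minimal sketches (not the committed testcases) of the two transformations:
  int f (unsigned x) { return __builtin_popcount (__builtin_bswap32 (x)); }  /* -> popcount (x) */
  int g (unsigned x) { return __builtin_parity ((x << 8) | (x >> 24)); }     /* rotate: -> parity (x) */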
2023-05-11 Roger Sayle <roger@nextmovesoftware.com>
gcc/ChangeLog
* match.pd <popcount optimizations>: Simplify popcount(bswap(x))
as popcount(x). Simplify popcount(rotate(x,y)) as popcount(x).
<parity optimizations>: Simplify parity(bswap(x)) as parity(x).
Simplify parity(rotate(x,y)) as parity(x).
gcc/testsuite/ChangeLog
* gcc.dg/fold-parity-6.c: New test.
* gcc.dg/fold-parity-7.c: Likewise.
* gcc.dg/fold-popcount-6.c: Likewise.
* gcc.dg/fold-popcount-7.c: Likewise.
|
|
There is already an `ABS<a> == 0` to `a == 0` pattern,
this just extends that to ABSU too.
OK? Bootstrapped and tested on x86_64-linux-gnu with no regressions.
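For illustration, a hedged sketch (not the committed testcase); the absolute value
computed in the unsigned type may be represented as ABSU_EXPR <a>:
  int f (int a)
  {
    unsigned t = a < 0 ? -(unsigned) a : (unsigned) a;   /* ABSU-style absolute value */
    return t == 0;                                       /* equivalent to a == 0 */
  }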
PR tree-optimization/109722
gcc/ChangeLog:
* match.pd: Extend the `ABS<a> == 0` pattern
to cover `ABSU<a> == 0` too.
gcc/testsuite/ChangeLog:
* gcc.dg/tree-ssa/abs-1.c: New test.
|
|
This ports the clrsb builtin part of builtin_zero_pattern
to match.pd. A simple pattern to port.
OK? Bootstrapped and tested on x86_64-linux-gnu with no regressions.
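For illustration, a sketch (not the committed testcase) with the constant spelled
out for 32-bit int, where __builtin_clrsb (0) is the precision minus 1:
  int f (int a)
  {
    return a != 0 ? __builtin_clrsb (a) : 8 * (int) sizeof (int) - 1;
    /* folds to __builtin_clrsb (a) */
  }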
gcc/ChangeLog:
* match.pd (a != 0 ? CLRSB(a) : CST -> CLRSB(a)): New
pattern.
|
|
I accidentally messed up these patterns so the comparison
against 0 and the arguments were not matching up when they
needed to be.
I committed this as obvious after a bootstrap/test on x86_64-linux-gnu.
PR tree-optimization/109702
gcc/ChangeLog:
* match.pd: Fix "a != 0 ? FUNC(a) : CST" patterns
for FUNC of POPCOUNT BSWAP FFS PARITY CLZ and CTZ.
gcc/testsuite/ChangeLog:
* gcc.dg/tree-ssa/phi-opt-25b.c: New test.
|
|
This adds the patterns for
POPCOUNT BSWAP FFS PARITY CLZ and CTZ.
For "a != 0 ? FUNC(a) : CST".
CLRSB, CLRSBL, and CLRSBLL will be moved next.
Note this is not enough to remove
cond_removal_in_builtin_zero_pattern as we need to handle
the case where there is an NOP_CONVERT inside the conditional
to move out of the condition inside match_simplify_replacement.
OK? Bootstrapped and tested on x86_64-linux-gnu.
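For illustration, minimal sketches (not the committed testcases) of the shape handled,
where CST matches the builtin's value at zero:
  int f (unsigned a) { return a != 0 ? __builtin_popcount (a) : 0; }   /* -> popcount (a) */
  int g (int a)      { return a != 0 ? __builtin_ffs (a) : 0; }        /* -> ffs (a) */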
gcc/ChangeLog:
* match.pd: Add patterns for "a != 0 ? FUNC(a) : CST"
for FUNC of POPCOUNT BSWAP FFS PARITY CLZ and CTZ.
|
|
This patch converts the two_value_replacement function
into a match.pd pattern.
It is a direct translation with only one minor change:
it does not check for the {0,+-1} case, as that is handled
earlier in match.pd, so there is no reason to do the extra
check for it.
OK? Bootstrapped and tested on x86_64-linux-gnu with
no regressions.
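For illustration, a hedged sketch (hypothetical constants, not the committed testcase):
  int f (_Bool a) { return a != 0 ? 3 : 1; }   /* a is a two-valued [0,1] name, so this
                                                  can become a linear form such as 1 + a * 2 */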
gcc/ChangeLog:
PR tree-optimization/100958
* tree-ssa-phiopt.cc (two_value_replacement): Remove.
(pass_phiopt::execute): Don't call two_value_replacement.
* match.pd (a !=/== CST1 ? CST2 : CST3): Add pattern to
handle what two_value_replacement did.
|
|
This adds a few patterns from phiopt's minmax_replacement
for (A CMP B) ? MIN/MAX<A, C> : MIN/MAX <B, C> .
It is progress towards removing minmax_replacement from phiopt.
There are still some more cases dealing with constants on the
edges (0/INT_MAX) to handle in match.
OK? Bootstrapped and tested on x86_64-linux-gnu with no regressions.
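For illustration, a minimal sketch (not the committed testcases):
  int f (int a, int b, int c)
  {
    int x = a < c ? a : c;    /* MIN_EXPR <a, c> */
    int y = b < c ? b : c;    /* MIN_EXPR <b, c> */
    return a < b ? x : y;     /* folds to MIN_EXPR <MIN_EXPR <a, b>, c> */
  }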
gcc/ChangeLog:
* match.pd: Add patterns for
"(A CMP B) ? MIN/MAX<A, C> : MIN/MAX <B, C>".
gcc/testsuite/ChangeLog:
* gcc.dg/tree-ssa/minmax-16.c: Update testcase slightly.
* gcc.dg/tree-ssa/split-path-1.c: Also disable tree-loop-if-convert
as that now does the combining.
|
|
This factors out some of the code from the min/max detection
from match.pd into a function so it can be reused in other
places. This is mainly used to detect the conversions
of >= to >, which cause the integer values to be changed by
one.
Changes since v1:
* factor out the checks for INTEGER_CSTs so it is more obvious.
OK? Bootstrapped and tested on x86_64-linux-gnu.
gcc/ChangeLog:
* match.pd: Factor out the deciding the min/max from
the "(cond (cmp (convert1? x) c1) (convert2? x) c2)"
pattern to ...
* fold-const.cc (minmax_from_comparison): this new function.
* fold-const.h (minmax_from_comparison): New prototype.
|
|
When we simplify a BIT_FIELD_REF of a CTOR like { _1, _2, _3, _4 }
and attempt to produce (view converted) { _1, _2 } for a selected
subset we fail to realize this cannot be done from match.pd since
we have no way to write the resulting CTOR "operation" and the
built CTOR { _1, _2 } isn't a GIMPLE value.
This kind of simplification has to be done in forwprop (or would
need a match.pd syntax extension) where we can split out the CTOR
to a separate stmt.
The following disables this particular simplification when we are
simplifying GIMPLE. With enhanced IL checking this otherwise
causes ICEs in the testsuite from vectorized code.
* match.pd (BIT_FIELD_REF CONSTRUCTOR@0 @1 @2): Do not
create a CTOR operand in the result when simplifying GIMPLE.
|
|
gcc/ChangeLog:
* builtins.cc (expand_builtin_strnlen): Rewrite deprecated irange
API uses to new API.
* gimple-predicate-analysis.cc (find_var_cmp_const): Same.
* internal-fn.cc (get_min_precision): Same.
* match.pd: Same.
* tree-affine.cc (expr_to_aff_combination): Same.
* tree-data-ref.cc (dr_step_indicator): Same.
* tree-dfa.cc (get_ref_base_and_extent): Same.
* tree-scalar-evolution.cc (iv_can_overflow_p): Same.
* tree-ssa-phiopt.cc (two_value_replacement): Same.
* tree-ssa-pre.cc (insert_into_preds_of_block): Same.
* tree-ssa-reassoc.cc (optimize_range_tests_to_bit_test): Same.
* tree-ssa-strlen.cc (compare_nonzero_chars): Same.
* tree-switch-conversion.cc (bit_test_cluster::emit): Same.
* tree-vect-patterns.cc (vect_recog_divmod_pattern): Same.
* tree.cc (get_range_pos_neg): Same.
|
|
The following testcase ICEs on x86: the foo function since my r14-22
improvement, but bar already since r13-4122. The problem is the same;
in the if expression related_vector_mode is called, and that starts with
gcc_assert (VECTOR_MODE_P (vector_mode));
but nothing in the fneg/fadd match.pd pattern actually checks if the
VEC_PERM type has VECTOR_MODE_P (vec_mode). In this case it has BLKmode
and so it ICEs.
The following patch makes sure we don't ICE on it.
2023-04-22 Jakub Jelinek <jakub@redhat.com>
PR tree-optimization/109583
* match.pd (fneg/fadd simplify): Don't call related_vector_mode
if vec_mode is not VECTOR_MODE_P.
* gcc.dg/pr109583.c: New test.
|
|
match.pd has an optimization, mostly for AArch64, in which it optimizes
certain forms of __builtin_shuffle of x + y and x - y vectors into
fneg using a twice as wide element type, so that every other sign is changed,
followed by fadd.
The following patch extends that optimization, so that it can handle
other forms as well, using the same fneg but fsub instead of fadd.
As plus is commutative and minus is not, and I want to handle
vec_perm with plus/minus and minus/plus order, preferably in one
pattern, I had to do the matching operand checks by hand.
2023-04-18 Jakub Jelinek <jakub@redhat.com>
PR tree-optimization/109240
* match.pd (fneg/fadd): Rewrite such that it handles both plus as
first vec_perm operand and minus as second using fneg/fadd and
minus as first vec_perm operand and plus as second using fneg/fsub.
* gcc.target/aarch64/simd/addsub_2.c: New test.
* gcc.target/aarch64/sve/addsub_2.c: New test.
|
|
Here we're failing to detect a signed overflow with -O because match.pd,
since r8-1516, transforms
c = (a + 1) - (int) (short int) b;
into
c = (int) ((unsigned int) a + 4294946117);
wrongly eliding the overflow. This kind of problem is usually
avoided by using TYPE_OVERFLOW_SANITIZED in the appropriate place.
The first match.pd hunk in the patch fixes it. I've constructed
a testcase for each of the surrounding cases as well. Then I
noticed that fold_binary_loc/associate has the same problem, so I've
added a TYPE_OVERFLOW_SANITIZED there as well (it may be too coarse,
sorry). Then I found yet another problem, but instead of fixing it
now I've opened 109134. I could probably go on and find a dozen more.
PR sanitizer/109107
gcc/ChangeLog:
* fold-const.cc (fold_binary_loc): Use TYPE_OVERFLOW_SANITIZED
when associating.
* match.pd: Use TYPE_OVERFLOW_SANITIZED.
gcc/testsuite/ChangeLog:
* c-c++-common/ubsan/pr109107-1.c: New test.
* c-c++-common/ubsan/pr109107-2.c: New test.
* c-c++-common/ubsan/pr109107-3.c: New test.
* c-c++-common/ubsan/pr109107-4.c: New test.
|
|
The following testcase is miscompiled on aarch64-linux. match.pd
has a simplification for addsub, where it negates one of the vectors
in twice as large floating point element vector (effectively negating every
other element) and then doing addition.
But a requirement for that is that the permutation picks the right elements,
in particular 0, nelts+1, 2, nelts+3, 4, nelts+5, ...
The pattern tests this with sel.series_p (0, 2, 0, 2) check, which as
documented verifies that the even elements of the permutation mask are
identity, but doesn't say anything about the others.
The following patch fixes it by also checking that the odd elements
start at nelts + 1 with the same step of 2.
2023-03-26 Jakub Jelinek <jakub@redhat.com>
PR tree-optimization/109230
* match.pd (fneg/fadd simplify): Verify also odd permutation indexes.
* gcc.dg/pr109230.c: New test.
|
|
This removes the "#if GIMPLE" around the
"1 - a" pattern as ssa_name_has_boolean_range
(get_range_query) works when cfun is a nullptr.
OK? Bootstrapped and tested on x86_64-linux-gnu with no regressions.
gcc/ChangeLog:
* match.pd: Remove #if GIMPLE around the
"1 - a" pattern.
|
|
For bool values, it is easier to deal with
xor 1 rather than having 1 - a. This is because
we are more likely to simplify the xor further in many
cases.
This is a special case of (MASK - b) where MASK
is a power of 2 minus 1 and b <= MASK, but only for bool
ranges ([0,1]) as that is the main case where the
difference comes into play.
Note this is enabled for gimple folding only,
as the ranges are only known while doing gimple
folding and cfun is not always set when fold is called.
OK? Bootstrapped and tested on x86_64-linux-gnu with no
regressions.
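For illustration, a minimal sketch (not the committed testcases):
  int f (_Bool a) { return 1 - a; }   /* with a's range [0,1], this becomes a ^ 1 */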
gcc/ChangeLog:
PR tree-optimization/108355
PR tree-optimization/96921
* match.pd: Add pattern for "1 - bool_val".
gcc/testsuite/ChangeLog:
PR tree-optimization/108355
PR tree-optimization/96921
* gcc.dg/tree-ssa/bool-minus-1.c: New test.
* gcc.dg/tree-ssa/bool-minus-2.c: New test.
* gcc.dg/tree-ssa/pr108354-1.c: New test.
|
|
[PR108688]
On Thu, Feb 09, 2023 at 09:16:17AM +0100, Richard Biener via Gcc-patches wrote:
> Hmm. Can we handle the case of the extraction exactly covering the
> insertion separately then and simplify to plain @1?
I was suggesting that in the PR. Here it is as an incremental patch
on top of Andrew's patch.
On the newly added testcase the ifcvt-folding difference without/with the
incremental patch is:
--- pr108688.c.171t.ifcvt_ 2023-02-09 10:47:30.169916845 +0100
+++ pr108688.c.171t.ifcvt 2023-02-09 10:48:44.942793453 +0100
@@ -25,6 +25,8 @@ Number of blocks in CFG: 11
Number of blocks to update: 5 ( 45%)
+Applying pattern match.pd:7487, gimple-match.cc:243200
+Applying pattern match.pd:3987, gimple-match.cc:75423
Matching expression match.pd:1677, gimple-match.cc:209
Applying pattern match.pd:1733, gimple-match.cc:109481
Matching expression match.pd:2393, gimple-match.cc:852
@@ -70,7 +72,6 @@ void foo ()
signed char _29;
<unnamed-signed:7> _30;
unsigned int ivtmp_33;
- <unnamed-signed:7> _ifc__35;
unsigned char _ifc__37;
unsigned char _ifc__38;
unsigned char _ifc__39;
@@ -91,8 +92,7 @@ void foo ()
_2 = (<unnamed-signed:7>) a.0_1;
_ifc__38 = u.D.2741;
_ifc__39 = BIT_INSERT_EXPR <_ifc__38, _2, 0 (7 bits)>;
- _ifc__35 = BIT_FIELD_REF <_ifc__39, 7, 0>;
- _4 = (signed char) _ifc__35;
+ _4 = (signed char) _2;
b.1_5 = b;
_6 = (signed char) b.1_5;
_7 = _4 ^ _6;
2023-02-09 Jakub Jelinek <jakub@redhat.com>
PR tree-optimization/108688
* match.pd (bit_field_ref [bit_insert]): Simplify BIT_FIELD_REF
of BIT_INSERT_EXPR extracting exactly all inserted bits even
when without mode precision. Formatting fixes.
* gcc.c-torture/compile/pr108688-1.c: Add PR number as comment.
* gcc.dg/pr108688.c: New test.
|
|
integral type [PR108688]
The same problem as PR 88739 has crept in but
this time in match.pd when simplifying bit_field_ref of
a bit_insert. That is, we are generating a BIT_FIELD_REF
of a non-mode-precision integral type.
PR tree-optimization/108688
* match.pd (bit_field_ref [bit_insert]): Avoid generating
BIT_FIELD_REFs of non-mode-precision integral operands.
* gcc.c-torture/compile/pr108688-1.c: New test.
|
|
This patch is an optimisation, but it's also a prerequisite for
fixing PR96373 without regressing vect-xorsign_exec.c.
Currently the vectoriser vectorises:
for (i = 0; i < N; i++)
r[i] = a[i] * __builtin_copysignf (1.0f, b[i]);
as two unconditional operations (copysign and mult).
tree-ssa-math-opts.cc later combines them into an "xorsign" function.
This works for both Advanced SIMD and SVE.
However, with the fix for PR96373, the vectoriser will instead
generate a conditional multiplication (IFN_COND_MUL). Something then
needs to fold copysign & IFN_COND_MUL to the equivalent of a conditional
xorsign. Three obvious options were:
(1) Extend tree-ssa-math-opts.cc.
(2) Do the fold in match.pd.
(3) Leave it to rtl combine.
I'm against (3), because this isn't a target-specific optimisation.
(1) would be possible, but would involve open-coding a lot of what
match.pd does for us. And, in contrast to doing the current
tree-ssa-math-opts.cc optimisation in match.pd, there should be
no danger of (2) happening too early. If we have an IFN_COND_MUL
then we're already past the stage of simplifying the original
source code.
There was also a choice between adding a conditional xorsign ifn
and simply open-coding the xorsign. The latter seems simpler,
and means less boiler-plate for target-specific code.
The signed_or_unsigned_type_for change is needed to make sure
that we stay in "SVE space" when doing the optimisation on 128-bit
fixed-length SVE.
gcc/
PR tree-optimization/96373
* tree.h (sign_mask_for): Declare.
* tree.cc (sign_mask_for): New function.
(signed_or_unsigned_type_for): For vector types, try to use the
related_int_vector_mode.
* genmatch.cc (commutative_op): Handle conditional internal functions.
* match.pd: Fold an IFN_COND_MUL+copysign into an IFN_COND_XOR+and.
gcc/testsuite/
PR tree-optimization/96373
* gcc.target/aarch64/sve/cond_xorsign_1.c: New test.
* gcc.target/aarch64/sve/cond_xorsign_2.c: Likewise.
|
|
This patch is an update/tweak of Andrew Pinski's two patches for
PR tree-optimization/92342, that were originally posted in November:
https://gcc.gnu.org/pipermail/gcc-patches/2021-November/585111.html
https://gcc.gnu.org/pipermail/gcc-patches/2021-November/585112.html
Technically, the first of those was approved by Richard Biener, though
never committed, and my first thought was to simply push it for Andrew,
but the review of the second piece expressed concerns over comparisons
in non-integral modes, where the result may not be zero-one valued.
Indeed both transformations misbehave in the presence of vector mode
comparisons (these transformations are already implemented for
vec_cond elsewhere in match.pd), so my minor contribution is to limit
these new transformations to scalars, by testing that both the operands
and results are INTEGRAL_TYPE_P.
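For illustration, minimal sketches (not the committed testcases) of the two scalar forms:
  int f (int a, int b, int d) { return (a < b) * d; }     /* -> a < b ? d : 0 */
  int g (int a, int c, int b) { return b & -(a == c); }   /* -> a == c ? b : 0 */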
2023-01-12 Andrew Pinski <apinski@marvell.com>
Roger Sayle <roger@nextmovesoftware.com>
gcc/ChangeLog:
PR tree-optimization/92342
* match.pd ((m1 CMP m2) * d -> (m1 CMP m2) ? d : 0):
Use tcc_comparison and :c for the multiply.
(b & -(a CMP c) -> (a CMP c)?b:0): New pattern.
gcc/testsuite/ChangeLog:
PR tree-optimization/92342
* gcc.dg/tree-ssa/andnegcmp-1.c: New test.
* gcc.dg/tree-ssa/andnegcmp-2.c: New test.
* gcc.dg/tree-ssa/multcmp-1.c: New test.
* gcc.dg/tree-ssa/multcmp-2.c: New test.
|
|
Even though this PR was reported with an ubsan issue, the problem is
tree_nonzero_bits is being called with an expression which is a vector type.
This fixes three patterns I noticed which do that,
and adds a testcase for one of the patterns.
OK? Bootstrapped and tested on x86_64-linux-gnu with no regressions.
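For illustration, a sketch (not the committed vector-shift-1.c) of the kind of vector
code that reaches the `~(X >> Y)` pattern, whose condition used to query tree_nonzero_bits
on the vector operand:
  typedef int v4si __attribute__((vector_size (16)));
  v4si f (v4si x, v4si y) { return ~(x >> y); }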
gcc/ChangeLog:
PR tree-optimization/105532
* match.pd (~(X >> Y) -> ~X >> Y): Check if it is an integral
type before calling tree_nonzero_bits.
(popcount(X) + popcount(Y)): Likewise.
(popcount(X&C1)): Likewise.
gcc/testsuite/ChangeLog:
* gcc.c-torture/compile/vector-shift-1.c: New test.
|
|
one another.
This optimizes the following sequence
((a < b) & c) | ((a >= b) & d)
into
(a < b ? c : d) & 1
for scalar and on vector we can omit the & 1.
Also recognizes
(-(a < b) & c) | (-(a >= b) & d)
into
a < b ? c : d
This changes the code generation from
zoo2:
cmp w0, w1
cset w0, lt
cset w1, ge
and w0, w0, w2
and w1, w1, w3
orr w0, w0, w1
ret
into
cmp w0, w1
csel w0, w2, w3, lt
and w0, w0, 1
ret
and significantly reduces the number of selects we have to do in the vector
code.
gcc/ChangeLog:
* match.pd: Add new rule.
gcc/testsuite/ChangeLog:
* gcc.target/aarch64/if-compare_1.c: New test.
* gcc.target/aarch64/if-compare_2.c: New test.
|
|
This reverts the part that substitutes from the definition of an
SSA name to the capture, thus ADDR_EXPR@0 eventually yielding
&y_1->a[i_2] instead of _3. That's because I didn't think of
how to deal with substituting @0 in the result pattern. So
the following re-instantiates the SSA def CONSTRUCTOR handling
and in the ADDR_EXPR helpers used by match.pd handles SSA names
defined to ADDR_EXPRs transparently.
* genmatch.cc (dt_simplify::gen): Revert last change.
* match.pd: Revert simplification of CONSTRUCTOR leaf handling.
(&x cmp SSA_NAME): Handle ADDR_EXPR in SSA defs.
* fold-const.cc (split_address_to_core_and_offset): Handle
ADDR_EXPRs in SSA defs.
(address_compare): Likewise.
|