|
gcc/ChangeLog:
PR tree-optimization/110600
* cfgloopmanip.cc (scale_loop_profile): Add missing profile_dump check.
gcc/testsuite/ChangeLog:
PR tree-optimization/110600
* gcc.c-torture/compile/pr110600.c: New test.
|
|
fold_ctor_reference attempts to use recursive local processing in order
to call native_encode_expr on the leaf nodes of the constructor, before
falling back to calling native_encode_initializer if this fails.
There are a couple of issues related to endianness present in it:
1) it does not specifically handle integral bit-fields; these are left
justified on big-endian platforms, so they cannot be treated like ordinary
fields.
2) it does not check that the constructor uses the native storage order.
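For illustration, a minimal sketch (not one of the committed testcases) of a constant constructor containing an integral bit-field, which big-endian targets store left-justified in its storage unit; the second issue concerns aggregates declared with a reversed scalar_storage_order:
struct S
{
  unsigned int f : 3;   /* left justified within its unit on big-endian */
  unsigned int g : 29;
};
static const struct S s = { 5, 42 };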
gcc/
* gimple-fold.cc (fold_array_ctor_reference): Fix head comment.
(fold_nonarray_ctor_reference): Likewise. Specifically deal
with integral bit-fields.
(fold_ctor_reference): Make sure that the constructor uses the
native storage order.
gcc/testsuite/
* gcc.c-torture/execute/20230630-1.c: New test.
* gcc.c-torture/execute/20230630-2.c: Likewise.
* gcc.c-torture/execute/20230630-3.c: Likewise.
* gcc.c-torture/execute/20230630-4.c: Likewise.
|
|
The early inliner currently skips always_inline functions, and moreover we ignore
calls from always_inline functions in ipa_reverse_postorder.  This leads to disabling
most of the propagation done by early optimization, which is quite bad when
early-inlined functions are not leaf functions, which is now quite common
in libstdc++.
This patch, instead of fully disabling this, runs the inline checks on the calls
in the callee.  I am quite conservative about what can be inlined, as this patch
is a bit touchy anyway.  To avoid problems with always_inline functions being
optimized after early inlining, I extended inline_always_inline_functions to
lazily compute the fnsummary when needed.
gcc/ChangeLog:
PR middle-end/110334
* ipa-fnsummary.h (ipa_fn_summary): Add
safe_to_inline_to_always_inline.
* ipa-inline.cc (can_early_inline_edge_p): ICE
if SSA is not built; do cycle checking for
always_inline functions.
(inline_always_inline_functions): Be recursive;
watch for cycles; do not update overall summary.
(early_inliner): Do not give up on always_inlines.
* ipa-utils.cc (ipa_reverse_postorder): Do not skip
always inlines.
gcc/testsuite/ChangeLog:
PR middle-end/110334
* g++.dg/opt/pr66119.C: Disable early inlining.
* gcc.c-torture/compile/pr110334.c: New test.
* gcc.dg/tree-ssa/pr110334.c: New test.
|
|
This testcase was fixed after r14-2135-gd915762ea9043da85 and
there was no testcase for it before, so adding one is a good thing.
Committed as obvious after testing the testcase to make sure it works.
gcc/testsuite/ChangeLog:
PR tree-optimization/110444
* gcc.c-torture/compile/pr110444-1.c: New test.
|
|
The manual already documents asm goto as being implicitly volatile,
and that was done when asm goto could not have outputs.  When outputs
were added to `asm goto`, only asm goto without outputs was still being
marked as volatile.  Now some parts of GCC decide that removing the `asm goto`
is OK if the output is not used, without updating the CFG (this happens
on both the RTL level and the gimple level).  Since the biggest user of `asm goto`
is the Linux kernel and they expect them to be volatile (they use them to
copy to/from userspace), we should just mark the inline-asm as volatile.
OK? Bootstrapped and tested on x86_64-linux-gnu.
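For illustration, a minimal sketch (not the committed testcase) of the construct involved: an asm goto with an output which, if it were not implicitly volatile and x is unused, could be deleted without fixing up the CFG:
int
f (void)
{
  int x;
  asm goto ("" : "=r" (x) : : : lab);
  return 0;
lab:
  return 1;
}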
PR middle-end/110420
PR middle-end/103979
PR middle-end/98619
gcc/ChangeLog:
* gimplify.cc (gimplify_asm_expr): Mark asm with labels as volatile.
gcc/testsuite/ChangeLog:
* gcc.c-torture/compile/asmgoto-6.c: New test.
|
|
The following testcase ICEs, because I misremembered what the return value
from match_arith_overflow is. It isn't true if __builtin_*_overflow was
matched, but it is true only in the BIT_NOT_EXPR case if stmt was removed.
So, if match_arith_overflow matches something, gsi_stmt (gsi) will not
be stmt and match_uaddc_usubc will be confused and can ICE.
The following patch fixes it by checking if gsi_stmt (gsi) == stmt,
in that case we know it is still a PLUS_EXPR/MINUS_EXPR and we can try to
pattern match it further as UADDC/USUBC.
2023-06-16 Jakub Jelinek <jakub@redhat.com>
PR tree-optimization/110271
* tree-ssa-math-opts.cc (math_opts_dom_walker::after_dom_children)
<case PLUS_EXPR>: Ignore return value from match_arith_overflow,
instead call match_uaddc_usubc only if gsi_stmt (gsi) is still stmt.
* gcc.c-torture/compile/pr110271.c: New test.
|
|
Since the combining of sin/cos into cexpi is dependent
on the target, this adds another testcase which had failed (earlier in
evrp rather than vrp2) and will fail on all targets, rather than only on
ones which have sincos or C99 math functions.
Committed as obvious after a quick test.
gcc/testsuite/ChangeLog:
PR tree-optimization/110266
* gcc.c-torture/compile/pr110266.c: New test.
|
|
clobbers
So as Georg-Johann discusses in the BZ, reload_cse_move2add can generate
incorrect code when optimizing code with clobbers.  Specifically, in the
case where we try to optimize a sequence of 4 operations down to 3
operations, we can reset INSN to the next instruction and continue the loop.
That skips the code to invalidate objects based on things like REG_INC
notes, stack pushes and, most importantly, clobbers attached to the current
insn.
This patch factors all of the invalidation code used by reload_cse_move2add
into a new function and calls it at the appropriate time.
Georg-Johann has confirmed this patch fixes his avr bug and I've had it in
my tester over the weekend. It's bootstrapped and regression tested on
aarch64, m68k, sh4, alpha and hppa. It's also regression tested successfully
on a wide variety of other targets.
gcc/
PR rtl-optimization/101188
* postreload.cc (reload_cse_move2add_invalidate): New function,
extracted from...
(reload_cse_move2add): Call reload_cse_move2add_invalidate.
gcc/testsuite
PR rtl-optimization/101188
* gcc.c-torture/execute/pr101188.c: New test.
|
|
So for the attached testcase, we assumed that zero_one_valued_p would
only match values in [0,1], but currently zero_one_valued_p also matches
signed 1-bit integers.
This changes it not to match those and fixes the 2 new testcases at
all optimization levels.
OK? Bootstrapped and tested on x86_64-linux-gnu with no regressions.
Note the GCC 13 patch will be slightly different due to the changes
made to zero_one_valued_p.
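For illustration, a minimal sketch (not the committed testcases) of why a signed 1-bit integer is not a [0,1] value: such a bit-field can only hold 0 and -1:
struct B { signed int b : 1; };
int
f (struct B *p)
{
  return p->b;   /* 0 or -1, never 1 */
}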
PR tree-optimization/110165
PR tree-optimization/110166
gcc/ChangeLog:
* match.pd (zero_one_valued_p): Don't accept
signed 1-bit integers.
gcc/testsuite/ChangeLog:
* gcc.c-torture/execute/pr110165-1.c: New test.
* gcc.c-torture/execute/pr110166-1.c: New test.
|
|
gcc/testsuite/
PR testsuite/52641
* gcc.c-torture/compile/pr108892.c: Require int32.
* gcc.c-torture/compile/pr98199.c: Require int32plus.
* gcc.dg/analyzer/call-summaries-pr107072.c: Same.
* gcc.dg/analyzer/null-deref-pr105755.c: Same.
* gcc.dg/tree-ssa/pr102232.c: Same.
* gcc.dg/tree-ssa/pr105860.c: Same.
* gcc.dg/tree-ssa/pr96730.c: Same.
* gcc.dg/tree-ssa/pr96779-disabled.c: Same.
* gcc.dg/tree-ssa/pr96779.c: Same.
* gcc.dg/tree-ssa/pr98513.c: Same.
* gcc.dg/tree-ssa/ssa-sink-18.c: Same.
* gcc.dg/analyzer/coreutils-cksum-pr108664.c: Require int32plus,
size24plus.
* gcc.dg/analyzer/doom-s_sound-pr108867.c: Require size32plus.
* gcc.dg/analyzer/malloc-CWE-590-examples.c: Same.
* gcc.dg/debug/btf/btf-bitfields-4.c: Same.
* gcc.dg/tree-ssa/pr93435.c: Same.
* gcc.dg/analyzer/null-deref-pr102671-1.c: Require ptr_eq_long.
* gcc.dg/analyzer/null-deref-pr102671-2.c: Same.
* gcc.dg/analyzer/null-deref-pr108251-smp_fetch_ssl_fc_has_early-O2.c:
Same.
* gcc.dg/analyzer/null-deref-pr108251-smp_fetch_ssl_fc_has_early.c:
Same.
* gcc.dg/tree-ssa/pr103345.c: Use uint32_t.
* gcc.dg/tree-ssa/ssa-ccp-41.c [sizeof(int)==2]: Same.
* gcc.dg/tree-ssa/pr109031-1.c: Use uint16_t, uint32_t.
* gcc.dg/tree-ssa/pr109031-2.c: Same.
* gcc.dg/Warray-bounds-49.c (dg-warning): Discriminate int != short.
* gcc.dg/Warray-bounds-52.c (dg-warning): Discriminate avr.
* gcc.dg/Warray-bounds-33.c: Skip target avr.
* gcc.dg/analyzer/fd-access-mode-target-headers.c: Same.
* gcc.dg/analyzer/flex-with-call-summaries.c: Same.
* gcc.dg/analyzer/isatty-1.c: Same.
* gcc.dg/analyzer/pipe-glibc.c: Same.
|
|
gcc/testsuite/
PR testsuite/52641
* c-c++-common/pr19807-2.c: Use __SIZEOF_INT__ instead of 4.
* gcc.c-torture/compile/pr103813.c: Require size32plus.
* gcc.c-torture/execute/pr108498-2.c: Same.
* gcc.c-torture/compile/pr96426.c: Condition on
__SIZEOF_LONG_LONG__ == __SIZEOF_DOUBLE__.
* gcc.c-torture/execute/pr103417.c: Require int32plus.
* gcc.dg/pr104198.c: Same.
* gcc.dg/pr21137.c: Same.
* gcc.dg/pr88905.c: Same.
* gcc.dg/pr90838.c: Same.
* gcc.dg/pr97317.c: Same.
* gcc.dg/pr100292.c: Require int32.
* gcc.dg/pr101008.c: Same.
* gcc.dg/pr96542.c: Same.
* gcc.dg/pr96674.c: Same.
* gcc.dg/pr97750.c: Require ptr_eq_long.
|
|
The problem is that I used expand_expr with the target, but
we don't want to use the target here as it has the wrong
mode for the original expression.  The testcase would ICE
deep down while trying to do a move to use the target.
Anyway, just calling expand_expr with NULL_EXPR fixes
the issue.
Committed as obvious after a bootstrap/test on x86_64-linux-gnu.
PR middle-end/109919
gcc/ChangeLog:
* expr.cc (expand_single_bit_test): Don't use the
target for expand_expr.
gcc/testsuite/ChangeLog:
* gcc.c-torture/compile/pr109919-1.c: New test.
|
|
After r14-673-gc0dd80e4c4c3, there was a check in the match
patterns which was checking that the type is unsigned, but
instead of using the type, the patch used the expression.
This adds the needed TREE_TYPE so we get the correct answer and don't ICE.
Committed as obvious after a bootstrap/test on x86_64-linux-gnu.
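For illustration, a minimal sketch (not the committed testcases) of the popcount(bswap(x)) shape handled by the first pattern:
int
f (unsigned int x)
{
  return __builtin_popcount (__builtin_bswap32 (x));
}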
PR tree-optimization/109834
gcc/ChangeLog:
* match.pd (popcount(bswap(x))->popcount(x)): Fix up unsigned type checking.
(popcount(rotate(x,y))->popcount(x)): Likewise.
gcc/testsuite/ChangeLog:
* gcc.c-torture/compile/pr109834-1.c: New test.
* gcc.dg/tree-ssa/pr109834-1.c: New test.
|
|
While working on improving min/max detection, this
code (which is reduced from worse_state in ipa-pure-const.cc)
was being miscompiled. Since there was no testcase in the
testsuite yet for this, this patch adds one.
Committed as obvious after testing the testcase via:
make check-gcc RUNTESTFLAGS="execute.exp=20230510-1.c"
gcc/testsuite/ChangeLog:
* gcc.c-torture/execute/20230510-1.c: New test.
|
|
While I was writing a match.pd patch, I came across a case where GCC was being
miscompiled but no testcase was failing.  So this adds that testcase.
Committed after testing on x86_64 with
make check-gcc RUNTESTFLAGS="execute.exp=20230509-1.c"
gcc/testsuite/ChangeLog:
* gcc.c-torture/execute/20230509-1.c: New test.
|
|
[PR109778]
The following testcase is miscompiled, because bitwise ccp2 handles
a rotate with a signed type incorrectly.
Seems tree-ssa-ccp.cc has the only callers of wi::[lr]rotate with 3
arguments, all other callers just rotate in the right precision and
I think work correctly. ccp works with widest_ints and so rotations
by the excessive precision certainly don't match what it wants
when it sees a rotate in some specific bitsize. Still, if it is
unsigned rotate and the widest_int is zero extended from width,
the functions perform left shift and logical right shift on the value
and then at the end zero extend the result of left shift and uselessly
also the result of logical right shift and return | of that.
On the testcase, the signed char rrotate by 4 argument is
CONSTANT -75, i.e. 0xffffffff....fffffb5 with mask 2.
The mask is correctly rotated to 0x20, but because the 8-bit constant
is sign extended to 192-bit one, the logical right shift by 4 doesn't
yield expected 0xb, but gives 0xfffffffffff....ffffb, and then
return wi::zext (left, width) | wi::zext (right, width); where left is
0xfffffff....fb50, so we return 0xfb instead of the expected
0x5b.
The following patch fixes that by doing the zero extension in case of
the right variable before doing wi::lrshift rather than after it.
Also, wi::[lr]rotate with width < precision always zero extends
the result. I'm afraid it can't do better because it doesn't know
if it is done for an unsigned or signed type, but the caller in this
case knows that very well, so I've done the extension based on sgn
in the caller. E.g. 0x5b rotated right (or left) by 4 with width 8
previously gave 0xb5, but sgn == SIGNED in widest_int it should be
0xffffffff....fffb5 instead.
2023-05-09 Jakub Jelinek <jakub@redhat.com>
PR tree-optimization/109778
* wide-int.h (wi::lrotate, wi::rrotate): Call wi::lrshift on
wi::zext (x, width) rather than x if width != precision, rather
than using wi::zext (right, width) after the shift.
* tree-ssa-ccp.cc (bit_value_binop): Call wi::ext on the results
of wi::lrotate or wi::rrotate.
* gcc.c-torture/execute/pr109778.c: New test.
|
|
minmax_replacement
This moves the check that, on the diamond-shaped form, the
two middle bbs are used only for that diamond-shaped form earlier
into the shared code.
Also remove the redundant check for single_succ_p, since that was already
done beforehand.
The next patch will simplify the code even further and remove redundant
checks.
PR tree-optimization/109604
gcc/ChangeLog:
* tree-ssa-phiopt.cc (tree_ssa_phiopt_worker): Move the
diamond form check from ...
(minmax_replacement): Here.
gcc/testsuite/ChangeLog:
* gcc.c-torture/compile/pr109604-1.c: New test.
* gcc.c-torture/compile/pr109604-2.c: New test.
|
|
The following testcase is miscompiled on riscv since the addition
of *mvconst_internal define_insn_and_split.
We have:
(insn 36 35 39 2 (set (mem/c:SI (plus:SI (reg/f:SI 65 frame)
(const_int -64 [0xffffffffffffffc0])) [2 S4 A128])
(reg:SI 166)) "pr109040.c":9:11 178 {*movsi_internal}
(expr_list:REG_DEAD (reg:SI 166)
(nil)))
(insn 39 36 40 2 (set (reg:SI 171)
(zero_extend:SI (mem/c:HI (plus:SI (reg/f:SI 65 frame)
(const_int -64 [0xffffffffffffffc0])) [0 S2 A128]))) "pr109040.c":9:11 111 {*zero_extendhisi2}
(nil))
and RTL DSE's replace_read since r0-86337-g18b526e806ab6455 handles
even different modes like in the above case, and so it optimizes it into:
(insn 47 35 39 2 (set (reg:HI 175)
(subreg:HI (reg:SI 166) 0)) "pr109040.c":9:11 179 {*movhi_internal}
(expr_list:REG_DEAD (reg:SI 166)
(nil)))
(insn 39 47 40 2 (set (reg:SI 171)
(zero_extend:SI (reg:HI 175))) "pr109040.c":9:11 111 {*zero_extendhisi2}
(expr_list:REG_DEAD (reg:HI 175)
(nil)))
Pseudo 166 is the result of an AND with the 0x8084c constant (forced into a register).
Combine attempts to combine the AND with insn 47 above created by DSE,
and, because of WORD_REGISTER_OPERATIONS and its assumption that all
the subword operations are actually done on word mode, turns it into:
(set (subreg:SI (reg:HI 175) 0)
(and:SI (reg:SI 167 [ m ])
(reg:SI 168)))
and later on the ZERO_EXTEND is thrown away.
We then see
(and:SI (subreg:SI (reg:HI 175) 0) (const_int 0x84c))
and optimize that into
(subreg:SI (and:HI (reg:HI 175) (const_int 0x84c)) 0)
which is still fine, in WORD_REGISTER_OPERATIONS the AND in HImode
will set all upper bits up to BITS_PER_WORD to zeros.
But later on simplify_binary_operation_1 or simplify_and_const_int_1
sees that because nonzero_bits ((reg:HI 175), HImode) == 0x84c, we can
optimize the AND into (reg:HI 175). That isn't correct, because while
the low 16 bits of that REG are known to have all bits but 0x84c cleared,
we don't know that all the upper 16 bits are all clear as well.
So, for WORD_REGISTER_OPERATIONS for integral modes smaller than word mode,
we need to check all bits from word_mode in nonzero_bits for the optimizations.
2023-04-14 Jeff Law <jlaw@ventanamicro.com>
Jakub Jelinek <jakub@redhat.com>
PR target/108947
PR target/109040
* combine.cc (simplify_and_const_int_1): Compute nonzero_bits in
word_mode rather than mode if WORD_REGISTER_OPERATIONS and mode is
smaller than word_mode.
* simplify-rtx.cc (simplify_context::simplify_binary_operation_1)
<case AND>: Likewise.
* gcc.dg/pr108947.c: New test.
* gcc.c-torture/execute/pr109040.c: New test.
|
|
RTL structure of an insn
So as mentioned in the PR the underlying issue here is combine changes the form of an existing insn, but fails to force re-recognition. As a result other parts of the compiler blow up.
> /* Temporarily replace the set's source with the
> contents of the REG_EQUAL note. The insn will
> be deleted or recognized by try_combine. */
> rtx orig_src = SET_SRC (set);
> rtx orig_dest = SET_DEST (set);
> if (GET_CODE (SET_DEST (set)) == ZERO_EXTRACT)
> SET_DEST (set) = XEXP (SET_DEST (set), 0);
> SET_SRC (set) = note;
> i2mod = temp;
> i2mod_old_rhs = copy_rtx (orig_src);
> i2mod_new_rhs = copy_rtx (note);
> next = try_combine (insn, i2mod, NULL, NULL,
> &new_direct_jump_p, last_combined_insn);
> i2mod = NULL;
> if (next)
> {
> statistics_counter_event (cfun, "insn-with-note combine", 1);
> goto retry;
> }
> SET_SRC (set) = orig_src;
> SET_DEST (set) = orig_dest;
This code replaces the SET_SRC of an insn in the RTL stream with the contents of a REG_EQUAL note. So given an insn like this:
> (insn 122 117 127 2 (set (reg:DI 157 [ _46 ])
> (ior:DI (reg:DI 200)
> (reg:DI 251))) "j.c":14:5 -1
> (expr_list:REG_EQUAL (const_int 25769803782 [0x600000006])
> (nil)))
It replaces the (ior ...) with a (const_int ...). The resulting insn is passed to try_combine which will try to recognize it, then use it in a combination attempt. Recognition succeeds with the special define_insn_and_split pattern in the risc-v backend resulting in:
> (insn 122 117 127 2 (set (reg:DI 157 [ _46 ])
> (const_int 25769803782 [0x600000006])) "j.c":14:5 177 {*mvconst_internal}
> (expr_list:REG_EQUAL (const_int 25769803782 [0x600000006])
> (nil)))
This is as-expected. Now assume we were unable to combine anything, so try_combine returns NULL_RTX. The quoted code above restores SET_SRC (and SET_DEST) resulting in:
> (insn 122 117 127 2 (set (reg:DI 157 [ _46 ])
> (ior:DI (reg:DI 200)
> (reg:DI 251))) "j.c":14:5 177 {*mvconst_internal}
> (expr_list:REG_EQUAL (const_int 25769803782 [0x600000006])
> (nil)))
But this doesn't get re-recognized and we ICE later in LRA.
The fix is trivial: reset the INSN_CODE to force re-recognition in the case where try_combine fails.
PR target/108892
gcc/
* combine.cc (combine_instructions): Force re-recognition
after restoring the body of an insn to its original form.
gcc/testsuite/
* gcc.c-torture/compile/pr108892.c: New test.
|
|
I've missed that one of my recent range-op-float.cc changes (likely the
r13-6967 one) caused
FAIL: libphobos.phobos/std/math/algebraic.d execution test
FAIL: libphobos.phobos_shared/std/math/algebraic.d execution test
regressions, distilled into a C testcase below.
In the testcase, we have
!(u >= v)
condition where both u and v are results of fabs*, which guards
t1 = u u<= __FLT_MAX__;
and
t2 = v u<= __FLT_MAX__;
comparisons.  From a false u >= v where u and v have [0.0, +Inf] NAN
ranges we incorrectly deduce that one of them is [nextafterf (0.0, 1.0), +Inf] NAN
and the other is [0.0, nextafterf (+Inf, 0.0)] NAN, and from that deduce that
one of the comparisons is always true, because UNLE_EXPR with the maximum
representable number is false only if the value is +Inf and our ranges say
that is not possible.
The bug is that the u >= v comparison determines a sensible range only when
it is true - we then know neither operand can be NAN and it behaves
correctly. But when the comparison is false, our current code gives
sensible answers only if the other op can't be NAN. If it can be NAN,
whenever it is NAN, the comparison is always false regardless of the other
value, so the other value needs to be VARYING.
Now, foperator_unordered_lt::op1_range etc. had code to deal with that
for op?.known_isnan (), but as the testcase shows, it is enough that the
operand may be a NAN at runtime to make the result VARYING.
So, for all those BRS_FALSE cases of the normal non-equality comparisons,
the following patch changes the
if (opOTHER.known_isnan ()) r.set_varying (type);
handling to do it also if maybe_isnan ().
For the unordered or ... comparisons, it is similar for BRS_TRUE. Those
comparisons are true whenever either operand is NAN, or if neither is NAN,
the corresponding normal comparison. So, if those comparisons are true
and other operand might be a NAN, we can't tell (VARYING), if it is false,
currently handling is correct.
2023-04-04 Jakub Jelinek <jakub@redhat.com>
PR tree-optimization/109386
* range-op-float.cc (foperator_lt::op1_range, foperator_lt::op2_range,
foperator_le::op1_range, foperator_le::op2_range,
foperator_gt::op1_range, foperator_gt::op2_range,
foperator_ge::op1_range, foperator_ge::op2_range): Make r varying for
BRS_FALSE case even if the other op is maybe_isnan, not just
known_isnan.
(foperator_unordered_lt::op1_range, foperator_unordered_lt::op2_range,
foperator_unordered_le::op1_range, foperator_unordered_le::op2_range,
foperator_unordered_gt::op1_range, foperator_unordered_gt::op2_range,
foperator_unordered_ge::op1_range, foperator_unordered_ge::op2_range):
Make r varying for BRS_TRUE case even if the other op is maybe_isnan,
not just known_isnan.
* gcc.c-torture/execute/ieee/pr109386.c: New test.
|
|
gcc/testsuite/ChangeLog:
* gcc.c-torture/compile/103818.c: Enable test for llp64.
Signed-off-by: Jonathan Yong <10walls@gmail.com>
|
|
The following testcase is reduced from miscompilation of scipy package.
If we have say lhs = [1., 1.] - [1., 1.] and want to compute the range
of lhs from it, we correctly determine it is [0., 0.] (if computations
are exact, we generally don't try to round them further in
frange_arithmetic). In the testcase it is about a reverse operation,
[1., 1.] = op1 + [1., 1.] and we want to compute range of op1 from that.
Right now we just perform the inverse operation (there are some corner
cases about NaN and infinities handling) and so arrive to range
[0., 0.] as well, and because it is a singleton, optimize return eps;
to return 0. That is incorrect though, for the reverse ops we need to
take into account also rounding, the right exact range is
[-0x1.0p-54, 0x1.0p-53] in this case when rounding to nearest, i.e.
all numbers which added to 1. with round to nearest still produce 1.
The problem isn't solely on singleton ranges, and isn't solely on
results around zero. We basically need to consider also values
where the result is up to 0.5ulp away from the lhs range boundaries
in each direction.
The following patch fixes it by extending the lhs range for the
reverse operations by 1ulp in each direction. The PR contains
a pseudo-random test generator I've used to generate 300000 tests
of + and - and then used the same test with * and / instead of + and -
together with a hack to print the discovered ranges by the patch in
a form that another test could then verify the range is conservatively
correct and how far it is from a minimal range.
I believe the results are good enough for now, though I plan to look
incrementally into trying to do something better on the -XXX_MAX or
XXX_MAX boundaries (where I think frange_nextafter will use -inf or +inf)
and also to try to increase the range just by 0.5ulp rather than 1ulp
if !flag_rounding_math.  But I don't know if either of those will be doable
and will pass the testing, so I think it is worth committing this fix
first.
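For illustration, a minimal sketch (not the committed testcase) of the kind of wrong fold: from d == 1.0, the reverse of d = eps + 1.0 used to give eps a singleton [0., 0.] range, so "return eps" was folded to return 0.0 even though small nonzero values of eps also satisfy the comparison:
__attribute__((noipa)) double
f (double eps)
{
  double d = eps + 1.0;
  if (d == 1.0)
    return eps;
  return -1.0;
}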
2023-03-09 Jakub Jelinek <jakub@redhat.com>
Richard Biener <rguenther@suse.de>
PR tree-optimization/109008
* range-op-float.cc (float_widen_lhs_range): New function.
(foperator_plus::op1_range, foperator_minus::op1_range,
foperator_minus::op2_range, foperator_mult::op1_range,
foperator_div::op1_range, foperator_div::op2_range): Use it.
* gcc.c-torture/execute/ieee/pr109008.c: New test.
|
|
The following testcase ICEs because eliminate_redundant_comparison sees
redundant comparisons in &&/|| where the comparison has (ab) SSA_NAME,
maybe_fold_{and,or}_comparisons optimizes them into a single comparison
and build_and_add_sum emits a new comparison close to the definition
operands, which in this case is before a returns_twice call (which is
invalid). Generally reassoc just punts on (ab) SSA_NAMEs, declares them
non-reassociable etc., so the second half of this patch does that.
Though we can do better in this case; the function has special code
when maybe_fold_{and,or}_comparisons returns INTEGER_CST (false/true)
or when what it returns is the same as curr->op (the first of the
comparisons we are considering) - in that case we just remove the
second one and keep the first one. The reason it doesn't match is that
curr->op is a SSA_NAME whose SSA_NAME_DEF_STMT is checked to be a
comparison, in this case _42 = a_1(ab) != 0 and the other comparison
is also like that. maybe_fold_{and,or}_comparisons looks through the
definitions though and so returns a_1(ab) != 0 as tree.
So the first part of the patch checks whether the returned comparison
is the same as the curr->op comparison, and if so, it just overrides
t back to curr->op so that its SSA_NAME is reused. In that case we can
handle even (ab) in {,new}op{1,2} because we don't create a new comparison
of that, just keep using the existing one. And t can't be (ab) because
otherwise it wouldn't be considered a reassociable operand.
The (ab) checks are needed say when we have a_1(ab) == 42 || a_1(ab) > 42
kind of comparisons where maybe_fold_{and,or}_comparisons returns a new
comparison not existing in the IL yet.
2023-02-16 Jakub Jelinek <jakub@redhat.com>
PR tree-optimization/108783
* tree-ssa-reassoc.cc (eliminate_redundant_comparison): If lcode
is equal to TREE_CODE (t), op1 to newop1 and op2 to newop2, set
t to curr->op. Otherwise, punt if either newop1 or newop2 are
SSA_NAME_OCCURS_IN_ABNORMAL_PHI SSA_NAMEs.
* gcc.c-torture/compile/pr108783.c: New test.
|
|
In simple_dce_from_worklist, we were removing an inline-asm which had a vdef.
We should not be removing an inline-asm which has a vdef, as this code
does not check the store.
This fixes that oversight.  This was a latent bug exposed recently
by both VRP and the removal of stores to statics starting to use
simple_dce_from_worklist.
Committed as approved.
Bootstrapped and tested on x86_64-linux-gnu with no regressions.
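For illustration, a minimal sketch (not the committed testcases) of an inline-asm with a vdef that must not be removed even though its register output is unused:
void
f (int *p)
{
  int tmp;
  asm ("" : "=r" (tmp), "=m" (*p));
}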
PR tree-optimization/108684
gcc/ChangeLog:
* tree-ssa-dce.cc (simple_dce_from_worklist):
Check all ssa names and not just non-vdef ones
before accepting the inline-asm.
Call unlink_stmt_vdef on the statement before
removing it.
gcc/testsuite/ChangeLog:
* gcc.c-torture/compile/dce-inline-asm-1.c: New test.
* gcc.c-torture/compile/dce-inline-asm-2.c: New test.
* gcc.dg/tree-ssa/pr108684-1.c: New test.
co-authored-by: Andrew Macleod <amacleod@redhat.com>
|
|
[PR108688]
On Thu, Feb 09, 2023 at 09:16:17AM +0100, Richard Biener via Gcc-patches wrote:
> Hmm. Can we handle the case of the extraction exactly covering the
> insertion separately then and simplify to plain @1?
I was suggesting that in the PR. Here it is as an incremental patch
on top of Andrew's patch.
On the newly added testcase the ifcvt-folding difference without/with the
incremental patch is:
--- pr108688.c.171t.ifcvt_ 2023-02-09 10:47:30.169916845 +0100
+++ pr108688.c.171t.ifcvt 2023-02-09 10:48:44.942793453 +0100
@@ -25,6 +25,8 @@ Number of blocks in CFG: 11
Number of blocks to update: 5 ( 45%)
+Applying pattern match.pd:7487, gimple-match.cc:243200
+Applying pattern match.pd:3987, gimple-match.cc:75423
Matching expression match.pd:1677, gimple-match.cc:209
Applying pattern match.pd:1733, gimple-match.cc:109481
Matching expression match.pd:2393, gimple-match.cc:852
@@ -70,7 +72,6 @@ void foo ()
signed char _29;
<unnamed-signed:7> _30;
unsigned int ivtmp_33;
- <unnamed-signed:7> _ifc__35;
unsigned char _ifc__37;
unsigned char _ifc__38;
unsigned char _ifc__39;
@@ -91,8 +92,7 @@ void foo ()
_2 = (<unnamed-signed:7>) a.0_1;
_ifc__38 = u.D.2741;
_ifc__39 = BIT_INSERT_EXPR <_ifc__38, _2, 0 (7 bits)>;
- _ifc__35 = BIT_FIELD_REF <_ifc__39, 7, 0>;
- _4 = (signed char) _ifc__35;
+ _4 = (signed char) _2;
b.1_5 = b;
_6 = (signed char) b.1_5;
_7 = _4 ^ _6;
2023-02-09 Jakub Jelinek <jakub@redhat.com>
PR tree-optimization/108688
* match.pd (bit_field_ref [bit_insert]): Simplify BIT_FIELD_REF
of BIT_INSERT_EXPR extracting exactly all inserted bits even
when without mode precision. Formatting fixes.
* gcc.c-torture/compile/pr108688-1.c: Add PR number as comment.
* gcc.dg/pr108688.c: New test.
|
|
integral type [PR108688]
The same problem as PR 88739 has crept in, but
this time in match.pd when simplifying a bit_field_ref of
a bit_insert.  That is, we are generating a BIT_FIELD_REF
of a non-mode-precision integral type.
PR tree-optimization/108688
* match.pd (bit_field_ref [bit_insert]): Avoid generating
BIT_FIELD_REFs of non-mode-precision integral operands.
* gcc.c-torture/compile/pr108688-1.c: New test.
|
|
The following testcase ICEs, because we determine only in late pure const
pass that bar is const (the content of the function loses a store to a
global var during dse3 and read from it during cddce2) and local-pure-const2
makes it const. The cgraph ordering is that post IPA (in late IPA simd
clones are created) bar is processed first, then foo as its caller, then
foo.simdclone* and finally bar.simdclone*. Conceptually I think that is the
right ordering which allows for static simd clones to be removed.
The reason for the ICE is that because bar was marked const, the call to
it lost vops before vectorization, and when we in foo.simdclone* try to
vectorize the call to bar, we replace it with bar.simdclone* which hasn't
been marked const and so needs vops, which we don't add.
Now, because the simd clones are created from the same IL, just in a loop
with different argument/return value passing, I think generally if the base
function is determined to be const or pure, the simd clones should be too,
unless e.g. the vectorization causes different optimization decisions, but
then still the global memory reads if any shouldn't affect what the function
does and global memory stores shouldn't be reachable at runtime.
So, the following patch changes set_{const,pure}_flag to mark also simd
clones.
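For illustration, a minimal sketch (assuming -fopenmp-simd; not the committed testcase) of the call shape involved, where both functions get simd clones and one calls the other:
#pragma omp declare simd
__attribute__((noinline)) int
bar (int x)
{
  return x & 1;   /* in the PR, bar is only discovered to be const late */
}
#pragma omp declare simd
int
foo (int x)
{
  return bar (x) + 1;
}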
2023-02-07 Jakub Jelinek <jakub@redhat.com>
PR tree-optimization/106433
* cgraph.cc (set_const_flag_1): Recurse on simd clones too.
(cgraph_node::set_pure_flag): Call set_pure_flag_1 on simd clones too.
* gcc.c-torture/compile/pr106433.c: New test.
|
|
The problem here is we are trying to compare two ranges with different
precisions and the == operator in wide_int is complaining.
Interestingly, the problem is not the nonzero bits, but the fact that
the entire ranges have different precisions. The reason we don't ICE
when comparing the sub-ranges, is because the code in
irange::operator== works on trees, and tree_int_cst_equal is
promoting the comparison to a widest int:
if (TREE_CODE (t1) == INTEGER_CST
&& TREE_CODE (t2) == INTEGER_CST
&& wi::to_widest (t1) == wi::to_widest (t2))
return 1;
This is why we don't see the ICE until the nonzero bits comparison is
done on wide ints. I think we should maintain the current equality
behavior, and follow suit in the nonzero bit comparison.
I have also fixed the legacy equality code, even though technically
nonzero bits shouldn't appear in legacy. But better safe than sorry.
PR tree-optimization/108639
gcc/ChangeLog:
* value-range.cc (irange::legacy_equal_p): Compare nonzero bits as
widest_int.
(irange::operator==): Same.
|
|
Switch from using stacks in the "private segment" to using a memory block
allocated on the host side. The primary reason is to permit the reverse
offload implementation to access values located on the device stack, but
there may also be performance benefits, especially with repeated kernel
invocations.
This implementation unifies the stacks with the "team arena" optimization
feature, and now allows both to have run-time configurable sizes.
A new ABI is needed, so all libraries must be rebuilt, and newlib must be
version 4.3.0.20230120 or newer.
gcc/ChangeLog:
* config/gcn/gcn-run.cc: Include libgomp-gcn.h.
(struct kernargs): Replace the common content with kernargs_abi.
(struct heap): Delete.
(main): Read GCN_STACK_SIZE envvar.
Allocate space for the device stacks.
Write the new kernargs fields.
* config/gcn/gcn.cc (gcn_option_override): Remove stack_size_opt.
(default_requested_args): Remove PRIVATE_SEGMENT_BUFFER_ARG and
PRIVATE_SEGMENT_WAVE_OFFSET_ARG.
(gcn_addr_space_convert): Mask the QUEUE_PTR_ARG content.
(gcn_expand_prologue): Move the TARGET_PACKED_WORK_ITEMS to the top.
Set up the stacks from the values in the kernargs, not private.
(gcn_expand_builtin_1): Match the stack configuration in the prologue.
(gcn_hsa_declare_function_name): Turn off the private segment.
(gcn_conditional_register_usage): Ensure QUEUE_PTR is fixed.
* config/gcn/gcn.h (FIXED_REGISTERS): Fix the QUEUE_PTR register.
* config/gcn/gcn.opt (mstack-size): Change the description.
include/ChangeLog:
* gomp-constants.h (GOMP_VERSION_GCN): Bump.
libgomp/ChangeLog:
* config/gcn/libgomp-gcn.h (DEFAULT_GCN_STACK_SIZE): New define.
(DEFAULT_TEAM_ARENA_SIZE): New define.
(struct heap): Move to this file.
(struct kernargs_abi): Likewise.
* config/gcn/team.c (gomp_gcn_enter_kernel): Use team arena size from
the kernargs.
* libgomp.h: Include libgomp-gcn.h.
(TEAM_ARENA_SIZE): Remove.
(team_malloc): Update the error message.
* plugin/plugin-gcn.c (struct kernargs): Move common content to
struct kernargs_abi.
(struct agent_info): Rename team arenas to ephemeral memories.
(struct team_arena_list): Rename ....
(struct ephemeral_memories_list): to this.
(struct heap): Delete.
(team_arena_size): New variable.
(stack_size): New variable.
(print_kernel_dispatch): Update debug messages.
(init_environment_variables): Read GCN_TEAM_ARENA_SIZE.
Read GCN_STACK_SIZE.
(get_team_arena): Rename ...
(configure_ephemeral_memories): ... to this, and set up stacks.
(release_team_arena): Rename ...
(release_ephemeral_memories): ... to this.
(destroy_team_arenas): Rename ...
(destroy_ephemeral_memories): ... to this.
(create_kernel_dispatch): Add num_threads parameter.
Adjust for kernargs_abi refactor and ephemeral memories.
(release_kernel_dispatch): Adjust for ephemeral memories.
(run_kernel): Pass thread-count to create_kernel_dispatch.
(GOMP_OFFLOAD_init_device): Adjust for ephemeral memories.
(GOMP_OFFLOAD_fini_device): Adjust for ephemeral memories.
gcc/testsuite/ChangeLog:
* gcc.c-torture/execute/pr47237.c: Xfail on amdgcn.
* gcc.dg/builtin-apply3.c: Xfail for amdgcn.
* gcc.dg/builtin-apply4.c: Xfail for amdgcn.
* gcc.dg/torture/stackalign/builtin-apply-3.c: Xfail for amdgcn.
* gcc.dg/torture/stackalign/builtin-apply-4.c: Xfail for amdgcn.
|
|
On the following testcase we have an asm goto in a hot block with 2 successors:
one cold, to which it both falls through and has one of its labels
pointing, and another hot successor with another label.
Now, during bbpart we want to ensure that no blocks from one partition fall
through into a block in a different partition. fix_up_fall_thru_edges
does that by temporarily clearing the EDGE_CROSSING on the fallthrough edge,
calling force_nonfallthru and then depending on whether it created a new
bb either set EDGE_CROSSING on the single successor edge from the new bb
(the new bb is kept in the same partition as the predecessor block), or
if no new bb has been created setting EDGE_CROSSING back on the fallthru
edge which has been forced non-EDGE_FALLTHRU.
For asm goto this doesn't always work, force_nonfallthru can create a new bb
and change the fallthrough edge to point to that, but if the original
fallthru destination block has its label referenced among the asm goto
labels, it will create a new non-fallthru edge for the label(s).
But because we've temporarily cheated and cleared EDGE_CROSSING on the edge,
it is cleared on the new edge as well, then the caller sees we've created
a new bb and just sets EDGE_CROSSING on the single fallthru edge from the
new bb. But the direct edge from cur_bb to fallthru edge's destination
isn't handled and fails afterwards consistency checks, because it crosses
partitions.
The following patch notes the case and sets EDGE_CROSSING on that edge too.
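For reference, a minimal sketch (not the committed testcase, and without the hot/cold partitioning that triggers the ICE) of the asm goto shape described, which both falls through to a block and lists that block's label among its targets:
int
f (int x)
{
  asm goto ("" : : : : cold, hot);
  /* falls through to cold, which is also the first label's target */
cold:
  return -x;
hot:
  return x;
}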
2023-01-31 Jakub Jelinek <jakub@redhat.com>
PR rtl-optimization/108596
* bb-reorder.cc (fix_up_fall_thru_edges): Handle the case where cur_bb
ends with asm goto and has a crossing fallthrough edge to the same bb
that contains at least one of its labels by restoring EDGE_CROSSING
flag even on possible edge from cur_bb to new_bb successor.
* gcc.c-torture/compile/pr108596.c: New test.
|
|
The following testcases are miscompiled because the threader sees that some
SSA_NAME would have the value -0.0, and when computing the range of SSA_NAME == 0.0
foperator_equal::fold_range sees one operand has the [-0.0, -0.0] singleton
range and the other [0.0, 0.0]; they aren't equal (frange operator== uses
real_identical etc. rather than real comparisons) and so it thinks they
compare unequal.  With signed zeros -0.0 == 0.0 is true though, so we
need to special-case the code for when both ranges are singletons.
Similarly, if we see op1 range being say [-42.0, -0.0] and op2 range
[0.0, 42.0], we'd check that the intersection of the two ranges is empty
(that is correct) and fold the result of == between such operands to
[0, 0] which is wrong, because -0.0 == 0.0, it needs to be [0, 1].
Similarly for foperator_not_equal::fold_range.
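For illustration, a minimal sketch (not the committed testcases) of the property involved, with the condition pinning x to the [-0.0, -0.0] singleton:
__attribute__((noipa)) int
f (double x)
{
  if (__builtin_signbit (x) && x == 0.0)   /* here x is known to be -0.0 */
    return x == 0.0;                       /* must remain true */
  return -1;
}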
2023-01-26 Jakub Jelinek <jakub@redhat.com>
PR tree-optimization/108540
* range-op-float.cc (foperator_equal::fold_range): If both op1 and op2
are singletons, use range_true even if op1 != op2
when one range is [-0.0, -0.0] and another [0.0, 0.0]. Similarly,
even if intersection of the ranges is empty and one has
zero low bound and another zero high bound, use range_true_and_false
rather than range_false.
(foperator_not_equal::fold_range): If both op1 and op2
are singletons, use range_false even if op1 != op2
when one range is [-0.0, -0.0] and another [0.0, 0.0]. Similarly,
even if intersection of the ranges is empty and one has
zero low bound and another zero high bound, use range_true_and_false
rather than range_true.
* gcc.c-torture/execute/ieee/pr108540-1.c: New test.
* gcc.c-torture/execute/ieee/pr108540-2.c: New test.
|
|
aligned [PR108498]
The first of the following testcases is miscompiled on powerpc64-linux -O2
-m64 at least, the latter at least on x86_64-linux -m32/-m64.
Since GCC 11 store-merging has a separate string_concatenation mode which
turns stores into setting a MEM_REF from a STRING_CST.
This mode is triggered if at least one of the to be merged stores
is a STRING_CST store and either the first store (to earliest address)
is that STRING_CST store or the first store is 8-bit INTEGER_CST store
and then there are some rules when to turn that mode off or not merge
further stores into it.
The problem with these 2 testcases is that the actual implementation
relies on start/width of the store to be at byte boundaries, as it
simply creates a char array, MEM_REF can be only on byte boundaries
and the char array too, plus obviously STRING_CST as well.
But as can be easily seen in the second testcase, nothing verifies this:
while the first store has to be a STRING_CST (which will be aligned)
or an 8-bit INTEGER_CST, that 8-bit INTEGER_CST store could be a bit-field
store, and nothing verifies for the stores in between whether they actually
are 8-bit and aligned; the only major requirement is that all the stores
are consecutive.
For GCC 14 I think we should reconsider this, simply treat STRING_CST
stores during the merging like INTEGER_CST stores and deal with it only
during split_group where we can create multiple parts, this part
would be a normal store, this part would be STRING_CST store, this part
another normal store etc. But that is quite a lot of work, the following
patch just disables the string_concatenate mode if boundaries aren't byte
aligned in the spot where we disable it if it is too short too.
If that happens, we'll just try to do the merging using normal 1/2/4/8 etc.
byte stores as usually with RMW masking for any bits that shouldn't be
touched or punt if we end up with too many stores compared to the original.
Note, an original STRING_CST store will count as one store in that case,
something we might want to reconsider later too (but, after all, CONSTRUCTOR
stores (aka zeroing) already have the same problem, they can be large and
expensive and we still count them as one store).
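For illustration, a minimal sketch (not the committed testcases) of the store-group shape described, where the leading 8-bit constant store is a bit-field store and therefore not byte aligned:
struct S
{
  unsigned int pad : 3;
  unsigned int f : 8;    /* 8-bit field starting at bit offset 3 */
  unsigned int g : 5;
  char buf[4];
} s;
void
store (void)
{
  s.f = 1;                             /* 8-bit INTEGER_CST store */
  s.g = 2;
  __builtin_memcpy (s.buf, "abc", 4);  /* STRING_CST store */
}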
2023-01-25 Jakub Jelinek <jakub@redhat.com>
PR tree-optimization/108498
* gimple-ssa-store-merging.cc (class store_operand_info):
End comment with full stop rather than comma.
(split_group): Likewise.
(merged_store_group::apply_stores): Clear string_concatenation if
start or end aren't on a byte boundary.
* gcc.c-torture/execute/pr108498-1.c: New test.
* gcc.c-torture/execute/pr108498-2.c: New test.
|
|
The comment above simplify_rotate roughly describes what patterns
are matched into what:
We are looking for X with unsigned type T with bitsize B, OP being
+, | or ^, some type T2 wider than T. For:
(X << CNT1) OP (X >> CNT2) iff CNT1 + CNT2 == B
((T) ((T2) X << CNT1)) OP ((T) ((T2) X >> CNT2)) iff CNT1 + CNT2 == B
transform these into:
X r<< CNT1
Or for:
(X << Y) OP (X >> (B - Y))
(X << (int) Y) OP (X >> (int) (B - Y))
((T) ((T2) X << Y)) OP ((T) ((T2) X >> (B - Y)))
((T) ((T2) X << (int) Y)) OP ((T) ((T2) X >> (int) (B - Y)))
(X << Y) | (X >> ((-Y) & (B - 1)))
(X << (int) Y) | (X >> (int) ((-Y) & (B - 1)))
((T) ((T2) X << Y)) | ((T) ((T2) X >> ((-Y) & (B - 1))))
((T) ((T2) X << (int) Y)) | ((T) ((T2) X >> (int) ((-Y) & (B - 1))))
transform these into (last 2 only if ranger can prove Y < B):
X r<< Y
Or for:
(X << (Y & (B - 1))) | (X >> ((-Y) & (B - 1)))
(X << (int) (Y & (B - 1))) | (X >> (int) ((-Y) & (B - 1)))
((T) ((T2) X << (Y & (B - 1)))) | ((T) ((T2) X >> ((-Y) & (B - 1))))
((T) ((T2) X << (int) (Y & (B - 1)))) \
| ((T) ((T2) X >> (int) ((-Y) & (B - 1))))
transform these into:
X r<< (Y & (B - 1))
The following testcase shows that 2 of these are problematic.
If T2 is wider than T, then the 2 which use (-Y) & (B - 1) on one
of the shift counts but Y on the other can do something different from
a rotate.  E.g.:
__attribute__((noipa)) unsigned char
f7 (unsigned char x, unsigned int y)
{
unsigned int t = x;
return (t << y) | (t >> ((-y) & 7));
}
if y is [0, 7], then it is a normal rotate, and if y is in [32, ~0U]
then it is UB, but for y in [9, 31] the left shift in this case
will never leave any bits in the result, while in a rotate they are
left there.  Say for y 5 and x 0xaa the expression gives
0x55, which is the same thing as the rotate, while for y 19 and x 0xaa
it gives 0x5, which is different.
Now, I believe the
((T) ((T2) X << Y)) OP ((T) ((T2) X >> (B - Y)))
((T) ((T2) X << (int) Y)) OP ((T) ((T2) X >> (int) (B - Y)))
forms are ok, because B - Y still needs to be a valid shift count,
and if Y > B then B - Y should be either negative or very large
positive (for unsigned types).
And similarly the last 2 cases above which use & (B - 1) on both
shift operands are definitely ok.
The following patch disables the
((T) ((T2) X << Y)) | ((T) ((T2) X >> ((-Y) & (B - 1))))
((T) ((T2) X << (int) Y)) | ((T) ((T2) X >> (int) ((-Y) & (B - 1))))
unless ranger says Y is not in [B, B2 - 1] range.
And, looking at it again this morning, actually the Y equal to B
case is still fine, if Y is equal to 0, then it is
(T) (((T2) X << 0) | ((T2) X >> 0))
and so X, for Y == B it is
(T) (((T2) X << B) | ((T2) X >> 0))
which is the same as
(T) (0 | ((T2) X >> 0))
which is also X. So instead of the [B, B2 - 1] range we could use
[B + 1, B2 - 1].  And, if we wanted to go further, even multiples
of B are ok if they are smaller than B2, so we could construct a detailed
int_range_max if we wanted.
2023-01-17 Jakub Jelinek <jakub@redhat.com>
PR tree-optimization/106523
* tree-ssa-forwprop.cc (simplify_rotate): For the
patterns with (-Y) & (B - 1) in one operand's shift
count and Y in another, if T2 has wider precision than T,
punt if Y could have a value in [B, B2 - 1] range.
* c-c++-common/rotate-2.c (f5, f6, f7, f8, f13, f14, f15, f16,
f37, f38, f39, f40, f45, f46, f47, f48): Add assertions using
__builtin_unreachable about shift count.
* c-c++-common/rotate-2b.c: New test.
* c-c++-common/rotate-4.c (f5, f6, f7, f8, f13, f14, f15, f16,
f37, f38, f39, f40, f45, f46, f47, f48): Add assertions using
__builtin_unreachable about shift count.
* c-c++-common/rotate-4b.c: New test.
* gcc.c-torture/execute/pr106523.c: New test.
|
|
This one is hand-reduced to the problematic code from the optimized dump
that used to be miscompiled during combine starting with
r12-303 and fixed with r13-3530, aka the PR107172 fix.
2023-01-13 Jakub Jelinek <jakub@redhat.com>
PR target/107131
* gcc.c-torture/execute/pr107131.c: New test.
|
|
These PRs were for now fixed by reversion of the r13-4977
patch, but so that the problems don't reappear during stage 1,
I'm adding testcase coverage from those PRs.
2023-01-06 Jakub Jelinek <jakub@redhat.com>
PR target/108292
PR target/108308
* gcc.c-torture/execute/pr108292.c: New test.
* gcc.target/i386/pr108292.c: New test.
* gcc.dg/pr108308.c: New test.
|
|
We ICE on the following testcase, because a valid V2DImode
!= comparison is folded into an unsupported V2DImode > comparison.
The match.pd pattern which does this looks like:
/* Transform comparisons of the form (X & Y) CMP 0 to X CMP2 Z
where ~Y + 1 == pow2 and Z = ~Y. */
(for cst (VECTOR_CST INTEGER_CST)
(for cmp (eq ne)
icmp (le gt)
(simplify
(cmp (bit_and:c@2 @0 cst@1) integer_zerop)
(with { tree csts = bitmask_inv_cst_vector_p (@1); }
(if (csts && (VECTOR_TYPE_P (TREE_TYPE (@1)) || single_use (@2)))
(with { auto optab = VECTOR_TYPE_P (TREE_TYPE (@1))
? optab_vector : optab_default;
tree utype = unsigned_type_for (TREE_TYPE (@1)); }
(if (target_supports_op_p (utype, icmp, optab)
|| (optimize_vectors_before_lowering_p ()
&& (!target_supports_op_p (type, cmp, optab)
|| !target_supports_op_p (type, BIT_AND_EXPR, optab))))
(if (TYPE_UNSIGNED (TREE_TYPE (@1)))
(icmp @0 { csts; })
(icmp (view_convert:utype @0) { csts; })))))))))
and that optimize_vectors_before_lowering_p () guarded stuff there
already deals with this problem, not trying to fold a supported comparison
into a non-supported one. The reason it doesn't work in this case is that
it isn't GIMPLE folding which does this, but GENERIC folding done during
forwprop4 - forward_propagate_into_comparison -> forward_propagate_into_comparison_1
-> combine_cond_expr_cond -> fold_binary_loc -> generic_simplify
and we simply assumed that GENERIC folding happens only before
gimplification.
The following patch fixes that by checking cfun properties instead of
always returning true in those cases.
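For illustration, a minimal sketch (not the committed testcase) of the kind of expression matched by the pattern, where a supported V2DImode != can be folded into an unsupported V2DImode >:
typedef long long v2di __attribute__ ((vector_size (16)));
v2di
cmp (v2di x)
{
  /* (x & -4) != 0 has the (X & Y) CMP 0 shape with ~Y + 1 == 4.  */
  return (x & -4) != 0;
}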
2023-01-04 Jakub Jelinek <jakub@redhat.com>
PR middle-end/108237
* generic-match-head.cc: Include tree-pass.h.
(canonicalize_math_p, optimize_vectors_before_lowering_p): Define
to false if cfun and cfun->curr_properties has PROP_gimple_opt_math
resp. PROP_gimple_lvec property set.
* gcc.c-torture/compile/pr108237.c: New test.
|
|
This fixes the following on LLP64 mingw-w64 target:
Excess errors:
gcc/testsuite/gcc.c-torture/compile/pr55569.c:13:12: warning: overflow in conversion from 'long long unsigned int' to 'long int' changes value from '4611686018427387903' to '-1' [-Woverflow]
gcc/testsuite/gcc.c-torture/compile/pr55569.c:13:34: warning: iteration 2147483647 invokes undefined behavior [-Waggressive-loop-optimizations]
gcc/testsuite/ChangeLog:
* gcc.c-torture/compile/pr55569.c: Fix excess errors.
Signed-off-by: Jonathan Yong <10walls@gmail.com>
|
|
Even though this PR was reported with an ubsan issue, the problem is
that tree_nonzero_bits is being called with an expression which has a vector type.
This fixes three patterns I noticed which do that,
and adds a testcase for one of the patterns.
OK? Bootstrapped and tested on x86_64-linux-gnu with no regressions.
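For illustration, a minimal sketch (not the committed testcase) of the ~(X >> Y) pattern applied to vector operands, for which tree_nonzero_bits must not be queried:
typedef int v4si __attribute__ ((vector_size (16)));
v4si
shift (v4si x, v4si y)
{
  return ~(x >> y);
}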
gcc/ChangeLog:
PR tree-optimization/105532
* match.pd (~(X >> Y) -> ~X >> Y): Check if it is an integral
type before calling tree_nonzero_bits.
(popcount(X) + popcount(Y)): Likewise.
(popcount(X&C1)): Likewise.
gcc/testsuite/ChangeLog:
* gcc.c-torture/compile/vector-shift-1.c: New test.
|
|
[PR106751]
The RTL loop passes only request simple preheaders, but don't require
fallthru preheaders, while move_invariant_reg apparently assumes the
latter, that it can just append instruction(s) to the end of the preheader
basic block.
The following patch fixes that by splitting the preheader edge if
the preheader bb ends with a JUMP_INSN (asm goto in this case).
Without that we get control flow in the middle of a bb.
2022-12-16 Jakub Jelinek <jakub@redhat.com>
PR rtl-optimization/106751
* loop-invariant.cc (move_invariant_reg): If preheader bb ends
with a JUMP_INSN, split the preheader edge and emit invariants
into the new preheader basic block.
* gcc.c-torture/compile/pr106751.c: New test.
|
|
Since vect_recog_rotate_pattern has been extended in r13-1100 to work also
on signed types, we miscompile the testcase below.
vect_recog_rotate_pattern actually emits correct scalar code into
the pattern def sequence (in particular cast to utype, doing the
2 shifts in utype so that the right shift is logical and not arithmetic,
or and then cast back to the signed type), but it didn't supply vectype
for most of those pattern statements, which means that the generic handling
fills it up later with the vectype provided by vect_recog_rotate_pattern.
The problem is that it is vectype of the result of the whole pattern,
i.e. vector of signed values in this case, while the conversion to utype,
2 shifts and or (everything with utype lhs in scalar code) should have
uvectype as STMT_VINFO_VECTYPE.
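For illustration, a minimal sketch (not the committed testcase) of a rotate on a signed element type of the shape described, with the shifts done in the unsigned type and the result cast back:
void
rot (short *a)
{
  for (int i = 0; i < 1024; i++)
    {
      unsigned short u = a[i];
      a[i] = (short) ((u << 3) | (u >> 13));
    }
}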
2022-12-13 Jakub Jelinek <jakub@redhat.com>
PR tree-optimization/108064
* tree-vect-patterns.cc (vect_recog_rotate_pattern): Pass uvectype
as 4th argument to append_pattern_def_seq for statements with lhs
with utype type.
* gcc.c-torture/execute/pr108064.c: New test.
|
|
The following testcase ICEs, because the latch bb ends with
asm goto which has both fallthrough to the header and one or more labels
in the header too. In that case there is just a single edge out of the
latch block, but still the asm goto is stmt_ends_bb_p statement, yet
ivopts decides to emit an IV bump at the IP_END position and inserts
it into the same bb as the asm goto after it, which then fails verification
(control flow in the middle of bb).
The following patch fixes it by splitting the latch -> header edge in that
case and inserting into the newly created bb, where split_edge ->
redirect_edge_and_branch is able to deal with this case correctly.
2022-12-10 Jakub Jelinek <jakub@redhat.com>
PR tree-optimization/107997
* tree-ssa-loop-ivopts.cc: Include cfganal.h.
(create_new_iv) <case IP_END>: If ip_end_pos bb is non-empty and ends
with a stmt which ends bb, instead of adding iv update after it split
the latch edge and insert iterator into the new latch bb.
* gcc.c-torture/compile/pr107997.c: New test.
|
|
add_options_for_ieee has:
if { [istarget alpha*-*-*]
|| [istarget sh*-*-*] } {
return "$flags -mieee"
}
if { [istarget rx-*-*] } {
return "$flags -mnofpu"
}
but ieee.exp doesn't use add_options_for_ieee, instead it has:
if { [istarget "alpha*-*-*"]
|| [istarget "sh*-*-*"] } then {
lappend additional_flags "-mieee"
}
among other things (plus -ffloat-store on some arches etc.).
The following patch adds the rx -mnofpu similarly in the hope
of fixing ieee.exp FAILs on rx.
Preapproved in the PR by Jeff, committed to trunk.
2022-12-06 Jakub Jelinek <jakub@redhat.com>
PR testsuite/107046
* gcc.c-torture/execute/ieee/ieee.exp: For rx-*-* append
-mnofpu.
|
|
As reported in the PR, the following pr106805.c testcase is miscompiled
with the default -ftrapping-math, because we fold all the comparisons into
constants and don't raise any exceptions.
The match.pd pattern handles just simple comparisons, from those
EQ/NE are quiet and don't raise exceptions on anything but sNaN, while
GT/GE/LT/LE are signaling and do raise exceptions even on qNaN.
fold_relational_const handles this IMHO correctly:
/* Handle the cases where either operand is a NaN. */
if (real_isnan (c0) || real_isnan (c1))
{
switch (code)
{
case EQ_EXPR:
case ORDERED_EXPR:
result = 0;
break;
case NE_EXPR:
case UNORDERED_EXPR:
case UNLT_EXPR:
case UNLE_EXPR:
case UNGT_EXPR:
case UNGE_EXPR:
case UNEQ_EXPR:
result = 1;
break;
case LT_EXPR:
case LE_EXPR:
case GT_EXPR:
case GE_EXPR:
case LTGT_EXPR:
if (flag_trapping_math)
return NULL_TREE;
result = 0;
break;
default:
gcc_unreachable ();
}
return constant_boolean_node (result, type);
}
by folding the signaling comparisons only if -fno-trapping-math.
The following patch does the same in match.pd.
Unfortunately the pr106805.c testcase still fails, but no longer because of
match.pd, but on the trunk because of the still unresolved ranger problems
(same issue as for fold-overflow-1.c etc.) and on 12 branch (and presumably
trunk too) somewhere during expansion the comparisons are also expanded
into constants (which is ok for -fno-trapping-math, but not ok with that).
Though, I think the patch is a small step in the direction, so I'd like
to commit this patch without the gcc.dg/pr106805.c testcase for now.
2022-12-05 Jakub Jelinek <jakub@redhat.com>
PR middle-end/106805
* match.pd (cmp @0 REAL_CST@1): Don't optimize x cmp NaN
or NaN cmp x to false/true for cmp >/>=/</<= if -ftrapping-math.
* c-c++-common/pr57371-4.c: Revert 2021-09-19 changes.
* c-c++-common/pr57371-5.c: New test.
* gcc.c-torture/execute/ieee/fp-cmp-6.x: Add -fno-trapping-math.
* gcc.c-torture/execute/ieee/fp-cmp-9.c: New test.
* gcc.c-torture/execute/ieee/fp-cmp-9.x: New file.
|
|
While for the normal cases it seems to be correct to implement
reverse multiplication (op1_range/op2_range) through division
with float_binary_op_range_finish, reverse division (op1_range)
through multiplication with float_binary_op_range_finish or
(op2_range) through division with float_binary_op_range_finish,
as e.g. following testcase shows for the corner cases it is
incorrect.
Say on the testcase we are doing reverse multiplication, we
have [-0., 0.] range (no NAN) on lhs and VARYING on op1 (or op2).
We implement that through division, because x from
lhs = x * op2
is
x = lhs / op2
For the division, [-0., 0.] / VARYING is computed (IMHO correctly)
as [-0., 0.] +-NAN, because 0 / anything but 0 or NAN is still
0 and 0 / 0 is NAN and ditto 0 / NAN. And then we just
float_binary_op_range_finish, which figures out that because lhs
can't be NAN, neither operand can be NAN. So, the end range is
[-0., 0.]. But that is not correct for the reverse multiplication.
When the result is 0, if op2 can be zero, then x can be anything
(VARYING), to be precise anything but INF (unless result can be NAN),
because anything * 0 is 0 (or NAN for INF). While if op2 must be
non-zero, then x must be 0. Of course the sign logic
(signbit(x) = signbit(lhs) ^ signbit(op2)) still holds, so it actually
isn't full VARYING if both lhs and op2 have known sign bits.
And going through other corner cases one by one shows other differences
between what we compute for the corresponding forward operation and
what we should compute for the reverse operations.
The following patch is slightly conservative and includes INF
(in case of result including 0 and not NAN) in the ranges or
0 in the ranges (in case of result including INF and not NAN).
The latter is what happens anyway because we flush denormals to 0,
and the former just not to deal with all the corner cases.
So, the end test is that for reverse multiplication and division
op2_range the cases we need to adjust to VARYING or VARYING positive
or VARYING negative are if lhs and op? ranges both contain 0,
or both contain some infinity, while for division op1_range the
corner case is if lhs range contains 0 and op2 range contains INF or vice
versa. Otherwise I believe ranges from the corresponding operation
are ok, or could be slightly more conservative (e.g. for
reverse multiplication, if op? range is singleton INF and lhs
range doesn't include any INF, then x's range should be UNDEFINED or
known NAN (depending on if lhs can be NAN), while the division computes
[-0., 0.] +-NAN; or similarly if op? range is only 0 and lhs range
doesn't include 0, division would compute +INF +-NAN, or -INF +-NAN,
or (for lack of multipart franges -INF +INF +-NAN just VARYING +-NAN),
while again it is UNDEFINED or known NAN.
Oh, and I found by code inspection a wrong condition for the division's
known NAN result; due to a thinko it would trigger not just when
both operands are known to be 0 or both are known to be INF, but
when either both are known to be 0 or at least one is known to be INF.
2022-12-05 Jakub Jelinek <jakub@redhat.com>
PR tree-optimization/107879
* range-op-float.cc (foperator_mult::op1_range): If both
lhs and op2 ranges contain zero or both ranges contain
some infinity, set r range to zero_to_inf_range depending on
signbit_known_p.
(foperator_div::op2_range): Similarly for lhs and op1 ranges.
(foperator_div::op1_range): If lhs range contains zero and op2
range contains some infinity or vice versa, set r range to
zero_to_inf_range depending on signbit_known_p.
(foperator_div::rv_fold): Fix up condition for returning known NAN.
|
|
r13-254-gdd3c7873a61019e9 added an optimization for {a, +, a} (x-1),
but as can be seen in the following testcase, the way it is written,
where chrec_fold_multiply is called with type, doesn't work for pointers:
res = build_int_cst (TREE_TYPE (x), 1);
res = chrec_fold_plus (TREE_TYPE (x), x, res);
res = chrec_convert_rhs (type, res, NULL);
res = chrec_fold_multiply (type, chrecr, res);
while what we were doing before and what is still used if the condition
doesn't match is fine:
res = chrec_convert_rhs (TREE_TYPE (chrecr), x, NULL);
res = chrec_fold_multiply (TREE_TYPE (chrecr), chrecr, res);
res = chrec_fold_plus (type, CHREC_LEFT (chrec), res);
because it performs chrec_fold_multiply on TREE_TYPE (chrecr) and converts
only afterwards.
I think the easiest fix is to ignore the new path for pointer types.
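As an illustration of the algebra involved (a hand-written sketch,
not the committed pr107835.c), applying {a, +, a} at x-1 means
starting at a and adding a another x-1 times, i.e. a*x; for a pointer
chrec that product is only meaningful when carried out in an integer
type, as the pre-r13-254 sequence does.
  #include <assert.h>
  #include <stdint.h>

  int
  main (void)
  {
    int buf[4];
    uintptr_t a = (uintptr_t) buf;   /* plays the chrec base and step */
    uintptr_t x = 3;

    uintptr_t iter = a;              /* evaluate {a, +, a} at x-1 */
    for (uintptr_t i = 0; i < x - 1; i++)
      iter += a;

    assert (iter == a * x);          /* the closed form used by r13-254 */
    return 0;
  }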
2022-11-30 Jakub Jelinek <jakub@redhat.com>
PR tree-optimization/107835
* tree-chrec.cc (chrec_apply): Don't handle "{a, +, a} (x-1)"
as "a*x" if type is a pointer type.
* gcc.c-torture/compile/pr107835.c: New test.
|
|
On Wed, Nov 09, 2022 at 04:43:56PM +0100, Aldy Hernandez wrote:
> On Wed, Nov 9, 2022 at 3:58 PM Jakub Jelinek <jakub@redhat.com> wrote:
> >
> > On Wed, Nov 09, 2022 at 10:02:46AM +0100, Aldy Hernandez wrote:
> > > We can implement the op[12]_range entries for plus and minus in terms
> > > of each other. These are adapted from the integer versions.
> >
> > I think for NANs the op[12]_range shouldn't act this way.
> > For the forward binary operations, we have the (maybe/known) NAN handling
> > of one or both NAN operands resulting in VARYING sign (maybe/known) NAN
> > result; that is somehow the case for the reverse binary operations too:
> > if result is (maybe/known) NAN and the other op is not NAN, op is
> > VARYING sign (maybe/known) NAN, if other op is (maybe/known) NAN,
> > then op is VARYING sign maybe NAN (always maybe, never known).
> > But then for + we have the -INF + INF or vice versa into NAN, and that
> > is something that shouldn't be considered. If result isn't NAN, then
> > neither operand can be NAN, regardless of whether result can be
> > +/- INF and the other op -/+ INF.
>
> Heh. I just ran into this while debugging the problem reported by Xi.
>
> We are solving NAN = op1 - VARYING, and trying to do it with op1 = NAN
> + VARYING, which returns op1 = NAN (incorrectly).
>
> I suppose in the above case op1 should ideally be
> [-INF,-INF][+INF,+INF]+-NAN, but since we can't represent that then
> [-INF,+INF] +-NAN, which is actually VARYING. Do you agree?
>
> I'm reverting this patch as attached, while I sort this out.
Here is a patch which reinstalls your change, adds the fixups I've talked
about and also hooks up reverse operators for MULT_EXPR/RDIV_EXPR.
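For reference, a small C illustration (a sketch under IEEE semantics,
not the committed inf-4.c test) of why solving lhs = op1 - op2 for op1
simply as lhs + op2 goes wrong at the special values: a NAN difference
does not imply a NAN operand.
  #include <assert.h>
  #include <math.h>

  int
  main (void)
  {
    volatile double op1 = INFINITY, op2 = INFINITY;
    volatile double lhs = op1 - op2;         /* INF - INF is NAN ...  */
    assert (isnan (lhs));
    assert (!isnan (op1) && !isnan (op2));   /* ... non-NAN operands.  */

    /* The naive reverse computation op1 = lhs + op2 yields NAN, which
       would wrongly narrow op1's range to known NAN.  */
    volatile double naive_op1 = lhs + op2;
    assert (isnan (naive_op1));
    return 0;
  }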
2022-11-12 Aldy Hernandez <aldyh@redhat.com>
Jakub Jelinek <jakub@redhat.com>
* range-op-float.cc (float_binary_op_range_finish): New function.
(foperator_plus::op1_range): New.
(foperator_plus::op2_range): New.
(foperator_minus::op1_range): New.
(foperator_minus::op2_range): New.
(foperator_mult::op1_range): New.
(foperator_mult::op2_range): New.
(foperator_div::op1_range): New.
(foperator_div::op2_range): New.
* gcc.c-torture/execute/ieee/inf-4.c: New test.
|
|
There is a loophole in the unroll-and-jam pass that can quickly result in
wrong code generation. The code reads:
if (!compute_data_dependences_for_loop (outer, true, &loop_nest,
&datarefs, &dependences))
{
if (dump_file && (dump_flags & TDF_DETAILS))
fprintf (dump_file, "Cannot analyze data dependencies\n");
free_data_refs (datarefs);
free_dependence_relations (dependences);
continue;
}
but compute_data_dependences_for_loop may return true even if the analysis
is reported as failing by compute_affine_dependence for a dependence pair:
(compute_affine_dependence
ref_a: data[_14], stmt_a: data[_14] = i_59;
ref_b: data[_14], stmt_b: data[_14] = i_59;
Data ref a:
Data ref b:
affine dependence test not usable: access function not affine or constant.
) -> dependence analysis failed
Note that this is a self-dependence pair and the code for such pairs reads:
/* Nothing interesting for the self dependencies. */
if (dra == drb)
continue;
This means that the pass may reorder "complex" accesses to the same memory
location in successive iterations, which is OK for reads but not for writes.
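A hedged C sketch of the problematic shape (not the committed
20221006-1.c test; the function name is made up): the store below uses
a subscript that is not affine in the loop counters, so two successive
outer iterations may write the same element, and unroll-and-jam must
not reorder those writes.
  #define N 16
  int data[N];

  void
  foo (int m)
  {
    for (int i = 0; i < m; i++)
      for (int j = 0; j < 4; j++)
        /* Non-affine access function: the same data[] slot can be
           written by different (i, j) pairs, so write order matters.  */
        data[((unsigned) i * i + j) % N] = i;
  }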
gcc/
* gimple-loop-jam.cc (tree_loop_unroll_and_jam): Bail out for a self
dependency that is a write-after-write if the access function is not
affine or constant.
gcc/testsuite/
* gcc.c-torture/execute/20221006-1.c: New test.
|
|
As the following testcase reduced from glibc fmtmsg.c shows
(it doesn't ICE on x86_64/i686 unfortunately, but does on various other
arches), my last optimize_range_tests_cmp_bitwise change wasn't fully
correct. The intent was to let all pointer operands be cast to
pointer_sized_int_node first in addition to the other casts (to type1)
which are done for id >= l cases.
But one spot I've touched always used a cast to type1 (note, the
(b % 4) == 3 case is impossible for pointer operands because that case
is for !TYPE_UNSIGNED operands and pointers are TYPE_UNSIGNED), and in
the other spot the cast would be done only for id >= l if not useless,
but for pointers we need the cast always.
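For context, a hedged C sketch (not the testcase reduced from
fmtmsg.c; the function name is made up) of the kind of source
optimize_range_tests_cmp_bitwise rewrites: chained equality-with-zero
tests that reassoc may merge into a single bitwise OR compared against
zero, which for pointer operands requires the casts to
pointer_sized_int_node discussed above.
  #include <stdint.h>

  /* Reassoc may turn this into roughly
     ((uintptr_t) p | (uintptr_t) q | (uintptr_t) x) == 0,
     hence the need to cast pointer operands to a pointer-sized
     integer type first.  */
  int
  all_zero (const char *p, const char *q, uintptr_t x)
  {
    return p == 0 && q == 0 && x == 0;
  }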
2022-09-17 Jakub Jelinek <jakub@redhat.com>
PR tree-optimization/106958
* tree-ssa-reassoc.cc (optimize_range_tests_cmp_bitwise): If
id >= l, cast op to type1, otherwise to pointer_sized_int_node.
If type has pointer type, cast exp to pointer_sized_int_node
even when id < l.
* gcc.c-torture/compile/pr106958.c: New test.
|
|
My change to match.pd (that added the two simplifications this patch
touches) results in more |/^/& assignments with pointer arguments,
but since r12-1608 we reject pointer operands for BIT_NOT_EXPR.
Disallowing them for BIT_NOT_EXPR and allowing them for
BIT_{IOR,XOR,AND}_EXPR leads to a match.pd maintenance nightmare (see
one of the patches in the PR), so either we want to allow pointer
operands on BIT_NOT_EXPR (but then we run into issues e.g. with the
ranger, which expects it can emulate BIT_NOT_EXPR ~X as -1 - X, which
doesn't work for pointers that don't support MINUS_EXPR), or, as the
following patch does, we disallow pointer arguments for all of
BIT_{IOR,XOR,AND}_EXPR with the exception of BIT_AND_EXPR with an
INTEGER_CST last operand (for simpler pointer realignment).
I had to tweak one reassoc optimization and the two match.pd
simplifications.
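As a hedged illustration (not the committed pr106878.c test; the
function name is made up), the BIT_AND_EXPR-with-INTEGER_CST exception
is aimed at the usual pointer realignment idiom, roughly of this shape
at the source level, while general pointer |/^/& arithmetic has to be
carried out on an integer type such as uintptr_t.
  #include <stdint.h>

  /* The pointer realignment idiom the BIT_AND_EXPR/INTEGER_CST
     exception is meant to serve: round p down to a 16-byte boundary.  */
  void *
  align_down_16 (void *p)
  {
    return (void *) ((uintptr_t) p & ~(uintptr_t) 15);
  }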
2022-09-14 Jakub Jelinek <jakub@redhat.com>
PR tree-optimization/106878
* tree-cfg.cc (verify_gimple_assign_binary): Disallow pointer,
reference or OFFSET_TYPE BIT_IOR_EXPR, BIT_XOR_EXPR or, unless
the second argument is INTEGER_CST, BIT_AND_EXPR.
* match.pd ((type) X op CST -> (type) (X op ((type-x) CST)),
(type) (((type2) X) op Y) -> (X op (type) Y)): Punt for
POINTER_TYPE_P or OFFSET_TYPE.
* tree-ssa-reassoc.cc (optimize_range_tests_cmp_bitwise): For
pointers cast them to pointer sized integers first.
* gcc.c-torture/compile/pr106878.c: New test.
|