|
gcc/ChangeLog:
PR tree-optimization/110600
* cfgloopmanip.cc (scale_loop_profile): Add missing profile_dump check.
gcc/testsuite/ChangeLog:
PR tree-optimization/110600
* gcc.c-torture/compile/pr110600.c: New test.
|
|
fold_ctor_reference attempts to use recursive local processing in order
to call native_encode_expr on the leaf nodes of the constructor, before
falling back to calling native_encode_initializer if this fails.
There are a couple of issues related to endianness present in it:
1) it does not specifically handle integral bit-fields; these are left
justified on big-endian platforms, so they cannot be treated like ordinary
fields.
2) it does not check that the constructor uses the native storage order.
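For illustration, a minimal sketch (not one of the committed testcases) of a constant constructor containing an integral bit-field, which big-endian targets store left-justified in its storage unit; the second issue concerns aggregates declared with a reversed scalar_storage_order:
struct S
{
  unsigned int f : 3;   /* left justified within its unit on big-endian */
  unsigned int g : 29;
};
static const struct S s = { 5, 42 };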
gcc/
* gimple-fold.cc (fold_array_ctor_reference): Fix head comment.
(fold_nonarray_ctor_reference): Likewise. Specifically deal
with integral bit-fields.
(fold_ctor_reference): Make sure that the constructor uses the
native storage order.
gcc/testsuite/
* gcc.c-torture/execute/20230630-1.c: New test.
* gcc.c-torture/execute/20230630-2.c: Likewise.
* gcc.c-torture/execute/20230630-3.c: Likewise.
* gcc.c-torture/execute/20230630-4.c: Likewise.
|
|
The early inliner currently skips always_inline functions, and moreover we ignore
calls from always_inline functions in ipa_reverse_postorder.  This leads to disabling
most of the propagation done by early optimization, which is quite bad when
early-inlined functions are not leaf functions, which is now quite common
in libstdc++.
This patch, instead of fully disabling this, runs the inline checks on the calls
in the callee.  I am quite conservative about what can be inlined, as this patch
is a bit touchy anyway.  To avoid problems with always_inline functions being
optimized after early inlining, I extended inline_always_inline_functions to
lazily compute the fnsummary when needed.
gcc/ChangeLog:
PR middle-end/110334
* ipa-fnsummary.h (ipa_fn_summary): Add
safe_to_inline_to_always_inline.
* ipa-inline.cc (can_early_inline_edge_p): ICE
if SSA is not built; do cycle checking for
always_inline functions.
(inline_always_inline_functions): Be recursive;
watch for cycles; do not update overall summary.
(early_inliner): Do not give up on always_inlines.
* ipa-utils.cc (ipa_reverse_postorder): Do not skip
always inlines.
gcc/testsuite/ChangeLog:
PR middle-end/110334
* g++.dg/opt/pr66119.C: Disable early inlining.
* gcc.c-torture/compile/pr110334.c: New test.
* gcc.dg/tree-ssa/pr110334.c: New test.
|
|
This testcase was fixed after r14-2135-gd915762ea9043da85 and
there was no testcase for it before, so adding one is a good thing.
Committed as obvious after testing the testcase to make sure it works.
gcc/testsuite/ChangeLog:
PR tree-optimization/110444
* gcc.c-torture/compile/pr110444-1.c: New test.
|
|
The manual already documents asm goto as being implicitly volatile,
and that was done when asm goto could not have outputs.  When outputs
were added to `asm goto`, only asm goto without outputs was still being
marked as volatile.  Now some parts of GCC decide that removing the `asm goto`
is OK if the output is not used, without updating the CFG (this happens
on both the RTL level and the gimple level).  Since the biggest user of `asm goto`
is the Linux kernel and they expect them to be volatile (they use them to
copy to/from userspace), we should just mark the inline-asm as volatile.
OK? Bootstrapped and tested on x86_64-linux-gnu.
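For illustration, a minimal sketch (not the committed testcase) of the construct involved: an asm goto with an output which, if it were not implicitly volatile and x is unused, could be deleted without fixing up the CFG:
int
f (void)
{
  int x;
  asm goto ("" : "=r" (x) : : : lab);
  return 0;
lab:
  return 1;
}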
PR middle-end/110420
PR middle-end/103979
PR middle-end/98619
gcc/ChangeLog:
* gimplify.cc (gimplify_asm_expr): Mark asm with labels as volatile.
gcc/testsuite/ChangeLog:
* gcc.c-torture/compile/asmgoto-6.c: New test.
|
|
The following testcase ICEs, because I misremembered what the return value
from match_arith_overflow is. It isn't true if __builtin_*_overflow was
matched, but it is true only in the BIT_NOT_EXPR case if stmt was removed.
So, if match_arith_overflow matches something, gsi_stmt (gsi) will not
be stmt and match_uaddc_usubc will be confused and can ICE.
The following patch fixes it by checking if gsi_stmt (gsi) == stmt,
in that case we know it is still a PLUS_EXPR/MINUS_EXPR and we can try to
pattern match it further as UADDC/USUBC.
2023-06-16 Jakub Jelinek <jakub@redhat.com>
PR tree-optimization/110271
* tree-ssa-math-opts.cc (math_opts_dom_walker::after_dom_children)
<case PLUS_EXPR>: Ignore return value from match_arith_overflow,
instead call match_uaddc_usubc only if gsi_stmt (gsi) is still stmt.
* gcc.c-torture/compile/pr110271.c: New test.
|
|
Since the combining of sin/cos into cexpi is dependent
on the target, this adds another testcase which had failed (earlier in
evrp rather than vrp2) and will fail on all targets, rather than only on
ones which have sincos or C99 math functions.
Committed as obvious after a quick test.
gcc/testsuite/ChangeLog:
PR tree-optimization/110266
* gcc.c-torture/compile/pr110266.c: New test.
|
|
clobbers
So as Georg-Johann discusses in the BZ, reload_cse_move2add can generate
incorrect code when optimizing code with clobbers.  Specifically, in the
case where we try to optimize a sequence of 4 operations down to 3
operations, we can reset INSN to the next instruction and continue the loop.
That skips the code to invalidate objects based on things like REG_INC
notes, stack pushes and, most importantly, clobbers attached to the current
insn.
This patch factors all of the invalidation code used by reload_cse_move2add
into a new function and calls it at the appropriate time.
Georg-Johann has confirmed this patch fixes his avr bug and I've had it in
my tester over the weekend. It's bootstrapped and regression tested on
aarch64, m68k, sh4, alpha and hppa. It's also regression tested successfully
on a wide variety of other targets.
gcc/
PR rtl-optimization/101188
* postreload.cc (reload_cse_move2add_invalidate): New function,
extracted from...
(reload_cse_move2add): Call reload_cse_move2add_invalidate.
gcc/testsuite
PR rtl-optimization/101188
* gcc.c-torture/execute/pr101188.c: New test.
|
|
So for the attached testcase, we assumed that zero_one_valued_p would
only match values in [0,1], but currently zero_one_valued_p also matches
signed 1-bit integers.
This changes it not to match those and fixes the 2 new testcases at
all optimization levels.
OK? Bootstrapped and tested on x86_64-linux-gnu with no regressions.
Note the GCC 13 patch will be slightly different due to the changes
made to zero_one_valued_p.
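For illustration, a minimal sketch (not the committed testcases) of why a signed 1-bit integer is not a [0,1] value: such a bit-field can only hold 0 and -1:
struct B { signed int b : 1; };
int
f (struct B *p)
{
  return p->b;   /* 0 or -1, never 1 */
}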
PR tree-optimization/110165
PR tree-optimization/110166
gcc/ChangeLog:
* match.pd (zero_one_valued_p): Don't accept
signed 1-bit integers.
gcc/testsuite/ChangeLog:
* gcc.c-torture/execute/pr110165-1.c: New test.
* gcc.c-torture/execute/pr110166-1.c: New test.
|
|
gcc/testsuite/
PR testsuite/52641
* gcc.c-torture/compile/pr108892.c: Require int32.
* gcc.c-torture/compile/pr98199.c: Require int32plus.
* gcc.dg/analyzer/call-summaries-pr107072.c: Same.
* gcc.dg/analyzer/null-deref-pr105755.c: Same.
* gcc.dg/tree-ssa/pr102232.c: Same.
* gcc.dg/tree-ssa/pr105860.c: Same.
* gcc.dg/tree-ssa/pr96730.c: Same.
* gcc.dg/tree-ssa/pr96779-disabled.c: Same.
* gcc.dg/tree-ssa/pr96779.c: Same.
* gcc.dg/tree-ssa/pr98513.c: Same.
* gcc.dg/tree-ssa/ssa-sink-18.c: Same.
* gcc.dg/analyzer/coreutils-cksum-pr108664.c: Require int32plus,
size24plus.
* gcc.dg/analyzer/doom-s_sound-pr108867.c: Require size32plus.
* gcc.dg/analyzer/malloc-CWE-590-examples.c: Same.
* gcc.dg/debug/btf/btf-bitfields-4.c: Same.
* gcc.dg/tree-ssa/pr93435.c: Same.
* gcc.dg/analyzer/null-deref-pr102671-1.c: Require ptr_eq_long.
* gcc.dg/analyzer/null-deref-pr102671-2.c: Same.
* gcc.dg/analyzer/null-deref-pr108251-smp_fetch_ssl_fc_has_early-O2.c:
Same.
* gcc.dg/analyzer/null-deref-pr108251-smp_fetch_ssl_fc_has_early.c:
Same.
* gcc.dg/tree-ssa/pr103345.c: Use uint32_t.
* gcc.dg/tree-ssa/ssa-ccp-41.c [sizeof(int)==2]: Same.
* gcc.dg/tree-ssa/pr109031-1.c: Use uint16_t, uint32_t.
* gcc.dg/tree-ssa/pr109031-2.c: Same.
* gcc.dg/Warray-bounds-49.c (dg-warning): Discriminate int != short.
* gcc.dg/Warray-bounds-52.c (dg-warning): Discriminate avr.
* gcc.dg/Warray-bounds-33.c: Skip target avr.
* gcc.dg/analyzer/fd-access-mode-target-headers.c: Same.
* gcc.dg/analyzer/flex-with-call-summaries.c: Same.
* gcc.dg/analyzer/isatty-1.c: Same.
* gcc.dg/analyzer/pipe-glibc.c: Same.
|
|
gcc/testsuite/
PR testsuite/52641
* c-c++-common/pr19807-2.c: Use __SIZEOF_INT__ instead of 4.
* gcc.c-torture/compile/pr103813.c: Require size32plus.
* gcc.c-torture/execute/pr108498-2.c: Same.
* gcc.c-torture/compile/pr96426.c: Condition on
__SIZEOF_LONG_LONG__ == __SIZEOF_DOUBLE__.
* gcc.c-torture/execute/pr103417.c: Require int32plus.
* gcc.dg/pr104198.c: Same.
* gcc.dg/pr21137.c: Same.
* gcc.dg/pr88905.c: Same.
* gcc.dg/pr90838.c: Same.
* gcc.dg/pr97317.c: Same.
* gcc.dg/pr100292.c: Require int32.
* gcc.dg/pr101008.c: Same.
* gcc.dg/pr96542.c: Same.
* gcc.dg/pr96674.c: Same.
* gcc.dg/pr97750.c: Require ptr_eq_long.
|
|
The problem is that I used expand_expr with the target, but
we don't want to use the target here as it has the wrong
mode for the original expression.  The testcase would ICE
deep down while trying to do a move to use the target.
Anyway, just calling expand_expr with NULL_EXPR fixes
the issue.
Committed as obvious after a bootstrap/test on x86_64-linux-gnu.
PR middle-end/109919
gcc/ChangeLog:
* expr.cc (expand_single_bit_test): Don't use the
target for expand_expr.
gcc/testsuite/ChangeLog:
* gcc.c-torture/compile/pr109919-1.c: New test.
|
|
After r14-673-gc0dd80e4c4c3, there was a check in the match
patterns which was checking that the type is unsigned, but
instead of using the type, the patch used the expression.
This adds the needed TREE_TYPE so we get the correct answer and don't ICE.
Committed as obvious after a bootstrap/test on x86_64-linux-gnu.
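For illustration, a minimal sketch (not the committed testcases) of the popcount(bswap(x)) shape handled by the first pattern:
int
f (unsigned int x)
{
  return __builtin_popcount (__builtin_bswap32 (x));
}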
PR tree-optimization/109834
gcc/ChangeLog:
* match.pd (popcount(bswap(x))->popcount(x)): Fix up unsigned type checking.
(popcount(rotate(x,y))->popcount(x)): Likewise.
gcc/testsuite/ChangeLog:
* gcc.c-torture/compile/pr109834-1.c: New test.
* gcc.dg/tree-ssa/pr109834-1.c: New test.
|
|
While working on improving min/max detection, this
code (which is reduced from worse_state in ipa-pure-const.cc)
was being miscompiled. Since there was no testcase in the
testsuite yet for this, this patch adds one.
Committed as obvious after testing the testcase via:
make check-gcc RUNTESTFLAGS="execute.exp=20230510-1.c"
gcc/testsuite/ChangeLog:
* gcc.c-torture/execute/20230510-1.c: New test.
|
|
While I was writing a match.pd patch, I came across a case where GCC was being
miscompiled but no testcase was failing.  So this adds that testcase.
Committed after testing on x86_64 with
make check-gcc RUNTESTFLAGS="execute.exp=20230509-1.c"
gcc/testsuite/ChangeLog:
* gcc.c-torture/execute/20230509-1.c: New test.
|
|
[PR109778]
The following testcase is miscompiled, because bitwise ccp2 handles
a rotate with a signed type incorrectly.
Seems tree-ssa-ccp.cc has the only callers of wi::[lr]rotate with 3
arguments, all other callers just rotate in the right precision and
I think work correctly. ccp works with widest_ints and so rotations
by the excessive precision certainly don't match what it wants
when it sees a rotate in some specific bitsize. Still, if it is
unsigned rotate and the widest_int is zero extended from width,
the functions perform left shift and logical right shift on the value
and then at the end zero extend the result of left shift and uselessly
also the result of logical right shift and return | of that.
On the testcase, the signed char rrotate by 4 argument is
CONSTANT -75, i.e. 0xffffffff....fffffb5 with mask 2.
The mask is correctly rotated to 0x20, but because the 8-bit constant
is sign extended to 192-bit one, the logical right shift by 4 doesn't
yield expected 0xb, but gives 0xfffffffffff....ffffb, and then
return wi::zext (left, width) | wi::zext (right, width); where left is
0xfffffff....fb50, so we return 0xfb instead of the expected
0x5b.
The following patch fixes that by doing the zero extension in case of
the right variable before doing wi::lrshift rather than after it.
Also, wi::[lr]rotate with width < precision always zero extends
the result. I'm afraid it can't do better because it doesn't know
if it is done for an unsigned or signed type, but the caller in this
case knows that very well, so I've done the extension based on sgn
in the caller. E.g. 0x5b rotated right (or left) by 4 with width 8
previously gave 0xb5, but sgn == SIGNED in widest_int it should be
0xffffffff....fffb5 instead.
2023-05-09 Jakub Jelinek <jakub@redhat.com>
PR tree-optimization/109778
* wide-int.h (wi::lrotate, wi::rrotate): Call wi::lrshift on
wi::zext (x, width) rather than x if width != precision, rather
than using wi::zext (right, width) after the shift.
* tree-ssa-ccp.cc (bit_value_binop): Call wi::ext on the results
of wi::lrotate or wi::rrotate.
* gcc.c-torture/execute/pr109778.c: New test.
|
|
minmax_replacement
This moves the check that, on the diamond-shaped form, the
two middle bbs are used only for that diamond-shaped form earlier
into the shared code.
Also remove the redundant check for single_succ_p, since that was already
done beforehand.
The next patch will simplify the code even further and remove redundant
checks.
PR tree-optimization/109604
gcc/ChangeLog:
* tree-ssa-phiopt.cc (tree_ssa_phiopt_worker): Move the
diamond form check from ...
(minmax_replacement): Here.
gcc/testsuite/ChangeLog:
* gcc.c-torture/compile/pr109604-1.c: New test.
* gcc.c-torture/compile/pr109604-2.c: New test.
|
|
The following testcase is miscompiled on riscv since the addition
of *mvconst_internal define_insn_and_split.
We have:
(insn 36 35 39 2 (set (mem/c:SI (plus:SI (reg/f:SI 65 frame)
(const_int -64 [0xffffffffffffffc0])) [2 S4 A128])
(reg:SI 166)) "pr109040.c":9:11 178 {*movsi_internal}
(expr_list:REG_DEAD (reg:SI 166)
(nil)))
(insn 39 36 40 2 (set (reg:SI 171)
(zero_extend:SI (mem/c:HI (plus:SI (reg/f:SI 65 frame)
(const_int -64 [0xffffffffffffffc0])) [0 S2 A128]))) "pr109040.c":9:11 111 {*zero_extendhisi2}
(nil))
and RTL DSE's replace_read since r0-86337-g18b526e806ab6455 handles
even different modes like in the above case, and so it optimizes it into:
(insn 47 35 39 2 (set (reg:HI 175)
(subreg:HI (reg:SI 166) 0)) "pr109040.c":9:11 179 {*movhi_internal}
(expr_list:REG_DEAD (reg:SI 166)
(nil)))
(insn 39 47 40 2 (set (reg:SI 171)
(zero_extend:SI (reg:HI 175))) "pr109040.c":9:11 111 {*zero_extendhisi2}
(expr_list:REG_DEAD (reg:HI 175)
(nil)))
Pseudo 166 is the result of an AND with the 0x8084c constant (forced into a register).
Combine attempts to combine the AND with insn 47 above created by DSE,
and, because of WORD_REGISTER_OPERATIONS and its assumption that all
the subword operations are actually done on word mode, turns it into:
(set (subreg:SI (reg:HI 175) 0)
(and:SI (reg:SI 167 [ m ])
(reg:SI 168)))
and later on the ZERO_EXTEND is thrown away.
We then see
(and:SI (subreg:SI (reg:HI 175) 0) (const_int 0x84c))
and optimize that into
(subreg:SI (and:HI (reg:HI 175) (const_int 0x84c)) 0)
which is still fine, in WORD_REGISTER_OPERATIONS the AND in HImode
will set all upper bits up to BITS_PER_WORD to zeros.
But later on simplify_binary_operation_1 or simplify_and_const_int_1
sees that because nonzero_bits ((reg:HI 175), HImode) == 0x84c, we can
optimize the AND into (reg:HI 175). That isn't correct, because while
the low 16 bits of that REG are known to have all bits but 0x84c cleared,
we don't know that all the upper 16 bits are all clear as well.
So, for WORD_REGISTER_OPERATIONS for integral modes smaller than word mode,
we need to check all bits from word_mode in nonzero_bits for the optimizations.
2023-04-14 Jeff Law <jlaw@ventanamicro.com>
Jakub Jelinek <jakub@redhat.com>
PR target/108947
PR target/109040
* combine.cc (simplify_and_const_int_1): Compute nonzero_bits in
word_mode rather than mode if WORD_REGISTER_OPERATIONS and mode is
smaller than word_mode.
* simplify-rtx.cc (simplify_context::simplify_binary_operation_1)
<case AND>: Likewise.
* gcc.dg/pr108947.c: New test.
* gcc.c-torture/execute/pr109040.c: New test.
|
|
RTL structure of an insn
So as mentioned in the PR the underlying issue here is combine changes the form of an existing insn, but fails to force re-recognition. As a result other parts of the compiler blow up.
> /* Temporarily replace the set's source with the
> contents of the REG_EQUAL note. The insn will
> be deleted or recognized by try_combine. */
> rtx orig_src = SET_SRC (set);
> rtx orig_dest = SET_DEST (set);
> if (GET_CODE (SET_DEST (set)) == ZERO_EXTRACT)
> SET_DEST (set) = XEXP (SET_DEST (set), 0);
> SET_SRC (set) = note;
> i2mod = temp;
> i2mod_old_rhs = copy_rtx (orig_src);
> i2mod_new_rhs = copy_rtx (note);
> next = try_combine (insn, i2mod, NULL, NULL,
> &new_direct_jump_p, last_combined_insn);
> i2mod = NULL;
> if (next)
> {
> statistics_counter_event (cfun, "insn-with-note combine", 1);
> goto retry;
> }
> SET_SRC (set) = orig_src;
> SET_DEST (set) = orig_dest;
This code replaces the SET_SRC of an insn in the RTL stream with the contents of a REG_EQUAL note. So given an insn like this:
> (insn 122 117 127 2 (set (reg:DI 157 [ _46 ])
> (ior:DI (reg:DI 200)
> (reg:DI 251))) "j.c":14:5 -1
> (expr_list:REG_EQUAL (const_int 25769803782 [0x600000006])
> (nil)))
It replaces the (ior ...) with a (const_int ...). The resulting insn is passed to try_combine which will try to recognize it, then use it in a combination attempt. Recognition succeeds with the special define_insn_and_split pattern in the risc-v backend resulting in:
> (insn 122 117 127 2 (set (reg:DI 157 [ _46 ])
> (const_int 25769803782 [0x600000006])) "j.c":14:5 177 {*mvconst_internal}
> (expr_list:REG_EQUAL (const_int 25769803782 [0x600000006])
> (nil)))
This is as-expected. Now assume we were unable to combine anything, so try_combine returns NULL_RTX. The quoted code above restores SET_SRC (and SET_DEST) resulting in:
> (insn 122 117 127 2 (set (reg:DI 157 [ _46 ])
> (ior:DI (reg:DI 200)
> (reg:DI 251))) "j.c":14:5 177 {*mvconst_internal}
> (expr_list:REG_EQUAL (const_int 25769803782 [0x600000006])
> (nil)))
But this doesn't get re-recognized and we ICE later in LRA.
The fix is trivial: reset the INSN_CODE to force re-recognition in the case where try_combine fails.
PR target/108892
gcc/
* combine.cc (combine_instructions): Force re-recognition
after restoring the body of an insn to its original form.
gcc/testsuite/
* gcc.c-torture/compile/pr108892.c: New test.
|
|
I've missed that one of my recent range-op-float.cc changes (likely the
r13-6967 one) caused
FAIL: libphobos.phobos/std/math/algebraic.d execution test
FAIL: libphobos.phobos_shared/std/math/algebraic.d execution test
regressions, distilled into a C testcase below.
In the testcase, we have
!(u >= v)
condition where both u and v are results of fabs*, which guards
t1 = u u<= __FLT_MAX__;
and
t2 = v u<= __FLT_MAX__;
comparisons.  From a false u >= v where u and v have [0.0, +Inf] NAN
ranges we incorrectly deduce that one of them is [nextafterf (0.0, 1.0), +Inf] NAN
and the other is [0.0, nextafterf (+Inf, 0.0)] NAN, and from that deduce that
one of the comparisons is always true, because UNLE_EXPR with the maximum
representable number is false only if the value is +Inf and our ranges say
that is not possible.
The bug is that the u >= v comparison determines a sensible range only when
it is true - we then know neither operand can be NAN and it behaves
correctly. But when the comparison is false, our current code gives
sensible answers only if the other op can't be NAN. If it can be NAN,
whenever it is NAN, the comparison is always false regardless of the other
value, so the other value needs to be VARYING.
Now, foperator_unordered_lt::op1_range etc. had code to deal with that
for op?.known_isnan (), but as the testcase shows, it is enough that the
operand may be a NAN at runtime to make the result VARYING.
So, for all those BRS_FALSE cases of the normal non-equality comparisons,
the following patch changes the
if (opOTHER.known_isnan ()) r.set_varying (type);
handling to do it also if maybe_isnan ().
For the unordered or ... comparisons, it is similar for BRS_TRUE. Those
comparisons are true whenever either operand is NAN, or if neither is NAN,
the corresponding normal comparison. So, if those comparisons are true
and other operand might be a NAN, we can't tell (VARYING), if it is false,
currently handling is correct.
2023-04-04 Jakub Jelinek <jakub@redhat.com>
PR tree-optimization/109386
* range-op-float.cc (foperator_lt::op1_range, foperator_lt::op2_range,
foperator_le::op1_range, foperator_le::op2_range,
foperator_gt::op1_range, foperator_gt::op2_range,
foperator_ge::op1_range, foperator_ge::op2_range): Make r varying for
BRS_FALSE case even if the other op is maybe_isnan, not just
known_isnan.
(foperator_unordered_lt::op1_range, foperator_unordered_lt::op2_range,
foperator_unordered_le::op1_range, foperator_unordered_le::op2_range,
foperator_unordered_gt::op1_range, foperator_unordered_gt::op2_range,
foperator_unordered_ge::op1_range, foperator_unordered_ge::op2_range):
Make r varying for BRS_TRUE case even if the other op is maybe_isnan,
not just known_isnan.
* gcc.c-torture/execute/ieee/pr109386.c: New test.
|
|
gcc/testsuite/ChangeLog:
* gcc.c-torture/compile/103818.c: Enable test for llp64.
Signed-off-by: Jonathan Yong <10walls@gmail.com>
|
|
The following testcase is reduced from miscompilation of scipy package.
If we have say lhs = [1., 1.] - [1., 1.] and want to compute the range
of lhs from it, we correctly determine it is [0., 0.] (if computations
are exact, we generally don't try to round them further in
frange_arithmetic). In the testcase it is about a reverse operation,
[1., 1.] = op1 + [1., 1.] and we want to compute range of op1 from that.
Right now we just perform the inverse operation (there are some corner
cases about NaN and infinities handling) and so arrive to range
[0., 0.] as well, and because it is a singleton, optimize return eps;
to return 0. That is incorrect though, for the reverse ops we need to
take into account also rounding, the right exact range is
[-0x1.0p-54, 0x1.0p-53] in this case when rounding to nearest, i.e.
all numbers which added to 1. with round to nearest still produce 1.
The problem isn't solely on singleton ranges, and isn't solely on
results around zero. We basically need to consider also values
where the result is up to 0.5ulp away from the lhs range boundaries
in each direction.
The following patch fixes it by extending the lhs range for the
reverse operations by 1ulp in each direction. The PR contains
a pseudo-random test generator I've used to generate 300000 tests
of + and - and then used the same test with * and / instead of + and -
together with a hack to print the discovered ranges by the patch in
a form that another test could then verify the range is conservatively
correct and how far it is from a minimal range.
I believe the results are good enough for now, though I plan to look
incrementally into trying to do something better on the -XXX_MAX or
XXX_MAX boundaries (where I think frange_nextafter will use -inf or +inf)
and also to try to increase the range just by 0.5ulp rather than 1ulp
if !flag_rounding_math.  But I don't know if either of those will be doable
and will pass the testing, so I think it is worth committing this fix
first.
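For illustration, a minimal sketch (not the committed testcase) of the kind of wrong fold: from d == 1.0, the reverse of d = eps + 1.0 used to give eps a singleton [0., 0.] range, so "return eps" was folded to return 0.0 even though small nonzero values of eps also satisfy the comparison:
__attribute__((noipa)) double
f (double eps)
{
  double d = eps + 1.0;
  if (d == 1.0)
    return eps;
  return -1.0;
}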
2023-03-09 Jakub Jelinek <jakub@redhat.com>
Richard Biener <rguenther@suse.de>
PR tree-optimization/109008
* range-op-float.cc (float_widen_lhs_range): New function.
(foperator_plus::op1_range, foperator_minus::op1_range,
foperator_minus::op2_range, foperator_mult::op1_range,
foperator_div::op1_range, foperator_div::op2_range): Use it.
* gcc.c-torture/execute/ieee/pr109008.c: New test.
|
|
The following testcase ICEs because eliminate_redundant_comparison sees
redundant comparisons in &&/|| where the comparison has (ab) SSA_NAME,
maybe_fold_{and,or}_comparisons optimizes them into a single comparison
and build_and_add_sum emits a new comparison close to the definition
operands, which in this case is before a returns_twice call (which is
invalid). Generally reassoc just punts on (ab) SSA_NAMEs, declares them
non-reassociable etc., so the second half of this patch does that.
Though we can do better in this case; the function has special code
when maybe_fold_{and,or}_comparisons returns INTEGER_CST (false/true)
or when what it returns is the same as curr->op (the first of the
comparisons we are considering) - in that case we just remove the
second one and keep the first one. The reason it doesn't match is that
curr->op is a SSA_NAME whose SSA_NAME_DEF_STMT is checked to be a
comparison, in this case _42 = a_1(ab) != 0 and the other comparison
is also like that. maybe_fold_{and,or}_comparisons looks through the
definitions though and so returns a_1(ab) != 0 as tree.
So the first part of the patch checks whether the returned comparison
is the same as the curr->op comparison, and if so, it just overrides
t back to curr->op so that its SSA_NAME is reused. In that case we can
handle even (ab) in {,new}op{1,2} because we don't create a new comparison
of that, just keep using the existing one. And t can't be (ab) because
otherwise it wouldn't be considered a reassociable operand.
The (ab) checks are needed say when we have a_1(ab) == 42 || a_1(ab) > 42
kind of comparisons where maybe_fold_{and,or}_comparisons returns a new
comparison not existing in the IL yet.
2023-02-16 Jakub Jelinek <jakub@redhat.com>
PR tree-optimization/108783
* tree-ssa-reassoc.cc (eliminate_redundant_comparison): If lcode
is equal to TREE_CODE (t), op1 to newop1 and op2 to newop2, set
t to curr->op. Otherwise, punt if either newop1 or newop2 are
SSA_NAME_OCCURS_IN_ABNORMAL_PHI SSA_NAMEs.
* gcc.c-torture/compile/pr108783.c: New test.
|
|
In simple_dce_from_worklist, we were removing an inline-asm which had a vdef.
We should not be removing an inline-asm which has a vdef, as this code
does not check the store.
This fixes that oversight.  This was a latent bug exposed recently
by both VRP and the removal of stores to statics starting to use
simple_dce_from_worklist.
Committed as approved.
Bootstrapped and tested on x86_64-linux-gnu with no regressions.
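For illustration, a minimal sketch (not the committed testcases) of an inline-asm with a vdef that must not be removed even though its register output is unused:
void
f (int *p)
{
  int tmp;
  asm ("" : "=r" (tmp), "=m" (*p));
}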
PR tree-optimization/108684
gcc/ChangeLog:
* tree-ssa-dce.cc (simple_dce_from_worklist):
Check all ssa names and not just non-vdef ones
before accepting the inline-asm.
Call unlink_stmt_vdef on the statement before
removing it.
gcc/testsuite/ChangeLog:
* gcc.c-torture/compile/dce-inline-asm-1.c: New test.
* gcc.c-torture/compile/dce-inline-asm-2.c: New test.
* gcc.dg/tree-ssa/pr108684-1.c: New test.
co-authored-by: Andrew Macleod <amacleod@redhat.com>
|
|
[PR108688]
On Thu, Feb 09, 2023 at 09:16:17AM +0100, Richard Biener via Gcc-patches wrote:
> Hmm. Can we handle the case of the extraction exactly covering the
> insertion separately then and simplify to plain @1?
I was suggesting that in the PR. Here it is as an incremental patch
on top of Andrew's patch.
On the newly added testcase the ifcvt-folding difference without/with the
incremental patch is:
--- pr108688.c.171t.ifcvt_ 2023-02-09 10:47:30.169916845 +0100
+++ pr108688.c.171t.ifcvt 2023-02-09 10:48:44.942793453 +0100
@@ -25,6 +25,8 @@ Number of blocks in CFG: 11
Number of blocks to update: 5 ( 45%)
+Applying pattern match.pd:7487, gimple-match.cc:243200
+Applying pattern match.pd:3987, gimple-match.cc:75423
Matching expression match.pd:1677, gimple-match.cc:209
Applying pattern match.pd:1733, gimple-match.cc:109481
Matching expression match.pd:2393, gimple-match.cc:852
@@ -70,7 +72,6 @@ void foo ()
signed char _29;
<unnamed-signed:7> _30;
unsigned int ivtmp_33;
- <unnamed-signed:7> _ifc__35;
unsigned char _ifc__37;
unsigned char _ifc__38;
unsigned char _ifc__39;
@@ -91,8 +92,7 @@ void foo ()
_2 = (<unnamed-signed:7>) a.0_1;
_ifc__38 = u.D.2741;
_ifc__39 = BIT_INSERT_EXPR <_ifc__38, _2, 0 (7 bits)>;
- _ifc__35 = BIT_FIELD_REF <_ifc__39, 7, 0>;
- _4 = (signed char) _ifc__35;
+ _4 = (signed char) _2;
b.1_5 = b;
_6 = (signed char) b.1_5;
_7 = _4 ^ _6;
2023-02-09 Jakub Jelinek <jakub@redhat.com>
PR tree-optimization/108688
* match.pd (bit_field_ref [bit_insert]): Simplify BIT_FIELD_REF
of BIT_INSERT_EXPR extracting exactly all inserted bits even
when without mode precision. Formatting fixes.
* gcc.c-torture/compile/pr108688-1.c: Add PR number as comment.
* gcc.dg/pr108688.c: New test.
|
|
integral type [PR108688]
The same problem as PR 88739 has crept in, but
this time in match.pd when simplifying a bit_field_ref of
a bit_insert.  That is, we are generating a BIT_FIELD_REF
of a non-mode-precision integral type.
PR tree-optimization/108688
* match.pd (bit_field_ref [bit_insert]): Avoid generating
BIT_FIELD_REFs of non-mode-precision integral operands.
* gcc.c-torture/compile/pr108688-1.c: New test.
|
|
The following testcase ICEs, because we determine only in late pure const
pass that bar is const (the content of the function loses a store to a
global var during dse3 and read from it during cddce2) and local-pure-const2
makes it const. The cgraph ordering is that post IPA (in late IPA simd
clones are created) bar is processed first, then foo as its caller, then
foo.simdclone* and finally bar.simdclone*. Conceptually I think that is the
right ordering which allows for static simd clones to be removed.
The reason for the ICE is that because bar was marked const, the call to
it lost vops before vectorization, and when we in foo.simdclone* try to
vectorize the call to bar, we replace it with bar.simdclone* which hasn't
been marked const and so needs vops, which we don't add.
Now, because the simd clones are created from the same IL, just in a loop
with different argument/return value passing, I think generally if the base
function is determined to be const or pure, the simd clones should be too,
unless e.g. the vectorization causes different optimization decisions, but
then still the global memory reads if any shouldn't affect what the function
does and global memory stores shouldn't be reachable at runtime.
So, the following patch changes set_{const,pure}_flag to mark also simd
clones.
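For illustration, a minimal sketch (assuming -fopenmp-simd; not the committed testcase) of the call shape involved, where both functions get simd clones and one calls the other:
#pragma omp declare simd
__attribute__((noinline)) int
bar (int x)
{
  return x & 1;   /* in the PR, bar is only discovered to be const late */
}
#pragma omp declare simd
int
foo (int x)
{
  return bar (x) + 1;
}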
2023-02-07 Jakub Jelinek <jakub@redhat.com>
PR tree-optimization/106433
* cgraph.cc (set_const_flag_1): Recurse on simd clones too.
(cgraph_node::set_pure_flag): Call set_pure_flag_1 on simd clones too.
* gcc.c-torture/compile/pr106433.c: New test.
|
|
The problem here is we are trying to compare two ranges with different
precisions and the == operator in wide_int is complaining.
Interestingly, the problem is not the nonzero bits, but the fact that
the entire ranges have different precisions. The reason we don't ICE
when comparing the sub-ranges, is because the code in
irange::operator== works on trees, and tree_int_cst_equal is
promoting the comparison to a widest int:
if (TREE_CODE (t1) == INTEGER_CST
&& TREE_CODE (t2) == INTEGER_CST
&& wi::to_widest (t1) == wi::to_widest (t2))
return 1;
This is why we don't see the ICE until the nonzero bits comparison is
done on wide ints. I think we should maintain the current equality
behavior, and follow suit in the nonzero bit comparison.
I have also fixed the legacy equality code, even though technically
nonzero bits shouldn't appear in legacy. But better safe than sorry.
PR tree-optimization/108639
gcc/ChangeLog:
* value-range.cc (irange::legacy_equal_p): Compare nonzero bits as
widest_int.
(irange::operator==): Same.
|
|
Switch from using stacks in the "private segment" to using a memory block
allocated on the host side. The primary reason is to permit the reverse
offload implementation to access values located on the device stack, but
there may also be performance benefits, especially with repeated kernel
invocations.
This implementation unifies the stacks with the "team arena" optimization
feature, and now allows both to have run-time configurable sizes.
A new ABI is needed, so all libraries must be rebuilt, and newlib must be
version 4.3.0.20230120 or newer.
gcc/ChangeLog:
* config/gcn/gcn-run.cc: Include libgomp-gcn.h.
(struct kernargs): Replace the common content with kernargs_abi.
(struct heap): Delete.
(main): Read GCN_STACK_SIZE envvar.
Allocate space for the device stacks.
Write the new kernargs fields.
* config/gcn/gcn.cc (gcn_option_override): Remove stack_size_opt.
(default_requested_args): Remove PRIVATE_SEGMENT_BUFFER_ARG and
PRIVATE_SEGMENT_WAVE_OFFSET_ARG.
(gcn_addr_space_convert): Mask the QUEUE_PTR_ARG content.
(gcn_expand_prologue): Move the TARGET_PACKED_WORK_ITEMS to the top.
Set up the stacks from the values in the kernargs, not private.
(gcn_expand_builtin_1): Match the stack configuration in the prologue.
(gcn_hsa_declare_function_name): Turn off the private segment.
(gcn_conditional_register_usage): Ensure QUEUE_PTR is fixed.
* config/gcn/gcn.h (FIXED_REGISTERS): Fix the QUEUE_PTR register.
* config/gcn/gcn.opt (mstack-size): Change the description.
include/ChangeLog:
* gomp-constants.h (GOMP_VERSION_GCN): Bump.
libgomp/ChangeLog:
* config/gcn/libgomp-gcn.h (DEFAULT_GCN_STACK_SIZE): New define.
(DEFAULT_TEAM_ARENA_SIZE): New define.
(struct heap): Move to this file.
(struct kernargs_abi): Likewise.
* config/gcn/team.c (gomp_gcn_enter_kernel): Use team arena size from
the kernargs.
* libgomp.h: Include libgomp-gcn.h.
(TEAM_ARENA_SIZE): Remove.
(team_malloc): Update the error message.
* plugin/plugin-gcn.c (struct kernargs): Move common content to
struct kernargs_abi.
(struct agent_info): Rename team arenas to ephemeral memories.
(struct team_arena_list): Rename ....
(struct ephemeral_memories_list): to this.
(struct heap): Delete.
(team_arena_size): New variable.
(stack_size): New variable.
(print_kernel_dispatch): Update debug messages.
(init_environment_variables): Read GCN_TEAM_ARENA_SIZE.
Read GCN_STACK_SIZE.
(get_team_arena): Rename ...
(configure_ephemeral_memories): ... to this, and set up stacks.
(release_team_arena): Rename ...
(release_ephemeral_memories): ... to this.
(destroy_team_arenas): Rename ...
(destroy_ephemeral_memories): ... to this.
(create_kernel_dispatch): Add num_threads parameter.
Adjust for kernargs_abi refactor and ephemeral memories.
(release_kernel_dispatch): Adjust for ephemeral memories.
(run_kernel): Pass thread-count to create_kernel_dispatch.
(GOMP_OFFLOAD_init_device): Adjust for ephemeral memories.
(GOMP_OFFLOAD_fini_device): Adjust for ephemeral memories.
gcc/testsuite/ChangeLog:
* gcc.c-torture/execute/pr47237.c: Xfail on amdgcn.
* gcc.dg/builtin-apply3.c: Xfail for amdgcn.
* gcc.dg/builtin-apply4.c: Xfail for amdgcn.
* gcc.dg/torture/stackalign/builtin-apply-3.c: Xfail for amdgcn.
* gcc.dg/torture/stackalign/builtin-apply-4.c: Xfail for amdgcn.
|
|
On the following testcase we have an asm goto in a hot block with 2 successors:
one cold, to which it both falls through and has one of its labels
pointing, and another hot successor with another label.
Now, during bbpart we want to ensure that no blocks from one partition fall
through into a block in a different partition. fix_up_fall_thru_edges
does that by temporarily clearing the EDGE_CROSSING on the fallthrough edge,
calling force_nonfallthru and then depending on whether it created a new
bb either set EDGE_CROSSING on the single successor edge from the new bb
(the new bb is kept in the same partition as the predecessor block), or
if no new bb has been created setting EDGE_CROSSING back on the fallthru
edge which has been forced non-EDGE_FALLTHRU.
For asm goto this doesn't always work, force_nonfallthru can create a new bb
and change the fallthrough edge to point to that, but if the original
fallthru destination block has its label referenced among the asm goto
labels, it will create a new non-fallthru edge for the label(s).
But because we've temporarily cheated and cleared EDGE_CROSSING on the edge,
it is cleared on the new edge as well, then the caller sees we've created
a new bb and just sets EDGE_CROSSING on the single fallthru edge from the
new bb. But the direct edge from cur_bb to fallthru edge's destination
isn't handled and fails afterwards consistency checks, because it crosses
partitions.
The following patch notes the case and sets EDGE_CROSSING on that edge too.
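For reference, a minimal sketch (not the committed testcase, and without the hot/cold partitioning that triggers the ICE) of the asm goto shape described, which both falls through to a block and lists that block's label among its targets:
int
f (int x)
{
  asm goto ("" : : : : cold, hot);
  /* falls through to cold, which is also the first label's target */
cold:
  return -x;
hot:
  return x;
}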
2023-01-31 Jakub Jelinek <jakub@redhat.com>
PR rtl-optimization/108596
* bb-reorder.cc (fix_up_fall_thru_edges): Handle the case where cur_bb
ends with asm goto and has a crossing fallthrough edge to the same bb
that contains at least one of its labels by restoring EDGE_CROSSING
flag even on possible edge from cur_bb to new_bb successor.
* gcc.c-torture/compile/pr108596.c: New test.
|
|
The following testcases are miscompiled because the threader sees that some
SSA_NAME would have the value -0.0, and when computing the range of SSA_NAME == 0.0
foperator_equal::fold_range sees one operand has the [-0.0, -0.0] singleton
range and the other [0.0, 0.0]; they aren't equal (frange operator== uses
real_identical etc. rather than real comparisons) and so it thinks they
compare unequal.  With signed zeros -0.0 == 0.0 is true though, so we
need to special-case the code for when both ranges are singletons.
Similarly, if we see op1 range being say [-42.0, -0.0] and op2 range
[0.0, 42.0], we'd check that the intersection of the two ranges is empty
(that is correct) and fold the result of == between such operands to
[0, 0] which is wrong, because -0.0 == 0.0, it needs to be [0, 1].
Similarly for foperator_not_equal::fold_range.
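For illustration, a minimal sketch (not the committed testcases) of the property involved, with the condition pinning x to the [-0.0, -0.0] singleton:
__attribute__((noipa)) int
f (double x)
{
  if (__builtin_signbit (x) && x == 0.0)   /* here x is known to be -0.0 */
    return x == 0.0;                       /* must remain true */
  return -1;
}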
2023-01-26 Jakub Jelinek <jakub@redhat.com>
PR tree-optimization/108540
* range-op-float.cc (foperator_equal::fold_range): If both op1 and op2
are singletons, use range_true even if op1 != op2
when one range is [-0.0, -0.0] and another [0.0, 0.0]. Similarly,
even if intersection of the ranges is empty and one has
zero low bound and another zero high bound, use range_true_and_false
rather than range_false.
(foperator_not_equal::fold_range): If both op1 and op2
are singletons, use range_false even if op1 != op2
when one range is [-0.0, -0.0] and another [0.0, 0.0]. Similarly,
even if intersection of the ranges is empty and one has
zero low bound and another zero high bound, use range_true_and_false
rather than range_true.
* gcc.c-torture/execute/ieee/pr108540-1.c: New test.
* gcc.c-torture/execute/ieee/pr108540-2.c: New test.
|
|
aligned [PR108498]
The first of the following testcases is miscompiled on powerpc64-linux -O2
-m64 at least, the latter at least on x86_64-linux -m32/-m64.
Since GCC 11 store-merging has a separate string_concatenation mode which
turns stores into setting a MEM_REF from a STRING_CST.
This mode is triggered if at least one of the to be merged stores
is a STRING_CST store and either the first store (to earliest address)
is that STRING_CST store or the first store is 8-bit INTEGER_CST store
and then there are some rules when to turn that mode off or not merge
further stores into it.
The problem with these 2 testcases is that the actual implementation
relies on start/width of the store to be at byte boundaries, as it
simply creates a char array, MEM_REF can be only on byte boundaries
and the char array too, plus obviously STRING_CST as well.
But as can be easily seen in the second testcase, nothing verifies this:
while the first store has to be a STRING_CST (which will be aligned)
or an 8-bit INTEGER_CST, that 8-bit INTEGER_CST store could be a bit-field
store, and nothing verifies for the stores in between whether they actually
are 8-bit and aligned; the only major requirement is that all the stores
are consecutive.
For GCC 14 I think we should reconsider this, simply treat STRING_CST
stores during the merging like INTEGER_CST stores and deal with it only
during split_group where we can create multiple parts, this part
would be a normal store, this part would be STRING_CST store, this part
another normal store etc. But that is quite a lot of work, the following
patch just disables the string_concatenate mode if boundaries aren't byte
aligned in the spot where we disable it if it is too short too.
If that happens, we'll just try to do the merging using normal 1/2/4/8 etc.
byte stores as usually with RMW masking for any bits that shouldn't be
touched or punt if we end up with too many stores compared to the original.
Note, an original STRING_CST store will count as one store in that case,
something we might want to reconsider later too (but, after all, CONSTRUCTOR
stores (aka zeroing) already have the same problem, they can be large and
expensive and we still count them as one store).
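For illustration, a minimal sketch (not the committed testcases) of the store-group shape described, where the leading 8-bit constant store is a bit-field store and therefore not byte aligned:
struct S
{
  unsigned int pad : 3;
  unsigned int f : 8;    /* 8-bit field starting at bit offset 3 */
  unsigned int g : 5;
  char buf[4];
} s;
void
store (void)
{
  s.f = 1;                             /* 8-bit INTEGER_CST store */
  s.g = 2;
  __builtin_memcpy (s.buf, "abc", 4);  /* STRING_CST store */
}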
2023-01-25 Jakub Jelinek <jakub@redhat.com>
PR tree-optimization/108498
* gimple-ssa-store-merging.cc (class store_operand_info):
End comment with full stop rather than comma.
(split_group): Likewise.
(merged_store_group::apply_stores): Clear string_concatenation if
start or end aren't on a byte boundary.
* gcc.c-torture/execute/pr108498-1.c: New test.
* gcc.c-torture/execute/pr108498-2.c: New test.
|
|
The comment above simplify_rotate roughly describes what patterns
are matched into what:
We are looking for X with unsigned type T with bitsize B, OP being
+, | or ^, some type T2 wider than T. For:
(X << CNT1) OP (X >> CNT2) iff CNT1 + CNT2 == B
((T) ((T2) X << CNT1)) OP ((T) ((T2) X >> CNT2)) iff CNT1 + CNT2 == B
transform these into:
X r<< CNT1
Or for:
(X << Y) OP (X >> (B - Y))
(X << (int) Y) OP (X >> (int) (B - Y))
((T) ((T2) X << Y)) OP ((T) ((T2) X >> (B - Y)))
((T) ((T2) X << (int) Y)) OP ((T) ((T2) X >> (int) (B - Y)))
(X << Y) | (X >> ((-Y) & (B - 1)))
(X << (int) Y) | (X >> (int) ((-Y) & (B - 1)))
((T) ((T2) X << Y)) | ((T) ((T2) X >> ((-Y) & (B - 1))))
((T) ((T2) X << (int) Y)) | ((T) ((T2) X >> (int) ((-Y) & (B - 1))))
transform these into (last 2 only if ranger can prove Y < B):
X r<< Y
Or for:
(X << (Y & (B - 1))) | (X >> ((-Y) & (B - 1)))
(X << (int) (Y & (B - 1))) | (X >> (int) ((-Y) & (B - 1)))
((T) ((T2) X << (Y & (B - 1)))) | ((T) ((T2) X >> ((-Y) & (B - 1))))
((T) ((T2) X << (int) (Y & (B - 1)))) \
| ((T) ((T2) X >> (int) ((-Y) & (B - 1))))
transform these into:
X r<< (Y & (B - 1))
The following testcase shows that 2 of these are problematic.
If T2 is wider than T, then the 2 which use (-Y) & (B - 1) on one
of the shift counts but Y on the other can do something different from
a rotate.  E.g.:
__attribute__((noipa)) unsigned char
f7 (unsigned char x, unsigned int y)
{
unsigned int t = x;
return (t << y) | (t >> ((-y) & 7));
}
if y is [0, 7], then it is a normal rotate, and if y is in [32, ~0U]
then it is UB, but for y in [9, 31] the left shift in this case
will never leave any bits in the result, while in a rotate they are
left there.  Say for y 5 and x 0xaa the expression gives
0x55, which is the same thing as the rotate, while for y 19 and x 0xaa
it gives 0x5, which is different.
Now, I believe the
((T) ((T2) X << Y)) OP ((T) ((T2) X >> (B - Y)))
((T) ((T2) X << (int) Y)) OP ((T) ((T2) X >> (int) (B - Y)))
forms are ok, because B - Y still needs to be a valid shift count,
and if Y > B then B - Y should be either negative or very large
positive (for unsigned types).
And similarly the last 2 cases above which use & (B - 1) on both
shift operands are definitely ok.
The following patch disables the
((T) ((T2) X << Y)) | ((T) ((T2) X >> ((-Y) & (B - 1))))
((T) ((T2) X << (int) Y)) | ((T) ((T2) X >> (int) ((-Y) & (B - 1))))
unless ranger says Y is not in [B, B2 - 1] range.
And, looking at it again this morning, actually the Y equal to B
case is still fine, if Y is equal to 0, then it is
(T) (((T2) X << 0) | ((T2) X >> 0))
and so X, for Y == B it is
(T) (((T2) X << B) | ((T2) X >> 0))
which is the same as
(T) (0 | ((T2) X >> 0))
which is also X. So instead of the [B, B2 - 1] range we could use
[B + 1, B2 - 1].  And, if we wanted to go further, even multiples
of B are ok if they are smaller than B2, so we could construct a detailed
int_range_max if we wanted.
2023-01-17 Jakub Jelinek <jakub@redhat.com>
PR tree-optimization/106523
* tree-ssa-forwprop.cc (simplify_rotate): For the
patterns with (-Y) & (B - 1) in one operand's shift
count and Y in another, if T2 has wider precision than T,
punt if Y could have a value in [B, B2 - 1] range.
* c-c++-common/rotate-2.c (f5, f6, f7, f8, f13, f14, f15, f16,
f37, f38, f39, f40, f45, f46, f47, f48): Add assertions using
__builtin_unreachable about shift count.
* c-c++-common/rotate-2b.c: New test.
* c-c++-common/rotate-4.c (f5, f6, f7, f8, f13, f14, f15, f16,
f37, f38, f39, f40, f45, f46, f47, f48): Add assertions using
__builtin_unreachable about shift count.
* c-c++-common/rotate-4b.c: New test.
* gcc.c-torture/execute/pr106523.c: New test.
|
|
This one is hand-reduced to the problematic code from the optimized dump
that used to be miscompiled during combine starting with
r12-303 and fixed with r13-3530, aka the PR107172 fix.
2023-01-13 Jakub Jelinek <jakub@redhat.com>
PR target/107131
* gcc.c-torture/execute/pr107131.c: New test.
|
|
These PRs were for now fixed by reversion of the r13-4977
patch, but so that the problems don't reappear during stage 1,
I'm adding testcase coverage from those PRs.
2023-01-06 Jakub Jelinek <jakub@redhat.com>
PR target/108292
PR target/108308
* gcc.c-torture/execute/pr108292.c: New test.
* gcc.target/i386/pr108292.c: New test.
* gcc.dg/pr108308.c: New test.
|
|
We ICE on the following testcase, because a valid V2DImode
!= comparison is folded into an unsupported V2DImode > comparison.
The match.pd pattern which does this looks like:
/* Transform comparisons of the form (X & Y) CMP 0 to X CMP2 Z
where ~Y + 1 == pow2 and Z = ~Y. */
(for cst (VECTOR_CST INTEGER_CST)
(for cmp (eq ne)
icmp (le gt)
(simplify
(cmp (bit_and:c@2 @0 cst@1) integer_zerop)
(with { tree csts = bitmask_inv_cst_vector_p (@1); }
(if (csts && (VECTOR_TYPE_P (TREE_TYPE (@1)) || single_use (@2)))
(with { auto optab = VECTOR_TYPE_P (TREE_TYPE (@1))
? optab_vector : optab_default;
tree utype = unsigned_type_for (TREE_TYPE (@1)); }
(if (target_supports_op_p (utype, icmp, optab)
|| (optimize_vectors_before_lowering_p ()
&& (!target_supports_op_p (type, cmp, optab)
|| !target_supports_op_p (type, BIT_AND_EXPR, optab))))
(if (TYPE_UNSIGNED (TREE_TYPE (@1)))
(icmp @0 { csts; })
(icmp (view_convert:utype @0) { csts; })))))))))
and that optimize_vectors_before_lowering_p () guarded stuff there
already deals with this problem, not trying to fold a supported comparison
into a non-supported one. The reason it doesn't work in this case is that
it isn't GIMPLE folding which does this, but GENERIC folding done during
forwprop4 - forward_propagate_into_comparison -> forward_propagate_into_comparison_1
-> combine_cond_expr_cond -> fold_binary_loc -> generic_simplify
and we simply assumed that GENERIC folding happens only before
gimplification.
The following patch fixes that by checking cfun properties instead of
always returning true in those cases.
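For illustration, a minimal sketch (not the committed testcase) of the kind of expression matched by the pattern, where a supported V2DImode != can be folded into an unsupported V2DImode >:
typedef long long v2di __attribute__ ((vector_size (16)));
v2di
cmp (v2di x)
{
  /* (x & -4) != 0 has the (X & Y) CMP 0 shape with ~Y + 1 == 4.  */
  return (x & -4) != 0;
}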
2023-01-04 Jakub Jelinek <jakub@redhat.com>
PR middle-end/108237
* generic-match-head.cc: Include tree-pass.h.
(canonicalize_math_p, optimize_vectors_before_lowering_p): Define
to false if cfun and cfun->curr_properties has PROP_gimple_opt_math
resp. PROP_gimple_lvec property set.
* gcc.c-torture/compile/pr108237.c: New test.
|
|
This fixes the following on LLP64 mingw-w64 target:
Excess errors:
gcc/testsuite/gcc.c-torture/compile/pr55569.c:13:12: warning: overflow in conversion from 'long long unsigned int' to 'long int' changes value from '4611686018427387903' to '-1' [-Woverflow]
gcc/testsuite/gcc.c-torture/compile/pr55569.c:13:34: warning: iteration 2147483647 invokes undefined behavior [-Waggressive-loop-optimizations]
gcc/testsuite/ChangeLog:
* gcc.c-torture/compile/pr55569.c: Fix excess errors.
Signed-off-by: Jonathan Yong <10walls@gmail.com>
|
|
Even though this PR was reported with an ubsan issue, the problem is
that tree_nonzero_bits is being called with an expression which has a vector type.
This fixes three patterns I noticed which do that,
and adds a testcase for one of the patterns.
OK? Bootstrapped and tested on x86_64-linux-gnu with no regressions.
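For illustration, a minimal sketch (not the committed testcase) of the ~(X >> Y) pattern applied to vector operands, for which tree_nonzero_bits must not be queried:
typedef int v4si __attribute__ ((vector_size (16)));
v4si
shift (v4si x, v4si y)
{
  return ~(x >> y);
}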
gcc/ChangeLog:
PR tree-optimization/105532
* match.pd (~(X >> Y) -> ~X >> Y): Check if it is an integral
type before calling tree_nonzero_bits.
(popcount(X) + popcount(Y)): Likewise.
(popcount(X&C1)): Likewise.
gcc/testsuite/ChangeLog:
* gcc.c-torture/compile/vector-shift-1.c: New test.
|
|
[PR106751]
The RTL loop passes only request simple preheaders, but don't require
fallthru preheaders, while move_invariant_reg apparently assumes the
latter, that it can just append instruction(s) to the end of the preheader
basic block.
The following patch fixes that by splitting the preheader edge if
the preheader bb ends with a JUMP_INSN (asm goto in this case).
Without that we get control flow in the middle of a bb.
2022-12-16 Jakub Jelinek <jakub@redhat.com>
PR rtl-optimization/106751
* loop-invariant.cc (move_invariant_reg): If preheader bb ends
with a JUMP_INSN, split the preheader edge and emit invariants
into the new preheader basic block.
* gcc.c-torture/compile/pr106751.c: New test.
|
|
Since vect_recog_rotate_pattern has been extended in r13-1100 to work also
on signed types, we miscompile the testcase below.
vect_recog_rotate_pattern actually emits correct scalar code into
the pattern def sequence (in particular cast to utype, doing the
2 shifts in utype so that the right shift is logical and not arithmetic,
or and then cast back to the signed type), but it didn't supply vectype
for most of those pattern statements, which means that the generic handling
fills it up later with the vectype provided by vect_recog_rotate_pattern.
The problem is that it is vectype of the result of the whole pattern,
i.e. vector of signed values in this case, while the conversion to utype,
2 shifts and or (everything with utype lhs in scalar code) should have
uvectype as STMT_VINFO_VECTYPE.
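For illustration, a minimal sketch (not the committed testcase) of a rotate on a signed element type of the shape described, with the shifts done in the unsigned type and the result cast back:
void
rot (short *a)
{
  for (int i = 0; i < 1024; i++)
    {
      unsigned short u = a[i];
      a[i] = (short) ((u << 3) | (u >> 13));
    }
}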
2022-12-13 Jakub Jelinek <jakub@redhat.com>
PR tree-optimization/108064
* tree-vect-patterns.cc (vect_recog_rotate_pattern): Pass uvectype
as 4th argument to append_pattern_def_seq for statements with lhs
with utype type.
* gcc.c-torture/execute/pr108064.c: New test.
|
|
The following testcase ICEs, because the latch bb ends with
asm goto which has both fallthrough to the header and one or more labels
in the header too. In that case there is just a single edge out of the
latch block, but still the asm goto is stmt_ends_bb_p statement, yet
ivopts decides to emit an IV bump at the IP_END position and inserts
it into the same bb as the asm goto after it, which then fails verification
(control flow in the middle of bb).
The following patch fixes it by splitting the latch -> header edge in that
case and inserting into the newly created bb, where split_edge ->
redirect_edge_and_branch is able to deal with this case correctly.
2022-12-10 Jakub Jelinek <jakub@redhat.com>
PR tree-optimization/107997
* tree-ssa-loop-ivopts.cc: Include cfganal.h.
(create_new_iv) <case IP_END>: If ip_end_pos bb is non-empty and ends
with a stmt which ends bb, instead of adding iv update after it split
the latch edge and insert iterator into the new latch bb.
* gcc.c-torture/compile/pr107997.c: New test.
|
|
add_options_for_ieee has:
if { [istarget alpha*-*-*]
|| [istarget sh*-*-*] } {
return "$flags -mieee"
}
if { [istarget rx-*-*] } {
return "$flags -mnofpu"
}
but ieee.exp doesn't use add_options_for_ieee, instead it has:
if { [istarget "alpha*-*-*"]
|| [istarget "sh*-*-*"] } then {
lappend additional_flags "-mieee"
}
among other things (plus -ffloat-store on some arches etc.).
The following patch adds the rx -mnofpu similarly in the hope
of fixing ieee.exp FAILs on rx.
Preapproved in the PR by Jeff, committed to trunk.
2022-12-06 Jakub Jelinek <jakub@redhat.com>
PR testsuite/107046
* gcc.c-torture/execute/ieee/ieee.exp: For rx-*-* append
-mnofpu.
|
|
As reported in the PR, the following pr106805.c testcase is miscompiled
with the default -ftrapping-math, because we fold all the comparisons into
constants and don't raise any exceptions.
The match.pd pattern handles just simple comparisons, from those
EQ/NE are quiet and don't raise exceptions on anything but sNaN, while
GT/GE/LT/LE are signaling and do raise exceptions even on qNaN.
fold_relational_const handles this IMHO correctly:
/* Handle the cases where either operand is a NaN. */
if (real_isnan (c0) || real_isnan (c1))
{
switch (code)
{
case EQ_EXPR:
case ORDERED_EXPR:
result = 0;
break;
case NE_EXPR:
case UNORDERED_EXPR:
case UNLT_EXPR:
case UNLE_EXPR:
case UNGT_EXPR:
case UNGE_EXPR:
case UNEQ_EXPR:
result = 1;
break;
case LT_EXPR:
case LE_EXPR:
case GT_EXPR:
case GE_EXPR:
case LTGT_EXPR:
if (flag_trapping_math)
return NULL_TREE;
result = 0;
break;
default:
gcc_unreachable ();
}
return constant_boolean_node (result, type);
}
by folding the signaling comparisons only if -fno-trapping-math.
The following patch does the same in match.pd.
Unfortunately the pr106805.c testcase still fails, but no longer because of
match.pd, but on the trunk because of the still unresolved ranger problems
(same issue as for fold-overflow-1.c etc.) and on 12 branch (and presumably
trunk too) somewhere during expansion the comparisons are also expanded
into constants (which is ok for -fno-trapping-math, but not ok with that).
Though, I think the patch is a small step in the direction, so I'd like
to commit this patch without the gcc.dg/pr106805.c testcase for now.
2022-12-05 Jakub Jelinek <jakub@redhat.com>
PR middle-end/106805
* match.pd (cmp @0 REAL_CST@1): Don't optimize x cmp NaN
or NaN cmp x to false/true for cmp >/>=/</<= if -ftrapping-math.
* c-c++-common/pr57371-4.c: Revert 2021-09-19 changes.
* c-c++-common/pr57371-5.c: New test.
* gcc.c-torture/execute/ieee/fp-cmp-6.x: Add -fno-trapping-math.
* gcc.c-torture/execute/ieee/fp-cmp-9.c: New test.
* gcc.c-torture/execute/ieee/fp-cmp-9.x: New file.
|
|
While for the normal cases it seems to be correct to implement
reverse multiplication (op1_range/op2_range) through division
with float_binary_op_range_finish, reverse division (op1_range)
through multiplication with float_binary_op_range_finish or
(op2_range) through division with float_binary_op_range_finish,
as e.g. following testcase shows for the corner cases it is
incorrect.
Say on the testcase we are doing reverse multiplication, we
have [-0., 0.] range (no NAN) on lhs and VARYING on op1 (or op2).
We implement that through division, because x from
lhs = x * op2
is
x = lhs / op2
For the division, [-0., 0.] / VARYING is computed (IMHO correctly)
as [-0., 0.] +-NAN, because 0 / anything but 0 or NAN is still
0 and 0 / 0 is NAN and ditto 0 / NAN. And then we just
float_binary_op_range_finish, which figures out that because lhs
can't be NAN, neither operand can be NAN. So, the end range is
[-0., 0.]. But that is not correct for the reverse multiplication.
When the result is 0, if op2 can be zero, then x can be anything
(VARYING), to be precise anything but INF (unless result can be NAN),
because anything * 0 is 0 (or NAN for INF). While if op2 must be
non-zero, then x must be 0. Of course the sign logic
(signbit(x) = signbit(lhs) ^ signbit(op2)) still holds, so it actually
isn't full VARYING if both lhs and op2 have known sign bits.
And going through other corner cases one by one shows other differences
between what we compute for the corresponding forward operation and
what we should compute for the reverse operations.
The following patch is slightly conservative and includes INF
(in case of result including 0 and not NAN) in the ranges or
0 in the ranges (in case of result including INF and not NAN).
The latter is what happens anyway because we flush denormals to 0,
and the former just not to deal with all the corner cases.
So, the end test is that for reverse multiplication and division
op2_range the cases we need to adjust to VARYING or VARYING positive
or VARYING negative are if lhs and op? ranges both contain 0,
or both contain some infinity, while for division op1_range the
corner case is if lhs range contains 0 and op2 range contains INF or vice
versa. Otherwise I believe ranges from the corresponding operation
are ok, or could be slightly more conservative (e.g. for
reverse multiplication, if op? range is singleton INF and lhs
range doesn't include any INF, then x's range should be UNDEFINED or
known NAN (depending on if lhs can be NAN), while the division computes
[-0., 0.] +-NAN; or similarly if op? range is only 0 and lhs range
doesn't include 0, division would compute +INF +-NAN, or -INF +-NAN,
or (for lack of multipart franges -INF +INF +-NAN just VARYING +-NAN),
while again it is UNDEFINED or known NAN.
Oh, and I found by code inspection a wrong condition for the division's
known NAN result; due to a thinko it would trigger not just when
both operands are known to be 0 or both are known to be INF, but
when either both are known to be 0 or at least one is known to be INF.
2022-12-05 Jakub Jelinek <jakub@redhat.com>
PR tree-optimization/107879
* range-op-float.cc (foperator_mult::op1_range): If both
lhs and op2 ranges contain zero or both ranges contain
some infinity, set r range to zero_to_inf_range depending on
signbit_known_p.
(foperator_div::op2_range): Similarly for lhs and op1 ranges.
(foperator_div::op1_range): If lhs range contains zero and op2
range contains some infinity or vice versa, set r range to
zero_to_inf_range depending on signbit_known_p.
(foperator_div::rv_fold): Fix up condition for returning known NAN.
|
|
r13-254-gdd3c7873a61019e9 added an optimization for {a, +, a} (x-1),
but as can be seen in the following testcase, the way it is written,
where chrec_fold_multiply is called with type, doesn't work for pointers:
res = build_int_cst (TREE_TYPE (x), 1);
res = chrec_fold_plus (TREE_TYPE (x), x, res);
res = chrec_convert_rhs (type, res, NULL);
res = chrec_fold_multiply (type, chrecr, res);
while what we were doing before and what is still used if the condition
doesn't match is fine:
res = chrec_convert_rhs (TREE_TYPE (chrecr), x, NULL);
res = chrec_fold_multiply (TREE_TYPE (chrecr), chrecr, res);
res = chrec_fold_plus (type, CHREC_LEFT (chrec), res);
because it performs chrec_fold_multiply on TREE_TYPE (chrecr) and converts
only afterwards.
I think the easiest fix is to ignore the new path for pointer types.
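As an illustration of the algebra involved (a hand-written sketch,
not the committed pr107835.c), applying {a, +, a} at x-1 means
starting at a and adding a another x-1 times, i.e. a*x; for a pointer
chrec that product is only meaningful when carried out in an integer
type, as the pre-r13-254 sequence does.
  #include <assert.h>
  #include <stdint.h>

  int
  main (void)
  {
    int buf[4];
    uintptr_t a = (uintptr_t) buf;   /* plays the chrec base and step */
    uintptr_t x = 3;

    uintptr_t iter = a;              /* evaluate {a, +, a} at x-1 */
    for (uintptr_t i = 0; i < x - 1; i++)
      iter += a;

    assert (iter == a * x);          /* the closed form used by r13-254 */
    return 0;
  }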
2022-11-30 Jakub Jelinek <jakub@redhat.com>
PR tree-optimization/107835
* tree-chrec.cc (chrec_apply): Don't handle "{a, +, a} (x-1)"
as "a*x" if type is a pointer type.
* gcc.c-torture/compile/pr107835.c: New test.
|
|
On Wed, Nov 09, 2022 at 04:43:56PM +0100, Aldy Hernandez wrote:
> On Wed, Nov 9, 2022 at 3:58 PM Jakub Jelinek <jakub@redhat.com> wrote:
> >
> > On Wed, Nov 09, 2022 at 10:02:46AM +0100, Aldy Hernandez wrote:
> > > We can implement the op[12]_range entries for plus and minus in terms
> > > of each other. These are adapted from the integer versions.
> >
> > I think for NANs the op[12]_range shouldn't act this way.
> > For the forward binary operations, we have the (maybe/known) NAN handling
> > of one or both NAN operands resulting in VARYING sign (maybe/known) NAN
> > result; that is somehow the case for the reverse binary operations too:
> > if result is (maybe/known) NAN and the other op is not NAN, op is
> > VARYING sign (maybe/known) NAN, if other op is (maybe/known) NAN,
> > then op is VARYING sign maybe NAN (always maybe, never known).
> > But then for + we have the -INF + INF or vice versa into NAN, and that
> > is something that shouldn't be considered. If result isn't NAN, then
> > neither operand can be NAN, regardless of whether result can be
> > +/- INF and the other op -/+ INF.
>
> Heh. I just ran into this while debugging the problem reported by Xi.
>
> We are solving NAN = op1 - VARYING, and trying to do it with op1 = NAN
> + VARYING, which returns op1 = NAN (incorrectly).
>
> I suppose in the above case op1 should ideally be
> [-INF,-INF][+INF,+INF]+-NAN, but since we can't represent that then
> [-INF,+INF] +-NAN, which is actually VARYING. Do you agree?
>
> I'm reverting this patch as attached, while I sort this out.
Here is a patch which reinstalls your change, adds the fixups I've talked
about and also hooks up reverse operators for MULT_EXPR/RDIV_EXPR.
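For reference, a small C illustration (a sketch under IEEE semantics,
not the committed inf-4.c test) of why solving lhs = op1 - op2 for op1
simply as lhs + op2 goes wrong at the special values: a NAN difference
does not imply a NAN operand.
  #include <assert.h>
  #include <math.h>

  int
  main (void)
  {
    volatile double op1 = INFINITY, op2 = INFINITY;
    volatile double lhs = op1 - op2;         /* INF - INF is NAN ...  */
    assert (isnan (lhs));
    assert (!isnan (op1) && !isnan (op2));   /* ... non-NAN operands.  */

    /* The naive reverse computation op1 = lhs + op2 yields NAN, which
       would wrongly narrow op1's range to known NAN.  */
    volatile double naive_op1 = lhs + op2;
    assert (isnan (naive_op1));
    return 0;
  }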
2022-11-12 Aldy Hernandez <aldyh@redhat.com>
Jakub Jelinek <jakub@redhat.com>
* range-op-float.cc (float_binary_op_range_finish): New function.
(foperator_plus::op1_range): New.
(foperator_plus::op2_range): New.
(foperator_minus::op1_range): New.
(foperator_minus::op2_range): New.
(foperator_mult::op1_range): New.
(foperator_mult::op2_range): New.
(foperator_div::op1_range): New.
(foperator_div::op2_range): New.
* gcc.c-torture/execute/ieee/inf-4.c: New test.
|
|
There is a loophole in the unroll-and-jam pass that can quickly result in
wrong code generation. The code reads:
if (!compute_data_dependences_for_loop (outer, true, &loop_nest,
&datarefs, &dependences))
{
if (dump_file && (dump_flags & TDF_DETAILS))
fprintf (dump_file, "Cannot analyze data dependencies\n");
free_data_refs (datarefs);
free_dependence_relations (dependences);
continue;
}
but compute_data_dependences_for_loop may return true even if the analysis
is reported as failing by compute_affine_dependence for a dependence pair:
(compute_affine_dependence
ref_a: data[_14], stmt_a: data[_14] = i_59;
ref_b: data[_14], stmt_b: data[_14] = i_59;
Data ref a:
Data ref b:
affine dependence test not usable: access function not affine or constant.
) -> dependence analysis failed
Note that this is a self-dependence pair and the code for such pairs reads:
/* Nothing interesting for the self dependencies. */
if (dra == drb)
continue;
This means that the pass may reorder "complex" accesses to the same memory
location in successive iterations, which is OK for reads but not for writes.
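A hedged C sketch of the problematic shape (not the committed
20221006-1.c test; the function name is made up): the store below uses
a subscript that is not affine in the loop counters, so two successive
outer iterations may write the same element, and unroll-and-jam must
not reorder those writes.
  #define N 16
  int data[N];

  void
  foo (int m)
  {
    for (int i = 0; i < m; i++)
      for (int j = 0; j < 4; j++)
        /* Non-affine access function: the same data[] slot can be
           written by different (i, j) pairs, so write order matters.  */
        data[((unsigned) i * i + j) % N] = i;
  }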
gcc/
* gimple-loop-jam.cc (tree_loop_unroll_and_jam): Bail out for a self
dependency that is a write-after-write if the access function is not
affine or constant.
gcc/testsuite/
* gcc.c-torture/execute/20221006-1.c: New test.
|
|
As the following testcase reduced from glibc fmtmsg.c shows
(it doesn't ICE on x86_64/i686 unfortunately, but does on various other
arches), my last optimize_range_tests_cmp_bitwise change wasn't fully
correct. The intent was to let all pointer operands be cast to
pointer_sized_int_node first in addition to the other casts (to type1)
which are done for id >= l cases.
But one spot I've touched always used a cast to type1 (note, the
(b % 4) == 3 case is impossible for pointer operands because that case
is for !TYPE_UNSIGNED operands and pointers are TYPE_UNSIGNED), and in
the other spot the cast would be done only for id >= l if not useless,
but for pointers we need the cast always.
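For context, a hedged C sketch (not the testcase reduced from
fmtmsg.c; the function name is made up) of the kind of source
optimize_range_tests_cmp_bitwise rewrites: chained equality-with-zero
tests that reassoc may merge into a single bitwise OR compared against
zero, which for pointer operands requires the casts to
pointer_sized_int_node discussed above.
  #include <stdint.h>

  /* Reassoc may turn this into roughly
     ((uintptr_t) p | (uintptr_t) q | (uintptr_t) x) == 0,
     hence the need to cast pointer operands to a pointer-sized
     integer type first.  */
  int
  all_zero (const char *p, const char *q, uintptr_t x)
  {
    return p == 0 && q == 0 && x == 0;
  }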
2022-09-17 Jakub Jelinek <jakub@redhat.com>
PR tree-optimization/106958
* tree-ssa-reassoc.cc (optimize_range_tests_cmp_bitwise): If
id >= l, cast op to type1, otherwise to pointer_sized_int_node.
If type has pointer type, cast exp to pointer_sized_int_node
even when id < l.
* gcc.c-torture/compile/pr106958.c: New test.
|
|
My change to match.pd (that added the two simplifications this patch
touches) results in more |/^/& assignments with pointer arguments,
but since r12-1608 we reject pointer operands for BIT_NOT_EXPR.
Disallowing them for BIT_NOT_EXPR and allowing them for
BIT_{IOR,XOR,AND}_EXPR leads to a match.pd maintenance nightmare (see
one of the patches in the PR), so either we want to allow pointer
operands on BIT_NOT_EXPR (but then we run into issues e.g. with the
ranger, which expects it can emulate BIT_NOT_EXPR ~X as -1 - X, which
doesn't work for pointers that don't support MINUS_EXPR), or, as the
following patch does, we disallow pointer arguments for all of
BIT_{IOR,XOR,AND}_EXPR with the exception of BIT_AND_EXPR with an
INTEGER_CST last operand (for simpler pointer realignment).
I had to tweak one reassoc optimization and the two match.pd
simplifications.
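As a hedged illustration (not the committed pr106878.c test; the
function name is made up), the BIT_AND_EXPR-with-INTEGER_CST exception
is aimed at the usual pointer realignment idiom, roughly of this shape
at the source level, while general pointer |/^/& arithmetic has to be
carried out on an integer type such as uintptr_t.
  #include <stdint.h>

  /* The pointer realignment idiom the BIT_AND_EXPR/INTEGER_CST
     exception is meant to serve: round p down to a 16-byte boundary.  */
  void *
  align_down_16 (void *p)
  {
    return (void *) ((uintptr_t) p & ~(uintptr_t) 15);
  }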
2022-09-14 Jakub Jelinek <jakub@redhat.com>
PR tree-optimization/106878
* tree-cfg.cc (verify_gimple_assign_binary): Disallow pointer,
reference or OFFSET_TYPE BIT_IOR_EXPR, BIT_XOR_EXPR or, unless
the second argument is INTEGER_CST, BIT_AND_EXPR.
* match.pd ((type) X op CST -> (type) (X op ((type-x) CST)),
(type) (((type2) X) op Y) -> (X op (type) Y)): Punt for
POINTER_TYPE_P or OFFSET_TYPE.
* tree-ssa-reassoc.cc (optimize_range_tests_cmp_bitwise): For
pointers cast them to pointer sized integers first.
* gcc.c-torture/compile/pr106878.c: New test.
|