riscv-gnu-toolchain/gcc.git

Age	Commit message (Collapse)	Author	Files	Lines
2024-08-12	16-bit testsuite fixes - excessive code size	Joern Rennecke	1	-0/+2
	gcc/testsuite/ * gcc.c-torture/execute/20021120-1.c: Skip if not size20plus or -Os. * gcc.dg/fixed-point/convert-float-4.c: Require size20plus. * gcc.dg/torture/pr112282.c: Skip if -O0 unless size20plus. * g++.dg/lookup/pr21802.C: Require size20plus.
2024-07-31	recog: Disallow subregs in mode-punned value [PR115881]	Richard Sandiford	1	-0/+16
	In g:9d20529d94b23275885f380d155fe8671ab5353a, I'd extended insn_propagation to handle simple cases of hard-reg mode punning. The punned "to" value was created using simplify_subreg rather than simplify_gen_subreg, on the basis that hard-coded subregs aren't generally useful after RA (where hard-reg propagation is expected to happen). This PR is about a case where the subreg gets pushed into the operands of a plus, but the subreg on one of the operands cannot be simplified. Specifically, we have to generate (subreg:SI (reg:DI sp) 0) rather than (reg:SI sp), since all references to the stack pointer must be via stack_pointer_rtx. However, code in x86 (reasonably) expects no subregs of registers to appear after RA, except for special cases like strict_low_part. This leads to an awkward situation where we can't ban subregs of sp (because of the strict_low_part use), can't allow direct references to sp in other modes (because of the stack_pointer_rtx requirement), and can't allow rvalue uses of the subreg (because of the "no subregs after RA" assumption). It all seems a bit of a mess... I sat on this for a while in the hope that a clean solution might become apparent, but in the end, I think we'll just have to check manually for nested subregs and punt on them. gcc/ PR rtl-optimization/115881 * recog.cc: Include rtl-iter.h. (insn_propagation::apply_to_rvalue_1): Check that the result of simplify_subreg does not include nested subregs. gcc/testsuite/ PR rtl-optimization/115881 * gcc.c-torture/compile/pr115881.c: New test.
2024-07-29	testsuite: fix PR111613 test	Sam James	1	-0/+0
	PR ipa/111613 * gcc.c-torture/pr111613.c: Rename to.. * gcc.c-torture/execute/pr111613.c: ...this.
2024-07-29	testsuite: make PR115277 test an execute one	Sam James	1	-0/+0
	PR middle-end/115277 * gcc.c-torture/compile/pr115277.c: Rename to... * gcc.c-torture/execute/pr115277.c: ...this.
2024-07-22	Fix modref's interaction with store merging	Jan Hubicka	1	-0/+29
	Hi, this patch fixes wrong code in case store-merging introduces a load of a function parameter that was previously write-only (which happens for bitfields). Without this, the whole store-merged area is considered to be killed. PR ipa/111613 gcc/ChangeLog: * ipa-modref.cc (analyze_parms): Do not preserve EAF_NO_DIRECT_READ and EAF_NO_INDIRECT_READ from past flags. gcc/testsuite/ChangeLog: * gcc.c-torture/pr111613.c: New test.
2024-07-22	Fix modref_eaf_analysis::analyze_ssa_name handling of values dereferenced to ↵	Jan Hubicka	1	-0/+35
	function call parameters modref_eaf_analysis::analyze_ssa_name misinterprets EAF flags. If dereferenced parameter is passed (to map_iterator in the testcase) it can be returned indirectly which in turn makes it to escape into the next function call. PR ipa/115033 gcc/ChangeLog: * ipa-modref.cc (modref_eaf_analysis::analyze_ssa_name): Fix checking of EAF flags when analysing values dereferenced as function parameters. gcc/testsuite/ChangeLog: * gcc.c-torture/execute/pr115033.c: New test.
2024-07-22	Fix accounting of offsets in unadjusted_ptr_and_unit_offset	Jan Hubicka	1	-0/+23
	unadjusted_ptr_and_unit_offset accidentally throws away the offset computed by get_addr_base_and_unit_offset. Instead of passing extra_offset it passes offset. PR ipa/114207 gcc/ChangeLog: * ipa-prop.cc (unadjusted_ptr_and_unit_offset): Fix accounting of offsets in ADDR_EXPR. gcc/testsuite/ChangeLog: * gcc.c-torture/execute/pr114207.c: New test.
2024-07-22	Compare loop bounds in ipa-icf	Jan Hubicka	1	-0/+28
	Hi, this testcase shows another problem with missing comparators for metadata in ICF. With value ranges available to loop optimizations during early opts we can estimate the number of iterations based on a guarding condition that can be split away by the fnsplit pass. This patch disables ICF when the number of iterations does not match. Bootstrapped/regtested x86_64-linux, will commit it shortly gcc/ChangeLog: PR ipa/115277 * ipa-icf-gimple.cc (func_checker::compare_loops): compare loop bounds. gcc/testsuite/ChangeLog: * gcc.c-torture/compile/pr115277.c: New test.
2024-07-22	rtl-ssa: Avoid using a stale splay tree root [PR116009]	Richard Sandiford	1	-0/+23
	In the fix for PR115928, I'd failed to notice that "root" was used later in the function, so needed to be updated. gcc/ PR rtl-optimization/116009 * rtl-ssa/accesses.cc (function_info::add_def): Set the root local variable after removing the old clobber group. gcc/testsuite/ PR rtl-optimization/116009 * gcc.c-torture/compile/pr116009.c: New test.
2024-06-05	Don't simplify NAN/INF or out-of-range constant for FIX/UNSIGNED_FIX.	liuhongt	1	-0/+2
	According to the IEEE standard, for conversions from floating point to integer: when a NaN or infinite operand cannot be represented in the destination format and this cannot otherwise be indicated, the invalid operation exception shall be signaled. When a numeric operand would convert to an integer outside the range of the destination format, the invalid operation exception shall be signaled if this situation cannot otherwise be indicated. The patch prevents simplification of the conversion from floating point to integer for NAN/INF/out-of-range constant when flag_trapping_math. gcc/ChangeLog: PR rtl-optimization/100927 PR rtl-optimization/115161 PR rtl-optimization/115115 * simplify-rtx.cc (simplify_const_unary_operation): Prevent simplification of FIX/UNSIGNED_FIX for NAN/INF/out-of-range constant when flag_trapping_math. * fold-const.cc (fold_convert_const_int_from_real): Don't fold for overflow value when flag_trapping_math. gcc/testsuite/ChangeLog: * gcc.dg/pr100927.c: New test. * c-c++-common/Wconversion-1.c: Add -fno-trapping-math. * c-c++-common/dfp/convert-int-saturate.c: Ditto. * g++.dg/ubsan/pr63956.C: Ditto. * g++.dg/warn/Wconversion-real-integer.C: Ditto. * gcc.c-torture/execute/20031003-1.c: Ditto. * gcc.dg/Wconversion-complex-c99.c: Ditto. * gcc.dg/Wconversion-real-integer.c: Ditto. * gcc.dg/c90-const-expr-11.c: Ditto. * gcc.dg/overflow-warn-8.c: Ditto.
2024-06-04	builtins: Force SAVE_EXPR for __builtin_{add,sub,mul}_overflow and ↵	Jakub Jelinek	1	-0/+39
	__builtin{add,sub}c [PR108789] The following testcase is miscompiled, because we use save_expr on the .{ADD,SUB,MUL}_OVERFLOW call we are creating, but if the first two operands are not INTEGER_CSTs (in that case we just fold it right away) but are TREE_READONLY/!TREE_SIDE_EFFECTS, save_expr doesn't actually create a SAVE_EXPR at all and so we lower it to *arg2 = REALPART_EXPR (.ADD_OVERFLOW (arg0, arg1)), \ IMAGPART_EXPR (.ADD_OVERFLOW (arg0, arg1)) which evaluates the ifn twice and just hopes it will be CSEd back. As arg2 aliases arg0, that is not the case. The builtins are really never const/pure as they store into what the third argument points to, so after handling the INTEGER_CST+INTEGER_CST case, I think we should just always use SAVE_EXPR. Just building SAVE_EXPR by hand and setting TREE_SIDE_EFFECTS on it doesn't work, because c_fully_fold optimizes it away again, so the following patch marks the ifn calls as TREE_SIDE_EFFECTS (but doesn't do it for the __builtin_{add,sub,mul}_overflow_p case which were designed for use especially in constant expressions and don't really evaluate the realpart side, so we don't really need a SAVE_EXPR in that case). 2024-06-04 Jakub Jelinek <jakub@redhat.com> PR middle-end/108789 * builtins.cc (fold_builtin_arith_overflow): For ovf_only, don't call save_expr and don't build REALPART_EXPR, otherwise set TREE_SIDE_EFFECTS on call before calling save_expr. (fold_builtin_addc_subc): Set TREE_SIDE_EFFECTS on call before calling save_expr. * gcc.c-torture/execute/pr108789.c: New test.
2024-05-21	match: Disable `(type)zero_one_valuep*CST` for 1bit signed types [PR115154]	Andrew Pinski	1	-0/+23
	The problem here is the pattern added in r13-1162-g9991d84d2a8435 assumes that it is well defined to multiply zero_one_valuep by the truncated converted integer constant. It is well defined for all types except for signed 1bit types, where `a * -1` is produced, which is undefined. So disable this pattern for 1bit signed types. Note the pattern added in r14-3432-gddd64a6ec3b38e is able to work around the undefinedness except when `-fsanitize=undefined` is turned on; this is why I added a testcase for that. Bootstrapped and tested on x86_64-linux-gnu with no regressions. PR tree-optimization/115154 gcc/ChangeLog: * match.pd (convert (mult zero_one_valued_p@1 INTEGER_CST@2)): Disable for 1bit signed types. gcc/testsuite/ChangeLog: * c-c++-common/ubsan/signed1bitfield-1.c: New test. * gcc.c-torture/execute/signed1bitfield-1.c: New test. Signed-off-by: Andrew Pinski <quic_apinski@quicinc.com>
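Why a signed 1-bit type is special can be seen from its value range. The sketch below is an illustration only (struct one_bit and both helpers are hypothetical names, not taken from the new tests): the field holds just 0 and -1, so the mathematical result of negating -1 does not fit back into the type.

```c
/* A signed 1-bit field can represent only the values 0 and -1.  */
struct one_bit { signed int v : 1; };

/* Hypothetical helper: store a value into the field and read it back.  */
static int roundtrip(int value)
{
    struct one_bit s;
    s.v = value;
    return s.v;
}

/* Hypothetical helper: negate the field after promotion to int.
   For -1 this yields 1, which the 1-bit field itself cannot hold;
   doing the multiplication by -1 in the 1-bit type would overflow.  */
static int negated_in_int(int value)
{
    struct one_bit s;
    s.v = value;
    return -s.v;
}
```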
2024-05-20	PHIOPT: Don't transform minmax if middle bb contains a phi [PR115143]	Andrew Pinski	3	-0/+80
	The problem here is even if last_and_only_stmt returns a statement, the bb might still contain a phi node which defines a ssa name which is used in that statement so we need to add a check to make sure that the phi nodes are empty for the middle bbs in both the `CMP?MINMAX:MINMAX` case and the `CMP?MINMAX:B` cases. Bootstrapped and tested on x86_64_linux-gnu with no regressions. PR tree-optimization/115143 gcc/ChangeLog: * tree-ssa-phiopt.cc (minmax_replacement): Check for empty phi nodes for middle bbs for the case where middle bb is not empty. gcc/testsuite/ChangeLog: * gcc.c-torture/compile/pr115143-1.c: New test. * gcc.c-torture/compile/pr115143-2.c: New test. * gcc.c-torture/compile/pr115143-3.c: New test. Signed-off-by: Andrew Pinski <quic_apinski@quicinc.com>
2024-05-16	Fix points_to_local_or_readonly_memory_p wrt TARGET_MEM_REF	Jan Hubicka	1	-0/+38
	TARGET_MEM_REF can be used to offset constant base into a memory object (to produce lea instruction). This confuses points_to_local_or_readonly_memory_p which treats the constant address as a base of the access. Bootstrapped/regtested x86_64-linux, committed. Honza gcc/ChangeLog: PR ipa/113787 * ipa-fnsummary.cc (points_to_local_or_readonly_memory_p): Do not look into TARGET_MEM_REFS with constant operand 0. gcc/testsuite/ChangeLog: * gcc.c-torture/execute/pr113787.c: New test.
2024-05-08	reassoc: Fix up optimize_range_tests_to_bit_test [PR114965]	Jakub Jelinek	1	-0/+30
	The optimize_range_tests_to_bit_test optimization normally emits a range test first: if (entry_test_needed) { tem = build_range_check (loc, optype, unshare_expr (exp), false, lowi, high); if (tem == NULL_TREE || is_gimple_val (tem)) continue; } so during the bit test we already know that exp is in the [lowi, high] range, but skips it if we have range info which tells us this isn't necessary. Also, normally it emits shifts by exp - lowi counter, but has an optimization to use just exp counter if the mask isn't a more expensive constant in that case and lowi is > 0 and high is smaller than prec. The following testcase is miscompiled because the two abnormal cases are triggered. The range of exp is [43, 43][48, 48][95, 95], so we on 64-bit arch decide we don't need the entry test, because 95 - 43 < 64. And we also decide to use just exp as counter, because the range test tests just for exp == 43 || exp == 48, so high is smaller than 64 too. Because 95 is in the exp range, we can't do that, we'd either need to do a range test first, i.e. if (exp - 43U <= 48U - 43U) if ((1UL << exp) & mask1)) or need to subtract lowi from the shift counter, i.e. if ((1UL << (exp - 43)) & mask2) but can't do both unless r.upper_bound () is < prec. The following patch ensures that. 2024-05-08 Jakub Jelinek <jakub@redhat.com> PR tree-optimization/114965 * tree-ssa-reassoc.cc (optimize_range_tests_to_bit_test): Don't try to optimize away exp - lowi subtraction from shift count unless entry test is emitted or unless r.upper_bound () is smaller than prec. * gcc.c-torture/execute/pr114965.c: New test.
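The two safe forms named in the message above can be written out in plain C. This is a minimal sketch under stated assumptions: a 64-bit unsigned long long stands in for the 64-bit shift, the function names are hypothetical, and the masks are derived by hand from the values 43 and 48 rather than taken from the compiler's output or from pr114965.c.

```c
#include <assert.h>

/* Reference: the source-level test being optimized.  */
static int ref_test(unsigned exp)
{
    return exp == 43 || exp == 48;
}

/* Form 1: keep the entry range test, then shift by exp itself.
   The range test guarantees exp <= 48, so the shift count is < 64.  */
static int entry_then_shift(unsigned exp)
{
    const unsigned long long mask1 = (1ULL << 43) | (1ULL << 48);
    if (exp - 43U <= 48U - 43U)
        return ((1ULL << exp) & mask1) != 0;
    return 0;
}

/* Form 2: drop the entry test, but subtract lowi from the shift count.
   Only valid when range info proves exp - 43 < 64, e.g. for the
   range [43, 43][48, 48][95, 95] from the message.  */
static int shift_by_offset(unsigned exp)
{
    const unsigned long long mask2 = (1ULL << 0) | (1ULL << 5);
    assert(exp - 43U < 64U);  /* precondition supplied by range info */
    return ((1ULL << (exp - 43U)) & mask2) != 0;
}
```

Combining both shortcuts (no entry test and an unadjusted shift by exp) would shift by 95 for the third value in the range, which is the invalid combination the patch rules out.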
2024-04-12	match: Fix `!a?b:c` and `a?~t:t` patterns for signed 1 bit types [PR114666]	Andrew Pinski	1	-0/+13
	The problem is `!a?b:c` pattern will create a COND_EXPR with an 1bit signed integer which breaks patterns like `a?~t:t`. This rejects when we have a signed operand for both patterns. Note for GCC 15, I am going to look at the canonicalization of `a?~t:t` where t was a constant since I think keeping it a COND_EXPR might be more canonical and is what VPR produces from the same IR; if anything expand should handle which one is better. Bootstrapped and tested on x86_64-linux-gnu with no regressions. PR tree-optimization/114666 gcc/ChangeLog: * match.pd (`!a?b:c`): Reject signed types for the condition. (`a?~t:t`): Likewise. gcc/testsuite/ChangeLog: * gcc.c-torture/execute/bitfld-signed1-1.c: New test. Signed-off-by: Andrew Pinski <quic_apinski@quicinc.com>
2024-04-08	New effective-target 'asm_goto_with_outputs'	Thomas Schwinge	4	-5/+4
	After commit e16f90be2dc8af6c371fe79044c3e668fa3dda62 "testsuite: Fix up lra effective target", we get for nvptx target: -PASS: gcc.c-torture/compile/asmgoto-2.c -O0 (test for excess errors) +ERROR: gcc.c-torture/compile/asmgoto-2.c -O0 : no files matched glob pattern "lra1020113.c.[0-9][0-9][0-9]r.reload" for " dg-do 2 compile { target lra } " Etc. However, nvptx appears to support 'asm goto' with outputs, including the new execution test case: PASS: gcc.dg/pr107385.c execution test Therefore, generally use new effective-target 'asm_goto_with_outputs' instead of 'lra'. One exceptions is 'gcc.dg/pr110079.c', which doesn't use 'asm goto' with outputs, and continues using effective-target 'lra', with special-casing nvptx target, to avoid ERROR for 'lra'. gcc/ * doc/sourcebuild.texi (Effective-Target Keywords): Document 'asm_goto_with_outputs'. Add comment to 'lra'. gcc/testsuite/ * lib/target-supports.exp (check_effective_target_lra): Add comment. (check_effective_target_asm_goto_with_outputs): New. * gcc.c-torture/compile/asmgoto-2.c: Use it. * gcc.c-torture/compile/asmgoto-5.c: Likewise. * gcc.c-torture/compile/asmgoto-6.c: Likewise. * gcc.c-torture/compile/pr98096.c: Likewise. * gcc.dg/pr100590.c: Likewise. * gcc.dg/pr107385.c: Likewise. * gcc.dg/pr108095.c: Likewise. * gcc.dg/pr97954.c: Likewise. * gcc.dg/torture/pr100329.c: Likewise. * gcc.dg/torture/pr100398.c: Likewise. * gcc.dg/torture/pr100519.c: Likewise. * gcc.dg/torture/pr110422.c: Likewise. * gcc.dg/pr110079.c: Special-case nvptx target.
2024-04-03	expr: Fix up emit_push_insn [PR114552]	Jakub Jelinek	1	-0/+24
	r13-990 added optimizations in multiple spots to optimize during expansion storing of constant initializers into targets. In the load_register_parameters and expand_expr_real_1 cases, it checks it has a tree as the source and so knows we are reading that whole decl's value, so the code is fine as is, but in the emit_push_insn case it checks for a MEM from which something is pushed and checks for SYMBOL_REF as the MEM's address, but still assumes the whole object is copied, which as the following testcase shows might not always be the case. In the testcase, k is 6 bytes, then 2 bytes of padding, then another 4 bytes, while the emit_push_insn wants to store just the 6 bytes. The following patch simply verifies it is the whole initializer that is being stored, I think that is best thing to do so late in GCC 14 cycle as well for backporting. For GCC 15, perhaps the code could stop requiring it must be at offset zero, nor that the size is equal, but could use get_symbol_constant_value/fold_ctor_reference gimple-fold APIs to actually extract just part of the initializer if we e.g. push just some subset (of course, still verify that it is a subset). For sizes which are power of two bytes and we have some integer modes, we could use as type for fold_ctor_reference corresponding integral types, otherwise dunno, punt or use some structure (e.g. try to find one in the initializer?), whatever. But even in the other spots it could perhaps handle loading of COMPONENT_REFs or MEM_REFs from the .rodata vars. 2024-04-03 Jakub Jelinek <jakub@redhat.com> PR middle-end/114552 * expr.cc (emit_push_insn): Only use store_constructor for immediate_const_ctor_p if int_expr_size matches size. * gcc.c-torture/execute/pr114552.c: New test.
2024-03-28	profile-count: Avoid overflows into uninitialized [PR112303]	Jakub Jelinek	1	-0/+25
	The testcase in the patch ICEs with --- gcc/tree-scalar-evolution.cc +++ gcc/tree-scalar-evolution.cc @@ -3881,7 +3881,7 @@ final_value_replacement_loop (class loop *loop) /* Propagate constants immediately, but leave an unused initialization around to avoid invalidating the SCEV cache. */ - if (CONSTANT_CLASS_P (def) && !SSA_NAME_OCCURS_IN_ABNORMAL_PHI (rslt)) + if (0 && CONSTANT_CLASS_P (def) && !SSA_NAME_OCCURS_IN_ABNORMAL_PHI (rslt)) replace_uses_by (rslt, def); /* Create the replacement statements. */ (the addition of the above made the ICE latent), because profile_count addition doesn't check for overflows and if unlucky, we can even overflow into the uninitialized value. Getting really huge profile counts is very easy even when not using recursive inlining in loops, e.g. __attribute__((noipa)) void bar (void) { __builtin_exit (0); } __attribute__((noipa)) void foo (void) { for (int i = 0; i < 1000; ++i) for (int j = 0; j < 1000; ++j) for (int k = 0; k < 1000; ++k) for (int l = 0; l < 1000; ++l) for (int m = 0; m < 1000; ++m) for (int n = 0; n < 1000; ++n) for (int o = 0; o < 1000; ++o) for (int p = 0; p < 1000; ++p) for (int q = 0; q < 1000; ++q) for (int r = 0; r < 1000; ++r) for (int s = 0; s < 1000; ++s) for (int t = 0; t < 1000; ++t) for (int u = 0; u < 1000; ++u) for (int v = 0; v < 1000; ++v) for (int w = 0; w < 1000; ++w) for (int x = 0; x < 1000; ++x) for (int y = 0; y < 1000; ++y) for (int z = 0; z < 1000; ++z) for (int a = 0; a < 1000; ++a) for (int b = 0; b < 1000; ++b) bar (); } int main () { foo (); } reaches the maximum count already on the 11th loop. Some other methods of profile_count like apply_scale already do use MIN (val, max_count) before assignment to m_val, this patch just extends that to operator{+,+=} methods. 
Furthermore, one overload of apply_probability wasn't using safe_scale_64bit and so could very easily overflow as well - prob is required to be [0, 10000] and if m_val is near the max_count, it can overflow even with multiplications by 8. 2024-03-28 Jakub Jelinek <jakub@redhat.com> PR tree-optimization/112303 * profile-count.h (profile_count::operator+): Perform addition in uint64_t variable and set m_val to MIN of that val and max_count. (profile_count::operator+=): Likewise. (profile_count::operator-=): Formatting fix. (profile_count::apply_probability): Use safe_scale_64bit even in the int overload. * gcc.c-torture/compile/pr112303.c: New test.
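The clamping described above ("set m_val to MIN of that val and max_count") can be illustrated with a small stand-alone sketch. The constants and the function name below are assumptions made for the example: a 61-bit counter whose all-ones pattern is reserved as the "uninitialized" marker. They are meant to mirror the idea only, not to reproduce profile-count.h.

```c
#include <stdint.h>

/* Assumed layout for this sketch: counts live in 61 bits, and the
   largest 61-bit value is reserved to mean "uninitialized".  */
#define N_BITS 61
#define MAX_COUNT (((uint64_t)1 << N_BITS) - 2)
#define UNINITIALIZED_COUNT (((uint64_t)1 << N_BITS) - 1)

/* Hypothetical saturating addition of two valid counts.
   Both inputs are <= MAX_COUNT < 2^61, so the uint64_t sum cannot
   wrap; it is then clamped so it never reaches the reserved value.  */
static uint64_t count_add_saturating(uint64_t a, uint64_t b)
{
    uint64_t sum = a + b;
    return sum < MAX_COUNT ? sum : MAX_COUNT;
}
```

Without the clamp, adding 1 to MAX_COUNT would produce exactly the reserved marker, which is the "overflow into the uninitialized value" the message refers to.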
2024-03-28	testsuite: Add testcase for already fixed PR [PR109925]	Jakub Jelinek	1	-0/+30
	This testcase was made latent by r14-4089 and got fixed both on the trunk and 13 branch with PR113372 fix. Adding testcase to the testsuite and will close the PR as a dup. 2024-03-28 Jakub Jelinek <jakub@redhat.com> PR tree-optimization/109925 * gcc.c-torture/execute/pr109925.c: New test.
2024-03-26	testsuite: Fix up pr111151.c testcase [PR114486]	Jakub Jelinek	1	-1/+1
	Apparently I've somehow screwed up the adjustments of the originally tested testcase, tweaked it so that in the second/third cases it actually sees a MAX_EXPR rather than the COND_EXPR the MAX_EXPR has been optimized into, and didn't update the expected value. 2024-03-26 Jakub Jelinek <jakub@redhat.com> PR middle-end/111151 PR testsuite/114486 * gcc.c-torture/execute/pr111151.c (main): Fix up expected value for f.
2024-03-26	fold-const: Punt on MULT_EXPR in extract_muldiv MIN/MAX_EXPR case [PR111151]	Jakub Jelinek	1	-0/+21
	As I've tried to explain in the comments, the extract_muldiv_1 MIN/MAX_EXPR optimization is wrong for code == MULT_EXPR. If the multiplication is done in unsigned type or in signed type with -fwrapv, it is fairly obvious that max (a, b) * c in many cases isn't equivalent to max (a * c, b * c) (or min if c is negative) due to overflows, but even for signed with undefined overflow, the optimization could turn something without UB in it (where say a * c invokes UB, but max (or min) picks the other operand where b * c doesn't). As for division/modulo, I think it is in most cases safe, except if the problematic INT_MIN / -1 case could be triggered, but we can just punt for MAX_EXPR because for MIN_EXPR if one operand is INT_MIN, we'd pick that operand already. It is just for completeness, match.pd already has an optimization which turns x / -1 into -x, so the division by zero is mostly theoretical. That is also why in the testcase the i case isn't actually miscompiled without the patch, while the c and f cases are. 2024-03-26 Jakub Jelinek <jakub@redhat.com> PR middle-end/111151 * fold-const.cc (extract_muldiv_1) <case MAX_EXPR>: Punt for MULT_EXPR altogether, or for MAX_EXPR if c is -1. * gcc.c-torture/execute/pr111151.c: New test.
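The unsigned case mentioned above is easy to check by hand. The following sketch uses hypothetical helper names and 32-bit unsigned arithmetic (an assumption made so the wraparound point is fixed); it is not the pr111151.c testcase.

```c
#include <stdint.h>

static uint32_t umax(uint32_t a, uint32_t b)
{
    return a > b ? a : b;
}

/* max (a, b) * c, computed modulo 2^32.  */
static uint32_t max_then_mul(uint32_t a, uint32_t b, uint32_t c)
{
    return (uint32_t)(umax(a, b) * c);
}

/* max (a * c, b * c), each product computed modulo 2^32.  */
static uint32_t mul_then_max(uint32_t a, uint32_t b, uint32_t c)
{
    return umax((uint32_t)(a * c), (uint32_t)(b * c));
}
```

For a = 0x80000000, b = 1, c = 2 the first form wraps to 0, while the second picks max (0, 2) = 2, so distributing the multiplication over MAX_EXPR changes the result.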
2024-03-22	Move pr114396.c from gcc.target/i386 to gcc.c-torture/execute.	liuhongt	1	-0/+105
	Also fixed a typo in the testcase. gcc/testsuite/ChangeLog: PR tree-optimization/114396 * gcc.target/i386/pr114396.c: Move to... * gcc.c-torture/execute/pr114396.c: ...here.
2024-03-04	Fix 20101011-1.c on H8	Jan Dubiec	1	-0/+3
	Excerpt from gcc.sum: [...] PASS: gcc.c-torture/execute/20101011-1.c -O0 (test for excess errors) FAIL: gcc.c-torture/execute/20101011-1.c -O0 execution test PASS: gcc.c-torture/execute/20101011-1.c -O1 (test for excess errors) FAIL: gcc.c-torture/execute/20101011-1.c -O1 execution test [ ... ] This is because H8 MCUs do not throw a "divide by zero" exception. gcc/testsuite * gcc.c-torture/execute/20101011-1.c: Do not test on H8 series.
2024-03-03	[PATCH] combine: Don't simplify paradoxical SUBREG on ↵	Greg McGary	1	-0/+9
	WORD_REGISTER_OPERATIONS [PR113010] The sign-bit-copies of a sign-extending load cannot be known until runtime on WORD_REGISTER_OPERATIONS targets, except in the case of a zero-extending MEM load. See the fix for PR112758. gcc/ PR rtl-optimization/113010 * combine.cc (simplify_comparison): Simplify a SUBREG on WORD_REGISTER_OPERATIONS targets only if it is a zero-extending MEM load. gcc/testsuite * gcc.c-torture/execute/pr113010.c: New test.
2024-02-26	testsuite: xfail gcc.c-torture/compile/pr61159.c on Solaris/x86 with as ↵	Rainer Orth	1	-1/+1
	[PR61159] gcc.c-torture/compile/pr61159.c currently FAILs on 32 and 64-bit Solaris/x86 with the native assembler: FAIL: gcc.c-torture/compile/pr61159.c -O0 (test for excess errors) FAIL: gcc.c-torture/compile/pr61159.c -O1 (test for excess errors) FAIL: gcc.c-torture/compile/pr61159.c -O2 (test for excess errors) FAIL: gcc.c-torture/compile/pr61159.c -O2 -flto (test for excess errors) FAIL: gcc.c-torture/compile/pr61159.c -O2 -flto -flto-partition=none (test for excess errors) FAIL: gcc.c-torture/compile/pr61159.c -O3 -g (test for excess errors) FAIL: gcc.c-torture/compile/pr61159.c -Os (test for excess errors) Excess errors: Assembler: pr61159.c "/var/tmp//ccRtFPva.s", line 5 : Cannot set a weak symbol to a common symbol This is a bug/limitation in the native assembler. Given that this hasn't seen fixes for a long time, this patch xfails the test. Tested on i386-pc-solaris2.11 (as and gas) and x86_64-pc-linux-gnu. 2024-02-24 Rainer Orth <ro@CeBiTec.Uni-Bielefeld.DE> gcc/testsuite: PR ipa/61159 * gcc.c-torture/compile/pr61159.c: xfail on Solaris/x86 with as.
2024-02-14	Fix ICE in loop splitting with -fno-guess-branch-probability	Jan Hubicka	1	-0/+11
	PR tree-optimization/111054 gcc/ChangeLog: * tree-ssa-loop-split.cc (split_loop): Check for profile being present. gcc/testsuite/ChangeLog: * gcc.c-torture/compile/pr111054.c: New test.
2024-02-11	Fix gcc.c-torture/execute/ieee/cdivchkf.c on hpux	John David Anglin	1	-4/+5
	2024-02-11 John David Anglin <danglin@gcc.gnu.org> gcc/testsuite/ChangeLog: * gcc.c-torture/execute/ieee/cdivchkf.c: Use ilogb and __builtin_fmax instead of ilogbf and __builtin_fmaxf.
2024-02-06	tree-ssa-math-opts: Fix up convert_{mult,plusminus}_to_widen [PR113759]	Jakub Jelinek	1	-0/+20
	On the following testcase we emit invalid stmt: error: type mismatch in ‘widen_mult_plus_expr’ 6 | foo (int c, int b) | ^~~ unsigned long int unsigned int unsigned long _31 = WIDEN_MULT_PLUS_EXPR <b_5(D), 2, _30>; The recent PR113560 r14-8680 changes tweaked convert_mult_to_widen, but didn't change convert_plusminus_to_widen for the TREE_TYPE (rhsN) != typeN cases, but looking at this, it was already before that change quite weird. Earlier in those functions it determines actual_precision and from_unsignedN and wants to use that precision and signedness for the operands and it used build_and_insert_cast for that (which emits a cast stmt, even for INTEGER_CSTs) and later on for INTEGER_CST arguments fold_converted them to typeN (which is unclear to me why, because it seems to have assumed that TREE_TYPE (rhsN) is typeN, for the actual_precision or from_unsignedN cases it would be wrong except that build_and_insert_cast forced a SSA_NAME and so it doesn't trigger anymore). Now, since r14-8680 it is possible that rhsN also has some other type from typeN and we again want to cast. The following patch changes this, so that for the differences in actual_precision and/or from_unsignedN we actually update typeN and then use it as the type to convert the arguments to if it isn't useless, for INTEGER_CSTs by just fold_converting, otherwise using build_and_insert_cast. And uses useless_type_conversion_p test so that we don't convert unless necessary. Plus by doing that effectively also doing the important part of the r14-8680 convert_mult_to_widen changes in convert_plusminus_to_widen. 2024-02-06 Jakub Jelinek <jakub@redhat.com> PR tree-optimization/113759 * tree-ssa-math-opts.cc (convert_mult_to_widen): If actual_precision or from_unsignedN differs from properties of typeN, update typeN to build_nonstandard_integer_type. 
If TREE_TYPE (rhsN) is not uselessly convertible to typeN, convert it using fold_convert or build_and_insert_cast depending on if rhsN is INTEGER_CST or not. (convert_plusminus_to_widen): Likewise. * gcc.c-torture/compile/pr113759.c: New test.
2024-01-31	c: Fix ICEs casting expressions with integer constant operands to bool ↵	Joseph Myers	14	-0/+102
	[PR111059, PR111911] C front-end bugs 111059 and 111911 both report ICEs with conversions to boolean of expressions with integer constant operands that can appear in an integer constant expression as long as they are not evaluated (such as division by zero). The issue is a nested C_MAYBE_CONST_EXPR, with the inner one generated in build_binary_op to indicate that a subexpression has been fully folded and should not be folded again, and the outer one in build_c_cast to indicate that the expression has integer constant operands. To avoid the inner one from build_binary_op, c_objc_common_truthvalue_conversion should be given an argument properly marked as having integer constant operands rather than that information having been removed by the caller - but because c_convert would then also wrap a C_MAYBE_CONST_EXPR with a NOP_EXPR converting to boolean, it seems most convenient to have c_objc_common_truthvalue_conversion produce the NE_EXPR directly in the desired type (boolean in this case), before generating any C_MAYBE_CONST_EXPR there, rather than it always producing a comparison in integer_type_node and doing a conversion to boolean in the caller. The same issue as in those PRs also applies for conversion to enums with a boolean fixed underlying type; that case is also fixed and tests added for it. Note that not all the tests added failed before the patch (in particular, the issue was specific to casts and did not apply for implicit conversions, but some tests of those are added as well). Bootstrapped with no regressions for x86_64-pc-linux-gnu. PR c/111059 PR c/111911 gcc/c/ * c-tree.h (c_objc_common_truthvalue_conversion): Add third argument. * c-convert.cc (c_convert): For conversions to boolean, pass third argument to c_objc_common_truthvalue_conversion rather than converting here. * c-typeck.cc (build_c_cast): Ensure arguments with integer operands are marked as such for conversion to boolean. (c_objc_common_truthvalue_conversion): Add third argument TYPE. 
gcc/testsuite/ * gcc.c-torture/compile/pr111059-1.c, gcc.c-torture/compile/pr111059-2.c, gcc.c-torture/compile/pr111059-3.c, gcc.c-torture/compile/pr111059-4.c, gcc.c-torture/compile/pr111059-5.c, gcc.c-torture/compile/pr111059-6.c, gcc.c-torture/compile/pr111059-7.c, gcc.c-torture/compile/pr111059-8.c, gcc.c-torture/compile/pr111059-9.c, gcc.c-torture/compile/pr111059-10.c, gcc.c-torture/compile/pr111059-11.c, gcc.c-torture/compile/pr111059-12.c, gcc.c-torture/compile/pr111911-1.c, gcc.c-torture/compile/pr111911-2.c: New tests.
2024-01-30	aarch64: Avoid allocating FPRs to address registers [PR113623]	Richard Sandiford	1	-0/+137
	For something like: void foo (void) { int ptr; asm volatile ("%0" : "=w" (ptr)); asm volatile ("%0" :: "m" (ptr)); } early-ra would allocate ptr to an FPR for the first asm, thus leaving an FPR address in the second asm. The address was then reloaded by LRA to make it valid. But early-ra shouldn't be allocating at all in that kind of situation. Doing so caused the ICE in the PR (with LDP fusion). Fixed by making sure that we record address references as GPR references. gcc/ PR target/113623 * config/aarch64/aarch64-early-ra.cc (early_ra::preprocess_insns): Mark all registers that occur in addresses as needing a GPR. gcc/testsuite/ PR target/113623 * gcc.c-torture/compile/pr113623.c: New test.
2024-01-30	tree-ssa-strlen: Fix up handle_store [PR113603]	Jakub Jelinek	1	-0/+40
	Since r10-2101-gb631bdb3c16e85f35d3 handle_store uses count_nonzero_bytes{,_addr} which (more recently limited to statements with the same vuse) can walk earlier statements feeding the rhs of the store and call get_stridx on it. Unlike most of the other functions where get_stridx is called first on rhs and only later on lhs, handle_store calls get_stridx on the lhs before the count_nonzero_bytes* call and does some si->nonzero_bytes comparison on it. Now, strinfo structures are refcounted and it is important not to screw it up. What happens on the following testcase is that we call get_strinfo on the destination idx's base (g), which returns a strinfo at that moment with refcount of 2, one copy referenced in bb 2 final strinfos, one in bb 3 (the vector of strinfos was unshared from the dominator there because some other strinfo was added) and finally we process a store in bb 6. Now, count_nonzero_bytes is called and that sees &g[1] in a PHI and calls get_stridx on it, which in turn calls get_stridx_plus_constant because &g + 1 address doesn't have stridx yet. This creates a new strinfo for it: si = new_strinfo (ptr, idx, build_int_cst (size_type_node, nonzero_chars), basesi->full_string_p); set_strinfo (idx, si); and the latter call, because it is the first one in bb 6 that needs it, unshares the stridx_to_strinfo vector (so refcount of the g strinfo becomes 3). Now, get_stridx_plus_constant needs to chain the new strinfo of &g[1] in between the related strinfos, so after the g record. 
Because the strinfo is now shared between the current bb and 2 other bbs, it needs to unshare_strinfo it (creating a new strinfo which can be modified as a copy of the old one, decrementing refcount of the old shared one and setting refcount of the new one to 1): if (strinfo *nextsi = get_strinfo (chainsi->next)) { nextsi = unshare_strinfo (nextsi); si->next = nextsi->idx; nextsi->prev = idx; } chainsi = unshare_strinfo (chainsi); if (chainsi->first == 0) chainsi->first = chainsi->idx; chainsi->next = idx; Now, the bug is that the caller of this a couple of frames above, handle_store, holds on a pointer to this g strinfo (but doesn't know about the unsharing, so the pointer is to the old strinfo with refcount of 2), and later needs to update it, so it si = unshare_strinfo (si); and modifies some fields in it. This creates a new strinfo (with refcount of 1 which is stored into the vector of the current bb) based on the old strinfo for g and decrements refcount of the old one to 1. So, now we are in inconsistent state, because the old strinfo for g is referenced in bb 2 and bb 3 vectors, but has just refcount of 1, and then have one strinfo (the one created by unshare_strinfo (chainsi) in get_stridx_plus_constant) which has refcount of 1 but isn't referenced from anywhere anymore. Later on when we free one of the bb 2 or bb 3 vectors (forgot which) that decrements refcount from 1 to 0 and poisons the strinfo/returns it to the pool, but then maybe_invalidate when looking at the other bb's pointer to it ICEs. The following patch fixes it by calling get_strinfo again, it is guaranteed to return non-NULL, but could be an unshared copy instead of the originally fetched shared one. 
I believe we only need to do this refetching for the case where get_strinfo is called on the lhs before get_stridx is called on other operands, because we should be always modifying (apart from the chaining changes) the strinfo for the destination of the statements, not other strinfos just consumed in there. 2024-01-30 Jakub Jelinek <jakub@redhat.com> PR tree-optimization/113603 * tree-ssa-strlen.cc (strlen_pass::handle_store): After count_nonzero_bytes call refetch si using get_strinfo in case it has been unshared in the meantime. * gcc.c-torture/compile/pr113603.c: New test.
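The stale-pointer problem described in this commit message can be illustrated with a small self-contained model. This is only a sketch under stated assumptions: info_sketch, tables_sketch, unshare_sketch and store_update_is_consistent are hypothetical names invented for the illustration, there is a single slot per table, and nothing here is taken from tree-ssa-strlen.cc. It tries to show that updating through a pointer fetched before a callee unshared the record leaves the refcounts inconsistent, while refetching the pointer first does not.

```cpp
#include <cassert>
#include <cstddef>
#include <deque>
#include <vector>

// Hypothetical refcounted record, loosely modelled on the description above.
struct info_sketch { int refcount; int length; };

// Three per-block tables that may share one record in slot 0.
struct tables_sketch {
  std::deque<info_sketch> pool;                // deque keeps element addresses stable
  std::vector<info_sketch *> bb2, bb3, cur;    // "cur" is the block being processed
};

// Copy-on-write: if the record is shared, install a private copy in "cur".
inline info_sketch *unshare_sketch(tables_sketch &t, info_sketch *si) {
  if (si->refcount == 1)
    return si;
  t.pool.push_back({1, si->length});
  --si->refcount;
  t.cur[0] = &t.pool.back();
  return t.cur[0];
}

// True if every record's refcount equals the number of tables pointing at it.
inline bool consistent_sketch(const tables_sketch &t) {
  for (const info_sketch &rec : t.pool) {
    int refs = 0;
    for (const auto *table : {&t.bb2, &t.bb3, &t.cur})
      if ((*table)[0] == &rec)
        ++refs;
    if (refs != rec.refcount)
      return false;
  }
  return true;
}

// Replays the scenario: the caller fetches a pointer, a callee unshares the
// record, then the caller updates it, with or without refetching first.
inline bool store_update_is_consistent(bool refetch) {
  tables_sketch t;
  t.pool.push_back({3, 1});
  info_sketch *shared = &t.pool.back();
  t.bb2 = {shared};
  t.bb3 = {shared};
  t.cur = {shared};

  info_sketch *si = t.cur[0];      // caller's pointer, fetched early
  unshare_sketch(t, t.cur[0]);     // callee unshares behind the caller's back
  if (refetch)
    si = t.cur[0];                 // the fix: fetch the current record again
  si = unshare_sketch(t, si);
  si->length = 2;
  return consistent_sketch(t);
}
```

In this model, store_update_is_consistent (false) ends with the original record referenced from two tables but carrying a refcount of 1, plus an orphaned copy, which mirrors the inconsistent state described in the message.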
2024-01-29	testsuite: no dfp run without dfprt	Alexandre Oliva	1	-0/+1
	newlib-src/libc/include/sys/fenv.h doesn't define the FE_* macros that libgcc expects to enable decimal float support. Only after newlib is configured and built does an overriding header that defines those macros become available in objdir/<target>/newlib/targ-include/, but by then, libgcc has already been built without dfp and libbid. This has exposed a number of tests that attempt to link dfp programs without requiring a dfprt effective target. dfp.exp already skips if dfp support is missing altogether, and sets the default to compile rather than run if dfp support is present in the compiler but missing in the runtime libraries. However, some of the dfp tests override the default without requiring dfprt. Drop the overriders where reasonable, and add the explicit requirement elsewhere. for gcc/testsuite/ChangeLog * c-c++-common/dfp/pr36800.c: Drop dg-do overrider. * c-c++-common/dfp/pr39034.c: Likewise. * c-c++-common/dfp/pr39035.c: Likewise. * gcc.dg/dfp/bid-non-canonical-d32-1.c: Likewise. * gcc.dg/dfp/bid-non-canonical-d32-2.c: Likewise. * gcc.dg/dfp/bid-non-canonical-d64-1.c: Likewise. * gcc.dg/dfp/bid-non-canonical-d64-2.c: Likewise. * gcc.dg/dfp/builtin-snan-1.c: Likewise. * gcc.dg/dfp/builtin-tgmath-dfp.c: Likewise. * gcc.dg/dfp/c23-float-dfp-4.c: Likewise. * gcc.dg/dfp/c23-float-dfp-5.c: Likewise. * gcc.dg/dfp/c23-float-dfp-6.c: Likewise. * gcc.dg/dfp/c23-float-dfp-7.c: Likewise. * gcc.dg/dfp/pr108068.c: Likewise. * gcc.dg/dfp/pr97439.c: Likewise. * g++.dg/compat/decimal/pass-1_main.C: Require dfprt. * g++.dg/compat/decimal/pass-2_main.C: Likewise. * g++.dg/compat/decimal/pass-3_main.C: Likewise. * g++.dg/compat/decimal/pass-4_main.C: Likewise. * g++.dg/compat/decimal/pass-5_main.C: Likewise. * g++.dg/compat/decimal/pass-6_main.C: Likewise. * g++.dg/compat/decimal/return-1_main.C: Likewise. * g++.dg/compat/decimal/return-2_main.C: Likewise. * g++.dg/compat/decimal/return-3_main.C: Likewise. * g++.dg/compat/decimal/return-4_main.C: Likewise. 
* g++.dg/compat/decimal/return-5_main.C: Likewise. * g++.dg/compat/decimal/return-6_main.C: Likewise. * g++.dg/eh/dfp-1.C: Likewise. * g++.dg/eh/dfp-2.C: Likewise. * g++.dg/eh/dfp-saves-aarch64.C: Likewise. * gcc.c-torture/execute/pr80692.c: Likewise. * gcc.dg/dfp/bid-non-canonical-d128-1.c: Likewise. * gcc.dg/dfp/bid-non-canonical-d128-2.c: Likewise. * gcc.dg/dfp/bid-non-canonical-d128-3.c: Likewise. * gcc.dg/dfp/bid-non-canonical-d128-4.c: Likewise.
2024-01-29	aarch64: Ensure iterator validity when updating debug uses [PR113616]	Alex Coplan	1	-0/+19
	The fix for PR113089 introduced range-based for loops over the debug_insn_uses of an RTL-SSA set_info, but in the case that we reset a debug insn, the use would get removed from the use list, and thus we would end up using an invalidated iterator in the next iteration of the loop. In practice this means we end up terminating the loop prematurely, and hence ICE as in PR113089 since there are debug uses that we failed to fix up. This patch fixes that by introducing a general mechanism to avoid this sort of problem. We introduce a safe_iterator to iterator-utils.h which wraps an iterator, and also holds the end iterator value. It then pre-computes the next iterator value at all iterations, so it doesn't matter if the original iterator got invalidated during the loop body, we can still move safely to the next iteration. We introduce an iterate_safely helper which effectively adapts a container such as iterator_range into a container of safe_iterators over the original iterator type. We then use iterate_safely around all loops over debug_insn_uses () in the aarch64 ldp/stp pass to fix PR113616. While doing this, I remembered that cleanup_tombstones () had the same problem. I previously worked around this locally by manually maintaining the next nondebug insn, so this patch also refactors that loop to use the new iterate_safely helper. While doing that I noticed that a couple of cases in cleanup_tombstones could be converted from using dyn_cast<set_info *> to as_a<set_info *>, which should be safe because there are no clobbers of mem in RTL-SSA, so all defs of memory should be set_infos. gcc/ChangeLog: PR target/113616 * config/aarch64/aarch64-ldp-fusion.cc (fixup_debug_uses_trailing_add): Use iterate_safely when iterating over debug uses. (fixup_debug_uses): Likewise. (ldp_bb_info::cleanup_tombstones): Use iterate_safely to iterate over nondebug insns instead of manually maintaining the next insn. * iterator-utils.h (class safe_iterator): New. (iterate_safely): New. 
gcc/testsuite/ChangeLog: PR target/113616 * gcc.c-torture/compile/pr113616.c: New test.
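The idea of pre-computing the next iterator can be sketched outside of GCC as follows. This is a minimal illustration, not the code added to iterator-utils.h: the names safe_iterator_sketch, safe_range_sketch and iterate_safely_sketch are hypothetical, and a std::list stands in for the RTL-SSA use list. As in the description above, the wrapper only protects against the current element being removed; removing the element after it would still be unsafe.

```cpp
#include <algorithm>
#include <cassert>
#include <iterator>
#include <list>

// Hypothetical wrapper: remembers the next position before the loop body
// runs, so the body may invalidate the current position.
template <typename Iter>
class safe_iterator_sketch {
public:
  safe_iterator_sketch(Iter cur, Iter end) : m_cur(cur), m_next(cur), m_end(end) {
    if (m_cur != m_end)
      ++m_next;
  }
  typename std::iterator_traits<Iter>::reference operator*() const { return *m_cur; }
  safe_iterator_sketch &operator++() {
    m_cur = m_next;                // never increments the (possibly dead) current one
    if (m_next != m_end)
      ++m_next;
    return *this;
  }
  bool operator!=(const safe_iterator_sketch &other) const { return m_cur != other.m_cur; }

private:
  Iter m_cur, m_next, m_end;
};

// Hypothetical range adapter so the wrapper works in a range-based for loop.
template <typename Iter>
struct safe_range_sketch {
  Iter first, last;
  safe_iterator_sketch<Iter> begin() const { return {first, last}; }
  safe_iterator_sketch<Iter> end() const { return {last, last}; }
};

template <typename Container>
safe_range_sketch<typename Container::iterator> iterate_safely_sketch(Container &c) {
  return {c.begin(), c.end()};
}

// Erases each even element while iterating and returns how many were visited.
inline int erase_evens_while_iterating(std::list<int> &xs) {
  int visited = 0;
  for (int &x : iterate_safely_sketch(xs)) {
    ++visited;
    if (x % 2 == 0)
      xs.erase(std::find(xs.begin(), xs.end(), x));  // invalidates the current position only
  }
  return visited;
}
```

With a plain range-based for loop over the list, the erase would leave the loop incrementing an invalidated iterator, which is the failure mode the commit message describes.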
2024-01-29	tree-ssa-strlen: Fix pdata->maxlen computation [PR110603]	Jakub Jelinek	1	-0/+16
	On the following testcase we emit an invalid range of [2, 1] due to UB in the source. Older VRP code silently swapped the boundaries and made [1, 2] range out of it, but newer code just ICEs on it. The reason for pdata->minlen 2 is that we see a memcpy in this case setting both elements of the array to non-zero value, so strlen (a) can't be smaller than 2. The reason for pdata->maxlen 1 is that in char a[2] array without UB there can be at most 1 non-zero character because there needs to be '\0' termination in the buffer too. IMHO we shouldn't create invalid ranges like that and even creating for that case a range [1, 2] looks wrong to me, so the following patch just doesn't set maxlen in that case to the array size - 1, matching what will really happen at runtime when triggering such UB (strlen will be at least 2, perhaps more or will crash). This is what the second hunk of the patch does. The first hunk fixes a fortunately harmless thinko. If the strlen pass knows the string length (i.e. get_string_length function returns non-NULL), we take a different path, we get to this only if all we know is that there are certain number of non-zero characters but we don't know what it is followed with, whether further non-zero characters or zero termination or either of that. If we know exactly how many non-zero characters it is, such as char a[42]; ... memcpy (a, "01234567890123456789", 20); then we take an earlier if for the INTEGER_CST case and set correctly just pdata->minlen to 20 in that case, but if we have something like int len; ... if (len < 15 || len > 32) return; memcpy (a, "0123456789012345678901234567890123456789", len); then we have [15, 32] range for the nonzero_chars and we set pdata->minlen correctly to 15, but incorrectly set also pdata->maxlen to 32. That is not what the above implies, it just means that in some cases we know that there are at least 32 non-zero characters, followed by something we don't know. 
There is no guarantee that there is '\0' right after it, so it means nothing. The reason this is harmless, just confusing, is that the code a few lines later fortunately overwrites this incorrect pdata->maxlen value with something different (either array length - 1 or all ones etc.). 2024-01-29 Jakub Jelinek <jakub@redhat.com> PR tree-optimization/110603 * tree-ssa-strlen.cc (get_range_strlen_dynamic): Remove incorrect setting of pdata->maxlen to vr.upper_bound (which is unconditionally overwritten anyway). Avoid creating invalid range with minlen larger than maxlen. Formatting fix. * gcc.c-torture/compile/pr110603.c: New test.
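The range rule described in this message can be written down as a small model. This is a sketch only: strlen_range_sketch, UNKNOWN_MAXLEN and bound_strlen_sketch are hypothetical names, and representing "maxlen not set" as an all-ones value is an assumption made for the illustration rather than a statement about what get_range_strlen_dynamic stores.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical result type: a [minlen, maxlen] range for strlen of a buffer.
struct strlen_range_sketch {
  std::uint64_t minlen;
  std::uint64_t maxlen;
};

// Assumption: "unknown upper bound" is modelled as all ones.
const std::uint64_t UNKNOWN_MAXLEN = UINT64_MAX;

// min_nonzero_chars: number of leading characters known to be non-zero.
// array_size: size in bytes of the destination array.
inline strlen_range_sketch bound_strlen_sketch(std::uint64_t min_nonzero_chars,
                                               std::uint64_t array_size) {
  strlen_range_sketch r{min_nonzero_chars, UNKNOWN_MAXLEN};
  // Only cap maxlen at array_size - 1 when that does not contradict minlen;
  // otherwise the source has UB and the cap would yield a range like [2, 1].
  if (array_size > 0 && min_nonzero_chars <= array_size - 1)
    r.maxlen = array_size - 1;
  assert(r.minlen <= r.maxlen);    // the invariant the newer range code insists on
  return r;
}
```

For the char a[2] case with both bytes written as non-zero, the model returns minlen 2 with no upper bound, instead of the invalid [2, 1]; for char a[42] with 20 known non-zero characters it returns [20, 41].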
2024-01-23	aarch64: Fix up debug uses in ldp/stp pass [PR113089]	Alex Coplan	1	-0/+26
	As the PR shows, we were missing code to update debug uses in the load/store pair fusion pass. This patch fixes that. The patch tries to give a complete treatment of the debug uses that will be affected by the changes we make, and in particular makes an effort to preserve debug info where possible, e.g. when re-ordering an update of a base register by a constant over a debug use of that register. When re-ordering loads over a debug use of a transfer register, we reset the debug insn. Likewise when re-ordering stores over debug uses of mem. While doing this I noticed that try_promote_writeback used a strange choice of move_range for the pair insn, in that it chose the previous nondebug insn instead of the insn itself. Since the insn is being changed, these move ranges are equivalent (at least in terms of nondebug insn placement as far as RTL-SSA is concerned), but I think it is more natural to choose the pair insn itself. This is needed to avoid incorrectly updating some debug uses. gcc/ChangeLog: PR target/113089 * config/aarch64/aarch64-ldp-fusion.cc (reset_debug_use): New. (fixup_debug_use): New. (fixup_debug_uses_trailing_add): New. (fixup_debug_uses): New. Use it ... (ldp_bb_info::fuse_pair): ... here. (try_promote_writeback): Call fixup_debug_uses_trailing_add to fix up debug uses of the base register that are affected by folding in the trailing add insn. gcc/testsuite/ChangeLog: PR target/113089 * gcc.c-torture/compile/pr113089.c: New test.
2024-01-23	aarch64: Don't assert recog success in ldp/stp pass [PR113114]	Alex Coplan	1	-0/+9
	The PR shows two different cases where try_promote_writeback produces an RTL pattern which isn't recognized. Currently this leads to an ICE, as we assert recog success, but I think it's better just to back out of the changes gracefully if recog fails (as we do in the main fuse_pair case). In theory since we check the ranges here recog shouldn't fail (which is why I had the assert in the first place), but the PR shows an edge case in the patterns where if we form a pre-writeback pair where the writeback offset is exactly -S, where S is the size in bytes of one transfer register, we fail to match the expected pattern as the patterns look explicitly for plus operands in the mems. I think fixing this would require adding at least four new special-case patterns to aarch64.md for what doesn't seem to be a particularly useful variant of the insns. Even if we were to do that, I think it would be GCC 15 material, and it's better to just punt for GCC 14. The ILP32 case in the PR is a bit different, as that shows us trying to combine a pair with DImode base register operands in the mems together with an SImode trailing update of the base register. This leads to us forming an RTL pattern which references the base register in both SImode and DImode, which also fails to recog. Again, I think it's best just to take the missed optimization for now. If we really want to make this (try_promote_writeback) work for ILP32, we can try to do it for GCC 15. gcc/ChangeLog: PR target/113114 * config/aarch64/aarch64-ldp-fusion.cc (try_promote_writeback): Don't assert recog success, just punt if the writeback pair isn't recognized. gcc/testsuite/ChangeLog: PR target/113114 * gcc.c-torture/compile/pr113114.c: New test. * gcc.target/aarch64/pr113114.c: New test.
2024-01-20	ipa: Add testcase for already fixed case [PR110705]	Andrew Pinski	1	-0/+27
	This testcase was fixed with r13-1695-gb0f02eeb906b63 which added an Ada testcase for the issue but adding a C testcase is a good idea and that is what this does. Committed after making sure it passes on x86_64-linux-gnu. PR ipa/110705 gcc/testsuite/ChangeLog: * gcc.c-torture/compile/pr110705-1.c: New test. Signed-off-by: Andrew Pinski <quic_apinski@quicinc.com>
2024-01-17	aarch64: Fix aarch64_ldp_reg_operand predicate not to allow all subreg [PR113221]	Andrew Pinski	1	-0/+12
	So the problem here is that aarch64_ldp_reg_operand will allow all subregs, even a subreg of lo_sum. When LRA tries to fix that up, all things break. So the fix is to change the check to only allow reg and subreg of regs. Note the tendency here is to use register_operand, but that checks the mode of the register and we need to allow mismatched modes for this predicate for now. Built and tested for aarch64-linux-gnu with no regressions (Also tested with the LD/ST pair pass back on). PR target/113221 gcc/ChangeLog: * config/aarch64/predicates.md (aarch64_ldp_reg_operand): For subreg, only allow REG operands instead of allowing all. gcc/testsuite/ChangeLog: * gcc.c-torture/compile/pr113221-1.c: New test. Signed-off-by: Andrew Pinski <quic_apinski@quicinc.com>
2024-01-17	testsuite: Add testcase for already fixed PR [PR110251]	Jakub Jelinek	1	-0/+27
	This testcase started to hang at -O3 with r13-4208 and got fixed with r14-2097. 2024-01-17 Jakub Jelinek <jakub@redhat.com> PR tree-optimization/110251 * gcc.c-torture/compile/pr110251.c: New test.
2024-01-16	cfgexpand: Workaround CSE of ADDR_EXPRs in VAR_DECL partitioning [PR113372]	Jakub Jelinek	3	-0/+122
	The following patch adds a quick workaround to bugs in VAR_DECL partitioning. The problem is that there is no dependency between ADDR_EXPRs of local decls and CLOBBERs of those vars, so VN can CSE uses of ADDR_EXPRs (including ivopts integral variants thereof), which can break add_scope_conflicts discovery of what variables are actually used in a certain region. E.g. we can have ivtmp.40_3 = (unsigned long) &MEM <unsigned long[100]> [(void *)&bitint.6 + 8B]; ... uses of ivtmp.40_3 ... bitint.6 ={v} {CLOBBER(eos)}; ... ivtmp.28_43 = (unsigned long) &MEM <unsigned long[100]> [(void *)&bitint.6 + 8B]; ... uses of ivtmp.28_43 before VN (such as dom3), which the add_scope_conflicts code identifies as 2 independent uses of bitint.6 variable (which is correct), but then VN determines ivtmp.28_43 is the same as ivtmp.40_3 and just uses ivtmp.40_3 even in the second region; at that point add_scope_conflicts thinks the bitint.6 variable is not used in that region anymore. The following patch does a simple single def-stmt check for such ADDR_EXPRs (rather than say trying to do a full propagation of what SSA_NAMEs can contain ADDR_EXPRs of local variables), which seems to work around all 4 PRs. In addition to this patch I've used the attached one to gather statistics on the total size of all variable partitions in a function and it seems that besides the new testcases nothing is really affected compared to no patch (I've actually just modified the patch to == OMP_SCAN instead of == ADDR_EXPR, so it looks the same except that it never triggers). The comparison wasn't perfect because I've only gathered BITS_PER_WORD, main_input_filename (did some replacement of build directories and /tmp/ccXXXXXX names of LTO to make it more similar between the two bootstraps/regtests), current_function_name and the total size of all variable partitions if any, because I didn't record e.g. the optimization options and so e.g. 
torture tests which iterate over options could have different partition sizes even in one compiler when BITS_PER_WORD, main_input_filename and current_function_name are all equal. So had to write an awk script to check if the first triple in the second build appeared in the first one and the quadruple in the second build appeared in the first one too, otherwise print result and that only triggered in the new tests. Also, the cc1plus binary according to objdump -dr is identical between the two builds except for the ADDR_EXPR vs. OMP_SCAN constant in the two spots. 2024-01-16 Jakub Jelinek <jakub@redhat.com> PR tree-optimization/113372 PR middle-end/90348 PR middle-end/110115 PR middle-end/111422 * cfgexpand.cc (add_scope_conflicts_2): New function. (add_scope_conflicts_1): Use it. * gcc.dg/torture/bitint-49.c: New test. * gcc.c-torture/execute/pr90348.c: New test. * gcc.c-torture/execute/pr110115.c: New test. * gcc.c-torture/execute/pr111422.c: New test.
2024-01-11	expr: Limit the store flag optimization for single bit to non-vectors [PR113322]	Andrew Pinski	1	-0/+14
	The problem here is after the recent vectorizer improvements, we end up with a comparison against a vector bool 0 which then tries expand_single_bit_test which is not expecting vector comparisons at all. The IR was: vector(4) <signed-boolean:1> mask_patt_5.13; _Bool _12; mask_patt_5.13_44 = vect_perm_even_41 != { 0.0, 1.0e+0, 2.0e+0, 3.0e+0 }; _12 = mask_patt_5.13_44 == { 0, 0, 0, 0 }; and we tried to call expand_single_bit_test for the last comparison. Rejecting the vector comparison is needed. Bootstrapped and tested on x86_64-linux-gnu with no regressions. PR middle-end/113322 gcc/ChangeLog: * expr.cc (do_store_flag): Don't try single bit tests with comparison on vector types. gcc/testsuite/ChangeLog: * gcc.c-torture/compile/pr113322-1.c: New test. Signed-off-by: Andrew Pinski <quic_apinski@quicinc.com>
2024-01-10	reassoc vs uninitialized variable [PR112581]	Andrew Pinski	1	-0/+37
	Like r14-2293-g11350734240dba and r14-2289-gb083203f053f16, reassociation can combine across a few bbs, and one of the uses can be an uninitialized variable; going from a conditional use to an unconditional use can then cause wrong code. This uses maybe_undef_p like other passes where this can happen. Note if-to-switch uses the function (init_range_entry) provided by reassociation so we need to call mark_ssa_maybe_undefs there; otherwise we assume almost all ssa names are uninitialized. Bootstrapped and tested on x86_64-linux-gnu. gcc/ChangeLog: PR tree-optimization/112581 * gimple-if-to-switch.cc (pass_if_to_switch::execute): Call mark_ssa_maybe_undefs. * tree-ssa-reassoc.cc (can_reassociate_op_p): Uninitialized variables can not be reassociated. (init_range_entry): Check for uninitialized variables too. (init_reassoc): Call mark_ssa_maybe_undefs. gcc/testsuite/ChangeLog: PR tree-optimization/112581 * gcc.c-torture/execute/pr112581-1.c: New test. Signed-off-by: Andrew Pinski <quic_apinski@quicinc.com>
2024-01-09	vect: Ensure both NITERSM1 and NITERS are INTEGER_CSTs or neither of them [PR113210]	Jakub Jelinek	1	-0/+13
	On the following testcase e.g. on riscv64 or aarch64 (latter with -O3 -march=armv8-a+sve ) we ICE, because while NITERS is INTEGER_CST, NITERSM1 is a complex expression like (short unsigned int) (a.0_1 + 255) + 1 > 256 ? ~(short unsigned int) (a.0_1 + 255) : 0 where a.0_1 is unsigned char. The condition is never true, so the above is equivalent to just 0, but only when trying to fold the above with PLUS_EXPR 1 we manage to simplify it (first ~(short unsigned int) (a.0_1 + 255) to -(short unsigned int) (a.0_1 + 255) and then (short unsigned int) (a.0_1 + 255) + 1 > 256 ? -(short unsigned int) (a.0_1 + 255) : 1 to (short unsigned int) (a.0_1 + 255) >= 256 ? -(short unsigned int) (a.0_1 + 255) : 1 and only at this point we fold the condition to be false. But the vectorizer seems to assume that if NITERS is known (i.e. suitable INTEGER_CST) then NITERSM1 also is, so the following hack ensures that if NITERS folds into INTEGER_CST NITERSM1 will be one as well. 2024-01-09 Jakub Jelinek <jakub@redhat.com> PR tree-optimization/113210 * tree-vect-loop.cc (vect_get_loop_niters): If non-INTEGER_CST value in number_of_iterationsm1 PLUS_EXPR 1 is folded into INTEGER_CST, recompute number_of_iterationsm1 as the INTEGER_CST minus 1. * gcc.c-torture/compile/pr113210.c: New test.
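The claim that the NITERSM1 condition is never true can be checked exhaustively with a small model. This is a sketch under one labeled assumption: the addition a.0_1 + 255 is taken to be carried out in unsigned char and to wrap modulo 256, which is how the quoted expression is read here. The function names nitersm1_sketch and niters_sketch are hypothetical.

```cpp
#include <cassert>

// Model of: (short unsigned int) (a.0_1 + 255) + 1 > 256
//             ? ~(short unsigned int) (a.0_1 + 255) : 0
// Assumption: the inner addition is done in unsigned char (wraps modulo 256).
inline unsigned short nitersm1_sketch(unsigned char a) {
  unsigned char sum = static_cast<unsigned char>(a + 255);       // a.0_1 + 255
  unsigned short wide = static_cast<unsigned short>(sum);        // value in 0..255
  unsigned short plus1 = static_cast<unsigned short>(wide + 1);  // value in 1..256
  return plus1 > 256 ? static_cast<unsigned short>(~wide) : 0;
}

// NITERS is NITERSM1 plus one.
inline unsigned short niters_sketch(unsigned char a) {
  return static_cast<unsigned short>(nitersm1_sketch(a) + 1);
}
```

Since the widened value never exceeds 255, adding 1 gives at most 256 and the comparison against 256 cannot succeed, so under this reading NITERSM1 is 0 and NITERS is 1 for every input; the patch makes the stored NITERSM1 reflect that once NITERS has folded to a constant.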
2024-01-08	gimplify: Fix ICE in recalculate_side_effects [PR113228]	Jakub Jelinek	1	-0/+17
	The following testcase ICEs during regimplification since the addition of (convert (eqne zero_one_valued_p@0 INTEGER_CST@1)) simplification. That simplification is novel in the sense that in gimplify_expr it can turn an expression (comparison in particular) into a SSA_NAME. Normally when gimplify_expr sees originally a SSA_NAME, it does case SSA_NAME: /* Allow callbacks into the gimplifier during optimization. */ ret = GS_ALL_DONE; break; and doesn't try to recalculate side effects because of that, but in this case gimplify_expr normally enters the: default: switch (TREE_CODE_CLASS (TREE_CODE (expr_p))) { case tcc_comparison: then does expr_p = gimple_boolify (expr_p); and then expr_p = fold_convert_loc (input_location, org_type, expr_p); with this new match.pd simplification turns that tcc_comparison class into SSA_NAME. Unlike the outer SSA_NAME handling though, this falls through into recalculate_side_effects (expr_p); dont_recalculate: break; but unfortunately recalculate_side_effects doesn't handle SSA_NAME and ICEs on it. SSA_NAMEs don't ever have TREE_SIDE_EFFECTS set on those, so the following patch fixes it by handling it similarly to the tcc_constant case. 2024-01-08 Jakub Jelinek <jakub@redhat.com> PR tree-optimization/113228 * gimplify.cc (recalculate_side_effects): Do nothing for SSA_NAMEs. * gcc.c-torture/compile/pr113228.c: New test.
2024-01-07	testsuite/52641: Fix sloppy tests that did not care for sizeof(int)=2 etc.	Georg-Johann Lay	8	-12/+15
	gcc/testsuite/ PR testsuite/52641 * gcc.c-torture/compile/attr-complex-method-2.c [target=avr]: Check for "divsc3" as double = float per default. * gcc.c-torture/compile/pr106537-1.c: Use __INTPTR_TYPE__ instead of hard-coded "long". * gcc.c-torture/compile/pr106537-2.c: Same. * gcc.c-torture/compile/pr106537-3.c: Same. * gcc.c-torture/execute/20230630-3.c: Use __INT32_TYPE__ for bit-field wider than 16 bits. * gcc.c-torture/execute/20230630-4.c: Same. * gcc.c-torture/execute/pr109938.c: Require int32plus. * gcc.c-torture/execute/pr109986.c: Same. * gcc.dg/fold-ior-4.c: Same. * gcc.dg/fold-ior-5.c: Same. * gcc.dg/fold-parity-5.c: Same. * gcc.dg/fold-popcount-5.c: Same. * gcc.dg/builtin-bswap-13.c [sizeof(int) < 4]: Use __INT32_TYPE__ instead of int. * gcc.dg/builtin-bswap-14.c: Use __INT32_TYPE__ instead of int where required by code. * gcc.dg/c23-constexpr-9.c: Require large_double. * gcc.dg/c23-nullptr-1.c [target=avr]: xfail. * gcc.dg/loop-unswitch-10.c: Require size32plus. * gcc.dg/loop-unswitch-14.c: Same. * gcc.dg/loop-unswitch-11.c: Require int32. * gcc.dg/pr101836.c: Use __SIZEOF_INT__ instead of hard-coded 4. * gcc.dg/pr101836_1.c: Same. * gcc.dg/pr101836_2.c: Same. * gcc.dg/pr101836_3.c: Same.
2024-01-05	scev: Avoid ICE on results used in abnormal PHI args [PR113201]	Jakub Jelinek	1	-0/+15
	The following testcase ICEs when rslt is SSA_NAME_OCCURS_IN_ABNORMAL_PHI and we call replace_uses_by with an INTEGER_CST def, where it ICEs on: if (e->flags & EDGE_ABNORMAL && !SSA_NAME_OCCURS_IN_ABNORMAL_PHI (val)) because val is not an SSA_NAME. One way would be to add && TREE_CODE (val) == SSA_NAME check in between the above 2 lines in replace_uses_by. And/or the following patch just punts propagating constants to SSA_NAME_OCCURS_IN_ABNORMAL_PHI rslt uses. Or we could punt somewhere earlier in final value replacement (but dunno where). 2024-01-05 Jakub Jelinek <jakub@redhat.com> PR tree-optimization/113201 * tree-scalar-evolution.cc (final_value_replacement_loop): Don't call replace_uses_by on SSA_NAME_OCCURS_IN_ABNORMAL_PHI rslt. * gcc.c-torture/compile/pr113201.c: New test.
2024-01-04	MIPS/testsuite: Include stdio.h in mipscop tests	YunQiang Su	4	-0/+4
	gcc/testsuite * gcc.c-torture/compile/mipscop-1.c: Include stdio.h. * gcc.c-torture/compile/mipscop-2.c: Ditto. * gcc.c-torture/compile/mipscop-3.c: Ditto. * gcc.c-torture/compile/mipscop-4.c: Ditto.
2024-01-03	Update copyright years.	Jakub Jelinek	5	-5/+5

2023-12-22	combine: Don't optimize paradoxical SUBREG AND CONST_INT on WORD_REGISTER_OPERATIONS targets [PR112758]	Jakub Jelinek	1	-0/+15
	As discussed in the PR, the following testcase is miscompiled on RISC-V 64-bit, because num_sign_bit_copies in one spot pretends the bits in a paradoxical SUBREG beyond SUBREG_REG SImode are all sign bit copies: /* For paradoxical SUBREGs on machines where all register operations affect the entire register, just look inside. Note that we are passing MODE to the recursive call, so the number of sign bit copies will remain relative to that mode, not the inner mode. This works only if loads sign extend. Otherwise, if we get a reload for the inner part, it may be loaded from the stack, and then we lose all sign bit copies that existed before the store to the stack. */ if (WORD_REGISTER_OPERATIONS && load_extend_op (inner_mode) == SIGN_EXTEND && paradoxical_subreg_p (x) && MEM_P (SUBREG_REG (x))) and then optimizes based on that in one place, but then the r7-1077 optimization triggers and treats all the upper bits in paradoxical SUBREG as undefined and performs another optimization based on that. The r7-1077 optimization is done only if SUBREG_REG is either a REG or MEM. From the discussions in the PR it seems that if it is a REG, the upper bits in paradoxical SUBREG on WORD_REGISTER_OPERATIONS targets aren't really undefined, but we can't tell what values they have because we don't see the operation which computed that REG, and for MEM it depends on load_extend_op - if it is SIGN_EXTEND, the upper bits are sign bit copies and so something not really usable for the optimization, if ZERO_EXTEND, they are zeros and it is usable for the optimization, for UNKNOWN I think it is better to punt as well. 
So, the following patch basically disables the r7-1077 optimization on WORD_REGISTER_OPERATIONS unless we know it is still ok for sure, which is either if sub_width is >= BITS_PER_WORD because then the WORD_REGISTER_OPERATIONS rules don't apply, or load_extend_op on a MEM is ZERO_EXTEND. 2023-12-22 Jakub Jelinek <jakub@redhat.com> PR rtl-optimization/112758 * combine.cc (make_compound_operation_int): Optimize AND of a SUBREG based on nonzero_bits of SUBREG_REG and constant mask on WORD_REGISTER_OPERATIONS targets only if it is a zero extending MEM load. * gcc.c-torture/execute/pr112758.c: New test.