riscv-gnu-toolchain/gcc.git

Age	Commit message	Author	Files	Lines
2025-03-28	If the LHS does not contain zero, neither do multiply operands.	Andrew MacLeod	2	-0/+37
	Given ~[0,0] = op1 * op2, range-ops should determine that neither op1 nor op2 is zero. Add this to the operator_mult for op1_range. op2_range simply invokes op1_range, so both will be covered. PR tree-optimization/110992.c PR tree-optimization/119471.c gcc/ * range-op.cc (operator_mult::op1_range): If the LHS does not contain zero, return non-zero. gcc/testsuite/ * gcc.dg/pr110992.c: New. * gcc.dg/pr119471.c: New.
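The inference described above can be illustrated with a small C function. This is a hedged sketch, not the contents of pr110992.c or pr119471.c; the function name is hypothetical.

```c
#include <assert.h>

/* Hypothetical helper: returns 1 when both operands are nonzero.
   On the path where a * b != 0, neither a nor b can be zero, so the
   assertions below can never fire and a range-aware optimizer is
   free to remove them.  */
int product_nonzero_implies_operands_nonzero(unsigned a, unsigned b)
{
  if (a * b != 0)
    {
      assert(a != 0);
      assert(b != 0);
      return 1;
    }
  /* A zero product implies nothing here: unsigned multiplication can
     wrap to zero even when both operands are nonzero.  */
  return a != 0 && b != 0;
}
```

The implication only runs one way: a nonzero product proves nonzero operands, while a zero product says nothing about either operand on its own.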
2025-03-28	testsuite: Add options for float16 for test [PR119133]	Christophe Lyon	1	-0/+1
	Some targets (like arm) need some flags to enable _Float16 support. gcc/testsuite/ChangeLog: PR target/119133 * gcc.dg/torture/pr119133.c: Add options for float16.
2025-03-27	testsuite: fix dg-* typos	Sam James	3	-4/+4
	I have a handful more of these left but those introduce FAILs, while these all introduce new PASSes. libstdc++-v3/ChangeLog: * testsuite/std/format/string_neg.cc: Add missing brace for dg-error. gcc/testsuite/ChangeLog: * gcc.dg/analyzer/fd-datagram-socket.c: Fix 'dg-message' spelling. * gcc.dg/analyzer/out-of-bounds-zero.c: Fix whitespace in 'dg-additional-options'. * gcc.dg/analyzer/strchr-1.c: Fix 'dg-message' whitespace. * gnat.dg/sso/q11.adb: Fix 'dg-output' whitespace.
2025-03-27	testsuite: fix typos in comments	Sam James	26	-35/+35
	This fixes some 'scan-tree-dump-times' (vs '-time') typos and one or two others I noticed in passing. gcc/testsuite/ChangeLog: * g++.dg/warn/Winvalid-memory-model.C: Fix typo in comment. * gcc.dg/builtin-dynamic-object-size-19.c: Ditto. * gcc.dg/builtin-object-size-19.c: Ditto. * gcc.dg/strlenopt-40.c: Ditto. * gcc.dg/strlenopt-44.c: Ditto. * gcc.dg/strlenopt-45.c: Ditto. * gcc.dg/strlenopt-50.c: Ditto. * gcc.dg/strlenopt-51.c: Ditto. * gcc.dg/strlenopt-52.c: Ditto. * gcc.dg/strlenopt-53.c: Ditto. * gcc.dg/strlenopt-54.c: Ditto. * gcc.dg/strlenopt-55.c: Ditto. * gcc.dg/strlenopt-58.c: Ditto. * gcc.dg/strlenopt-59.c: Ditto. * gcc.dg/strlenopt-62.c: Ditto. * gcc.dg/strlenopt-65.c: Ditto. * gcc.dg/strlenopt-70.c: Ditto. * gcc.dg/strlenopt-72.c: Ditto. * gcc.dg/strlenopt-73.c: Ditto. * gcc.dg/strlenopt-77.c: Ditto. * gcc.dg/strlenopt-82.c: Ditto. * gcc.dg/tree-ssa/builtin-snprintf-4.c: Ditto. * gcc.dg/tree-ssa/builtin-snprintf-6.c: Ditto. * gcc.dg/tree-ssa/builtin-snprintf-7.c: Ditto. * gcc.dg/tree-ssa/builtin-sprintf-10.c: Ditto. * gcc.dg/tree-ssa/builtin-sprintf-9.c: Ditto. * gcc.dg/tree-ssa/phi-opt-value-5.c: Ditto. * lib/multiline.exp: Ditto. * lib/target-supports.exp: Ditto.
2025-03-27	testsuite: harmless dg-* whitespace fixes	Sam James	6	-7/+7
	These just fix inconsistent/unusual style to avoid noise when grepping and also people picking up bad habits when they see it (as similar mistakes can be harmful). gcc/testsuite/ChangeLog: * c-c++-common/goacc/pr69916.c: Fix unusual whitespace in dg-. * g++.old-deja/g++.abi/vtable2.C: Ditto. * g++.old-deja/g++.bugs/900330_02.C: Ditto. * g++.old-deja/g++.bugs/900406_02.C: Ditto. * g++.old-deja/g++.bugs/900519_13.C: Ditto. * g++.old-deja/g++.mike/p9068.C: Ditto. * gcc.dg/20040203-1.c: Ditto. * gcc.dg/980502-1.c: Ditto. * gcc.dg/ipa/ipa-sra-14.c: Ditto. * gcc.dg/pr35468.c: Ditto. * gcc.dg/pr82597.c: Ditto. * gcc.dg/tree-ssa/phi-opt-7.c: Ditto. * gfortran.dg/assumed_charlen_in_main.f90: Ditto. * gfortran.dg/cray_pointers_2.f90: Ditto.
2025-03-27	c: Fix tagname confusion for typedef redefinitions [PR118765]	Martin Uecker	3	-0/+89
	When we redefine a typedef for a tagged type that has just been redefined, merge_decls may produce invalid TYPE_DECLS that are not consistent with what set_underlying_type produces. This is fixed by updating DECL_ORIGINAL_TYPE. PR c/118765 gcc/c/ChangeLog: * c-decl.cc (merge_decls): For TYPE_DECLS copy DECL_ORIGINAL_TYPE from the old declaration. * c-typeck.cc (tagged_types_tu_compatible_p): Add checking assertions. gcc/testsuite/ChangeLog: * gcc.dg/pr118765-2.c: New test. * gcc.dg/pr118765-3.c: New test. * gcc.dg/typedef-redecl3.c: New test.
2025-03-27	testsuite: fix more dg-* whitespace issues	Sam James	8	-8/+8
	A handful of cosmetic ones in here but most meant the directive wasn't doing anything. gcc/testsuite/ChangeLog: PR target/98743 PR tree-optimization/105820 * g++.dg/cpp0x/udlit-namespace-ambiguous.C: Fix whitespace. * g++.dg/cpp2a/constexpr-init21.C: Ditto. * g++.dg/diagnostic/wrong-tag-1.C: Ditto. * g++.dg/init/self1.C: Ditto. * g++.dg/opt/pr98743.C: Add missing '}' to terminate dg directive. * g++.dg/parse/error8.C: Fix whitespace. * g++.dg/template/explicit-args6.C: Add missing '{' to begin dg directive. * g++.dg/template/unify9.C: Fix whitespace. * g++.dg/tree-ssa/pr105820.C: Ditto. * g++.dg/warn/Wmismatched-tags-8.C: Add missing braces. * gcc.dg/cpp/cmdlne-dM-M.c: Ditto. * gcc.dg/tree-ssa/reassoc-32.c: Ditto. * gcc.dg/tree-ssa/reassoc-33.c: Ditto. * gcc.dg/tree-ssa/reassoc-34.c: Ditto. * gcc.dg/tree-ssa/reassoc-35.c: Ditto. * gcc.dg/tree-ssa/reassoc-36.c: Ditto. * gcc.dg/tree-ssa/reassoc-39.c: Ditto. * gcc.dg/tree-ssa/reassoc-41.c: Ditto.
2025-03-27	testsuite: tree-ssa: fix PR98265 filename	Sam James	1	-348/+0
	.C is for C++ testcases and gcc.dg's dg.exp ignores .C files, so the test was not being run. gcc/testsuite/ChangeLog: PR ipa/98265 * gcc.dg/tree-ssa/pr98265.C: Move to... * g++.dg/tree-ssa/pr98265.C: ...here.
2025-03-26	testsuite, gomp: fix broken dg directives	David Malcolm	2	-2/+2
	gcc/testsuite/ChangeLog: * c-c++-common/gomp/metadirective-target-device-2.c: Fix missing trailing " }" on dg-do directive. * gcc.dg/gomp/attrs-21.c: Likewise for dg-options. * gcc.dg/gomp/parallel-2.c: Drop ":" from dg-message. Signed-off-by: David Malcolm <dmalcolm@redhat.com>
2025-03-26	testsuite: fix broken dg directives	David Malcolm	10	-16/+16
	Found by dg-lint. gcc/testsuite/ChangeLog: * gcc.dg/ipa/pr110377.c: Fix missing trailing " }" in dg-do directive. * gcc.dg/plugin/infoleak-1.c: Fix dg-bogus directive. * gcc.dg/pr101364-1.c: Fix missing trailing " }" in dg-options directive. * gcc.dg/pr113207.c: Fix dg-do. * gcc.dg/sarif-output/include-chain-2.c: Fix ordering of dg-do and dg-require-effective-target. * gcc.dg/strub-pr118007.c: Likewise. * gcc.dg/tanhbysinh.c: Fix missing whitespace after opening brace and before closing brace in 6 dg-final directives. * gcc.dg/uninit-pred-3_c.c: Fix missing whitespace after opening brace in 6 dg-final directive. * gcc.dg/uninit-pred-3_d.c: Likewise. * gcc.dg/variable-sized-type-flex-array.c: Fix missing space between dg-bogus and message in 2 places. Signed-off-by: David Malcolm <dmalcolm@redhat.com>
2025-03-26	widening_mul: Fix up further r14-8680 widening mul issues [PR119417]	Jakub Jelinek	1	-0/+24
	The following testcase is miscompiled since r14-8680 PR113560 changes. I've already tried to fix some of the issues caused by that change in r14-8823 PR113759, but apparently didn't get it right. The problem is that the r14-8680 changes sometimes set type_out to a narrower type than the new_rhs_out actually has (because it will handle stuff like _1 = rhs1 & 0xffff; and imply from that HImode type_out). Now, if in convert_mult_to_widen or convert_plusminus_to_widen we actually get optab for the modes we've asked for (i.e. with from_mode and to_mode), everything works fine, if the operands don't have the expected types, they are converted to those (for INTEGER_CSTs with fold_convert, otherwise with build_and_insert_cast). On the following testcase on aarch64 that is not the case, we ask for from_mode HImode and to_mode DImode, but get actual_mode SImode. The mult_rhs1 operand already has SImode and we change type1 to unsigned int and so no cast is actually done, except that the & 0xffff is lost that way. The following patch ensures that if we change typeN because of wider actual_mode (or because of a sign change), we first cast to the old typeN (if the r14-8680 code was encountered, otherwise it would have the same precision) and only then change it, and then perhaps cast again. On the testcase on aarch64-linux the patch results in the expected - add x19, x19, w0, uxtw 1 + add x19, x19, w0, uxth 1 difference. 2025-03-26 Jakub Jelinek <jakub@redhat.com> PR tree-optimization/119417 * tree-ssa-math-opts.cc (convert_mult_to_widen): Before changing typeN because actual_precision/from_unsignedN differs cast rhsN to typeN if it has a different type. (convert_plusminus_to_widen): Before changing typeN because actual_precision/from_unsignedN differs cast mult_rhsN to typeN if it has a different type. * gcc.dg/torture/pr119417.c: New test.
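The shape of code affected by this fix can be sketched as follows. This is an assumed reduction for illustration only, not the committed pr119417.c, and the function name is hypothetical. The point is that the 16-bit mask must survive the widening multiply-add: an extension of the low 16 bits (uxth) is correct, whereas extending all 32 bits (uxtw) would silently drop the mask.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical reduction: accumulate twice the low 16 bits of x.  */
uint64_t add_scaled_low16(uint64_t acc, uint32_t x)
{
  uint32_t low = x & 0xffff;         /* value is known to fit in 16 bits */
  assert(low <= 0xffff);
  return acc + (uint64_t) low * 2;   /* candidate for a widening multiply-add */
}
```

If the mask were lost, an input such as 0x12345 would contribute 0x12345 * 2 instead of 0x2345 * 2.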
2025-03-26	testsuite: add testcase for recent alias fix	Sam James	1	-0/+16
	r15-7961-gdc47161c1f32c3 fixes a typo in ao_compare::compare_ao_refs but there wasn't a testcase available at the time. Now there is. Thanks to Andrew for the testcase. gcc/testsuite/ChangeLog: PR testsuite/119382 * gcc.dg/ipa/ipa-icf-40.c: New test. Co-authored-by: Andrew Pinski <quic_apinski@quicinc.com>
2025-03-25	opcodes: fix wrong code in expand_binop_directly [PR117811]	Richard Earnshaw	1	-0/+27
	If expand_binop_directly fails to add a REG_EQUAL note it tries to unwind and restart. But it can unwind too far if expand_binop changed some of the operands before calling it. We don't need to unwind that far anyway since we should end up taking exactly the same route next time, just without a target rtx. To fix this we remove LAST from the argument list and let the callers (all in expand_binop) do their own unwinding if the call fails. Instead we unwind just as far as the entry to expand_binop_directly and recurse within this function instead of all the way back up. gcc/ChangeLog: PR middle-end/117811 * optabs.cc (expand_binop_directly): Remove LAST as an argument, instead record the last insn on entry. Only delete insns if we need to restart and restart by calling ourself, not expand_binop. (expand_binop): Update callers to expand_binop_directly. If it fails to expand the operation, delete back to LAST. gcc/testsuite: PR middle-end/117811 * gcc.dg/torture/pr117811.c: New test.
2025-03-21	lra, v2: emit caller-save register spills before call insn [PR116028]	Jakub Jelinek	3	-3/+26
	Here is an updated version of Surya's PR116028 fix from August, which got reverted because it caused bootstrap failures on aarch64, later on bootstrap comparison errors there as well and problems on other targets as well. Original description: LRA emits insns to save caller-save registers in the inheritance/splitting pass. In this pass, LRA builds EBBs (Extended Basic Block) and traverses the insns in the EBBs in reverse order from the last insn to the first insn. When LRA sees a write to a pseudo (that has been assigned a caller-save register), and there is a read following the write, with an intervening call insn between the write and read, then LRA generates a spill immediately after the write and a restore immediately before the read. The spill is needed because the call insn will clobber the caller-save register. If there is a write insn and a call insn in two separate BBs but belonging to the same EBB, the spill insn gets generated in the BB containing the write insn. If the write insn is in the entry BB, then the spill insn that is generated in the entry BB prevents shrink wrap from happening. This is because the spill insn references the stack pointer and hence the prolog gets generated in the entry BB itself. This patch ensures that the spill insn is generated before the call insn instead of after the write. This also ensures that the spill occurs only in the path containing the call. 
The changes compared to the first r15-2810 version are: 1) the reason for aarch64 miscompilations and later on bootstrap comparison issues as can be seen on the pr118615.c testcase in the patch was that when curr_insn is a JUMP_INSN or some cases of CALL_INSNs, split_if_necessary is called with before_p true and if it is successful, the code set use_insn = PREV_INSN (curr_insn); instead of use_insn = curr_insn; and that use_insn is then what is passed to add_next_usage_insn; now, if the patch decides to emit the save instruction(s) before the first call after curr_insn in the ebb rather than before the JUMP_INSN/CALL_INSN, PREV_INSN (curr_insn) is some random insn before it, not anything related to the split_reg actions. If it is e.g. a DEBUG_INSN in one case vs. some unrelated other insn otherwise, that can affect further split_reg within the same function 2) as suggested by Surya in PR118615, it makes no sense to try to change behavior if the first call after curr_insn is in the same bb as curr_insn 3) split_reg is actually called sometimes from within inherit_in_ebb but sometimes from elsewhere; trying to use whatever last call to inherit_in_ebb saw last is a sure way to run into wrong-code issues, so instead of clearing the rtx var at the start of inherit_in_ebb it is now cleared at the end of it 4) calling the var latest_call_insn was weird, inherit_in_ebb walks the ebb backwards, so what the var contains is the first call insn within the ebb (after curr_insn) 5) the patch was using lra_process_new_insns (PREV_INSN (latest_call_insn), NULL, save, "Add save<-reg"); to emit the save insn before latest_call_insn. 
That feels quite weird given that latest_call_insn has explicit support for adding stuff before some insn or after some insn, adding something before some insn doesn't really need to be done as addition after PREV_INSN 6) some formatting nits + new testcase + removal of xfail even on arm32 Bootstrapped/regtested on x86_64-linux/i686-linux (my usual --enable-checking=yes,rtl,extra builds), aarch64-linux (normal default bootstrap) and our distro scratch build ({x86_64,i686,aarch64,powerpc64le,s390x}-linux --enable-checking=release LTO profiledbootstrap/regtest), I think Sam James tested on 32-bit arm too. On aarch64-linux this results in -FAIL: gcc.dg/pr10474.c scan-rtl-dump pro_and_epilogue "Performing shrink-wrapping" I admit I don't know the code well nor understood everything it is doing. I have some concerns: 1) I wonder if there is a guarantee that first_call_insn if non-NULL will be always in between curr_insn and usage_insn when call_save_p; I'd hope yes because if usage_insn is before first_call_insn in the ebb, presumably it wouldn't need to find call save regs because the range wouldn't cross any calls 2) I wonder whether it wouldn't be better instead of inserting the saves before first_call_insn insert it at the start of the bb containing that call (after labels of course); emitting it right before a call could mislead code looking for argument slot initialization of the call 3) even when avoiding the use_insn = PREV_INSN (curr_insn);, I wonder if it is ok to use use_insn equal to curr_insn rather than the insns far later where we actually inserted it, but primarily because I don't understand the code much; I think for the !before_p case it is doing similar thing on a shorter distance, the saves were emitted after curr_insn and we record it on curr_insn 2025-03-21 Surya Kumari Jangala <jskumari@linux.ibm.com> Jakub Jelinek <jakub@redhat.com> PR rtl-optimization/116028 PR rtl-optimization/118615 * lra-constraints.cc (first_call_insn): New variable. 
(split_reg): Spill register before first_call_insn if call_save_p and the call is in a different bb in the ebb. (split_if_necessary): Formatting fix. (inherit_in_ebb): Set first_call_insn when handling a CALL_INSN. For successful split_if_necessary with before_p, only change use_insn if it emitted any new instructions before curr_insn. Clear first_call_insn before returning. * gcc.dg/ira-shrinkwrap-prep-1.c: Remove xfail for powerpc. * gcc.dg/pr10474.c: Remove xfail for powerpc and arm. * gcc.dg/pr118615.c: New test.
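The control-flow shape the original description talks about can be sketched in C. This is a hypothetical illustration, not one of the testcases named above; whether a given compiler actually shrink-wraps it depends on the target and options. The value `v` is written in the entry block and read after a call that sits in a different block, so a spill placed right after the write would land in the entry block, while a spill placed just before the call stays on the slow path.

```c
#include <assert.h>

static int calls_made;

/* Hypothetical callee standing in for any function call.  */
int slow_path(int x)
{
  calls_made++;
  return x * 3;
}

int maybe_call(int x)
{
  int v = x + 1;             /* write in the entry block */
  if (x <= 0)
    return 0;                /* fast path: no call, so no spill is needed */
  return slow_path(x) + v;   /* v is read after the call */
}
```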
2025-03-19	diagnostics: fix crash in urlifier with -Wfatal-errors [PR119366]	David Malcolm	1	-0/+8
	diagnostic_context's dtor assumed that it owned the m_urlifier pointer and would delete it. As of r15-5988-g5a022062d22e0b this isn't always the case - auto_urlify_attributes is used in various places in the C/C++ frontends and in the middle-end to temporarily override the urlifier with an on-stack instance, which would lead to delete-of-on-stack-buffer crashes with -Wfatal-errors as the global_dc was cleaned up. Fix by explicitly tracking the stack of urlifiers within diagnostic_context, tracking for each node whether the pointer is owned or borrowed. gcc/ChangeLog: PR c/119366 * diagnostic-format-sarif.cc (test_message_with_embedded_link): Convert diagnostic_context from one urlifier to a stack of urlifiers, where each node in the stack tracks whether the urlifier is owned or borrowed. * diagnostic.cc (diagnostic_context::initialize): Likewise. (diagnostic_context::finish): Likewise. (diagnostic_context::set_urlifier): Delete. (diagnostic_context::push_owned_urlifier): New. (diagnostic_context::push_borrowed_urlifier): New. (diagnostic_context::pop_urlifier): New. (diagnostic_context::get_urlifier): Reimplement in terms of stack. (diagnostic_context::override_urlifier): Delete. * diagnostic.h (diagnostic_context::set_urlifier): Delete decl. (diagnostic_context::override_urlifier): Delete decl. (diagnostic_context::push_owned_urlifier): New decl. (diagnostic_context::push_borrowed_urlifier): New decl. (diagnostic_context::pop_urlifier): New decl. (diagnostic_context::get_urlifier): Make return value const; hide implementation. (diagnostic_context::m_urlifier): Replace with... (diagnostic_context::urlifier_stack_node): ... this and... (diagnostic_context::m_urlifier_stack): ...this. * gcc-urlifier.cc (auto_override_urlifier::auto_override_urlifier): Reimplement. (auto_override_urlifier::~auto_override_urlifier): Reimplement. * gcc-urlifier.h (class auto_override_urlifier): Reimplement. (auto_urlify_attributes::auto_urlify_attributes): Update for pass-by-reference. 
* gcc.cc (driver::global_initializations): Update for reimplementation of urlifiers in terms of a stack. * toplev.cc (general_init): Likewise. gcc/testsuite/ChangeLog: PR c/119366 * gcc.dg/Wfatal-bad-attr-pr119366.c: New test. Signed-off-by: David Malcolm <dmalcolm@redhat.com>
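The owned-versus-borrowed bookkeeping described above can be sketched in plain C. This is a minimal illustration under assumptions: the real code is C++ inside diagnostic_context, and every type and function name below is a hypothetical stand-in rather than GCC's API.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>

struct urlifier { int id; };            /* hypothetical stand-in type */

struct stack_node
{
  struct urlifier *ptr;
  bool owned;                           /* true: the context must free ptr */
};

struct context
{
  struct stack_node nodes[8];
  int depth;
  int frees;                            /* counts frees, for illustration */
};

static void push(struct context *c, struct urlifier *u, bool owned)
{
  assert(c->depth < 8);
  c->nodes[c->depth].ptr = u;
  c->nodes[c->depth].owned = owned;
  c->depth++;
}

void push_owned(struct context *c, struct urlifier *u) { push(c, u, true); }
void push_borrowed(struct context *c, struct urlifier *u) { push(c, u, false); }

/* Pop the top node, freeing the pointer only if the context owns it.  */
void pop_urlifier(struct context *c)
{
  assert(c->depth > 0);
  c->depth--;
  if (c->nodes[c->depth].owned)
    {
      free(c->nodes[c->depth].ptr);
      c->frees++;
    }
}

/* The active urlifier is whatever sits on top of the stack.  */
const struct urlifier *get_urlifier(const struct context *c)
{
  return c->depth > 0 ? c->nodes[c->depth - 1].ptr : NULL;
}
```

With this layout, a temporary on-stack override is pushed as borrowed and popped without being freed, which avoids the delete-of-on-stack-buffer failure mode described in the commit message.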
2025-03-19	c: pedwarn on flexible array member initialization with {} for C23+ [PR119350]	Jakub Jelinek	3	-0/+42
	Even in C23/C2Y any initialization of flexible array member is still invalid, so we should emit a pedwarn on it. But we no longer do for initialization with {}. The reason is that for C17 and earlier, we already emitted a pedwarn on the {} initializer and so emitting another pedwarn on the flexible array member initialization would be diagnosing the same thing multiple times. In C23 we no longer pedwarn on {}, it is standard. The following patch arranges a pedwarning for that for C23+, so that at least one pedwarning is emitted. So that we don't "regress" from C17 to C23 on nested flexible array member initialization with no -pedantic/-pedantic-errors/-Wpedantic, the patch emits even the initialization of flexible array member in a nested context diagnostic as pedwarn in the {} case, after all, it doesn't cause much trouble, we just ignore it like before, it wouldn't initialize anything. 2025-03-19 Jakub Jelinek <jakub@redhat.com> PR c/119350 * c-typeck.cc (pop_init_level): Don't ignore empty brackets for flag_isoc23, still set constructor_type to NULL in that case but emit a pedwarn_init in that case. * gcc.dg/pr119350-1.c: New test. * gcc.dg/pr119350-2.c: New test. * gcc.dg/pr119350-3.c: New test.
2025-03-19	testsuite/113634 - fixup declarations of calloc/realloc	Richard Biener	1	-3/+3
	Then we can also remove the added -std=gnu17 PR testsuite/113634 * gcc.dg/Wfree-nonheap-object-7.c: Adjust calloc and realloc declarations, remove -std=gnu17.
2025-03-19	middle-end: update early-break tests for non-load-lanes targets [PR119286]	Tamar Christina	12	-23/+23
	Broadly speaking, these tests were failing because the BB limitation for SLP'ing loads in an || in an early break makes the loads end up in different BBs and so today we can't SLP them. This results in load_lanes being required to vectorize them because the alternative is loads with permutes which we don't allow. The original checks were only checking partial vectors, which ended up working because e.g. Adv. SIMD isn't a partial vector target, so it failed, and SVE was a partial vector target but also has load lanes so it passes. GCN however is a partial vector target without load lanes which makes the tests fail. As we require load_lanes for now, also check for them. Bootstrapped Regtested on aarch64-none-linux-gnu, arm-none-linux-gnueabihf, x86_64-pc-linux-gnu -m32, -m64 and no issues. Cross checked the failing cases on amdgcn-amdhsa and all pass now. gcc/testsuite/ChangeLog: PR target/119286 * gcc.dg/vect/bb-slp-41.c: Add pragma novector. * gcc.dg/vect/vect-early-break_133_pfa11.c: Should never vectorize today as indexes can be out of range. * gcc.dg/vect/vect-early-break_128.c: Require load_lanes as well. * gcc.dg/vect/vect-early-break_133_pfa10.c: Likewise. * gcc.dg/vect/vect-early-break_133_pfa8.c: Likewise. * gcc.dg/vect/vect-early-break_133_pfa9.c: Likewise. * gcc.dg/vect/vect-early-break_22.c: Likewise. * gcc.dg/vect/vect-early-break_26.c: Likewise. * gcc.dg/vect/vect-early-break_43.c: Likewise. * gcc.dg/vect/vect-early-break_44.c: Likewise. * gcc.dg/vect/vect-early-break_6.c: Likewise. * gcc.dg/vect/vect-early-break_56.c: Expect failures on group misalign.
2025-03-19	Remove mistakenly committed file	Jakub Jelinek	1	-0/+0
	r15-7222 added an empty file gcc.dg/pr not mentioned in the ChangeLog nor used anywhere in that patch. Removed as obvious. 2025-03-19 Jakub Jelinek <jakub@redhat.com> * gcc.dg/pr: Remove.
2025-03-19	c: Fix bug in typedef redefinitions of tagged types [PR118765]	Martin Uecker	1	-0/+7
	When we redefine a tagged type we incorrectly update TYPE_STUB_DECL of the previously defined type instead of the new one. Because TYPE_STUB_DECL is used when determining whether two such types are the same, this can cause valid typedef redefinitions to be rejected later. This is only a partial fix for PR118765. PR c/118765 gcc/c/ChangeLog: * c-decl.cc (finish_struct,finish_enum): Swap direction when copying TYPE_STUB_DECL in redefinitions. gcc/testsuite/ChangeLog: * gcc.dg/pr118765.c: New test.
2025-03-19	c: Fix ICE in error recovery when checking struct compatibility [PR118061]	Martin Uecker	1	-0/+8
	Return early when comparing two structures for compatibility and the type of a member is erroneous. PR c/118061 gcc/c/ChangeLog: * c-typeck.cc (tagged_types_tu_compatible_p): Handle errors in types of struct members. gcc/testsuite/ChangeLog: * gcc.dg/pr118061.c: New test.
2025-03-18	testsuite: Add support for dg-output-file directive	Jakub Jelinek	3	-0/+19
	The COBOL testsuite has many tests which just emit lots of output to stdout and want to compare it against expected output. We have the dg-output directive, but if one needs more than dozens of lines in the output, adding hundreds of dg-output directives to each source uses too much memory and is harder to maintain. The following patch offers an alternative, dg-output-file directive where one can supply a text file with expected output (no regexp matching in that case, just exact output, except that it handles different line ending styles: for the expected file using tcl gets, for the actual output it skips over \n, \r\n or \r). A newline at the end of the whole output is optional (in the actual output, because I think some boards get it eaten). Also tested with addition or subtraction of some characters from the expected output files and saw FAILs with appropriate messages. 2025-03-18 Jakub Jelinek <jakub@redhat.com> * doc/sourcebuild.texi (dg-output-file): Document. * lib/gcc-dg.exp (${tool}-load): If output-file is set, compare combined output against content of the [lindex ${output-file} 1] file. (dg-output-file): New directive. * lib/dg-test-cleanup.exp (cleanup-after-saved-dg-test): Clear output-file variable. * gcc.dg/dg-output-file-1.c: New test. * gcc.dg/dg-output-file-1-lp64.txt: New test. * gcc.dg/dg-output-file-1-ilp32.txt: New test.
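The comparison rule described above (exact text, tolerant of line-ending style, final newline optional in the actual output) can be sketched in C. This is not the Tcl code in lib/gcc-dg.exp; it is a hypothetical re-implementation of the stated semantics, and the function names are assumptions.

```c
#include <assert.h>
#include <stdbool.h>

/* Consume one line ending (\r\n, \n or \r) at *p, if present.  */
static bool skip_eol(const char **p)
{
  const char *s = *p;
  if (s[0] == '\r' && s[1] == '\n') { *p = s + 2; return true; }
  if (s[0] == '\n' || s[0] == '\r') { *p = s + 1; return true; }
  return false;
}

/* Return true if ACTUAL matches EXPECTED exactly, treating all three
   line-ending styles as equivalent and tolerating a missing final
   newline in ACTUAL.  */
bool output_matches(const char *expected, const char *actual)
{
  while (*expected != '\0')
    {
      const char *e = expected;
      if (skip_eol(&e))
        {
          const char *a = actual;
          if (!skip_eol(&a))
            /* Only acceptable when both texts end here.  */
            return *actual == '\0' && *e == '\0';
          expected = e;
          actual = a;
        }
      else
        {
          if (*expected != *actual)
            return false;
          expected++;
          actual++;
        }
    }
  return *actual == '\0';
}
```

For example, `output_matches("a\nb\n", "a\r\nb")` holds under this sketch, while any extra or missing character in the actual output makes it fail.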
2025-03-17	gcc.dg/pr90838-2.c: Replace long with long long	H.J. Lu	1	-2/+2
	Since gcc.dg/pr90838-2.c is only for 64-bit integer, replace long with long long for ILP32 targets. * gcc.dg/pr90838-2.c (ctz4): Replace long with long long. Signed-off-by: H.J. Lu <hjl.tools@gmail.com>
2025-03-17	testsuite: Add -gno-strict-dwarf option to dwarf2 inline[26].c tests	John David Anglin	2	-2/+2
	Some targets default to strict dwarf. 2025-03-17 John David Anglin <danglin@gcc.gnu.org> gcc/testsuite/ChangeLog: PR testsuite/119220 * gcc.dg/debug/dwarf2/inline2.c: Add -gno-strict-dwarf option. * gcc.dg/debug/dwarf2/inline6.c: Likewise.
2025-03-17	testsuite: s390: Skip gcc.dg/vect/bb-slp-77.c	Stefan Schulze Frielinghaus	1	-1/+1
	There exists no .REDUC_PLUS on s390. gcc/testsuite/ChangeLog: * gcc.dg/vect/bb-slp-77.c: Skip on s390.
2025-03-14	match.pd: Fix up r15-8025 simplification [PR119287]	Jakub Jelinek	1	-0/+16
	The following testcase ICEs since r15-8025. tree_nop_conversion_p doesn't imply TREE_TYPE (@0) is uselessly convertible to type, e.g. they could be INTEGER_TYPEs with the same precision but different TYPE_SIGN. The following patch just adds a convert so that it creates a valid IL even in those cases. 2025-03-14 Jakub Jelinek <jakub@redhat.com> PR tree-optimization/119287 * match.pd (((X >> C1) & C2) * (1 << C1) to X & (C2 << C1)): Use (convert @0) instead of @0 in the substitution. * gcc.dg/pr119287.c: New test.
2025-03-14	tree-optimization/119155 - wrong aligned access for vectorized packed access	Richard Biener	1	-0/+26
	When doing strided SLP vectorization we use the wrong alignment for the possibly piecewise access of the vector elements for loads and stores. While we are carefully using element aligned loads and stores that isn't enough for the case the original scalar accesses are packed. The following instead honors larger alignment when present but correctly falls back to the original scalar alignment used. PR tree-optimization/119155 * tree-vect-stmts.cc (vectorizable_store): Do not always use vector element alignment for VMAT_STRIDED_SLP but a more correct alignment towards both ends. (vectorizable_load): Likewise. * gcc.dg/vect/pr119155.c: New testcase.
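A loop of roughly the following shape shows why element alignment cannot be assumed for packed data. This is a hypothetical sketch, not the committed pr119155.c, and it may or may not be vectorized as strided SLP by any particular compiler; `__attribute__((packed))` is a GNU extension. Because of the leading `char`, the `int` members sit at byte offsets that are not multiples of the `int` size.

```c
#include <assert.h>

/* Packed layout: the int array starts at offset 1, so its elements
   are only guaranteed byte alignment.  */
struct __attribute__((packed)) item
{
  char tag;
  int v[2];
};

/* Two adjacent stores per iteration, repeated at a stride of
   sizeof (struct item) bytes.  */
void fill_items(struct item *items, int n)
{
  for (int i = 0; i < n; i++)
    {
      items[i].v[0] = i;
      items[i].v[1] = -i;
    }
}
```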
2025-03-13	match.pd: Extend pointer alignment folds	Richard Sandiford	2	-0/+121
	We have long had the fold: /* Pattern match tem = (sizetype) ptr; tem = tem & algn; tem = -tem; ... = ptr p+ tem; and produce the simpler and easier to analyze with respect to alignment ... = ptr & ~algn; */ But the gimple in gcc.target/aarch64/sve/pr98119.c has a variant in which a constant is added before the conversion, giving: tem = (sizetype) (ptr p+ CST); tem = tem & algn; tem = -tem; ... = ptr p+ tem; This case is also valid if algn fits within the trailing zero bits of CST. Adding CST then has no effect. Similarly the testcase has: tem = (sizetype) (ptr p+ CST1); tem = tem & algn; tem = CST2 - tem; ... = ptr p+ tem; This folds to: ... = (ptr & ~algn) p+ CST2; if algn fits within the trailing zero bits of both CST1 and CST2. An alternative would be: ... = (ptr p+ CST2) & ~algn; but I would expect the alignment to be more easily shareable than the CST2 addition, given that the CST2 addition wasn't being applied by a POINTER_PLUS_EXPR. gcc/ * match.pd: Extend pointer alignment folds so that they handle the case where a constant is added before or after the alignment. gcc/testsuite/ * gcc.dg/pointer-arith-11.c: New test. * gcc.dg/pointer-arith-12.c: Likewise.
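The identities behind these folds can be checked on plain integers. The sketch below models pointers as `uintptr_t` and is an illustration only: the function names are hypothetical, and `algn` is assumed to be a low-bit mask such as 0xf.

```c
#include <assert.h>
#include <stdint.h>

/* Original pattern: tem = ptr & algn; tem = -tem; result = ptr + tem.  */
uintptr_t align_down_unfolded(uintptr_t ptr, uintptr_t algn)
{
  uintptr_t tem = ptr & algn;
  tem = -tem;
  return ptr + tem;
}

/* Folded form: ptr & ~algn.  */
uintptr_t align_down_folded(uintptr_t ptr, uintptr_t algn)
{
  return ptr & ~algn;
}

/* Variant: tem = (ptr + cst1) & algn; tem = cst2 - tem; result = ptr + tem.
   The assertions mirror the condition stated in the commit message:
   algn fits within the trailing zero bits of both constants.  */
uintptr_t variant_unfolded(uintptr_t ptr, uintptr_t algn,
                           uintptr_t cst1, uintptr_t cst2)
{
  assert((cst1 & algn) == 0);
  assert((cst2 & algn) == 0);
  uintptr_t tem = (ptr + cst1) & algn;
  tem = cst2 - tem;
  return ptr + tem;
}

/* Folded variant: (ptr & ~algn) + cst2.  */
uintptr_t variant_folded(uintptr_t ptr, uintptr_t algn, uintptr_t cst2)
{
  return (ptr & ~algn) + cst2;
}
```

Adding a constant whose low bits are all zero cannot change the bits selected by `algn`, which is why `(ptr + cst1) & algn` equals `ptr & algn` under the stated condition.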
2025-03-13	match.pd: Fold ((X >> C1) & C2) * (1 << C1)	Richard Sandiford	2	-0/+74
	Using a combination of rules, we were able to fold ((X >> C1) & C2) * (1 << C1) --> X & (C2 << C1) if everything was done at the same precision, but we couldn't fold it if the AND was done at a different precision. The optimisation is often (but not always) valid for that case too. This patch adds a dedicated rule for the case where different precisions are involved. An alternative would be to extend the individual folds that together handle the same-precision case so that those rules handle differing precisions. But the risk is that that could replace narrow operations with wide operations, which would be especially harmful on targets like avr. It's also not obviously free of cycles. I also wondered whether the converts should be non-optional. gcc/ * match.pd: Fold ((X >> C1) & C2) * (1 << C1) to X & (C2 << C1). gcc/testsuite/ * gcc.dg/fold-mul-and-lshift-1.c: New test. * gcc.dg/fold-mul-and-lshift-2.c: Likewise.
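For the same-precision case, the equivalence can be checked directly in C. This sketch uses 32-bit unsigned arithmetic throughout and hypothetical function names; it does not model the differing-precision case that the new rule targets, where the fold is only sometimes valid.

```c
#include <assert.h>
#include <stdint.h>

/* ((X >> C1) & C2) * (1 << C1), all at 32-bit unsigned precision.  */
uint32_t mul_and_shift_unfolded(uint32_t x, unsigned c1, uint32_t c2)
{
  assert(c1 < 32);
  return ((x >> c1) & c2) * ((uint32_t) 1 << c1);
}

/* X & (C2 << C1).  */
uint32_t mul_and_shift_folded(uint32_t x, unsigned c1, uint32_t c2)
{
  assert(c1 < 32);
  return x & (c2 << c1);
}
```

Multiplying by `1 << C1` shifts the masked field back to its original position, so both forms select the same bits of `X`.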
2025-03-11	c: Don't emit -Wunterminated-string-initialization warning for multi-dimensional nonstring array initializers [PR117178]	Jakub Jelinek	1	-0/+120
	My/Kees' earlier patches adjusted -Wunterminated-string-initialization warning so that it doesn't warn about initializers of nonstring decls and that nonstring attribute is allowed on multi-dimensional arrays. Unfortunately as this testcase shows, we still warn about initializers of multi-dimensional array nonstring decls. The problem is that in that case field passed to output_init_element is actually INTEGER_CST, index into the array. For RECORD_OR_UNION_TYPE_P (constructor_type) field is a FIELD_DECL which we want to use, but otherwise (in arrays) IMHO we want to use constructor_fields (which is the innermost FIELD_DECL whose part is being initialized), or - if that is NULL - constructor_decl, the whole decl being initialized with multi-dimensional array type. 2025-03-11 Jakub Jelinek <jakub@redhat.com> PR c/117178 * c-typeck.cc (output_init_element): Pass field to digest_init only for record/union types, otherwise pass constructor_fields if non-NULL and constructor_decl if constructor_fields is NULL. * gcc.dg/Wunterminated-string-initialization-2.c: New test.
2025-03-11	aarch64: Fix DFP constants [PR119131]	Andrew Pinski	1	-0/+31
	After r15-6660-g45d306a835cb3f865, in some cases DFP constants would cause an ICE. This is due to a mismatch of a few things. The predicate of the move uses aarch64_valid_fp_move to say if the constant is valid or not. But after reload/LRA, when can_create_pseudo_p returns false, aarch64_valid_fp_move would return false for constants that were valid for the constraints of the instruction. A predicate that is stricter than the constraint is wrong. In this case `Uvi` is the constraint while aarch64_valid_fp_move allows it via aarch64_can_const_movi_rtx_p for !DECIMAL_FLOAT_MODE_P, there is no such check for DECIMAL_FLOAT_MODE_P. The fix is to remove the check !DECIMAL_FLOAT_MODE_P in aarch64_valid_fp_move and in the define_expand. As now the predicate allows a superset of what is allowed by the constraints. aarch64_float_const_representable_p should be rejecting DFP modes as they can't be used with instructions like `mov s0, 1.0`. Changes since v1: * v2: Add check to aarch64_float_const_representable_p for DFP. Built and tested on aarch64-linux-gnu with no regressions. PR target/119131 gcc/ChangeLog: * config/aarch64/aarch64.cc (aarch64_valid_fp_move): Remove check for !DECIMAL_FLOAT_MODE_P. (aarch64_float_const_representable_p): Reject decimal floating modes. * config/aarch64/aarch64.md (mov<mode>): Likewise. gcc/testsuite/ChangeLog: * gcc.dg/torture/pr119131-1.c: New test. Signed-off-by: Andrew Pinski <quic_apinski@quicinc.com>
2025-03-11	testsuite: Improve builtin-bswap-5.c	Oscar Gustafsson	1	-3/+3
	gcc/testsuite/ChangeLog * gcc.dg/builtin-bswap-5.c: Improve test vector to avoid nibble swaps passing.
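Why the choice of test vector matters can be shown with a small sketch. The vectors below are illustrative assumptions, not the values used in builtin-bswap-5.c, and both function names are hypothetical. A value whose bytes each consist of two equal nibbles cannot distinguish a byte swap from a full nibble reversal, while a value with distinct nibbles can.

```c
#include <assert.h>
#include <stdint.h>

/* Reverse the order of the four bytes.  */
uint32_t byte_swap32(uint32_t x)
{
  return (x >> 24) | ((x >> 8) & 0xff00u) | ((x << 8) & 0xff0000u) | (x << 24);
}

/* Reverse the order of the eight nibbles (an incorrect "bswap").  */
uint32_t nibble_reverse32(uint32_t x)
{
  uint32_t r = 0;
  for (int i = 0; i < 8; i++)
    {
      r = (r << 4) | (x & 0xfu);
      x >>= 4;
    }
  return r;
}
```

With 0x11223344 both functions return 0x44332211, so a buggy nibble swap would pass; with 0x12345678 they return 0x78563412 and 0x87654321 respectively.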
2025-03-11	middle-end/119204 - ICE with strcspn folding	Richard Biener	1	-0/+13
	The following makes sure to convert the folded expression to the original expression type. PR middle-end/119204 * builtins.cc (fold_builtin_strcspn): Preserve the original expression type. * gcc.dg/pr119204.c: New testcase.
2025-03-11	tree: Improve skip_simple_arithmetic [PR119183]	Jakub Jelinek	1	-0/+12
	The following testcase takes a very long time to compile, because skip_simple_arithmetic decides to first call tree_invariant_p on the second argument (and indirectly recurse there). I think before canonicalization of operands for commutative binary expressions (and for non-commutative ones always) it is pretty common that the first operand is a constant, something which tree_invariant_p handles immediately, so the following patch special cases that. I've also added a tree_invariant_p call there after the checks; while it is not really needed currently (tree_invariant_p has the same checks), I wanted to be prepared in case tree_invariant_p changes. But if you think I should avoid it, I can drop it too. This is just a partial fix, I think one can certainly construct a testcase which will still have horrible compile time complexity (but I've tried and haven't managed to do so), so perhaps we should just limit the recursion depth through skip_simple_arithmetic/tree_invariant_p with some defaulted argument. 2025-03-11 Jakub Jelinek <jakub@redhat.com> PR c/119183 * tree.cc (skip_simple_arithmetic): If first operand of binary expr is TREE_CONSTANT or TREE_READONLY with no side-effects, call tree_invariant_p on that operand first instead of on the second. * gcc.dg/pr119183.c: New test.
2025-03-10	libgcc: Fix up unwind-dw2-btree.h [PR119151]	Jakub Jelinek	1	-0/+151
	The following testcase shows a bug in unwind-dw2-btree.h. In short, the header provides lock-free btree data structure (so no parent link on nodes, both insertion and deletion are done in top-down walks with some locking of just a few nodes at a time so that lookups can notice concurrent modifications and retry, non-leaf (inner) nodes contain keys which are initially the base address of the left-most leaf entry of the following child (or all ones if there is none) minus one, insertion ensures balancing of the tree to ensure [d/2, d] entries filled through aggressive splitting if it sees a full tree while walking, deletion performs various operations like merging neighbour trees, merging into parent or moving some nodes from neighbour to the current one). What differs from the textbook implementations is mostly that the leaf nodes don't include just address as a key, but address range, address + size (where we don't insert any ranges with zero size) and the lookups can be performed for any address in the [address, address + size) range. The keys on inner nodes are still just address-1, so the child covers all nodes where addr <= key unless it is covered already in children to the left. The user (static executables or JIT) should always ensure there is no overlap in between any of the ranges. In the testcase a bunch of insertions are done, always followed by one removal, followed by one insertion of a range slightly different from the removed one. E.g. in the first case [&code[0x50], &code[0x59]] range is removed and then we insert [&code[0x4c], &code[0x53]] range instead. This is valid, it doesn't overlap anything. But the problem is that some non-leaf (inner) one used the &code[0x4f] key (after the 11 insertions completely correctly). 
On removal, nothing adjusts the keys on the parent nodes (it really can't in the top-down only walk, the keys could be many nodes above it and unlike insertion, removal only knows the start address, doesn't know the removed size and so will discover it only when reaching the leaf node which contains it; plus even if it knew the address and size, it still doesn't know what the second left-most leaf node will be (i.e. the one after removal)). And on insertion, if nodes aren't split at a level, nothing adjusts the inner keys either. If a range is inserted and is either fully below key (keys are - 1, so having address + size - 1 being equal to key is fine) or fully after key (i.e. address > key), it works just fine, but if the key is in the middle of the range like in this case, &code[0x4f] is in the middle of the [&code[0x4c], &code[0x53]] range, then insertion works fine (we only use size on the leaf nodes), and lookup of the addresses below the key works fine too (i.e. [&code[0x4c], &code[0x4f]] will succeed). The problem is with lookups after the key (i.e. [&code[0x50], &code[0x53]]), the lookup looks for them in different children of the btree and doesn't find an entry and returns NULL. As users need to ensure non-overlapping entries at any time, the following patch fixes it by adjusting keys during insertion where we know not just the address but also size; if we find during the top-down walk a key which is in the middle of the range being inserted, we simply increase the key to be equal to address + size - 1 of the range being inserted. There can't be any existing leaf nodes overlapping the range in correct programs and the btree rebalancing done on deletion ensures we don't have any empty nodes which would also cause problems. 
The patch adjusts the keys in two spots, once for the current node being walked (the last hunk in the header, with large comment trying to explain it) and once during inner node splitting in a parent node if we'd otherwise try to add that key in the middle of the range being inserted into the parent node (in that case it would be missed in the last hunk). The testcase covers both of those spots, so succeeds with GCC 12 (which didn't have btrees) and fails with vanilla GCC trunk and also fails if either the if (fence < base + size - 1) fence = iter->content.children[slot].separator = base + size - 1; or if (left_fence >= target && left_fence < target + size - 1) left_fence = target + size - 1; hunk is removed (of course, only with the current node sizes, i.e. up to 15 children of inner nodes and up to 10 entries in leaf nodes). 2025-03-10 Jakub Jelinek <jakub@redhat.com> Michael Leuchtenburg <michael@slashhome.org> PR libgcc/119151 * unwind-dw2-btree.h (btree_split_inner): Add size argument. If left_fence is in the middle of [target,target + size - 1] range, increase it to target + size - 1. (btree_insert): Adjust btree_split_inner caller. If fence is smaller than base + size - 1, increase it and separator of the slot to base + size - 1. * gcc.dg/pr119151.c: New test.
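	The failure mode and the key adjustment described above can be sketched with a toy model in C. This is not the libgcc data structure: it has one inner node with a single separator and two leaves holding at most one range each, and all names (`toy_inner`, `toy_insert_left`, `toy_lookup`) are hypothetical. Only the separator comparison and the adjustment to `base + size - 1` follow the commit message.

```c
#include <stddef.h>
#include <stdint.h>

/* Toy model: a leaf entry is an address range; size 0 means empty.  */
struct toy_range { uintptr_t base; uintptr_t size; };

struct toy_inner
{
  uintptr_t separator;          /* left child covers addr <= separator */
  struct toy_range left, right;
};

/* Insert [base, base + size) into the left leaf.  If ADJUST is nonzero,
   raise a separator that falls inside the new range to base + size - 1,
   which is the idea described in the commit message.  */
void
toy_insert_left (struct toy_inner *n, uintptr_t base, uintptr_t size,
                 int adjust)
{
  if (adjust && n->separator >= base && n->separator < base + size - 1)
    n->separator = base + size - 1;
  n->left.base = base;
  n->left.size = size;
}

/* Return the range containing ADDR, or NULL if the walk misses it.  */
const struct toy_range *
toy_lookup (const struct toy_inner *n, uintptr_t addr)
{
  const struct toy_range *leaf
    = addr <= n->separator ? &n->left : &n->right;
  if (leaf->size != 0 && addr >= leaf->base
      && addr - leaf->base < leaf->size)
    return leaf;
  return NULL;
}
```

	With a separator of 0x4f and an inserted range of base 0x4c and size 8, the unadjusted variant finds 0x4e but walks into the empty right leaf for 0x50. With the adjustment the separator becomes 0x53 and both lookups succeed.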
2025-03-10	LoongArch: testsuite: Fix gcc.dg/vect/slp-26.c.	Lulu Cheng	1	-2/+2
	After d34cda720988674bcf8a24267c9e1ec61335d6de, what was originally not vectorizable can now be vectorized. So adjust gcc.dg/vect/slp-26.c. gcc/testsuite/ChangeLog: * gcc.dg/vect/slp-26.c: Adjust.
2025-03-10	LoongArch: testsuite: Fix gcc.dg/vect/bb-slp-77.c.	Lulu Cheng	1	-1/+1
	The issue is the same as 12383255fe4e82c31f5e42c72a8fbcb1b5dea35d. Neither is .REDUC_PLUS set for V2SImode on LoongArch, so add it to the list of targets not expecting BB vectorization. gcc/testsuite/ChangeLog: * gcc.dg/vect/bb-slp-77.c: Add loongarch--* to the list of expected failing targets.
2025-03-10	LoongArch: testsuite: Fix pr112325.c and pr117888-1.c.	Lulu Cheng	2	-0/+2
	By default, vectorization is not enabled on LoongArch, resulting in the failure of these two test cases. gcc/testsuite/ChangeLog: * gcc.dg/vect/pr112325.c: Add the vector compilation option '-mlsx' for LoongArch. * gcc.dg/vect/pr117888-1.c: Likewise.
2025-03-09	phiopt: Fix value_replacement for middle bb having phi nodes [PR118922]	Andrew Pinski	1	-0/+57
	After r12-5300-gf98f373dd822b3, value_replacement would be able to look at the following cfg structure: ``` <bb 5> [local count: 1014686024]: if (h_6 != 0) goto <bb 7>; [94.50%] else goto <bb 6>; [5.50%] <bb 6> [local count: 114863530]: # h_6 = PHI <0(4), 1(5)> <bb 7> [local count: 1073741824]: # f_8 = PHI <0(5), h_6(6)> _9 = f_8 ^ 1; a.0_10 = a; _11 = _9 + a.0_10; if (_11 != -117) goto <bb 5>; [94.50%] else goto <bb 8>; [5.50%] ``` value_replacement would incorrectly think the middle bb (6) was empty, and so it decided to remove the condition in bb5 and replace it with 0, as the function thought it was `h_6 ? 0 : h_6`. But since there is an incoming phi node to bb6 defining h_6, that is incorrect. The fix is to check whether there are phi nodes in the middle bb and, if so, set empty_or_with_defined_p to false. This was not needed before r12-5300-gf98f373dd822b3 because the phi would have been dead otherwise due to other checks. Bootstrapped and tested on x86_64-linux-gnu. PR tree-optimization/118922 gcc/ChangeLog: * tree-ssa-phiopt.cc (value_replacement): Set empty_or_with_defined_p to false when there are phi nodes for the middle bb. gcc/testsuite/ChangeLog: * gcc.dg/torture/pr118922-1.c: New test. Signed-off-by: Andrew Pinski <quic_apinski@quicinc.com>
2025-03-09	testsuite: Require effective target float16 for test [PR119133]	Dimitar Dimitrov	1	-0/+1
	The test spuriously failed on pru-unknown-elf due to missing support for _Float16 type. PR target/119133 gcc/testsuite/ChangeLog: * gcc.dg/torture/pr119133.c: Require effective target float16. Signed-off-by: Dimitar Dimitrov <dimitar@dinux.eu>
2025-03-07	c: do not warn about truncating NUL char when initializing nonstring arrays [PR117178]	Jakub Jelinek	4	-3/+96
	When initializing a nonstring char array when compiled with -Wunterminated-string-initialization the warning trips even when truncating the trailing NUL character from the string constant. Only warn about this when running under -Wc++-compat since under C++ we should not initialize nonstrings from C strings. This patch separates the -Wunterminated-string-initialization and -Wc++-compat warnings, they are now independent options, the former implied by -Wextra, the latter not implied by anything. If -Wc++-compat is in effect, it takes precedence over -Wunterminated-string-initialization and warns regardless of nonstring attribute, otherwise if -Wunterminated-string-initialization is enabled, it warns only if there is no nonstring attribute. In all cases, the warnings and also pedwarn_init for even larger sizes now provide details on the lengths. 2025-03-07 Kees Cook <kees@kernel.org> Jakub Jelinek <jakub@redhat.com> PR c/117178 gcc/ * doc/invoke.texi (Wunterminated-string-initialization): Document the new interaction between this warning and -Wc++-compat and that initialization of decls with nonstring attribute aren't warned about. gcc/c-family/ * c.opt (Wunterminated-string-initialization): Don't depend on -Wc++-compat. gcc/c/ * c-typeck.cc (digest_init): Add DECL argument. Adjust wording of pedwarn_init for too long strings and provide details on the lengths, for string literals where just the trailing NUL doesn't fit warn for warn_cxx_compat with OPT_Wc___compat, wording which mentions "for C++" and provides details on lengths, otherwise for warn_unterminated_string_initialization adjust the warning, provide details on lengths and don't warn if get_attr_nonstring_decl (decl). (build_c_cast, store_init_value, output_init_element): Adjust digest_init callers. gcc/testsuite/ * gcc.dg/Wunterminated-string-initialization.c: Add additional test coverage. * gcc.dg/Wcxx-compat-14.c: Check in dg-warning for "for C++" part of the diagnostics. 
* gcc.dg/Wcxx-compat-23.c: New test. * gcc.dg/Wcxx-compat-24.c: New test. Signed-off-by: Kees Cook <kees@kernel.org>
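	The behaviour described in this message can be illustrated with two 4-byte arrays initialized from a 4-character literal. This is a minimal sketch, not one of the testcases added by the commit. The variable and function names are hypothetical, and the diagnostics mentioned in the comments are what the message above leads one to expect, not something checked here.

```c
#include <string.h>

/* With -Wunterminated-string-initialization (implied by -Wextra per the
   message above), this initializer drops the trailing NUL and is expected
   to be diagnosed, because nothing marks the array as a non-string.  */
char tag_plain[4] = "abcd";

/* The nonstring attribute says the array is not a NUL-terminated string,
   so the same initializer is expected to be accepted silently unless
   -Wc++-compat is in effect.  */
char tag_raw[4] __attribute__ ((nonstring)) = "abcd";

/* Compare the two 4-byte tags without relying on a terminator.  */
int
tags_equal (void)
{
  return memcmp (tag_plain, tag_raw, sizeof tag_raw) == 0;
}
```

	Both arrays hold the same four bytes; only the declared intent, and therefore the expected diagnostics, differ.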
2025-03-07	Fix testcases up after recent -Wreturn-type change	Andrew Pinski	2	-2/+2
	I missed these two testcases in the diff when looking for testcases that fail. The change is the same as what was done for gcc.dg/Wreturn-mismatch-2.c. Pushed as obvious after a quick test. gcc/testsuite/ChangeLog: * gcc.dg/Wreturn-mismatch-2a.c: Change dg-warning for the last -Wreturn-type to dg-bogus. * gcc.dg/Wreturn-mismatch-6.c: Likewise. Signed-off-by: Andrew Pinski <quic_apinski@quicinc.com>
2025-03-07	c: Fix warning after an error on a return statement [PR60440]	Andrew Pinski	2	-1/+11
	Like r5-6912-g3dbb84276aca10 but this is for the C front-end. Basically, when there is an error on a return statement, we just return error_mark_node, and then the warning happens as there is no return statement. Instead, mark the current function for suppression of the warning. PR c/60440 gcc/c/ChangeLog: * c-typeck.cc (c_finish_return): Mark the current function for suppression of the -Wreturn-type if there was an error on the return statement. gcc/testsuite/ChangeLog: * gcc.dg/Wreturn-mismatch-2.c: Change dg-warning for the last -Wreturn-type to dg-bogus. * gcc.dg/pr60440-1.c: New test. Signed-off-by: Andrew Pinski <quic_apinski@quicinc.com>
2025-03-07	middle-end: delay checking for alignment to load [PR118464]	Tamar Christina	31	-18/+322
	This fixes two PRs on Early break vectorization by delaying the safety checks to vectorizable_load when the VF, VMAT and vectype are all known. This patch does add two new restrictions: 1. On LOAD_LANES targets, where the buffer size is known, we reject non-power of two group sizes, as they are unaligned every other iteration and so may cross a page unwittingly. For those cases we require partial masking support. 2. On LOAD_LANES targets when the buffer is unknown, we reject vectorization if we cannot peel for alignment, as the alignment requirement is quite large at GROUP_SIZE * vectype_size. This is unlikely to ever be beneficial so we don't support it for now. There are other steps documented inside the code itself so that the reasoning is next to the code. As a fall-back, when the alignment fails we require partial vector support. For VLA targets like SVE return element alignment as the desired vector alignment. This means that the loads are never misaligned and so, annoyingly, it won't ever need to peel. So what I think needs to happen in GCC 16 is the following: 1. During vect_compute_data_ref_alignment we need to take the max of POLY_VALUE_MIN and vector_alignment. 2. In vect_do_peeling, define skip_vector when PFA for VLA, and in the guard add a check that ncopies * vectype does not exceed POLY_VALUE_MAX which we use as a proxy for pagesize. 3. Force LOOP_VINFO_USING_PARTIAL_VECTORS_P to be true in vect_determine_partial_vectors_and_peeling since the first iteration has to be partial. Require LOOP_VINFO_MUST_USE_PARTIAL_VECTORS_P otherwise we have to fail to vectorize. 4. Create a default mask to be used, so that vect_use_loop_mask_for_alignment_p becomes true and we generate the peeled check through loop control for partial loops. From what I can tell this won't work for LOOP_VINFO_FULLY_WITH_LENGTH_P since they don't have any peeling support at all in the compiler. That would need to be done independently from the above. 
In any case, not GCC 15 material so I've kept the WIP patches I have downstream. Bootstrapped Regtested on aarch64-none-linux-gnu, arm-none-linux-gnueabihf, x86_64-pc-linux-gnu -m32, -m64 and no issues. gcc/ChangeLog: PR tree-optimization/118464 PR tree-optimization/116855 * doc/invoke.texi (min-pagesize): Update docs with vectorizer use. * tree-vect-data-refs.cc (vect_analyze_early_break_dependences): Delay checks. (vect_compute_data_ref_alignment): Remove alignment checks and move to get_load_store_type, increase group access alignment. (vect_enhance_data_refs_alignment): Add note to comment needing investigating. (vect_analyze_data_refs_alignment): Likewise. (vect_supportable_dr_alignment): For group loads look at first DR. * tree-vect-stmts.cc (get_load_store_type): Perform safety checks for early break pfa. * tree-vectorizer.h (dr_set_safe_speculative_read_required, dr_safe_speculative_read_required, DR_SCALAR_KNOWN_BOUNDS): New. (need_peeling_for_alignment): Renamed to... (safe_speculative_read_required): .. This (class dr_vec_info): Add scalar_access_known_in_bounds. gcc/testsuite/ChangeLog: PR tree-optimization/118464 PR tree-optimization/116855 * gcc.dg/vect/bb-slp-pr65935.c: Update, it now vectorizes because the load type is relaxed later. * gcc.dg/vect/vect-early-break_121-pr114081.c: Update. * gcc.dg/vect/vect-early-break_22.c: Require partial vectors. * gcc.dg/vect/vect-early-break_128.c: Likewise. * gcc.dg/vect/vect-early-break_26.c: Likewise. * gcc.dg/vect/vect-early-break_43.c: Likewise. * gcc.dg/vect/vect-early-break_44.c: Likewise. * gcc.dg/vect/vect-early-break_2.c: Require load_lanes. * gcc.dg/vect/vect-early-break_7.c: Likewise. * gcc.dg/vect/vect-early-break_132-pr118464.c: New test. * gcc.dg/vect/vect-early-break_133_pfa1.c: New test. * gcc.dg/vect/vect-early-break_133_pfa11.c: New test. * gcc.dg/vect/vect-early-break_133_pfa10.c: New test. * gcc.dg/vect/vect-early-break_133_pfa2.c: New test. 
* gcc.dg/vect/vect-early-break_133_pfa3.c: New test. * gcc.dg/vect/vect-early-break_133_pfa4.c: New test. * gcc.dg/vect/vect-early-break_133_pfa5.c: New test. * gcc.dg/vect/vect-early-break_133_pfa6.c: New test. * gcc.dg/vect/vect-early-break_133_pfa7.c: New test. * gcc.dg/vect/vect-early-break_133_pfa8.c: New test. * gcc.dg/vect/vect-early-break_133_pfa9.c: New test. * gcc.dg/vect/vect-early-break_39.c: Update testcase for misalignment. * gcc.dg/vect/vect-early-break_18.c: Likewise. * gcc.dg/vect/vect-early-break_20.c: Likewise. * gcc.dg/vect/vect-early-break_21.c: Likewise. * gcc.dg/vect/vect-early-break_38.c: Likewise. * gcc.dg/vect/vect-early-break_6.c: Likewise. * gcc.dg/vect/vect-early-break_53.c: Likewise. * gcc.dg/vect/vect-early-break_56.c: Likewise. * gcc.dg/vect/vect-early-break_57.c: Likewise. * gcc.dg/vect/vect-early-break_81.c: Likewise.
2025-03-07	tree-optimization/119145 - avoid stray .MASK_CALL after vectorization	Richard Biener	1	-0/+35
	When we BB vectorize an if-converted loop body we make sure to not leave around .MASK_LOAD or .MASK_STORE created by if-conversion but we failed to check for .MASK_CALL. PR tree-optimization/119145 * tree-vectorizer.cc (try_vectorize_loop_1): Avoid BB vectorizing an if-converted loop body when there's a .MASK_CALL in the loop body. * gcc.dg/vect/pr119145.c: New testcase.
2025-03-07	vect: Enforce dr_with_seg_len::align precondition [PR116125]	Richard Sandiford	1	-0/+30
	tree-data-refs.cc uses alignment information to try to optimise the code generated for alias checks. The assumption for "normal" non-grouped, full-width scalar accesses was that the access size would be a multiple of the alignment. As Richi notes in the PR, this is a documented precondition of dr_with_seg_len: /* The minimum common alignment of DR's start address, SEG_LEN and ACCESS_SIZE. */ unsigned int align; PR115192 was a case in which this assumption didn't hold. The access was part of an aligned 4-element group, but only the first 2 elements of the group were accessed. The alignment was therefore double the access size. In r15-820-ga0fe4fb1c8d78045 I'd "fixed" that by capping the alignment in one of the output routines. But I think that was misconceived. The precondition means that we should cap the alignment at source instead. Failure to do that caused a similar wrong code bug in this PR, where the alignment comes from a short bitfield access rather than from a group access. gcc/ PR tree-optimization/116125 * tree-vect-data-refs.cc (vect_prune_runtime_alias_test_list): Make the dr_with_seg_len alignment fields describe the access sizes as well as the pointer alignment. * tree-data-ref.cc (create_intersect_range_checks): Don't compensate for invalid alignment fields here. gcc/testsuite/ PR tree-optimization/116125 * gcc.dg/vect/pr116125.c: New test.
2025-03-07	aarch64: Use force_lowpart_subreg in a BFI splitter [PR119133]	Richard Sandiford	1	-0/+8
	lowpart_subreg ICEs are the gift that keeps giving. This is another case where we need to use force_lowpart_subreg instead, to handle cases where the input is already a subreg and where the combined subreg is not allowed as a single operation. We don't need to check can_create_pseudo_p since the input should be a hard register rather than a subreg if !can_create_pseudo_p. gcc/ PR target/119133 * config/aarch64/aarch64.md (aarch64_bfi<GPI:mode><ALLX:mode>_<SUBDI_BITS>): Use force_lowpart_subreg. gcc/testsuite/ PR target/119133 * gcc.dg/torture/pr119133.c: New test.
2025-03-06	[PR rtl-optimization/119099] Avoid infinite loop in ext-dce.	Alexey Merzlyakov	1	-0/+19
	This fixes the ping-ponging of live sets in ext-dce which, if left unresolved, can lead to infinite loops in the ext-dce pass, as seen in the P1 regression 119099. At its core, instead of replacing the livein set with the just recomputed data, we IOR the just recomputed data into the existing livein set. That ensures the existing livein set never shrinks. Bootstrapped and regression tested on x86. I've also thrown this into my tester to verify it across multiple targets and that we aren't regressing the (limited) tests we have in place for ext-dce's optimization behavior. While it's a generic patch, I'll wait for the RISC-V tester to run its course before committing. PR rtl-optimization/119099 gcc/ * ext-dce.cc (ext_dce_rd_transfer_n): Do not allow the livein set to shrink. gcc/testsuite/ * gcc.dg/torture/pr119099.c: New test. Co-authored-by: Jeff Law <jlaw@ventanamicro.com>
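	The difference between replacing the livein set and IOR-ing into it can be sketched with a toy model in C. A live set is modelled as a 64-bit mask here; this is not the bitmap API that ext-dce.cc uses, and the type and function names are hypothetical.

```c
#include <stdint.h>

/* Toy model of a live-in set as a 64-bit mask.  */
typedef uint64_t toy_liveset;

/* Replacing the set lets it shrink, so alternating inputs can keep
   undoing each other and the "changed" result never settles.  */
int
toy_update_replace (toy_liveset *livein, toy_liveset recomputed)
{
  int changed = *livein != recomputed;
  *livein = recomputed;
  return changed;
}

/* IOR-ing keeps the set monotonically growing; since it is bounded by
   the all-ones mask, repeated updates must eventually stop changing it.  */
int
toy_update_ior (toy_liveset *livein, toy_liveset recomputed)
{
  toy_liveset merged = *livein | recomputed;
  int changed = merged != *livein;
  *livein = merged;
  return changed;
}
```

	Feeding the alternating inputs 0x1 and 0x3 to the replacing variant reports a change every time, while the IOR variant reports no change once the set already contains both.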
2025-03-05	value-range: Fix up irange::union_bitmask [PR118953]	Jakub Jelinek	1	-0/+42
	The following testcase is miscompiled during evrp. Before vrp, we have (from ccp): # RANGE [irange] long long unsigned int [0, +INF] MASK 0xffffffffffffc000 VALUE 0x2d _3 = _2 + 18446744073708503085; ... # RANGE [irange] long long unsigned int [0, +INF] MASK 0xffffffffffffc000 VALUE 0x59 _6 = (long long unsigned int) _5; # RANGE [irange] int [-INF, +INF] MASK 0xffffc000 VALUE 0x34 _7 = k_11 + -1048524; switch (_7) <default: <L5> [33.33%], case 8: <L7> [33.33%], case 24: <L6> [33.33%], case 32: <L6> [33.33%]> ... # RANGE [irange] long long unsigned int [0, +INF] MASK 0xffffffffffffc07d VALUE 0x0 # i_20 = PHI <_3(4), 0(3), _6(2)> and evrp is now trying to figure out range for i_20 in range_of_phi. All the ranges and MASK/VALUE pairs above are correct for the testcase, k_11 and _2 based on it is a result of multiplication by a constant with low 14 bits cleared and then some numbers are added to it. There is an obvious missed optimization for which I've filed PR119039, simplify_switch_using_ranges could see that all the labels but default are unreachable because the controlling expression has MASK 0xffffc000 VALUE 0x34 and none of 8, 24 and 32 satisfy that. Anyway, during range_of_phi for i_20, we process the PHI arguments in order. For the _3(4) case, we figure out that it is reachable through the case 24: case 32: labels only of the switch and that 0x34 - 0x2d is 7, so derive [irange] long long unsigned int [17, 17][25, 25] MASK 0xffffffffffffc000 VALUE 0x2d (the MASK/VALUE just got inherited from the _3 earlier range). Now (not suprisingly because those labels aren't actually reachable), that range is inconsistent, 0x2d is 45, so there is conflict between the values and the irange_bitmask. value-range.{h,cc} code differentiates between actually stored irange_bitmask, which is that MASK 0xffffffffffffc000 VALUE 0x2d, and semantic bitmask, which is what get_bitmask returns. That is // The mask inherent in the range is calculated on-demand. 
For // example, [0,255] does not have known bits set by default. This // saves us considerable time, because setting it at creation incurs // a large penalty for irange::set. At the time of writing there // was a 5% slowdown in VRP if we kept the mask precisely up to date // at all times. Instead, we default to -1 and set it when // explicitly requested. However, this function will always return // the correct mask. // // This also means that the mask may have a finer granularity than // the range and thus contradict it. Think of the mask as an // enhancement to the range. For example: // // [3, 1000] MASK 0xfffffffe VALUE 0x0 // // 3 is in the range endpoints, but is excluded per the known 0 bits // in the mask. // // See also the note in irange_bitmask::intersect. irange_bitmask bm = get_bitmask_from_range (type (), lower_bound (), upper_bound ()); if (!m_bitmask.unknown_p ()) bm.intersect (m_bitmask); Now, get_bitmask_from_range here is MASK 0x1f VALUE 0x0 and it intersects that with that MASK 0xffffffffffffc000 VALUE 0x2d. Which triggers the ugly special case in irange_bitmask::intersect: // If we have two known bits that are incompatible, the resulting // bit is undefined. It is unclear whether we should set the entire // range to UNDEFINED, or just a subset of it. For now, set the // entire bitmask to unknown (VARYING). if (wi::bit_and (~(m_mask \| src.m_mask), m_value ^ src.m_value) != 0) { unsigned prec = m_mask.get_precision (); m_mask = wi::minus_one (prec); m_value = wi::zero (prec); } so the semantic bitmask is actually MASK 0xffffffffffffffff VALUE 0x0. 
Next, range_of_phi attempts to union it with the 0(3) PHI argument, and during irange::union_ first adds the [0,0] to the subranges, so [irange] long long unsigned int [0, 0][17, 17][25, 25] MASK 0xffffffffffffc000 VALUE 0x2d and then goes on to irange::union_bitmask which does if (m_bitmask == r.m_bitmask) return false; irange_bitmask bm = get_bitmask (); irange_bitmask save = bm; bm.union_ (r.get_bitmask ()); if (save == bm) return false; m_bitmask = bm; if (save == get_bitmask ()) return false; m_bitmask MASK 0xffffffffffffc000 VALUE 0x2d isn't the same as r.m_bitmask MASK 0x0 VALUE 0x0, so we compute the semantic bitmask (but note, not from the original range before union, but the modified one, dunno if that isn't a problem as well), which is still the VARYING/unknown_p one, union_ that with MASK 0x0 VALUE 0x0 and get still MASK 0xffffffffffffffff VALUE 0x0, so don't update anything, the semantic bitmask didn't change, so we are fine (not!, see later). Except then we try to union with the third PHI argument. And, because the edge to that comes only from case 8: label and there is a known difference between the two, the argument is actually already from earlier replaced by 45(2) constant. So, irange::union_ adds the [45, 45] range to the list of subranges, but voila, 45 is 0x2d and satisfies the stored MASK 0xffffffffffffc000 VALUE 0x2d and so the semantic bitmask changed to from MASK 0xffffffffffffffff VALUE 0x0 to MASK 0xffffffffffffc000 VALUE 0x2d by that addition. Eventually, we just optimize this to [irange] long long unsigned int [45, 45] because that is the only range which satisfies the bitmask. And that is wrong, at runtime i_20 has value 0. The following patch attempts to detect this case where get_bitmask turns some non-VARYING m_bitmask into VARYING one because of a conflict and in that case makes sure m_bitmask is actually updated rather than unmodified, so that later union_ doesn't cause problems. I also wonder whether e.g. 
get_bitmask couldn't have special case for this and if bm.intersect (m_bitmask); yields unknown_p from something not originally unknown_p, perhaps chooses to just use get_bitmask_from_range value and ignore the stored m_bitmask. Though, dunno how union_bitmask in that case would figure out it needs to update m_bitmask. 2025-03-05 Jakub Jelinek <jakub@redhat.com> PR tree-optimization/118953 * value-range.cc (irange::union_bitmask): Update m_bitmask if get_bitmask () is unknown_p and m_bitmask is not even when the semantic bitmask didn't change and returning false. * gcc.dg/torture/pr118953.c: New test.
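	The conflict described in this message can be reproduced with a toy 64-bit model of a MASK/VALUE pair in C. The names are hypothetical and this is not the irange_bitmask class. The conflict test mirrors the snippet quoted in the message, while the non-conflict branch is an assumption of this sketch.

```c
#include <stdint.h>

/* Toy mask/value pair: a 1 bit in MASK means the bit is unknown, a 0 bit
   means it is known and equal to the same bit of VALUE.  */
struct toy_bitmask { uint64_t mask; uint64_t value; };

/* Intersect SRC into BM.  */
void
toy_intersect (struct toy_bitmask *bm, struct toy_bitmask src)
{
  if ((~(bm->mask | src.mask) & (bm->value ^ src.value)) != 0)
    {
      /* Two known bits disagree: give up and become unknown (VARYING).  */
      bm->mask = UINT64_MAX;
      bm->value = 0;
    }
  else
    {
      /* Assumed non-conflict behaviour: keep every bit known to either.  */
      bm->mask &= src.mask;
      bm->value |= src.value;
    }
}

/* Nonzero if X is consistent with the known bits of BM.  */
int
toy_member_p (struct toy_bitmask bm, uint64_t x)
{
  return (x & ~bm.mask) == (bm.value & ~bm.mask);
}
```

	Intersecting MASK 0x1f VALUE 0x0 with MASK 0xffffffffffffc000 VALUE 0x2d hits the conflict branch, because bit 5 is known to be 0 on one side and 1 on the other. Of the values 0, 17, 25 and 45, only 45 is consistent with the stored pair, which matches the incorrect [45, 45] result described above.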
2025-03-05	middle-end/97323 - TYPE_CANONICAL vs. ARRAY_TYPE modes	Richard Biener	1	-0/+5
	For strict-alignment targets we can end up with BLKmode single-element array types when the element type is unaligned. This confuses type checking since the canonical type would have an aligned element type and a non-BLKmode mode. The following simply ignores the mode we assign to array types for this purpose, like we already do for record and union types. PR middle-end/97323 * tree.cc (gimple_canonical_types_compatible_p): Ignore TYPE_MODE also for ARRAY_TYPE. (verify_type): Likewise. * gcc.dg/pr97323.c: New testcase.