path: root/gcc
Age | Commit message | Author | Files | Lines
2025-04-17s390: Use match_scratch instead of scratch in define_split [PR119834]Jakub Jelinek2-11/+87
The following testcase ICEs since r15-1579 (addition of late combiner), because *clrmem_short can't be split. The problem is that the define_insn uses (use (match_operand 1 "nonmemory_operand" "n,a,a,a")) (use (match_operand 2 "immediate_operand" "X,R,X,X")) (clobber (match_scratch:P 3 "=X,X,X,&a")) and define_split assumed that if operands[1] is const_int_operand, match_scratch will be always scratch, and it will be reg only if it was the last alternative where operands[1] is a reg. The pattern doesn't guarantee it though, of course RA will not try to uselessly assign a reg there if it is not needed, but during RA on the testcase below we match the last alternative, but then comes late combiner and propagates const_int 3 into operands[1]. And that matches fine, match_scratch matches either scratch or reg and the constraint in that case is X for the first variant, so still just fine. But we won't split that because the splitters only expect scratch. The following patch fixes it by using match_scratch instead of scratch, so that it accepts either. 2025-04-17 Jakub Jelinek <jakub@redhat.com> PR target/119834 * config/s390/s390.md (define_split after *cpymem_short): Use (clobber (match_scratch N)) instead of (clobber (scratch)). Use (match_dup 4) and operands[4] instead of (match_dup 3) and operands[3] in the last of those. (define_split after *clrmem_short): Use (clobber (match_scratch N)) instead of (clobber (scratch)). (define_split after *cmpmem_short): Likewise. * g++.target/s390/pr119834.C: New test.
2025-04-17nvptx: Remove 'TARGET_ASM_NEED_VAR_DECL_BEFORE_USE'Thomas Schwinge1-2/+0
Unused; remnant of an (internal) experiment, before we had nvptx 'as'. gcc/ * config/nvptx/nvptx.cc (TARGET_ASM_NEED_VAR_DECL_BEFORE_USE): Don't '#define'.
2025-04-17d: Fix infinite loop regression in CTFEIain Buclaw4-4/+38
An infinite loop was introduced by a previous refactoring in the semantic pass for DeclarationExp nodes. Ensure the loop properly terminates and add test cases. gcc/d/ChangeLog: * dmd/MERGE: Merge upstream dmd 956e73d64e. gcc/testsuite/ChangeLog: * gdc.test/fail_compilation/test21247.d: New test. * gdc.test/fail_compilation/test21247b.d: New test. Reviewed-on: https://github.com/dlang/dmd/pull/21248
2025-04-17combine: Correct comments about combine_validate_costHans-Peter Nilsson1-3/+3
Fix misleading comments. That function only determines whether replacements cost more; it doesn't actually *validate* costs as being cheaper. For example, it returns true also if it for various reasons cannot determine the costs, or if the new cost is the same, like when doing an identity replacement. The code has been the same since r0-59417-g64b8935d4809f3. * combine.cc: Correct comments about combine_validate_cost.
2025-04-16c++: ill-formed constexpr function [PR113360]Jason Merrill9-12/+45
If we already gave an error while parsing a function, we don't also need to try to explain what's wrong with it when we later try to use it in a constant-expression. In the new testcase explain_invalid_constexpr_fn couldn't find anything still in the function to complain about, so it said because: followed by nothing. We still try to constant-evaluate it to reduce error cascades, but we shouldn't complain if it doesn't work very well. This flag is similar to CLASSTYPE_ERRONEOUS that I added a while back. PR c++/113360 gcc/cp/ChangeLog: * cp-tree.h (struct language_function): Add erroneous bit. * constexpr.cc (explain_invalid_constexpr_fn): Return if set. (cxx_eval_call_expression): Quiet if set. * parser.cc (cp_parser_function_definition_after_declarator) * pt.cc (instantiate_body): Set it. gcc/testsuite/ChangeLog: * g++.dg/cpp23/constexpr-nonlit18.C: Remove redundant message. * g++.dg/cpp1y/constexpr-diag2.C: New test. * g++.dg/cpp1y/pr63996.C: Adjust expected errors. * g++.dg/template/explicit-args6.C: Likewise. * g++.dg/cpp0x/constexpr-ice21.C: Likewise.
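For illustration, a minimal C++ sketch of the situation described above (a hypothetical reduction, not the committed testcase): the function is already diagnosed while parsing, so a later use in a constant expression should no longer produce a dangling "because:" note.

    struct A { int i; };

    constexpr int f ()
    {
      A a;
      return a.x;            // error: 'struct A' has no member named 'x' (diagnosed at parse time)
    }

    constexpr int n = f ();  // constant evaluation is still attempted to limit error cascades,
                             // but no further "explanation" should be emitted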
2025-04-17Daily bump.GCC Administrator6-1/+223
2025-04-16[testsuite] [ppc] ipa-sra-19.c: pass -Wno-psabi on powerpc-*-elf as wellAlexandre Oliva1-1/+1
Like other ppc targets, powerpc-*-elf needs -Wno-psabi to compile gcc.dg/ipa/ipa-sra-19.c without an undesired warning about vector argument passing. for gcc/testsuite/ChangeLog * gcc.dg/ipa/ipa-sra-19.c: Add -Wno-psabi on ppc-elf too.
2025-04-16Doc: Document raw string literals as GNU C extension [PR88382]Sandra Loosemore1-0/+20
gcc/ChangeLog PR c/88382 * doc/extend.texi (Syntax Extensions): Adjust menu. (Raw String Literals): New section.
2025-04-16testsuite: Replace altivec vector attribute with generic equivalent [PR112822]Peter Bergner1-1/+1
Usage of the altivec vector attribute requires use of the -maltivec option. Replace with a generic equivalent which allows building the test case on multiple other targets and non-altivec ppc cpus, but still diagnoses the ICE on unfixed compilers. 2025-04-16 Peter Bergner <bergner@linux.ibm.com> gcc/testsuite/ PR tree-optimization/112822 * g++.dg/pr112822.C: Replace altivec vector attribute with a generic vector attribute.
2025-04-16cobol: Eliminate gcc/cobol/LICENSE. [PR119759]Bob Dubner1-29/+0
gcc/cobol PR cobol/119759 * LICENSE: Deleted.
2025-04-16[PATCH] rx: avoid adding setpsw for rx_cmpstrn when len is constKeith Packard1-4/+16
The pattern using rx_cmpstrn is cmpstrsi, for which len is a constant -1, so we'll be moving the setpsw instructions from rx_cmpstrn to cmpstrnsi as follows:
1. Adjust the predicate on the length operand from "register_operand" to "nonmemory_operand". This will allow constants to appear here, instead of having them already transferred into a register.
2. Check to see if the len value is constant, and then check if it is actually zero. In that case, short-circuit the rest of the pattern and set the result register to 0.
3. Emit 'setpsw c' and 'setpsw z' instructions when the len is not a constant, in case it turns out to be zero at runtime.
4. Remove the two 'setpsw' instructions from rx_cmpstrn.
gcc/ * config/rx/rx.md (cmpstrnsi): Allow constant length. For static length 0, just store 0 into the output register. For dynamic zero, set C/Z appropriately. (rxcmpstrn): No longer set C/Z.
2025-04-16Fix wrong optimization of conditional expression with enumeration typeEric Botcazou4-3/+53
This is a regression introduced on the mainline and 14 branch by: https://gcc.gnu.org/pipermail/gcc-cvs/2023-October/391658.html The change bypasses int_fits_type_p (essentially) to work around the signedness constraints, but in doing so disregards the peculiarities of boolean types whose precision is not 1 dealt with by the predicate, leading to the creation of a problematic conversion here. Fixed by special-casing boolean types whose precision is not 1, as done in several other places. gcc/ * tree-ssa-phiopt.cc (factor_out_conditional_operation): Do not bypass the int_fits_type_p test for boolean types whose precision is not 1. gcc/testsuite/ * gnat.dg/opt105.adb: New test. * gnat.dg/opt105_pkg.ads, gnat.dg/opt105_pkg.adb: New helper.
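A paraphrased sketch, using GCC's tree accessors, of the kind of guard this adds (the actual code in factor_out_conditional_operation is structured differently, and the Ada detail in the comment is an assumption):

    /* Boolean types whose precision is not 1 (e.g. Ada booleans with a size or
       representation clause) must not bypass the int_fits_type_p check, since
       the constant may not be representable in them.  */
    static bool
    may_skip_fits_check (tree type)
    {
      return !(TREE_CODE (type) == BOOLEAN_TYPE && TYPE_PRECISION (type) != 1);
    }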
2025-04-16Doc: make regenerate-opt-urlsSandra Loosemore1-2/+3
gcc/ChangeLog * common.opt.urls: Regenerated.
2025-04-16c++: templates, attributes, #pragma target [PR114772]Jason Merrill2-0/+20
Since r12-5426 apply_late_template_attributes suppresses various global state to avoid applying active pragmas to earlier declarations; we also need to override target_option_current_node. PR c++/114772 PR c++/101180 gcc/cp/ChangeLog: * pt.cc (apply_late_template_attributes): Also override target_option_current_node. gcc/testsuite/ChangeLog: * g++.dg/ext/pragma-target2.C: New test.
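As a rough illustration only (the committed g++.dg/ext/pragma-target2.C differs; an x86 target is assumed for the pragma), the kind of code involved mixes a target pragma with a template instantiated elsewhere:

    #pragma GCC push_options
    #pragma GCC target ("avx2")
    template <typename T> T add (T a, T b) { return a + b; }
    #pragma GCC pop_options

    // The instantiation point is outside the pragma region; applying the
    // template's attributes late must use the right target_option node.
    int use (int a, int b) { return add (a, b); }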
2025-04-16c++: format attribute redeclaration [PR116954]Jason Merrill2-1/+24
Here when merging the two decls, remove_contract_attributes loses ATTR_IS_DEPENDENT on the format attribute, so apply_late_template_attributes just returns, so the attribute doesn't get propagated to the type where the warning looks for it. Fixed by using copy_node instead of tree_cons to preserve flags. PR c++/116954 gcc/cp/ChangeLog: * contracts.cc (remove_contract_attributes): Preserve flags on the attribute list. gcc/testsuite/ChangeLog: * g++.dg/warn/Wformat-3.C: New test.
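A hypothetical reduction of the underlying expectation (the committed Wformat-3.C may differ; the template is used here because ATTR_IS_DEPENDENT applies to attributes in templates):

    template <typename T>
    __attribute__ ((format (printf, 1, 2))) void log_msg (const char *, ...);

    template <typename T>
    void log_msg (const char *, ...);          // redeclaration without the attribute

    void
    use ()
    {
      log_msg<int> ("%d", "oops");             // should still warn with -Wformat
    }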
2025-04-16i386: Enable -mnop-mcount for -fpic with PLTs [PR119386]Ard Biesheuvel2-2/+12
-mnop-mcount can be trivially enabled for -fPIC codegen as long as PLTs are being used, given that the instruction encodings are identical, only the target may resolve differently depending on how the linker decides to incorporate the object file. So relax the option check, and add a test to ensure that 5-byte NOPs are emitted when -mnop-mcount is being used. Signed-off-by: Ard Biesheuvel <ardb@kernel.org> gcc/ChangeLog: PR target/119386 * config/i386/i386-options.cc: Permit -mnop-mcount when using -fpic with PLTs. gcc/testsuite/ChangeLog: PR target/119386 * gcc.target/i386/pr119386-3.c: New test.
2025-04-16i386: Prefer PLT indirection for __fentry__ calls under -fPIC [PR119386]Ard Biesheuvel3-2/+32
Commit bde21de1205 ("i386: Honour -mdirect-extern-access when calling __fentry__") updated the logic that emits mcount() / __fentry__() calls into function prologues when profiling is enabled, to avoid GOT-based indirect calls when a direct call would suffice. There are two problems with that change: - it relies on -mdirect-extern-access rather than -fno-plt to decide whether or not a direct [PLT based] call is appropriate; - for the PLT case, it falls through to x86_print_call_or_nop(), which does not emit the @PLT suffix, resulting in the wrong relocation to be used (R_X86_64_PC32 instead of R_X86_64_PLT32) Fix this by testing flag_plt instead of ix86_direct_extern_access, and updating x86_print_call_or_nop() to take flag_pic and flag_plt into account. This also ensures that -mnop-mcount works as expected when emitting the PLT based profiling calls. While at it, fix the 32-bit logic as well, and issue a PLT call unless PLTs are explicitly disabled. https://gcc.gnu.org/bugzilla/show_bug.cgi?id=119386 Signed-off-by: Ard Biesheuvel <ardb@kernel.org> gcc/ChangeLog: PR target/119386 * config/i386/i386.cc (x86_print_call_or_nop): Add @PLT suffix where appropriate. (x86_function_profiler): Fall through to x86_print_call_or_nop() for PIC codegen when flag_plt is set. gcc/testsuite/ChangeLog: PR target/119386 * gcc.target/i386/pr119386-1.c: New test. * gcc.target/i386/pr119386-2.c: New test.
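As a sketch of what this means for generated code (the invocation and file name below are assumptions; the committed tests check the details):

    // Assumed invocation: gcc -O2 -fpic -pg -mfentry -S fentry-test.c
    // With PLTs enabled, the profiling call in the prologue is expected to be
    // emitted as "call __fentry__@PLT" rather than an indirect call through the GOT.
    void f (void) { }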
2025-04-16Doc: Add pointer to --help use to main entry for -Q option [PR90465]Sandra Loosemore1-2/+8
-Q does something completely different in conjunction with --help than it does otherwise; its main entry in the manual didn't mention that, nor did -Q have an entry in the index for the --help usage. gcc/ChangeLog PR driver/90465 * doc/invoke.texi (Overall Options): Add a @cindex for -Q in connection with --help=. (Developer Options): Point at --help= documentation for the other use of -Q.
2025-04-16Fortran: pure subroutine with pure procedure as dummy [PR106948]Harald Anlauf2-0/+56
PR fortran/106948 gcc/fortran/ChangeLog: * resolve.cc (gfc_pure_function): If a function has been resolved, but esym is not yet set, look at its attributes to see whether it is pure or elemental. gcc/testsuite/ChangeLog: * gfortran.dg/pure_formal_proc_4.f90: New test.
2025-04-16For nvptx offloading, make sure to emit C++ constructor, destructor aliases [PR97106]Thomas Schwinge1-0/+12
PR target/97106 gcc/ * config/nvptx/nvptx.cc (nvptx_asm_output_def_from_decls) [ACCEL_COMPILER]: Make sure to emit C++ constructor, destructor aliases. libgomp/ * testsuite/libgomp.c++/pr96390.C: Un-XFAIL nvptx offloading. * testsuite/libgomp.c-c++-common/pr96390.c: Adjust.
2025-04-16Stream ipa_return_value_summaryJan Hubicka2-18/+131
Add streaming of return summaries from compile time to ltrans which are now needed for vrp to not output false errors on musttail. Co-authored-by: Jakub Jelinek <jakub@redhat.com> gcc/ChangeLog: PR tree-optimization/119614 * ipa-prop.cc (ipa_write_return_summaries): New function. (ipa_record_return_value_range_1): Break out from .... (ipa_record_return_value_range): ... here. (ipa_read_return_summaries): New function. (ipa_prop_read_section): Read return summaries. (read_ipcp_transformation_info): Read return summaries. (ipcp_write_transformation_summaries): Write return summaries; do not stream stray 0. gcc/testsuite/ChangeLog: * g++.dg/lto/pr119614_0.C: New test.
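A loose sketch of the scenario (assumed code, not the committed pr119614_0.C): under -flto the return-value range of the callee is computed at compile time but, before this change, was not streamed to the ltrans unit, where VRP could then reject a musttail call it should accept.

    int callee (int x) { return x & 3; }     // return value known to lie in [0, 3]

    int
    caller (int x)
    {
      [[gnu::musttail]] return callee (x);   // must not be rejected in the ltrans unit
    }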
2025-04-16middle-end: force AMDGCN test for vect-early-break_18.c to consistent architecture [PR119286]Tamar Christina1-1/+1
The given test is intended to test vectorization of a strided access done by having a step of > 1. The GCN target doesn't support load lanes, so the testcase is expected to fail; other targets create a permuted load here which we then reject. However some GCN architectures don't seem to support the permuted loads either, so the vectorizer tries a gather/scatter. But the indices aren't supported by some targets, so instead the vectorizer scalarizes the loads. I can't really test for which architecture is being used by the compiler, so instead this updates the testcase to use one single architecture so we get a consistent result. gcc/testsuite/ChangeLog: PR target/119286 * gcc.dg/vect/vect-early-break_18.c: Force -march=gfx908 for amdgcn.
2025-04-16middle-end: Fix incorrect codegen with PFA and VLS [PR119351]Tamar Christina14-3/+357
The following example:

    #define N 512
    #define START 2
    #define END 505

    int x[N] __attribute__((aligned(32)));

    int __attribute__((noipa))
    foo (void)
    {
      for (signed int i = START; i < END; ++i)
        {
          if (x[i] == 0)
            return i;
        }
      return -1;
    }

generates incorrect code with fixed length SVE because for early break we need to know which value to start the scalar loop with if we take an early exit. Historically this means that we take the first element of every induction. This is because there's an assumption in place that, even with masked loops, the masks come from a whilel* instruction. As such we reduce using a BIT_FIELD_REF <, 0>. When PFA was added this assumption was correct for non-masked loops, however we assumed that PFA for VLA wouldn't work for now, and disabled it using the alignment requirement checks. We also expected VLS to PFA using scalar loops. However, as this PR shows, for VLS the vectorizer can, and does in some circumstances, choose to peel using masks by masking the first iteration of the loop with an additional alignment mask. When this is done, the first elements of the predicate can be inactive. In this example element 1 is inactive based on the calculated misalignment, hence the -1 value in the first vector IV element. When we reduce using BIT_FIELD_REF we get the wrong value. This patch updates it by creating a new scalar PHI that keeps track of whether we are in the first iteration of the loop (with the additional masking) or whether we have taken a loop iteration already. The generated sequence:

    pre-header:
    bb1:
      i_1 = <number of leading inactive elements>

    header:
    bb2:
      i_2 = PHI <i_1(bb1), 0(latch)>
      …

    early-exit:
    bb3:
      i_3 = iv_step * i_2 + PHI<vector-iv>

This eliminates the need to do an expensive mask-based reduction. This fixes gromacs with one OpenMP thread. But with > 1 there is still an issue. gcc/ChangeLog: PR tree-optimization/119351 * tree-vectorizer.h (LOOP_VINFO_MASK_NITERS_PFA_OFFSET, LOOP_VINFO_NON_LINEAR_IV): New. (class _loop_vec_info): Add mask_skip_niters_pfa_offset and nonlinear_iv. * tree-vect-loop.cc (_loop_vec_info::_loop_vec_info): Initialize them. (vect_analyze_scalar_cycles_1): Record non-linear inductions. (vectorizable_induction): If early break and PFA using masking create a new phi which tracks where the scalar code needs to start... (vectorizable_live_operation): ...and generate the adjustments here. (vect_use_loop_mask_for_alignment_p): Reject non-linear inductions and early break needing peeling. gcc/testsuite/ChangeLog: PR tree-optimization/119351 * gcc.target/aarch64/sve/peel_ind_10.c: New test. * gcc.target/aarch64/sve/peel_ind_10_run.c: New test. * gcc.target/aarch64/sve/peel_ind_5.c: New test. * gcc.target/aarch64/sve/peel_ind_5_run.c: New test. * gcc.target/aarch64/sve/peel_ind_6.c: New test. * gcc.target/aarch64/sve/peel_ind_6_run.c: New test. * gcc.target/aarch64/sve/peel_ind_7.c: New test. * gcc.target/aarch64/sve/peel_ind_7_run.c: New test. * gcc.target/aarch64/sve/peel_ind_8.c: New test. * gcc.target/aarch64/sve/peel_ind_8_run.c: New test. * gcc.target/aarch64/sve/peel_ind_9.c: New test. * gcc.target/aarch64/sve/peel_ind_9_run.c: New test.
2025-04-16bitintlower: Fix interaction of gimple_assign_copy_p stmts vs. ↵Jakub Jelinek2-4/+46
has_single_use [PR119808] The following testcase is miscompiled, because we emit a CLOBBER in a place where it shouldn't be emitted. Before lowering we have: b_5 = 0; b.0_6 = b_5; b.1_1 = (unsigned _BitInt(129)) b.0_6; ... <retval> = b_5; The bitint coalescing assigns the same partition/underlying variable for both b_5 and b.0_6 (possible because there is a copy assignment) and of course a different one for b.1_1 (and other SSA_NAMEs in between). This is -O0 so stmts aren't DCEd and aren't propagated that much etc. It is -O0 so we also don't try to optimize and omit some names from m_names and handle multiple stmts at once, so the expansion emits essentially bitint.4 = {}; bitint.4 = bitint.4; bitint.2 = cast of bitint.4; bitint.4 = CLOBBER; ... <retval> = bitint.4; and the CLOBBER is the problem because bitint.4 is still live afterwards. We emit the clobbers to improve code generation, but do it only for (initially) has_single_use SSA_NAMEs (remembered in m_single_use_names) being used, if they don't have the same partition on the lhs and a few other conditions. The problem above is that b.0_6 which is used in the cast has_single_use and so was in m_single_use_names bitmask and the lhs in that case is bitint.2, so a different partition. But there is gimple_assign_copy_p with SSA_NAME rhs1 and the partitioning special cases those and while b.0_6 is single use, b_5 has multiple uses. I believe this ought to be a problem solely in the case of such copy stmts and its special case by the partitioning, if instead of b.0_6 = b_5; there would be b.0_6 = b_5 + 1; or whatever other stmts that performs or may perform changes on the value, partitioning couldn't assign the same partition to b.0_6 and b_5 if b_5 is used later, it couldn't have two different (or potentially different) values in the same bitint.N var. With copy that is possible though. So the following patch fixes it by being more careful when we set m_single_use_names, don't set it if it is a has_single_use SSA_NAME but SSA_NAME_DEF_STMT of it is a copy stmt with SSA_NAME rhs1 and that rhs1 doesn't have single use, or has_single_use but SSA_NAME_DEF_STMT of it is a copy stmt etc. Just to make sure it doesn't change code generation too much, I've gathered statistics how many times if (m_first && m_single_use_names && m_vars[p] != m_lhs && m_after_stmt && bitmap_bit_p (m_single_use_names, SSA_NAME_VERSION (op))) { tree clobber = build_clobber (TREE_TYPE (m_vars[p]), CLOBBER_STORAGE_END); g = gimple_build_assign (m_vars[p], clobber); gimple_stmt_iterator gsi = gsi_for_stmt (m_after_stmt); gsi_insert_after (&gsi, g, GSI_SAME_STMT); } emits a clobber on make check-gcc GCC_TEST_RUN_EXPENSIVE=1 RUNTESTFLAGS="--target_board=unix\{-m64,-m32\} GCC_TEST_RUN_EXPENSIVE=1 dg.exp='*bitint* pr112673.c builtin-stdc-bit-*.c pr112566-2.c pr112511.c pr116588.c pr116003.c pr113693.c pr113602.c flex-array-counted-by-7.c' dg-torture.exp='*bitint* pr116480-2.c pr114312.c pr114121.c' dfp.exp=*bitint* i386.exp='pr118017.c pr117946.c apx-ndd-x32-2a.c' vect.exp='vect-early-break_99-pr113287.c' tree-ssa.exp=pr113735.c" and before this patch it was 41010 clobbers and after it is 40968, so difference is 42 clobbers, 0.1% fewer. 2025-04-16 Jakub Jelinek <jakub@redhat.com> PR middle-end/119808 * gimple-lower-bitint.cc (gimple_lower_bitint): Don't set m_single_use_names bits for SSA_NAMEs which have single use but their SSA_NAME_DEF_STMT is a copy from another SSA_NAME which doesn't have a single use, or single use which is such a copy etc. 
* gcc.dg/bitint-121.c: New test.
2025-04-16riscv: Fix incorrect gnu property alignment on rv32Jesse Huang3-1/+15
Codegen is incorrectly emitting a ".p2align 3" that coerces the alignment of the .note.gnu.property section from 4 to 8 on rv32. 2025-04-11 Jesse Huang <jesse.huang@sifive.com> gcc/ChangeLog * config/riscv/riscv.cc (riscv_file_end): Fix .p2align value. gcc/testsuite/ChangeLog * gcc.target/riscv/gnu-property-align-rv32.c: New file. * gcc.target/riscv/gnu-property-align-rv64.c: New file.
2025-04-16RISC-V: Put jump table in text for large code modelKito Cheng2-1/+25
The large code model assumes that data or rodata may be put far away from the text section, so we need to put the jump table in the text section for the large code model. gcc/ChangeLog: * config/riscv/riscv.h (JUMP_TABLES_IN_TEXT_SECTION): Check if large code model. gcc/testsuite/ChangeLog: * gcc.target/riscv/jump-table-large-code-model.c: New test.
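For illustration, a minimal switch of the kind that typically produces a jump table (an assumed example, not the committed test); with -mcmodel=large on RISC-V the table must now stay in .text:

    int
    dispatch (int i)
    {
      switch (i)
        {
        case 0: return 10;
        case 1: return 11;
        case 2: return 12;
        case 3: return 13;
        case 4: return 14;
        case 5: return 15;
        default: return -1;
        }
    }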
2025-04-16testsuite: Add testcase for already fixed PR [PR116093]Jakub Jelinek1-0/+20
This testcase got fixed with r15-9397 PR119722 fix. 2025-04-16 Jakub Jelinek <jakub@redhat.com> PR tree-optimization/116093 * gcc.dg/bitint-122.c: New test.
2025-04-16AArch64: Fix operands order in vec_extract expanderTejas Belagod1-3/+3
The operand order in the gen_vcond_mask call in the vec_extract pattern is wrong. Fix the order so that the predicate is operand 3. Tested and bootstrapped on aarch64-linux-gnu. OK for trunk? gcc/ChangeLog * config/aarch64/aarch64-sve.md (vec_extract<vpred><Vel>): Fix operand order to gen_vcond_mask_*.
2025-04-16aarch64: Disable sysreg feature gatingAlice Carlotti2-4/+13
This applies to the sysreg read/write intrinsics __arm_[wr]sr*. It does not depend on changes to Binutils, because GCC converts recognised sysreg names to an encoding-based form, which is already ungated in Binutils. We have, however, agreed to make an equivalent change in Binutils (which would then disable feature gating for sysreg accesses in inline assembly), but this has not yet been posted upstream. In the future we may introduce a new flag to re-enable some checking, but these checks could not be comprehensive because many system registers depend on architecture features that don't have corresponding GCC/GAS --march options. This would also depend on addressing numerous inconsistencies in the existing list of sysreg feature dependencies. gcc/ChangeLog: * config/aarch64/aarch64.cc (aarch64_valid_sysreg_name_p): Remove feature check. (aarch64_retrieve_sysreg): Ditto. gcc/testsuite/ChangeLog: * gcc.target/aarch64/acle/rwsr-ungated.c: New test.
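The intrinsics affected are the ACLE system-register accessors; a small AArch64 usage sketch (the register name is chosen as an example):

    #include <arm_acle.h>

    unsigned long long
    read_virtual_counter (void)
    {
      /* Previously this could be rejected unless the -march string enabled the
         feature the register was associated with; the name is now accepted and
         encoded regardless.  */
      return __arm_rsr64 ("cntvct_el0");
    }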
2025-04-16Daily bump.GCC Administrator9-1/+333
2025-04-16d: Fix ICE: type variant differs by TYPE_MAX_VALUE with -g [PR119826]Iain Buclaw3-0/+42
Forward referenced enum types were never fixed up after the main ENUMERAL_TYPE was finished. All flags set are now propagated to all variants after its mode, size, and alignment has been calculated. PR d/119826 gcc/d/ChangeLog: * types.cc (TypeVisitor::visit (TypeEnum *)): Propagate flags of main enum types to all forward-referenced variants. gcc/testsuite/ChangeLog: * gdc.dg/debug/imports/pr119826b.d: New test. * gdc.dg/debug/pr119826.d: New test.
2025-04-16c++: Prune lambda captures from more places [PR119755]Nathaniel Shead3-0/+48
Currently, pruned lambda captures are still left over in the function's BLOCK and topmost BIND_EXPR; this doesn't cause any issues for normal compilation, but does break modules streaming as we try to reconstruct a FIELD_DECL that no longer exists on the type itself. PR c++/119755 gcc/cp/ChangeLog: * lambda.cc (prune_lambda_captures): Remove pruned capture from function's BLOCK_VARS and BIND_EXPR_VARS. gcc/testsuite/ChangeLog: * g++.dg/modules/lambda-10_a.H: New test. * g++.dg/modules/lambda-10_b.C: New test. Signed-off-by: Nathaniel Shead <nathanieloshead@gmail.com> Reviewed-by: Jason Merrill <jason@redhat.com>
2025-04-16testsuite: Fix up completion-2.c testJakub Jelinek1-0/+1
The r15-9487 change has added -flto-partition=default, which broke the completion-2.c testcase because that case is now also printed during completion. 2025-04-16 Jakub Jelinek <jakub@redhat.com> * gcc.dg/completion-2.c: Expect also -flto-partition=default line.
2025-04-15c: Fully fold each parameter for call to .ACCESS_WITH_SIZE [PR119717]Qing Zhao2-2/+30
C_MAYBE_CONST_EXPR is a C FE operator that will be removed by c_fully_fold. c_fully_fold assumes that operands of function calls have already been folded. However, when we build the call to .ACCESS_WITH_SIZE, its operands are not fully folded, therefore the C-FE-specific operator is passed to the middle end. In order to fix this issue, fully fold the parameters before building the call to .ACCESS_WITH_SIZE. PR c/119717 gcc/c/ChangeLog: * c-typeck.cc (build_access_with_size_for_counted_by): Fully fold the parameters for call to .ACCESS_WITH_SIZE. gcc/testsuite/ChangeLog: * gcc.dg/pr119717.c: New test.
2025-04-15x86: Update gcc.target/i386/apx-interrupt-1.cH.J. Lu1-1/+1
ix86_add_cfa_restore_note omits the REG_CFA_RESTORE REG note for registers pushed in red-zone. Since commit 0a074b8c7e79f9d9359d044f1499b0a9ce9d2801 Author: H.J. Lu <hjl.tools@gmail.com> Date: Sun Apr 13 12:20:42 2025 -0700 APX: Don't use red-zone with 32 GPRs and no caller-saved registers disabled red-zone, update gcc.target/i386/apx-interrupt-1.c to expect 31 .cfi_restore directives. PR target/119784 * gcc.target/i386/apx-interrupt-1.c: Expect 31 .cfi_restore directives. Signed-off-by: H.J. Lu <hjl.tools@gmail.com>
2025-04-15Docs: Address -fivopts, -O0, and -Q confusion [PR71094]Sandra Loosemore1-0/+9
There's a blurb at the top of the "Optimize Options" node telling people that most optimization options are completely disabled at -O0 and a similar blurb in the entry for -Og, but nothing at the entry for -O0. Since this is a continuing point of confusion it seems wise to duplicate the information in all the places users are likely to look for it. gcc/ChangeLog PR tree-optimization/71094 * doc/invoke.texi (Optimize Options): Document that -fivopts is enabled at -O1 and higher. Add blurb about -O0 causing GCC to completely ignore most optimization options.
2025-04-15c++: constexpr, trivial, and non-alias target [PR111075]Jason Merrill1-0/+3
On Darwin and other targets with !can_alias_cdtor, we instead go to maybe_thunk_ctor, which builds a thunk function that calls the general constructor. And then cp_fold tries to constant-evaluate that call, and we ICE because we don't expect to ever be asked to constant-evaluate a call to a trivial function. No new test because this fixes g++.dg/torture/tail-padding1.C on affected targets. PR c++/111075 gcc/cp/ChangeLog: * constexpr.cc (cxx_eval_call_expression): Allow trivial call from a thunk.
2025-04-15configure, Darwin: Recognise new naming for Xcode ld.Iain Sandoe2-6/+8
The latest editions of XCode have altered the identify reported by 'ld -v' (again). This means that GCC configure no longer detects the version. Fixed by adding the new name to the set checked. gcc/ChangeLog: * configure: Regenerate. * configure.ac: Recognise PROJECT:ld-mmmm.nn.aa as an identifier for Darwin's static linker. Signed-off-by: Iain Sandoe <iain@sandoe.co.uk>
2025-04-15includes, Darwin: Handle modular use for macOS SDKs [PR116827].Iain Sandoe1-0/+15
Recent changes to the OS SDKs have altered the way in which include guards are used for a number of headers when C++ modules are enabled. Instead of placing the guards in the included header, they are being placed in the including header. This breaks the assumptions in the current GCC stddef.h specifically, that the presence of __PTRDIFF_T and __SIZE_T means that the relevant defs are already made. However in the case of the module-enabled C++ with these SDKs, that is no longer true. stddef.h has a large body of special-cases already, but it seems that the only viable solution here is to add a new one specifically for __APPLE__ and modular code. This fixes around 280 new fails in the modules test-suite; it is needed on all open branches that support modules. PR target/116827 gcc/ChangeLog: * ginclude/stddef.h: Undefine __PTRDIFF_T and __SIZE_T for module- enabled c++ on Darwin/macOS platforms. Signed-off-by: Iain Sandoe <iain@sandoe.co.uk>
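A paraphrased sketch of the special case described (the actual ginclude/stddef.h logic is considerably more involved, and the exact condition used to detect module-enabled C++ is not reproduced here):

    /* With the newer macOS SDKs and C++ modules, the including header may have
       defined the guards without the typedefs actually being visible here, so
       drop them and let this header provide the definitions.  */
    #if defined (__APPLE__) && defined (__cplusplus)
    # undef __PTRDIFF_T
    # undef __SIZE_T
    #endif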
2025-04-15Regenerate common.opt.urlsKyrylo Tkachov1-0/+3
Signed-off-by: Kyrylo Tkachov <ktkachov@nvidia.com> * common.opt.urls: Regenerate.
2025-04-15cobol/119302 - transform gcobol.3 name during install, install as gcobol-io.3Richard Biener1-3/+4
The following installs gcobol.3 as gcobol-io.3 and applies program-transform-name to the gcobol-io part. This follows naming of the pdf and the html variants. It also uses $(man1ext) and $(man3ext) consistently. PR cobol/119302 gcc/cobol/ * Make-lang.in (GCOBOLIO_INSTALL_NAME): Define. Use $(GCOBOLIO_INSTALL_NAME) for gcobol.3 manpage source upon install.
2025-04-15Set znver5 issue rate to 4.Jan Hubicka1-7/+8
This patch sets the issue rate of znver5 to 4. With the current model, unless a reservation is missing, we will never issue more than 4 instructions per cycle since that is the limit of the decoders and the model does not take into account the fact that typically code is run from the op cache. gcc/ChangeLog: * config/i386/x86-tune-sched.cc (ix86_issue_rate): Set to 4 for znver5.
2025-04-15Set ADDSS cost to 3 for znver5Jan Hubicka1-1/+1
Znver5 has an addss latency of 2 in the typical case while all earlier versions have latency 3. Unfortunately the addss cost is used to cost many other SSE instructions than just addss, and setting the cost to 2 makes us vectorize 4 64-bit stores into one 256-bit store, which in turn regresses imagemagick. This patch sets the cost back to 3. Next stage1 we can untie addss from the other operations and set it correctly. Bootstrapped/regtested x86_64-linux and also benchmarked on SPEC2k17. gcc/ChangeLog: PR target/119298 * config/i386/x86-tune-costs.h (znver5_cost): Set ADDSS cost to 3.
2025-04-15RISC-V: vsetvl: elide abnormal edges from LCM computations [PR119533]Vineet Gupta3-1/+168
vsetvl phase4 uses LCM guided info to insert VSETVL insns, including a straggler loop for "missing vsetvls" on certain edges. Currently it asserts on encountering EDGE_ABNORMAL. When enabling the Go frontend with V enabled, the libgo build hits the assert. The solution is to prevent abnormal edges from getting into LCM at all (my prior attempt at this just ignored them after LCM, which is not right). The existing invalid_opt_bb_p () currently does this for BB predecessors but not for successors, which is what the patch adds. Crucially, the ICE/fix also depends on avoiding vsetvl hoisting past non-transparent blocks: that is taken care of by Robin's patch "RISC-V: Do not lift up vsetvl into non-transparent blocks [PR119547]" for a different yet related issue. Reported-by: Heinrich Schuchardt <heinrich.schuchardt@canonical.com> Signed-off-by: Vineet Gupta <vineetg@rivosinc.com> PR target/119533 gcc/ChangeLog: * config/riscv/riscv-vsetvl.cc (invalid_opt_bb_p): Check for EDGE_ABNORMAL. (pre_vsetvl::compute_lcm_local_properties): Initialize kill bitmap. Debug dump skipped edge. gcc/testsuite/ChangeLog: * go.dg/pr119533-riscv.go: New test. * go.dg/pr119533-riscv-2.go: New test.
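In terms of GCC's CFG API, the added successor-side check is roughly of this shape (a paraphrased sketch; invalid_opt_bb_p interleaves it with its existing predecessor checks):

    /* Treat a block as invalid for the LCM problem if any outgoing edge is
       abnormal, e.g. edges created for setjmp/longjmp or computed gotos.  */
    static bool
    has_abnormal_succ_p (basic_block bb)
    {
      edge e;
      edge_iterator ei;
      FOR_EACH_EDGE (e, ei, bb->succs)
        if (e->flags & EDGE_ABNORMAL)
          return true;
      return false;
    }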
2025-04-15RISC-V: Do not lift up vsetvl into non-transparent blocks [PR119547].Robin Dapp5-3/+315
When lifting up a vsetvl into a block we currently don't consider the block's transparency with respect to the vsetvl as in other parts of the pass. This patch does not perform the lift when transparency is not guaranteed. This condition is more restrictive than necessary as we can still perform a vsetvl lift if the conflicting register is only ever used in vsetvls and in no regular insns, but given how late we are in the GCC 15 cycle it seems better to defer this. Therefore gcc.target/riscv/rvv/vsetvl/avl_single-68.c is XFAILed for now. This issue was found in OpenCV where it manifests as a runtime error. Zhijin Zeng debugged PR119547 and provided an initial patch. Reported-By: 曾治金 <zhijin.zeng@spacemit.com> PR target/119547 gcc/ChangeLog: * config/riscv/riscv-vsetvl.cc (pre_vsetvl::earliest_fuse_vsetvl_info): Do not perform lift if block is not transparent. gcc/testsuite/ChangeLog: * gcc.target/riscv/rvv/vsetvl/avl_single-68.c: xfail. * g++.target/riscv/rvv/autovec/pr119547.C: New test. * g++.target/riscv/rvv/autovec/pr119547-2.C: New test. * gcc.target/riscv/rvv/vsetvl/vlmax_switch_vtype-10.c: Adjust.
2025-04-15Fortran/OpenMP: Support automatic mapping allocatable components (deep mapping)Tobias Burnus12-110/+1049
When mapping an allocatable variable (or derived-type component), explicitly or implicitly, all its allocated allocatable components will automatically be mapped. The patch implements the target hooks, added for this feature to omp-low.cc with commit r15-3895-ge4a58b6f28383c. Namely, there is a check whether there are allocatable components at all: gfc_omp_deep_mapping_p. Then gfc_omp_deep_mapping_cnt, counting the number of required mappings; this is a dynamic value as it depends on array bounds and whether an allocatable is allocated or not. And, finally, the actual mapping: gfc_omp_deep_mapping. Polymorphic variables are partially supported: the mapping of the _data component is fully supported, but only components of the declared type are processed for additional allocatables. Additionally, _vptr is not touched. This means that everything needing _vtab information requires unified shared memory; in particular, _size data is required when accessing elements of polymorphic arrays. However, for scalar arrays, accessing components of the declare type should work just fine. As polymorphic variables are not (really) supported and OpenMP 6 explicitly disallows them, there is now a warning (-Wopenmp) when they are encountered. Unlimited polymorphics are rejected (error). Additionally, PRIVATE and FIRSTPRIVATE are not quite supported for allocatable components, polymorphic components and as polymorphic variable. Thus, those are now rejected as well. gcc/fortran/ChangeLog: * f95-lang.cc (LANG_HOOKS_OMP_DEEP_MAPPING, LANG_HOOKS_OMP_DEEP_MAPPING_P, LANG_HOOKS_OMP_DEEP_MAPPING_CNT): Define. * openmp.cc (gfc_match_omp_clause_reduction): Fix location setting. (resolve_omp_clauses): Permit allocatable components, reject them and polymorphic variables in PRIVATE/FIRSTPRIVATE. * trans-decl.cc (add_clause): Set clause location. * trans-openmp.cc (gfc_has_alloc_comps): Add ptr_ok and shallow_alloc_only Boolean arguments. (gfc_omp_replace_alloc_by_to_mapping): New. (gfc_omp_private_outer_ref, gfc_walk_alloc_comps, gfc_omp_clause_default_ctor, gfc_omp_clause_copy_ctor, gfc_omp_clause_assign_op, gfc_omp_clause_dtor): Update call to it. (gfc_omp_finish_clause): Minor cleanups, improve location data, handle allocatable components. (gfc_omp_deep_mapping_map, gfc_omp_deep_mapping_item, gfc_omp_deep_mapping_comps, gfc_omp_gen_simple_loop, gfc_omp_get_array_size, gfc_omp_elmental_loop, gfc_omp_deep_map_kind_p, gfc_omp_deep_mapping_int_p, gfc_omp_deep_mapping_p, gfc_omp_deep_mapping_do, gfc_omp_deep_mapping_cnt, gfc_omp_deep_mapping): New. (gfc_trans_omp_array_section): Save array descriptor in case deep-mapping lang hook will need it. (gfc_trans_omp_clauses): Likewise; use better clause location data. * trans.h (gfc_omp_deep_mapping_p, gfc_omp_deep_mapping_cnt, gfc_omp_deep_mapping): Add function prototypes. libgomp/ChangeLog: * libgomp.texi (5.0 Impl. Status): Mark mapping alloc comps as 'Y'. * testsuite/libgomp.fortran/allocatable-comp.f90: New test. * testsuite/libgomp.fortran/map-alloc-comp-3.f90: New test. * testsuite/libgomp.fortran/map-alloc-comp-4.f90: New test. * testsuite/libgomp.fortran/map-alloc-comp-5.f90: New test. * testsuite/libgomp.fortran/map-alloc-comp-6.f90: New test. * testsuite/libgomp.fortran/map-alloc-comp-7.f90: New test. * testsuite/libgomp.fortran/map-alloc-comp-8.f90: New test. * testsuite/libgomp.fortran/map-alloc-comp-9.f90: New test. gcc/testsuite/ChangeLog: * gfortran.dg/gomp/map-alloc-comp-1.f90: Remove dg-error. * gfortran.dg/gomp/polymorphic-mapping-2.f90: Update warn wording. 
* gfortran.dg/gomp/polymorphic-mapping.f90: Change expected diagnostic; some tests moved to ... * gfortran.dg/gomp/polymorphic-mapping-1.f90: ... here as new test. * gfortran.dg/gomp/polymorphic-mapping-3.f90: New test. * gfortran.dg/gomp/polymorphic-mapping-4.f90: New test. * gfortran.dg/gomp/polymorphic-mapping-5.f90: New test.
2025-04-15Locality cloning pass: -fipa-reorder-for-localityKyrylo Tkachov17-11/+1403
Implement partitioning and cloning in the callgraph to help locality. A new -fipa-reorder-for-locality flag is used to enable this. The majority of the logic is in the new IPA pass in ipa-locality-cloning.cc The optimization has two components: * Partitioning the callgraph so as to group callers and callees that frequently call each other in the same partition * Cloning functions that straddle multiple callchains and allowing each clone to be local to the partition of its callchain. The majority of the logic is in the new IPA pass in ipa-locality-cloning.cc. It creates a partitioning plan and does the prerequisite cloning. The partitioning is then implemented during the existing LTO partitioning pass. To guide these locality heuristics we use PGO data. In the absence of PGO data we use a static heuristic that uses the accumulated estimated edge frequencies of the callees for each function to guide the reordering. We are investigating some more elaborate static heuristics, in particular using the demangled C++ names to group template instantiatios together. This is promising but we are working out some kinks in the implementation currently and want to send that out as a follow-up once we're more confident in it. A new bootstrap-lto-locality bootstrap config is added that allows us to test this on GCC itself with either static or PGO heuristics. GCC bootstraps with both (normal LTO bootstrap and profiledbootstrap). As this new pass enables a new partitioning scheme it is incompatible with explicit -flto-partition= options so an error is introduced when the user uses both flags explicitly. With this optimization we are seeing good performance gains on some large internal workloads that stress the parts of the processor that is sensitive to code locality, but we'd appreciate wider performance evaluation. Bootstrapped and tested on aarch64-none-linux-gnu. Ok for mainline? Thanks, Kyrill Signed-off-by: Prachi Godbole <pgodbole@nvidia.com> Co-authored-by: Kyrylo Tkachov <ktkachov@nvidia.com> config/ChangeLog: * bootstrap-lto-locality.mk: New file. gcc/ChangeLog: * Makefile.in (OBJS): Add ipa-locality-cloning.o. * cgraph.h (set_new_clone_decl_and_node_flags): Declare prototype. * cgraphclones.cc (set_new_clone_decl_and_node_flags): Remove static qualifier. * common.opt (fipa-reorder-for-locality): New flag. (LTO_PARTITION_DEFAULT): Declare. (flto-partition): Change default to LTO_PARTITION_DFEAULT. * doc/invoke.texi: Document -fipa-reorder-for-locality. * flag-types.h (enum lto_locality_cloning_model): Declare. (lto_partitioning_model): Add LTO_PARTITION_DEFAULT. * lto-cgraph.cc (lto_set_symtab_encoder_in_partition): Add dumping of node and index. * opts.cc (validate_ipa_reorder_locality_lto_partition): Define. (finish_options): Handle LTO_PARTITION_DEFAULT. * params.opt (lto_locality_cloning_model): New enum. (lto-partition-locality-cloning): New param. (lto-partition-locality-frequency-cutoff): Likewise. (lto-partition-locality-size-cutoff): Likewise. (lto-max-locality-partition): Likewise. * passes.def: Register pass_ipa_locality_cloning. * timevar.def (TV_IPA_LC): New timevar. * tree-pass.h (make_pass_ipa_locality_cloning): Declare. * ipa-locality-cloning.cc: New file. * ipa-locality-cloning.h: New file. gcc/lto/ChangeLog: * lto-partition.cc (add_node_references_to_partition): Define. (create_partition): Likewise. (lto_locality_map): Likewise. (lto_promote_cross_file_statics): Add extra dumping. * lto-partition.h (lto_locality_map): Declare prototype. 
* lto.cc (do_whole_program_analysis): Handle flag_ipa_reorder_for_locality.
2025-04-15ipa-bit-cp: Fix adjusting value according to mask (PR119803)Martin Jambor2-3/+19
In my fix for PR 119318 I put the mask calculation in ipcp_bits_lattice::meet_with_1 above a final fix to the value so that all the bits in the value which are meaningless according to the mask have value zero, which has tripped a validator in PR 119803. This patch fixes that by moving the adjustment down. Even though the fix for PR 119318 did a similar thing in ipcp_bits_lattice::meet_with, the same is not necessary because that code path then feeds the new value and mask to ipcp_bits_lattice::set_to_constant which does the final adjustment correctly. In both places, however, Jakub proposed a better way of calculating cap_mask and so I have changed it accordingly. gcc/ChangeLog: 2025-04-15 Martin Jambor <mjambor@suse.cz> PR ipa/119803 * ipa-cp.cc (ipcp_bits_lattice::meet_with_1): Move the m_value adjustment according to m_mask below the adjustment of the latter according to cap_mask. Optimize the calculation of cap_mask a bit. (ipcp_bits_lattice::meet_with): Optimize the calculation of cap_mask a bit. gcc/testsuite/ChangeLog: 2025-04-15 Martin Jambor <mjambor@suse.cz> PR ipa/119803 * gcc.dg/ipa/pr119803.c: New test. Co-authored-by: Jakub Jelinek <jakub@redhat.com>
2025-04-15d: Fix internal compiler error: in visit, at d/decl.cc:838 [PR119799]Iain Buclaw3-5/+13
This was caused by a check in the D front-end disallowing static VAR_DECLs with a size of `0'. While empty structs in D are given the size `1', the same symbols coming from ImportC modules do in fact have no size, so allow C variables as well as array objects to pass the check. PR d/119799 gcc/d/ChangeLog: * decl.cc (DeclVisitor::visit (VarDeclaration *)): Check front-end type size before building the VAR_DECL. Allow C symbols to have a size of `0'. gcc/testsuite/ChangeLog: * gdc.dg/import-c/pr119799.d: New test. * gdc.dg/import-c/pr119799c.c: New test.
2025-04-15c++: prev declared hidden tmpl friend inst, cont [PR119807]Patrick Palka3-0/+71
When remapping existing specializations of a hidden template friend from a previous declaration to the new definition, we must remap only those specializations that match this new definition, but currently we remap all specializations (since they all appear in the same DECL_TEMPLATE_INSTANTIATIONS list of the most general template). Concretely, in the first testcase below, we form two specializations of the friend A::f, one with arguments {{0},{bool}} and another with arguments {{1},{bool}}. Later when instantiating B, we need to remap these specializations. During the B<0> instantiation we only want to remap the first specialization, and during the B<1> instantiation only the second specialization, but currently we remap both specializations twice. tsubst_friend_function needs to determine if an existing specialization matches the shape of the new definition, which is tricky in general, e.g. if the outer template parameters may not match up. Fortunately we don't have to reinvent the wheel here since is_specialization_of_friend seems to do exactly what we need. We can check this unconditionally, but I think it's only necessary when dealing with specializations formed from a class template scope previous declaration, hence the TMPL_ARGS_HAVE_MULTIPLE_LEVELS check. PR c++/119807 PR c++/112288 gcc/cp/ChangeLog: * pt.cc (tsubst_friend_function): Skip remapping an existing specialization if it doesn't match the shape of the new friend definition. gcc/testsuite/ChangeLog: * g++.dg/template/friend86.C: New test. * g++.dg/template/friend87.C: New test. Reviewed-by: Jason Merrill <jason@redhat.com>