path: root/gcc
Age | Commit message | Author | Files | Lines
2025-03-31c++: lambda in function template signature [PR119401]Jason Merrill3-0/+39
Here we instantiate the lambda three times in producing A<0>::f: 1) in tsubst_function_type, substituting the type of A<>::f 2) in tsubst_function_decl, substituting the parameters of A<>::f 3) in regenerate_decl_from_template when instantiating A<>::f The first one gets thrown away by maybe_rebuild_function_decl_type. Before r15-7202, we happily built all of them and mangled the result wrongly as lambda #3. After r15-7202, we try to mangle #3 as #1, which breaks because #1 is already mangled as #1. This patch avoids building #3 by suppressing regenerate_decl_from_template if the template signature includes a lambda, fixing the ICE. We now mangle the lambda as #2, which is still wrong. Addressing that should involve not calling tsubst_function_type from tsubst_function_decl, and building the type from the parms types in the first place rather than fixing it up in maybe_rebuild_function_decl_type. PR c++/119401 gcc/cp/ChangeLog: * pt.cc (regenerate_decl_from_template): Don't regenerate if the signature involves a lambda. gcc/testsuite/ChangeLog: * g++.dg/cpp2a/lambda-targ11.C: New test.
2025-03-31tree-optimization/119532 - ICE with fixed-point tail recursionRichard Biener2-0/+18
The following disables tail recursion optimization when fixed-point types are involved as we cannot generate -1 for all fixed-point types. PR tree-optimization/119532 * tree-tailcall.cc (process_assignment): FAIL for fixed-point typed functions. * gcc.dg/torture/pr119532.c: New testcase.
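To illustrate why a -1 constant of the function's type matters here, the following is a minimal sketch (not GCC's implementation) of the kind of rewrite a tail-recursion pass performs when the recursive result is subtracted. The function names are made up for illustration, and plain `long` stands in for the fixed-point types the message talks about.

```cpp
#include <cassert>

// Recursive form: the result of the recursive call is subtracted, so the
// call is not in tail position as written. Assumes n >= 0.
long alt_sum_recursive(long n) {
  if (n == 0) return 0;
  return n - alt_sum_recursive(n - 1);
}

// Hand-written loop equivalent. The pending result is kept in the form
// acc + mult * f(n); each step folds in "n - f(n - 1)" by adding mult * n
// and then multiplying mult by -1.
long alt_sum_loop(long n) {
  long acc = 0;
  long mult = 1;
  while (n != 0) {
    acc += mult * n;
    mult *= -1;  // this step needs a -1 value of the result type
    n -= 1;
  }
  return acc;
}
```

If the result type cannot represent -1, the multiplier update in the loop has no valid constant to use, which is consistent with the message's reason for giving up on fixed-point typed functions.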
2025-03-31arm: testsuite: fix vect-fmaxmin.c testRichard Earnshaw2-9/+15
This is another case of a test that was both an executable test requiring specific hardware and an assembler scan test. The requirement for the hardware was masking some useful testing that could be done (by scanning the assembly output) on almost all test runs. Fixed in a similar manner to fmaxmin{,-2}.c by splitting the test into two, one that scans the assembler output and one that executes the compiled code if suitable hardware is available. The masked issue was that this test was expecting vectorization to occur that was incorrect given the options passed. For correct vectorization we need -funsafe-math-optimizations as the vector version of the single-precision operation will apply a truncation of denormal values. gcc/testsuite/ChangeLog: * gcc.target/arm/vect-fmaxmin-2.c: New compile test. Split from ... * gcc.target/arm/vect-fmaxmin.c: ... here. Remove scan-assembler subtests. For both, add -funsafe-math-optimizations.
2025-03-31OpenMP: modify_call_for_omp_dispatch - fix invalid memory access after 'error' [PR119541]Tobias Burnus1-1/+2
OpenMP requires that the number of dispatch 'interop' clauses (ninterop) is less than or equal to the number of declare variant 'append_args' interop objects (nappend). While 'nappend < ninterop' was diagnosed as an error, processing continued, which led to an invalid out-of-bounds memory access. Solution: only process the first nappend 'interop' clauses. gcc/ChangeLog: PR middle-end/119541 * gimplify.cc (modify_call_for_omp_dispatch): Limit interop clauses processing by the number of append_args arguments.
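The fix described above amounts to clamping the loop bound after the error has been reported. A minimal sketch of that pattern follows; `fill_append_slots` and its parameters are hypothetical stand-ins and do not reflect the data structures used in gimplify.cc.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical helper: copies 'interop' clause values into the slots
// reserved by 'append_args' and returns how many were processed.
std::size_t fill_append_slots(const std::vector<int>& interop_clauses,
                              std::vector<int>& append_slots,
                              bool& error_reported) {
  const std::size_t ninterop = interop_clauses.size();
  const std::size_t nappend = append_slots.size();
  // The mismatch is diagnosed, but processing carries on afterwards.
  error_reported = nappend < ninterop;
  // Only the first nappend clauses are processed, so the write below
  // never goes past the end of append_slots.
  const std::size_t limit = std::min(ninterop, nappend);
  for (std::size_t i = 0; i < limit; ++i)
    append_slots[i] = interop_clauses[i];
  return limit;
}
```

With three clauses and two slots, the helper reports the error and still touches only the two valid slots.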
2025-03-31PR middle-end/119442: expr.cc: Fix vec_duplicate into vector boolean modesKyrylo Tkachov2-3/+23
In this testcase GCC tries to expand a VNx4BI vector:
  vector(4) <signed-boolean:4> _40;
  _39 = (<signed-boolean:4>) _24;
  _40 = {_39, _39, _39, _39};
This ends up in a scalarised sequence of bitfield insert operations, despite the fact that AArch64 provides a vec_duplicate pattern specifically for vec_duplicate into VNx4BI. The store_constructor code is overly conservative when trying vec_duplicate as it sees a requested VNx4BImode and an element mode of QImode, which I guess is the storage mode of BImode objects. The vec_duplicate expander in aarch64-sve.md explicitly allows QImode element modes so it should be safe to use it. This patch extends that mode check to allow such expanders. The testcase is heavily auto-reduced from a real application and is nonsensical in itself, but it does demonstrate the current problematic codegen. The testcase goes from:
  pfalse p15.b
  str p15, [sp, #6, mul vl]
  mov w0, 0
  ldr w2, [sp, 12]
  bfi w2, w0, 0, 4
  uxtw x2, w2
  bfi w2, w0, 4, 4
  uxtw x2, w2
  bfi w2, w0, 8, 4
  uxtw x2, w2
  bfi w2, w0, 12, 4
  str w2, [sp, 12]
  ldr p15, [sp, #6, mul vl]
into:
  whilelo p15.s, wzr, wzr
The whilelo could be optimised away into a pfalse of course, but the important part is that the bfis are gone. Bootstrapped and tested on aarch64-none-linux-gnu. Signed-off-by: Kyrylo Tkachov <ktkachov@nvidia.com> gcc/ PR middle-end/119442 * expr.cc (store_constructor): Also allow element modes explicitly accepted by target vec_duplicate pattern. gcc/testsuite/ PR middle-end/119442 * gcc.target/aarch64/vls_sve_vec_dup_1.c: New test.
2025-03-31target/119010 - add mode attribute to *vmovv16si_constm1_pternlog_false_depRichard Biener1-1/+5
Like the other instances. This avoids ;; 1--> b 0: i6540 {xmm2=const_vector;unspec[xmm2] 38;} :nothing PR target/119010 * config/i386/sse.md (*vmov<mode>_constm1_pternlog_false_dep): Add mode attribute.
2025-03-31target/119010 - Zen4/Zen5 reservations for movlhps loadsRichard Biener1-2/+2
The following fixes up the ssemov2 type introduction, amending the znver4_sse_mov_fp_load reservation. This fixes ;; 14--> b 0: i1436 xmm6=vec_concat(xmm6,[ax+0x8]) :nothing PR target/119010 * config/i386/zn4zn5.md (znver4_sse_mov_fp_load, znver5_sse_mov_fp_load): Also match ssemov2.
2025-03-31target/119010 - reservations for Zen4/Zen5 movhlps to memoryRichard Biener1-0/+14
The following adds missing reservations for the store variant of sselog reservations covering ;; 112--> b 0: i1499 [dx-0x10]=vec_select(xmm10,parallel) :nothing PR target/119010 * config/i386/zn4zn5.md (znver4_sse_log_evex_store, znver5_sse_log_evex_store): New reservations.
2025-03-31target/119010 - fixup Zen4/Zen5 fp<->int convert reservationsRichard Biener1-3/+10
They were using ssecvt instead of sseicvt, I've also added handling for sseicvt2 which was introduced without fixing up automata, and the relevant instruction uses DFmode. IMO this is a quite messy area that could need TLC in the machine description itself. PR target/119010 * config/i386/zn4zn5.md (znver4_sse_icvt): Use sseicvt. (znver4_sse_icvt_store): Likewise. (znver5_sse_icvt_store): Likewise. (znver4_sse_icvt2): New.
2025-03-31target/119010 - handle DFmode in SSE divide reservations for Zen4/Zen5Richard Biener1-3/+3
Like the other DFmode cases. PR target/119010 * config/i386/zn4zn5.md (znver4_sse_div_pd, znver4_sse_div_pd_load, znver5_sse_div_pd_load): Handle DFmode.
2025-03-31target/119010 - add reservations for integer vector compares to zen4/zen5Richard Biener1-6/+6
The following handles TI, OI and XI mode in the respective EVEX compare reservations that do not use memory (I've not yet run into ones with). The znver automata has separate reservations for integer compares (but only for zen1, for zen2 and zen3 there are no compare reservations at all), but I don't see why that should be necessary here. PR target/119010 * config/i386/zn4zn5.md (znver4_sse_cmp_avx128, znver5_sse_cmp_avx128): Handle TImode. (znver4_sse_cmp_avx256, znver5_sse_cmp_avx256): Handle OImode. (znver4_sse_cmp_avx512, znver5_sse_cmp_avx512): Handle XImode.
2025-03-31target/119010 - missing reservations for Zen4/5 and SSE comparesRichard Biener1-3/+2
There's the znver4_sse_test reservation which matches the memory-less SSE compares but currently requires prefix_extra == 1. The old znver automata in this case sometimes uses znver1-double instead of znver1-direct, but it's quite a maze. The following simply drops the prefix_extra requirement, but I have no idea what I'm doing here. There doesn't seem to be any documentation on the scheduler relevant attributes used, or at least I cannot find that. PR target/119010 * config/i386/zn4zn5.md (znver4_sse_test): Drop test of prefix_extra attribute.
2025-03-31target/119010 - fixup zn4zn5 reservation for move from const_vectorRichard Biener1-1/+8
movv8si_internal uses sselog1 and V4SFmode for an instruction like
  (insn 363 2437 371 97 (set (reg:V8SI 46 xmm10 [1125])
      (const_vector:V8SI [ (const_int 0 [0]) repeated x8 ])) "ComputeNonbondedUtil.C":185:21 2402 {movv8si_internal}
This wasn't caught by the existing znver4_sse_log1 reservation. I think the znver automaton catches this with the generic
  (define_insn_reservation "znver1_sse_log1" 1
    (and (eq_attr "cpu" "znver1,znver2,znver3")
         (and (eq_attr "type" "sselog1")
              (eq_attr "memory" "none")))
    "znver1-direct,znver1-fp1|znver1-fp2")
which does not look at the mode at all. The zn4zn5 automaton lacks this and instead has separated store and load-store reservations in odd ways. The following renames the store one and introduces a none variant. PR target/119010 * config/i386/zn4zn5.md (znver4_sse_log1): Rename to znver4_sse_log1_store. (znver5_sse_log1): Rename to znver5_sse_log1_store. (znver4_sse_log1): New memory-less variant.
2025-03-31c++: Honor noipa attribute for FE nothrow discovery [PR119518]Jakub Jelinek2-1/+22
The following testcase has different code generation in bar depending on whether foo is defined or just declared. That is undesirable when foo has the noipa attribute: that attribute is documented to be a black box between caller and callee, so the caller shouldn't know about any implicitly determined properties of the callee and the callee shouldn't know about its callers. E.g. the ipa-pure-const passes, including nothrow discovery in there, all honor the noipa attribute, but the FE did not. 2025-03-31 Jakub Jelinek <jakub@redhat.com> PR c++/119518 * decl.cc (finish_function): Don't set TREE_NOTHROW for functions with "noipa" attribute even when we can prove they can't throw. * g++.dg/opt/pr119518.C: New test.
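A minimal sketch of the kind of caller/callee pair the message describes. The function bodies are assumptions made for illustration and this is not the committed pr119518.C test; only the names foo and bar and the noipa attribute come from the message.

```cpp
#include <cassert>

// foo provably cannot throw, but noipa asks the compiler to treat it as a
// black box, so callers should not benefit from that knowledge.
__attribute__((noipa)) int foo(int x) { return x + 1; }

// The concern in the message is that code generated for bar (for example
// whether the handler below is kept) should not depend on whether the
// body of foo happens to be visible.
int bar(int x) {
  try {
    return foo(x);
  } catch (...) {
    return -1;
  }
}
```

Comparing the assembly for bar with foo defined and with foo only declared would be one way to observe the difference the patch removes.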
2025-03-31Daily bump.GCC Administrator6-1/+201
2025-03-30Docs: make regenerate-opt-urlsSandra Loosemore3-4/+4
gcc/c-family/ChangeLog * c.opt.urls: Regenerate. gcc/d/ChangeLog * lang.opt.urls: Regenerate. gcc/m2/ChangeLog * lang.opt.urls: Regenerate.
2025-03-30Optimize string constructorJan Hubicka2-0/+23
This patch improves code generation for string constructors. We currently have _M_construct, which takes as parameters two iterators (begin/end pointers to another string) and produces a new string. This patch adds a special case of the constructor where, instead of beginning/end pointers, we readily know the string size, and also a special case when we know that the source is 0-terminated. This happens commonly when producing string copies. Moreover, currently ipa-prop is not able to propagate the information that beg-end is a known constant (the copied string size), which makes it impossible for the inliner to spot the common case where the string size is known to be shorter than 15 bytes and fits in the local buffer. Finally, I made the new constructor inline. Because it is explicitly instantiated without C++20 constexpr we do not produce an implicit instantiation (as required by the standard), which prevents inlining, ipa-modref and any other IPA analysis from happening. I think we need to make many of the other functions inline, since optimization across string manipulation is quite important. There is PR94960 to track this issue. Bootstrapped/regtested x86_64-linux, OK? libstdc++-v3/ChangeLog: PR tree-optimization/103827 PR tree-optimization/80331 PR tree-optimization/87502 * config/abi/pre/gnu.ver: Add version for _M_construct<bool> * include/bits/basic_string.h: (basic_string::_M_construct<bool>): Declare. (basic_string constructors): Use it. * include/bits/basic_string.tcc: (basic_string::_M_construct<bool>): New template. * src/c++11/string-inst.cc: Instantiated S::_M_construct<bool>. gcc/testsuite/ChangeLog: * g++.dg/tree-ssa/pr80331.C: New test. * g++.dg/tree-ssa/pr87502.C: New test.
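The benefit of a size-based construction path can be sketched with a toy small-string class. Everything here (the `TinyString` name, its layout, the 15-byte capacity constant) is illustrative and does not mirror libstdc++'s basic_string or the new _M_construct overload.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>

// Toy small-string type, for illustration only.
class TinyString {
 public:
  static constexpr std::size_t kLocalCapacity = 15;

  // The length is passed in directly, so no end pointer or strlen call is
  // needed. When len is a compile-time constant at the call site, the
  // small-buffer test below can be resolved after inlining.
  TinyString(const char* src, std::size_t len) : size_(len) {
    data_ = len <= kLocalCapacity ? local_ : new char[len + 1];
    std::memcpy(data_, src, len);
    data_[len] = '\0';
  }
  TinyString(const TinyString&) = delete;
  TinyString& operator=(const TinyString&) = delete;
  ~TinyString() {
    if (data_ != local_) delete[] data_;
  }

  bool is_local() const { return data_ == local_; }
  std::size_t size() const { return size_; }
  const char* c_str() const { return data_; }

 private:
  char local_[kLocalCapacity + 1];  // in-object buffer for short strings
  char* data_;
  std::size_t size_;
};
```

Constructing `TinyString("hello", 5)` stays in the local buffer, while a 19-character source falls back to the heap.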
2025-03-30Doc: Clean up New/Delete Builtins manual sectionSandra Loosemore1-12/+29
I noticed that the "New/Delete Builtins" section failed to explicitly name or describe the arguments of the builtin functions it purported to document, outside of using them in an example. I've fixed that and cleaned up the whole section. gcc/ChangeLog * doc/extend.texi (New/Delete Builtins): Clean up the text and explicitly list the builtins being documented.
2025-03-30Doc: Move Integer Overflow Builtins section [PR42270]Sandra Loosemore1-150/+153
This is part of an incremental effort to make the chapter on GCC extensions better organized by grouping/rearranging sections by topic. gcc/ChangeLog PR other/42270 * doc/extend.texi (Numeric Builtins): Move Integer Overflow Builtins section here, as a subsection.
2025-03-30Doc: Organize atomic memory builtins documentation [PR42270]Sandra Loosemore1-175/+196
This is part of an incremental effort to make the chapter on GCC extensions better organized by grouping/rearranging sections by topic. This installment adds a container section to hold documentation for both the __atomic and __sync builtins, reordering them so that the new __atomic interface is presented before the legacy __sync one. I also incorporated material from the separate x86 transactional memory section directly into the __atomic builtins documentation instead of retaining that as a parallel section. gcc/ChangeLog PR other/42270 * doc/extend.texi (Atomic Memory Access): New section. (__sync Builtins): Make it a subsection of the above. (Atomic Memory Access): Likewise. (x86 specific memory model extensions for transactional memory): Delete this section, incorporating the text into the discussion of __atomic builtins.
2025-03-30Doc: Break up and rearrange the "other builtins" section [PR42270]Sandra Loosemore2-1285/+1333
This is part of an incremental effort to make the chapter on GCC extensions better organized by grouping/rearranging sections by topic. The "Other Builtins" section had become a catch-all for all sorts of things with very little organization or attempt to differentiate between important information (e.g., GCC treats a gazillion library functions as builtins by default) from obscure builtins provided primarily as internal interfaces. I've split it up into various pieces and attempted to move the more important or useful-to-users documentation earlier in the chapter. What's left of the section is still a jumbled mess... but at least it's a smaller jumbled mess. gcc/ChangeLog PR other/42270 * doc/extend.texi (Built-in Functions): Incorporate some text formerly in "Other Builtins" into the introduction. Adjust menu for new sections. (Library Builtins): New section, split from "Other Builtins". (Numeric Builtins): Likewise. (Stack Allocation): Likewise. (Constructing Calls): Move __builtin_call_with_static_chain here. (Object Size Checking): Minor copy-editing. (Other Builtins): Move text to new sections listed above. Delete duplicate docs for object-size checking builtins. * doc/invoke.texi (C dialect options): Update @xref for -fno-builtin.
2025-03-30Doc: Move builtin documentation to a new chapter [PR42270]Sandra Loosemore2-15/+37
This is part of an incremental effort to make the documentation for GCC extensions better organized by grouping/rearranging sections by topic. I was originally intending to consolidate all the sections documenting builtins as subsections of a new container section within the C extensions chapter, but I ran into a technical limitation of Texinfo: it only supports sectioning depth up to @subsubsection, and we already had quite a few of those in the target-specific builtins sections. So instead I have pulled all the existing sections out into a new chapter. This actually makes sense since some of the builtins are specific to C++ anyway and are not C language extensions at all. Subsequent patches in this series will move things around within the new chapter; this one just adds the new container node and adjusts the menus. gcc/ChangeLog PR other/42270 * doc/extend.texi (C Extensions): Move menu items for builtin-related sections to... (Built-in Functions): New chapter. * doc/gcc.texi (Introduction): Add menu entry for new chapter.
2025-03-30Doc: Add a container section to consolidate attribute documentation [PR42270]Sandra Loosemore1-73/+101
This is part of an incremental effort to make the chapter on GCC extensions better organized by grouping/rearranging sections by topic. Note that this patch does not address the restructuring/rewrite suggested by PR88472 or PR102397, beyond adding a very short introduction to the new container section that is more explicit about both syntaxes being accepted as a GNU extension. gcc/ChangeLog PR other/42270 * doc/extend.texi (Attributes): New section. (Function Attributes): Make it a subsection of the new section. (Variable Attributes): Likewise. (Type Attributes): Likewise. (Label Attributes): Likewise. (Enumerator Attributes): Likewise. (Attribute Syntax): Likewise.
2025-03-30Doc: Remove separate "Target Format Checks" section [PR42270]Sandra Loosemore1-56/+39
This is part of an incremental effort to make the chapter on GCC extensions better organized by grouping/rearranging sections by topic. Following the last round of patches, there's a leftover section "Target Format Checks" that didn't fit into any category. It seems best to merge this material into the main discussion of the "format" attribute, in particular because that discussion already contains similar discussion for mingw/Windows targets. gcc/ChangeLog PR other/42270 * doc/extend.texi (Function Attributes): Merge text from "Target Format Checks" into the main discussion of the format and format_arg attributes. (Target Format Checks): Delete section.
2025-03-30testsuite: Fix up atomic-inst-ldlogic.cJakub Jelinek1-35/+35
r15-8956 changed in the test: -/* { dg-final { scan-assembler-times "ldclr\t" 16} */ +/* { dg-final { scan-assembler-times "ldclr\t" 16 } */ which made it even worse than before, when the directive has been silently ignored because it didn't match the regex for directives. Now it matches it but is unbalanced. The following patch fixes it and adds space after all the other scan-assembler-times counts in the file. 2025-03-30 Jakub Jelinek <jakub@redhat.com> * gcc.target/aarch64/atomic-inst-ldlogic.c: Fix another unbalanced {} directive problem. Add space after all scan-assembler-times counts.
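The unbalanced directive quoted above is the sort of thing a simple brace counter can flag. This is a deliberately naive sketch with a made-up function name; it ignores quoting and escapes, which the real Tcl-based directive parsing does not.

```cpp
#include <cassert>
#include <string>

// Returns true when every '{' in text has a matching '}' and no '}'
// appears before its opener.
bool braces_balanced(const std::string& text) {
  int depth = 0;
  for (char c : text) {
    if (c == '{') {
      ++depth;
    } else if (c == '}') {
      if (--depth < 0) return false;
    }
  }
  return depth == 0;
}
```

Applied to the directive from the message, the form with a single closing brace is reported as unbalanced and the form with two closing braces passes.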
2025-03-30aarch64: Changed CRC test.Mariam Arutunian1-2/+2
Fixed the iteration number in crc-crc32c-data16.c test from 8 to 16 to match the test name. gcc/testsuite * gcc.target/aarch64/crc-crc32c-data16.c: Fix iteration count to match testname.
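Why the iteration count has to match the data width can be seen in a bitwise CRC-32C sketch. This is not the code from crc-crc32c-data16.c; the function names are assumptions, and the constant is the reflected Castagnoli polynomial.

```cpp
#include <cassert>
#include <cstdint>

// Reflected CRC-32C (Castagnoli) polynomial.
constexpr std::uint32_t kCrc32cPoly = 0x82F63B78u;

// Feeds the `bits` low-order bits of data into the CRC, least significant
// bit first. One loop iteration consumes exactly one data bit.
std::uint32_t crc32c_bits(std::uint32_t crc, std::uint32_t data, int bits) {
  crc ^= data;
  for (int i = 0; i < bits; ++i)
    crc = (crc >> 1) ^ (kCrc32cPoly & (0u - (crc & 1u)));
  return crc;
}

// 8-bit data needs 8 iterations.
std::uint32_t crc32c_u8(std::uint32_t crc, std::uint8_t data) {
  return crc32c_bits(crc, data, 8);
}

// 16-bit data needs 16 iterations.
std::uint32_t crc32c_u16(std::uint32_t crc, std::uint16_t data) {
  return crc32c_bits(crc, data, 16);
}
```

One 16-bit step with 16 iterations gives the same result as feeding the low byte and then the high byte through two 8-bit steps, which is a convenient sanity check for the loop bound.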
2025-03-30Alpha: Add option to avoid data races for partial writes [PR117759]Maciej W. Rozycki24-51/+837
Similarly to data races with 8-bit byte or 16-bit word quantity memory writes on non-BWX Alpha implementations we have the same problem even on BWX implementations with partial memory writes produced for unaligned stores as well as block memory move and clear operations. This happens at the boundaries of the area written where we produce unprotected RMW sequences, such as for example: ldbu $1,0($3) stw $31,8($3) stq $1,0($3) to zero a 9-byte member at the byte offset of 1 of a quadword-aligned struct, happily clobbering a 1-byte member at the beginning of said struct if concurrent write happens while executing on the same CPU such as in a signal handler or a parallel write happens while executing on another CPU such as in another thread or via a shared memory segment. To guard against these data races with partial memory write accesses introduce the `-msafe-partial' command-line option that instructs the compiler to protect boundaries of the data quantity accessed by instead using a longer code sequence composed of narrower memory writes where suitable machine instructions are available (i.e. with BWX targets) or atomic RMW access sequences where byte and word memory access machine instructions are not available (i.e. with non-BWX targets). Owing to the desire of branch avoidance there are redundant overlapping writes in unaligned cases where STQ_U operations are used in the middle of a block so as to make sure no part of data to be written has been lost regardless of run-time alignment. For the non-BWX case it means that with blocks whose size is not a multiple of 8 there are additional atomic RMW sequences issued towards the end of the block in addition to the always required pair enclosing the block from each end. Only one such additional atomic RMW sequence is actually required, but code currently issues two for the sake of simplicity. 
An improvement might be added to `alpha_expand_unaligned_store_words_safe_partial' in the future, by folding `alpha_expand_unaligned_store_safe_partial' code for handling multi-word blocks whose size is not a multiple of 8 (i.e. with a trailing partial-word part). It would improve performance a bit, but current code is correct regardless. Update test cases with `-mno-safe-partial' where required and add new ones accordingly. In some cases GCC chooses to open-code block memory write operations, so with non-BWX targets `-msafe-partial' will in the usual case have to be used together with `-msafe-bwa'. Credit to Magnus Lindholm <linmag7@gmail.com> for sharing hardware for the purpose of verifying the BWX side of this change. gcc/ PR target/117759 * config/alpha/alpha-protos.h (alpha_expand_unaligned_store_safe_partial): New prototype. * config/alpha/alpha.cc (alpha_expand_movmisalign) (alpha_expand_block_move, alpha_expand_block_clear): Handle TARGET_SAFE_PARTIAL. (alpha_expand_unaligned_store_safe_partial) (alpha_expand_unaligned_store_words_safe_partial) (alpha_expand_clear_safe_partial_nobwx): New functions. * config/alpha/alpha.md (insvmisaligndi): Handle TARGET_SAFE_PARTIAL. * config/alpha/alpha.opt (msafe-partial): New option. * config/alpha/alpha.opt.urls: Regenerate. * doc/invoke.texi (Option Summary, DEC Alpha Options): Document the new option. gcc/testsuite/ PR target/117759 * gcc.target/alpha/memclr-a2-o1-c9-ptr.c: Add `-mno-safe-partial'. * gcc.target/alpha/memclr-a2-o1-c9-ptr-safe-partial.c: New file. * gcc.target/alpha/memcpy-di-unaligned-dst.c: New file. * gcc.target/alpha/memcpy-di-unaligned-dst-safe-partial.c: New file. * gcc.target/alpha/memcpy-di-unaligned-dst-safe-partial-bwx.c: New file. * gcc.target/alpha/memcpy-si-unaligned-dst.c: New file. * gcc.target/alpha/memcpy-si-unaligned-dst-safe-partial.c: New file. * gcc.target/alpha/memcpy-si-unaligned-dst-safe-partial-bwx.c: New file. * gcc.target/alpha/stlx0.c: Add `-mno-safe-partial'. 
* gcc.target/alpha/stlx0-safe-partial.c: New file. * gcc.target/alpha/stlx0-safe-partial-bwx.c: New file. * gcc.target/alpha/stqx0.c: Add `-mno-safe-partial'. * gcc.target/alpha/stqx0-safe-partial.c: New file. * gcc.target/alpha/stqx0-safe-partial-bwx.c: New file. * gcc.target/alpha/stwx0.c: Add `-mno-safe-partial'. * gcc.target/alpha/stwx0-bwx.c: Add `-mno-safe-partial'. Refer to stwx0.c rather than copying its code and also verify no LDQ_U or STQ_U instructions have been produced. * gcc.target/alpha/stwx0-safe-partial.c: New file. * gcc.target/alpha/stwx0-safe-partial-bwx.c: New file.
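The lost-update hazard described in this commit can be simulated deterministically in portable code. The sketch below is not Alpha code and not what GCC emits; the function names are made up, and the `between` callback stands in for a concurrent writer that runs between the load and the final store.

```cpp
#include <cassert>
#include <cstring>
#include <functional>

using Interleaved = std::function<void(unsigned char*)>;

// Mimics the ldbu / stw / stq sequence: read byte 0, then write it back
// as part of a full 8-byte store that zeroes bytes 1..7.
void clear_bytes_1_to_9_rmw(unsigned char* p, const Interleaved& between) {
  unsigned char quad[8] = {p[0]};  // byte 0 preserved, bytes 1..7 zero
  std::memset(p + 8, 0, 2);        // zero bytes 8..9
  between(p);                      // a concurrent update lands here
  std::memcpy(p, quad, 8);         // writes the stale byte 0 back
}

// Touches only the bytes that belong to the member being cleared.
void clear_bytes_1_to_9_narrow(unsigned char* p, const Interleaved& between) {
  between(p);
  std::memset(p + 1, 0, 9);
}
```

If the callback changes byte 0 from 1 to 2, the first variant leaves it at 1 (the update is lost) while the second leaves it at 2.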
2025-03-30Alpha: Add option to avoid data races for sub-longword memory stores [PR117759]Maciej W. Rozycki19-6/+558
With non-BWX Alpha implementations we have a problem of data races where an 8-bit byte or 16-bit word quantity is to be written to memory, in that in those cases we use an unprotected RMW access of a 32-bit longword or 64-bit quadword width. If contents of the longword or quadword accessed outside the byte or word to be written are changed midway through by a concurrent write executing on the same CPU such as by a signal handler or a parallel write executing on another CPU such as by another thread or via a shared memory segment, then the concluding write of the RMW access will clobber them. This is especially important for the safety of RCU algorithms, but is otherwise an issue anyway. To guard against these data races with byte and aligned word quantities introduce the `-msafe-bwa' command-line option (standing for Safe Byte & Word Access) that instructs the compiler to instead use an atomic RMW access sequence where byte and word memory access machine instructions are not available. There is no change to code produced for BWX targets. It would be sufficient for the secondary reload handler to use a pair of scratch registers, as requested by `reload_out<mode>', but it would end with poor code produced as one of the scratches would be occupied by data retrieved and the other one would have to be reloaded with repeated calculations, all within the LL/SC sequence. Therefore I chose to add a dedicated `reload_out<mode>_safe_bwa' handler and ask for more scratches there by defining a 256-bit OI integer mode. While reload is documented in our manual to support an arbitrary number of scratches, in reality it hasn't been implemented for IRA: /* ??? It would be useful to be able to handle only two, or more than three, operands, but for now we can only handle the case of having exactly three: output, input and one temp/scratch. */ and it seems to be the case for LRA as well. Do what everyone else does then and just have one wide multi-register scratch.
I note that the atomic sequences emitted are suboptimal performance-wise as the looping branch for the unsuccessful completion of the sequence points backwards, which means it will be predicted as taken despite that in most cases it will fall through. I do not see it as a deficiency of this change proposed as it takes care of recording that the branch is unlikely to be taken, by calling `alpha_emit_unlikely_jump'. Therefore generic code elsewhere should instead be investigated and adjusted accordingly for the arrangement to actually take effect. Add test cases accordingly. There are notable regressions between a plain `-mno-bwx' configuration and a `-mno-bwx -msafe-bwa' one: FAIL: gcc.dg/torture/inline-mem-cpy-cmp-1.c -O0 execution test FAIL: gcc.dg/torture/inline-mem-cpy-cmp-1.c -O1 execution test FAIL: gcc.dg/torture/inline-mem-cpy-cmp-1.c -O2 execution test FAIL: gcc.dg/torture/inline-mem-cpy-cmp-1.c -O3 -g execution test FAIL: gcc.dg/torture/inline-mem-cpy-cmp-1.c -Os execution test FAIL: gcc.dg/torture/inline-mem-cpy-cmp-1.c -O2 -flto -fno-use-linker-plugin -flto-partition=none execution test FAIL: gcc.dg/torture/inline-mem-cpy-cmp-1.c -O2 -flto -fuse-linker-plugin -fno-fat-lto-objects execution test FAIL: g++.dg/init/array25.C -std=c++17 execution test FAIL: g++.dg/init/array25.C -std=c++98 execution test FAIL: g++.dg/init/array25.C -std=c++26 execution test They come from the fact that these test cases play tricks with alignment and end up calling code that expects a reference to aligned data but is handed one to unaligned data. This doesn't cause a visible problem with plain `-mno-bwx' code, because the resulting alignment exception is fixed up by Linux. There's no such handling currently implemented for LDL_L or LDQ_L instructions (which are first in the sequence) and consequently the offender is issued with SIGBUS instead. 
Suitable handling will be added to Linux to complement this change that will emulate the trapping instructions[1], so these interim regressions are seen as harmless and expected. References: [1] "Alpha: Emulate unaligned LDx_L/STx_C for data consistency", <https://lore.kernel.org/r/alpine.DEB.2.21.2502181912230.65342@angie.orcam.me.uk/> gcc/ PR target/117759 * config/alpha/alpha-modes.def (OI): New integer mode. * config/alpha/alpha-protos.h (alpha_expand_mov_safe_bwa): New prototype. * config/alpha/alpha.cc (alpha_expand_mov_safe_bwa): New function. (alpha_secondary_reload): Handle TARGET_SAFE_BWA. * config/alpha/alpha.md (aligned_store_safe_bwa) (unaligned_store<mode>_safe_bwa, reload_out<mode>_safe_bwa) (reload_out<mode>_unaligned_safe_bwa): New expanders. (mov<mode>, movcqi, reload_out<mode>_aligned): Handle TARGET_SAFE_BWA. (reload_out<mode>): Guard against TARGET_SAFE_BWA. * config/alpha/alpha.opt (msafe-bwa): New option. * config/alpha/alpha.opt.urls: Regenerate. * doc/invoke.texi (Option Summary, DEC Alpha Options): Document the new option. gcc/testsuite/ PR target/117759 * gcc.target/alpha/stb.c: New file. * gcc.target/alpha/stb-bwa.c: New file. * gcc.target/alpha/stb-bwx.c: New file. * gcc.target/alpha/stba.c: New file. * gcc.target/alpha/stba-bwa.c: New file. * gcc.target/alpha/stba-bwx.c: New file. * gcc.target/alpha/stw.c: New file. * gcc.target/alpha/stw-bwa.c: New file. * gcc.target/alpha/stw-bwx.c: New file. * gcc.target/alpha/stwa.c: New file. * gcc.target/alpha/stwa-bwa.c: New file. * gcc.target/alpha/stwa-bwx.c: New file.
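The idea of replacing an unprotected read-modify-write with an atomic one can be sketched in portable C++ using a compare-exchange retry loop, which plays the role of the LL/SC sequence mentioned above. This is an illustration under that assumption, not the code the Alpha backend emits; `store_byte_atomic` is a hypothetical name.

```cpp
#include <atomic>
#include <cassert>
#include <cstdint>

// Replaces one byte of a 32-bit word without disturbing its neighbours.
// byte_index counts from the least significant byte (0..3).
void store_byte_atomic(std::atomic<std::uint32_t>& word, unsigned byte_index,
                       std::uint8_t value) {
  const unsigned shift = byte_index * 8;
  const std::uint32_t mask = std::uint32_t{0xFF} << shift;
  std::uint32_t old = word.load(std::memory_order_relaxed);
  std::uint32_t desired;
  do {
    desired = (old & ~mask) | (std::uint32_t{value} << shift);
    // On failure, old is refreshed with the current contents and the
    // merge is recomputed, so a concurrent change to another byte of the
    // same word is never overwritten with stale data.
  } while (!word.compare_exchange_weak(old, desired,
                                       std::memory_order_relaxed));
}
```

Starting from 0x11223344, storing 0xAA into byte 1 yields 0x1122AA44, with the other three bytes untouched.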
2025-03-30IRA+LRA: Let the backend request to split basic blocksMaciej W. Rozycki3-4/+11
The next change for Alpha will produce extra labels and branches in reload, which in turn requires basic blocks to be split at completion. We do this already for functions that can trap, so just extend the arrangement with a flag for the backend to use whenever it finds it necessary. gcc/ * function.h (struct function): Add `split_basic_blocks_after_reload' member. * lra.cc (lra): Handle it. * reload1.cc (reload): Likewise.
2025-03-30Alpha: Export `emit_unlikely_jump' for a subsequent change to useMaciej W. Rozycki2-9/+11
Rename `emit_unlikely_jump' function to `alpha_emit_unlikely_jump', so as to avoid namespace pollution, updating callers accordingly and export it for use in the machine description. Make it return the insn emitted. gcc/ * config/alpha/alpha-protos.h (alpha_emit_unlikely_jump): New prototype. * config/alpha/alpha.cc (emit_unlikely_jump): Rename to... (alpha_emit_unlikely_jump): ... this. Return the insn emitted. (alpha_split_atomic_op, alpha_split_compare_and_swap) (alpha_split_compare_and_swap_12, alpha_split_atomic_exchange) (alpha_split_atomic_exchange_12): Update call sites accordingly.
2025-03-30gcc/testsuite/g++.dg/gomp/append-args-8.C: Fix scan-dump-treeTobias Burnus1-1/+1
gcc/testsuite/ChangeLog: * g++.dg/gomp/append-args-8.C: Remove bogus '3' after \.\[0-9\]+ pattern.
2025-03-30gcc/mingw: Align `.refptr.` to 8-byte boundaries for 64-bit targetsLIU Hao1-0/+1
Windows only requires sections to be aligned on a 4-byte boundary. This used to work because in binutils the `.rdata` section is over-aligned to a 16-byte boundary, which will be fixed in the future. This matches the output of Clang. Signed-off-by: LIU Hao <lh_mouse@126.com> Signed-off-by: Jonathan Yong <10walls@gmail.com> gcc/ChangeLog: * config/mingw/winnt.cc (mingw_pe_file_end): Add `.p2align`.
2025-03-30Daily bump.GCC Administrator5-1/+78
2025-03-29testsuite: arm: fixup more dg-final syntaxSam James1-5/+5
... as Richard E mentioned on the ML. Followup to r15-8956-ge90d6c2639c392. gcc/testsuite/ChangeLog: * gcc.target/arm/short-vfp-1.c: Add whitespace around brace.
2025-03-29c++/modules: unexported friend templateJason Merrill4-11/+40
Here we were failing to match the injected friend declaration to the definition because the latter isn't exported. But the friend is attached to the module, so we need to look for any reachable declaration in that module, not just the exports. The duplicate_decls change is to avoid clobbering DECL_MODULE_IMPORT_P on the imported definition; matching an injected friend doesn't change that it's imported. I considered checking get_originating_module == 0 or !decl_defined_p instead of DECL_UNIQUE_FRIEND_P there, but I think this situation is specific to friends. I removed an assert because we have a test for the same condition a few lines above. gcc/cp/ChangeLog: * decl.cc (duplicate_decls): Don't clobber DECL_MODULE_IMPORT_P with an injected friend. * name-lookup.cc (check_module_override): Look at all reachable decls in decl's originating module. gcc/testsuite/ChangeLog: * g++.dg/modules/friend-9_a.C: New test. * g++.dg/modules/friend-9_b.C: New test.
2025-03-29libiberty, gcc: Add memrchr to libiberty and use it [PR119283].Iain Sandoe3-2/+8
This adds an implementation of memrchr to libiberty and arranges to configure gcc to use it, if the host does not have it. PR cobol/119283 gcc/ChangeLog: * config.in: Regenerate. * configure: Regenerate. * configure.ac: Check for host memrchr. include/ChangeLog: * libiberty.h (memrchr): New. libiberty/ChangeLog: * Makefile.in: Add memrchr build rules. * config.in: Regenerate. * configure: Regenerate. * configure.ac: Check for memrchr. * functions.texi: Document memrchr. * memrchr.c: New file. Signed-off-by: Iain Sandoe <iain@sandoe.co.uk>
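For readers unfamiliar with memrchr: it searches the first n bytes of a buffer backwards and returns a pointer to the last byte equal to c, or NULL if there is none. The following is only a minimal sketch of such a fallback, not the code added in libiberty/memrchr.c; the name sketch_memrchr is hypothetical and was chosen so that it cannot clash with a host-provided memrchr.

```c
#include <stddef.h>

/* Hypothetical stand-in for a host memrchr: scan the N bytes starting
   at S from the end towards the beginning and return a pointer to the
   last byte equal to (unsigned char) C, or NULL if no byte matches.  */
void *
sketch_memrchr (const void *s, int c, size_t n)
{
  const unsigned char *p = (const unsigned char *) s + n;
  const unsigned char uc = (unsigned char) c;

  while (n-- > 0)
    if (*--p == uc)
      return (void *) p;

  return NULL;
}
```

A typical use is finding the last separator in a length-delimited buffer that is not NUL-terminated, where strrchr cannot be applied.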
2025-03-29jit, Darwin: Update exports with ABI 28 through 34.Iain Sandoe1-0/+21
Synchronise the darwin export list with the current map. gcc/jit/ChangeLog: * libgccjit.exports: Add symbols for ABI 28 to 34. Signed-off-by: Iain Sandoe <iain@sandoe.co.uk>
2025-03-29c++: optimize push_to_top_level [PR64500]Jason Merrill1-8/+12
Profiling showed that the loop to save away IDENTIFIER_BINDINGs from open binding levels was taking 5% of total compilation time in the PR116285 testcase. This turned out to be because we were unnecessarily trying to do this for namespaces, whose bindings are found through DECL_NAMESPACE_BINDINGS, not IDENTIFIER_BINDING. As a result we would frequently loop through everything in std::, checking whether it needs to be stored, and never storing anything. This change actually appears to speed up compilation for the PR116285 testcase by ~20%. The replaced comments referred either to long-replaced handling of classes and templates, or to wanting b to point to :: when the loop exits. PR c++/64500 PR c++/116285 gcc/cp/ChangeLog: * name-lookup.cc (push_to_top_level): Don't try to store_bindings for namespace levels.
2025-03-29c++: Fix comment typoJakub Jelinek1-2/+2
Found a typo in a comment. 2025-03-29 Jakub Jelinek <jakub@redhat.com> * name-lookup.cc (maybe_lazily_declare): Fix comment typo, anout -> about.
2025-03-29c++/modules: Fix modules and LTO with header units [PR118961]Nathaniel Shead12-9/+104
This patch makes some adjustments required to get a simple modules testcase working with LTO. There are two main issues fixed. Firstly, modules only streams the maybe-in-charge constructor, and any clones are recreated on stream-in. These clones are copied from the existing function decl and then adjusted. This caused issues because the clones were getting incorrectly marked as abstract, since after clones have been created (in the imported file) the maybe-in-charge decl gets marked as abstract. So this patch just ensures that clones are always created as non-abstract. The second issue is that we need to explicitly tell cgraph that explicit instantiations need to be emitted, otherwise LTO will elide them (as they don't necessarily appear to be used directly) and cause link errors. Additionally, expand_or_defer_fn doesn't setup comdat groups for explicit instantiations, so we need to do that here as well. Currently this is all handled in 'mark_decl_instantiated'; this patch splits out the linkage handling into a separate function that we can call from modules code, maybe in GCC16 we could move this somewhere more central. PR c++/118961 gcc/cp/ChangeLog: * class.cc (copy_fndecl_with_name): Mark clones as non-abstract. * cp-tree.h (setup_explicit_instantiation_definition_linkage): Declare new function. * module.cc (trees_in::read_var_def): Use it. (module_state::read_cluster): Likewise. * pt.cc (setup_explicit_instantiation_definition_linkage): New function. (mark_decl_instantiated): Use it. gcc/testsuite/ChangeLog: * g++.dg/modules/lto-1.h: New test. * g++.dg/modules/lto-1_a.H: New test. * g++.dg/modules/lto-1_b.C: New test. * g++.dg/modules/lto-1_c.C: New test. * g++.dg/modules/lto-2_a.H: New test. * g++.dg/modules/lto-2_b.C: New test. * g++.dg/modules/lto-3_a.H: New test. * g++.dg/modules/lto-3_b.C: New test. Signed-off-by: Nathaniel Shead <nathanieloshead@gmail.com> Reviewed-by: Jason Merrill <jason@redhat.com>
2025-03-29LoongArch: doc: Add same-address constraint to the description of '-mld-seq-sa'.Lulu Cheng1-2/+2
gcc/ChangeLog: * doc/invoke.texi: Modify the description of '-mld-seq-sa'.
2025-03-29LoongArch: Set default alignment for functions jumps loops and labels.Lulu Cheng3-3/+13
Based on r15-7624, a set of align combinations with better performance was tested through spec2006. LA464: -falign-loops=8 -falign-functions=32 -falign-jumps=32 -falign-labels=8 LA664: -falign-loops=16 -falign-functions=16 -falign-jumps=32 -falign-labels=8 gcc/ChangeLog: * config/loongarch/loongarch-def.cc (la464_align): Add settings for labels. (la664_align): Likewise. * config/loongarch/loongarch-opts.cc (loongarch_target_option_override): Likewise. * config/loongarch/loongarch-tune.h (struct loongarch_align): Implement the function `label_`.
2025-03-29Daily bump.GCC Administrator7-1/+181
2025-03-29testsuite: Fix up musttail2.C testJakub Jelinek1-1/+6
On Wed, Mar 26, 2025 at 10:10:07AM -0700, Andi Kleen wrote: > I think this needs to be target external_tailcall, otherwise you will > fail on targets that don't support that. > > Or alternatively make this not extern. You're right (although I don't remember which targets are non-external_musttail). Here is a patch to define the function. 2025-03-28 Jakub Jelinek <jakub@redhat.com> * g++.dg/opt/musttail2.C (foo): Define the function instead of just declaring it, add [[gnu::noipa]] attribute to it.
2025-03-29cobol: Fix up cobol/{charmaps,valconv}.cc rulesJakub Jelinek1-21/+4
sed -i is not portable; it is supported by GNU sed and perhaps some BSDs, but not elsewhere. Furthermore, I think it is far better to always use #include "../../libgcobol/something.h" paths rather than something depending on the build directory. And because we require GNU make, we don't have to have two different rules for those and can use just one for both. The l variable in there is just to make it fit into 80 columns. 2025-03-28 Jakub Jelinek <jakub@redhat.com> * Make-lang.in (cobol/charmaps.cc, cobol/valconv.cc): Use sed -e instead of cp and multiple sed -i commands. Always prefix libgcobol header names in #include directives with ../../libgcobol/ rather than something depending on $(LIB_SOURCE).
2025-03-28Fortran: fix spelling of flag -fallow-invalid-bozHarald Anlauf1-1/+1
gcc/fortran/ChangeLog: * check.cc (gfc_invalid_boz): Correct spelling of compiler flag in hint to -fallow-invalid-boz.
2025-03-28testsuite: Don't cycle through option list for gfortran.dg and libgomp.fortran dg-do run tests with -O in dg*optionsJakub Jelinek2-4/+8
Here is a new version of the patch. The current behavior in gfortran.dg/ and libgomp.fortran/libgomp.oacc-fortran is that tests without any dg-do directive are implicitly dg-do compile and tests with dg-do compile or without dg-do don't cycle through options (-O is implicitly added but can be overridden), while tests with dg-do run cycle through the optimization options. The following patch modifies this, so that even tests with dg-do run with -O in dg-options or dg-additional-options (after [ \t"{]) don't cycle either and also get implicit -O which is overridden by that -O{,0,1,2,3,s,z,g,fast} in dg-{,additional-}options. Previously we were mostly wasting test time on those, because e.g. -O0 -O2 -O1 -O2 -O2 -O2 -Os -O2 are still effectively -O2 and so the same thing, while -O3 -fomit-frame-pointer -funroll-loops -fpeel-loops -ftracer -finline-functions -O2 and -O3 -g -O2 are not the same thing (effectively -fomit-frame-pointer -funroll-loops -fpeel-loops -ftracer -finline-functions -O2 and -g -O2). I think it isn't worth testing those combinations (especially when with e.g. -O0 in dg-options it mostly doesn't do much). Tested with make check-gfortran where this results in a slight decrease of tests: # of expected passes 73809 # of expected failures 343 # of unsupported tests 78 with unmodified trunk vs. # of expected passes 72734 # of expected failures 343 # of unsupported tests 73 with the patch, and on the libgomp side # of expected passes 11162 # of expected failures 238 # of unsupported tests 274 to # of expected passes 11092 # of expected failures 238 # of unsupported tests 274 (when counting just fortran.exp tests). Before the patch I see grep -- '-O[^ ].*-O' testsuite/gfortran/gfortran.log | grep -v '/vect/\|/graphite/' | wc -l 1008 and with the patch grep -- '-O[^ ].*-O' testsuite/gfortran/gfortran.log | grep -v '/vect/\|/graphite/' | wc -l 0 (vect and graphite have a few occurrences, but not too many).
2025-03-28 Jakub Jelinek <jakub@redhat.com> * lib/gfortran-dg.exp: Don't cycle through the option list if dg-options or dg-additional-options contains -O after space, tab, double quote or open curly bracket. * gfortran.dg/cray_pointers_2.f90: Remove extraneous space between dg-do and run and remove comment about it.
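The rule described above (skip option cycling when a directive contains -O directly after a space, tab, double quote or open curly bracket) can be illustrated with a small sketch. The real check lives in the Tcl code of lib/gfortran-dg.exp and is not reproduced here; the C function below, including its name has_explicit_opt_level, is a hypothetical illustration that assumes it is handed the text of a single dg-options or dg-additional-options directive.

```c
#include <string.h>

/* Hypothetical helper: return 1 if TEXT contains "-O" immediately
   preceded by a space, tab, double quote or open curly bracket, and
   0 otherwise.  A "-O" at the very start of TEXT has no preceding
   delimiter and is therefore not counted in this sketch.  */
int
has_explicit_opt_level (const char *text)
{
  const char *p = text;

  while ((p = strstr (p, "-O")) != NULL)
    {
      if (p > text && strchr (" \t\"{", p[-1]) != NULL)
        return 1;
      p += 2;
    }

  return 0;
}
```

Requiring a delimiter in front of -O keeps options that merely contain the two characters in the middle of a longer word from being mistaken for an optimization level.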
2025-03-28Regenerate common.opt.urlsJakub Jelinek1-0/+9
The r15-8947 commit did not regenerate common.opt.urls. 2025-03-28 Jakub Jelinek <jakub@redhat.com> * common.opt.urls: Regenerate.
2025-03-28cobol: Confine all __int128/_Float128 references to libgcobol.Bob Dubner7-41/+12
These changes are part of the effort to make cross compilation possible for hosts that don't support __int128 or _Float128. gcc/cobol * Make-lang.in: Eliminate libgcobol.h from gcc/cobol files. * genapi.cc: Eliminate "#include libgcobol.h". (parser_display_internal): Change comment. * genmath.cc: Eliminate "#include libgcobol.h". * genutil.cc: Likewise. (get_power_of_ten): Change comment. * structs.cc: Eliminate cblc_int128_type_node. * structs.h: Likewise. * symbols.h: Receive comment from libgcobol.h. libgcobol * charmaps.cc: Eliminate "#include libgcobol.h". Change comment about _Float128. * common-defs.h: Change comment about _Float128. Receive #defines from libgcobol.h. * constants.cc: Eliminate #include libgcobol.h. Eliminate other unneeded #includes. * ec.h: Receive declarations from libgcobol.h. * gcobolio.h: Likewise. * gfileio.cc: (__gg__file_init): Use file_flag_none_e instead of zero in assignment. (__gg__file_reopen): Likewise. (__io__file_open): Likewise. * gfileio.h: Receive declarations from libgcobol.h. * libgcobol.h: Numerous declarations moved elsewhere.
2025-03-28PR modula2/119504: ICE when attempting to access an element of a constant stringGaius Mulley4-10/+120
This patch prevents an ICE and generates an error if an array access to a constant string is attempted. The patch also allows HIGH ("string"). gcc/m2/ChangeLog: PR modula2/119504 * gm2-compiler/M2Quads.mod (BuildHighFunction): Defend against Type = NulSym and fall into BuildConstHighFromSym. (BuildDesignatorArray): Rewrite to detect an array access to a constant string. (BuildDesignatorArrayStaticDynamic): New procedure. gcc/testsuite/ChangeLog: PR modula2/119504 * gm2/iso/fail/conststrarray2.mod: New test. * gm2/iso/run/pass/constarray2.mod: New test. * gm2/pim/pass/hexstring.mod: New test. Signed-off-by: Gaius Mulley <gaiusmod2@gmail.com>