path: root/gcc/doc
Age | Commit message | Author | Files | Lines
2025-04-02 | Doc: Improve wording of -Werror documentation [PR58973] | Sandra Loosemore | 1 | -2/+2
gcc/ChangeLog PR driver/58973 * common.opt (Werror, Werror=): Use less awkward wording in description. (pedantic-errors): Likewise. * doc/invoke.texi (Warning Options): Likewise for -Werror and -Werror= here. Co-Authored-By: GUO Yixuan <culu.gyx@gmail.com>
2025-04-02 | Doc: #pragma pack documentation cleanup [PR114957] [PR78008] [PR60972] | Sandra Loosemore | 1 | -29/+58
This patch addresses a number of issues with the documentation of #pragma pack:
- None of the things in this section had @cindex entries [PR114957].
- The document formatting didn't match that of other #pragma documentation sections.
- The effect of #pragma pack(0) wasn't documented [PR78008].
- There's a long-standing bug [PR60972] reporting that #pragma pack and __attribute__((packed)) don't get along well. It seems worthwhile to warn users about that since elsewhere pragmas are cross-referenced with related or equivalent attributes.
gcc/ChangeLog PR c/114957 PR c/78008 PR c++/60972 * doc/extend.texi (Structure-Layout Pragmas): Add @cindex entries and reformat the pragma descriptions to match the markup used for other pragmas. Document what #pragma pack(0) does. Add cross-references to similar attributes.
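As a rough illustration of the pragma this section documents, the sketch below wraps one structure in `#pragma pack(push, 1)` / `#pragma pack(pop)`. The struct and function names are invented for this example, and the size of the unpacked struct depends on the target ABI, so nothing here should be read as the manual's own example.

```c
#include <stddef.h>

/* Default layout: the compiler may insert padding after `tag`
   so that `value` is naturally aligned. */
struct padded_pair {
    char tag;
    int  value;
};

/* Save the current packing, then request 1-byte member alignment. */
#pragma pack(push, 1)
struct packed_pair {
    char tag;
    int  value;   /* placed directly after `tag`, no padding */
};
#pragma pack(pop)   /* restore whatever packing was in effect before */

/* Hypothetical helper: offset of `value` inside the packed struct. */
size_t packed_value_offset(void)
{
    return offsetof(struct packed_pair, value);
}
```

Comparing `sizeof(struct padded_pair)` with `sizeof(struct packed_pair)` on a given target shows how much padding the pragma removed.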
2025-04-02 | doc: Extend musttail attribute docs | Jakub Jelinek | 1 | -0/+25
On Wed, Apr 02, 2025 at 10:32:20AM +0200, Richard Biener wrote:
> I wonder if we can amend the documentation to suggest to end lifetime
> of variables explicitly by proper scoping?
In the -Wmaybe-musttail-local-addr attribute description I've already tried to show that in the example, but if you think something like the following would make it clearer.
2025-04-02 Jakub Jelinek <jakub@redhat.com>
* doc/extend.texi (musttail statement attribute): Hint how to avoid -Wmaybe-musttail-local-addr warnings.
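The scoping hint discussed above might look roughly like the following sketch. The function names and the `MUSTTAIL` macro are made up for illustration; the macro falls back to a plain `return` on compilers that do not advertise the attribute, and which diagnostics a particular GCC version emits for this code has not been checked here.

```c
/* Use the attribute only where the compiler advertises it; otherwise
   MUSTTAIL expands to nothing and the call is an ordinary return. */
#if defined(__has_attribute)
#  if __has_attribute(musttail)
#    define MUSTTAIL __attribute__((musttail))
#  endif
#endif
#ifndef MUSTTAIL
#  define MUSTTAIL
#endif

/* Hypothetical helpers, named only for this sketch. */
static int read_and_bump(int *p) { return *p + 1; }
static int finish(int x) { return x * 10; }

int process(int x)
{
    {
        int tmp = x;              /* the address of tmp is taken below */
        x = read_and_bump(&tmp);
    }                             /* tmp's lifetime ends at this brace */

    /* No local whose address escaped is still in scope at the call. */
    MUSTTAIL return finish(x);
}
```

The point of the inner block is that `tmp` is already out of scope when the tail call happens, so no pointer to a live local can reach `finish`.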
2025-04-02 | tailc: Don't fail musttail calls if they use or could use local arguments, instead warn [PR119376] | Jakub Jelinek | 2 | -5/+96
As discussed here and in bugzilla, the [[clang::musttail]] attribute in clang does not just strongly ask for a tail call or an error, but changes behavior. To quote https://clang.llvm.org/docs/AttributeReference.html#musttail:
"The lifetimes of all local variables and function parameters end immediately before the call to the function. This means that it is undefined behaviour to pass a pointer or reference to a local variable to the called function, which is not the case without the attribute. Clang will emit a warning in common cases where this happens."
The GCC behavior was just to error if we can't prove the musttail callee could not have dereferenced escaped pointers to local vars or parameters of the caller. That is still the case for variables with non-trivial destruction (even in clang), like vars with C++ non-trivial destructors or variables with cleanup attribute.
The following patch changes the behavior to match that of clang, for all of [[clang::musttail]], [[gnu::musttail]] and __attribute__((musttail)).
clang 20 actually added a warning for some cases of it in https://github.com/llvm/llvm-project/pull/109255 but it is under the -Wreturn-stack-address warning.
Now, gcc doesn't have that warning, but -Wreturn-local-addr instead, and IMHO it is better to have this under new warnings, because this isn't about returning a local address, but about passing it to a musttail call, or maybe escaping to a musttail call. And perhaps users will appreciate they can control it separately as well.
The patch introduces 2 new warnings. -Wmusttail-local-addr which is turned on by default and warns for the always dumb cases of passing an address of a local variable or parameter to musttail call's argument.
And then -Wmaybe-musttail-local-addr which is only diagnosed if -Wmusttail-local-addr was not diagnosed and diagnoses at most one (so that we don't emit 100s of warnings for one call if 100s of vars can escape) case where an address of a local var could have escaped to the musttail call. This is less severe, the code doesn't have to be obviously wrong, so the warning is only enabled in -Wextra. And I've adjusted also the documentation for this change and addition of new warnings. 2025-04-02 Jakub Jelinek <jakub@redhat.com> PR ipa/119376 * common.opt (Wmusttail-local-addr, Wmaybe-musttail-local-addr): New. * tree-tailcall.cc (suitable_for_tail_call_opt_p): Don't fail for TREE_ADDRESSABLE PARM_DECLs for musttail calls if diag_musttail. Emit -Wmusttail-local-addr warnings. (maybe_error_musttail): Use gimple_location instead of directly accessing location member. (find_tail_calls): For musttail calls if diag_musttail, don't fail if address of local could escape to the call, instead emit -Wmaybe-musttail-local-addr warnings. Emit -Wmaybe-musttail-local-addr warnings also for address taken parameters. * common.opt.urls: Regenerate. * doc/extend.texi (musttail statement attribute): Clarify local variables without non-trivial destruction are considered out of scope before the tail call instruction. * doc/invoke.texi (-Wno-musttail-local-addr, -Wmaybe-musttail-local-addr): Document. * c-c++-common/musttail8.c: Expect a warning rather than error in one case. (f4): Add int * argument. * c-c++-common/musttail15.c: Don't disallow for C++98. * c-c++-common/musttail16.c: Likewise. * c-c++-common/musttail17.c: Likewise. * c-c++-common/musttail18.c: Likewise. * c-c++-common/musttail19.c: Likewise. Expect a warning rather than error in one case. (f4): Add int * argument. * c-c++-common/musttail20.c: Don't disallow for C++98. * c-c++-common/musttail21.c: Likewise. * c-c++-common/musttail28.c: New test. * c-c++-common/musttail29.c: New test. * c-c++-common/musttail30.c: New test. 
* c-c++-common/musttail31.c: New test. * g++.dg/ext/musttail1.C: New test. * g++.dg/ext/musttail2.C: New test. * g++.dg/ext/musttail3.C: New test.
2025-04-02 | Doc: Cross-reference constructor and init_priority attributes [PR118982] | Sandra Loosemore | 1 | -19/+32
Per the issue, the discussion of these two attributes needed to be better integrated. I also did some editing for style and readability, and clarified that almost all targets support this feature (it is enabled by default unless the back end disables it), not just "some". Co-Authored_by: Jonathan Wakely <jwakely@redhat.com> gcc/ChangeLog PR c++/118982 * doc/extend.texi (Common Function Attributes): For the constructor/destructor attribute, be more explicit about the relationship between the constructor attribute and the C++ init_priority attribute, and add a cross-reference. Also document that most targets support this. (C++ Attributes): Similarly for the init_priority attribute.
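For readers unfamiliar with the constructor attribute, here is a small C-only sketch of prioritized constructors. The variable and function names are invented for this example, and it does not attempt to show the interaction with the C++ init_priority attribute that the commit documents.

```c
/* Hypothetical log of the order in which the constructors ran. */
int init_log[2];
int init_count;

/* Priorities 0..100 are reserved for the implementation, so user
   code starts at 101. A smaller number runs earlier, regardless of
   the order of the definitions in the source file. */
__attribute__((constructor(102)))
static void init_second(void) { init_log[init_count++] = 102; }

__attribute__((constructor(101)))
static void init_first(void) { init_log[init_count++] = 101; }
```

On a target that supports the feature, both functions have run by the time `main` starts, and `init_log` records 101 before 102.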
2025-04-01 | Doc: Document _Bool type as C90 extension [PR118118] | Sandra Loosemore | 1 | -0/+12
gcc/ChangeLog PR c/118118 * doc/extend.texi (Boolean Type): New section.
2025-04-01 | Doc: Document enum with underlying type extension [PR117689] | Sandra Loosemore | 1 | -12/+50
This is a C23/C++11 feature that is supported as an extension with earlier -std= options too, but was never previously documented. It interacts with the already-documented forward enum definition extension, so I have merged discussion of the two extensions into the same section. gcc/ChangeLog PR c/117689 * doc/extend.texi (Incomplete Enums): Rename to.... (Enum Extensions): This. Document support for specifying the underlying type of an enum as an extension in all earlier C and C++ standards. Document that a forward declaration with underlying type is not an incomplete type, and which dialects GCC supports that in.
2025-04-01 | Doc: -Wzero-as-null-pointer-constant is also available for C [PR119173] | Martin Uecker | 1 | -6/+7
The warning -Wzero-as-null-pointer-constant is now not only supported in C++ but also in C. Change the documentation accordingly. PR c/119173 gcc/ChangeLog: * doc/invoke.texi (Warning Options): Move to general options.
2025-04-01 | LoongArch: doc: Put the '-mtls-dialect=opt' option description in the correct position. | Lulu Cheng | 1 | -8/+8
gcc/ChangeLog: * doc/invoke.texi: Corrected the position of '-mtls-dialect=opt' option.
2025-03-31 | aarch64: Remove +sme -> +sve2 feature flag dependency | Andre Simoes Dias Vieira | 1 | -1/+2
As per the AArch64 ISA FEAT_SME does not require FEAT_SVE2. However, we don't support SME without SVE2 and bail out with a 'sorry' if this configuration is encountered. We may choose to support this in the future. gcc/ChangeLog: * config/aarch64/aarch64-option-extensions.def (SME): Remove SVE2 as prerequisite and add in FCMA and F16FML. * config/aarch64/aarch64.cc (aarch64_override_options_internal): Diagnose use of SME without SVE2 and implicitly enable SVE2 when enabling SME after streaming mode diagnosis. * doc/invoke.texi (sme): Document that this can only be used with the sve2 extension. gcc/testsuite/ChangeLog: * gcc.target/aarch64/no-sve-with-sme-1.c: New. * gcc.target/aarch64/no-sve-with-sme-2.c: New. * gcc.target/aarch64/no-sve-with-sme-3.c: New. * gcc.target/aarch64/no-sve-with-sme-4.c: New. * gcc.target/aarch64/pragma_cpp_predefs_4.c: Pass +sve2 to existing +sme pragma. * gcc.target/aarch64/sve/acle/general-c/binary_int_opt_single_n_2.c: Likewise. * gcc.target/aarch64/sve/acle/general-c/binary_opt_single_n_2.c: Likewise. * gcc.target/aarch64/sve/acle/general-c/binary_single_1.c: Likewise. * gcc.target/aarch64/sve/acle/general-c/binary_za_slice_int_opt_single_1.c: * gcc.target/aarch64/sve/acle/general-c/binary_za_slice_lane_1.c: Likewise. * gcc.target/aarch64/sve/acle/general-c/binary_za_slice_lane_2.c: Likewise. * gcc.target/aarch64/sve/acle/general-c/binary_za_slice_lane_3.c: Likewise. * gcc.target/aarch64/sve/acle/general-c/binary_za_slice_lane_4.c: Likewise. * gcc.target/aarch64/sve/acle/general-c/binary_za_slice_opt_single_1.c: Likewise. * gcc.target/aarch64/sve/acle/general-c/binary_za_slice_opt_single_2.c: Likewise. * gcc.target/aarch64/sve/acle/general-c/binary_za_slice_opt_single_3.c: Likewise. * gcc.target/aarch64/sve/acle/general-c/binary_za_slice_uint_opt_single_1.c: Likewise. * gcc.target/aarch64/sve/acle/general-c/binaryxn_2.c: Likewise. * gcc.target/aarch64/sve/acle/general-c/clamp_1.c: Likewise. 
* gcc.target/aarch64/sve/acle/general-c/compare_scalar_count_1.c: Likewise. * gcc.target/aarch64/sve/acle/general-c/dot_za_slice_int_lane_1.c: Likewise. * gcc.target/aarch64/sve/acle/general-c/dot_za_slice_lane_1.c: Likewise. * gcc.target/aarch64/sve/acle/general-c/dot_za_slice_lane_2.c: Likewise. * gcc.target/aarch64/sve/acle/general-c/dot_za_slice_uint_lane_1.c: Likewise. * gcc.target/aarch64/sve/acle/general-c/shift_right_imm_narrowxn_1.c: Likewise. * gcc.target/aarch64/sve/acle/general-c/storexn_1.c: Likewise. * gcc.target/aarch64/sve/acle/general-c/ternary_mfloat8_lane_1.c: Likewise. * gcc.target/aarch64/sve/acle/general-c/ternary_mfloat8_lane_group_selection_1.c: Likewise. * gcc.target/aarch64/sve/acle/general-c/ternary_qq_or_011_lane_1.c: Likewise. * gcc.target/aarch64/sve/acle/general-c/unary_convertxn_1.c: Likewise. * gcc.target/aarch64/sve/acle/general-c/unary_convertxn_narrow_1.c: Likewise. * gcc.target/aarch64/sve/acle/general-c/unary_convertxn_narrowt_1.c: Likewise. * gcc.target/aarch64/sve/acle/general-c/unary_za_slice_1.c: Likewise. * gcc.target/aarch64/sve/acle/general-c/unaryxn_1.c: Likewise. * gcc.target/aarch64/sve/acle/general-c/write_za_1.c: Likewise. * gcc.target/aarch64/sve/acle/general-c/write_za_slice_1.c: Likewise.
2025-03-30 | Doc: Clean up New/Delete Builtins manual section | Sandra Loosemore | 1 | -12/+29
I noticed that the "New/Delete Builtins" section failed to explicitly name or describe the arguments of the builtin functions it purported to document, outside of using them in an example. I've fixed that and cleaned up the whole section. gcc/ChangeLog * doc/extend.texi (New/Delete Builtins): Clean up the text and explicitly list the builtins being documented.
2025-03-30 | Doc: Move Integer Overflow Builtins section [PR42270] | Sandra Loosemore | 1 | -150/+153
This is part of an incremental effort to make the chapter on GCC extensions better organized by grouping/rearranging sections by topic. gcc/ChangeLog PR other/42270 * doc/extend.texi (Numeric Builtins): Move Integer Overflow Builtins section here, as a subsection.
2025-03-30 | Doc: Organize atomic memory builtins documentation [PR42270] | Sandra Loosemore | 1 | -175/+196
This is part of an incremental effort to make the chapter on GCC extensions better organized by grouping/rearranging sections by topic. This installment adds a container section to hold documentation for both the __atomic and __sync builtins, reordering them so that the new __atomic interface is presented before the legacy __sync one. I also incorporated material from the separate x86 transactional memory section directly into the __atomic builtins documentation instead of retaining that as a parallel section. gcc/ChangeLog PR other/42270 * doc/extend.texi (Atomic Memory Access): New section. (__sync Builtins): Make it a subsection of the above. (Atomic Memory Access): Likewise. (x86 specific memory model extensions for transactional memory): Delete this section, incorporating the text into the discussion of __atomic builtins.
2025-03-30 | Doc: Break up and rearrange the "other builtins" section [PR42270] | Sandra Loosemore | 2 | -1285/+1333
This is part of an incremental effort to make the chapter on GCC extensions better organized by grouping/rearranging sections by topic. The "Other Builtins" section had become a catch-all for all sorts of things with very little organization or attempt to differentiate between important information (e.g., GCC treats a gazillion library functions as builtins by default) from obscure builtins provided primarily as internal interfaces. I've split it up into various pieces and attempted to move the more important or useful-to-users documentation earlier in the chapter. What's left of the section is still a jumbled mess... but at least it's a smaller jumbled mess. gcc/ChangeLog PR other/42270 * doc/extend.texi (Built-in Functions): Incorporate some text formerly in "Other Builtins" into the introduction. Adjust menu for new sections. (Library Builtins): New section, split from "Other Builtins". (Numeric Builtins): Likewise. (Stack Allocation): Likewise. (Constructing Calls): Move __builtin_call_with_static_chain here. (Object Size Checking): Minor copy-editing. (Other Builtins): Move text to new sections listed above. Delete duplicate docs for object-size checking builtins. * doc/invoke.texi (C dialect options): Update @xref for -fno-builtin.
2025-03-30 | Doc: Move builtin documentation to a new chapter [PR42270] | Sandra Loosemore | 2 | -15/+37
This is part of an incremental effort to make the documentation for GCC extensions better organized by grouping/rearranging sections by topic. I was originally intending to consolidate all the sections documenting builtins as subsections of a new container section within the C extensions chapter, but I ran into a technical limitation of Texinfo: it only supports sectioning depth up to @subsubsection, and we already had quite a few of those in the target-specific builtins sections. So instead I have pulled all the existing sections out into a new chapter. This actually makes sense since some of the builtins are specific to C++ anyway and are not C language extensions at all. Subsequent patches in this series will move things around within the new chapter; this one just adds the new container node and adjusts the menus. gcc/ChangeLog PR other/42270 * doc/extend.texi (C Extensions): Move menu items for builtin-related sections to... (Built-in Functions): New chapter. * doc/gcc.texi (Introduction): Add menu entry for new chapter.
2025-03-30 | Doc: Add a container section to consolidate attribute documentation [PR42270] | Sandra Loosemore | 1 | -73/+101
This is part of an incremental effort to make the chapter on GCC extensions better organized by grouping/rearranging sections by topic. Note that this patch does not address the restructuring/rewrite suggested by PR88472 or PR102397, beyond adding a very short introduction to the new container section that is more explicit about both syntaxes being accepted as a GNU extension. gcc/ChangeLog PR other/42270 * doc/extend.texi (Attributes): New section. (Function Attributes): Make it a subsection of the new section. (Variable Attributes): Likewise. (Type Attributes): Likewise. (Label Attributes): Likewise. (Enumerator Attributes): Likewise. (Attribute Syntax): Likewise.
2025-03-30 | Doc: Remove separate "Target Format Checks" section [PR42270] | Sandra Loosemore | 1 | -56/+39
This is part of an incremental effort to make the chapter on GCC extensions better organized by grouping/rearranging sections by topic. Following the last round of patches, there's a leftover section "Target Format Checks" that didn't fit into any category. It seems best to merge this material into the main discussion of the "format" attribute, in particular because that discussion already contains similar discussion for mingw/Windows targets. gcc/ChangeLog PR other/42270 * doc/extend.texi (Function Attributes): Merge text from "Target Format Checks" into the main discussion of the format and format_arg attributes. (Target Format Checks): Delete section.
2025-03-30 | Alpha: Add option to avoid data races for partial writes [PR117759] | Maciej W. Rozycki | 1 | -1/+11
Similarly to data races with 8-bit byte or 16-bit word quantity memory writes on non-BWX Alpha implementations we have the same problem even on BWX implementations with partial memory writes produced for unaligned stores as well as block memory move and clear operations. This happens at the boundaries of the area written where we produce unprotected RMW sequences, such as for example: ldbu $1,0($3) stw $31,8($3) stq $1,0($3) to zero a 9-byte member at the byte offset of 1 of a quadword-aligned struct, happily clobbering a 1-byte member at the beginning of said struct if concurrent write happens while executing on the same CPU such as in a signal handler or a parallel write happens while executing on another CPU such as in another thread or via a shared memory segment. To guard against these data races with partial memory write accesses introduce the `-msafe-partial' command-line option that instructs the compiler to protect boundaries of the data quantity accessed by instead using a longer code sequence composed of narrower memory writes where suitable machine instructions are available (i.e. with BWX targets) or atomic RMW access sequences where byte and word memory access machine instructions are not available (i.e. with non-BWX targets). Owing to the desire of branch avoidance there are redundant overlapping writes in unaligned cases where STQ_U operations are used in the middle of a block so as to make sure no part of data to be written has been lost regardless of run-time alignment. For the non-BWX case it means that with blocks whose size is not a multiple of 8 there are additional atomic RMW sequences issued towards the end of the block in addition to the always required pair enclosing the block from each end. Only one such additional atomic RMW sequence is actually required, but code currently issues two for the sake of simplicity. 
An improvement might be added to `alpha_expand_unaligned_store_words_safe_partial' in the future, by folding `alpha_expand_unaligned_store_safe_partial' code for handling multi-word blocks whose size is not a multiple of 8 (i.e. with a trailing partial-word part). It would improve performance a bit, but current code is correct regardless. Update test cases with `-mno-safe-partial' where required and add new ones accordingly. In some cases GCC chooses to open-code block memory write operations, so with non-BWX targets `-msafe-partial' will in the usual case have to be used together with `-msafe-bwa'. Credit to Magnus Lindholm <linmag7@gmail.com> for sharing hardware for the purpose of verifying the BWX side of this change. gcc/ PR target/117759 * config/alpha/alpha-protos.h (alpha_expand_unaligned_store_safe_partial): New prototype. * config/alpha/alpha.cc (alpha_expand_movmisalign) (alpha_expand_block_move, alpha_expand_block_clear): Handle TARGET_SAFE_PARTIAL. (alpha_expand_unaligned_store_safe_partial) (alpha_expand_unaligned_store_words_safe_partial) (alpha_expand_clear_safe_partial_nobwx): New functions. * config/alpha/alpha.md (insvmisaligndi): Handle TARGET_SAFE_PARTIAL. * config/alpha/alpha.opt (msafe-partial): New option. * config/alpha/alpha.opt.urls: Regenerate. * doc/invoke.texi (Option Summary, DEC Alpha Options): Document the new option. gcc/testsuite/ PR target/117759 * gcc.target/alpha/memclr-a2-o1-c9-ptr.c: Add `-mno-safe-partial'. * gcc.target/alpha/memclr-a2-o1-c9-ptr-safe-partial.c: New file. * gcc.target/alpha/memcpy-di-unaligned-dst.c: New file. * gcc.target/alpha/memcpy-di-unaligned-dst-safe-partial.c: New file. * gcc.target/alpha/memcpy-di-unaligned-dst-safe-partial-bwx.c: New file. * gcc.target/alpha/memcpy-si-unaligned-dst.c: New file. * gcc.target/alpha/memcpy-si-unaligned-dst-safe-partial.c: New file. * gcc.target/alpha/memcpy-si-unaligned-dst-safe-partial-bwx.c: New file. * gcc.target/alpha/stlx0.c: Add `-mno-safe-partial'. 
* gcc.target/alpha/stlx0-safe-partial.c: New file. * gcc.target/alpha/stlx0-safe-partial-bwx.c: New file. * gcc.target/alpha/stqx0.c: Add `-mno-safe-partial'. * gcc.target/alpha/stqx0-safe-partial.c: New file. * gcc.target/alpha/stqx0-safe-partial-bwx.c: New file. * gcc.target/alpha/stwx0.c: Add `-mno-safe-partial'. * gcc.target/alpha/stwx0-bwx.c: Add `-mno-safe-partial'. Refer to stwx0.c rather than copying its code and also verify no LDQ_U or STQ_U instructions have been produced. * gcc.target/alpha/stwx0-safe-partial.c: New file. * gcc.target/alpha/stwx0-safe-partial-bwx.c: New file.
2025-03-30 | Alpha: Add option to avoid data races for sub-longword memory stores [PR117759] | Maciej W. Rozycki | 1 | -0/+9
With non-BWX Alpha implementations we have a problem of data races where an 8-bit byte or 16-bit word quantity is to be written to memory, because in those cases we use an unprotected RMW access of a 32-bit longword or 64-bit quadword width. If contents of the longword or quadword accessed outside the byte or word to be written are changed midway through by a concurrent write executing on the same CPU such as by a signal handler or a parallel write executing on another CPU such as by another thread or via a shared memory segment, then the concluding write of the RMW access will clobber them. This is especially important for the safety of RCU algorithms, but is otherwise an issue anyway.
To guard against these data races with byte and aligned word quantities introduce the `-msafe-bwa' command-line option (standing for Safe Byte & Word Access) that instructs the compiler to instead use an atomic RMW access sequence where byte and word memory access machine instructions are not available. There is no change to code produced for BWX targets.
It would be sufficient for the secondary reload handle to use a pair of scratch registers, as requested by `reload_out<mode>', but it would end with poor code produced as one of the scratches would be occupied by data retrieved and the other one would have to be reloaded with repeated calculations, all within the LL/SC sequence.
Therefore I chose to add a dedicated `reload_out<mode>_safe_bwa' handler and ask for more scratches there by defining a 256-bit OI integer mode. While reload is documented in our manual to support an arbitrary number of scratches in reality it hasn't been implemented for IRA:
/* ??? It would be useful to be able to handle only two, or more than three, operands, but for now we can only handle the case of having exactly three: output, input and one temp/scratch. */
and it seems to be the case for LRA as well. Do what everyone else does then and just have one wide multi-register scratch.
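The data race described at the start of this commit message can be modelled in portable C as below. This is only a sketch under stated assumptions: the function names are invented, bytes are numbered from the least significant end of the quadword, and the compare-and-swap loop merely stands in for the LL/SC sequence; it is not the code GCC emits for `-msafe-bwa'.

```c
#include <stdint.h>

/* Model of an unprotected RMW byte store: load the whole quadword,
   patch one byte, store the whole quadword back. A concurrent write
   to any other byte between the load and the store is lost.
   byte_index must be in the range 0..7. */
void store_byte_rmw(uint64_t *word, unsigned byte_index, uint8_t value)
{
    unsigned shift = byte_index * 8;
    uint64_t old = *word;
    uint64_t patched = (old & ~((uint64_t)0xff << shift))
                       | ((uint64_t)value << shift);
    *word = patched;
}

/* Same update, retried until no other writer changed the quadword
   between the load and the store. */
void store_byte_atomic(uint64_t *word, unsigned byte_index, uint8_t value)
{
    unsigned shift = byte_index * 8;
    uint64_t old = __atomic_load_n(word, __ATOMIC_RELAXED);
    uint64_t patched;
    do {
        /* On failure the builtin refreshes `old`, so recompute. */
        patched = (old & ~((uint64_t)0xff << shift))
                  | ((uint64_t)value << shift);
    } while (!__atomic_compare_exchange_n(word, &old, patched, 1,
                                          __ATOMIC_RELAXED,
                                          __ATOMIC_RELAXED));
}
```

Both functions give the same result when there is no concurrent writer; they differ only in whether a neighbouring byte written in the middle of the sequence survives.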
I note that the atomic sequences emitted are suboptimal performance-wise as the looping branch for the unsuccessful completion of the sequence points backwards, which means it will be predicted as taken despite that in most cases it will fall through. I do not see it as a deficiency of this change proposed as it takes care of recording that the branch is unlikely to be taken, by calling `alpha_emit_unlikely_jump'. Therefore generic code elsewhere should instead be investigated and adjusted accordingly for the arrangement to actually take effect. Add test cases accordingly. There are notable regressions between a plain `-mno-bwx' configuration and a `-mno-bwx -msafe-bwa' one: FAIL: gcc.dg/torture/inline-mem-cpy-cmp-1.c -O0 execution test FAIL: gcc.dg/torture/inline-mem-cpy-cmp-1.c -O1 execution test FAIL: gcc.dg/torture/inline-mem-cpy-cmp-1.c -O2 execution test FAIL: gcc.dg/torture/inline-mem-cpy-cmp-1.c -O3 -g execution test FAIL: gcc.dg/torture/inline-mem-cpy-cmp-1.c -Os execution test FAIL: gcc.dg/torture/inline-mem-cpy-cmp-1.c -O2 -flto -fno-use-linker-plugin -flto-partition=none execution test FAIL: gcc.dg/torture/inline-mem-cpy-cmp-1.c -O2 -flto -fuse-linker-plugin -fno-fat-lto-objects execution test FAIL: g++.dg/init/array25.C -std=c++17 execution test FAIL: g++.dg/init/array25.C -std=c++98 execution test FAIL: g++.dg/init/array25.C -std=c++26 execution test They come from the fact that these test cases play tricks with alignment and end up calling code that expects a reference to aligned data but is handed one to unaligned data. This doesn't cause a visible problem with plain `-mno-bwx' code, because the resulting alignment exception is fixed up by Linux. There's no such handling currently implemented for LDL_L or LDQ_L instructions (which are first in the sequence) and consequently the offender is issued with SIGBUS instead. 
Suitable handling will be added to Linux to complement this change that will emulate the trapping instructions[1], so these interim regressions are seen as harmless and expected. References: [1] "Alpha: Emulate unaligned LDx_L/STx_C for data consistency", <https://lore.kernel.org/r/alpine.DEB.2.21.2502181912230.65342@angie.orcam.me.uk/> gcc/ PR target/117759 * config/alpha/alpha-modes.def (OI): New integer mode. * config/alpha/alpha-protos.h (alpha_expand_mov_safe_bwa): New prototype. * config/alpha/alpha.cc (alpha_expand_mov_safe_bwa): New function. (alpha_secondary_reload): Handle TARGET_SAFE_BWA. * config/alpha/alpha.md (aligned_store_safe_bwa) (unaligned_store<mode>_safe_bwa, reload_out<mode>_safe_bwa) (reload_out<mode>_unaligned_safe_bwa): New expanders. (mov<mode>, movcqi, reload_out<mode>_aligned): Handle TARGET_SAFE_BWA. (reload_out<mode>): Guard against TARGET_SAFE_BWA. * config/alpha/alpha.opt (msafe-bwa): New option. * config/alpha/alpha.opt.urls: Regenerate. * doc/invoke.texi (Option Summary, DEC Alpha Options): Document the new option. gcc/testsuite/ PR target/117759 * gcc.target/alpha/stb.c: New file. * gcc.target/alpha/stb-bwa.c: New file. * gcc.target/alpha/stb-bwx.c: New file. * gcc.target/alpha/stba.c: New file. * gcc.target/alpha/stba-bwa.c: New file. * gcc.target/alpha/stba-bwx.c: New file. * gcc.target/alpha/stw.c: New file. * gcc.target/alpha/stw-bwa.c: New file. * gcc.target/alpha/stw-bwx.c: New file. * gcc.target/alpha/stwa.c: New file. * gcc.target/alpha/stwa-bwa.c: New file. * gcc.target/alpha/stwa-bwx.c: New file.
2025-03-29 | LoongArch: doc: Add same-address constraint to the description of '-mld-seq-sa'. | Lulu Cheng | 1 | -2/+2
gcc/ChangeLog: * doc/invoke.texi: Modify the description of '-mld-seq-sa'.
2025-03-26 | Add prime path coverage to gcc/gcov | Jørgen Kvalsvik | 2 | -0/+225
This patch adds prime path coverage to gcc/gcov. First, a quick introduction to path coverage, before I explain a bit on the pieces of the patch.

PRIME PATHS

Path coverage is recording the paths taken through the program. Here is a simple example:

    if (cond1)      BB 1
      then1 ()      BB 2
    else
      else1 ()      BB 3
    if (cond2)      BB 4
      then2 ()      BB 5
    else
      else2 ()      BB 6
    _               BB 7

To cover all paths you must run {then1 then2}, {then1 else2}, {else1 then2}, {else1 else2}. This is in contrast with line/statement coverage where it is sufficient to execute then2, and it does not matter if it was reached through then1 or else1.

    1 2 4 5 7
    1 2 4 6 7
    1 3 4 5 7
    1 3 4 6 7

This gets more complicated with loops, because 0, 1, 2, ..., N iterations are all different paths. There are different ways of addressing this, a promising one being prime paths. A prime path is a maximal simple path (a path with no repeated vertices) or simple cycle (no repeated vertices except for the first/last). Prime paths strike a decent balance between number of tests, path growth, and loop coverage, requiring loops to be both taken and skipped. Of course, the number of paths still grows very fast with program complexity - for example, this program has 14 prime paths:

    while (a)
    {
      if (b)
        return;
      while (c--)
        a++;
    }

--
ALGORITHM

Since the number of paths grows so fast, we need a good algorithm. The naive approach of generating all paths and discarding redundancies (see reference_prime_paths in the diff) simply doesn't complete for even pretty simple functions with a few tens of thousands of paths (granted, the implementation is also poor, but only serves as a reference). Fazli & Afsharchi in their paper "Time and Space-Efficient Compositional Method for Prime and Test Paths Generation" describe a neat algorithm which drastically improves on this for most programs, and brings complexity down to something manageable. This patch implements that algorithm with a few minor tweaks.
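To make the definitions of "simple path" and "simple cycle" used above concrete, here is a small sketch of the two predicates over a path given as an array of basic-block numbers. The function names are invented for illustration; this is neither the patch's prime_paths function nor the Fazli & Afsharchi algorithm, and it checks neither maximality nor whether consecutive vertices are actually connected by an edge.

```c
#include <stdbool.h>
#include <stddef.h>

/* True if no vertex in path[0..len) appears twice. */
bool is_simple_path(const int *path, size_t len)
{
    for (size_t i = 0; i < len; i++)
        for (size_t j = i + 1; j < len; j++)
            if (path[i] == path[j])
                return false;
    return true;
}

/* True if the first and last vertices match and nothing else repeats. */
bool is_simple_cycle(const int *path, size_t len)
{
    if (len < 2 || path[0] != path[len - 1])
        return false;
    /* Dropping the repeated last vertex must leave a simple path. */
    return is_simple_path(path, len - 1);
}
```

For instance, the block sequence 1 2 4 5 7 from the first example is a simple path, while a loop body that returns to its header, such as 4 5 4, is a simple cycle but not a simple path.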
The algorithm first finds the strongly connected components (SCC) of the graph and creates a new graph where the vertices are the SCCs of the CFG. Within these vertices different paths are found - regular prime paths, paths that start in the SCCs entries, and paths that end in the SCCs exits. These per-SCC paths are combined with paths through the CFG, which greatly reduces the number of paths that need to be evaluated just to be thrown away. Using this algorithm we can find the prime paths for somewhat complicated functions in a reasonable time.

Please note that some programs don't benefit from this at all. We need to find the prime paths within a SCC, so if a single SCC is very large the function degenerates to the naive implementation. This can probably be much improved on, but is an exercise for later.

--
OVERALL ARCHITECTURE

Like the other coverages in gcc, this operates on the CFG in the profiling phase, just after branch and condition coverage, in phases:

1. All prime paths are generated, counted, and enumerated from the CFG
2. The paths are evaluated and counter instructions and accumulators are emitted
3. gcov reads the CFG and computes the prime paths (same as step 1)
4. gcov prints a report

Simply writing out all the paths in the .gcno file is not really viable, the files would be too big. Additionally, there are limits to the practicality of measuring (and reporting) on millions of paths, so for most programs where coverage is feasible, computing paths should be plenty fast. As a result, path coverage really only adds 1 bit to the counter, rounded up to nearest 64 ("bucket"), so 64 paths takes up 8 bytes, 65 paths take up 16 bytes.

Recording paths is really just massaging large bitsets. Per function, ceil(paths/64 or 32) buckets (gcov_type) are allocated. Paths are sorted, so the first path maps to the lowest bit, the second path to the second lowest bit, and so on.
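The bucket arithmetic described above can be sketched as follows, assuming 64-bit buckets. The names `BUCKET_BITS`, `bucket_count` and `mark_covered` are made up for this example and are not taken from the patch.

```c
#include <stddef.h>
#include <stdint.h>

#define BUCKET_BITS 64u   /* assumed bucket width for this sketch */

/* Number of 64-bit buckets needed to hold one bit per path,
   i.e. the path count rounded up to a multiple of 64. */
size_t bucket_count(size_t paths)
{
    return (paths + BUCKET_BITS - 1) / BUCKET_BITS;
}

/* Mark path number `path` (0-based, in sorted order) as covered.
   Path 0 maps to the lowest bit of bucket 0, path 64 to the lowest
   bit of bucket 1, and so on. */
void mark_covered(uint64_t *buckets, size_t path)
{
    buckets[path / BUCKET_BITS] |= (uint64_t)1 << (path % BUCKET_BITS);
}
```

With this layout, 64 paths need one 8-byte bucket and 65 paths need two, matching the figures quoted in the text.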
On taking an edge and entering a basic block, a few bitmasks are applied to unset the bits corresponding to the paths outside the block and set the bits of the paths that start in that block. Finally, the right buckets are masked and written to the global accumulators for the paths that end in the block. Full coverage is achieved when all bits are set.

gcc does not really inform gcov of abnormal paths, so paths with abnormal edges are ignored. This is probably possible, but requires some changes to the graph gcc writes to the .gcno file.

--
IMPLEMENTATION

In order to remove non-prime paths (subpaths) we use a suffix tree. Fazli & Afsharchi do not discuss how duplicates or subpaths are removed, and using a suffix tree works really well -- insertion time is a function of the length of the (longest) paths, not the number of paths. The paths are usually quite short, but there are many of them. The same prime_paths function is used both in gcc and in gcov.

As for speed, I would say that it is acceptable. Path coverage is a problem that is exponential in its very nature, so if you enable this feature you can reasonably expect it to take a while. To combat the effects of path explosion there is a limit at which point gcc will give up and refuse to instrument a function, set with -fpath-coverage-limit. Since estimating the number of prime paths pretty much means counting them, gcc maintains a pessimistic running count which slightly overestimates the number of paths found so far. When that count exceeds the limit, the function aborts and gcc prints a warning. This is a blunt instrument meant to not get stuck on the occasional large function, not fine-grained control over when to skip instrumentation.

My main benchmark has been tree-2.1.3/tree.c which generates approx 2M paths across the 20 functions or so in it. Most functions have fewer than 1500 paths, and 2 around a million each. Finding the paths takes 3.5-4s, but the instrumentation phase takes approx.
2.5 minutes and generates a 32M binary. Not bad for a 1429 line source file.

There are some selftests which deconstruct the algorithm, so it can easily be cross-referenced with Fazli & Afsharchi. I hope that including them helps to catch regressions, clarify the assumptions, and aid understanding of the algorithm by breaking up the phases.

DEMO

This is the denser line-aware (grep-friendlier) output. Every missing path is summarized as the lines you need to run, and in what order, annotated with the true/false/throw decision.

    $ gcc -fpath-coverage --coverage bs.c -c -o bs
    $ gcov -e --prime-paths-lines bs.o
    bs.gcda:cannot open data file, assuming not executed
            -:    0:Source:bs.c
            -:    0:Graph:bs.gcno
            -:    0:Data:-
            -:    0:Runs:0
    paths covered 0 of 17
    path 0 not covered: lines 6 6(true) 11(true) 12
    path 1 not covered: lines 6 6(true) 11(false) 13(true) 14
    path 2 not covered: lines 6 6(true) 11(false) 13(false) 16
    path 3 not covered: lines 6 6(false) 18
    path 4 not covered: lines 11(true) 12 6(true) 11
    path 5 not covered: lines 11(true) 12 6(false) 18
    path 6 not covered: lines 11(false) 13(true) 14 6(true) 11
    path 7 not covered: lines 11(false) 13(true) 14 6(false) 18
    path 8 not covered: lines 12 6(true) 11(true) 12
    path 9 not covered: lines 12 6(true) 11(false) 13(true) 14
    path 10 not covered: lines 12 6(true) 11(false) 13(false) 16
    path 11 not covered: lines 13(true) 14 6(true) 11(true) 12
    path 12 not covered: lines 13(true) 14 6(true) 11(false) 13
    path 13 not covered: lines 14 6(true) 11(false) 13(true) 14
    path 14 not covered: lines 14 6(true) 11(false) 13(false) 16
    path 15 not covered: lines 6(true) 11(true) 12 6
    path 16 not covered: lines 6(true) 11(false) 13(true) 14 6
        #####:    1:int binary_search(int a[], int len, int from, int to, int key)
            -:    2:{
        #####:    3:    int low = from;
        #####:    4:    int high = to - 1;
            -:    5:
        #####:    6:    while (low <= high)
            -:    7:    {
        #####:    8:        int mid = (low + high) >> 1;
        #####:    9:        long midVal = a[mid];
            -:   10:
        #####:   11:        if (midVal < key)
        #####:   12:            low = mid + 1;
        #####:   13:        else if (midVal > key)
        #####:   14:            high = mid - 1;
            -:   15:        else
        #####:   16:            return mid; // key found
            -:   17:    }
        #####:   18:    return -1;
            -:   19:}

Then there's the human-oriented source mode. Because it is so verbose I have limited the demo to 2 paths. In this mode gcov will print the sequence of *lines* through the program, and in what order, to cover the path, including what basic block each line is a part of. Like its denser sibling, this also prints the true/false/throw decision, if there is one.

    $ gcov -t --prime-paths-source bs.o
    bs.gcda:cannot open data file, assuming not executed
            -:    0:Source:bs.c
            -:    0:Graph:bs.gcno
            -:    0:Data:-
            -:    0:Runs:0
    paths covered 0 of 17
    path 0:
    BB  2:            1:int binary_search(int a[], int len, int from, int to, int key)
    BB  2:            3:    int low = from;
    BB  2:            4:    int high = to - 1;
    BB  2:            6:    while (low <= high)
    BB  8: (true)     6:    while (low <= high)
    BB  3:            8:        int mid = (low + high) >> 1;
    BB  3:            9:        long midVal = a[mid];
    BB  3: (true)    11:        if (midVal < key)
    BB  4:           12:            low = mid + 1;
    path 1:
    BB  2:            1:int binary_search(int a[], int len, int from, int to, int key)
    BB  2:            3:    int low = from;
    BB  2:            4:    int high = to - 1;
    BB  2:            6:    while (low <= high)
    BB  8: (true)     6:    while (low <= high)
    BB  3:            8:        int mid = (low + high) >> 1;
    BB  3:            9:        long midVal = a[mid];
    BB  3: (false)   11:        if (midVal < key)
    BB  5: (true)    13:        else if (midVal > key)
    BB  6:           14:            high = mid - 1;

The listing is also aware of inlining:

    hello.c:
    #include <stdio.h>
    #include "hello.h"

    int notmain(const char *entity)
    {
      return hello (entity);
    }

    hello.h:
    #include <stdio.h>

    inline __attribute__((always_inline))
    int hello (const char *s)
    {
      if (s)
        printf ("hello, %s!\n", s);
      else
        printf ("hello, world!\n");
      return 0;
    }

    $ gcov -t --prime-paths-source hello
    paths covered 0 of 2
    path 0:
    BB  2: (true)     4:int notmain(const char *entity)
    == inlined from hello.h ==
    BB  2: (true)     6:  if (s)
    BB  3:            7:    printf ("hello, %s!\n", s);
    BB  5:           10:  return 0;
    -------------------------
    BB  7:            6:  return hello (entity);
    BB  8:            6:  return hello (entity);
    path 1:
    BB  2: (false)    4:int notmain(const char *entity)
    == inlined from hello.h ==
    BB  2: (false)    6:  if (s)
    BB  4:            9:    printf ("hello, world!\n");
    BB  5:           10:  return 0;
    -------------------------
    BB  7:            6:  return hello (entity);
    BB  8:            6:  return hello (entity);

--prime-paths-{lines,source} take an optional argument, type, which can be 'covered', 'uncovered', or 'both', and defaults to 'uncovered'. The flag controls whether the covered or uncovered paths are printed, and while uncovered is generally the most useful one, it is sometimes nice to be able to see only the covered paths.

And finally, JSON (abbreviated). It is quite sparse and very nested, but is mostly a JSON version of the source listing. It has to be this nested in order to consistently capture multiple locations. It always includes the file name per location for consistency, even though this is very much redundant in almost all cases. This format is in no way set in stone, and without targeting it with other tooling I am not sure if it does the job well.

    "gcc_version": "15.0.0 20240704 (experimental)",
    "current_working_directory": "dir",
    "data_file": "hello.o",
    "files": [
      {
        "file": "hello.c",
        "functions": [
          {
            "name": "notmain",
            "demangled_name": "notmain",
            "start_line": 4,
            "start_column": 5,
            "end_line": 7,
            "end_column": 1,
            "blocks": 7,
            "blocks_executed": 0,
            "execution_count": 0,
            "total_prime_paths": 2,
            "covered_prime_paths": 0,
            "prime_path_coverage": [
              {
                "id": 0,
                "sequence": [
                  {
                    "block_id": 2,
                    "locations": [
                      {
                        "file": "hello.c",
                        "line_numbers": [ 4 ]
                      },
                      {
                        "file": "hello.h",
                        "line_numbers": [ 6 ]
                      }
                    ],
                    "edge_kind": "fallthru"
                  },
    ...

gcc/ChangeLog: * Makefile.in (OBJS): Add prime-paths.o, path-coverage.o. (GTFILES): Add prime-paths.cc, path-coverage.cc. (GCOV_OBJS): Add graphds.o, prime-paths.o, bitmap.o. * builtins.cc (expand_builtin_fork_or_exec): Check path_coverage_flag. * collect2.cc (main): Add -fno-path-coverage to OBSTACK. * common.opt: Add new options -fpath-coverage, -fpath-coverage-limit, -Wcoverage-too-many-paths. * doc/gcov.texi: Add --prime-paths, --prime-paths-lines, --prime-paths-source documentation. * doc/invoke.texi: Add -fpath-coverage, -fpath-coverage-limit, -Wcoverage-too-many-paths documentation. * gcc.cc: Link gcov on -fpath-coverage. * gcov-counter.def (GCOV_COUNTER_PATHS): New. * gcov-io.h (GCOV_TAG_PATHS): New. (GCOV_TAG_PATHS_LENGTH): New. (GCOV_TAG_PATHS_NUM): New. * gcov.cc (class path_info): New. (struct coverage_info): Add paths, paths_covered. (find_prime_paths): New. (add_path_counts): New. (find_arc): New. (print_usage): Add -e, --prime-paths, --prime-paths-lines, --prime-paths-source. (process_args): Likewise. (json_set_prime_path_coverage): New. (output_json_intermediate_file): Call json_set_prime_path_coverage. (process_all_functions): Call find_prime_paths. (generate_results): Call add_path_counts. (read_graph_file): Read path counters. (read_count_file): Likewise. (function_summary): Print path counts. (file_summary): Likewise. (print_source_line): New. (print_prime_path_lines): New. (print_inlined_separator): New. (print_prime_path_source): New. (output_path_coverage): New. (output_lines): Print path coverage. * ipa-inline.cc (can_early_inline_edge_p): Check path_coverage_flag. * passes.cc (finish_optimization_passes): Likewise. * profile.cc (branch_prob): Likewise. * selftest-run-tests.cc (selftest::run_tests): Run path coverage tests. * selftest.h (path_coverage_cc_tests): New declaration. * tree-profile.cc (tree_profiling): Check path_coverage_flag. (pass_ipa_tree_profile::gate): Likewise. * path-coverage.cc: New file. * prime-paths.cc: New file.

gcc/testsuite/ChangeLog: * lib/gcov.exp: Add prime paths test function. * g++.dg/gcov/gcov-22.C: New test. * g++.dg/gcov/gcov-23-1.h: New test. * g++.dg/gcov/gcov-23-2.h: New test. * g++.dg/gcov/gcov-23.C: New test. * gcc.misc-tests/gcov-29.c: New test. * gcc.misc-tests/gcov-30.c: New test. * gcc.misc-tests/gcov-31.c: New test. * gcc.misc-tests/gcov-32.c: New test. * gcc.misc-tests/gcov-33.c: New test. * gcc.misc-tests/gcov-34.c: New test.
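The per-block bookkeeping described in the path coverage commit above (clear the bits of paths that do not pass through the block, set the bits of paths that start there, then flush the bits of paths that end there) can be sketched as follows. This is a simplified illustration and not the code gcc emits: it assumes a single 64-bit bucket and one set of masks per block, and the names block_masks, enter_block, run_trace and diamond_masks are hypothetical. The example CFG is a diamond 0 -> {1, 2} -> 3 with two paths, bit 0 for 0-1-3 and bit 1 for 0-2-3.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical per-block masks, one bit per path.  */
struct block_masks
{
  uint64_t keep;   /* Paths that pass through this block.  */
  uint64_t start;  /* Paths that start in this block.  */
  uint64_t end;    /* Paths that end in this block.  */
};

/* Update the local path state on entering a block and flush the paths that
   end here into the global accumulator.  Returns the new local state.  */
static uint64_t
enter_block (uint64_t local, const struct block_masks *m, uint64_t *global)
{
  local &= m->keep;             /* Drop paths that left their sequence.  */
  local |= m->start;            /* Begin tracking paths that start here.  */
  *global |= local & m->end;    /* Record paths that are now complete.  */
  return local;
}

/* Replay a sequence of block ids and return the updated accumulator.  */
static uint64_t
run_trace (const struct block_masks *masks, const int *trace, size_t len,
           uint64_t global)
{
  uint64_t local = 0;
  for (size_t i = 0; i < len; i++)
    local = enter_block (local, &masks[trace[i]], &global);
  return global;
}

/* Masks for the diamond CFG: bit 0 is path 0-1-3, bit 1 is path 0-2-3.  */
static const struct block_masks diamond_masks[4] = {
  { 0x0, 0x3, 0x0 },  /* Block 0: both paths start here.  */
  { 0x1, 0x0, 0x0 },  /* Block 1: only path 0 continues.  */
  { 0x2, 0x0, 0x0 },  /* Block 2: only path 1 continues.  */
  { 0x3, 0x0, 0x3 },  /* Block 3: both paths end here.  */
};
```

Replaying the block sequence 0, 1, 3 sets only bit 0 of the accumulator; replaying 0, 2, 3 afterwards sets bit 1 as well, at which point every bit is set and coverage is full.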
2025-03-25install.texi: gcn - suggest to use Newlib with simd math fix [PR119325]Tobias Burnus1-2/+3
Suggest a Newlib with a fix for the SIMD math issue. Newlib commit: https://sourceware.org/git/?p=newlib-cygwin.git;a=commitdiff;h=2ef1a37e7 Additionally, for generic support in ROCm, it is expected that 6.4 will add the support; the current version, 6.3.3, does not support it. Bump >6.3.2 to >6.3.3 in install.texi to avoid doubts. gcc/ChangeLog: PR middle-end/119325 * doc/install.texi (gcn): Change ROCm > 6.3.2 to >6.3.3 for generic support; mention Newlib commit that fixes a SIMD math issue.
2025-03-24nvptx: Default at least to '-mptx=6.3'Thomas Schwinge1-1/+1
gcc/ * config/nvptx/nvptx.cc (default_ptx_version_option): Default at least to '-mptx=6.3'. * doc/invoke.texi (Nvidia PTX Options): Update '-mptx=[...]'. gcc/testsuite/ * gcc.target/nvptx/march-map=sm_30.c: Adjust. * gcc.target/nvptx/march-map=sm_32.c: Likewise. * gcc.target/nvptx/march-map=sm_35.c: Likewise. * gcc.target/nvptx/march-map=sm_37.c: Likewise. * gcc.target/nvptx/march-map=sm_50.c: Likewise. * gcc.target/nvptx/march=sm_30.c: Likewise. * gcc.target/nvptx/march=sm_35.c: Likewise. * gcc.target/nvptx/march=sm_37.c: Likewise.
2025-03-24i386: Raise deprecate warning for -mavx10.1-256/512 and -mevex512 while add ↵Haochen Jiang2-0/+9
-mavx10.1 back with 512 bit alias When the AVX10.1 options were added in GCC 14, E-cores were supposed to support up to 256 bit vector width, while P-cores supported up to 512 bit vector width. Therefore, we added the avx10.1-256 and avx10.1-512 options to the compiler, since there would be real platforms with 256 bit only support. At the same time, so that old platforms could also compile a 256 bit only binary, we introduced -mno-evex512 to disable 512 bit vectors. However, all future platforms will now support 512 bit vector width, on both P-core and E-core, so there is no longer any need to split the option by vector width. Therefore, we will remove them in this patch. Unlike the AVX10.2 options, the AVX10.1 options have been present in a major release, so we have to raise a deprecation warning in GCC 15 and remove them in GCC 16. At the same time, to align with the avx10.2 options, we add the just-removed avx10.1 option back, with a warning that mentions its behavior change. gcc/ChangeLog: * common/config/i386/cpuinfo.h (get_available_features): Change to FEATURE_AVX10_1. * common/config/i386/i386-common.cc (OPTION_MASK_ISA2_AVX10_1_512_SET): Renamed to ... (OPTION_MASK_ISA2_AVX10_1_SET): ... this. (OPTION_MASK_ISA2_AVX10_2_SET): Use renamed macro. (OPTION_MASK_ISA2_AVX10_1_UNSET): Ditto. (ix86_handle_option): Ditto. (processor_alias_table): Use P_PROC_AVX10_1. * common/config/i386/i386-cpuinfo.h (enum feature_priority): Rename from AVX10_1_512 to AVX10_1. (enum processor_features): Ditto. * common/config/i386/i386-isas.h: Add avx10.1. * config/i386/driver-i386.cc (host_detect_local_cpu): Use renamed enum. * config/i386/i386-c.cc (ix86_target_macros_internal): Rename to avx10.1. * config/i386/i386-isa.def (AVX10_1_512): Rename to ... (AVX10_1): ... this. * config/i386/i386-options.cc (isa2_opts): Rename to avx10.1. (ix86_valid_target_attribute_inner_p): Add avx10.1. (ix86_option_override_internal): Rename to AVX10_1. Revise warnings to mention behavior change for option combination in GCC 16.
* config/i386/i386.h (PTA_DIAMONDRAPIDS): Use AVX10_1. * config/i386/i386.opt: Add avx10.1. Add deprecate warnings for mevex512 and mavx10.1-256/512. * config/i386/i386.opt.urls: Add avx10.1. * doc/extend.texi: Ditto. * doc/sourcebuild.texi: Ditto. gcc/testsuite/ChangeLog: * gcc.target/i386/avx10-check.h: Change to avx10.1. * gcc.target/i386/avx10_1-1.c: Add warning check. * gcc.target/i386/avx10_1-10.c: Ditto. * gcc.target/i386/avx10_1-11.c: Ditto. * gcc.target/i386/avx10_1-12.c: Ditto. * gcc.target/i386/avx10_1-13.c: Ditto. * gcc.target/i386/avx10_1-15.c: Ditto. * gcc.target/i386/avx10_1-16.c: Ditto. * gcc.target/i386/avx10_1-18.c: Ditto. * gcc.target/i386/avx10_1-19.c: Ditto. * gcc.target/i386/avx10_1-2.c: Ditto. * gcc.target/i386/avx10_1-20.c: Ditto. * gcc.target/i386/avx10_1-21.c: Ditto. * gcc.target/i386/avx10_1-22.c: Ditto. * gcc.target/i386/avx10_1-23.c: Ditto. * gcc.target/i386/avx10_1-26.c: Ditto. * gcc.target/i386/avx10_1-3.c: Ditto. * gcc.target/i386/avx10_1-4.c: Ditto. * gcc.target/i386/avx10_1-7.c: Ditto. * gcc.target/i386/avx10_1-8.c: Ditto. * gcc.target/i386/avx10_1-9.c: Ditto. * gcc.target/i386/noevex512-1.c: Ditto. * gcc.target/i386/noevex512-2.c: Ditto. * gcc.target/i386/pr111068.c: Ditto. * gcc.target/i386/pr111907.c: Ditto. * gcc.target/i386/pr117240_avx512f.c: Ditto. * gcc.target/i386/pr117304-1.c: Ditto. * gcc.target/i386/pr117946.c: Ditto. * gcc.target/i386/avx10_1-24.c: Removed. * gcc.target/i386/avx10_1-25.c: Removed. * gcc.target/i386/avx10_1-5.c: Removed. * gcc.target/i386/avx10_1-6.c: Removed.
2025-03-24i386: Remove avx10.2-256 and avx10.2-512 optionsHaochen Jiang3-25/+4
When the AVX10.2 options were added in GCC 15, E-cores were supposed to support up to 256 bit vector width, while P-cores supported up to 512 bit vector width. Therefore, we added the avx10.2-256 and avx10.2-512 options to the compiler, since there would be real platforms with 256 bit only support. However, all future platforms will now support 512 bit vector width, on both P-core and E-core, so there is no longer any need to split the option by vector width. Therefore, we will remove them in this patch. gcc/ChangeLog: * common/config/i386/cpuinfo.h (get_available_features): Revise the logic AVX10 version. * common/config/i386/i386-common.cc (OPTION_MASK_ISA2_AVX10_2_256_SET): Removed. (OPTION_MASK_ISA2_AVX10_2_512_SET): Ditto. (OPTION_MASK_ISA2_AVX10_2_SET): New. (OPTION_MASK_ISA2_AMX_AVX512_SET): Use AVX10.2 macro. (OPTION_MASK_ISA2_AVX10_2_UNSET): Ditto. (ix86_handle_option): Remove avx10.2-256 part. Adjust avx10.2. * common/config/i386/i386-cpuinfo.h (enum processor_features): Remove FEATURE_AVX10_2_256 and skip the value for it. Change the name from FEATURE_AVX10_2_512 to FEATURE_AVX10_2. * common/config/i386/i386-isas.h: Remove avx10.2-256/512. * config/i386/avx10_2-512bf16intrin.h: Use avx10.2 instead of avx10.2-256/512. * config/i386/avx10_2-512convertintrin.h: Ditto. * config/i386/avx10_2-512mediaintrin.h: Ditto. * config/i386/avx10_2-512minmaxintrin.h: Ditto. * config/i386/avx10_2-512satcvtintrin.h: Ditto. * config/i386/avx10_2bf16intrin.h: Ditto. * config/i386/avx10_2convertintrin.h: Ditto. * config/i386/avx10_2mediaintrin.h: Ditto. * config/i386/avx10_2minmaxintrin.h: Ditto. * config/i386/avx10_2satcvtintrin.h: Ditto. * config/i386/movrsintrin.h: Ditto. * config/i386/sm4intrin.h: Ditto. * config/i386/cpuid.h (bit_AVX10_256): Removed. (bit_AVX10_512): Ditto. * config/i386/driver-i386.cc (host_detect_local_cpu): Adjust Diamond Rapids and -march=native condition. * config/i386/i386-builtin.def (BDESC): Use AVX10.2 macro instead of AVX10.2-256/512.
* config/i386/i386-c.cc (ix86_target_macros_internal): Ditto. * config/i386/i386-expand.cc (ix86_expand_branch): Use TARGET_AVX10_2 instead of specifying vector size. (ix86_prepare_fp_compare_args): Ditto. (ix86_expand_fp_compare): Ditto. (ix86_ssecom_setcc): Ditto. (ix86_expand_sse_comi): Ditto. (ix86_expand_sse_comi_round): Ditto. (ix86_check_builtin_isa_match): Ditto. * config/i386/i386.cc (ix86_fp_compare_code_to_integer): Ditto. (ix86_get_mask_mode): Ditto. * config/i386/i386.h (SSE_FLOAT_MODE_SSEMATH_OR_HFBF_P): Ditto. * config/i386/i386.md: Ditto. * config/i386/mmx.md: Ditto. * config/i386/sse.md: Ditto. * config/i386/predicates.md: Ditto. * config/i386/i386-isa.def (AVX10_2_256): Removed. (AVX10_2_512): Removed. (AVX10_2): New. * config/i386/i386-options.cc (isa2_opts): Remove avx10.2-256/512. (ix86_valid_target_attribute_inner_p): Ditto. (PTA_DIAMONDRAPIDS): Use PTA_AVX10_2. * config/i386/i386.opt: Remove avx10.2-256/512. * config/i386/i386.opt.urls: Ditto. * doc/extend.texi: Ditto. * doc/invoke.texi: Ditto. * doc/sourcebuild.texi: Ditto.
2025-03-23Doc: Rearrange remaining top-level sections in extend.texi [PR42270]Sandra Loosemore1-1162/+1162
This is part of an incremental effort to make the chapter on GCC extensions better organized by grouping/rearranging sections by topic. gcc/ChangeLog PR other/42270 * doc/extend.texi (Nonlocal Gotos): Group with other built-ins sections. (Constructing Calls): Likewise. (Pragmas): Move earlier in the section, before the built-ins docs. (Thread-Local): Likewise. (OpenMP): Likewise. (OpenACC): Likewise.
2025-03-23Doc: Add "Syntax Extensions" and "Semantic Extensions" sectioning to ↵Sandra Loosemore1-1137/+1160
extend.texi [PR42270] This is part of an incremental effort to make the chapter on GCC extensions better organized by grouping/rearranging sections by topic. gcc/ChangeLog PR other/42270 * doc/extend.texi (Syntax Extensions): New section. (Statement Exprs): Make it a subsection of the above. (Local Labels): Likewise. (Labels as Values): Likewise. (Nested Functions): Likewise. (Typeof): Likewise. (Offsetof): Likewise. (Alignment): Likewise. (Incomplete Enums): Likewise. (Variadic Macros): Likewise. (Conditionals): Likewise. (Case Ranges): Likewise. (Mixed Labels and Declarations): Likewise. (C++ Comments): Likewise. (Escaped Newlines): Likewise. (Hex Floats): Likewise. (Binary constants): Likewise. (Dollar Signs): Likewise. (Character Escapes): Likewise. (Alternate Keywords): Likewise. (Function Names): Likewise. (Semantic Extensions): New section. (Function Prototypes): Make it a subsection of the above. (Pointer Arith): Likewise. (Variadic Pointer Args): Likewise. (Pointers to Arrays): Likewise. (Const and Volatile Functions): Likewise.
2025-03-23Doc: Add "Aggregate Types" sectioning to extend.texi [PR42270]Sandra Loosemore1-596/+606
This is part of an incremental effort to make the chapter on GCC extensions better organized by grouping/rearranging sections by topic. gcc/ChangeLog PR other/42270 * doc/extend.texi (Aggregate Types): New section. (Variable Length): Make it a subsection of the above. (Zero Length): Likewise. (Empty Structures): Likewise. (Flexible Array Members in Unions): Likewise. (Flexible Array Members alone in Structures): Likewise. (Unnamed Fields): Likewise. (Cast to Union): Likewise. (Subscripting): Likewise. (Initializers): Likewise. (Compound Literals): Likewise. (Designated Inits): Likewise.
2025-03-23Doc: Add "Additional Numeric Types" sectioning to extend.texi [PR42270]Sandra Loosemore1-41/+53
This is part of an incremental effort to make the chapter on GCC extensions better organized by grouping/rearranging sections by topic. gcc/ChangeLog PR other/42270 * doc/extend.texi (Additional Numeric Types): New section. (__int128): Make it a subsection of the above. (Long Long): Likewise. (Complex): Likewise. (Floating Types): Likewise. (Half-Precision): Likewise. (Decimal Float): Likewise. (Fixed-Point): Likewise.
2025-03-23AVR: Add AVR-SD devices.Georg-Johann Lay1-2/+2
gcc/ * config/avr/avr-mcus.def: Add AVR32SD20, AVR32SD28, AVR32SD32, AVR64SD28, AVR64SD32, AVR64SD48. * doc/avr-mmcu.texi: Rebuild.
2025-03-23AVR: Clarify some optimization options.Georg-Johann Lay1-1/+4
gcc/ * doc/invoke.texi (AVR Optimization Options) <-maccumulate-args>: Refer to -fdefer-pop. <-muse-nonzero-bits>: Re-formulate what the option does.
2025-03-22AVR: target/119421 Better optimize some bit operations.Georg-Johann Lay1-2/+7
There are occasions where knowledge about nonzero bits makes some optimizations possible. For example, Rd |= Rn << Off can be implemented as the two instructions SBRC Rn, 0 and ORI Rd, 1 << Off when Rn in { 0, 1 }, i.e. nonzero_bits (Rn) == 1. This patch adds some combiner patterns that exploit nonzero_bits(). As insn conditions are not supposed to contain nonzero_bits(), the patch splits such insns right after pass insn combine. PR target/119421 gcc/ * config/avr/avr.opt (-muse-nonzero-bits): New option. * config/avr/avr-protos.h (avr_nonzero_bits_lsr_operands_p): New. (make_avr_pass_split_nzb): New. * config/avr/avr.cc (avr_nonzero_bits_lsr_operands_p): New function. (avr_rtx_costs_1): Return costs for the new insns. * config/avr/avr.md (nzb): New insn attribute. (*nzb=1.<code>...): New insns to better support some bit operations for <code> in AND, IOR, XOR. * config/avr/avr-passes.def (avr_pass_split_nzb): Insert pass after combine. * config/avr/avr-passes.cc (avr_pass_data_split_nzb): New pass data. (avr_pass_split_nzb): New pass. (make_avr_pass_split_nzb): New function. * common/config/avr/avr-common.cc (avr_option_optimization_table): Enable -muse-nonzero-bits for -O2 and higher. * doc/invoke.texi (AVR Options): Document -muse-nonzero-bits. gcc/testsuite/ * gcc.target/avr/torture/pr119421-sreg.c: New test.
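The equivalence behind the SBRC/ORI example above can be checked in plain C. This is a minimal sketch, not AVR code: or_shifted_generic and or_shifted_bit are hypothetical names, and the second function only models what the skip-if-bit-clear plus OR-immediate sequence computes.

```c
#include <stdint.h>

/* The generic form: Rd |= Rn << Off, truncated to 8 bits.  */
static uint8_t
or_shifted_generic (uint8_t rd, uint8_t rn, unsigned off)
{
  return (uint8_t) (rd | (uint8_t) (rn << off));
}

/* Model of "SBRC Rn, 0 ; ORI Rd, 1 << Off": the ORI is skipped when bit 0
   of Rn is clear.  Only equivalent to the generic form when Rn is 0 or 1,
   i.e. when nonzero_bits (Rn) == 1.  */
static uint8_t
or_shifted_bit (uint8_t rd, uint8_t rn, unsigned off)
{
  if (rn & 1)
    rd |= (uint8_t) (1u << off);
  return rd;
}
```

For Rn restricted to 0 or 1 the two functions agree for every Rd and every Off from 0 to 7, which is why knowing the nonzero bits lets the shift be replaced by a conditional OR.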
2025-03-21aarch64: Add support for -mcpu=olympusDhruv Chawla1-1/+1
This adds support for the NVIDIA Olympus core to the AArch64 backend. The initial patch does not add any special tuning decisions, and those may come later. Bootstrapped and tested on aarch64-none-linux-gnu. gcc/ChangeLog: * config/aarch64/aarch64-cores.def (olympus): New entry. * config/aarch64/aarch64-tune.md: Regenerate. * doc/invoke.texi (AArch64 Options): Document the above. Signed-off-by: Dhruv Chawla <dhruvc@nvidia.com>
2025-03-20Revert "s390: Deprecate ESA/390 support"Stefan Schulze Frielinghaus1-5/+1
The intention of -m31 -mesa and -m31 -mzarch was that they are (ABI) compatible, which is almost true, except that, as it turns out, they are not for attribute mode(word). After doing some archaeology and digging out an over 18 year old thread [1,2] which is about this very attribute, I have come to the conclusion that this patch should be reverted. The intention behind deprecating and eventually removing ESA/390 support was to prepare for a future removal of -m31, though in smaller steps. Thus, instead of introducing potential hiccups along the route, I will revert this patch and revisit this topic when the time for -m31 in its entirety has come, independent of -mesa/-mzarch. [1] https://gcc.gnu.org/pipermail/gcc-patches/2006-September/200465.html [2] https://gcc.gnu.org/pipermail/gcc-patches/2006-October/201154.html This reverts commit 3b1bd1fdcd241dd1e5b706b6937400d74ca43146.
2025-03-19aarch64: Add +sve2p1 to -march=armv9.4-a flagsKyrylo Tkachov1-1/+1
The ArmARM says: "In an Armv9.4 implementation, if FEAT_SVE2 is implemented, FEAT_SVE2p1 is implemented." We should enable +sve2p1 as part of -march=armv9.4-a, which this patch does. This makes gcc consistent with gas. Bootstrapped and tested on aarch64-none-linux-gnu. Signed-off-by: Kyrylo Tkachov <ktkachov@nvidia.com> gcc/ * config/aarch64/aarch64-arches.def (...): Add SVE2p1. * doc/invoke.texi (AArch64 Options): Document +sve2p1 in -march=armv9.4-a.
2025-03-18c, c++: Support musttail attribute even using __attribute__ form [PR116545]Jakub Jelinek1-6/+10
Apparently some programs in the wild use

    #if __has_attribute(musttail)
      __attribute__((musttail)) return foo ();
    #else
      return foo ();
    #endif

clang supports musttail both as a standard attribute ([[clang::musttail]], which we also support for compatibility) and in the __attribute__ form, and the above worked just fine with GCC 14, which had __has_attribute(musttail) 0. Now that it is non-zero, this doesn't compile anymore. So, either we need to ensure that __has_attribute(musttail) is 0 and just __has_c{,pp}_attribute({gnu,clang}::musttail) are non-zero, or IMHO better we just make it work in the attribute form; especially for C < C23 I can see why some projects would prefer that form, since [[gnu::musttail]] is rejected as an error in C11 etc. before GCC 15, rather than just handled as an unknown attribute. I view this as both a regression and a compatibility issue. The patch handles it in similar spots to the fallthrough/assume attributes inside of __attribute__ for C, and for C++ enables mixing of standard [[]] and GNU __attribute__(()) attributes at the start of statements in any order. While working on it, I've noticed we weren't diagnosing arguments to the clang::musttail attribute (fixed by the c-attribs.cc hunk) and newly on the __attribute__ form attribute (in that case the arguments aren't just skipped, they are always parsed and because we don't call decl_attributes etc., it wouldn't be diagnosed without a manual check). 2025-03-18 Jakub Jelinek <jakub@redhat.com> PR c/116545 gcc/ * doc/extend.texi (musttail statement attribute): Document that musttail GNU attribute can be used as well. gcc/c-family/ * c-attribs.cc (c_common_clang_attributes): Add musttail. gcc/c/ * c-parser.cc (c_parser_declaration_or_fndef): Parse __attribute__((musttail)) return. (c_parser_handle_musttail): Diagnose attribute arguments. (c_parser_statement_after_labels): Parse __attribute__((musttail)) return. gcc/cp/ * parser.cc (cp_parser_statement): Call cp_parser_attributes_opt rather than cp_parser_std_attribute_spec_seq.
(cp_parser_jump_statement): Diagnose gnu::musttail attributes with no arguments. gcc/testsuite/ * c-c++-common/attr-fallthrough-2.c: Adjust expected diagnostics for C++. * c-c++-common/musttail15.c: New test. * c-c++-common/musttail16.c: New test. * c-c++-common/musttail17.c: New test. * c-c++-common/musttail18.c: New test. * c-c++-common/musttail19.c: New test. * c-c++-common/musttail20.c: New test. * c-c++-common/musttail21.c: New test. * c-c++-common/musttail22.c: New test. * c-c++-common/musttail23.c: New test. * c-c++-common/musttail24.c: New test. * g++.dg/musttail7.C: New test. * g++.dg/musttail8.C: New test. * g++.dg/musttail12.C: New test. * g++.dg/musttail13.C: New test. * g++.dg/musttail14.C: New test. * g++.dg/ext/pr116545.C: New test.
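The guarded pattern quoted in the musttail commit above can be packaged as a macro. The following is a minimal sketch under stated assumptions: the MUSTTAIL macro and the sum_to function are hypothetical names chosen for illustration, and the macro form is a variation on the #if/#else statement pair from the commit message rather than anything the commit itself adds.

```c
/* Compilers without __has_attribute get a conservative fallback.  */
#ifndef __has_attribute
# define __has_attribute(x) 0
#endif

/* Use the GNU attribute form only where the compiler reports support.  */
#if __has_attribute(musttail)
# define MUSTTAIL __attribute__((musttail))
#else
# define MUSTTAIL
#endif

/* Tail-recursive sum of 1..n; the recursive call is in tail position and
   has the same signature as the caller.  */
static unsigned long long
sum_to (unsigned long long n, unsigned long long acc)
{
  if (n == 0)
    return acc;
  MUSTTAIL return sum_to (n - 1, acc + n);
}
```

Where the attribute is unavailable the macro expands to nothing and the code still compiles, which is the behavior such projects relied on with GCC 14.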
2025-03-18testsuite: Add support for dg-output-file directiveJakub Jelinek1-0/+4
The COBOL testsuite has many tests which just emit lots of output to stdout and want to compare it against expected output. We have the dg-output directive, but if one needs more than dozens of lines of output, adding hundreds of dg-output directives to each source uses too much memory and is harder to maintain. The following patch offers an alternative, the dg-output-file directive, where one can supply a text file with the expected output. There is no regexp matching in that case, just an exact comparison, except that it handles different line ending styles (the expected file is read using tcl gets; in the actual output, \n, \r\n or \r is skipped over), and a newline at the end of the whole output is optional (in the actual output, because I think some boards get it eaten). Also tested with addition or subtraction of some characters from the expected output files and saw FAILs with appropriate messages. 2025-03-18 Jakub Jelinek <jakub@redhat.com> * doc/sourcebuild.texi (dg-output-file): Document. * lib/gcc-dg.exp (${tool}-load): If output-file is set, compare combined output against content of the [lindex ${output-file} 1] file. (dg-output-file): New directive. * lib/dg-test-cleanup.exp (cleanup-after-saved-dg-test): Clear output-file variable. * gcc.dg/dg-output-file-1.c: New test. * gcc.dg/dg-output-file-1-lp64.txt: New test. * gcc.dg/dg-output-file-1-ilp32.txt: New test.
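The comparison rule described above (exact match, except that \n, \r\n and \r all count as the same line ending and a final newline is optional) can be illustrated in C. This is not the Tcl code in lib/gcc-dg.exp; it is a sketch of the rule only, and skip_eol and output_matches are hypothetical names. As a simplification it tolerates a missing or extra final newline on either side.

```c
#include <stdbool.h>

/* Consume one line terminator (\r\n, \n or \r) at *S.  Returns true if a
   terminator was consumed.  */
static bool
skip_eol (const char **s)
{
  if ((*s)[0] == '\r' && (*s)[1] == '\n')
    {
      *s += 2;
      return true;
    }
  if ((*s)[0] == '\n' || (*s)[0] == '\r')
    {
      *s += 1;
      return true;
    }
  return false;
}

/* Compare EXPECTED and ACTUAL line by line, ignoring the style of the line
   endings and tolerating a missing or extra final newline.  */
static bool
output_matches (const char *expected, const char *actual)
{
  for (;;)
    {
      /* The text of each line must match exactly.  */
      while (*expected && *expected != '\n' && *expected != '\r')
        {
          if (*actual != *expected)
            return false;
          expected++;
          actual++;
        }
      bool expected_eol = skip_eol (&expected);
      bool actual_eol = skip_eol (&actual);
      if (!expected_eol)
        /* Expected text is exhausted; actual must be as well.  */
        return *actual == '\0';
      if (!actual_eol)
        /* Only acceptable as a missing final newline.  */
        return *expected == '\0' && *actual == '\0';
    }
}
```

Under this sketch an expected text with Unix line endings matches the same output produced with DOS line endings, while any difference in the characters of a line is a mismatch.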
2025-03-17rs6000: Add -msplit-patch-nops (PR112980)Michael Matz1-2/+15
As the bug report details, some uses of -fpatchable-function-entry aren't happy with the "before" NOPs being inserted between the global and local entry points on powerpc. We want the before NOPs to be in front of the global entry point. That means that the patching NOPs aren't consecutive for dual entry point functions, but for these use cases that's not a problem. But let us support both under the control of a new target option: -msplit-patch-nops. gcc/ PR target/112980 * config/rs6000/rs6000.opt (msplit-patch-nops): New option. * doc/invoke.texi (RS/6000 and PowerPC Options): Document it. * config/rs6000/rs6000.h (machine_function.stop_patch_area_print): New member. * config/rs6000/rs6000.cc (rs6000_print_patchable_function_entry): Emit split nops under control of that one. * config/rs6000/rs6000-logue.cc (rs6000_output_function_prologue): Add handling of split patch nops.
2025-03-17doc: document Incremental LTO flagsMichal Jires1-3/+23
This adds missing documentation for LTO flags. gcc/ChangeLog: * doc/invoke.texi (Optimize Options): Add incremental LTO flags.
2025-03-14Doc: Remove redundant info from documentation of -ansi.Sandra Loosemore2-43/+6
The -ansi option has essentially been superseded by the more general -std= option, and all the additional information about its effects is already covered elsewhere in the manual. I also cleaned up some confusing text about alternate keywords that I noticed while confirming this. gcc/ChangeLog * doc/extend.texi (Alternate Keywords): Clean up text and remove discussion of "restrict", which is not a GNU extension at all. * doc/invoke.texi (C Dialect Options): Remove detailed discussion.
2025-03-13Allow to build libgccjit with a soname bound to the GCC major versionMatthias Klose1-0/+4
When configuring GCC with --program-suffix=-$(BASE_VERSION) to allow installation of multiple GCC versions in parallel, the executable of the driver (gcc-$(BASE_VERSION)) gets recorded in the libgccjit.so.0 library. Assuming that you only install the libgccjit.so.0 library from the newest GCC, you have a libgccjit installed which always calls back to the newest installed version of GCC. I'm not saying that the ABI is changing, but I'd like to see libgccjit calling out to the corresponding compiler, and therefore installing a libgccjit with a soname that matches the GCC major version. The downside is having to rebuild packages built against libgccjit with each major GCC version, but looking at the reverse dependencies, at least for package builds, only emacs is using libgccjit. My plan for using this feature is to build a libgccjit0 using the default GCC (e.g. gcc-14), and a libgccjit15 when building a newer GCC. Then, when changing the GCC default to 15, to build a libgccjit0 from gcc-15 and a libgccjit14 from gcc-14. When configuring without --enable-versioned-jit, the behavior is unchanged. 2025-03-13 Matthias Klose <doko@ubuntu.com> gcc/ * configure.ac: Add option --enable-versioned-jit. * configure: Regenerate. * Makefile.in: Move from jit/Make-lang.in, setting value from configure.ac. * doc/install.texi: Document option --enable-versioned-jit. gcc/jit/ * Make-lang.in (LIBGCCJIT_VERSION_NUM): Move to ../Makefile.in.
2025-03-11doc: Fix minor grammar nit in -ftrivial-auto-var-init docsJonathan Wakely1-1/+1
gcc/ChangeLog: * doc/extend.texi (Common Variable Attributes): Fix grammar in final sentence of -ftrivial-auto-var-init description.
2025-03-11s390: Deprecate ESA/390 supportStefan Schulze Frielinghaus1-1/+5
Deprecate support for the ESA/390 architecture, which will eventually be removed, and encourage the usage of the z/Architecture instead. Furthermore, -m31 now defaults to -mzarch, whereas previously it defaulted to -mesa. gcc/ChangeLog: * config.gcc: Fail in case of option --with-mode=esa. * config/s390/s390.cc (s390_option_override_internal): Default to z/Architecture mode. * config/s390/s390.h (DRIVER_SELF_SPECS): Ditto. * config/s390/s390.opt: Emit a warning for option -mesa. * doc/invoke.texi: Document the change. gcc/testsuite/ChangeLog: * gcc.target/s390/20020926-1.c: Deal with deprecation warning. * gcc.target/s390/dwarfregtable-1.c: Ditto. * gcc.target/s390/fp2int1.c: Ditto. * gcc.target/s390/pr102222.c: Ditto. * gcc.target/s390/pr106355-3.c: Ditto. * gcc.target/s390/pr61078.c: Ditto. * gcc.target/s390/target-attribute/tattr-m31-10.c: Ditto. * gcc.target/s390/target-attribute/tattr-m31-12.c: Ditto. * gcc.target/s390/target-attribute/tattr-m31-14.c: Ditto. * gcc.target/s390/target-attribute/tattr-m31-18.c: Ditto. * gcc.target/s390/target-attribute/tattr-m31-2.c: Ditto. * gcc.target/s390/target-attribute/tattr-m31-20.c: Ditto. * gcc.target/s390/target-attribute/tattr-m31-22.c: Ditto. * gcc.target/s390/target-attribute/tattr-m31-24.c: Ditto. * gcc.target/s390/target-attribute/tattr-m31-26.c: Ditto. * gcc.target/s390/target-attribute/tattr-m31-28.c: Ditto. * gcc.target/s390/target-attribute/tattr-m31-30.c: Ditto. * gcc.target/s390/target-attribute/tattr-m31-32.c: Ditto. * gcc.target/s390/target-attribute/tattr-m31-4.c: Ditto. * gcc.target/s390/target-attribute/tattr-m31-6.c: Ditto. * gcc.target/s390/target-attribute/tattr-m31-8.c: Ditto.
2025-03-11COBOL: documentation updates for gcobolJames K. Lowden6-22/+84
gcc/ * doc/contrib.texi: Update for gcobol. * doc/frontends.texi: Likewise. * doc/install.texi: Likewise. * doc/invoke.texi: Likewise. * doc/sourcebuild.texi: Likewise. * doc/standards.texi: Likewise.
2025-03-10Sanitizer: Fix typo in previous documentation patch.Sandra Loosemore1-1/+1
gcc/ChangeLog * doc/invoke.texi (Instrumentation Options): Fix typo introduced in commit 313edeeeb607fe32da5633cfb6f91977add446f6.
2025-03-08inline-asm: Improve documentation of "asm constexpr".Sandra Loosemore1-23/+37
While working on an adjacent documentation fix, I noticed that the documentation for the gnu++11 "asm constexpr" feature was very confusing, in some cases being attached to parts of the asm syntax that are not otherwise required to be string literals, and missing from other parts of the syntax that are. I've checked what the C++ parser actually does and fixed the documentation to match, also improving it to use correct markup and to be more explicit and less implementor-speaky. gcc/cp/ChangeLog * parser.cc (cp_parser_asm_definition): Make comment more explicit. (cp_parser_asm_operand_list): Likewise. Also correct the comment block at the top of the function to reflect reality. gcc/ChangeLog * doc/extend.texi (Basic Asm): Document that AssemblerInstructions can be an asm constexpr. (Extended Asm): Move the notes about asm constexprs for AssemblerTemplate and Clobbers to the corresponding subsections. Remove the notes for OutputOperands and InputOperands and reword misleading descriptions of the list item syntax. Note that constraint strings can be asm constexprs. (Asm constexprs): Use "title case" for subsection name. Be explicit about what parts of the asm syntax this applies to and that the parentheses are required. Correct markup and terminology.
2025-03-08inline-asm: Clarify documentation of operand syntax [PR67301]Sandra Loosemore1-8/+10
gcc/ChangeLog PR c/67301 * doc/extend.texi (Extended Asm): Clarify that the square brackets around the asmSymbolicName of operands are a required part of the syntax.
2025-03-07c: do not warn about truncating NUL char when initializing nonstring arrays ↵Jakub Jelinek1-6/+9
[PR117178] When a nonstring char array is initialized and the code is compiled with -Wunterminated-string-initialization, the warning trips even when only the trailing NUL character of the string constant is truncated. Only warn about this when running under -Wc++-compat since under C++ we should not initialize nonstrings from C strings. This patch separates the -Wunterminated-string-initialization and -Wc++-compat warnings; they are now independent options, the former implied by -Wextra, the latter not implied by anything. If -Wc++-compat is in effect, it takes precedence over -Wunterminated-string-initialization and warns regardless of the nonstring attribute; otherwise, if -Wunterminated-string-initialization is enabled, it warns only if there is no nonstring attribute. In all cases, the warnings and also pedwarn_init for even larger sizes now provide details on the lengths. 2025-03-07 Kees Cook <kees@kernel.org> Jakub Jelinek <jakub@redhat.com> PR c/117178 gcc/ * doc/invoke.texi (Wunterminated-string-initialization): Document the new interaction between this warning and -Wc++-compat and that initialization of decls with nonstring attribute isn't warned about. gcc/c-family/ * c.opt (Wunterminated-string-initialization): Don't depend on -Wc++-compat. gcc/c/ * c-typeck.cc (digest_init): Add DECL argument. Adjust wording of pedwarn_init for too long strings and provide details on the lengths, for string literals where just the trailing NUL doesn't fit warn for warn_cxx_compat with OPT_Wc___compat, wording which mentions "for C++" and provides details on lengths, otherwise for warn_unterminated_string_initialization adjust the warning, provide details on lengths and don't warn if get_attr_nonstring_decl (decl). (build_c_cast, store_init_value, output_init_element): Adjust digest_init callers. gcc/testsuite/ * gcc.dg/Wunterminated-string-initialization.c: Add additional test coverage. * gcc.dg/Wcxx-compat-14.c: Check in dg-warning for "for C++" part of the diagnostics. 
* gcc.dg/Wcxx-compat-23.c: New test. * gcc.dg/Wcxx-compat-24.c: New test. Signed-off-by: Kees Cook <kees@kernel.org>
2025-03-07Sanitizer: Mention -g option in documentation [PR56682]Sandra Loosemore1-1/+8
gcc/ChangeLog PR sanitizer/56682 * doc/invoke.texi (Instrumentation Options): Document that -g is useful with -fsanitize=thread and -fsanitize=address. Also mention -fno-omit-frame-pointer per the asan wiki.
2025-03-08ira: Add new hooks for callee-save vs spills [PR117477]Richard Sandiford2-10/+73
Following on from the discussion in: https://gcc.gnu.org/pipermail/gcc-patches/2025-February/675256.html this patch removes TARGET_IRA_CALLEE_SAVED_REGISTER_COST_SCALE and replaces it with two hooks: one that controls the cost of using an extra callee-saved register and one that controls the cost of allocating a frame for the first spill. (The patch does not attempt to address the shrink-wrapping part of the thread above.) On AArch64, this is enough to fix PR117477, as verified by the new tests. The patch does not change the SPEC2017 scores significantly. (I saw a slight improvement in fotonik3d and roms, but I'm not convinced that the improvements are real.) The patch makes IRA use caller saves for gcc.target/aarch64/pr103350-1.c, which is a scan-dump correctness test that relies on not using caller saves. The decision to use caller saves looks appropriate, and saves an instruction, so I've just added -fno-caller-saves to the test options. The x86 parts were written by Honza. ix86_callee_save_cost is updated by H.J. to replace gcc_checking_assert with returning 1 if mem_cost <= 2. gcc/ PR rtl-optimization/117477 * config/aarch64/aarch64.cc (aarch64_count_saves): New function. (aarch64_count_above_hard_fp_saves, aarch64_callee_save_cost) (aarch64_frame_allocation_cost): Likewise. (TARGET_CALLEE_SAVE_COST): Define. (TARGET_FRAME_ALLOCATION_COST): Likewise. * config/i386/i386.cc (ix86_ira_callee_saved_register_cost_scale): Replace with... (ix86_callee_save_cost): ...this new hook. (TARGET_IRA_CALLEE_SAVED_REGISTER_COST_SCALE): Delete. (TARGET_CALLEE_SAVE_COST): Define. * target.h (spill_cost_type, frame_cost_type): New enums. * target.def (callee_save_cost, frame_allocation_cost): New hooks. (ira_callee_saved_register_cost_scale): Delete. * doc/tm.texi.in (TARGET_IRA_CALLEE_SAVED_REGISTER_COST_SCALE): Delete. (TARGET_CALLEE_SAVE_COST, TARGET_FRAME_ALLOCATION_COST): New hooks. * doc/tm.texi: Regenerate. * hard-reg-set.h (hard_reg_set_popcount): New function. 
* ira-color.cc (allocated_memory_p): New variable. (allocated_callee_save_regs): Likewise. (record_allocation): New function. (assign_hard_reg): Use targetm.frame_allocation_cost to model the cost of the first spill or first caller save. Use targetm.callee_save_cost to model the cost of using new callee-saved registers. Apply the exit rather than entry frequency to the cost of restoring a register or deallocating the frame. Update the new variables above. (improve_allocation): Use record_allocation. (color): Initialize allocated_callee_save_regs. (ira_color): Initialize allocated_memory_p. * targhooks.h (default_callee_save_cost): Declare. (default_frame_allocation_cost): Likewise. * targhooks.cc (default_callee_save_cost): New function. (default_frame_allocation_cost): Likewise. gcc/testsuite/ PR rtl-optimization/117477 * gcc.target/aarch64/callee_save_1.c: New test. * gcc.target/aarch64/callee_save_2.c: Likewise. * gcc.target/aarch64/callee_save_3.c: Likewise. * gcc.target/aarch64/pr103350-1.c: Add -fno-caller-saves. Co-authored-by: Jan Hubicka <hubicka@ucw.cz> Co-authored-by: H.J. Lu <hjl.tools@gmail.com>