Age | Commit message (Collapse) | Author | Files | Lines |
|
gcc/ChangeLog
PR driver/58973
* common.opt (Werror, Werror=): Use less awkward wording in
description.
(pedantic-errors): Likewise.
* doc/invoke.texi (Warning Options): Likewise for -Werror and
-Werror= here.
Co-Authored-By: GUO Yixuan <culu.gyx@gmail.com>
|
|
This patch addresses a number of issues with the documentation of
- None of the things in this section had @cindex entries [PR114957].
- The document formatting didn't match that of other #pragma
documentation sections.
- The effect of #pragma pack(0) wasn't documented [PR78008].
- There's a long-standing bug [PR60972] reporting that #pragma pack
and the __attribute__(packed) don't get along well. It seems worthwhile
to warn users about that since elsewhere pragmas are cross-referenced
with related or equivalent attributes.
gcc/ChangeLog
PR c/114957
PR c/78008
PR c++/60972
* doc/extend.texi (Structure-Layout Pragmas): Add @cindex
entries and reformat the pragma descriptions to match the markup
used for other pragmas. Document what #pragma pack(0) does.
Add cross-references to similar attributes.
|
|
On Wed, Apr 02, 2025 at 10:32:20AM +0200, Richard Biener wrote:
> I wonder if we can amend the documentation to suggest to end lifetime
> of variables explicitly by proper scoping?
In the -Wmaybe-musttail-local-addr attribute description I've already
tried to show that in the example, but if you think something like
the following would make it clearer.
2025-04-02 Jakub Jelinek <jakub@redhat.com>
* doc/extend.texi (musttail statement attribute): Hint how
to avoid -Wmaybe-musttail-local-addr warnings.
|
|
instead warn [PR119376]
As discussed here and in bugzilla, [[clang::musttail]] attribute in clang
not just strongly asks for tail call or error, but changes behavior.
To quote:
https://clang.llvm.org/docs/AttributeReference.html#musttail
"The lifetimes of all local variables and function parameters end immediately
before the call to the function. This means that it is undefined behaviour
to pass a pointer or reference to a local variable to the called function,
which is not the case without the attribute. Clang will emit a warning in
common cases where this happens."
The GCC behavior was just to error if we can't prove the musttail callee
could not have dereferenced escaped pointers to local vars or parameters
of the caller. That is still the case for variables with non-trivial
destruction (even in clang), like vars with C++ non-trivial destructors or
variables with cleanup attribute.
The following patch changes the behavior to match that of clang, for all of
[[clang::musttail]], [[gnu::musttail]] and __attribute__((musttail)).
clang 20 actually added warning for some cases of it in
https://github.com/llvm/llvm-project/pull/109255
but it is under -Wreturn-stack-address warning.
Now, gcc doesn't have that warning, but -Wreturn-local-addr instead, and
IMHO it is better to have this under new warnings, because this isn't about
returning local address, but about passing it to a musttail call, or maybe
escaping to a musttail call. And perhaps users will appreciate they can
control it separately as well.
The patch introduces 2 new warnings.
-Wmusttail-local-addr
which is turn on by default and warns for the always dumb cases of passing
an address of a local variable or parameter to musttail call's argument.
And then
-Wmaybe-musttail-local-addr
which is only diagnosed if -Wmusttail-local-addr was not diagnosed and
diagnoses at most one (so that we don't emit 100s of warnings for one call
if 100s of vars can escape) case where an address of a local var could have
escaped to the musttail call. This is less severe, the code doesn't have
to be obviously wrong, so the warning is only enabled in -Wextra.
And I've adjusted also the documentation for this change and addition of
new warnings.
2025-04-02 Jakub Jelinek <jakub@redhat.com>
PR ipa/119376
* common.opt (Wmusttail-local-addr, Wmaybe-musttail-local-addr): New.
* tree-tailcall.cc (suitable_for_tail_call_opt_p): Don't fail for
TREE_ADDRESSABLE PARM_DECLs for musttail calls if diag_musttail.
Emit -Wmusttail-local-addr warnings.
(maybe_error_musttail): Use gimple_location instead of directly
accessing location member.
(find_tail_calls): For musttail calls if diag_musttail, don't fail
if address of local could escape to the call, instead emit
-Wmaybe-musttail-local-addr warnings. Emit
-Wmaybe-musttail-local-addr warnings also for address taken
parameters.
* common.opt.urls: Regenerate.
* doc/extend.texi (musttail statement attribute): Clarify local
variables without non-trivial destruction are considered out of scope
before the tail call instruction.
* doc/invoke.texi (-Wno-musttail-local-addr,
-Wmaybe-musttail-local-addr): Document.
* c-c++-common/musttail8.c: Expect a warning rather than error in one
case.
(f4): Add int * argument.
* c-c++-common/musttail15.c: Don't disallow for C++98.
* c-c++-common/musttail16.c: Likewise.
* c-c++-common/musttail17.c: Likewise.
* c-c++-common/musttail18.c: Likewise.
* c-c++-common/musttail19.c: Likewise. Expect a warning rather than
error in one case.
(f4): Add int * argument.
* c-c++-common/musttail20.c: Don't disallow for C++98.
* c-c++-common/musttail21.c: Likewise.
* c-c++-common/musttail28.c: New test.
* c-c++-common/musttail29.c: New test.
* c-c++-common/musttail30.c: New test.
* c-c++-common/musttail31.c: New test.
* g++.dg/ext/musttail1.C: New test.
* g++.dg/ext/musttail2.C: New test.
* g++.dg/ext/musttail3.C: New test.
|
|
Per the issue, the discussion of these two attributes needed to be
better integrated. I also did some editing for style and readability,
and clarified that almost all targets support this feature (it is
enabled by default unless the back end disables it), not just "some".
Co-Authored_by: Jonathan Wakely <jwakely@redhat.com>
gcc/ChangeLog
PR c++/118982
* doc/extend.texi (Common Function Attributes): For the
constructor/destructory attribute, be more explicit about the
relationship between the constructor attribute and
the C++ init_priority attribute, and add a cross-reference.
Also document that most targets support this.
(C++ Attributes): Similarly for the init_priority attribute.
|
|
gcc/ChangeLog
PR c/118118
* doc/extend.texi (Boolean Type): New section.
|
|
This is a C23/C++11 feature that is supported as an extension with
earlier -std= options too, but was never previously documented. It
interacts with the already-documented forward enum definition extension,
so I have merged discussion of the two extensions into the same section.
gcc/ChangeLog
PR c/117689
* doc/extend.texi (Incomplete Enums): Rename to....
(Enum Extensions): This. Document support for specifying the
underlying type of an enum as an extension in all earlier C
and C++ standards. Document that a forward declaration with
underlying type is not an incomplete type, and which dialects
GCC supports that in.
|
|
The warning -Wzero-as-null-pointer-constant is now not only supported
in C++ but also in C. Change the documentation accordingly.
PR c/119173
gcc/ChangeLog:
* doc/invoke.texi (Warning Options): Move to general options.
|
|
correct position.
gcc/ChangeLog:
* doc/invoke.texi: Corrected the position of '-mtls-dialect=opt'
option.
|
|
As per the AArch64 ISA FEAT_SME does not require FEAT_SVE2. However, we don't
support SME without SVE2 and bail out with a 'sorry' if this configuration is
encountered. We may choose to support this in the future.
gcc/ChangeLog:
* config/aarch64/aarch64-option-extensions.def (SME): Remove SVE2 as
prerequisite and add in FCMA and F16FML.
* config/aarch64/aarch64.cc (aarch64_override_options_internal):
Diagnose use of SME without SVE2 and implicitly enable SVE2 when
enabling SME after streaming mode diagnosis.
* doc/invoke.texi (sme): Document that this can only be used with the
sve2 extension.
gcc/testsuite/ChangeLog:
* gcc.target/aarch64/no-sve-with-sme-1.c: New.
* gcc.target/aarch64/no-sve-with-sme-2.c: New.
* gcc.target/aarch64/no-sve-with-sme-3.c: New.
* gcc.target/aarch64/no-sve-with-sme-4.c: New.
* gcc.target/aarch64/pragma_cpp_predefs_4.c: Pass +sve2 to existing
+sme pragma.
* gcc.target/aarch64/sve/acle/general-c/binary_int_opt_single_n_2.c:
Likewise.
* gcc.target/aarch64/sve/acle/general-c/binary_opt_single_n_2.c:
Likewise.
* gcc.target/aarch64/sve/acle/general-c/binary_single_1.c: Likewise.
* gcc.target/aarch64/sve/acle/general-c/binary_za_slice_int_opt_single_1.c:
* gcc.target/aarch64/sve/acle/general-c/binary_za_slice_lane_1.c:
Likewise.
* gcc.target/aarch64/sve/acle/general-c/binary_za_slice_lane_2.c:
Likewise.
* gcc.target/aarch64/sve/acle/general-c/binary_za_slice_lane_3.c:
Likewise.
* gcc.target/aarch64/sve/acle/general-c/binary_za_slice_lane_4.c:
Likewise.
* gcc.target/aarch64/sve/acle/general-c/binary_za_slice_opt_single_1.c:
Likewise.
* gcc.target/aarch64/sve/acle/general-c/binary_za_slice_opt_single_2.c:
Likewise.
* gcc.target/aarch64/sve/acle/general-c/binary_za_slice_opt_single_3.c:
Likewise.
* gcc.target/aarch64/sve/acle/general-c/binary_za_slice_uint_opt_single_1.c:
Likewise.
* gcc.target/aarch64/sve/acle/general-c/binaryxn_2.c: Likewise.
* gcc.target/aarch64/sve/acle/general-c/clamp_1.c: Likewise.
* gcc.target/aarch64/sve/acle/general-c/compare_scalar_count_1.c:
Likewise.
* gcc.target/aarch64/sve/acle/general-c/dot_za_slice_int_lane_1.c:
Likewise.
* gcc.target/aarch64/sve/acle/general-c/dot_za_slice_lane_1.c:
Likewise.
* gcc.target/aarch64/sve/acle/general-c/dot_za_slice_lane_2.c:
Likewise.
* gcc.target/aarch64/sve/acle/general-c/dot_za_slice_uint_lane_1.c:
Likewise.
* gcc.target/aarch64/sve/acle/general-c/shift_right_imm_narrowxn_1.c:
Likewise.
* gcc.target/aarch64/sve/acle/general-c/storexn_1.c: Likewise.
* gcc.target/aarch64/sve/acle/general-c/ternary_mfloat8_lane_1.c:
Likewise.
* gcc.target/aarch64/sve/acle/general-c/ternary_mfloat8_lane_group_selection_1.c:
Likewise.
* gcc.target/aarch64/sve/acle/general-c/ternary_qq_or_011_lane_1.c:
Likewise.
* gcc.target/aarch64/sve/acle/general-c/unary_convertxn_1.c: Likewise.
* gcc.target/aarch64/sve/acle/general-c/unary_convertxn_narrow_1.c:
Likewise.
* gcc.target/aarch64/sve/acle/general-c/unary_convertxn_narrowt_1.c:
Likewise.
* gcc.target/aarch64/sve/acle/general-c/unary_za_slice_1.c: Likewise.
* gcc.target/aarch64/sve/acle/general-c/unaryxn_1.c: Likewise.
* gcc.target/aarch64/sve/acle/general-c/write_za_1.c: Likewise.
* gcc.target/aarch64/sve/acle/general-c/write_za_slice_1.c: Likewise.
|
|
I noticed that the "New/Delete Builtins" section failed to explicitly
name or describe the arguments of the builtin functions it purported
to document, outside of using them in an example. I've fixed that
and cleaned up the whole section.
gcc/ChangeLog
* doc/extend.texi (New/Delete Builtins): Cleanup up the text and
explicitly list the builtins being documented.
|
|
This is part of an incremental effort to make the chapter on GCC
extensions better organized by grouping/rearranging sections by topic.
gcc/ChangeLog
PR other/42270
* doc/extend.texi (Numeric Builtins): Move Integer Overflow Builtins
section here, as a subsection.
|
|
This is part of an incremental effort to make the chapter on GCC
extensions better organized by grouping/rearranging sections by topic.
This installment adds a container section to hold documentation for
both the _atomic and _sync builtins, reordering them so that the new
_atomic interface is presented before the legacy _sync one. I also
incorporated material from the separate x86 transactional memory
section directly into the __atomic builtins documentation instead of
retaining that as a parallel section.
gcc/ChangeLog
PR other/42270
* doc/extend.texi (Atomic Memory Access): New section.
(__sync Builtins): Make it a subsection of the above.
(Atomic Memory Access): Likewise.
(x86 specific memory model extensions for transactional memory):
Delete this section, incorporating the text into the discussion
of __atomic builtins.
|
|
This is part of an incremental effort to make the chapter on GCC
extensions better organized by grouping/rearranging sections by topic.
The "Other Builtins" section had become a catch-all for all sorts of
things with very little organization or attempt to differentiate between
important information (e.g., GCC treats a gazillion library functions as
builtins by default) from obscure builtins provided primarily as internal
interfaces. I've split it up into various pieces and attempted to move
the more important or useful-to-users documentation earlier in the chapter.
What's left of the section is still a jumbled mess... but at least it's a
smaller jumbled mess.
gcc/ChangeLog
PR other/42270
* doc/extend.texi (Built-in Functions): Incorporate some text
formerly in "Other Builtins" into the introduction. Adjust
menu for new sections.
(Library Builtins): New section, split from "Other Builtins".
(Numeric Builtins): Likewise.
(Stack Allocation): Likewise.
(Constructing Calls): Move __builtin_call_with_static_chain here.
(Object Size Checking): Minor copy-editing.
(Other Builtins): Move text to new sections listed above. Delete
duplicate docs for object-size checking builtins.
* doc/invoke.texi (C dialect options): Update @xref for -fno-builtin.
|
|
This is part of an incremental effort to make the documentation for
GCC extensions better organized by grouping/rearranging sections by
topic.
I was originally intending to consolidate all the sections documenting
builtins as subsections of a new container section within the C
extensions chapter, but I ran into a technical limitation of Texinfo:
it only supports sectioning depth up to @subsubsection, and we already
had quite a few of those in the target-specific builtins sections. So
instead I have pulled all the existing sections out into a new
chapter. This actually makes sense since some of the builtins are
specific to C++ anyway and are not C language extensions at all.
Subsequent patches in this series will move things around within the
new chapter; this one just adds the new container node and adjusts
the menus.
gcc/ChangeLog
PR other/42270
* doc/extend.texi (C Extensions): Move menu items for
builtin-related sections to...
(Built-in Functions): New chapter.
* doc/gcc.texi (Introduction): Add menu entry for new chapter.
|
|
This is part of an incremental effort to make the chapter on GCC
extensions better organized by grouping/rearranging sections by topic.
Note that this patch does not address the restructuring/rewrite
suggested by PR88472 or PR102397, beyond adding a very short
introduction to the new container section that is more explicit about
both syntaxes being accepted as a GNU extension.
gcc/ChangeLog
PR other/42270
* doc/extend.texi (Attributes): New section.
(Function Attributes): Make it a subsection of the new section.
(Variable Attributes): Likewise.
(Type Attributes): Likewise.
(Label Attributes): Likewise.
(Enumerator Attributes): Likewise.
(Attribute Syntax): Likewise.
|
|
This is part of an incremental effort to make the chapter on GCC
extensions better organized by grouping/rearranging sections by topic.
Following the last round of patches, there's a leftover section
"Target Format Checks" that didn't fit into any category. It seems best
to merge this material into the main discussion of the "format" attribute,
in particular because that discussion already contains similar discussion
for mingw/Windows targets.
gcc/ChangeLog
PR other/42270
* doc/extend.texi (Function Attributes): Merge text from "Target
Format Checks" into the main discussion of the format and
format_arg attributes.
(Target Format Checks): Delete section.
|
|
Similarly to data races with 8-bit byte or 16-bit word quantity memory
writes on non-BWX Alpha implementations we have the same problem even on
BWX implementations with partial memory writes produced for unaligned
stores as well as block memory move and clear operations. This happens
at the boundaries of the area written where we produce unprotected RMW
sequences, such as for example:
ldbu $1,0($3)
stw $31,8($3)
stq $1,0($3)
to zero a 9-byte member at the byte offset of 1 of a quadword-aligned
struct, happily clobbering a 1-byte member at the beginning of said
struct if concurrent write happens while executing on the same CPU such
as in a signal handler or a parallel write happens while executing on
another CPU such as in another thread or via a shared memory segment.
To guard against these data races with partial memory write accesses
introduce the `-msafe-partial' command-line option that instructs the
compiler to protect boundaries of the data quantity accessed by instead
using a longer code sequence composed of narrower memory writes where
suitable machine instructions are available (i.e. with BWX targets) or
atomic RMW access sequences where byte and word memory access machine
instructions are not available (i.e. with non-BWX targets).
Owing to the desire of branch avoidance there are redundant overlapping
writes in unaligned cases where STQ_U operations are used in the middle
of a block so as to make sure no part of data to be written has been
lost regardless of run-time alignment. For the non-BWX case it means
that with blocks whose size is not a multiple of 8 there are additional
atomic RMW sequences issued towards the end of the block in addition to
the always required pair enclosing the block from each end.
Only one such additional atomic RMW sequence is actually required, but
code currently issues two for the sake of simplicity. An improvement
might be added to `alpha_expand_unaligned_store_words_safe_partial' in
the future, by folding `alpha_expand_unaligned_store_safe_partial' code
for handling multi-word blocks whose size is not a multiple of 8 (i.e.
with a trailing partial-word part). It would improve performance a bit,
but current code is correct regardless.
Update test cases with `-mno-safe-partial' where required and add new
ones accordingly.
In some cases GCC chooses to open-code block memory write operations, so
with non-BWX targets `-msafe-partial' will in the usual case have to be
used together with `-msafe-bwa'.
Credit to Magnus Lindholm <linmag7@gmail.com> for sharing hardware for
the purpose of verifying the BWX side of this change.
gcc/
PR target/117759
* config/alpha/alpha-protos.h
(alpha_expand_unaligned_store_safe_partial): New prototype.
* config/alpha/alpha.cc (alpha_expand_movmisalign)
(alpha_expand_block_move, alpha_expand_block_clear): Handle
TARGET_SAFE_PARTIAL.
(alpha_expand_unaligned_store_safe_partial)
(alpha_expand_unaligned_store_words_safe_partial)
(alpha_expand_clear_safe_partial_nobwx): New functions.
* config/alpha/alpha.md (insvmisaligndi): Handle
TARGET_SAFE_PARTIAL.
* config/alpha/alpha.opt (msafe-partial): New option.
* config/alpha/alpha.opt.urls: Regenerate.
* doc/invoke.texi (Option Summary, DEC Alpha Options): Document
the new option.
gcc/testsuite/
PR target/117759
* gcc.target/alpha/memclr-a2-o1-c9-ptr.c: Add
`-mno-safe-partial'.
* gcc.target/alpha/memclr-a2-o1-c9-ptr-safe-partial.c: New file.
* gcc.target/alpha/memcpy-di-unaligned-dst.c: New file.
* gcc.target/alpha/memcpy-di-unaligned-dst-safe-partial.c: New
file.
* gcc.target/alpha/memcpy-di-unaligned-dst-safe-partial-bwx.c:
New file.
* gcc.target/alpha/memcpy-si-unaligned-dst.c: New file.
* gcc.target/alpha/memcpy-si-unaligned-dst-safe-partial.c: New
file.
* gcc.target/alpha/memcpy-si-unaligned-dst-safe-partial-bwx.c:
New file.
* gcc.target/alpha/stlx0.c: Add `-mno-safe-partial'.
* gcc.target/alpha/stlx0-safe-partial.c: New file.
* gcc.target/alpha/stlx0-safe-partial-bwx.c: New file.
* gcc.target/alpha/stqx0.c: Add `-mno-safe-partial'.
* gcc.target/alpha/stqx0-safe-partial.c: New file.
* gcc.target/alpha/stqx0-safe-partial-bwx.c: New file.
* gcc.target/alpha/stwx0.c: Add `-mno-safe-partial'.
* gcc.target/alpha/stwx0-bwx.c: Add `-mno-safe-partial'. Refer
to stwx0.c rather than copying its code and also verify no LDQ_U
or STQ_U instructions have been produced.
* gcc.target/alpha/stwx0-safe-partial.c: New file.
* gcc.target/alpha/stwx0-safe-partial-bwx.c: New file.
|
|
With non-BWX Alpha implementations we have a problem of data races where
a 8-bit byte or 16-bit word quantity is to be written to memory in that
in those cases we use an unprotected RMW access of a 32-bit longword or
64-bit quadword width. If contents of the longword or quadword accessed
outside the byte or word to be written are changed midway through by a
concurrent write executing on the same CPU such as by a signal handler
or a parallel write executing on another CPU such as by another thread
or via a shared memory segment, then the concluding write of the RMW
access will clobber them. This is especially important for the safety
of RCU algorithms, but is otherwise an issue anyway.
To guard against these data races with byte and aligned word quantities
introduce the `-msafe-bwa' command-line option (standing for Safe Byte &
Word Access) that instructs the compiler to instead use an atomic RMW
access sequence where byte and word memory access machine instructions
are not available. There is no change to code produced for BWX targets.
It would be sufficient for the secondary reload handle to use a pair of
scratch registers, as requested by `reload_out<mode>', but it would end
with poor code produced as one of the scratches would be occupied by
data retrieved and the other one would have to be reloaded with repeated
calculations, all within the LL/SC sequence.
Therefore I chose to add a dedicated `reload_out<mode>_safe_bwa' handler
and ask for more scratches there by defining a 256-bit OI integer mode.
While reload is documented in our manual to support an arbitrary number
of scratches in reality it hasn't been implemented for IRA:
/* ??? It would be useful to be able to handle only two, or more than
three, operands, but for now we can only handle the case of having
exactly three: output, input and one temp/scratch. */
and it seems to be the case for LRA as well. Do what everyone else does
then and just have one wide multi-register scratch.
I note that the atomic sequences emitted are suboptimal performance-wise
as the looping branch for the unsuccessful completion of the sequence
points backwards, which means it will be predicted as taken despite that
in most cases it will fall through. I do not see it as a deficiency of
this change proposed as it takes care of recording that the branch is
unlikely to be taken, by calling `alpha_emit_unlikely_jump'. Therefore
generic code elsewhere should instead be investigated and adjusted
accordingly for the arrangement to actually take effect.
Add test cases accordingly.
There are notable regressions between a plain `-mno-bwx' configuration
and a `-mno-bwx -msafe-bwa' one:
FAIL: gcc.dg/torture/inline-mem-cpy-cmp-1.c -O0 execution test
FAIL: gcc.dg/torture/inline-mem-cpy-cmp-1.c -O1 execution test
FAIL: gcc.dg/torture/inline-mem-cpy-cmp-1.c -O2 execution test
FAIL: gcc.dg/torture/inline-mem-cpy-cmp-1.c -O3 -g execution test
FAIL: gcc.dg/torture/inline-mem-cpy-cmp-1.c -Os execution test
FAIL: gcc.dg/torture/inline-mem-cpy-cmp-1.c -O2 -flto -fno-use-linker-plugin -flto-partition=none execution test
FAIL: gcc.dg/torture/inline-mem-cpy-cmp-1.c -O2 -flto -fuse-linker-plugin -fno-fat-lto-objects execution test
FAIL: g++.dg/init/array25.C -std=c++17 execution test
FAIL: g++.dg/init/array25.C -std=c++98 execution test
FAIL: g++.dg/init/array25.C -std=c++26 execution test
They come from the fact that these test cases play tricks with alignment
and end up calling code that expects a reference to aligned data but is
handed one to unaligned data.
This doesn't cause a visible problem with plain `-mno-bwx' code, because
the resulting alignment exception is fixed up by Linux. There's no such
handling currently implemented for LDL_L or LDQ_L instructions (which
are first in the sequence) and consequently the offender is issued with
SIGBUS instead. Suitable handling will be added to Linux to complement
this change that will emulate the trapping instructions[1], so these
interim regressions are seen as harmless and expected.
References:
[1] "Alpha: Emulate unaligned LDx_L/STx_C for data consistency",
<https://lore.kernel.org/r/alpine.DEB.2.21.2502181912230.65342@angie.orcam.me.uk/>
gcc/
PR target/117759
* config/alpha/alpha-modes.def (OI): New integer mode.
* config/alpha/alpha-protos.h (alpha_expand_mov_safe_bwa): New
prototype.
* config/alpha/alpha.cc (alpha_expand_mov_safe_bwa): New
function.
(alpha_secondary_reload): Handle TARGET_SAFE_BWA.
* config/alpha/alpha.md (aligned_store_safe_bwa)
(unaligned_store<mode>_safe_bwa, reload_out<mode>_safe_bwa)
(reload_out<mode>_unaligned_safe_bwa): New expanders.
(mov<mode>, movcqi, reload_out<mode>_aligned): Handle
TARGET_SAFE_BWA.
(reload_out<mode>): Guard against TARGET_SAFE_BWA.
* config/alpha/alpha.opt (msafe-bwa): New option.
* config/alpha/alpha.opt.urls: Regenerate.
* doc/invoke.texi (Option Summary, DEC Alpha Options): Document
the new option.
gcc/testsuite/
PR target/117759
* gcc.target/alpha/stb.c: New file.
* gcc.target/alpha/stb-bwa.c: New file.
* gcc.target/alpha/stb-bwx.c: New file.
* gcc.target/alpha/stba.c: New file.
* gcc.target/alpha/stba-bwa.c: New file.
* gcc.target/alpha/stba-bwx.c: New file.
* gcc.target/alpha/stw.c: New file.
* gcc.target/alpha/stw-bwa.c: New file.
* gcc.target/alpha/stw-bwx.c: New file.
* gcc.target/alpha/stwa.c: New file.
* gcc.target/alpha/stwa-bwa.c: New file.
* gcc.target/alpha/stwa-bwx.c: New file.
|
|
gcc/ChangeLog:
* doc/invoke.texi: Modify the description of '-mld-seq-sa'.
|
|
This patch adds prime path coverage to gcc/gcov. First, a quick
introduction to path coverage, before I explain a bit on the pieces of
the patch.
PRIME PATHS
Path coverage is recording the paths taken through the program. Here is
a simple example:
if (cond1) BB 1
then1 () BB 2
else
else1 () BB 3
if (cond2) BB 4
then2 () BB 5
else
else2 () BB 6
_ BB 7
To cover all paths you must run {then1 then2}, {then1 else2}, {else1
then1}, {else1 else2}. This is in contrast with line/statement coverage
where it is sufficient to execute then2, and it does not matter if it
was reached through then1 or else1.
1 2 4 5 7
1 2 4 6 7
1 3 4 5 7
1 3 4 6 7
This gets more complicated with loops, because 0, 1, 2, ..., N
iterations are all different paths. There are different ways of
addressing this, a promising one being prime paths. A prime path is a
maximal simple path (a path with no repeated vertices) or simple cycle
(no repeated vertices except for the first/last) Prime paths strike a
decent balance between number of tests, path growth, and loop coverage,
requiring loops to be both taken and skipped. Of course, the number of
paths still grows very fast with program complexity - for example, this
program has 14 prime paths:
while (a)
{
if (b)
return;
while (c--)
a++;
}
--
ALGORITHM
Since the numbers of paths grows so fast, we need a good algorithm. The
naive approach of generating all paths and discarding redundancies (see
reference_prime_paths in the diff) simply doesn't complete for even
pretty simple functions with a few ten thousand paths (granted, the
implementation is also poor, but only serves as a reference). Fazli &
Afsharchi in their paper "Time and Space-Efficient Compositional Method
for Prime and Test Paths Generation" describe a neat algorithm which
drastically improves on for most programs, and brings complexity down to
something managable. This patch implements that algorithm with a few
minor tweaks.
The algorithm first finds the strongly connected components (SCC) of the
graph and creates a new graph where the vertices are the SCCs of the
CFG. Within these vertices different paths are found - regular prime
paths, paths that start in the SCCs entries, and paths that end in the
SCCs exits. These per-SCC paths are combined with paths through the CFG
which greatly reduces of paths needed to be evaluated just to be thrown
away.
Using this algorithm we can find the prime paths for somewhat
complicated functions in a reasonable time. Please note that some
programs don't benefit from this at all. We need to find the prime paths
within a SCC, so if a single SCC is very large the function degenerates
to the naive implementation. This can probably be much improved on, but
is an exercise for later.
--
OVERALL ARCHITECTURE
Like the other coverages in gcc, this operates on the CFG in the
profiling phase, just after branch and condition coverage, in phases:
1. All prime paths are generated, counted, and enumerated from the CFG
2. The paths are evaluted and counter instructions and accumulators are
emitted
3. gcov reads the CFG and computes the prime paths (same as step 1)
4. gcov prints a report
Simply writing out all the paths in the .gcno file is not really viable,
the files would be too big. Additionally, there are limits to the
practicality of measuring (and reporting) on millions of paths, so for
most programs where coverage is feasible, computing paths should be
plenty fast. As a result, path coverage really only adds 1 bit to the
counter, rounded up to nearest 64 ("bucket"), so 64 paths takes up 8
bytes, 65 paths take up 16 bytes.
Recording paths is really just massaging large bitsets. Per function,
ceil(paths/64 or 32) buckets (gcov_type) are allocated. Paths are
sorted, so the first path maps to the lowest bit, the second path to the
second lowest bit, and so on. On taking an edge and entering a basic
block, a few bitmasks are applied to unset the bits corresponding to the
paths outside the block and set the bits of the paths that start in that
block. Finally, the right buckets are masked and written to the global
accumulators for the paths that end in the block. Full coverage is
achieved when all bits are set.
gcc does not really inform gcov of abnormal paths, so paths with
abnormal edges are ignored. This probably possible, but requires some
changes to the graph gcc writes to the .gcno file.
--
IMPLEMENTATION
In order to remove non-prime paths (subpaths) we use a suffix tree.
Fazli & Afsharchi do not discuss how duplicates or subpaths are removed,
and using the suffix works really well -- insertion time is a function
of the length of the (longest) paths, not the number of paths. The paths
are usually quite short, but there are many of them. The same
prime_paths function is used both in gcc and in gcov.
As for speed, I would say that it is acceptable. Path coverage is a
problem that is exponential in its very nature, so if you enable this
feature you can reasonably expect it to take a while. To combat the
effects of path explosion there is a limit at which point gcc will give
up and refuse to instrument a function, set with -fpath-coverage-limit.
Since estimating the number of prime paths is pretty much is counting
them, gcc maintains a pessimistic running count which slightly
overestimates the number of paths found so far. When that count exceeds
the limit, the function aborts and gcc prints a warning. This is a
blunt instrument meant to not get stuck on the occasional large
function, not fine-grained control over when to skip instrumentation.
My main benchmark has been tree-2.1.3/tree.c which generates approx 2M
paths across the 20 functions or so in it. Most functions have less than
1500 paths, and 2 around a million each. Finding the paths takes
3.5-4s, but the instrumentation phase takes approx. 2.5 minutes and
generates a 32M binary. Not bad for a 1429 line source file.
There are some selftests which deconstruct the algorithm, so it can be
easily referenced with the Fazli & Afsharchi. I hope that including them
both help to catch regression, clarify the assumptions, and help
understanding the algorithm by breaking up the phases.
DEMO
This is the denser line-aware (grep-friendlier) output. Every missing
path is summarized as the lines you need to run in what order, annotated
with the true/false/throw decision.
$ gcc -fpath-coverage --coverage bs.c -c -o bs
$ gcov -e --prime-paths-lines bs.o
bs.gcda:cannot open data file, assuming not executed
-: 0:Source:bs.c
-: 0:Graph:bs.gcno
-: 0:Data:-
-: 0:Runs:0
paths covered 0 of 17
path 0 not covered: lines 6 6(true) 11(true) 12
path 1 not covered: lines 6 6(true) 11(false) 13(true) 14
path 2 not covered: lines 6 6(true) 11(false) 13(false) 16
path 3 not covered: lines 6 6(false) 18
path 4 not covered: lines 11(true) 12 6(true) 11
path 5 not covered: lines 11(true) 12 6(false) 18
path 6 not covered: lines 11(false) 13(true) 14 6(true) 11
path 7 not covered: lines 11(false) 13(true) 14 6(false) 18
path 8 not covered: lines 12 6(true) 11(true) 12
path 9 not covered: lines 12 6(true) 11(false) 13(true) 14
path 10 not covered: lines 12 6(true) 11(false) 13(false) 16
path 11 not covered: lines 13(true) 14 6(true) 11(true) 12
path 12 not covered: lines 13(true) 14 6(true) 11(false) 13
path 13 not covered: lines 14 6(true) 11(false) 13(true) 14
path 14 not covered: lines 14 6(true) 11(false) 13(false) 16
path 15 not covered: lines 6(true) 11(true) 12 6
path 16 not covered: lines 6(true) 11(false) 13(true) 14 6
#####: 1:int binary_search(int a[], int len, int from, int to, int key)
-: 2:{
#####: 3: int low = from;
#####: 4: int high = to - 1;
-: 5:
#####: 6: while (low <= high)
-: 7: {
#####: 8: int mid = (low + high) >> 1;
#####: 9: long midVal = a[mid];
-: 10:
#####: 11: if (midVal < key)
#####: 12: low = mid + 1;
#####: 13: else if (midVal > key)
#####: 14: high = mid - 1;
-: 15: else
#####: 16: return mid; // key found
-: 17: }
#####: 18: return -1;
-: 19:}
Then there's the human-oriented source mode. Because it is so verbose I
have limited the demo to 2 paths. In this mode gcov will print the
sequence of *lines* through the program and in what order to cover the
path, including what basic block the line is a part of. Like its denser
sibling, this also prints the true/false/throw decision, if there is
one.
$ gcov -t --prime-paths-source bs.o
bs.gcda:cannot open data file, assuming not executed
-: 0:Source:bs.c
-: 0:Graph:bs.gcno
-: 0:Data:-
-: 0:Runs:0
paths covered 0 of 17
path 0:
BB 2: 1:int binary_search(int a[], int len, int from, int to, int key)
BB 2: 3: int low = from;
BB 2: 4: int high = to - 1;
BB 2: 6: while (low <= high)
BB 8: (true) 6: while (low <= high)
BB 3: 8: int mid = (low + high) >> 1;
BB 3: 9: long midVal = a[mid];
BB 3: (true) 11: if (midVal < key)
BB 4: 12: low = mid + 1;
path 1:
BB 2: 1:int binary_search(int a[], int len, int from, int to, int key)
BB 2: 3: int low = from;
BB 2: 4: int high = to - 1;
BB 2: 6: while (low <= high)
BB 8: (true) 6: while (low <= high)
BB 3: 8: int mid = (low + high) >> 1;
BB 3: 9: long midVal = a[mid];
BB 3: (false) 11: if (midVal < key)
BB 5: (true) 13: else if (midVal > key)
BB 6: 14: high = mid - 1;
The listing is also aware of inlining:
hello.c:
#include <stdio.h>
#include "hello.h"
int notmain(const char *entity)
{
return hello (entity);
}
#include <stdio.h>
inline __attribute__((always_inline))
int hello (const char *s)
{
if (s)
printf ("hello, %s!\n", s);
else
printf ("hello, world!\n");
return 0;
}
$ gcov -t --prime-paths-source hello
paths covered 0 of 2
path 0:
BB 2: (true) 4:int notmain(const char *entity)
== inlined from hello.h ==
BB 2: (true) 6: if (s)
BB 3: 7: printf ("hello, %s!\n", s);
BB 5: 10: return 0;
-------------------------
BB 7: 6: return hello (entity);
BB 8: 6: return hello (entity);
path 1:
BB 2: (false) 4:int notmain(const char *entity)
== inlined from hello.h ==
BB 2: (false) 6: if (s)
BB 4: 9: printf ("hello, world!\n");
BB 5: 10: return 0;
-------------------------
BB 7: 6: return hello (entity);
BB 8: 6: return hello (entity);
--prime-paths-{lines,source} take an optional argument type, which can
be 'covered', 'uncovered', or 'both', which defaults to 'uncovered'.
The flag controls if the covered or uncovered paths are printed, and
while uncovered is generally the most useful one, it is sometimes nice
to be able to see only the covered paths.
And finally, JSON (abbreviated). It is quite sparse and very nested, but
is mostly a JSON version of the source listing. It has to be this nested
in order to consistently capture multiple locations. It is always
includes the file name per location for consistency, even though this is
very much redundant in almost all cases. This format is in no way set in
stone, and without targeting it with other tooling I am not sure if it
does the job well.
"gcc_version": "15.0.0 20240704 (experimental)",
"current_working_directory": "dir",
"data_file": "hello.o",
"files": [
{
"file": "hello.c",
"functions": [
{
"name": "notmain",
"demangled_name": "notmain",
"start_line": 4,
"start_column": 5,
"end_line": 7,
"end_column": 1,
"blocks": 7,
"blocks_executed": 0,
"execution_count": 0,
"total_prime_paths": 2,
"covered_prime_paths": 0,
"prime_path_coverage": [
{
"id": 0,
"sequence": [
{
"block_id": 2,
"locations": [
{
"file": "hello.c",
"line_numbers": [
4
]
},
{
"file": "hello.h",
"line_numbers": [
6
]
}
],
"edge_kind": "fallthru"
},
...
gcc/ChangeLog:
* Makefile.in (OBJS): Add prime-paths.o, path-coverage.o.
(GTFILES): Add prime-paths.cc, path-coverage.cc
(GCOV_OBJS): Add graphds.o, prime-paths.o, bitmap.o
* builtins.cc (expand_builtin_fork_or_exec): Check
path_coverage_flag.
* collect2.cc (main): Add -fno-path-coverage to OBSTACK.
* common.opt: Add new options -fpath-coverage,
-fpath-coverage-limit, -Wcoverage-too-many-paths
* doc/gcov.texi: Add --prime-paths, --prime-paths-lines,
--prime-paths-source documentation.
* doc/invoke.texi: Add -fpath-coverage, -fpath-coverage-limit,
-Wcoverage-too-many-paths documentation.
* gcc.cc: Link gcov on -fpath-coverage.
* gcov-counter.def (GCOV_COUNTER_PATHS): New.
* gcov-io.h (GCOV_TAG_PATHS): New.
(GCOV_TAG_PATHS_LENGTH): New.
(GCOV_TAG_PATHS_NUM): New.
* gcov.cc (class path_info): New.
(struct coverage_info): Add paths, paths_covered.
(find_prime_paths): New.
(add_path_counts): New.
(find_arc): New.
(print_usage): Add -e, --prime-paths, --prime-paths-lines,
--prime-paths-source.
(process_args): Likewise.
(json_set_prime_path_coverage): New.
(output_json_intermediate_file): Call
json_set_prime_path_coverage.
(process_all_functions): Call find_prime_paths.
(generate_results): Call add_path_counts.
(read_graph_file): Read path counters.
(read_count_file): Likewise.
(function_summary): Print path counts.
(file_summary): Likewise.
(print_source_line): New.
(print_prime_path_lines): New.
(print_inlined_separator): New.
(print_prime_path_source): New.
(output_path_coverage): New.
(output_lines): Print path coverage.
* ipa-inline.cc (can_early_inline_edge_p): Check
path_coverage_flag.
* passes.cc (finish_optimization_passes): Likewise.
* profile.cc (branch_prob): Likewise.
* selftest-run-tests.cc (selftest::run_tests): Run path coverage
tests.
* selftest.h (path_coverage_cc_tests): New declaration.
* tree-profile.cc (tree_profiling): Check path_coverage_flag.
(pass_ipa_tree_profile::gate): Likewise.
* path-coverage.cc: New file.
* prime-paths.cc: New file.
gcc/testsuite/ChangeLog:
* lib/gcov.exp: Add prime paths test function.
* g++.dg/gcov/gcov-22.C: New test.
* g++.dg/gcov/gcov-23-1.h: New test.
* g++.dg/gcov/gcov-23-2.h: New test.
* g++.dg/gcov/gcov-23.C: New test.
* gcc.misc-tests/gcov-29.c: New test.
* gcc.misc-tests/gcov-30.c: New test.
* gcc.misc-tests/gcov-31.c: New test.
* gcc.misc-tests/gcov-32.c: New test.
* gcc.misc-tests/gcov-33.c: New test.
* gcc.misc-tests/gcov-34.c: New test.
|
|
Suggest a Newlib with a fix for the SIMD math issue. Newlib commit:
https://sourceware.org/git/?p=newlib-cygwin.git;a=commitdiff;h=2ef1a37e7
Additionally, for generic support in ROCm, it is expected that 6.4 will
added the support; the current version is 6.3.3 and it does not support it;
bump >6.3.2 to >6.3.3 in install.texi to avoid doubts.
gcc/ChangeLog:
PR middle-end/119325
* doc/install.texi (gcn): Change ROCm > 6.3.2 to >6.3.3 for generic
support; mention Newlib commit that fixes a SIMD math issue.
|
|
gcc/
* config/nvptx/nvptx.cc (default_ptx_version_option): Default at
least to '-mptx=6.3'.
* doc/invoke.texi (Nvidia PTX Options): Update '-mptx=[...]'.
gcc/testsuite/
* gcc.target/nvptx/march-map=sm_30.c: Adjust.
* gcc.target/nvptx/march-map=sm_32.c: Likewise.
* gcc.target/nvptx/march-map=sm_35.c: Likewise.
* gcc.target/nvptx/march-map=sm_37.c: Likewise.
* gcc.target/nvptx/march-map=sm_50.c: Likewise.
* gcc.target/nvptx/march=sm_30.c: Likewise.
* gcc.target/nvptx/march=sm_35.c: Likewise.
* gcc.target/nvptx/march=sm_37.c: Likewise.
|
|
-mavx10.1 back with 512 bit alias
When AVX10.1 options are added into GCC 14, E-core is supposed to
support up to 256 bit vector width, while P-core up to 512 bit vector
width. Therefore, we added avx10.1-256 and avx10.1-512 options into
compiler since there will be real platforms with 256 bit only support.
At the same time, for old platforms could also compile a 256 bit only
binary, we introduced -mno-evex512 to disable 512 bit vector.
However, all the future platforms will now support 512 bit vector width,
including P-core and E-core. It will result in no need for split the
option for vector width. Therefore, we will remove them in this patch.
Unlike AVX10.2 options, AVX10.1 options has been there in a major
release, so we have to raise a deprecate warning in GCC 15 and remove
them in GCC 16. At the same time, to align with avx10.2 options, we will
add just removed avx10.1 option back with warning to mention its
behavior change.
gcc/ChangeLog:
* common/config/i386/cpuinfo.h
(get_available_features): Change to FEATURE_AVX10_1.
* common/config/i386/i386-common.cc
(OPTION_MASK_ISA2_AVX10_1_512_SET): Renamed to ...
(OPTION_MASK_ISA2_AVX10_1_SET): ... this.
(OPTION_MASK_ISA2_AVX10_2_SET): Use renamed macro.
(OPTION_MASK_ISA2_AVX10_1_UNSET): Ditto.
(ix86_handle_option): Ditto.
(processor_alias_table): Use P_PROC_AVX10_1.
* common/config/i386/i386-cpuinfo.h
(enum feature_priority): Rename from AVX10_1_512 to AVX10_1.
(enum processor_features): Ditto.
* common/config/i386/i386-isas.h: Add avx10.1.
* config/i386/driver-i386.cc
(host_detect_local_cpu): Use renamed enum.
* config/i386/i386-c.cc
(ix86_target_macros_internal): Rename to avx10.1.
* config/i386/i386-isa.def (AVX10_1_512): Rename to ...
(AVX10_1): ... this.
* config/i386/i386-options.cc (isa2_opts): Rename to avx10.1.
(ix86_valid_target_attribute_inner_p): Add avx10.1.
(ix86_option_override_internal): Rename to AVX10_1.
Revise warnings to mention behavior change for option
combination in GCC 16.
* config/i386/i386.h (PTA_DIAMONDRAPIDS): Use AVX10_1.
* config/i386/i386.opt: Add avx10.1.
Add deprecate warnings for mevex512 and mavx10.1-256/512.
* config/i386/i386.opt.urls: Add avx10.1.
* doc/extend.texi: Ditto.
* doc/sourcebuild.texi: Ditto.
gcc/testsuite/ChangeLog:
* gcc.target/i386/avx10-check.h: Change to avx10.1.
* gcc.target/i386/avx10_1-1.c: Add warning check.
* gcc.target/i386/avx10_1-10.c: Ditto.
* gcc.target/i386/avx10_1-11.c: Ditto.
* gcc.target/i386/avx10_1-12.c: Ditto.
* gcc.target/i386/avx10_1-13.c: Ditto.
* gcc.target/i386/avx10_1-15.c: Ditto.
* gcc.target/i386/avx10_1-16.c: Ditto.
* gcc.target/i386/avx10_1-18.c: Ditto.
* gcc.target/i386/avx10_1-19.c: Ditto.
* gcc.target/i386/avx10_1-2.c: Ditto.
* gcc.target/i386/avx10_1-20.c: Ditto.
* gcc.target/i386/avx10_1-21.c: Ditto.
* gcc.target/i386/avx10_1-22.c: Ditto.
* gcc.target/i386/avx10_1-23.c: Ditto.
* gcc.target/i386/avx10_1-26.c: Ditto.
* gcc.target/i386/avx10_1-3.c: Ditto.
* gcc.target/i386/avx10_1-4.c: Ditto.
* gcc.target/i386/avx10_1-7.c: Ditto.
* gcc.target/i386/avx10_1-8.c: Ditto.
* gcc.target/i386/avx10_1-9.c: Ditto.
* gcc.target/i386/noevex512-1.c: Ditto.
* gcc.target/i386/noevex512-2.c: Ditto.
* gcc.target/i386/pr111068.c: Ditto.
* gcc.target/i386/pr111907.c: Ditto.
* gcc.target/i386/pr117240_avx512f.c: Ditto.
* gcc.target/i386/pr117304-1.c: Ditto.
* gcc.target/i386/pr117946.c: Ditto.
* gcc.target/i386/avx10_1-24.c: Removed.
* gcc.target/i386/avx10_1-25.c: Removed.
* gcc.target/i386/avx10_1-5.c: Removed.
* gcc.target/i386/avx10_1-6.c: Removed.
|
|
When AVX10.2 options are added into GCC 15, E-core is supposed to
support up to 256 bit vector width, while P-core up to 512 bit vector
width. Therefore, we added avx10.2-256 and avx10.2-512 options into
compiler since there will be real platforms with 256 bit only support.
However, all the future platforms will now support 512 bit vector width,
including P-core and E-core. It will result in no need for split the
option for vector width. Therefore, we will remove them in this patch.
gcc/ChangeLog:
* common/config/i386/cpuinfo.h
(get_available_features): Revise the logic AVX10 version.
* common/config/i386/i386-common.cc
(OPTION_MASK_ISA2_AVX10_2_256_SET): Removed.
(OPTION_MASK_ISA2_AVX10_2_512_SET): Ditto.
(OPTION_MASK_ISA2_AVX10_2_SET): New.
(OPTION_MASK_ISA2_AMX_AVX512_SET): Use AVX10.2 macro.
(OPTION_MASK_ISA2_AVX10_2_UNSET): Ditto.
(ix86_handle_option): Remove avx10.2-256 part. Adjust avx10.2.
* common/config/i386/i386-cpuinfo.h
(enum processor_features): Remove FEATURE_AVX10_2_256 and skip
the value for it. Change the name from FEATURE_AVX10_2_512 to
FEATURE_AVX10_2.
* common/config/i386/i386-isas.h: Remove avx10.2-256/512.
* config/i386/avx10_2-512bf16intrin.h: Use avx10.2 instead of
avx10.2-256/512.
* config/i386/avx10_2-512convertintrin.h: Ditto.
* config/i386/avx10_2-512mediaintrin.h: Ditto.
* config/i386/avx10_2-512minmaxintrin.h: Ditto.
* config/i386/avx10_2-512satcvtintrin.h: Ditto.
* config/i386/avx10_2bf16intrin.h: Ditto.
* config/i386/avx10_2convertintrin.h: Ditto.
* config/i386/avx10_2mediaintrin.h: Ditto.
* config/i386/avx10_2minmaxintrin.h: Ditto.
* config/i386/avx10_2satcvtintrin.h: Ditto.
* config/i386/movrsintrin.h: Ditto.
* config/i386/sm4intrin.h: Ditto.
* config/i386/cpuid.h (bit_AVX10_256): Removed.
(bit_AVX10_512): Ditto.
* config/i386/driver-i386.cc (host_detect_local_cpu): Adjust
Diamond Rapids and -march=native condition.
* config/i386/i386-builtin.def (BDESC): Use AVX10.2 macro
instead of AVX10.2-256/512.
* config/i386/i386-c.cc (ix86_target_macros_internal): Ditto.
* config/i386/i386-expand.cc
(ix86_expand_branch): Use TARGET_AVX10_2 instead of specifying
vector size.
(ix86_prepare_fp_compare_args): Ditto.
(ix86_expand_fp_compare): Ditto.
(ix86_ssecom_setcc): Ditto.
(ix86_expand_sse_comi): Ditto.
(ix86_expand_sse_comi_round): Ditto.
(ix86_check_builtin_isa_match): Ditto.
* config/i386/i386.cc (ix86_fp_compare_code_to_integer): Ditto.
(ix86_get_mask_mode): Ditto.
* config/i386/i386.h (SSE_FLOAT_MODE_SSEMATH_OR_HFBF_P): Ditto.
* config/i386/i386.md: Ditto.
* config/i386/mmx.md: Ditto.
* config/i386/sse.md: Ditto.
* config/i386/predicates.md: Ditto.
* config/i386/i386-isa.def (AVX10_2_256): Removed.
(AVX10_2_512): Removed.
(AVX10_2): New.
* config/i386/i386-options.cc
(isa2_opts): Remove avx10.2-256/512.
(ix86_valid_target_attribute_inner_p): Ditto.
(PTA_DIAMONDRAPIDS): Use PTA_AVX10_2.
* config/i386/i386.opt: Remove avx10.2-256/512.
* config/i386/i386.opt.urls: Ditto.
* doc/extend.texi: Ditto.
* doc/invoke.texi: Ditto.
* doc/sourcebuild.texi: Ditto.
|
|
This is part of an incremental effort to make the chapter on GCC
extensions better organized by grouping/rearranging sections by topic.
gcc/ChangeLog
PR other/42270
* doc/extend.texi (Nonlocal Gotos): Group with other built-ins
sections.
(Constructing Calls): Likewise.
(Pragmas): Move earlier in the section, before the built-ins docs.
(Thread-Local): Likewise.
(OpenMP): Likewise.
(OpenACC): Likewise.
|
|
extend.texi [PR42270]
This is part of an incremental effort to make the chapter on GCC
extensions better organized by grouping/rearranging sections by topic.
gcc/ChangeLog
PR other/42270
* doc/extend.texi (Syntax Extensions): New section.
(Statement Exprs): Make it a subsection of the above.
(Local Labels): Likewise.
(Labels as Values): Likewise.
(Nested Functions): Likewise.
(Typeof): Likewise.
(Offsetof): Likewise.
(Alignment): Likewise.
(Incomplete Enums): Likewise.
(Variadic Macros): Likewise.
(Conditionals): Likewise.
(Case Ranges): Likewise.
(Mixed Labels and Declarations): Likewise.
(C++ Comments): Likewise.
(Escaped Newlines): Likewise.
(Hex Floats): Likewise.
(Binary constants): Likewise.
(Dollar Signs): Likewise.
(Character Escapes): Likewise.
(Alternate Keywords): Likewise.
(Function Names): Likewise.
(Semantic Extensions): New section.
(Function Prototypes): Make it a subsection of the above.
(Pointer Arith): Likewise.
(Variadic Pointer Args): Likewise.
(Pointers to Arrays): Likewise.
(Const and Volatile Functions): Likewise.
|
|
This is part of an incremental effort to make the chapter on GCC
extensions better organized by grouping/rearranging sections by topic.
gcc/ChangeLog
PR other/42270
* doc/extend.texi (Aggregate Types): New section.
(Variable Length): Make it a subsection of the above.
(Zero Length): Likewise.
(Empty Structures): Likewise.
(Flexible Array Members in Unions): Likewise.
(Flexible Array Members alone in Structures): Likewise.
(Unnamed Fields): Likewise.
(Cast to Union): Likewise.
(Subscripting): Likewise.
(Initializers): Likewise.
(Compound Literals): Likewise.
(Designated Inits): Likewise.
|
|
This is part of an incremental effort to make the chapter on GCC
extensions better organized by grouping/rearranging sections by topic.
gcc/ChangeLog
PR other/42270
* doc/extend.texi (Additional Numeric Types): New section.
(__int128): Make it a subsection of the above.
(Long Long): Likewise.
(Complex): Likewise.
(Floating Types): Likewise.
(Half-Precision): Likewise.
(Decimal Float): Likewise.
(Fixed-Point): Likewise.
|
|
gcc/
* config/avr/avr-mcus.def: Add AVR32SD20, AVR32SD28, AVR32SD32,
AVR64SD28, AVR64SD32, AVR64SD48.
* doc/avr-mmcu.texi: Rebuild.
|
|
gcc/
* doc/invoke.texi (AVR Optimization Options)
<-maccumulate-args>: Refer to -fdefer-pop.
<-muse-nonzero-bits>: Re-formulate what the option does.
|
|
There are occasions where knowledge about nonzero bits makes some
optimizations possible. For example,
Rd |= Rn << Off
can be implemented as
SBRC Rn, 0
ORI Rd, 1 << Off
when Rn in { 0, 1 }, i.e. nonzero_bits (Rn) == 1. This patch adds some
patterns that exploit nonzero_bits() in some combiner patterns.
As insn conditions are not supposed to contain nonzero_bits(), the patch
splits such insns right after pass insn combine.
PR target/119421
gcc/
* config/avr/avr.opt (-muse-nonzero-bits): New option.
* config/avr/avr-protos.h (avr_nonzero_bits_lsr_operands_p): New.
(make_avr_pass_split_nzb): New.
* config/avr/avr.cc (avr_nonzero_bits_lsr_operands_p): New function.
(avr_rtx_costs_1): Return costs for the new insns.
* config/avr/avr.md (nzb): New insn attribute.
(*nzb=1.<code>...): New insns to better support some bit
operations for <code> in AND, IOR, XOR.
* config/avr/avr-passes.def (avr_pass_split_nzb): Insert pass
atfer combine.
* config/avr/avr-passes.cc (avr_pass_data_split_nzb). New pass data.
(avr_pass_split_nzb): New pass.
(make_avr_pass_split_nzb): New function.
* common/config/avr/avr-common.cc (avr_option_optimization_table):
Enable -muse-nonzero-bits for -O2 and higher.
* doc/invoke.texi (AVR Options): Document -muse-nonzero-bits.
gcc/testsuite/
* gcc.target/avr/torture/pr119421-sreg.c: New test.
|
|
This adds support for the NVIDIA Olympus core to the AArch64 backend. The
initial patch does not add any special tuning decisions, and those may come
later.
Bootstrapped and tested on aarch64-none-linux-gnu.
gcc/ChangeLog:
* config/aarch64/aarch64-cores.def (olympus): New entry.
* config/aarch64/aarch64-tune.md: Regenerate.
* doc/invoke.texi (AArch64 Options): Document the above.
Signed-off-by: Dhruv Chawla <dhruvc@nvidia.com>
|
|
The intention of -m31 -mesa and -m31 -mzarch was that they are (ABI)
compatible which is almost true except as it turns out they are not for
attribute mode(word).
After doing some archaeology and digging out an over 18 year old thread
[1,2] which is about this very attribute, I come to the conclusion to
revert this patch. The intention by deprecating and eventually removing
ESA/390 support was to prepare for a future removal of -m31; though in
smaller steps. Thus, instead of introducing some potential hick ups
along the route, I will revert this patch and will revisit this topic
when time for -m31 in its entirety has come---independent of
-mesa/-mzarch.
[1] https://gcc.gnu.org/pipermail/gcc-patches/2006-September/200465.html
[2] https://gcc.gnu.org/pipermail/gcc-patches/2006-October/201154.html
This reverts commit 3b1bd1fdcd241dd1e5b706b6937400d74ca43146.
|
|
The ArmARM says:
"In an Armv9.4 implementation, if FEAT_SVE2 is implemented, FEAT_SVE2p1
is implemented."
We should enable +sve2p1 as part of -march=armv9.4-a, which this patch does.
This makes gcc consistent with gas.
Bootstrapped and tested on aarch64-none-linux-gnu.
Signed-off-by: Kyrylo Tkachov <ktkachov@nvidia.com>
gcc/
* config/aarch64/aarch64-arches.def (...): Add SVE2p1.
* doc/invoke.texi (AArch64 Options): Document +sve2p1 in
-march=armv9.4-a.
|
|
Apparently some programs in the wild use
#if __has_attribute(musttail)
__attribute__((musttail)) return foo ();
#else
return foo ();
#endif
clang supports musttail both as a standard attribute ([[clang::musttail]]
which we also support for compatibility) and the above worked just
fine with GCC 14 which had __has_attribute(musttail) 0. Now that it is
0, this doesn't compile anymore.
So, either we need to ensure that __has_attribute(musttail) is 0
and just __has_c{,pp}_attribute({gnu,clang}::musttail) are non-zero,
or IMHO better we just make it work in the attribute form, especially for
C < C23 I can see why some projects would prefer that form.
While [[gnu::musttail]] is rejected as an error in C11 etc. before GCC 15,
rather than just handled as an unknown attribute.
I view this as both a regression and compatibility issue.
The patch handles it in similar spots to fallthrough/assume attributes
inside of __attribute__ for C, and for C++ enables mixing of standard [[]]
and GNU __attribute__(()) attributes at the start of statements in any order.
While working on it, I've noticed we weren't diagnosing arguments to the
clang::musttail attribute (fixed by the c-attribs.cc hunk) and newly
on the __attribute__ form attribute (in that case the arguments aren't just
skipped, they are always parsed and because we don't call decl_attributes
etc., it wouldn't be diagnosed without a manual check).
2025-03-18 Jakub Jelinek <jakub@redhat.com>
PR c/116545
gcc/
* doc/extend.texi (musttail statement attribute): Document
that musttail GNU attribute can be used as well.
gcc/c-family/
* c-attribs.cc (c_common_clang_attributes): Add musttail.
gcc/c/
* c-parser.cc (c_parser_declaration_or_fndef): Parse
__attribute__((musttail)) return.
(c_parser_handle_musttail): Diagnose attribute arguments.
(c_parser_statement_after_labels): Parse
__attribute__((musttail)) return.
gcc/cp/
* parser.cc (cp_parser_statement): Call cp_parser_attributes_opt
rather than cp_parser_std_attribute_spec_seq.
(cp_parser_jump_statement): Diagnose gnu::musttail attributes
with no arguments.
gcc/testsuite/
* c-c++-common/attr-fallthrough-2.c: Adjust expected diagnostics
for C++.
* c-c++-common/musttail15.c: New test.
* c-c++-common/musttail16.c: New test.
* c-c++-common/musttail17.c: New test.
* c-c++-common/musttail18.c: New test.
* c-c++-common/musttail19.c: New test.
* c-c++-common/musttail20.c: New test.
* c-c++-common/musttail21.c: New test.
* c-c++-common/musttail22.c: New test.
* c-c++-common/musttail23.c: New test.
* c-c++-common/musttail24.c: New test.
* g++.dg/musttail7.C: New test.
* g++.dg/musttail8.C: New test.
* g++.dg/musttail12.C: New test.
* g++.dg/musttail13.C: New test.
* g++.dg/musttail14.C: New test.
* g++.dg/ext/pr116545.C: New test.
|
|
The COBOL tests has many tests which just dump emit lots of output
to stdout and want to compare it against expected output.
We have the dg-output directive, but if one needs more than dozens
of lines in the output, adding hundreds of dg-output directives to
each source uses too much memory and is harder to maintain.
The following patch offers an alternative, dg-output-file
directive where one can supply a text file with expected output
(no regexp matching in that case, just exact output, except that it
handles different line ending styles (for the expected file
using tcl gets, for the actual output skips over \n, \r\n or \r).
And a newline at the end of the whole output is optional (in the actual
output, because I think some boards get it eaten).
Also tested with addition or subtraction of some characters from the
expected output files and saw FAILs with appropriate messages.
2025-03-18 Jakub Jelinek <jakub@redhat.com>
* doc/sourcebuild.texi (dg-output-file): Document.
* lib/gcc-dg.exp (${tool}-load): If output-file is set, compare
combined output against content of the [lindex ${output-file} 1]
file.
(dg-output-file): New directive.
* lib/dg-test-cleanup.exp (cleanup-after-saved-dg-test): Clear
output-file variable.
* gcc.dg/dg-output-file-1.c: New test.
* gcc.dg/dg-output-file-1-lp64.txt: New test.
* gcc.dg/dg-output-file-1-ilp32.txt: New test.
|
|
as the bug report details some uses of -fpatchable-function-entry
aren't happy with the "before" NOPs being inserted between global and
local entry point on powerpc. We want the before NOPs be in front
of the global entry point. That means that the patching NOPs aren't
consecutive for dual entry point functions, but for these usecases
that's not the problem. But let us support both under the control
of a new target option: -msplit-patch-nops.
gcc/
PR target/112980
* config/rs6000/rs6000.opt (msplit-patch-nops): New option.
* doc/invoke.texi (RS/6000 and PowerPC Options): Document it.
* config/rs6000/rs6000.h (machine_function.stop_patch_area_print):
New member.
* config/rs6000/rs6000.cc (rs6000_print_patchable_function_entry):
Emit split nops under control of that one.
* config/rs6000/rs6000-logue.cc (rs6000_output_function_prologue):
Add handling of split patch nops.
|
|
This adds missing documentation for LTO flags.
gcc/ChangeLog:
* doc/invoke.texi: (Optimize Options):
Add incremental LTO flags.
|
|
The -ansi option has essentially been superseded by the more general
-std= option, and all the additional information about its effects is
already covered elsewhere in the manual. I also cleaned up some
confusing text about alternate keywords that I noticed while
confirming this.
gcc/ChangeLog
* doc/extend.texi (Alternate Keywords): Clean up text and remove
discussion of "restrict", which is not a GNU extension at all.
* doc/invoke.texi (C Dialect Options): Remove detailed discussion.
|
|
When configuring GCC with --program-suffix=-$(BASE_VERSION) to allow
installation multiple GCC versions in parallel, the executable of the
driver (gcc-$(BASE_VERSION)) gets recorded in the libgccjit.so.0
library. Assuming, that you only install the libgccjit.so.0 library
from the newest GCC, you have a libgccjit installed, which always calls
back to the newest installed version of GCC. I'm not saying that the
ABI is changing, but I'd like to see the libgccjit calling out to the
corresponding compiler, and therefore installing a libgccjit with a
soname that matches the GCC major version.
The downside is having to rebuild packages built against libgccjit with
each major GCC version, but looking at the reverse dependencies, at
least for package builds, only emacs is using libgccjit.
My plan to use this feature is to build a libgccjit0 using the default
GCC (e.g. gcc-14), and a libgccjit15, when building a newer GCC. When
changing the GCC default to 15, building a libgccjit0 from gcc-15, and a
libgccjit14 from gcc-14.
When configuring without --enable-versioned-jit, the behavior is unchanged.
2025-03-13 Matthias Klose <doko@ubuntu.com>
gcc/
* configure.ac: Add option --enable-versioned-jit.
* configure: Regenerate.
* Makefile.in: Move from jit/Make-lang.in, setting value from
configure.ac.
* doc/install.texi: Document option --enable-versioned-jit.
gcc/jit/
* Make-lang.in (LIBGCCJIT_VERSION_NUM): Move to ../Makefile.in.
|
|
gcc/ChangeLog:
* doc/extend.texi (Common Variable Attributes): Fix grammar in
final sentence of -ftrivial-auto-var-init description.
|
|
Deprecate support for the ESA/390 architecture which will be eventually
removed, and encourage the usage of the z/Architecture instead.
Furthermore, default for -m31 to -mzarch whereas previously we defaulted
to -mesa.
gcc/ChangeLog:
* config.gcc: Fail in case of option --with-mode=esa.
* config/s390/s390.cc (s390_option_override_internal): Default
to z/Architecture mode.
* config/s390/s390.h (DRIVER_SELF_SPECS): Ditto.
* config/s390/s390.opt: Emit a warning for option -mesa.
* doc/invoke.texi: Document the change.
gcc/testsuite/ChangeLog:
* gcc.target/s390/20020926-1.c: Deal with deprecation warning.
* gcc.target/s390/dwarfregtable-1.c: Ditto.
* gcc.target/s390/fp2int1.c: Ditto.
* gcc.target/s390/pr102222.c: Ditto.
* gcc.target/s390/pr106355-3.c: Ditto.
* gcc.target/s390/pr61078.c: Ditto.
* gcc.target/s390/target-attribute/tattr-m31-10.c: Ditto.
* gcc.target/s390/target-attribute/tattr-m31-12.c: Ditto.
* gcc.target/s390/target-attribute/tattr-m31-14.c: Ditto.
* gcc.target/s390/target-attribute/tattr-m31-18.c: Ditto.
* gcc.target/s390/target-attribute/tattr-m31-2.c: Ditto.
* gcc.target/s390/target-attribute/tattr-m31-20.c: Ditto.
* gcc.target/s390/target-attribute/tattr-m31-22.c: Ditto.
* gcc.target/s390/target-attribute/tattr-m31-24.c: Ditto.
* gcc.target/s390/target-attribute/tattr-m31-26.c: Ditto.
* gcc.target/s390/target-attribute/tattr-m31-28.c: Ditto.
* gcc.target/s390/target-attribute/tattr-m31-30.c: Ditto.
* gcc.target/s390/target-attribute/tattr-m31-32.c: Ditto.
* gcc.target/s390/target-attribute/tattr-m31-4.c: Ditto.
* gcc.target/s390/target-attribute/tattr-m31-6.c: Ditto.
* gcc.target/s390/target-attribute/tattr-m31-8.c: Ditto.
|
|
gcc/
* doc/contrib.texi: Update for gcobol.
* doc/frontends.texi: Likewise.
* doc/install.texi: Likewise.
* doc/invoke.texi: Likewise.
* doc/sourcebuild.texi: Likewise.
* doc/standards.texi: Likewise.
|
|
gcc/ChangeLog
* doc/invoke.texi (Instrumentation Options): Fix typo introduced
in commit 313edeeeb607fe32da5633cfb6f91977add446f6.
|
|
While working on an adjacent documentation fix, I noticed that the
documentation for the gnu++11 "asm constexpr" feature was very
confusing, in some cases being attached to parts of the asm syntax
that are not otherwise required to be string literals, and missing from
other parts of the syntax that are. I've checked what the C++ parser
actually does and fixed the documentation to match, also improving it
to use correct markup and to be more explicit and less implementor-speaky.
gcc/cp/ChangeLog
* parser.cc (cp_parser_asm_definition): Make comment more explicit.
(cp_parser_asm_operand_list): Likewise. Also correct the comment
block at the top of the function to reflect reality.
gcc/ChangeLog
* doc/extend.texi (Basic Asm): Document that AssemblerInstructions
can be an asm constexpr.
(Extended Asm): Move the notes about asm constexprs for
AssemblerTemplate and Clobbers to the corresponding subsections.
Remove the notes for OutputOperands and InputOperands and reword
misleading descriptions of the list item syntax. Note that
constraint strings can be asm constexprs.
(Asm constexprs): Use "title case" for subsection name. Be
explicit about what parts of the asm syntax this applies to and
that the parentheses are required. Correct markup and terminology.
|
|
gcc/ChangeLog
PR c/67301
* doc/extend.texi (Extended Asm): Clarify that the square brackets
around the asmSymbolicName of operands are a required part of
the syntax.
|
|
[PR117178]
When initializing a nonstring char array when compiled with
-Wunterminated-string-initialization the warning trips even when
truncating the trailing NUL character from the string constant. Only
warn about this when running under -Wc++-compat since under C++ we should
not initialize nonstrings from C strings.
This patch separates the -Wunterminated-string-initialization and
-Wc++-compat warnings, they are now independent option, the former implied
by -Wextra, the latter not implied by anything. If -Wc++-compat is in effect,
it takes precedence over -Wunterminated-string-initialization and warns regardless
of nonstring attribute, otherwise if -Wunterminated-string-initialization is
enabled, it warns only if there isn't nonstring attribute.
In all cases, the warnings and also pedwarn_init for even larger sizes now
provide details on the lengths.
2025-03-07 Kees Cook <kees@kernel.org>
Jakub Jelinek <jakub@redhat.com>
PR c/117178
gcc/
* doc/invoke.texi (Wunterminated-string-initialization): Document
the new interaction between this warning and -Wc++-compat and that
initialization of decls with nonstring attribute aren't warned about.
gcc/c-family/
* c.opt (Wunterminated-string-initialization): Don't depend on
-Wc++-compat.
gcc/c/
* c-typeck.cc (digest_init): Add DECL argument. Adjust wording of
pedwarn_init for too long strings and provide details on the lengths,
for string literals where just the trailing NULL doesn't fit warn for
warn_cxx_compat with OPT_Wc___compat, wording which mentions "for C++"
and provides details on lengths, otherwise for
warn_unterminated_string_initialization adjust the warning, provide
details on lengths and don't warn if get_attr_nonstring_decl (decl).
(build_c_cast, store_init_value, output_init_element): Adjust
digest_init callers.
gcc/testsuite/
* gcc.dg/Wunterminated-string-initialization.c: Add additional test
coverage.
* gcc.dg/Wcxx-compat-14.c: Check in dg-warning for "for C++" part of
the diagnostics.
* gcc.dg/Wcxx-compat-23.c: New test.
* gcc.dg/Wcxx-compat-24.c: New test.
Signed-off-by: Kees Cook <kees@kernel.org>
|
|
gcc/ChangeLog
PR sanitizer/56682
* doc/invoke.texi (Instrumentation Options): Document that -g
is useful with -fsanitize=thread and -fsanitize=address.
Also mention -fno-omit-frame-pointer per the asan wiki.
|
|
Following on from the discussion in:
https://gcc.gnu.org/pipermail/gcc-patches/2025-February/675256.html
this patch removes TARGET_IRA_CALLEE_SAVED_REGISTER_COST_SCALE and
replaces it with two hooks: one that controls the cost of using an
extra callee-saved register and one that controls the cost of allocating
a frame for the first spill.
(The patch does not attempt to address the shrink-wrapping part of
the thread above.)
On AArch64, this is enough to fix PR117477, as verified by the new tests.
The patch does not change the SPEC2017 scores significantly. (I saw a
slight improvement in fotonik3d and roms, but I'm not convinced that
the improvements are real.)
The patch makes IRA use caller saves for gcc.target/aarch64/pr103350-1.c,
which is a scan-dump correctness test that relies on not using
caller saves. The decision to use caller saves looks appropriate,
and saves an instruction, so I've just added -fno-caller-saves
to the test options.
The x86 parts were written by Honza. ix86_callee_save_cost is updated
by H.J. to replace gcc_checking_assert with returning 1 if mem_cost <= 2.
gcc/
PR rtl-optimization/117477
* config/aarch64/aarch64.cc (aarch64_count_saves): New function.
(aarch64_count_above_hard_fp_saves, aarch64_callee_save_cost)
(aarch64_frame_allocation_cost): Likewise.
(TARGET_CALLEE_SAVE_COST): Define.
(TARGET_FRAME_ALLOCATION_COST): Likewise.
* config/i386/i386.cc (ix86_ira_callee_saved_register_cost_scale):
Replace with...
(ix86_callee_save_cost): ...this new hook.
(TARGET_IRA_CALLEE_SAVED_REGISTER_COST_SCALE): Delete.
(TARGET_CALLEE_SAVE_COST): Define.
* target.h (spill_cost_type, frame_cost_type): New enums.
* target.def (callee_save_cost, frame_allocation_cost): New hooks.
(ira_callee_saved_register_cost_scale): Delete.
* doc/tm.texi.in (TARGET_IRA_CALLEE_SAVED_REGISTER_COST_SCALE): Delete.
(TARGET_CALLEE_SAVE_COST, TARGET_FRAME_ALLOCATION_COST): New hooks.
* doc/tm.texi: Regenerate.
* hard-reg-set.h (hard_reg_set_popcount): New function.
* ira-color.cc (allocated_memory_p): New variable.
(allocated_callee_save_regs): Likewise.
(record_allocation): New function.
(assign_hard_reg): Use targetm.frame_allocation_cost to model
the cost of the first spill or first caller save. Use
targetm.callee_save_cost to model the cost of using new callee-saved
registers. Apply the exit rather than entry frequency to the cost
of restoring a register or deallocating the frame. Update the
new variables above.
(improve_allocation): Use record_allocation.
(color): Initialize allocated_callee_save_regs.
(ira_color): Initialize allocated_memory_p.
* targhooks.h (default_callee_save_cost): Declare.
(default_frame_allocation_cost): Likewise.
* targhooks.cc (default_callee_save_cost): New function.
(default_frame_allocation_cost): Likewise.
gcc/testsuite/
PR rtl-optimization/117477
* gcc.target/aarch64/callee_save_1.c: New test.
* gcc.target/aarch64/callee_save_2.c: Likewise.
* gcc.target/aarch64/callee_save_3.c: Likewise.
* gcc.target/aarch64/pr103350-1.c: Add -fno-caller-saves.
Co-authored-by: Jan Hubicka <hubicka@ucw.cz>
Co-authored-by: H.J. Lu <hjl.tools@gmail.com>
|