Age | Commit message (Collapse) | Author | Files | Lines |
|
Now that the iwmmxt extensions have been removed, clean up the
references to it in the documentation. We keep the
-mcpu/-mtune/-march references as these are still accepted by the
driver.
gcc/ChangeLog:
* doc/extend.texi: Remove the iwmmxt intrinsics.
* doc/md.texi: Remove the iwmmxt-related constraints.
|
|
The iwmmxt ABI is a variant of the ABI that supported passing certain
parameters and results in iwmmxt registers. But since we no-longer
support the instructions that can read and write these registers, the
ABI variant can no-longer be used.
gcc/ChangeLog:
* config.gcc (arm, --with-abi): Remove iwmmxt abi option.
* config/arm/arm.opt (enum ARM_ABI_IWMMXT): Remove.
* config/arm/arm.h (TARGET_IWMMXT_ABI): Delete.
(enum arm_pcs): Remove ARM_PCS_AAPCS_IWMMXT.
(FUNCTION_ARG_REGNO_P): Remove IWMMXT ABI support.
(CUMULATIVE_ARGS): Remove iwmmxt_nregs.
* config/arm/arm.cc (arm_options_perform_arch_sanity_checks):
Remove IWMMXT ABI checks.
(arm_libcall_value_1): Likewise.
(arm_function_value_regno_p): Likewise.
(arm_apply_result_size): Remove adjustment for IWMMXT ABI.
(arm_function_arg): Remove IWMMXT ABI support.
(arm_arg_partial_bytes): Likewise.
(arm_function_arg_advance): Likewise.
(arm_init_cumulative_args): Don't initialize iwmmxt_nregs.
* doc/invoke.texi (arm -mabi): Remove mention of the iwmmxt
ABI option.
* config/arm/arm-opts.h (enum arm_abi_type): Remove ARM_ABI_IWMMXT.
|
|
Treat options that select iwmmxt variants as we would for xscale. We
leave the feature bits in for now, since they are still needed
elsewhere, but they are never enabled.
Also remove the remaining testsuite framework support for iwmmxt,
since this will never trigger now.
gcc/
* config/arm/arm-cpus.in (arch iwmmxt): treat in the same
way as we would treat XScale.
(arch iwmmxt2): Likewise.
(cpu xscale): Add aliases for iwmmxt and iwmmxt2.
(cpu iwmmxt): Delete.
(cpu iwmmxt2): Delete.
* config/arm/arm-generic.md (load_ldsched_xscale): Remove references
to iwmmxt.
(load_ldsched): Likewise.
* config/arm/arm-tables.opt: Regenerated.
* config/arm/arm-tune.md: Regenerated.
* doc/sourcebuild.texi (arm_iwmmxt_ok): Delete.
gcc/testsuite/ChangeLog:
* gcc.target/arm/ivopts.c: Remove test for iwmmxt
* lib/target-supports.exp
(check_effective_target_arm_iwmmxt_ok): Delete.
|
|
This patch introduces support for RISC-V Profiles RV20 and RV22 [1],
enabling developers to utilize these profiles through the -march option.
[1] https://github.com/riscv/riscv-profiles/releases/tag/v1.0
Version log:
Using lowercase letters to present Profiles.
Using '_' as divsor between Profiles and other RISC-V extension.
Add descriptions in invoke.texi.
Checking if there exist '_' between Profiles and additional extensions.
Using std::string to avoid memory problems.
gcc/ChangeLog:
* common/config/riscv/riscv-common.cc (struct riscv_profiles): New struct.
(riscv_subset_list::parse_profiles): New parser.
(riscv_subset_list::parse_base_ext): Ditto.
* config/riscv/riscv-subset.h: New def.
* doc/invoke.texi: New option descriptions.
gcc/testsuite/ChangeLog:
* gcc.target/riscv/arch-49.c: New test.
* gcc.target/riscv/arch-50.c: New test.
* gcc.target/riscv/arch-51.c: New test.
* gcc.target/riscv/arch-52.c: New test.
|
|
[PR116792]
In r15-3752-g48261bd26df624 I added a test plugin that overrode the
regular output, instead emitting diagnostics in crude HTML form.
In r15-4760-g0b73e9382ab51c I added support for multiple kinds of
diagnostic output simultaneously, adding
-fdiagnostics-add-output=DIAGNOSTICS-OUTPUT-SPEC
-fdiagnostics-set-output=DIAGNOSTICS-OUTPUT-SPEC
for adding/changing the kind of diagnostics output, supporting
"text" and "sarif" output schemes.
This patch promotes the HTML output code from the test plugins so
that it is available from "-fdiagnostics-add-output=", using a
new "experimental-html" scheme, to allow simultaneous text, sarif
and html output, and to make it easier to experiment with. The
patch adds Python-based testing of the emitted HTML.
The patch does not affect the generated HTML, which is still crude, and
not yet ready for end-users. I hope to improve it in followups.
gcc/ChangeLog:
PR other/116792
* Makefile.in (OBJS-libcommon): Add diagnostic-format-html.o.
* diagnostic-format-html.cc: Move here from
testsuite/gcc.dg/plugin/diagnostic_plugin_xhtml_format.cc.
Simplify includes. Rename "xhtml" to "html" throughout.
(write_escaped_text): Drop.
(class xhtml_stream_output_format): Drop.
(class html_file_output_format): Reimplement using
diagnostic_output_file.
(diagnostic_output_format_init_xhtml): Drop.
(diagnostic_output_format_init_xhtml_stderr): Drop.
(diagnostic_output_format_init_xhtml_file): Drop.
(diagnostic_output_format_open_html_file): New.
(make_html_sink): New.
(xhtml_format_selftests): Convert to...
(diagnostic_format_html_cc_tests): ...this.
(plugin_is_GPL_compatible): Drop.
(plugin_init): Drop.
* diagnostic-format-html.h: New file.
* doc/invoke.texi (-fdiagnostics-add-output=): Add
"experimental-html" scheme.
* opts-diagnostic.cc: Include "diagnostic-format-html.h".
(class html_scheme_handler): New.
(output_factory::output_factory): Add html_scheme_handler.
(html_scheme_handler::make_sink): New.
* selftest-run-tests.cc (selftest::run_tests): Call the new
selftests.
* selftest.h (selftest::diagnostic_format_html_cc_tests): New
decl.
gcc/testsuite/ChangeLog:
PR other/116792
* gcc.dg/plugin/diagnostic_plugin_xhtml_format.cc: Move to
gcc/diagnostic-format-html.cc.
* gcc.dg/html-output/html-output.exp: New support script.
* gcc.dg/html-output/missing-semicolon.c: New test.
* gcc.dg/html-output/missing-semicolon.py: New test script.
* gcc.dg/plugin/diagnostic-test-xhtml-1.c: Deleted test.
* gcc.dg/plugin/plugin.exp (plugin_test_list): Drop moved plugin
and its deleted test.
* lib/gcc-dg.exp (load_lib): Add load_lib of scanhtml.exp.
* lib/htmltest.py: New support script.
* lib/scanhtml.exp: New support script, based on scansarif.exp.
libatomic/ChangeLog:
PR other/116792
* testsuite/lib/libatomic.exp: Add load_lib of scanhtml.exp.
libgomp/ChangeLog:
PR other/116792
* testsuite/lib/libgomp.exp: Add load_lib of scanhtml.exp.
libitm/ChangeLog:
PR other/116792
* testsuite/lib/libitm.exp: Add load_lib of scanhtml.exp.
libphobos/ChangeLog:
PR other/116792
* testsuite/lib/libphobos-dg.exp: Add load_lib of scanhtml.exp.
libvtv/ChangeLog:
PR other/116792
* testsuite/lib/libvtv-dg.exp: Add load_lib of scanhtml.exp.
Signed-off-by: David Malcolm <dmalcolm@redhat.com>
|
|
This patch support svadu and svade extension.
To enable GCC to recognize and process svadu and svade extension correctly at compile time.
gcc/ChangeLog:
* common/config/riscv/riscv-common.cc (riscv_ext_version_table): New
extension.
(riscv_ext_flag_table) Ditto.
* config/riscv/riscv.opt: New mask.
* doc/invoke.texi (RISC-V Options): New extension
gcc/testsuite/ChangeLog:
* gcc.target/riscv/arch-45.c: New test.
* gcc.target/riscv/arch-46.c: New test.
|
|
I have noticed that the option -fdump-ipa-clones is not documented
although there are users who depend on it. This patch adds the
missing documentation along with the description of the information it
dumps and the format it uses.
I am never quite sure which of the texinfo mark-ups is the most
appropriate in which situation, I'll of course incorporate any
feedback on this as well as the general wording of the text.
After we settle on a version, I'd like to backport the documentation
also at least to GCC 15, 14 and 13.
Is it perhaps OK for master and the branches or what would better be
changed?
Thanks,
Martin
gcc/ChangeLog:
2025-04-23 Martin Jambor <mjambor@suse.cz>
* doc/invoke.texi (Developer Options): Document -fdump-ipa-clones.
|
|
gcc/ChangeLog:
* diagnostic-format-sarif.cc (maybe_get_sarif_kind): Add cases for
new kinds of logical location.
* doc/libgdiagnostics/topics/logical-locations.rst: Add new kinds
of logical location for handling XML and JSON.
* libgdiagnostics.cc (impl_logical_location_manager::get_kind):
Add cases for new kinds of logical location.
(diagnostic_text_sink::text_starter): Likewise, introducing a
macro for this.
(diagnostic_manager_debug_dump_logical_location): Likewise.
* libgdiagnostics.h (enum diagnostic_logical_location_kind_t): Add
new kinds of logical location for handling XML and JSON.
* libsarifreplay.cc (handle_logical_location_object): Add entries
to "kind_values" for decoding sarif logical location kinds
relating to XML and JSON.
* logical-location.h (enum logical_location_kind): Add new kinds
of logical location for handling XML and JSON.
gcc/testsuite/ChangeLog:
* libgdiagnostics.dg/test-nested-logical-locations-json-c.py: New test.
* libgdiagnostics.dg/test-nested-logical-locations-json.c: New test.
* sarif-replay.dg/2.1.0-valid/3.33.7-json-example.sarif: New test.
* sarif-replay.dg/2.1.0-valid/3.33.7-xml-example.sarif: New test.
Signed-off-by: David Malcolm <dmalcolm@redhat.com>
|
|
[LIBGDIAGNOSTICS_ABI_1]
For followup work I need to be able to get at data from a
diagnostic_logical_location after creating it, hence the
need to extend libgdiagnostics with accessor entrypoints.
This is the first extension to libgdiagnostics since the initial
release. The patch uses symbol versioning to add the new
entrypoints in the same way that libgccjit does.
gcc/ChangeLog:
* doc/libgdiagnostics/topics/compatibility.rst: New file, based
on gcc/jit/docs/topics/compatibility.rst.
* doc/libgdiagnostics/topics/index.rst: Add compatibility.rst.
* doc/libgdiagnostics/topics/logical-locations.rst (Accessors):
New section.
* libgdiagnostics++.h (logical_location::operator bool): New.
(logical_location::operator==): New.
(logical_location::operator!=): New.
(logical_location::get_kind): New.
(logical_location::get_parent): New.
(logical_location::get_short_name): New.
(logical_location::get_fully_qualified_name): New.
(logical_location::get_decorated_name): New.
* libgdiagnostics.cc
(diagnostic_logical_location::get_fully_qualified_name): New.
(diagnostic_logical_location_get_kind): New entrypoint.
(diagnostic_logical_location_get_parent): New entrypoint.
(diagnostic_logical_location_get_short_name): New entrypoint.
(diagnostic_logical_location_get_fully_qualified_name): New
entrypoint.
(diagnostic_logical_location_get_decorated_name): New entrypoint.
* libgdiagnostics.h
(LIBDIAGNOSTICS_HAVE_LOGICAL_LOCATION_ACCESSORS): New define.
(diagnostic_logical_location_get_kind): New entrypoint.
(diagnostic_logical_location_get_parent): New entrypoint.
(diagnostic_logical_location_get_short_name): New entrypoint.
(diagnostic_logical_location_get_fully_qualified_name): New
entrypoint.
(diagnostic_logical_location_get_decorated_name): New entrypoint.
* libgdiagnostics.map (LIBGDIAGNOSTICS_ABI_1): New.
gcc/testsuite/ChangeLog:
* libgdiagnostics.dg/test-logical-location.c: Include
<string.h>.
(main): Verify that the accessors work.
* libgdiagnostics.dg/test-logical-location.cc: New test.
Signed-off-by: David Malcolm <dmalcolm@redhat.com>
|
|
inhibited warning [PR118163,PR118392]
Those 2 PRs show that even when using a *single* diagnostic_group, it's
possible to end up with a warning being inhibited and its associated
note still emitted, which leads to puzzling user experience. Example
from PR118392:
===
$ gcc/cc1plus inhibit-warn-3.C -w
inhibit-warn-3.C:10:17: note: only here as a ‘friend’
10 | friend void bar();
| ^~~
===
Following a suggestion from ppalka@, this patch keeps track of the
"diagnostic depth" at which a warning in inhibited, and makes sure that
all subsequent notes at that depth or deeper are inhibited as well,
until a subsequent warning or error is accepted at that depth, or the
diagnostic_group or nesting level is popped.
PR c++/118163
PR c++/118392
gcc/ChangeLog:
* diagnostic.cc (diagnostic_context::initialize): Initialize
m_diagnostic_groups.m_inhibiting_notes_from.
(diagnostic_context::inhibit_notes_in_group): New.
(diagnostic_context::notes_inhibited_in_group): New
(diagnostic_context::report_diagnostic): Call
inhibit_notes_in_group and notes_inhibited_in_group.
(diagnostic_context::end_group): Call inhibit_notes_in_group.
(diagnostic_context::pop_nesting_level): Ditto.
* diagnostic.h (diagnostic_context::m_diagnostic_groups): Add
member to track the depth at which a warning has been inhibited.
(diagnostic_context::notes_inhibited_in_group): Declare.
(diagnostic_context::inhibit_notes_in_group): Declare.
* doc/ux.texi: Document diagnostic_group behavior with regards
to inhibited warnings.
gcc/testsuite/ChangeLog:
* g++.dg/diagnostic/incomplete-type-2.C: New test.
* g++.dg/diagnostic/incomplete-type-2a.C: New test.
* g++.dg/diagnostic/inhibit-warn-3.C: New test.
|
|
For RV32 inline assembly, when handling 64-bit integer data, it is
often necessary to process the lower and upper 32 bits separately.
Unfortunately, we can only output the current register name
(lower 32 bits) but not the next register name (upper 32 bits).
To address this, the modifier 'H' has been added to allow users
to handle the upper 32 bits of the data. While I believe the
modifier 'N' (representing the next register name) might be more
suitable for this functionality, 'N' is already in use.
Therefore, 'H' (representing the high register) was chosen instead.
Co-Authored-By: Dimitar Dimitrov <dimitar@dinux.eu>
gcc/ChangeLog:
* config/riscv/riscv.cc (riscv_print_operand): Add H.
* doc/extend.texi: Document for H.
gcc/testsuite/ChangeLog:
* gcc.target/riscv/modifier-H-error-1.c: New test.
* gcc.target/riscv/modifier-H-error-2.c: New test.
* gcc.target/riscv/modifier-H.c: New test.
|
|
C++20 made a class with only explicitly defaulted constructors no longer
aggregate, and this wrongly affected whether the class is considered "POD
for layout purposes" under the ABI.
Conveniently, we already have check_non_pod_aggregate to diagnose cases
where this makes a difference, due to PR103681 around a C++14 aggregate
change.
PR c++/120012
gcc/cp/ChangeLog:
* cp-tree.h (struct lang_type): Add non_aggregate_pod.
(CLASSTYPE_NON_AGGREGATE_POD): New.
* class.cc (check_bases_and_members): Set it.
(check_non_pod_aggregate): Diagnose it.
gcc/ChangeLog:
* doc/invoke.texi: Document C++20 aggregate fix.
* common.opt: Likewise.
gcc/testsuite/ChangeLog:
* g++.dg/abi/base-defaulted1.C: New test.
* g++.dg/abi/base-defaulted1a.C: New test.
|
|
Looks like I've forgotten to update the docs for -fabi-version for a couple
of my changes.
gcc/ChangeLog:
* doc/invoke.texi: Add -fabi-version detail.
* common.opt: Likewise.
|
|
The SARIF 2.1.0 spec says that although a "SARIF log file SHALL contain
a serialization of the SARIF object model into the JSON format ... in the
future, other serializations might be defined." (§3.1)
I've been experimenting with alternative serializations of SARIF (CBOR
and JSON5 for now). To help with these experiments, this patch adds a
new param "serialization" to -fdiagnostics-add-output='s "sarif" scheme.
For now this must have value "json", but will be helpful for any
followup patches.
gcc/ChangeLog:
* diagnostic-format-sarif.cc
(sarif_serialization_format_json::write_to_file): New.
(sarif_builder::m_formatted): Replace field with...
(sarif_builder::m_serialization_format): ...this.
(sarif_builder::sarif_builder): Update for field change.
(sarif_builder::flush_to_file): Call m_serialization_format's
write_to_file vfunc.
(sarif_output_format::sarif_output_format): Replace param
"formatted" with "serialization_format".
(sarif_stream_output_format::sarif_output_format): Likewise.
(sarif_file_output_format::sarif_file_output_format): Likewise.
(diagnostic_output_format_init_sarif_stderr): Make a
sarif_serialization_format_json and pass it to
diagnostic_output_format_init_sarif.
(diagnostic_output_format_open_sarif_file): Split out into...
(diagnostic_output_file::try_to_open): ...this, adding
"serialization_kind" param.
(diagnostic_output_format_init_sarif_file): Update for new param
to diagnostic_output_format_open_sarif_file. Make a
sarif_serialization_format_json and pass it to
diagnostic_output_format_init_sarif.
(diagnostic_output_format_init_sarif_stream): Make a
sarif_serialization_format_json and pass it to
diagnostic_output_format_init_sarif.
(make_sarif_sink): Replace param "formatted" with "serialization".
(selftest::test_make_location_object): Update for changes to
sarif_builder ctor.
* diagnostic-format-sarif.h (enum class sarif_serialization): New.
(diagnostic_output_format_open_sarif_file): Add param
"serialization_kind".
(class sarif_serialization_format): New.
(class sarif_serialization_format_json): New.
(make_sarif_sink): Replace param "formatted" with
"serialization_format".
* diagnostic-output-file.h (diagnostic_output_file::try_to_open):
New decl.
* diagnostic.h (enum diagnostics_output_format): Tweak comments.
* doc/invoke.texi (-fdiagnostics-add-output): Add "serialization"
param to sarif scheme.
* libgdiagnostics.cc (sarif_sink::sarif_sink): Update for change
to make_sarif_sink.
* opts-diagnostic.cc (sarif_scheme_handler::make_sink): Add
"serialization" param and pass it on to make_sarif_sink.
Signed-off-by: David Malcolm <dmalcolm@redhat.com>
|
|
Since TARGET_PROMOTE_FUNCTION_RETURN is no longer used, remove its
reference from target.def.
PR target/119985
* target.def: Remove TARGET_PROMOTE_FUNCTION_RETURN reference.
* doc/tm.texi: Regenerated.
Signed-off-by: H.J. Lu <hjl.tools@gmail.com>
|
|
The following rewords the documentation for -Og which over-promises
the ability to debug the generated code. While -Og enables
var-tracking and thus improves debugging in some areas the experience
is usually worse than -O0 for standard C code.
PR debug/78685
* doc/invoke.texi (-Og): Reword.
|
|
This patch adds initial support for exception-handling to -fanalyzer,
handling eh_dispatch for regions of type ERT_TRY and
ERT_ALLOWED_EXCEPTIONS. I haven't managed yet seen eh_dispatch for
regions of type ERT_CLEANUP and ERT_MUST_NOT_THROW in the analyzer; with
this patch it will ICE if it sees those.
Additionally, this patch only checks for exact matches of exception
types, rather than supporting subclasses and references. I'm deferring
fixing this for now whilst figuring out how best to interact with the C++
type system; I'm tracking it as PR analyzer/119697.
The patch adds event classes for throwing and catching exceptions, and
seems to generate readable warnings for the kinds of leak that might
occur due to trying to manage resources manually and forgetting about
exceptions; for example:
exception-leak-1.C: In function ‘int test()’:
exception-leak-1.C:7:9: warning: leak of ‘ptr’ [CWE-401] [-Wanalyzer-malloc-leak]
7 | throw 42;
| ^~
‘int test()’: events 1-3
5 | void *ptr = __builtin_malloc (1024);
| ~~~~~~~~~~~~~~~~~^~~~~~
| |
| (1) allocated here
6 |
7 | throw 42;
| ~~
| |
| (2) throwing exception of type ‘int’ here...
| (3) ⚠️ ‘ptr’ leaks here; was allocated at (1)
Although dynamic exception specifications are only available in C++14
and earlier, the need to support them meant it seemed relatively easy to
add a warning to check them, hence the patch adds a new warning
for code paths that throw an exception that doesn't match a dynamic
exception specification: -Wanalyzer-throw-of-unexpected-type.
gcc/analyzer/ChangeLog:
PR analyzer/97111
* analyzer.cc (is_cxa_throw_p): New.
(is_cxa_rethrow_p): New.
* analyzer.opt (Wanalyzer-throw-of-unexpected-type): New.
* analyzer.opt.urls: Regenerate.
* call-info.cc (custom_edge_info::create_enode): New.
* call-info.h (call_info::print): Drop "final".
(call_info::add_events_to_path): Likewise.
* checker-event.cc (event_kind_to_string): Add cases for
event_kind::catch_, event_kind::throw_, and event_kind::unwind.
(explicit_throw_event::print_desc): New.
(throw_from_call_to_external_fn_event::print_desc): New.
(unwind_event::print_desc): New.
* checker-event.h (enum class event_kind): Add catch_, throw_,
and unwind.
(class catch_cfg_edge_event): New.
(class throw_event): New.
(class explicit_throw_event): New.
(class throw_from_call_to_external_fn_event): New.
(class unwind_event): New.
* common.h (class eh_dispatch_cfg_superedge): New forward decl.
(class eh_dispatch_try_cfg_superedge): New forward decl.
(class eh_dispatch_allowed_cfg_superedge): New forward decl.
(custom_edge_info::create_enode): New vfunc decl.
(is_cxa_throw_p): New decl.
(is_cxa_rethrow_p): New decl.
* diagnostic-manager.cc
(diagnostic_manager::add_events_for_superedge): Special-case edges
for eh_dispach_try.
(diagnostic_manager::prune_path): Call consolidate_unwind_events.
(diagnostic_manager::prune_for_sm_diagnostic): Don't filter the new
event_kinds.
(diagnostic_manager::consolidate_unwind_events): New.
* diagnostic-manager.h
(diagnostic_manager::consolidate_unwind_events): New decl.
* engine.cc (exploded_node::on_stmt_pre): Handle "__cxa_throw",
"__cxa_rethrow", and resx statements.
(class throw_custom_edge): New.
(class unwind_custom_edge): New.
(get_eh_outedge): New.
(exploded_graph::unwind_from_exception): New.
(exploded_node::on_throw): New.
(exploded_node::on_resx): New.
(exploded_graph::get_or_create_node): Add "add_to_worklist" param
and use it.
(exploded_graph::process_node): Use edge_info's create_enode vfunc
to create enodes, rather than calling get_or_create_node directly.
Ignore CFG edges in the sgraph flagged with EH whilst we're
exploring the egraph.
(exploded_graph_annotator::print_enode): Handle case
exploded_node::status::special.
* exploded-graph.h (exploded_node::status): Add value "special".
(exploded_node::on_throw): New decl.
(exploded_node::on_resx): New decl.
(exploded_graph::get_or_create_node): Add optional
"add_to_worklist" param.
(exploded_graph::unwind_from_exception): New decl.
* kf-lang-cp.cc (class kf_cxa_allocate_exception): New.
(class kf_cxa_begin_catch): New.
(class kf_cxa_end_catch): New.
(class throw_of_unexpected_type): New.
(class kf_cxa_call_unexpected): New.
(register_known_functions_lang_cp): Register known functions
"__cxa_allocate_exception", "__cxa_begin_catch",
"__cxa_end_catch", and "__cxa_call_unexpected".
* kf.cc (class kf_eh_pointer): New.
(register_known_functions): Register it for BUILT_IN_EH_POINTER.
* region-model.cc: Include "analyzer/function-set.h".
(exception_node::operator==): New.
(exception_node::dump_to_pp): New.
(exception_node::dump): New.
(exception_node::to_json): New.
(exception_node::make_dump_widget): New.
(exception_node::maybe_get_type): New.
(exception_node::add_to_reachable_regions): New.
(region_model::region_model): Initialize
m_thrown_exceptions_stack and m_caught_exceptions_stack.
(region_model::operator=): Likewise.
(region_model::operator==): Compare them.
(region_model::dump_to_pp): Dump exception stacks.
(region_model::to_json): Add exception stacks.
(region_model::make_dump_widget): Likewise.
(class exception_thrown_from_unrecognized_call): New.
(get_fns_assumed_not_to_throw): New.
(can_throw_p): New.
(region_model::check_for_throw_inside_call): New.
(region_model::on_call_pre): Call check_for_throw_inside_call
on unknown fns or those we don't have a body for.
(region_model::maybe_update_for_edge): Handle eh_dispatch_stmt
statements. Drop old code that called
apply_constraints_for_exception on EDGE_EH edges.
(class rejected_eh_dispatch): New.
(exception_matches_type_p): New.
(matches_any_exception_type_p): New.
(region_model::apply_constraints_for_eh_dispatch): New.
(region_model::apply_constraints_for_eh_dispatch_try): New.
(region_model::apply_constraints_for_eh_dispatch_allowed): New.
(region_model::apply_constraints_for_exception): Delete.
(region_model::can_merge_with_p): Don't merge models with
non-equal exception stacks.
(region_model::get_referenced_base_regions): Add regions from
exception stacks.
* region-model.h (struct exception_node): New.
(region_model::push_thrown_exception): New.
(region_model::get_current_thrown_exception): New.
(region_model::pop_thrown_exception): New.
(region_model::push_caught_exception): New.
(region_model::get_current_caught_exception): New.
(region_model::pop_caught_exception): New.
(region_model::apply_constraints_for_eh_dispatch_try): New decl.
(region_model::apply_constraints_for_eh_dispatch_allowed) New decl.
(region_model::apply_constraints_for_exception): Delete.
(region_model::apply_constraints_for_eh_dispatch): New decl.
(region_model::check_for_throw_inside_call): New decl.
(region_model::m_thrown_exceptions_stack): New field.
(region_model::m_caught_exceptions_stack): New field.
* supergraph.cc: Include "except.h" and "analyzer/region-model.h".
(supergraph::add_cfg_edge): Special-case eh_dispatch edges.
(superedge::get_description): Use default_tree_printer.
(get_catch): New.
(eh_dispatch_cfg_superedge::make): New.
(eh_dispatch_cfg_superedge::eh_dispatch_cfg_superedge): New.
(eh_dispatch_cfg_superedge::get_eh_status): New.
(eh_dispatch_try_cfg_superedge::dump_label_to_pp): New.
(eh_dispatch_try_cfg_superedge::apply_constraints): New.
(eh_dispatch_allowed_cfg_superedge::eh_dispatch_allowed_cfg_superedge):
New.
(eh_dispatch_allowed_cfg_superedge::dump_label_to_pp): New.
(eh_dispatch_allowed_cfg_superedge::apply_constraints): New.
* supergraph.h: Include "except.h".
(superedge::dyn_cast_eh_dispatch_cfg_superedge): New vfunc.
(superedge::dyn_cast_eh_dispatch_try_cfg_superedge): New vfunc.
(superedge::dyn_cast_eh_dispatch_allowed_cfg_superedge): New
vfunc.
(class eh_dispatch_cfg_superedge): New.
(is_a_helper <const eh_dispatch_cfg_superedge *>::test): New.
(class eh_dispatch_try_cfg_superedge): New.
(is_a_helper <const eh_dispatch_try_cfg_superedge *>::test): New.
(class eh_dispatch_allowed_cfg_superedge): New.
(is_a_helper <const eh_dispatch_allowed_cfg_superedge *>::test):
New.
* svalue.cc (svalue::maybe_get_type_from_typeinfo): New.
* svalue.h (svalue::maybe_get_type_from_typeinfo): New decl.
gcc/ChangeLog:
PR analyzer/97111
* doc/invoke.texi: Add -Wanalyzer-throw-of-unexpected-type.
* gimple.h (gimple_call_nothrow_p): Make arg const.
gcc/testsuite/ChangeLog:
PR analyzer/97111
* c-c++-common/analyzer/analyzer-verbosity-2a.c: Add
-fno-exceptions.
* c-c++-common/analyzer/analyzer-verbosity-3a.c: Likewise.
* c-c++-common/analyzer/attr-const-2.c: Add
__attribute__((nothrow)).
* c-c++-common/analyzer/attr-malloc-4.c: Likewise.
* c-c++-common/analyzer/attr-malloc-5.c: Likewise.
* c-c++-common/analyzer/attr-malloc-6.c: Add -fno-exceptions.
* c-c++-common/analyzer/attr-malloc-CVE-2019-19078-usb-leak.c:
Likewise.
* c-c++-common/analyzer/attr-malloc-exception.c: New test.
* c-c++-common/analyzer/call-summaries-pr107158-2.c: Add
-fno-exceptions.
* c-c++-common/analyzer/call-summaries-pr107158.c: Likewise.
* c-c++-common/analyzer/capacity-2.c: Likewise.
* c-c++-common/analyzer/coreutils-sum-pr108666.c: Likewise.
* c-c++-common/analyzer/data-model-22.c: Likewise.
* c-c++-common/analyzer/data-model-5d.c: Likewise.
* c-c++-common/analyzer/deref-before-check-pr108455-git-pack-revindex.c:
Likewise.
* c-c++-common/analyzer/deref-before-check-pr108475-haproxy-tcpcheck.c:
Likewise.
* c-c++-common/analyzer/edges-2.c: Likewise.
* c-c++-common/analyzer/fd-2.c: Likewise.
* c-c++-common/analyzer/fd-3.c: Likewise.
* c-c++-common/analyzer/fd-meaning.c: Likewise.
* c-c++-common/analyzer/file-1.c: Likewise.
* c-c++-common/analyzer/file-3.c: Likewise.
* c-c++-common/analyzer/file-meaning-1.c: Likewise.
* c-c++-common/analyzer/infinite-recursion.c: Likewise.
* c-c++-common/analyzer/leak-3.c: Likewise.
* c-c++-common/analyzer/malloc-dedupe-1.c: Likewise.
* c-c++-common/analyzer/malloc-in-loop.c: Likewise.
* c-c++-common/analyzer/malloc-many-paths-3.c: Likewise.
* c-c++-common/analyzer/malloc-paths-5.c: Likewise.
* c-c++-common/analyzer/malloc-paths-7.c: Likewise.
* c-c++-common/analyzer/malloc-paths-8.c: Likewise.
* c-c++-common/analyzer/malloc-vs-local-1a.c: Likewise.
* c-c++-common/analyzer/malloc-vs-local-2.c: Likewise.
* c-c++-common/analyzer/malloc-vs-local-3.c: Likewise.
* c-c++-common/analyzer/paths-7.c: Likewise.
* c-c++-common/analyzer/pr110830.c: Likewise.
* c-c++-common/analyzer/pr93032-mztools-simplified.c: Likewise.
* c-c++-common/analyzer/pr93355-localealias-feasibility-3.c:
Likewise.
* c-c++-common/analyzer/pr93355-localealias-simplified.c:
Likewise.
* c-c++-common/analyzer/pr96650-1-trans.c: Likewise.
* c-c++-common/analyzer/pr97072.c: Add __attribute__((nothrow)).
* c-c++-common/analyzer/pr98575-1.c: Likewise.
* c-c++-common/analyzer/pr99716-1.c: Add -fno-exceptions.
* c-c++-common/analyzer/pr99716-2.c: Likewise.
* c-c++-common/analyzer/pr99716-3.c: Likewise.
* c-c++-common/analyzer/pragma-2.c: Likewise.
* c-c++-common/analyzer/rhbz1878600.c: Likewise.
* c-c++-common/analyzer/strndup-1.c: Likewise.
* c-c++-common/analyzer/write-to-string-literal-4-disabled.c:
Likewise.
* c-c++-common/analyzer/write-to-string-literal-4.c: Likewise.
* c-c++-common/analyzer/write-to-string-literal-5.c: Likewise.
* c-c++-common/analyzer/zlib-5.c: Likewise.
* g++.dg/analyzer/exception-could-throw-1.C: New test.
* g++.dg/analyzer/exception-could-throw-2.C: New test.
* g++.dg/analyzer/exception-dynamic-spec.C: New test.
* g++.dg/analyzer/exception-leak-1.C: New test.
* g++.dg/analyzer/exception-leak-2.C: New test.
* g++.dg/analyzer/exception-leak-3.C: New test.
* g++.dg/analyzer/exception-leak-4.C: New test.
* g++.dg/analyzer/exception-leak-5.C: New test.
* g++.dg/analyzer/exception-leak-6.C: New test.
* g++.dg/analyzer/exception-nothrow.C: New test.
* g++.dg/analyzer/exception-path-1.C: New test.
* g++.dg/analyzer/exception-path-catch-all-1.C: New test.
* g++.dg/analyzer/exception-path-catch-all-2.C: New test.
* g++.dg/analyzer/exception-path-unwind-multiple-2.C: New test.
* g++.dg/analyzer/exception-path-unwind-multiple.C: New test.
* g++.dg/analyzer/exception-path-unwind-single.C: New test.
* g++.dg/analyzer/exception-path-with-cleanups.C: New test.
* g++.dg/analyzer/exception-rethrow-1.C: New test.
* g++.dg/analyzer/exception-rethrow-2.C: New test.
* g++.dg/analyzer/exception-stack-1.C: New test.
* g++.dg/analyzer/exception-stack-2.C: New test.
* g++.dg/analyzer/exception-subclass-1.C: New test.
* g++.dg/analyzer/exception-subclass-2.C: New test.
* g++.dg/analyzer/exception-value-1.C: New test.
* g++.dg/analyzer/exception-value-2.C: New test.
* g++.dg/analyzer/fno-exception.C: New test.
* g++.dg/analyzer/pr94028.C: Drop xfail.
* g++.dg/analyzer/std-unexpected.C: New test.
* g++.dg/coroutines/pr105287.C: Drop dg-excess-errors.
Signed-off-by: David Malcolm <dmalcolm@redhat.com>
|
|
The following changes how flag_complex_method is managed towards
being able to record that in the optimization set so we can stream
and restore it per function. Currently -fcx-fortran-rules and
-fcx-limited-range are separate recorded options but saving/restoring
does not restore flag_complex_method which is later used in the
middle-end.
The solution is to make -fcx-fortran-rules and -fcx-limited-range
aliases of a new -fcx-method= switch that represents flag_complex_method
directly so we can save and restore it.
PR middle-end/60779
* common.opt (fcx-method=): New, map to flag_complex_method.
(Enum complex_method): New.
(fcx-limited-range): Alias to -fcx-method=limited-range.
(fcx-fortran-rules): Alias to -fcx-medhot=fortran.
* ipa-inline-transform.cc (inline_call): Check flag_complex_method.
* ipa-inline.cc (can_inline_edge_by_limits_p): Likewise.
* opts.cc (finish_options): Adjust.
(set_fast_math_flags): Likewise.
* doc/invoke.texi (fcx-method=): Document.
* gcc.dg/lto/pr60779_0.c: New testcase.
* gcc.dg/lto/pr60779_1.c: Likewise.
|
|
The following removes the ability to switch back to non SLP-only
operation of the vectorizer - a requirement to start cleaning out
non-SLP paths without risk of regressing that case.
* params.opt (--param=vect-force-slp): Remove.
* doc/invoke.texi (--param=vect-force-slp): Likewise.
* tree-vect-loop.cc (vect_analyze_loop_2): Assume
param_vect_force_slp is 1.
* tree-vect-stmts.cc (vect_analyze_stmt): Likewise.
|
|
Some backends do not define TARGET_ASM_OUTPUT_MI_THUNK. But the generic
thunk support cannot emit code for calling variadic methods of
multiple-inheritance classes. Example error for pru-unknown-elf:
.../gcc/gcc/testsuite/g++.dg/ipa/pr83549.C:7:24: error: generic thunk code fails for method 'virtual void C::_ZThn4_N1C3fooEz(...)' which uses '...'
Disable the affected tests for all targets which do not define
TARGET_ASM_OUTPUT_MI_THUNK.
Ensured that test results with and without this patch for
x86_64-pc-linux-gnu are the same.
gcc/ChangeLog:
* doc/sourcebuild.texi: Document variadic_mi_thunk effective
target check.
gcc/testsuite/ChangeLog:
* g++.dg/ipa/pr83549.C: Require effective target
variadic_mi_thunk.
* g++.dg/ipa/pr83667.C: Ditto.
* g++.dg/torture/pr81812.C: Ditto.
* g++.old-deja/g++.jason/thunk3.C: Ditto.
* lib/target-supports.exp
(check_effective_target_variadic_mi_thunk): New function.
Signed-off-by: Dimitar Dimitrov <dimitar@dinux.eu>
|
|
Testcases for musttail call optimization fail on pru-unknown-elf:
FAIL: c-c++-common/musttail14.c -std=gnu++17 (test for excess errors)
Excess errors:
.../gcc/gcc/testsuite/c-c++-common/musttail14.c:37:14: error: cannot tail-call: caller uses sjlj exceptions
Silence these errors by disabling the tests if target uses SJLJ for
implementing exceptions. Use a new effective target check for this.
Ensured that test results with and without this patch for
x86_64-pc-linux-gnu are the same.
gcc/ChangeLog:
* doc/sourcebuild.texi: Document effective target
using_sjlj_exceptions.
gcc/testsuite/ChangeLog:
* c-c++-common/musttail14.c: Disable test if effective target
using_sjlj_exceptions.
* c-c++-common/musttail22.c: Ditto.
* g++.dg/musttail8.C: Ditto.
* g++.dg/musttail9.C: Ditto.
* g++.dg/opt/musttail3.C: Ditto.
* g++.dg/opt/musttail4.C: Ditto.
* g++.dg/opt/musttail5.C: Ditto.
* g++.dg/opt/pr119613.C: Ditto.
* lib/target-supports.exp
(check_effective_target_using_sjlj_exceptions): New check.
Signed-off-by: Dimitar Dimitrov <dimitar@dinux.eu>
|
|
Currently enabling profile feedback regresses x264 and exchange. In both cases the root of the
issue is that ipa-cp cost model thinks cloning is not relevant when feedback is available
while it clones without feedback.
Consider:
__attribute__ ((used))
int a[1000];
__attribute__ ((noinline))
void
test2(int sz)
{
for (int i = 0; i < sz; i++)
a[i]++;
asm volatile (""::"m"(a));
}
__attribute__ ((noinline))
void
test1 (int sz)
{
for (int i = 0; i < 1000; i++)
test2(sz);
}
int main()
{
test1(1000);
return 0;
}
Here we want to clone call both test1 and test2 and specialize for 1000, but
ipa-cp will not do that, since it will skip call main->test1 as not hot since
it is called just once both with or without profile feedback.
In this simple testcase even without profile feedback we will track that main
is called once.
I think the testcase shows that hotness of call is not that relevant when
deciding whether we want to propagate constants across it. ipa-cp with IPA
profile can compute overall estimate of time saved (which is existing time
benefit computing time saved per invociation of the function multiplied by
number of executions) and see if result is big enough. An easy check is to
simply call maybe_hot_p on the resulting count.
So this patch makes ipa-cp to consider all calls sites except those known to be
unlikely executed (i.e. run 0 times in train run or known to lead to someting
bad) as interesting, which makes ipa-cp to propagate across them, find cloning
candidates and feed them into good_clonning_oppurtunity.
For this I added cs_interesting_for_ipcp_p which also attempts to do right
thing with partial training.
Now good_clonning_oppurtunity will currently return false, since it will figure
out that the call edge is not very frequent.
It already kind of knows that frequency of call instruction istself is not too
important, but instead of computing overall time saved, it tries to compare it
with param_ipa_cp_profile_count_base percentage of counts of call edges. I
think this is not very relevant since estimated time saved per call can be
large. So I dropped this logic and replaced it with simple use of overall
saved time.
Since ipa-cp is not dealing well with the cases where it hits the allowed unit
growth limit, we probably want to be more careful, so I keep existing metric
with this change.
So now we get:
Evaluating opportunities for test1/3.
- considering value 1000 for param #0 sz (caller_count: 1)
good_cloning_opportunity_p (time: 1, size: 8, count_sum: 1 (precise), overall time saved: 1 (adjusted)) -> evaluation: 0.12, threshold: 500
not cloning: time saved is not hot
good_cloning_opportunity_p (time: 129001, size: 20, count_sum: 1 (precise), overall time saved: 129001 (adjusted)) -> evaluation: 6450.05, threshold: 500
First call to good_cloning_oppurtunity considers the case where only test1 is
clonned. In this case time saved is 1 (for passing the value around) and since
it is called just once (count_sum) overall time saved is 1 which is not
considered hot and we also get very low evaulation score.
In the second call we consider cloning chain test1->test2. In this case time
saved is large (12901) since test2 is invoked many times and it is used to
controll the loop. We still know that the count is 1 but overall time is
129001 which is already considered relevant and we clone.
I also try to do something sensible in case we have calls both with
and without IPA profile (which can happen for comdats where profile got missing
or with LTO if some units were not trained).
Instead of checking whether sum of calls with known profile is nonzero, I keep
track if there are other calls and if so, also try the local heuristics that
is used without profile feedback.
The patch improves SPECint with -Ofast -fprofile-use by approx 1% by speeding
up x264 from 99.3s to 91.3s (9%) and exchange from 99.7s to 95.5s (3.3%).
We still get better x264 runtime without profile (86.4s for x264 and 93.8 for exchange).
The main problem I see is that ipa-cp has the global limit for growth of 10%
but does not consider the oppurtunities in priority order. Consequently if the
limit is hit, randomly some clone oppurtunities are dropped in favour of
others.
I dumped unit size changes with -flto -Ofast build of SPEC2017. Without patch I get:
orig new growth
588677 605385 102.838229
4378 6037 137.894016
484650 494851 102.104818
4111 4111 100.000000
99953 103519 103.567677
106181 114889 108.201091
21389 21597 100.972462
24925 26746 107.305918
15308 23974 156.610922
27354 27906 102.017986
494 494 100.000000
4631 4631 100.000000
863216 872729 101.102042
126604 126604 100.000000
605138 627156 103.638509
4112 4112 100.000000
222006 231293 104.183220
2952 3384 114.634146
37584 39807 105.914751
4111 4111 100.000000
13226 13226 100.000000
4111 4111 100.000000
326215 337396 103.427494
25240 25433 100.764659
64644 65972 102.054328
127223 132300 103.990631
494 494 100.000000
Small units can grow up to 16000 instructions and other units are
large. So there is only one 156% growth hititng limits which is exchange
that has recursive clonning that goes specially.
With profile feedback ipacp basically shuts itself off:
333815 333891 100.022767
2559 2974 116.217272
217576 217581 100.002298
2749 2749 100.000000
64652 64716 100.098992
68416 69707 101.886986
13171 13171 100.000000
11849 11849 100.000000
10519 16180 153.816903
15843 15843 100.000000
231 231 100.000000
3624 3624 100.000000
573385 573386 100.000174
97623 97623 100.000000
295673 295676 100.001015
2750 2750 100.000000
130723 130726 100.002295
2334 2334 100.000000
19313 19313 100.000000
2749 2749 100.000000
517331 517331 100.000000
6707 6707 100.000000
2749 2749 100.000000
193638 193638 100.000000
16425 16425 100.000000
47154 47154 100.000000
96422 96422 100.000000
231 231 100.000000
So we essentially clone only exchange and and mcf (116%)
With patch and no FDO I get:
588677 605385 102.838229
4378 6037 137.894016
484519 494698 102.100846
4111 4111 100.000000
99953 103519 103.567677
106181 114889 108.201091
21389 22632 105.811398
24854 26620 107.105496
15308 23974 156.610922
27354 28039 102.504204
494 494 100.000000
4631 4631 100.000000
4631 4631 100.000000
126604 126630 100.020536
4112 4112 100.000000
222006 231293 104.183220
2952 3384 114.634146
37584 39807 105.914751
2760715 2835539 102.710312
4111 4111 100.000000
13226 13226 100.000000
4111 4111 100.000000
326215 337396 103.427494
25240 25433 100.764659
64644 65972 102.054328
127223 132300 103.990631
494 494 100.000000
which seems essentially same as without patch. However with FDO I get:
333815 350363 104.957237
2559 3345 130.715123
217469 220765 101.515618
485599 488772 100.653420
2749 2749 100.000000
64652 74265 114.868836
68416 87484 127.870674
13171 20656 156.829398
11792 11990 101.679104
10519 17028 161.878506
15843 16119 101.742094
231 231 100.000000
573336 573336 100.000000
97623 97623 100.000000
295497 296208 100.240612
2750 2750 100.000000
130723 133341 102.002708
2334 2334 100.000000
19313 19368 100.284782
2749 2749 100.000000
6707 6755 100.715670
2749 2749 100.000000
193638 194712 100.554643
16425 17377 105.796043
47154 47154 100.000000
96422 96422 100.000000
231 231 100.000000
So here we get 114% and 127 growth in x264 (two differen tbinaries)
56% growht in Deepsjeng, 61% growth in Exchange which all are above
10% cutoff.
Bootstrapped/regtested x86_64-linux.
gcc/ChangeLog:
* ipa-cp.cc (base_count): Remove.
(struct caller_statistics): Rename n_hot_calls to n_interesting_calls;
add called_without_ipa_profile.
(init_caller_stats): Update.
(cs_interesting_for_ipcp_p): New function.
(gather_caller_stats): collect n_interesting_calls and
called_without_profile.
(ipcp_cloning_candidate_p): Use n_interesting-calls rather then hot.
(good_cloning_opportunity_p): Rewrite heuristics when IPA profile is
present
(estimate_local_effects): Update.
(value_topo_info::propagate_effects): Update.
(compare_edge_profile_counts): Remove.
(ipcp_propagate_stage): Do not collect base_count.
(get_info_about_necessary_edges): Record whether function is called
without profile.
(decide_about_value): Update.
(ipa_cp_cc_finalize): Do not initialie base_count.
* profile-count.cc (profile_count::operator*): New.
(profile_count::operator*=): New.
* profile-count.h (profile_count::operator*): Declare
(profile_count::operator*=): Declare.
* params.opt: Remove ipa-cp-profile-count-base.
* doc/invoke.texi: Likewise.
|
|
Support -mcpu=xt-c908, xt-c908v, xt-c910, xt-c910v2, xt-c920, xt-c920v2
for Xuantie series cpu.
ref:https://www.xrvm.cn/community/download?id=4224248662731067392
without fmv_cost, vector_unaligned_access, use_divmod_expansion, overlap_op_by_pieces, fill the tune info with generic ooo for further modification.
gcc/ChangeLog:
* config/riscv/riscv-cores.def (RISCV_TUNE): Add xt-c908, xt-c908v,
xt-c910, xt-c910v2, xt-c920, xt-c920v2.
(RISCV_CORE): Add xt-c908, xt-c908v, xt-c910, xt-c910v2, xt-c920,
xt-c920v2.
* doc/invoke.texi: Add xt-c908, xt-c908v, xt-c910, xt-c910v2,
xt-c920, xt-c920v2.
gcc/testsuite/ChangeLog:
* gcc.target/riscv/mcpu-xt-c908.c: test -mcpu=xt-c908.
* gcc.target/riscv/mcpu-xt-c910.c: test -mcpu=xt-c910.
* gcc.target/riscv/mcpu-xt-c920v2.c: test -mcpu=xt-c920v2.
* gcc.target/riscv/mcpu-xt-c908v.c: test -mcpu=xt-c908v.
* gcc.target/riscv/mcpu-xt-c910v2.c: test -mcpu=xt-c910v2.
* gcc.target/riscv/mcpu-xt-c920.c: test -mcpu=xt-c920.
|
|
Filip Kastl pointed out that contrib/check-params-in-docs.py complains
about params not documented in invoke.texi, so this patch adds the short
explanation from params.opt for these to the invoke.texi section.
Thanks for the reminder.
Signed-off-by: Kyrylo Tkachov <ktkachov@nvidia.com>
gcc/
* doc/invoke.texi (lto-partition-locality-frequency-cutoff,
lto-partition-locality-size-cutoff, lto-max-locality-partition):
Document.
|
|
The documentation for the REG_EH_REGION could easily be read
(especially by non-native speakers) to indicate that it should be
attached to insn at the destination of an excpetion edge. Despite the
original text saying that the note "specifies the destination," it is
actually always attached to the source instruction.
This updates the documentation to make it clear that the REG_EH_REGION
note is always attached to instructions originating an exception edge
and that the value of the note specifies where the exception edge
leads to.
Co-Developed-by: Philipp Tomsich <philipp.tomsich@vrull.eu>
gcc/ChangeLog:
* doc/cfg.texi: Update the exception handling section for the
REG_EH_REGION notes to make it clear that the note is attached
to the instruction throwing the exception.
|
|
Include the term used in the standard to ease further research for users,
and while at it, rephrase the description of the rule entirely using
Alexander Monakov's suggestion: it was previously wrong (and imprecise) as
"the same address" may well be re-used later on, and the issue is the
access via an expression of the wrong type.
gcc/ChangeLog:
* doc/invoke.texi: Use "compatible types" term. Rephrase to be
more precise (and correct).
|
|
gcc/ChangeLog
PR c/88382
* doc/extend.texi (Syntax Extensions): Adjust menu.
(Raw String Literals): New section.
|
|
-Q does something completely different in conjunction with --help than it
does otherwise; its main entry in the manual didn't mention that, nor did
-Q have an entry in the index for the --help usage.
gcc/ChangeLog
PR driver/90465
* doc/invoke.texi (Overall Options): Add a @cindex for -Q in
connection with --help=.
(Developer Options): Point at --help= documentation for the
other use of -Q.
|
|
There's a blurb at the top of the "Optimize Options" node telling
people that most optimization options are completely disabled at -O0
and a similar blurb in the entry for -Og, but nothing at the entry for
-O0. Since this is a continuing point of confusion it seems wise to
duplicate the information in all the places users are likely to look
for it.
gcc/ChangeLog
PR tree-optimization/71094
* doc/invoke.texi (Optimize Options): Document that -fivopts is
enabled at -O1 and higher. Add blurb about -O0 causing GCC to
completely ignore most optimization options.
|
|
Implement partitioning and cloning in the callgraph to help locality.
A new -fipa-reorder-for-locality flag is used to enable this.
The majority of the logic is in the new IPA pass in ipa-locality-cloning.cc
The optimization has two components:
* Partitioning the callgraph so as to group callers and callees that frequently
call each other in the same partition
* Cloning functions that straddle multiple callchains and allowing each clone
to be local to the partition of its callchain.
The majority of the logic is in the new IPA pass in ipa-locality-cloning.cc.
It creates a partitioning plan and does the prerequisite cloning.
The partitioning is then implemented during the existing LTO partitioning pass.
To guide these locality heuristics we use PGO data.
In the absence of PGO data we use a static heuristic that uses the accumulated
estimated edge frequencies of the callees for each function to guide the
reordering.
We are investigating some more elaborate static heuristics, in particular using
the demangled C++ names to group template instantiatios together.
This is promising but we are working out some kinks in the implementation
currently and want to send that out as a follow-up once we're more confident
in it.
A new bootstrap-lto-locality bootstrap config is added that allows us to test
this on GCC itself with either static or PGO heuristics.
GCC bootstraps with both (normal LTO bootstrap and profiledbootstrap).
As this new pass enables a new partitioning scheme it is incompatible with
explicit -flto-partition= options so an error is introduced when the user
uses both flags explicitly.
With this optimization we are seeing good performance gains on some large
internal workloads that stress the parts of the processor that is sensitive
to code locality, but we'd appreciate wider performance evaluation.
Bootstrapped and tested on aarch64-none-linux-gnu.
Ok for mainline?
Thanks,
Kyrill
Signed-off-by: Prachi Godbole <pgodbole@nvidia.com>
Co-authored-by: Kyrylo Tkachov <ktkachov@nvidia.com>
config/ChangeLog:
* bootstrap-lto-locality.mk: New file.
gcc/ChangeLog:
* Makefile.in (OBJS): Add ipa-locality-cloning.o.
* cgraph.h (set_new_clone_decl_and_node_flags): Declare prototype.
* cgraphclones.cc (set_new_clone_decl_and_node_flags): Remove static
qualifier.
* common.opt (fipa-reorder-for-locality): New flag.
(LTO_PARTITION_DEFAULT): Declare.
(flto-partition): Change default to LTO_PARTITION_DFEAULT.
* doc/invoke.texi: Document -fipa-reorder-for-locality.
* flag-types.h (enum lto_locality_cloning_model): Declare.
(lto_partitioning_model): Add LTO_PARTITION_DEFAULT.
* lto-cgraph.cc (lto_set_symtab_encoder_in_partition): Add dumping of
node and index.
* opts.cc (validate_ipa_reorder_locality_lto_partition): Define.
(finish_options): Handle LTO_PARTITION_DEFAULT.
* params.opt (lto_locality_cloning_model): New enum.
(lto-partition-locality-cloning): New param.
(lto-partition-locality-frequency-cutoff): Likewise.
(lto-partition-locality-size-cutoff): Likewise.
(lto-max-locality-partition): Likewise.
* passes.def: Register pass_ipa_locality_cloning.
* timevar.def (TV_IPA_LC): New timevar.
* tree-pass.h (make_pass_ipa_locality_cloning): Declare.
* ipa-locality-cloning.cc: New file.
* ipa-locality-cloning.h: New file.
gcc/lto/ChangeLog:
* lto-partition.cc (add_node_references_to_partition): Define.
(create_partition): Likewise.
(lto_locality_map): Likewise.
(lto_promote_cross_file_statics): Add extra dumping.
* lto-partition.h (lto_locality_map): Declare prototype.
* lto.cc (do_whole_program_analysis): Handle
flag_ipa_reorder_for_locality.
|
|
gcc/ChangeLog
PR ipa/113203
* doc/extend.texi (Common Function Attributes): Explain how to
use always_inline in programs that have multiple translation
units, and that LTO inlining additionally needs optimization
enabled.
|
|
gcc/ChangeLog:
PR target/108134
* doc/extend.texi: Remove documents from r11-344-g0fec3f62b9bfc0.
|
|
gcc/ChangeLog
PR target/42683
* doc/invoke.texi (x86 Options): Clarify that -march=pentiumpro
doesn't include MMX.
|
|
Addresses PR#117869
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=117869
gcc/ChangeLog:
* doc/install.texi: Add requirements for building gccrs.
|
|
This patch introduces four dejagnu tests matching four
documentation examples. Both asm examples are added and only built if
the x86_64 target is available. The other two are hello world using
libc and StrIO. The doc/gm2.texi asm examples are changed to
use eax rather than rax.
gcc/ChangeLog:
PR modula2/119779
* doc/gm2.texi (Interface to assembly language): Use eax
rather than rax in both examples.
gcc/testsuite/ChangeLog:
PR modula2/119779
* gm2.dg/doc/examples/pass/doc-examples-pass.exp: New test.
* gm2.dg/doc/examples/pass/exampleadd.mod: New test.
* gm2.dg/doc/examples/pass/exampleadd2.mod: New test.
* gm2.dg/doc/examples/pass/hello.mod: New test.
* gm2.dg/doc/examples/pass/hellopim.mod: New test.
Signed-off-by: Gaius Mulley <gaiusmod2@gmail.com>
|
|
The recently announced IBM z17 processor implements the architecture
already supported as arch15. This patch adds support for z17 as an
alternative architecture name for arch15.
gcc/ChangeLog:
* common/config/s390/s390-common.cc: Rename arch15 to z17.
* config.gcc: Add z17.
* config/s390/driver-native.cc: Detect z17 machine.
* config/s390/s390-builtins.def (B_VXE3): Rename arch15 to z17.
* config/s390/s390-c.cc (s390_resolve_overloaded_builtin): Ditto.
* config/s390/s390-opts.h (enum processor_type): Ditto.
* config/s390/s390.cc: Ditto.
* config/s390/s390.h: Ditto.
* config/s390/s390.md: Ditto.
* config/s390/s390.opt: Add z17.
* doc/invoke.texi: Ditto.
|
|
gcc/ChangeLog
PR target/97585
* doc/invoke.texi (x86 Options): Document list of extensions
supported by -march=x86_64, according to the declaration of
PTA_X86_64_BASELINE in config/i386/i386.h.
|
|
gcc/ChangeLog
PR c++/106618
* doc/invoke.texi (Option Summary): Remove -fargs-in-order, add
-fstrong-eval-order.
(C++ Dialect Options): Explicitly document that -fstrong-eval-order
takes an optional argument and what the choices are. Generalize
references to C++17.
|
|
gcc/ChangeLog
PR middle-end/105548
* doc/invoke.texi (Optimize Options): Delete misleading sentence
about conversions.
|
|
gcc/ChangeLog
PR tree-optimization/87909
* common.opt.urls: Regenerate.
* doc/invoke.texi (Option Summary): Add -ftree-cselim.
(Optimize Options): Likewise.
|
|
gcc/ChangeLog
PR middle-end/14708
* doc/invoke.texi (Optimize Options): List -fexcess-precision
before -ffloat-store, moving some background discussion to the
former from the latter. Recommend using -fexcess-precision=standard
instead of -ffloat-store.
|
|
The issue is specifically about a missing word, but I spotted other
copy-editing issues like misplaced hyphens in nearby text. I also
thought that the -Wimplicit example was anachronistic because it's a
hard error in modern C dialects rather than a warning, and replaced it
with something users are more likely to run into.
gcc/ChangeLog
PR c++/90468
* doc/invoke.texi (Warning Options): Clean up text describing
-Wno-xxx.
|
|
After discussion with the open source support team at Apple, we have
established that the cores conform to the 8.5 and 8.6 requirements.
One of the mandatory features (FEAT_SPECRES) is not exposed (or
available) in user-space code but is supported for privileged code.
The values for chip IDs and the LITTLE.big variants have been taken
from lists in the XNU and LLVM sources.
PR target/113257
gcc/ChangeLog:
* config/aarch64/aarch64-cores.def (AARCH64_CORE): Add Apple-a12,
Apple-M1, Apple-M2, Apple-M3 with expanded names to allow for the
LITTLE.big versions.
* config/aarch64/aarch64-tune.md: Regenerate.
* doc/invoke.texi: Add apple-m1,2 and 3 cores to the ones listed
for arch and tune selections.
Signed-off-by: Iain Sandoe <iain@sandoe.co.uk>
|
|
As noted in the previous patch, combine still takes >30% of
compile time in the original testcase for PR101523. The problem
is that try_combine uses linear insn searches for some dataflow
queries, so in the worst case, an unlimited number of 2->2
combinations for the same i2 can lead to quadratic behaviour.
This patch limits distribute_links to a certain number
of instructions when i2 is unchanged. As Segher said in the PR trail,
it would make more conceptual sense to apply the limit unconditionally,
but I thought it would be better to change as little as possible at
this development stage. Logically, in stage 1, the --param should
be applied directly by distribute_links with no input from callers.
As I mentioned in:
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=116398#c28
I think it's safe to drop log links even if a use exists. All
processing of log links seems to handle the absence of a link
for a particular register in a conservative way.
The initial set-up errs on the side of dropping links, since for example
create_log_links has:
/* flow.c claimed:
We don't build a LOG_LINK for hard registers contained
in ASM_OPERANDs. If these registers get replaced,
we might wind up changing the semantics of the insn,
even if reload can make what appear to be valid
assignments later. */
if (regno < FIRST_PSEUDO_REGISTER
&& asm_noperands (PATTERN (use_insn)) >= 0)
continue;
which excludes combinations by dropping log links, rather than during
try_combine. And:
/* If this register is being initialized using itself, and the
register is uninitialized in this basic block, and there are
no LOG_LINKS which set the register, then part of the
register is uninitialized. In that case we can't assume
anything about the number of nonzero bits.
??? We could do better if we checked this in
reg_{nonzero_bits,num_sign_bit_copies}_for_combine. Then we
could avoid making assumptions about the insn which initially
sets the register, while still using the information in other
insns. We would have to be careful to check every insn
involved in the combination. */
if (insn
&& reg_referenced_p (x, PATTERN (insn))
&& !REGNO_REG_SET_P (DF_LR_IN (BLOCK_FOR_INSN (insn)),
REGNO (x)))
{
struct insn_link *link;
FOR_EACH_LOG_LINK (link, insn)
if (dead_or_set_p (link->insn, x))
break;
if (!link)
{
rsp->nonzero_bits = GET_MODE_MASK (mode);
rsp->sign_bit_copies = 1;
return;
}
}
treats the lack of a log link as a possible sign of uninitialised data,
but that would be a missed optimisation rather than a correctness issue.
One question is what the default --param value should be. I went with
Jakub's suggestion of 3000 from:
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=116398#c25
Also, to answer Jakub's question in that comment, I tried bisecting:
int limit = atoi (getenv ("BISECT"));
(so applying the limit for all calls from try_combine) with an
abort in distribute_links if the limit caused a link to be skipped.
The minimum BISECT value that allowed an aarch64-linux-gnu bootstrap
to succeed with --enable-languages=all --enable-checking=yes,rtl,extra
was 142, so much lower than the parameter value. I realised too late
that --enable-checking=release would probably have been a more
interesting test.
The previous patch meant that distribute_links itself is now linear
for a given i2 definition, since each search starts at the previous
last use, rather than at i2 itself. This means that the limit has
to be applied cumulatively across all searches for the same link.
The patch does that by storing a counter in the insn_link structure.
There was a 32-bit hole there on LP64 hosts.
gcc/
PR testsuite/116398
* params.opt (-param=max-combine-search-insns=): New param.
* doc/invoke.texi: Document it.
* combine.cc (insn_link::insn_count): New field.
(alloc_insn_link): Initialize it.
(distribute_links): Add a limit parameter.
(try_combine): Use the new param to limit distribute_links
when only i3 has changed.
|
|
gcc/ChangeLog
* doc/extend.texi (Boolean Type): Further clarify support for
_Bool in C23 and C++.
|
|
gcc/ChangeLog
PR middle-end/78874
* doc/invoke.texi (Warning Options): Fix description of
-Wno-aggressive-loop-optimizations to reflect that this turns
off the warning, and the default is for it to be enabled.
|
|
Per the issue, there were a couple places in the manual where
-Wno-psabi was mentioned, but the option itself was not documented.
gcc/c-family/ChangeLog
PR c/81831
* c.opt (Wpsabi): Remove "Undocumented" modifier and add a
documentation string.
gcc/ChangeLog
PR c/81831
* doc/invoke.texi (Option Summary): Add -Wno-psabi.
(Warning Options): Document -Wpsabi separately from -Wabi.
Note it's enabled by default, not just implied by -Wabi.
Replace the detailed example for a GCC 4.4 change for x86
(which is unlikely to be very interesting nowadays) with
just a list of all targets that presently diagnose these
warnings.
(RS/6000 and PowerPC Options): Add cross-references for
-Wno-psabi.
|
|
gcc/ChangeLog
PR middle-end/112589
* common.opt (-fcf-protection): Add documentation string.
* doc/invoke.texi (Option Summary): Add entry for -fcf-protection
without argument.
(Instrumentation Options): Tidy the -fcf-protection entry and
and add documention for the form without an argument.
|
|
Looking over the recently-committed change to the musttail attribute
documentation, it appears the comment in the last example was a paste-o,
as it does not agree with either what the similar example in the
-Wmaybe-musttail-local-addr documentation says, or the actual behavior
observed when running the code.
In addition, the entire section on musttail was in need of copy-editing
to put it in the present tense, avoid reference to "the user", etc. I've
attempted to clean it up here.
gcc/ChangeLog
* doc/extend.texi (Statement Attributes): Copy-edit the musttail
attribute documentation and correct the comment in the last
example.
|
|
This issue was specifically about a confusing mention of the "second
and third arguments to the memcpy function" when only the second one
is a pointer affected by the attribute, but reading through the entire
discussion I found other things confusing as well; e.g. in some cases
it wasn't clear whether the "arguments" were the arguments to the
attribute or the function, or exactly what a "positional argument"
was. I've tried to rewrite that part to straighten it out, as well as
some light copy-editing throughout.
gcc/ChangeLog
PR c/101440
* doc/extend.texi (Common Function Attributes): Clean up some
confusing language in the description of the "access" attribute.
|