riscv-gnu-toolchain/gcc.git

Age	Commit message	Author	Files	Lines
2025-05-08	diagnostics: convert HTML output test plugin to 'experimental-html' sink	David Malcolm	6	-1006/+128
	[PR116792] In r15-3752-g48261bd26df624 I added a test plugin that overrode the regular output, instead emitting diagnostics in crude HTML form. In r15-4760-g0b73e9382ab51c I added support for multiple kinds of diagnostic output simultaneously, adding -fdiagnostics-add-output=DIAGNOSTICS-OUTPUT-SPEC and -fdiagnostics-set-output=DIAGNOSTICS-OUTPUT-SPEC for adding/changing the kind of diagnostics output, supporting "text" and "sarif" output schemes. This patch promotes the HTML output code from the test plugins so that it is available from "-fdiagnostics-add-output=", using a new "experimental-html" scheme, to allow simultaneous text, sarif and html output, and to make it easier to experiment with. The patch adds Python-based testing of the emitted HTML. The patch does not affect the generated HTML, which is still crude, and not yet ready for end-users. I hope to improve it in followups. gcc/ChangeLog: PR other/116792 * Makefile.in (OBJS-libcommon): Add diagnostic-format-html.o. * diagnostic-format-html.cc: Move here from testsuite/gcc.dg/plugin/diagnostic_plugin_xhtml_format.cc. Simplify includes. Rename "xhtml" to "html" throughout. (write_escaped_text): Drop. (class xhtml_stream_output_format): Drop. (class html_file_output_format): Reimplement using diagnostic_output_file. (diagnostic_output_format_init_xhtml): Drop. (diagnostic_output_format_init_xhtml_stderr): Drop. (diagnostic_output_format_init_xhtml_file): Drop. (diagnostic_output_format_open_html_file): New. (make_html_sink): New. (xhtml_format_selftests): Convert to... (diagnostic_format_html_cc_tests): ...this. (plugin_is_GPL_compatible): Drop. (plugin_init): Drop. * diagnostic-format-html.h: New file. * doc/invoke.texi (-fdiagnostics-add-output=): Add "experimental-html" scheme. * opts-diagnostic.cc: Include "diagnostic-format-html.h". (class html_scheme_handler): New. (output_factory::output_factory): Add html_scheme_handler. (html_scheme_handler::make_sink): New. 
* selftest-run-tests.cc (selftest::run_tests): Call the new selftests. * selftest.h (selftest::diagnostic_format_html_cc_tests): New decl. gcc/testsuite/ChangeLog: PR other/116792 * gcc.dg/plugin/diagnostic_plugin_xhtml_format.cc: Move to gcc/diagnostic-format-html.cc. * gcc.dg/html-output/html-output.exp: New support script. * gcc.dg/html-output/missing-semicolon.c: New test. * gcc.dg/html-output/missing-semicolon.py: New test script. * gcc.dg/plugin/diagnostic-test-xhtml-1.c: Deleted test. * gcc.dg/plugin/plugin.exp (plugin_test_list): Drop moved plugin and its deleted test. * lib/gcc-dg.exp (load_lib): Add load_lib of scanhtml.exp. * lib/htmltest.py: New support script. * lib/scanhtml.exp: New support script, based on scansarif.exp. libatomic/ChangeLog: PR other/116792 * testsuite/lib/libatomic.exp: Add load_lib of scanhtml.exp. libgomp/ChangeLog: PR other/116792 * testsuite/lib/libgomp.exp: Add load_lib of scanhtml.exp. libitm/ChangeLog: PR other/116792 * testsuite/lib/libitm.exp: Add load_lib of scanhtml.exp. libphobos/ChangeLog: PR other/116792 * testsuite/lib/libphobos-dg.exp: Add load_lib of scanhtml.exp. libvtv/ChangeLog: PR other/116792 * testsuite/lib/libvtv-dg.exp: Add load_lib of scanhtml.exp. Signed-off-by: David Malcolm <dmalcolm@redhat.com>
2025-05-08	Fix tree-ssa/pr31261.c testcase after r16-400 [PR120168]	Andrew Pinski	1	-3/+3
	After r16-400-g5e363ffefaceb9, on targets where char is unsigned by default, the tree-ssa/pr31261.c testcase started to fail: FAIL: gcc.dg/tree-ssa/pr31261.c scan-tree-dump-times original "return \\(char\\) -\\(unsigned char\\) c & 31;" 1 This is because the casts are no longer needed, as char and unsigned char have the same signedness there. I was deciding between adding -fsigned-char and changing the testcase to use `signed char` explicitly. I went with an explicit `signed char`, as that is what would normally be used. PR testsuite/120168 gcc/testsuite/ChangeLog: * gcc.dg/tree-ssa/pr31261.c: Use `signed char` instead of plain char. Signed-off-by: Andrew Pinski <quic_apinski@quicinc.com>
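The folded form in the scan pattern computes the same value as a straightforward source expression; only the casts it needs depend on whether plain `char` is signed. A minimal C++ sketch that checks this exhaustively for `signed char`. The source expression `-(c & 63) & 31` and the function names are assumptions chosen to match the dumped `-(unsigned char) c & 31`, not copied from pr31261.c.

```cpp
#include <cassert>
#include <climits>

// Hypothetical source expression (an assumption): negate the low six bits
// of c, then keep the low five bits of the result.
signed char original_form(signed char c) {
  return -(c & 63) & 31;
}

// The shape of the folded expression the tree dump is expected to contain:
// negate after converting to unsigned char, mask, convert back.
signed char folded_form(signed char c) {
  return (signed char) (-(unsigned char) c & 31);
}

// True if both forms agree for every signed char value. They must, because
// the low five bits of a negation depend only on the low five bits of its
// operand, and c, c & 63 and (unsigned char) c all share those bits.
bool forms_agree() {
  for (int v = SCHAR_MIN; v <= SCHAR_MAX; ++v) {
    signed char c = (signed char) v;
    if (original_form(c) != folded_form(c))
      return false;
  }
  return true;
}
```

On a target where plain `char` is unsigned, `(unsigned char) c` on a plain `char` is a no-op, so per the message the casts disappear from the dump; spelling the type `signed char` keeps the expected dump the same on every target.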
2025-05-08	tree-optimization/120043 - bogus conditional store elimination	Richard Biener	1	-0/+10
	The following fixes conditional store elimination to properly check for conditional stores to readonly memory, which we obviously cannot store to unconditionally. The tree_could_trap_p predicate used only considers rvalues, and the chosen approach mimics that of loop store motion. PR tree-optimization/120043 * tree-ssa-phiopt.cc (cond_store_replacement): Check whether the store is to readonly memory. * gcc.dg/torture/pr120043.c: New testcase.
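To illustrate why a non-trapping load does not make an unconditional store safe: in the hypothetical function below (not the pr120043.c testcase), the dominating load shows that `*p` can be read, which is all an rvalue-only trap check establishes. Rewriting the guarded store into the unconditional `*p = cond ? v : old;` would then write to read-only memory whenever `p` points at something like `ro_table`.

```cpp
#include <cassert>

// A read-only object; compilers typically place it in a read-only section.
static const int ro_table[2] = {1, 2};

// Hypothetical example. The load of *p runs unconditionally; the store only
// when cond is nonzero. Conditional store elimination would like to replace
// the guarded store with the unconditional "*p = cond ? v : old;", which is
// only valid if *p is known to be writable.
int maybe_store(int *p, int cond, int v) {
  int old = *p;   // reading read-only memory does not fault
  if (cond)
    *p = v;       // must stay conditional when p may point to read-only data
  return old;
}
```

Calling `maybe_store (const_cast<int *> (&ro_table[0]), 0, 5)` is well defined because nothing is written; the miscompilation described above would turn it into a store to read-only memory.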
2025-05-08	phiopt: Use rewrite_to_defined_overflow in move_stmt [PR116938]	Andrew Pinski	2	-5/+6
	As mentioned previously, the rewrite in move_stmt should be using gimple_needing_rewrite_undefined/rewrite_to_defined_unconditional instead of just rewriting the VCE. This moves move_stmt over to those APIs. A few testcases needed to be updated due to the ABS_EXPR rewrite that happens. Bootstrapped and tested on x86_64-linux-gnu. PR tree-optimization/116938 gcc/ChangeLog: * tree-ssa-phiopt.cc (move_stmt): Use rewrite_to_defined_overflow instead of manually doing the rewrite of the VCE. gcc/testsuite/ChangeLog: * gcc.dg/tree-ssa/phi-opt-40.c: Update to expect ABSU_EXPR. * gcc.dg/tree-ssa/phi-opt-41.c: Likewise. Signed-off-by: Andrew Pinski <quic_apinski@quicinc.com>
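The updated testcases expect ABSU_EXPR, an absolute value with an unsigned result. A source-level sketch of why that form is safe to execute unconditionally: `abs (INT_MIN)` overflows, while negating in unsigned arithmetic is defined for every input. The function name `absu` is hypothetical.

```cpp
#include <cassert>
#include <climits>

// Absolute value with an unsigned result, in the spirit of ABSU_EXPR.
// Negation happens in unsigned arithmetic, which wraps modulo 2^N, so there
// is no undefined behaviour even for INT_MIN.
unsigned absu(int x) {
  unsigned ux = (unsigned) x;
  return x < 0 ? 0u - ux : ux;
}
```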
2025-05-08	Rewrite VCEs of integral types [PR116939]	Andrew Pinski	1	-0/+51
	Like the patch to phiopt (r15-4033-g1f619fe25925a5f7), this adds rewriting of VCE (VIEW_CONVERT_EXPR) to gimple_with_undefined_signed_overflow/rewrite_to_defined_overflow. When a VCE of a bool is moved from being conditional to unconditional, it needs to be rewritten to use a normal cast instead of a VCE. pr120122-1.c is an example of where LIM needs this rewriting. The precision of the outer type needs to be less than that of the inner one. This also renames gimple_with_undefined_signed_overflow to gimple_needing_rewrite_undefined and rewrite_to_defined_overflow to rewrite_to_defined_unconditional, as they will be doing more than just handling signed overflow. Changes since v1: * v2: rename the functions. * v3: Add check for precision to be smaller. Bootstrapped and tested on x86_64-linux-gnu. PR tree-optimization/120122 PR tree-optimization/116939 gcc/ChangeLog: * gimple-fold.h (gimple_with_undefined_signed_overflow): Rename to ... (gimple_needing_rewrite_undefined): This. (rewrite_to_defined_overflow): Rename to ... (rewrite_to_defined_unconditional): This. * gimple-fold.cc (gimple_with_undefined_signed_overflow): Rename to ... (gimple_needing_rewrite_undefined): This. Return true for VCE with integral types of smaller precision. (rewrite_to_defined_overflow): Rename to ... (rewrite_to_defined_unconditional): This. Handle VCE rewriting to a cast. * tree-if-conv.cc: s/gimple_with_undefined_signed_overflow/gimple_needing_rewrite_undefined/ s/rewrite_to_defined_overflow/rewrite_to_defined_unconditional. * tree-scalar-evolution.cc: Likewise. * tree-ssa-ifcombine.cc: Likewise. * tree-ssa-loop-im.cc: Likewise. * tree-ssa-loop-split.cc: Likewise. * tree-ssa-reassoc.cc: Likewise. gcc/testsuite/ChangeLog: * gcc.dg/torture/pr120122-1.c: New test. Signed-off-by: Andrew Pinski <quic_apinski@quicinc.com>
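The underlying idea, sketched at the source level with hypothetical function names: an operation whose undefined cases were excluded only by a guarding condition must be rewritten into a form with defined behaviour before it can run unconditionally. For signed addition that means doing the arithmetic in the corresponding unsigned type; this patch extends the same reasoning to narrowing VCEs, which become ordinary casts. The conversion back to `int` relies on modulo semantics, which GCC documents and C++20 guarantees.

```cpp
#include <cassert>
#include <climits>

// Original: a + b is only evaluated when cond is true, where the caller
// guarantees it does not overflow.
int guarded_sum(bool cond, int a, int b) {
  return cond ? a + b : 0;
}

// After hoisting, the addition runs even when cond is false, so it must not
// have undefined behaviour for those inputs: add in unsigned arithmetic
// (which wraps) and convert back to int.
int hoisted_sum(bool cond, int a, int b) {
  int sum = (int) ((unsigned) a + (unsigned) b);
  return cond ? sum : 0;
}
```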
2025-05-08	tree-optimization/120143 - ICE with failed early break store move	Richard Biener	1	-0/+18
	The store moving done for early-break vectorization was incorrectly trying to move the pattern stmt instead of the original one, which failed to register and then confused the virtual SSA form due to the update triggered by a degenerate virtual PHI. PR tree-optimization/120143 * tree-vect-data-refs.cc (vect_analyze_early_break_dependences): Move/update the original stmts, not the pattern stmts which lack virtual operands and are not in the IL. * gcc.dg/vect/vect-early-break_135-pr120143.c: New testcase.
2025-05-08	tree-optimization/120089 - force all PHIs live for early-break vect	Richard Biener	1	-0/+66
	The following makes sure to mark even unsupported PHIs live when doing early-break vectorization, since otherwise we fail to validate that we can vectorize them and generate wrong code based on the scalar PHIs, which would only work with a vectorization factor of one. PR tree-optimization/120089 * tree-vect-stmts.cc (vect_stmt_relevant_p): Mark all PHIs live when not already so and doing early-break vectorization. (vect_mark_stmts_to_be_vectorized): Skip virtual PHIs. * tree-vect-slp.cc (vect_analyze_slp): Robustify handling of early-break forced IVs. * gcc.dg/vect/vect-early-break_134-pr120089.c: New testcase.
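An illustrative early-break loop of the kind involved (hypothetical, not the vect-early-break_134 testcase): `prev` is carried around the loop by a PHI and used after the loop exits early. If such a PHI is not validated as live, the vectorized code could end up using the scalar PHI's value, which, as the message notes, is only correct with a vectorization factor of one.

```cpp
#include <cassert>

// Returns the element seen just before the first occurrence of key (the last
// element if key is absent), or -1 if there is no such element.
int last_before_key(const int *a, int n, int key) {
  int prev = -1;               // loop-carried value: a PHI in the loop header
  for (int i = 0; i < n; i++) {
    if (a[i] == key)
      break;                   // early exit: prev must be exact here
    prev = a[i];
  }
  return prev;                 // use of the PHI after the loop
}
```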
2025-05-07	libcpp: Further fixes for incorrect line numbers in large files [PR120061]	Jakub Jelinek	8	-4/+40
	The backport of the PR108900 fix to the 14 branch broke building chromium because static_assert (__LINE__ == expected_line_number, ""); now triggers, as the __LINE__ values are off by one. This isn't the case on the trunk and 15 branch because we've switched to 64-bit location_t, and so one actually needs far longer header files to trigger it. https://gcc.gnu.org/bugzilla/show_bug.cgi?id=120061#c11 and https://gcc.gnu.org/bugzilla/show_bug.cgi?id=120061#c12 contain (large) testcases in patch form; on the 14 branch, the first one used to fail before the PR108900 backport and now works correctly, while the second one attempts to match the chromium behavior: it used to pass before the PR108900 backport and now FAILs. The two testcases show rare problematic cases, because do_include_common -> parse_include -> check_eol -> check_eol_1 -> cpp_get_token_1 -> _cpp_lex_token -> _cpp_lex_direct -> linemap_line_start triggers there /* Allocate the new line_map. However, if the current map only has a single line we can sometimes just increase its column_bits instead. */ if (line_delta < 0 || last_line != ORDINARY_MAP_STARTING_LINE_NUMBER (map) || SOURCE_COLUMN (map, highest) >= (1U << (column_bits - range_bits)) || ( /* We can't reuse the map if the line offset is sufficiently large to cause overflow when computing location_t values. */ (to_line - ORDINARY_MAP_STARTING_LINE_NUMBER (map)) >= (((uint64_t) 1) << (CHAR_BIT * sizeof (linenum_type) - column_bits))) || range_bits < map->m_range_bits) map = linemap_check_ordinary (const_cast <line_map *> (linemap_add (set, LC_RENAME, ORDINARY_MAP_IN_SYSTEM_HEADER_P (map), ORDINARY_MAP_FILE_NAME (map), to_line))); and so creates a new ordinary map on the line right after the (problematic) #include line.
Now, in the spot that r14-11679-g8a884140c2bcb7 patched, pfile->line_table->highest_location in all 3 tests (also https://gcc.gnu.org/bugzilla/show_bug.cgi?id=120061#c13 ) is, before the decrement, the start of the line after the #include line, and so the decrement is really desirable in that case to put highest_location somewhere on the line where the #include actually is. But at the same time it is also undesirable, because if we do decrement it, then linemap_add LC_ENTER called from _cpp_do_file_change will then do /* Generate a start_location above the current highest_location. If possible, make the low range bits be zero. */ location_t start_location = set->highest_location + 1; unsigned range_bits = 0; if (start_location < LINE_MAP_MAX_LOCATION_WITH_COLS) range_bits = set->default_range_bits; start_location += (1 << range_bits) - 1; start_location &= ~((1 << range_bits) - 1); linemap_assert (!LINEMAPS_ORDINARY_USED (set) || (start_location >= MAP_START_LOCATION (LINEMAPS_LAST_ORDINARY_MAP (set)))); and we can end up with the new LC_ENTER ordinary map having the same start_location as the preceding LC_RENAME one. The next thing that happens is the computation of included_from: if (reason == LC_ENTER) { if (set->depth == 0) map->included_from = 0; else /* The location of the end of the just-closed map. */ map->included_from = (((map[0].start_location - 1 - map[-1].start_location) & ~((1 << map[-1].m_column_and_range_bits) - 1)) + map[-1].start_location); The normal case (e.g. with the testcase included at the start of this comment) is that map[-1] starts somewhere earlier, and so the map->included_from computation above nicely computes a location_t which expands to the start of the #include line. With r14-11679 reverted, for #c11 as well as #c12, map[0].start_location == map[-1].start_location above, and so it is ((location_t) -1 & ~((1 << map[-1].m_column_and_range_bits) - 1)) + map[-1].start_location, which happens to be the start of the #include line.
For #c11, map[0].start_location is 0x500003a0, map[-1] has m_column_and_range_bits 7, map[-2] has m_column_and_range_bits 12, and map[0].included_from is set to 0x50000320. For #c12, map[0].start_location is 0x606c0402, map[-2].start_location is 0x606c0400, and m_column_and_range_bits is 0 for all 3 maps; map[0].included_from is set to 0x606c0401. The last important part is again in linemap_add when doing LC_LEAVE: /* (MAP - 1) points to the map we are leaving. The map from which (MAP - 1) got included should be the map that comes right before MAP in the same file. */ from = linemap_included_from_linemap (set, map - 1); /* A TO_FILE of NULL is special - we use the natural values. */ if (to_file == NULL) { to_file = ORDINARY_MAP_FILE_NAME (from); to_line = SOURCE_LINE (from, from[1].start_location); sysp = ORDINARY_MAP_IN_SYSTEM_HEADER_P (from); } Here it wants to compute the right to_line, which ought to be the line after the #include directive. On the #c11 testcase that doesn't work correctly though, because map[-1].included_from is 0x50000320, and from[0] for that is LC_ENTER with start_location 0x4080 and m_column_and_range_bits 12; but note that we've earlier computed map[-1].start_location + (-1 & 0xffffff80) and so only decreased by 7 bits, so to_line is still on the line with the #include and not after it. In the #c12 case that doesn't happen: all the ordinary maps involved there had 0 m_column_and_range_bits, and so this computes the correct line. Below is a fix for the trunk, including testcases using the location_overflow_plugin hack to simulate the bugs without needing huge files (in the 14 case they are just 330KB and almost 10MB, but in the 15 case they would need to be far bigger).
The pre-r15-9018 trunk has FAIL: gcc.dg/plugin/location-overflow-test-pr116047.c -fplugin=./location_overflow_plugin.so scan-file static_assert[^\n\r]*6[^\n\r]*== 6 and current trunk has FAIL: gcc.dg/plugin/location-overflow-test-pr116047.c -fplugin=./location_overflow_plugin.so scan-file static_assert[^\n\r]*6[^\n\r]*== 6 FAIL: gcc.dg/plugin/location-overflow-test-pr120061.c -fplugin=./location_overflow_plugin.so scan-file static_assert[^\n\r]*5[^\n\r]*== 5 and with the patch everything PASSes. I'll post a 14 version of the patch afterwards. The patch reverts the r15-9018 change because it is incorrect: we really need to decrement it even when crossing ordinary map boundaries, so that the location is not on the line after the #include line but somewhere on the #include line. It also patches two spots in linemap_add mentioned above to make sure we get correct locations both in the included_from location_t when doing LC_ENTER (second line-map.cc hunk) and when doing LC_LEAVE to compute the right to_line (first line-map.cc hunk), both in the presence of an added LC_RENAME with the same start_location as the following LC_ENTER (i.e. the problematic cases). The LC_ENTER hunk is mostly to ensure the included_from location_t is at the start of the #include line (column 0); without it we may not decrease included_from enough and end up at some random column in the middle of the line, because it masks away map[-1].m_column_and_range_bits bits even when the resulting included_from location_t ends up in the map[-2] map, with perhaps different m_column_and_range_bits. That alone doesn't fix the bug though. The more important one is the LC_LEAVE hunk, and the problem there is caused by linemap_line_start not actually doing r = set->highest_line + (line_delta << map->m_column_and_range_bits); when adding a new map (the LC_RENAME one, because we need to switch to a different number of directly encoded ranges, or columns, etc.).
So, in the original PR108900 case that to_line = SOURCE_LINE (from, from[1].start_location); doesn't do the right thing, from there is the last < 0x50000000 map with m_column_and_range_bits 12, from[1] is the first one above it and map[-1].included_from is the correct location of column 0 on the #include line, but as the new LC_RENAME map has been created without actually increasing highest_location to be on the new line (we've just set to_line of the new LC_RENAME map to the correct line), to_line = SOURCE_LINE (from, from[1].start_location); stays on the same source line. I've tried to just replace that with to_line = SOURCE_LINE (from, linemap_included_from (map - 1)) + 1; i.e. just find out the #include line from map[-1].included_from and add 1 to it, unfortunately that breaks the c-c++-common/cpp/line-4.c test where we expect to stay on the same 0 line for LC_LEAVE from <command line> and gcc.dg/cpp/trad/Wunused.c, gcc.dg/cpp/trad/builtins.c and c-c++-common/analyzer/named-constants-via-macros-traditional.c tests all with -traditional-cpp preprocessing where to_line is also off-by-one from the expected one. So, this patch instead conditionalizes it, uses the to_line = SOURCE_LINE (from, linemap_included_from (map - 1)) + 1; way only if from[1] is a LC_RENAME map (rather than the usual LC_ENTER one), that should limit it to the problematic cases of when parse_include peeked after EOL and had to create LC_RENAME map with the same start_location as the LC_ENTER after it. 
Some further justification for the LC_ENTER hunk: using the https://gcc.gnu.org/pipermail/gcc-patches/2025-May/682774.html testcase (old is 14 before r14-11679, vanilla is current 14, and new is with the 14 patch), I get $ /usr/src/gcc-14/obj/gcc/cc1.old -quiet -std=c23 pr116047.c -nostdinc In file included from pr116047-1.h:327677:21, from pr116047.c:4: pr116047-2.h:1:1: error: unknown type name ‘a’ 1 | a b c; | ^ pr116047-2.h:1:5: error: expected ‘=’, ‘,’, ‘;’, ‘asm’ or ‘__attribute__’ before ‘c’ 1 | a b c; | ^ pr116047-1.h:327677:1: error: static assertion failed: "" 327677 | #include "pr116047-2.h" | ^~~~~~~~~~~~~ $ /usr/src/gcc-14/obj/gcc/cc1.vanilla -quiet -std=c23 pr116047.c -nostdinc In file included from pr116047-1.h:327678, from pr116047.c:4: pr116047-2.h:1:1: error: unknown type name ‘a’ 1 | a b c; | ^ pr116047-2.h:1:5: error: expected ‘=’, ‘,’, ‘;’, ‘asm’ or ‘__attribute__’ before ‘c’ 1 | a b c; | ^ $ /usr/src/gcc-14/obj/gcc/cc1.new -quiet -std=c23 pr116047.c -nostdinc In file included from pr116047-1.h:327677, from pr116047.c:4: pr116047-2.h:1:1: error: unknown type name ‘a’ 1 | a b c; | ^ pr116047-2.h:1:5: error: expected ‘=’, ‘,’, ‘;’, ‘asm’ or ‘__attribute__’ before ‘c’ 1 | a b c; | ^ pr116047-1.h has on lines 327677+327678: #include "pr116047-2.h" static_assert (__LINE__ == 327678, ""); so the static_assert failure is something that was dealt with mainly in the LC_LEAVE hunk and the files.cc reversion, but please have a look at the "In file included from" lines. 14.2 emits the correct line (#include "pr116047-2.h" is indeed on line 327677) but some random column in there (which is not normally printed for smaller headers; 21 is the . before the extension in the filename). Current trunk emits an incorrect line (327678 instead of 327677; clearly it didn't decrement). And the patched compiler emits the right line with no column, as would be printed if I removed e.g. 300000 newlines from the file.
2025-05-07 Jakub Jelinek <jakub@redhat.com> PR preprocessor/108900 PR preprocessor/116047 PR preprocessor/120061 * files.cc (_cpp_stack_file): Revert 2025-03-28 change. * line-map.cc (linemap_add): Use SOURCE_LINE (from, linemap_included_from (map - 1)) + 1; instead of SOURCE_LINE (from, from[1].start_location); to compute to_line for LC_LEAVE. For LC_ENTER included_from computation, look at map[-2] or even lower if map[-1] has the same start_location as map[0]. * gcc.dg/plugin/plugin.exp: Add location-overflow-test-pr116047.c and location-overflow-test-pr120061.c. * gcc.dg/plugin/location_overflow_plugin.cc (plugin_init): Don't error on unknown values, instead just break. Handle 0x4fHHHHHH arguments differently. * gcc.dg/plugin/location-overflow-test-pr116047.c: New test. * gcc.dg/plugin/location-overflow-test-pr116047-1.h: New test. * gcc.dg/plugin/location-overflow-test-pr116047-2.h: New test. * gcc.dg/plugin/location-overflow-test-pr120061.c: New test. * gcc.dg/plugin/location-overflow-test-pr120061-1.h: New test. * gcc.dg/plugin/location-overflow-test-pr120061-2.h: New test.
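The two `included_from` values quoted above for #c11 and #c12 follow directly from the LC_ENTER formula. When map[0] and map[-1] share a start_location, `map[0].start_location - 1 - map[-1].start_location` wraps to all ones, so the result is map[-1]'s start minus one column-and-range block: 0x500003a0 - 0x80 = 0x50000320 with 7 bits, and 0x606c0402 - 1 = 0x606c0401 with 0 bits. The standalone model below reproduces only that arithmetic; it is not libcpp code, and the normal-case numbers in the last check are made up.

```cpp
#include <cassert>
#include <cstdint>

// Stand-in for libcpp's location_t (assumed 64-bit, as on trunk). The
// arithmetic below gives the same results with a 32-bit type, because both
// wrap modulo a power of two.
typedef std::uint64_t location_t;

// Model of the LC_ENTER included_from formula quoted above:
//   ((map[0].start - 1 - map[-1].start) & ~((1 << bits) - 1)) + map[-1].start
// where bits is map[-1].m_column_and_range_bits.
location_t included_from(location_t cur_start, location_t prev_start,
                         unsigned prev_bits) {
  location_t mask = ~((location_t(1) << prev_bits) - 1);
  return ((cur_start - 1 - prev_start) & mask) + prev_start;
}
```

In the normal case, for example map[-1] starting at 0x1000 with 5 bits and map[0] at 0x1085, the formula rounds down to 0x1080, the start of a line block within map[-1].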
2025-05-06	ipa: Do not emit info about temporary clones to ipa-clones dump (PR119852)	Martin Jambor	1	-0/+50
	As described in PR 119852, the output of -fdump-ipa-clones can contain "(null)" as the suffix/reason for cloning when we need to create a clone to hold the original function during recursive inlining. Such a clone is never output and so should not be part of the dump output either. gcc/ChangeLog: 2025-04-23 Martin Jambor <mjambor@suse.cz> PR ipa/119852 * cgraphclones.cc (dump_callgraph_transformation): Document the function. Do not dump if suffix is NULL. gcc/testsuite/ChangeLog: 2025-04-23 Martin Jambor <mjambor@suse.cz> PR ipa/119852 * gcc.dg/ipa/pr119852.c: New test.
2025-05-06	diagnostics: add logical_location_manager; reimplement logical_location	David Malcolm	1	-2/+8
	Previously we used an abstract base class logical_location with concrete subclasses to separate the diagnostics subsystem from implementation details of "tree" and of libgdiagnostics. This approach required allocating implementation objects on the heap whenever working with logical locations, and made comparing logical locations awkward. This patch reworks things so that the type "logical_location" becomes a boxed pointer (const void *), and client code provides a single object implementing a new logical_location_manager abstract base class. The manager class has responsibility for providing meaning to the boxed pointers. Within the compiler we use a manager in which they are "tree" pointers, whereas within libgdiagnostics we use a manager in which they are pointers to instances of libgdiagnostics' "struct diagnostic_logical_location". Other kinds of manager could be implemented. gcc/analyzer/ChangeLog: * checker-event.cc (checker_event::checker_event): Update initialization of m_logical_loc. (checker_event::maybe_add_sarif_properties): Add "builder" param. Replace call to make_sarif_logical_location_object with call to sarif_property_bag::set_logical_location. (superedge_event::maybe_add_sarif_properties): Add "builder" param. * checker-event.h (checker_event::get_logical_location): Reimplement. (checker_event::maybe_add_sarif_properties): Add "builder" param. (checker_event::maybe_add_sarif_properties): Add "builder" param. (checker_event::m_logical_loc): Convert from tree_logical_location to logical_location. (superedge_event::maybe_add_sarif_properties): Add sarif_builder param. * checker-path.h (checker_path::checker_path): Add logical_loc_mgr param. * diagnostic-manager.cc (diagnostic_manager::emit_saved_diagnostic): Pass logical location manager to emission_path ctor. (diagnostic_manager::get_logical_location_manager): New. * diagnostic-manager.h (diagnostic_manager::get_logical_location_manager): New decl. 
gcc/ChangeLog: * diagnostic-client-data-hooks.h: Include "logical-location.h". (diagnostic_client_data_hooks::get_logical_location_manager): New. (diagnostic_client_data_hooks::get_current_logical_location): Convert return type from const logical_location * to logical_location. * diagnostic-format-json.cc: Include "diagnostic-client-data-hooks.h". (make_json_for_path): Update to use logical_location_manager from the context. * diagnostic-format-sarif.cc (sarif_builder::get_logical_location_manager): New. (sarif_builder::make_location_object): Update type of logical_loc from "const logical_location *" to "logical_location". (sarif_builder::set_any_logical_locs_arr): Likewise. (sarif_builder::m_logical_loc_mgr): New field. (sarif_result::on_nested_diagnostic): Use logical_location default ctor rather than nullptr. (sarif_builder::sarif_builder): Initialize m_logical_loc_mgr from context's client data hooks. (sarif_builder::make_locations_arr): Convert type of logical_loc from "const logical_location *" to "logical_location". (sarif_builder::set_any_logical_locs_arr): Likewise. Pass manager to make_sarif_logical_location_object. (sarif_builder::make_location_object): Likewise. (sarif_property_bag::set_logical_location): New. (make_sarif_logical_location_object): Update for introduction of logical_location_manager. (populate_thread_flow_location_object): Pass builder to ev.maybe_add_sarif_properties. (selftest::test_make_location_object): Use logical_location default ctor rather than nullptr. * diagnostic-format-sarif.h (class logical_location): Replace forward decl with include of "logical-location.h". (class sarif_builder): New forward decl. (sarif_property_bag::set_logical_location): New. (make_sarif_logical_location_object): Add "mgr" param. * diagnostic-path.cc (diagnostic_path::get_first_event_in_a_function): Update for change of logical_location type. (per_thread_summary::per_thread_summary): Pass in "logical_loc_mgr". 
(per_thread_summary::m_logical_loc_mgr): New field. (event_range::m_logical_loc): Update for change of logical_location type. (path_summary::get_logical_location_manager): New accessor. (path_summary::m_logical_loc_mgr): New field. (path_summary::get_or_create_events_for_thread_id): Pass m_logical_loc_mgr to per_thread_summary ctor. (path_summary::path_summary): Initialize m_logical_loc_mgr. (thread_event_printer::print_swimlane_for_event_range): Add param "logical_loc_mgr". Update for change in logical_loc type. (print_path_summary_as_text): Pass manager to thread_event_printer::print_swimlane_for_event_range. (diagnostic_text_output_format::print_path): Update for introduction of logical_location_manager. * diagnostic-path.h: Include "logical-location.h". (class sarif_builder): New forward decl. (diagnostic_event::get_logical_location): Convert return type from "const logical_location *" to "logical_location". (diagnostic_event::maybe_add_sarif_properties): Add sarif_builder param. (diagnostic_path::get_logical_location_manager): New accessor. (diagnostic_path::diagnostic_path): New ctor, taking manager. (diagnostic_path::m_logical_loc_mgr): New field. * diagnostic.cc (diagnostic_context::get_logical_location_manager): New. (logical_location::function_p): Convert to... (logical_location_manager::function_p): ...this. * diagnostic.h (class logical_location): Replace forward decl with... (class logical_location_manager): ...this. (diagnostic_context::get_logical_location_manager): New decl. * lazy-diagnostic-path.cc (selftest::test_lazy_path::test_lazy_path): Pass m_logical_loc_mgr to path ctor. (selftest::test_lazy_path::make_inner_path): Likewise. (selftest::test_lazy_path::m_logical_loc_mgr): New field. * lazy-diagnostic-path.h (lazy_diagnostic_path::lazy_diagnostic_path): New ctor. * libgdiagnostics.cc (struct diagnostic_logical_location): Convert from subclass of logical_location to a plain struct, dropping accessors. (class impl_logical_location_manager): New. 
(impl_diagnostic_client_data_hooks::get_logical_location_manager): New. (impl_diagnostic_client_data_hooks::m_logical_location_manager): New field. (diagnostic_manager::get_logical_location_manager): New. (libgdiagnostics_path_event::get_logical_location): Reimplement. (diagnostic_execution_path::diagnostic_execution_path): Add logical_loc_mgr and pass to base class. (diagnostic_execution_path::same_function_p): Update for change to logical_location type. (diagnostic::add_execution_path): Pass logical_loc_mgr to path ctor. (impl_diagnostic_client_data_hooks::get_current_logical_location): Reimplement. (diagnostic_text_sink::text_starter): Reimplement printing of logical location. (diagnostic_manager::new_execution_path): Pass mgr to path ctor. (diagnostic_manager_debug_dump_logical_location): Update for changes to diagnostic_logical_location. (diagnostic_logical_location_get_kind): Likewise. (diagnostic_logical_location_get_parent): Likewise. (diagnostic_logical_location_get_short_name): Likewise. (diagnostic_logical_location_get_fully_qualified_name): Likewise. (diagnostic_logical_location_get_decorated_name): Likewise. * logical-location.h (class logical_location_manager): New. (class logical_location): Convert to typedef of logical_location_manager::key. * selftest-diagnostic-path.cc (selftest::test_diagnostic_path::test_diagnostic_path): Pass m_test_logical_loc_mgr to base ctor. (selftest::test_diagnostic_path::same_function_p): Use pointer comparison. (selftest::test_diagnostic_path::add_event): Use logical_location_from_funcname. (selftest::test_diagnostic_path::add_thread_event): Likewise. (selftest::test_diagnostic_path::logical_location_from_funcname): New. (selftest::test_diagnostic_event::test_diagnostic_event): Fix indentation. Pass logical_location rather than const char *. * selftest-diagnostic-path.h (selftest::test_diagnostic_event::test_diagnostic_event): Likewise. 
(selftest::test_diagnostic_event::get_logical_location): Update for change to logical_location type. (selftest::test_diagnostic_event::get_function_name): Drop. (selftest::test_diagnostic_event::m_logical_loc): Convert from test_logical_location to logical_location. (selftest::test_diagnostic_path::logical_location_from_funcname): New. (selftest::test_diagnostic_path::m_test_logical_loc_mgr): New field. * selftest-logical-location.cc: Include "selftest.h". (selftest::test_logical_location::test_logical_location): Drop. (selftest::test_logical_location_manager::~test_logical_location_manager): New. (selftest::test_logical_location::get_short_name): Replace with... (selftest::test_logical_location_manager::get_short_name): ...this. (selftest::test_logical_location::get_name_with_scope): Replace with... (selftest::test_logical_location_manager::get_name_with_scope): ...this. (selftest::test_logical_location::get_internal_name): Replace with... (selftest::test_logical_location_manager::get_internal_name): ...this. (selftest::test_logical_location::get_kind): Replace with... (selftest::test_logical_location_manager::get_kind): ...this. (selftest::test_logical_location::get_name_for_path_output): Replace with... (selftest::test_logical_location_manager::get_name_for_path_output): ...this. (selftest::test_logical_location_manager::logical_location_from_funcname): New. (selftest::test_logical_location_manager::item_from_funcname): New. (selftest::selftest_logical_location_cc_tests): New. * selftest-logical-location.h (class test_logical_location): Replace with... (class test_logical_location_manager): ...this. * selftest-run-tests.cc (selftest::run_tests): Call selftest_logical_location_cc_tests. * selftest.h (selftest::selftest_logical_location_cc_tests): New decl. * simple-diagnostic-path.cc (simple_diagnostic_path::simple_diagnostic_path): Add "logical_loc_mgr" param and pass it to base ctor. 
(simple_diagnostic_event::simple_diagnostic_event): Update init of m_logical_loc. (selftest::test_intraprocedural_path): Update for changes to logical locations. * simple-diagnostic-path.h: Likewise. * tree-diagnostic-client-data-hooks.cc (compiler_data_hooks::get_logical_location_manager): New. (compiler_data_hooks::get_current_logical_location): Update. (compiler_data_hooks::m_current_fndecl_logical_loc): Replace with... (compiler_data_hooks::m_logical_location_manager): ...this. * tree-logical-location.cc (compiler_logical_location::get_short_name_for_tree): Replace with... (tree_logical_location_manager::get_short_name): ...this. (compiler_logical_location::get_name_with_scope_for_tree): Replace with... (tree_logical_location_manager::get_name_with_scope): ...this. (compiler_logical_location::get_internal_name_for_tree): Replace with... (tree_logical_location_manager::get_internal_name): ...this. (compiler_logical_location::get_kind_for_tree): Replace with... (tree_logical_location_manager::get_kind): ...this. (compiler_logical_location::get_name_for_tree_for_path_output): Replace with... (tree_logical_location_manager::get_name_for_path_output): ...this. (tree_logical_location::get_short_name): Drop. (tree_logical_location::get_name_with_scope): Drop. (tree_logical_location::get_internal_name): Drop. (tree_logical_location::get_kind): Drop. (tree_logical_location::get_name_for_path_output): Drop. (current_fndecl_logical_location::get_short_name): Drop. (current_fndecl_logical_location::get_name_with_scope): Drop. (current_fndecl_logical_location::get_internal_name): Drop. (current_fndecl_logical_location::get_kind): Drop. (current_fndecl_logical_location::get_name_for_path_output): Drop. * tree-logical-location.h (class compiler_logical_location): Drop. (class tree_logical_location): Drop. (class current_fndecl_logical_location): Drop. (class tree_logical_location_manager): New. 
gcc/testsuite/ChangeLog: * gcc.dg/plugin/diagnostic_plugin_test_paths.cc: Update for changes to simple_diagnostic_path. Signed-off-by: David Malcolm <dmalcolm@redhat.com>
2025-05-06	tree-optimization/115777 - STLF fails with BB vectorization of loop	Richard Biener	1	-0/+15
	The following tries to address us BB vectorizing a loop body that swaps consecutive elements of an array like for bubble-sort. This causes the vector store in the previous iteration to fail to forward to the vector load in the current iteration since there's a partial overlap. We try to detect this situation by looking for a load to store data dependence and analyze this with respect to the containing loop for a proven problematic access. Currently the search for a problematic pair is limited to loads and stores in the same SLP instance which means the problematic load happens in the next loop iteration and larger dependence distances are not considered. On x86 with generic costing this avoids vectorizing the loop body, but once you do core-specific tuning the saved cost for the vector store vs. the scalar stores makes vectorization still profitable, but at least the STLF issue is avoided. For example on my Zen4 machine with -O2 -march=znver4 the testcase in the PR is improving from insertion_sort => 2327 to insertion_sort => 997 but plain -O2 (or -fno-tree-slp-vectorize) gives insertion_sort => 183. In the end a better target-side cost model for small vector vectorization is needed to reject this vectorization from this side. I'll note this is a machine independent heuristic (similar to the avoid-store-forwarding RTL optimization pass), I expect that uarchs implementing vectors will suffer from this kind of issue. I know some aarch64 uarchs can forward from upper/lower part stores, this isn't considered at the moment. The actual vector size/overlap distance check could be moved to a target hook if it turns out necessary. There might be the chance to use a smaller vector size for the loads avoiding the penalty rather than falling back to elementwise accesses, that's not implemented either. PR tree-optimization/115777 * tree-vectorizer.h (_slp_tree::avoid_stlf_fail): New member. * tree-vect-slp.cc (_slp_tree::_slp_tree): Initialize it. 
(vect_print_slp_tree): Dump it. * tree-vect-data-refs.cc (vect_slp_analyze_instance_dependence): For dataflow dependent loads of a store check whether there's a cross-iteration data dependence that for sure prohibits store-to-load forwarding and mark involved loads. * tree-vect-stmts.cc (get_group_load_store_type): For avoid_stlf_fail marked loads use VMAT_ELEMENTWISE. * gcc.dg/vect/bb-slp-pr115777.c: New testcase.
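The problematic access pattern can be illustrated with a bubble-sort pass that swaps adjacent elements. This is a minimal sketch, not the PR's testcase, and the function name `bubble_sort` is hypothetical. If BB SLP turns the two loads and two stores of `a[j]`, `a[j + 1]` into 2-element vector accesses, the vector store in iteration `j` only half-overlaps the vector load in iteration `j + 1`, which defeats store-to-load forwarding.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical example: the inner body loads a[j] and a[j + 1] and
   unconditionally stores them back in sorted order.  A 2-element
   vector store of {a[j], a[j+1]} followed, in the next iteration, by a
   vector load of {a[j+1], a[j+2]} overlaps only partially, so the load
   cannot be forwarded from the store.  */
void
bubble_sort (int *a, size_t n)
{
  assert (a != NULL || n == 0);
  for (size_t i = 0; i + 1 < n; i++)
    for (size_t j = 0; j + 1 < n - i; j++)
      {
        int x = a[j];
        int y = a[j + 1];
        a[j] = x < y ? x : y;       /* smaller value */
        a[j + 1] = x < y ? y : x;   /* larger value */
      }
}
```

With the patch, such loads are marked `avoid_stlf_fail` and use elementwise access (`VMAT_ELEMENTWISE`) instead of a full vector load.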
2025-05-06	gimple-fold: Fix fold_truth_andor_for_ifcombine [PR120074]	Jakub Jelinek	1	-0/+20
	The following testcase ICEs because of a mismatch between wide_int precision, in particular lr_and_mask has 32-bit precision while sign has 16-bit. decode_field_reference ensures that {ll,lr,rl,rr}_and_mask has {ll,lr,rl,rr}_bitsize precision, so the ll_and_mask |= sign; and rl_and_mask |= sign; and ll_and_mask &= sign; and rl_and_mask &= sign; cases should work right, sign has in those cases {ll,rl}_bitsize precision. The problem is that nothing until much later guarantees that ll_bitsize == lr_bitsize or rl_bitsize == rr_bitsize. In the testcase there is ((b ^ a) & 3) < 0 where a is 16-bit and b is 32-bit, so it is the lsignbit handling, and because of the xor the xor operand is moved to the r_and_mask, so with ll_and_mask being 16-bit 3 and lr_and_mask being 32-bit 3. Now, either b in the above case would be INTEGER_CST, in that case if rr_arg was also INTEGER_CST we'd use the l_const && r_const case and try to handle it, or we'd run into (though much later)
      if (ll_bitsize != lr_bitsize || rl_bitsize != rr_bitsize ...
        return 0;
One possibility is dealing with a different precision using wide_int::from. Another option, used in this patch as it is safest, is
    +  if (ll_bitsize != lr_bitsize)
    +    return 0;
       if (!lr_and_mask.get_precision ())
         lr_and_mask = sign;
       else
         lr_and_mask &= sign;
and similarly in the other hunk, i.e. punt if there is a mismatch early. And yet another option would be to compute the sign
      wide_int sign = wi::mask (ll_bitsize - 1, true, ll_bitsize);
      /* If ll_arg is zero-extended and we're testing the sign bit, we know what the result should be. Shifting the sign bit out of sign will get us to mask the entire field out, yielding zero, i.e., the sign bit of the zero-extended value. We know the masked value is being compared with zero, so the compare will get us the result we're looking for: TRUE if EQ_EXPR, FALSE if NE_EXPR. */
      if (lsignbit > ll_bitsize && ll_unsignedp)
        sign <<= 1;
once again for the lr_and_mask and rr_and_mask cases using rl_bitsize.
As we just return 0; anyway unless l_const && r_const, if l_const && r_const is false it doesn't really matter what is chosen, but for the const cases it matters and I'm not sure what is right. So the second option might be safest. 2025-05-06 Jakub Jelinek <jakub@redhat.com> PR tree-optimization/120074 * gimple-fold.cc (fold_truth_andor_for_ifcombine): For lsignbit && l_xor case, punt if ll_bitsize != lr_bitsize. Similarly for rsignbit && r_xor case, punt if rl_bitsize != rr_bitsize. Formatting fix. * gcc.dg/pr120074.c: New test.
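The precision issue and the chosen fix can be modelled with a toy fixed-precision mask. This is not GCC's `wide_int` API. The type `fixed_mask` and the functions `sign_mask`, `shift_left_1` and `and_mask` are hypothetical stand-ins:

- `sign_mask` plays the role of `wi::mask (prec - 1, true, prec)`.
- Combining masks of different precisions is refused, mirroring the early `return 0;` punt the patch adds.
- Shifting the sign bit left by one within the field's precision yields zero, as the third option's comment describes.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Toy stand-in for a fixed-precision wide_int: VAL is kept truncated
   to PREC bits, with 1 <= PREC <= 64.  */
struct fixed_mask
{
  uint64_t val;
  unsigned prec;
};

static uint64_t
prec_mask (unsigned prec)
{
  assert (prec >= 1 && prec <= 64);
  return prec == 64 ? UINT64_MAX : ((uint64_t) 1 << prec) - 1;
}

/* Analogue of wi::mask (prec - 1, true, prec): only the sign bit set.  */
struct fixed_mask
sign_mask (unsigned prec)
{
  assert (prec >= 1 && prec <= 64);
  struct fixed_mask m = { (uint64_t) 1 << (prec - 1), prec };
  return m;
}

/* Shift left by one and truncate to the precision, so shifting the
   sign bit out yields zero.  */
void
shift_left_1 (struct fixed_mask *m)
{
  m->val = (m->val << 1) & prec_mask (m->prec);
}

/* AND SRC into *DST.  Like the patch, punt (return false) when the
   precisions differ instead of mixing them.  */
bool
and_mask (struct fixed_mask *dst, struct fixed_mask src)
{
  if (dst->prec != src.prec)
    return false;
  dst->val &= src.val;
  return true;
}
```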
2025-05-05	Allow IPA_CP to handle UNDEFINED as VARYING.	Andrew MacLeod	1	-0/+12
	When applying a bitmask to reflect ranges, the adjustment is sometimes deferred, and this can result in an UNDEFINED result. IPA is not expecting this, so add a check for it and convert to VARYING if encountered. PR tree-optimization/120048 gcc/ * ipa-cp.cc (ipcp_store_vr_results): Check for UNDEFINED. gcc/testsuite/ * gcc.dg/pr120048.c: New.
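The toy lattice below shows how a bitmask can empty a range, and how a consumer that cannot represent UNDEFINED should fall back to VARYING. It is a minimal sketch assuming a range over `uint32_t` and a bitmask that only records "the low TZ bits are zero". The names `vr`, `apply_trailing_zeros` and `for_ipa_consumer` are hypothetical and are not GCC's `irange` API.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical three-state value-range lattice.  */
enum vr_kind { VR_UNDEFINED, VR_RANGE, VR_VARYING };

struct vr
{
  enum vr_kind kind;
  uint32_t lo, hi;   /* Meaningful only for VR_RANGE.  */
};

/* Intersect R with "the low TZ bits are known to be zero" by rounding
   LO up and HI down to multiples of 2^TZ.  The result can be empty,
   e.g. [1, 3] with TZ = 2 contains no multiple of 4.  */
struct vr
apply_trailing_zeros (struct vr r, unsigned tz)
{
  assert (tz < 32);
  if (r.kind != VR_RANGE)
    return r;
  uint32_t align = (uint32_t) 1 << tz;
  /* Round up in 64 bits so values near UINT32_MAX cannot wrap.  */
  uint64_t lo = ((uint64_t) r.lo + align - 1) & ~(uint64_t) (align - 1);
  uint32_t hi = r.hi & ~(align - 1);
  if (lo > hi)
    return (struct vr) { VR_UNDEFINED, 0, 0 };
  return (struct vr) { VR_RANGE, (uint32_t) lo, hi };
}

/* A consumer that does not expect UNDEFINED falls back to the
   conservative VARYING state.  */
struct vr
for_ipa_consumer (struct vr r)
{
  if (r.kind == VR_UNDEFINED)
    return (struct vr) { VR_VARYING, 0, 0 };
  return r;
}
```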
2025-05-05	testsuite: Link gcc.dg/lto/modref-2_0 with libm	John David Anglin	1	-0/+1
	2025-05-05 John David Anglin <danglin@gcc.gnu.org> gcc/testsuite/ChangeLog: PR testsuite/120085 * gcc.dg/lto/modref-2_0.c: Link test with libm.
2025-05-05	vect-simd-clone-1[6-8][cd].c: Expect in-branch clones for x86: Fix target ↵	Thomas Schwinge	6	-6/+6
	selector syntax Fix-up for commit f9f81d5017adc5d860b24f67aeb89b4e79c7ebdb "vect-simd-clone-1[6-8][cd].c: Expect in-branch clones for x86", where we lost the relevant testing, for example, for x86_64, or GCN:
     PASS: gcc.dg/vect/vect-simd-clone-16c.c (test for excess errors)
     UNSUPPORTED: gcc.dg/vect/vect-simd-clone-16c.c -flto -ffat-lto-objects
     PASS: gcc.dg/vect/vect-simd-clone-16c.c execution test
    -PASS: gcc.dg/vect/vect-simd-clone-16c.c scan-tree-dump-times vect "[\\n\\r] [^\\n]* = foo\\.simdclone" 0
     PASS: gcc.dg/vect/vect-simd-clone-16d.c (test for excess errors)
     UNSUPPORTED: gcc.dg/vect/vect-simd-clone-16d.c -flto -ffat-lto-objects
     PASS: gcc.dg/vect/vect-simd-clone-16d.c execution test
    -PASS: gcc.dg/vect/vect-simd-clone-16d.c scan-tree-dump-times vect "[\\n\\r] [^\\n]* = foo\\.simdclone" 0
     PASS: gcc.dg/vect/vect-simd-clone-17c.c (test for excess errors)
     UNSUPPORTED: gcc.dg/vect/vect-simd-clone-17c.c -flto -ffat-lto-objects
     PASS: gcc.dg/vect/vect-simd-clone-17c.c execution test
    -PASS: gcc.dg/vect/vect-simd-clone-17c.c scan-tree-dump-times vect "[\\n\\r] [^\\n]* = foo\\.simdclone" 0
     PASS: gcc.dg/vect/vect-simd-clone-17d.c (test for excess errors)
     UNSUPPORTED: gcc.dg/vect/vect-simd-clone-17d.c -flto -ffat-lto-objects
     PASS: gcc.dg/vect/vect-simd-clone-17d.c execution test
    -PASS: gcc.dg/vect/vect-simd-clone-17d.c scan-tree-dump-times vect "[\\n\\r] [^\\n]* = foo\\.simdclone" 0
     PASS: gcc.dg/vect/vect-simd-clone-18c.c (test for excess errors)
     UNSUPPORTED: gcc.dg/vect/vect-simd-clone-18c.c -flto -ffat-lto-objects
     PASS: gcc.dg/vect/vect-simd-clone-18c.c execution test
    -PASS: gcc.dg/vect/vect-simd-clone-18c.c scan-tree-dump-times vect "[\\n\\r] [^\\n]* = foo\\.simdclone" 0
     PASS: gcc.dg/vect/vect-simd-clone-18d.c (test for excess errors)
     UNSUPPORTED: gcc.dg/vect/vect-simd-clone-18d.c -flto -ffat-lto-objects
     PASS: gcc.dg/vect/vect-simd-clone-18d.c execution test
    -PASS: gcc.dg/vect/vect-simd-clone-18d.c scan-tree-dump-times vect "[\\n\\r] [^\\n]* = foo\\.simdclone" 0
     ...,
which this commit restores. PR middle-end/112877 gcc/testsuite/ * gcc.dg/vect/vect-simd-clone-16c.c: Fix target selector syntax. * gcc.dg/vect/vect-simd-clone-16d.c: Likewise. * gcc.dg/vect/vect-simd-clone-17c.c: Likewise. * gcc.dg/vect/vect-simd-clone-17d.c: Likewise. * gcc.dg/vect/vect-simd-clone-18c.c: Likewise. * gcc.dg/vect/vect-simd-clone-18d.c: Likewise.
2025-05-05	testsuite/120084 - adjust gcc.dg/lto/pr60779_0.c	Richard Biener	1	-0/+1
	Require the linker plugin so functions are properly detected as unused when inlined. PR testsuite/120084 * gcc.dg/lto/pr60779_0.c: Require linker-plugin.
2025-05-02	simplify-rtl: Fix crash due to simplify_with_subreg_not [PR120059]	Andrew Pinski	1	-0/+17
	r16-286-gd84fbc516ea57d added a call to simplify_gen_subreg but didn't check if the result of simplify_gen_subreg was non-null. simplify_gen_subreg can return NULL if the subreg would not be valid. In the case below we had a hard register for the SSE register xmm0 of mode SI and were doing a subreg to QI mode, but QImode is not a valid mode for the SSE register, so simplify_gen_subreg would return NULL. This adds the obvious check. Pushed as obvious after bootstrap/test on x86_64-linux-gnu. PR rtl-optimization/120059 gcc/ChangeLog: * simplify-rtx.cc (simplify_with_subreg_not): Check the result of simplify_gen_subreg. gcc/testsuite/ChangeLog: * gcc.dg/torture/pr120059-1.c: New test. Signed-off-by: Andrew Pinski <quic_apinski@quicinc.com>
2025-05-02	c: Fix up RAW_DATA_CST handling in check_constexpr_init [PR120057]	Jakub Jelinek	3	-0/+36
	The pr120057-1.c testcase has been incorrectly rejected since r15-4377 (and for a while it also ICEd after the error), i.e. the optimization of large C initializers using RAW_DATA_CST. Similarly, the embed-18.c testcase has been incorrectly rejected since the embed support was introduced and RAW_DATA_CST used for that. The callers of check_constexpr_init (store_init_value and output_init_element) compute int_const_expr as int_const_expr = (TREE_CODE (init) == INTEGER_CST && !TREE_OVERFLOW (init) && INTEGRAL_TYPE_P (TREE_TYPE (init))); but that is only passed through down to check_constexpr_init. I think tweaking those 2 callers to also allow RAW_DATA_CST for int_const_expr, when check_constexpr_init needs to special-case it no matter what, would be a larger change, so the patch just changes check_constexpr_init to deal with RAW_DATA_CST in the initializers. For TYPE_UNSIGNED char precision integral types RAW_DATA_CST is always valid; for !TYPE_UNSIGNED we need to check for 128-255 values being turned into negative ones. 2025-05-02 Jakub Jelinek <jakub@redhat.com> PR c/120057 * c-typeck.cc (check_constexpr_init): Handle RAW_DATA_CST. * gcc.dg/cpp/embed-18.c: New test. * gcc.dg/pr120057-1.c: New test. * gcc.dg/pr120057-2.c: New test.
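The validity rule in the last sentence can be sketched as a byte scan. The helper `raw_bytes_valid_for_char_type` is hypothetical and is not the code in `c-typeck.cc`. Raw bytes (as produced by `#embed` or a large initializer) are always valid for an unsigned char-precision type. For a signed one, every byte must be at most 127, because 128..255 would change value when converted to the signed type, which a constexpr initializer does not allow.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical helper: can LEN raw bytes initialize a constexpr array
   of a char-precision type without changing any value?  */
bool
raw_bytes_valid_for_char_type (const unsigned char *bytes, size_t len,
                               bool type_unsigned)
{
  assert (bytes != NULL || len == 0);
  if (type_unsigned)
    return true;                /* 0..255 always fit.  */
  for (size_t i = 0; i < len; i++)
    if (bytes[i] > 127)         /* Would become negative.  */
      return false;
  return true;
}
```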
2025-05-02	ranger: Improve nonnull_if_nonzero attribute [PR117023]	Jakub Jelinek	1	-0/+38
	On Mon, Mar 31, 2025 at 11:30:20AM -0400, Andrew MacLeod wrote: > Infer range processing was adjusted to allow a query to be specified, > but during VRP folding, ranger was not providing a query. This results > in contextual ranges being missed. Pass the cache in as the query > which provides a read-only query of the current state. Now that this patch is in, I've retested my patch and it works fine. If we can determine a range for the arg2 argument and prove that it doesn't include zero, we can imply nonzero for the arg1 argument. 2025-05-02 Jakub Jelinek <jakub@redhat.com> Andrew MacLeod <amacleod@redhat.com> PR c/117023 * gimple-range-infer.cc (gimple_infer_range::gimple_infer_range): For nonnull_if_nonzero attribute check also arg2 range if it doesn't include zero and in that case call add_nonzero too. * gcc.dg/tree-ssa/pr78154-2.c: New test.
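The following sketch shows the kind of code this inference targets. The functions `count_zeros` and `caller` are hypothetical. `nonnull_if_nonzero (1, 2)` states that argument 1 may be null only when argument 2 is zero. It requires GCC 15 or later; compilers that do not know the attribute warn and ignore it. Whether the final null check is actually folded depends on the compiler version and optimization level.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical function: P may be NULL only when N is zero.  */
__attribute__ ((nonnull_if_nonzero (1, 2)))
size_t
count_zeros (const unsigned char *p, size_t n)
{
  size_t c = 0;
  for (size_t i = 0; i < n; i++)
    c += p[i] == 0;
  return c;
}

size_t
caller (const unsigned char *p, size_t n)
{
  if (n < 4)
    return 0;
  /* Here N >= 4, so N's range excludes zero and, after this call,
     ranger can infer P != NULL; the check below is then a candidate
     for folding away.  */
  size_t c = count_zeros (p, n);
  assert (c <= n);
  if (p == NULL)
    return (size_t) -1;
  return c;
}
```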
2025-05-02	gimple: Switch bit-test lowering testcases for the more powerful alg	Filip Kastl	2	-0/+111
	This patch adds 2 testcases. One tests that GCC is able to create bit-test clusters of size 64. The other one contains two switches which GCC wouldn't completely cover with bit-test clusters before the changes from this patch set. gcc/testsuite/ChangeLog: * gcc.dg/tree-ssa/switch-5.c: New test. * gcc.dg/tree-ssa/switch-6.c: New test. Signed-off-by: Filip Kastl <fkastl@suse.cz>
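A switch whose case values all fall within 0..63 is the kind of input that can become a single 64-bit bit-test cluster. The sketch below is hypothetical and is not one of the new switch-5.c/switch-6.c tests. It pairs such a switch with the equivalent hand-written bit test. Whether GCC actually emits a bit test depends on the target and on its switch-lowering cost model.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical switch: all case values lie in 0..63, so the whole
   switch fits one 64-bit bit-test cluster.  */
bool
is_selected (unsigned x)
{
  switch (x)
    {
    case 0: case 3: case 7: case 12: case 19: case 25:
    case 31: case 38: case 44: case 50: case 57: case 63:
      return true;
    default:
      return false;
    }
}

/* The same predicate written as a range check plus a bit test.  */
bool
is_selected_bittest (unsigned x)
{
  const unsigned long long mask =
    (1ULL << 0) | (1ULL << 3) | (1ULL << 7) | (1ULL << 12)
    | (1ULL << 19) | (1ULL << 25) | (1ULL << 31) | (1ULL << 38)
    | (1ULL << 44) | (1ULL << 50) | (1ULL << 57) | (1ULL << 63);
  return x < 64 && ((mask >> x) & 1);
}
```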
2025-05-02	c: Fix crash in c-typeck.cc convert_arguments with indirect calls	Florian Weimer	1	-0/+14
	gcc/c/ PR c/120055 * c-typeck.cc (convert_arguments): Check if fundecl is null before checking for builtin function declaration. gcc/testsuite/ * gcc.dg/Wdeprecated-non-prototype-6.c: New test.
2025-05-01	Fix BZ 119317: named loops (C2y) with debug info	Christopher Bazley	1	-0/+5
	Named loops (C2y) could not previously be compiled with -O1 and -ggdb2 or higher because the label preceding a loop (or switch) could not be found when using such command lines. This could be observed by compiling gcc/gcc/testsuite/gcc.dg/c2y-named-loops-1.c with the provoking command line (or any minimal example such as that cited in the bug report). The fix was simply to ignore the tree nodes inserted for debugging information. Base commit is 79aa2a283a8d3327ff4d6dca77e81d5b1ac3a01e PR c/119317 gcc/c/ChangeLog: * c-decl.cc (c_get_loop_names): Do not prematurely end the search for a label that names a loop or switch statement upon encountering a DEBUG_BEGIN_STMT. Instead, ignore any instances of DEBUG_BEGIN_STMT. gcc/testsuite/ChangeLog: * gcc.dg/c2y-named-loops-8.c: New test.
2025-05-01	c: Suppress -Wdeprecated-non-prototype warnings for builtins	Florian Weimer	1	-0/+14
	Builtins defined with BT_FN_INT_VAR etc. show as functions without a prototype and trigger the warning. gcc/c/ PR c/119950 * c-typeck.cc (convert_arguments): Check for built-in function declaration before warning. gcc/testsuite/ * gcc.dg/Wdeprecated-non-prototype-5.c: New test.
2025-05-01	Fix gcc.dg/tree-ssa/ssa-dom-thread-7.c for aarch64	Richard Biener	1	-1/+1
	On another machine, with a cross compiler, I see 17 jumps threaded, so adjust the expected count accordingly. PR tree-optimization/120003 * gcc.dg/tree-ssa/ssa-dom-thread-7.c: Adjust aarch64 expected thread2 number of threads.
2025-04-30	analyzer: avoid saying "'0' is NULL"	David Malcolm	4	-4/+4
	gcc/analyzer/ChangeLog: * sm-malloc.cc (malloc_diagnostic::describe_state_change): Tweak the "EXPR is NULL" message for the case where EXPR is a null pointer. gcc/testsuite/ChangeLog: * c-c++-common/analyzer/data-model-path-1.c: Check for "using NULL here" message. * c-c++-common/analyzer/null-deref-pr108251-smp_fetch_ssl_fc_has_early.c: Likewise. Check for "return of NULL" message. * c-c++-common/analyzer/null-deref-pr108400-SoftEtherVPN-WebUi.c: Likewise. * gcc.dg/analyzer/data-model-5.c: Likewise. * gcc.dg/analyzer/data-model-5b.c: Likewise. * gcc.dg/analyzer/data-model-5c.c: Likewise. * gcc.dg/analyzer/torture/pr93647.c: Likewise. Signed-off-by: David Malcolm <dmalcolm@redhat.com>
2025-04-30	Revert "tree-optimization/119960 - failed external SLP promotion"	Richard Biener	1	-15/+0
	This reverts commit 51ba233fe2db562390a6e0a3618420889761bc77.
2025-04-30	tree-optimization/119960 - failed external SLP promotion	Richard Biener	1	-0/+15
	The following addresses an overly conservative sanity check of SLP nodes we want to promote external. The issue lies in code generation for such externals, which relies on get_later_stmt to figure out an insert location. But get_later_stmt relies on the ability to totally order stmts, specifically implementation-wise that they are all from the same BB, which is what is verified at the moment. The patch changes this to require stmts to be orderable by dominance queries. For simplicity and seemingly enough for the testcase in PR119960, this handles the case of two distinct BBs. PR tree-optimization/119960 * tree-vect-slp.cc (vect_slp_can_convert_to_external): Handle cases where defs from multiple BBs are ordered by their dominance relation. * gcc.dg/vect/bb-slp-pr119960-1.c: New testcase.
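Ordering two statements by dominance can be sketched on a small immediate-dominator array. This is a model under stated assumptions, not GCC's `get_later_stmt` or its dominance API. The names `stmt`, `dominated_by` and `later_stmt` are hypothetical, and the entry block is assumed to be its own immediate dominator. Statements in the same block are ordered by position. For different blocks, the statement in the dominated block is the later one, and if neither block dominates the other the pair is unordered.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical statement: block index plus position in that block.  */
struct stmt
{
  int bb;
  int uid;
};

/* Does DOM dominate BB?  Walk up the immediate-dominator chain; the
   entry block is its own idom.  */
static bool
dominated_by (const int *idom, int bb, int dom)
{
  for (;;)
    {
      if (bb == dom)
        return true;
      if (idom[bb] == bb)
        return false;
      bb = idom[bb];
    }
}

/* Return the later of A and B, or NULL when their blocks are not
   ordered by dominance (the caller must then give up).  */
const struct stmt *
later_stmt (const int *idom, const struct stmt *a, const struct stmt *b)
{
  assert (a != NULL && b != NULL);
  if (a->bb == b->bb)
    return a->uid > b->uid ? a : b;
  if (dominated_by (idom, b->bb, a->bb))
    return b;
  if (dominated_by (idom, a->bb, b->bb))
    return a;
  return NULL;
}
```

In the usage below, block 2 is dominated by block 1, which is dominated by the entry block 0. Block 3 is a sibling of block 1, so statements in blocks 2 and 3 are unordered.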
2025-04-30	ipa/120006 - wrong code with IPA PTA	Richard Biener	1	-0/+31
	When PTA gets support for special-handling more builtins in find_func_aliases the corresponding code in find_func_clobbers needs updating as well since for unhandled cases it assumes the former will populate ESCAPED accordingly. The following fixes a few omissions, the testcase runs into the missing strdup handling. I believe the more advanced handling using modref results and fnspecs opened a larger gap, the proper fix is to merge both functions, gating the clobber/use part on a parameter to avoid diverging. PR ipa/120006 * tree-ssa-structalias.cc (find_func_clobbers): Handle strdup, strndup, realloc, index, strchr, strrchr, memchr, strstr, strpbrk builtins like find_func_aliases does. * gcc.dg/torture/pr120006.c: New testcase.
2025-04-30	tree-optimization/120003 - missed jump threading	Richard Biener	2	-2/+21
	The following allows the entry and exit block of a jump thread path to be equal, which can easily happen when there isn't a forwarder on the interesting edge for an FSM thread conditional. We just don't want to enlarge the path from such a block. PR tree-optimization/120003 * tree-ssa-threadbackward.cc (back_threader::find_paths_to_names): Allow block re-use but do not enlarge the path beyond such a re-use. * gcc.dg/tree-ssa/ssa-thread-23.c: New testcase. * gcc.dg/tree-ssa/ssa-dom-thread-7.c: Adjust.
2025-04-29	tree-optimization/119997 - &ptr->field no longer subject to PRE	Richard Biener	1	-0/+15
	The following makes PRE handle &ptr->field the same as VN by treating it as a POINTER_PLUS_EXPR when possible and thus as 'nary'. To facilitate this the patch splits out vn_pp_nary_for_addr and adds const overloads for vec::last. The patch also avoids handling an effective zero offset as POINTER_PLUS_EXPR. PR tree-optimization/119997 * vec.h (vec<T, A, vl_embed>::last): Provide const overload. (vec<T, va_heap, vl_ptr>::last): Likewise. * tree-ssa-sccvn.h (vn_pp_nary_for_addr): Declare. * tree-ssa-sccvn.cc (vn_pp_nary_for_addr): Split out from ... (vn_reference_lookup): ... here. (vn_reference_insert): ... and duplicate here. Do not handle zero offset as POINTER_PLUS_EXPR. * tree-ssa-pre.cc (compute_avail): Implement ADDR_EXPR-as-POINTER_PLUS_EXPR special casing. * gcc.dg/tree-ssa/ssa-pre-35.c: New testcase.
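The equivalence PRE now exploits can be checked at the C level. The struct `pair` and the function `second_addr_as_plus` are hypothetical. `&p->second` has the same value as `p` plus the constant offset of `second`, which is why the address can be value-numbered like a `POINTER_PLUS_EXPR`. For a field at offset zero the address is just `p`, and the patch no longer treats that case as a `POINTER_PLUS_EXPR`.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical structure for illustration.  */
struct pair
{
  int first;    /* offset 0: &p->first is just p */
  long second;  /* nonzero offset */
};

/* &p->second written explicitly as pointer plus constant offset.  */
char *
second_addr_as_plus (struct pair *p)
{
  assert (p != NULL);
  return (char *) p + offsetof (struct pair, second);
}
```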
2025-04-28	Eliminate make-unique.h and ::make_unique	David Malcolm	6	-61/+60
	C++11 does not provide std::make_unique, so in r13-3627-g00d7c8ff16e683 I added a make-unique.h declaring a ::make_unique. As of r15-4719-ga9ec1bc06bd3cc we can use C++14, so make-unique.h is no longer needed: we can simply use std::make_unique instead. This patch removes make-unique.h and updates every place using it to use std::make_unique. No functional change intended. gcc/analyzer/ChangeLog: * access-diagram.cc: Replace uses of ::make_unique with std::make_unique. * analyzer.cc: Likewise. * bounds-checking.cc: Likewise. * call-details.cc: Likewise. * call-info.cc: Likewise. * call-string.cc: Likewise. * checker-path.cc: Likewise. * common.h: Drop include of "make-unique.h". * constraint-manager.cc: Replace uses of ::make_unique with std::make_unique. * diagnostic-manager.cc: Likewise. * engine.cc: Likewise. * infinite-loop.cc: Likewise. * infinite-recursion.cc: Likewise. * kf-analyzer.cc: Likewise. * kf-lang-cp.cc: Likewise. * kf.cc: Likewise. * pending-diagnostic.cc: Likewise. * program-point.cc: Likewise; drop #include. * program-state.cc: Likewise. * ranges.cc: Likewise. * region-model.cc: Likewise. * region.cc: Likewise; drop #include. * sm-fd.cc: Likewise. * sm-file.cc: Likewise. * sm-malloc.cc: Likewise. * sm-pattern-test.cc: Likewise. * sm-sensitive.cc: Likewise. * sm-signal.cc: Likewise. * sm-taint.cc: Likewise. * sm.cc: Likewise. * store.cc: Likewise. * supergraph.cc: Likewise. * svalue.cc: Likewise; drop #include. * varargs.cc: Likewise. gcc/c-family/ChangeLog: * c-pretty-print.cc: Drop include of "make-unique.h". Replace uses of ::make_unique with std::make_unique. gcc/c/ChangeLog: * c-decl.cc: Drop include of "make-unique.h". Replace uses of ::make_unique with std::make_unique. * c-objc-common.cc: Likewise. * c-parser.cc: Likewise. gcc/cp/ChangeLog: * cxx-pretty-print.cc: Drop include of "make-unique.h". Replace uses of ::make_unique with std::make_unique. * error.cc: Likewise. * name-lookup.cc: Likewise. * parser.cc: Likewise. 
gcc/ChangeLog: * diagnostic-format-json.cc: Drop include of "make-unique.h". Replace uses of ::make_unique with std::make_unique. * diagnostic-format-sarif.cc: Likewise. * diagnostic-format-text.cc: Likewise. * diagnostic.cc: Likewise. * dumpfile.cc: Likewise. * gcc-attribute-urlifier.cc: Likewise. * gcc-urlifier.cc: Likewise. * json-parsing.cc: Likewise. * json.cc: Likewise. * lazy-diagnostic-path.cc: Likewise. * libgdiagnostics.cc: Likewise. * libsarifreplay.cc: Likewise. * lto-wrapper.cc: Likewise. * make-unique.h: Delete. * opts-diagnostic.cc: Drop include of "make-unique.h". Replace uses of ::make_unique with std::make_unique. * pretty-print.cc: Likewise. * text-art/style.cc: Likewise. * text-art/styled-string.cc: Likewise. * text-art/table.cc: Likewise. * text-art/tree-widget.cc: Likewise. * text-art/widget.cc: Likewise. * timevar.cc: Likewise. * toplev.cc: Likewise. * tree-diagnostic-client-data-hooks.cc: Likewise. gcc/jit/ChangeLog: * dummy-frontend.cc: Drop include of "make-unique.h". Replace uses of ::make_unique with std::make_unique. gcc/testsuite/ChangeLog: * gcc.dg/plugin/analyzer_cpython_plugin.cc: Drop include of "make-unique.h". Replace uses of ::make_unique with std::make_unique. * gcc.dg/plugin/analyzer_gil_plugin.cc: Likewise. * gcc.dg/plugin/analyzer_kernel_plugin.cc: Likewise. * gcc.dg/plugin/analyzer_known_fns_plugin.cc: Likewise. * gcc.dg/plugin/diagnostic_group_plugin.cc: Likewise. * gcc.dg/plugin/diagnostic_plugin_xhtml_format.cc: Likewise. Signed-off-by: David Malcolm <dmalcolm@redhat.com>
2025-04-28	analyzer: convert gcall * to gcall & in many places	David Malcolm	1	-12/+13
	No functional change intended. gcc/analyzer/ChangeLog: * analyzer.cc: Convert gcall * to gcall & where we know the pointer must be non-null. * call-details.cc: Likewise. * call-details.h: Likewise. * call-info.cc: Likewise. * call-info.h: Likewise. * call-summary.h: Likewise. * checker-event.cc: Likewise. * checker-event.h: Likewise. * common.h: Likewise. * diagnostic-manager.cc: Likewise. * engine.cc: Likewise. * exploded-graph.h: Likewise. * kf-analyzer.cc: Likewise. * kf-lang-cp.cc: Likewise. * kf.cc: Likewise. * known-function-manager.cc: Likewise. * program-state.cc: Likewise. * program-state.h: Likewise. * region-model.cc: Likewise. * region-model.h: Likewise. * sm-fd.cc: Likewise. * sm-file.cc: Likewise. * sm-malloc.cc: Likewise. * sm-sensitive.cc: Likewise. * sm-signal.cc: Likewise. * sm-taint.cc: Likewise. * sm.h: Likewise. * store.cc: Likewise. * store.h: Likewise. * supergraph.cc: Likewise. * supergraph.h: Likewise. * svalue.h: Likewise. * varargs.cc: Likewise. gcc/testsuite/ChangeLog: * gcc.dg/plugin/analyzer_gil_plugin.cc: Convert gcall * to gcall & where we know the pointer must be non-null. Signed-off-by: David Malcolm <dmalcolm@redhat.com>
2025-04-28	analyzer: convert various enums to "enum class"	David Malcolm	1	-2/+2
	Modernization; no functional change intended. gcc/analyzer/ChangeLog: * access-diagram.cc: Convert enum access_direction to "enum class". * bounds-checking.cc: Likewise. * checker-event.cc: Convert enum event_kind to "enum class". * checker-event.h: Likewise. * checker-path.cc: Likewise. * common.h: Convert enum access_direction to "enum class". * constraint-manager.cc: Convert enum bound_kind to "enum class". * constraint-manager.h: Likewise. * diagnostic-manager.cc: Convert enum event_kind to "enum class". * engine.cc: Convert enum status to "enum class". * exploded-graph.h: Likewise. * infinite-loop.cc: Likewise. * kf-lang-cp.cc: Convert enum poison_kind to "enum class". * kf.cc: Likewise. * region-model-manager.cc: Likewise. * region-model.cc: Likewise; also for enum access_direction. * svalue.cc: Likewise. * svalue.h: Likewise. gcc/testsuite/ChangeLog: * gcc.dg/plugin/analyzer_cpython_plugin.cc: Convert enum poison_kind to "enum class". Signed-off-by: David Malcolm <dmalcolm@redhat.com>
2025-04-28	analyzer: use analyzer/common.h as a common header	David Malcolm	4	-4/+4
	Our headers are a major pain to work with: many require certain other headers to be included in a particular (undocumented) order in order to be includable. Simplify includes in the analyzer by renaming analyzer/analyzer.h to analyzer/common.h and have it include all the common headers needed throughout the analyzer, thus encapsulating the rules about e.g. being able to include "gimple.h" in one place in the analyzer subdirectory. Doing so also makes it easier to e.g. define INCLUDE_SET in one place, rather than in many source files. gcc/analyzer/ChangeLog: * analyzer.h: Rename to... * common.h: ...this. Add define of INCLUDE_VECTOR, includes of "config.h", "system.h", "coretypes.h", "make-unique.h", "tree.h", "function.h", "basic-block.h", "gimple.h", "options.h", "bitmap.h", "diagnostic-core.h", and "diagnostic-path.h". * access-diagram.h: Don't include "analyzer/analyzer.h". * access-diagram.cc: Reorganize includes to #include "analyzer/common.h" first, then group by subsystem, dropping redundant headers. * analysis-plan.cc: Likewise. * analyzer-language.cc: Likewise. * analyzer-pass.cc: Likewise. * analyzer-selftests.cc: Likewise. * analyzer.cc: Likewise. * bounds-checking.cc: Likewise. * call-details.cc: Likewise. * call-info.cc: Likewise. * call-string.cc: Likewise. * call-summary.cc: Likewise. * checker-event.cc: Likewise. * checker-path.cc: Likewise. * complexity.cc: Likewise. * constraint-manager.cc: Likewise. * diagnostic-manager.cc: Likewise. * engine.cc: Likewise. * feasible-graph.cc: Likewise. * infinite-loop.cc: Likewise. * infinite-recursion.cc: Likewise. * kf-analyzer.cc: Likewise. * kf-lang-cp.cc: Likewise. * kf.cc: Likewise. * known-function-manager.cc: Likewise. * pending-diagnostic.cc: Likewise. * program-point.cc: Likewise. * program-state.cc: Likewise. * ranges.cc: Likewise. * record-layout.cc: Likewise. * region-model-asm.cc: Likewise. * region-model-manager.cc: Likewise. * region-model-reachability.cc: Likewise. * region-model.cc: Likewise. 
* region.cc: Likewise. * sm-fd.cc: Likewise. * sm-file.cc: Likewise. * sm-malloc.cc: Likewise. * sm-pattern-test.cc: Likewise. * sm-sensitive.cc: Likewise. * sm-signal.cc: Likewise. * sm-taint.cc: Likewise. * sm.cc: Likewise. * state-purge.cc: Likewise. * store.cc: Likewise. * supergraph.cc: Likewise. * svalue.cc: Likewise. * symbol.cc: Likewise. * trimmed-graph.cc: Likewise. * varargs.cc: Likewise. gcc/testsuite/ChangeLog: * gcc.dg/plugin/analyzer_cpython_plugin.cc: Update for renaming of analyzer/analyzer.h to analyzer/common.h. * gcc.dg/plugin/analyzer_gil_plugin.cc: Likewise. * gcc.dg/plugin/analyzer_kernel_plugin.cc: Likewise. * gcc.dg/plugin/analyzer_known_fns_plugin.cc: Likewise. Signed-off-by: David Malcolm <dmalcolm@redhat.com>
2025-04-28	Infer non-zero for integral division RHS.	Andrew MacLeod	1	-0/+13
	Adding op2_range for operator_div allows ranger to notice the divisor is non-zero after execution. PR tree-optimization/95801 gcc/ * range-op.cc (operator_div::op2_range): New. gcc/testsuite/ * gcc.dg/tree-ssa/pr95801.c: New.
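The kind of fact this enables can be shown in a few lines. The function `divide_then_check` is hypothetical and is not the PR95801 testcase. Integer division by zero is undefined behaviour, so once `a / b` has executed, `b` must be nonzero. With `operator_div::op2_range`, ranger may use this to fold the returned comparison to `true`.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* After *Q = A / B executes, B is known to be nonzero, so the
   comparison below is a candidate for folding to true.  */
bool
divide_then_check (int a, int b, int *q)
{
  assert (q != NULL);
  *q = a / b;
  return b != 0;
}
```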
2025-04-28	Always reflect lower bits from mask in subranges.	Andrew MacLeod	4	-6/+31
	During intersection, we expand the subranges to exclude the lower values from a bitmask with trailing zeros. This leads to inconsistent evaluations, and in the case of this PR that led to an infinite cycle. Always expand the lower subranges in set_range_from_bitmask instead. PR tree-optimization/119712 gcc/ * value-range.cc (range_bitmask::adjust_range): Delete. (irange::set_range_from_bitmask): Integrate adjust_range. (irange::update_bitmask): Do nothing if bitmask doesn't change. (irange::intersect_bitmask): Do not call adjust_range. Exit if there is no second bitmask. * value-range.h (adjust_range): Remove prototype. gcc/testsuite/ * gcc.dg/pr119712.c: New. * gcc.dg/pr83072-2.c: Adjust. * gcc.dg/tree-ssa/phi-opt-value-5.c: Adjust. * gcc.dg/tree-ssa/vrp122.c: Adjust.
2025-04-28	tailcall: Support ERF_RETURNS_ARG for tailcall [PR67797]	Andrew Pinski	2	-0/+41
	r15-6943-g9c4397cafc5ded added support for undoing the IPA-VRP return value optimization for tail calls; using the same code, ERF_RETURNS_ARG can be supported for functions which return one of their arguments. This allows tail calling memset/memcpy in some cases which were not handled before. Note this is very similar to https://gcc.gnu.org/legacy-ml/gcc-patches/2016-11/msg02485.html except that it has a few more checks. As for the question of expand vs. tail call: this path is also used by the IPA-VRP return value path, and yes, we get a tail call. The review in https://gcc.gnu.org/bugzilla/show_bug.cgi?id=83142#c2 mentions re-instantiating an LHS on the call and propagating it to dominating uses. Even though that could be done for the ERF_RETURNS_ARG case, it is not done for the IPA-VRP return value case either, so I don't think there is anything to be done there. Changes since v1: * v2: Add a useless_type_conversion_p check as suggested by Jakub and add a testcase for that. * v3: Fix the order of arguments to useless_type_conversion_p. Bootstrapped and tested on x86_64-linux-gnu. PR tree-optimization/67797 gcc/ChangeLog: * tree-tailcall.cc (find_tail_calls): Add support for ERF_RETURNS_ARG. gcc/testsuite/ChangeLog: * gcc.dg/tree-ssa/tailcall-14.c: New test. * gcc.dg/tree-ssa/tailcall-15.c: New test. Signed-off-by: Andrew Pinski <quic_apinski@quicinc.com>
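A typical candidate looks like the function below. `copy_and_return` is a hypothetical name, and this is not one of the tailcall-14.c/tailcall-15.c tests. Because `memcpy` returns its first argument, `return dst;` returns exactly what the call already returned. With ERF_RETURNS_ARG support, GCC can therefore treat it like `return memcpy (dst, src, n);` and may emit a tail call, subject to the usual tail-call checks.

```c
#include <assert.h>
#include <string.h>

/* memcpy returns DST, so this is equivalent to
   "return memcpy (dst, src, n);" and is a tail-call candidate.  */
void *
copy_and_return (void *dst, const void *src, size_t n)
{
  assert (dst != NULL || n == 0);
  memcpy (dst, src, n);
  return dst;
}
```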
2025-04-28	gimplefe: Round trip of rotates [PR119432]	Andrew Pinski	1	-0/+11
	This adds support for rotate left/right to the GIMPLE front-end via __ROTATE_LEFT/__ROTATE_RIGHT operators. PR c/119432 gcc/c/ChangeLog: * gimple-parser.cc (gimple_binary_identifier_code): Add __ROTATE_LEFT and __ROTATE_RIGHT. gcc/ChangeLog: * tree-pretty-print.cc (op_symbol_code): For LROTATE_EXPR, output __ROTATE_LEFT for gimple. For RROTATE_EXPR output __ROTATE_RIGHT for gimple. gcc/testsuite/ChangeLog: * gcc.dg/gimplefe-57.c: New test. Signed-off-by: Andrew Pinski <quic_apinski@quicinc.com>
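Rotates usually reach GIMPLE from C through the shift-or idiom sketched below. `rotl32` and `rotr32` are hypothetical helpers. GCC normally recognizes this idiom as `LROTATE_EXPR`/`RROTATE_EXPR`. With this change, a GIMPLE-syntax dump (the `-gimple` dump modifier) prints such expressions as `__ROTATE_LEFT`/`__ROTATE_RIGHT`, which the GIMPLE front-end can parse back.

```c
#include <assert.h>
#include <stdint.h>

/* Portable rotate idiom; masking with 31 keeps both shift counts in
   range, including N == 0.  */
uint32_t
rotl32 (uint32_t x, unsigned n)
{
  n &= 31;
  return (x << n) | (x >> ((32 - n) & 31));
}

uint32_t
rotr32 (uint32_t x, unsigned n)
{
  n &= 31;
  return (x >> n) | (x << ((32 - n) & 31));
}
```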
2025-04-28	ipa/119973 - IPA PTA issue with global initializers	Richard Biener	1	-0/+39
	For global initializers with IPA PTA we initialize them from the IPA reference data but that lacks references to the constant pool. The following conservatively considers the whole initializer. PR ipa/119973 * tree-ssa-structalias.cc (create_variable_info_for): Build constraints from DECL_INITIAL directly rather than the IPA reference list which is incomplete. * gcc.dg/torture/pr119973.c: New testcase.
2025-04-28	middle-end/60779 - LTO vs. -fcx-fortran-rules and -fcx-limited-range	Richard Biener	2	-0/+27
	The following changes how flag_complex_method is managed towards being able to record that in the optimization set so we can stream and restore it per function. Currently -fcx-fortran-rules and -fcx-limited-range are separate recorded options, but saving/restoring does not restore flag_complex_method, which is later used in the middle-end. The solution is to make -fcx-fortran-rules and -fcx-limited-range aliases of a new -fcx-method= switch that represents flag_complex_method directly so we can save and restore it. PR middle-end/60779 * common.opt (fcx-method=): New, map to flag_complex_method. (Enum complex_method): New. (fcx-limited-range): Alias to -fcx-method=limited-range. (fcx-fortran-rules): Alias to -fcx-method=fortran. * ipa-inline-transform.cc (inline_call): Check flag_complex_method. * ipa-inline.cc (can_inline_edge_by_limits_p): Likewise. * opts.cc (finish_options): Adjust. (set_fast_math_flags): Likewise. * doc/invoke.texi (fcx-method=): Document. * gcc.dg/lto/pr60779_0.c: New testcase. * gcc.dg/lto/pr60779_1.c: Likewise.
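The GCC manual describes two behaviours for these options. `-fcx-limited-range` skips the range-reduction step in complex division. `-fcx-fortran-rules` keeps range reduction but omits the NaN checks required for full C99 semantics. The sketch below contrasts the two textbook division formulas on plain `double` pairs. It uses Smith's method as the range-reducing variant, which is an assumption for illustration; GCC's actual expansion may differ in detail. The type `cplx` and both function names are hypothetical.

```c
#include <assert.h>
#include <math.h>

struct cplx
{
  double re, im;
};

/* (a + bi) / (c + di) without range reduction: c*c + d*d can
   overflow even when the quotient is representable.  */
struct cplx
div_naive (struct cplx x, struct cplx y)
{
  double den = y.re * y.re + y.im * y.im;
  struct cplx r = { (x.re * y.re + x.im * y.im) / den,
                    (x.im * y.re - x.re * y.im) / den };
  return r;
}

/* Smith's method: scale by the ratio of the smaller to the larger
   component of the divisor, avoiding the squared terms.  */
struct cplx
div_smith (struct cplx x, struct cplx y)
{
  struct cplx r;
  if (fabs (y.re) >= fabs (y.im))
    {
      double t = y.im / y.re, den = y.re + y.im * t;
      r.re = (x.re + x.im * t) / den;
      r.im = (x.im - x.re * t) / den;
    }
  else
    {
      double t = y.re / y.im, den = y.re * t + y.im;
      r.re = (x.re * t + x.im) / den;
      r.im = (x.im * t - x.re) / den;
    }
  assert (!isnan (r.re) || isnan (x.re) || isnan (y.re) || 1);
  return r;
}
```

For example, dividing `1e300 + 1e300i` by itself overflows `den` in the naive formula and yields NaN, while Smith's method returns exactly `1 + 0i`.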
2025-04-27	c-family: Improve location for -Wunknown-pragmas in a _Pragma [PR118838]	Lewis Hyatt	2	-13/+17
	The warning for -Wunknown-pragmas is issued at the location provided by libcpp to the def_pragma() callback. This location is cpp_reader::directive_line, which is a location for the start of the line only; it is also not a valid location in case the unknown pragma was lexed from a _Pragma string. These factors make it impossible to suppress -Wunknown-pragmas via _Pragma("GCC diagnostic...") directives on the same source line, as in the PR and the test case. Address that by issuing the warning at a better location returned by cpp_get_diagnostic_override_loc(). libcpp already maintains this location to handle _Pragma-related diagnostics internally; it was also necessary to add a publicly accessible way to retrieve it. gcc/c-family/ChangeLog: PR c/118838 * c-lex.cc (cb_def_pragma): Call cpp_get_diagnostic_override_loc() to get a valid location at which to issue -Wunknown-pragmas, in case it was triggered from a _Pragma. libcpp/ChangeLog: PR c/118838 * errors.cc (cpp_get_diagnostic_override_loc): New function. * include/cpplib.h (cpp_get_diagnostic_override_loc): Declare. gcc/testsuite/ChangeLog: PR c/118838 * c-c++-common/cpp/pragma-diagnostic-loc-2.c: New test. * g++.dg/gomp/macro-4.C: Adjust expected output. * gcc.dg/gomp/macro-4.c: Likewise. * gcc.dg/cpp/Wunknown-pragmas-1.c: Likewise.
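The pattern this fix is meant to support is silencing an unknown pragma with `_Pragma("GCC diagnostic ...")` on the same source line, typically from inside a macro. In the sketch below, the macro `IGNORE_UNKNOWN_PRAGMA` and the pragma name `my_vendor_pragma` are made up for illustration. On compilers with this fix, building with `-Wall` (which enables `-Wunknown-pragmas`) is expected not to warn about it; older versions may still warn.

```c
#include <assert.h>

/* Emit an arbitrary pragma with -Wunknown-pragmas suppressed around
   it; everything expands onto a single source line.  */
#define IGNORE_UNKNOWN_PRAGMA(str)                              \
  _Pragma ("GCC diagnostic push")                               \
  _Pragma ("GCC diagnostic ignored \"-Wunknown-pragmas\"")      \
  _Pragma (str)                                                 \
  _Pragma ("GCC diagnostic pop")

/* "my_vendor_pragma" is a hypothetical pragma GCC does not know.  */
IGNORE_UNKNOWN_PRAGMA ("my_vendor_pragma")

int
answer (void)
{
  return 42;
}
```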
2025-04-28	Fix size_t in id-15.c and infoleak-net-ethtool-ioctl.c for llp64	Jonathan Yong	2	-3/+2
	Use __SIZE_TYPE__ for size_t types so that the tests work on LLP64 targets, where unsigned long is narrower than size_t. Signed-off-by: Jonathan Yong <10walls@gmail.com> gcc/testsuite/ChangeLog: * gcc.dg/graphite/id-15.c: Use __SIZE_TYPE__ instead of unsigned long. * gcc.dg/plugin/infoleak-net-ethtool-ioctl.c: Likewise.
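A minimal sketch of the portable spelling. __SIZE_TYPE__ is a macro predefined by GCC (and Clang) naming the type behind size_t; the typedef name test_size_t and the helper count_bytes are hypothetical. The static assertion documents the invariant that unsigned long would violate on an LLP64 target such as 64-bit Windows.

```c
#include <assert.h>
#include <stddef.h>

/* __SIZE_TYPE__ expands to the type <stddef.h> uses for size_t, so
   this typedef is right on LP64 (unsigned long) and LLP64
   (unsigned long long) targets alike.  */
typedef __SIZE_TYPE__ test_size_t;

/* On LLP64, unsigned long is only 32 bits wide and would fail this.  */
_Static_assert (sizeof (test_size_t) == sizeof (size_t),
                "__SIZE_TYPE__ must match size_t");

/* Hypothetical helper that uses the typedef as a length type.  */
static test_size_t
count_bytes (const char *s)
{
  test_size_t n = 0;
  while (s[n] != '\0')
    n++;
  return n;
}
```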
2025-04-27	ssa-fre-4.c: Enable for all targets and adjust scan match	H.J. Lu	1	-4/+2
	Since the C frontend no longer promotes char arguments, enable ssa-fre-4.c for all targets and adjust the scan match. PR middle-end/112877 * gcc.dg/tree-ssa/ssa-fre-4.c: Enable for all targets and adjust scan match. Signed-off-by: H.J. Lu <hjl.tools@gmail.com>
2025-04-27	scev-cast.c: Enable for all targets and adjust scan matches	H.J. Lu	1	-3/+2
	Since the C frontend no longer promotes char arguments, enable scev-cast.c for all targets and adjust the scan matches. PR middle-end/112877 * gcc.dg/tree-ssa/scev-cast.c: Enable for all targets and adjust scan match. Signed-off-by: H.J. Lu <hjl.tools@gmail.com>
2025-04-27	vect-simd-clone-1[6-8][cd].c: Expect in-branch clones for x86	H.J. Lu	6	-23/+6
	Since the C frontend no longer promotes char and short arguments, expect in-branch clones for x86. PR middle-end/112877 * gcc.dg/vect/vect-simd-clone-16c.c: Expect in-branch clones for x86. * gcc.dg/vect/vect-simd-clone-16d.c: Likewise. * gcc.dg/vect/vect-simd-clone-17c.c: Likewise. * gcc.dg/vect/vect-simd-clone-17d.c: Likewise. * gcc.dg/vect/vect-simd-clone-18c.c: Likewise. * gcc.dg/vect/vect-simd-clone-18d.c: Likewise. Signed-off-by: H.J. Lu <hjl.tools@gmail.com>
2025-04-25	modulo-sched: reject loop conditions when not decrementing with one [PR 116479]	Andre Vieira	1	-0/+26
	In the commit titled 'doloop: Add support for predicated vectorized loops' the doloop_condition_get function was changed to accept loops with decrements larger than 1. This patch rejects such loops for modulo-sched. gcc/ChangeLog: PR rtl-optimization/116479 * modulo-sched.cc (doloop_register_get): Reject conditions with decrements that are not 1. gcc/testsuite/ChangeLog: * gcc.dg/pr116479.c: New test.
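To see why the size of the decrement matters, the following hedged C sketch counts the trips of a doloop-style counter. It assumes, for illustration only, a consumer that equates the initial counter value with the number of iterations; it is not code from modulo-sched.cc, and both function names are hypothetical. With a decrement of 1 the two agree; with a larger decrement (for instance a predicated vectorized loop that processes four elements per iteration) they do not.

```c
#include <assert.h>

/* Trips of a counter that starts at n, is decremented by step (which
   must be non-zero) each iteration, and exits once it is no longer
   positive.  */
static unsigned
doloop_iterations (unsigned n, unsigned step)
{
  unsigned iters = 0;
  for (long c = (long) n; c > 0; c -= (long) step)
    iters++;
  return iters;
}

/* Hypothetical stand-in for the new check: only a decrement of exactly
   one guarantees that the initial counter equals the trip count.  */
static int
decrement_supported_p (unsigned step)
{
  return step == 1;
}
```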
2025-04-24	c: Allow $@` in GNU23/GNU2Y raw string delimiters [PR110343]	Jakub Jelinek	1	-0/+25
	Aaron mentioned in the PR that N3124 was adopted late in the C23 cycle, so $@` are now part of the basic character set. From what I can see the paper has been implemented in GCC, but we should also allow $@` in GNU23/GNU2Y raw string delimiters, as they are allowed for C++26, because the delimiters can contain any member of the basic character set except space, ()\, horizontal tab, vertical tab, form feed and newline. 2025-04-24 Jakub Jelinek <jakub@redhat.com> PR c++/110343 * lex.cc (lex_raw_string): For C allow $@` in raw string delimiters if CPP_OPTION (pfile, low_ucns) i.e. for C23 and later. * gcc.dg/raw-string-1.c: New test.
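The delimiter rule above can be checked with a short C sketch. It is not libcpp's lex_raw_string; the function names are hypothetical, and the 16-character length limit is carried over from C++ raw strings as an assumption. Once $, @ and ` join the basic character set, every printable ASCII character except space is a member, so the test reduces to a range check plus the three excluded punctuators.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/* Allowed: printable ASCII 0x21..0x7e minus ( ) and backslash.  Space,
   tab, vertical tab, form feed and newline lie outside that range, so
   they are rejected by the range check alone.  */
static bool
raw_delim_char_ok (char c)
{
  unsigned char u = (unsigned char) c;
  if (u < 0x21 || u > 0x7e)
    return false;
  return c != '(' && c != ')' && c != '\\';
}

/* Assumed length limit of 16 characters, as for C++ raw strings.  */
static bool
raw_delim_ok (const char *d)
{
  size_t len = strlen (d);
  if (len > 16)
    return false;
  for (size_t i = 0; i < len; i++)
    if (!raw_delim_char_ok (d[i]))
      return false;
  return true;
}
```

Under this rule, a GNU23 declaration such as const char *s = R"$@`(a "quoted" word)$@`"; passes the delimiter check and should be accepted by a GCC that includes this change.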
2025-04-24	opts.cc Simplify handling of explicit -flto-partition= and ↵	Kyrylo Tkachov	1	-1/+0
	-fipa-reorder-for-locality The handling of an explicit -flto-partition= and -fipa-reorder-for-locality should be simpler. No need to have a new default option. We can use opts_set to check if -flto-partition is explicitly set and use that information in the error handling. Remove -flto-partition=default and update accordingly. Bootstrapped and tested on aarch64-none-linux-gnu. Signed-off-by: Kyrylo Tkachov <ktkachov@nvidia.com> gcc/ * common.opt (LTO_PARTITION_DEFAULT): Delete. (flto-partition=): Change default back to balanced. * flag-types.h (lto_partition_model): Remove LTO_PARTITION_DEFAULT. * opts.cc (validate_ipa_reorder_locality_lto_partition): Check opts_set->x_flag_lto_partition instead of LTO_PARTITION_DEFAULT. (finish_options): Remove handling of LTO_PARTITION_DEFAULT. gcc/testsuite/ * gcc.dg/completion-2.c: Remove check for default.
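The opts_set pattern this commit switches to can be modeled as a pair of structs: one holding option values and one recording which options were given explicitly. A minimal sketch: the struct, enum and function names are hypothetical stand-ins for gcc_options and its opts_set companion, and the conflict rule (locality reordering versus any explicit -flto-partition=) is an assumption for illustration. What it shows is that an explicit -flto-partition=balanced can be told apart from the balanced default without a separate DEFAULT enumerator.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical, trimmed-down partition models.  */
enum partition_model { PARTITION_ONE, PARTITION_BALANCED, PARTITION_MAX };

/* Option values, like gcc_options.  */
struct options
{
  enum partition_model lto_partition;
  bool ipa_reorder_for_locality;
};

/* Non-zero means "given on the command line", like opts_set.  */
struct options_set
{
  int lto_partition;
  int ipa_reorder_for_locality;
};

/* Defaults: balanced partitioning, no locality reordering, nothing
   set explicitly.  */
static void
init_options (struct options *opts, struct options_set *set)
{
  opts->lto_partition = PARTITION_BALANCED;
  opts->ipa_reorder_for_locality = false;
  set->lto_partition = 0;
  set->ipa_reorder_for_locality = 0;
}

/* Assumed rule: locality reordering conflicts with any explicit
   -flto-partition=, even one naming the default value.  Returns true
   when an error should be reported.  */
static bool
reorder_locality_conflict_p (const struct options *opts,
                             const struct options_set *set)
{
  return opts->ipa_reorder_for_locality && set->lto_partition;
}
```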
2025-04-23	Enable ip-cp cloning over non-hot edges	Jan Hubicka	2	-0/+60
	Currently enabling profile feedback regresses x264 and exchange. In both cases the root of the issue is that the ipa-cp cost model thinks cloning is not relevant when feedback is available, while it clones without feedback. Consider:
__attribute__ ((used)) int a[1000];
__attribute__ ((noinline)) void
test2 (int sz)
{
  for (int i = 0; i < sz; i++)
    a[i]++;
  asm volatile (""::"m"(a));
}
__attribute__ ((noinline)) void
test1 (int sz)
{
  for (int i = 0; i < 1000; i++)
    test2 (sz);
}
int
main ()
{
  test1 (1000);
  return 0;
}
Here we want to clone both test1 and test2 and specialize them for 1000, but ipa-cp will not do that, since it skips the call main->test1 as not hot: it is executed just once, both with and without profile feedback. In this simple testcase, even without profile feedback we track that main is called once. I think the testcase shows that the hotness of a call is not that relevant when deciding whether we want to propagate constants across it. With an IPA profile, ipa-cp can compute an overall estimate of the time saved (the existing time benefit, which estimates the time saved per invocation of the function, multiplied by the number of executions) and check whether the result is big enough. An easy check is to simply call maybe_hot_p on the resulting count. So this patch makes ipa-cp consider all call sites as interesting, except those known to be unlikely executed (i.e. run 0 times in the train run or known to lead to something bad), which makes ipa-cp propagate across them, find cloning candidates and feed them into good_cloning_opportunity_p. For this I added cs_interesting_for_ipcp_p, which also attempts to do the right thing with partial training. Now good_cloning_opportunity_p will currently return false, since it will figure out that the call edge is not very frequent. It already kind of knows that the frequency of the call instruction itself is not too important, but instead of computing the overall time saved, it tries to compare it with the param_ipa_cp_profile_count_base percentage of the counts of call edges.
I think this is not very relevant, since the estimated time saved per call can be large. So I dropped this logic and replaced it with a simple use of the overall saved time. Since ipa-cp does not deal well with cases where it hits the allowed unit growth limit, we probably want to be more careful, so I keep the existing metric as well. So now we get:
Evaluating opportunities for test1/3.
 - considering value 1000 for param #0 sz (caller_count: 1)
     good_cloning_opportunity_p (time: 1, size: 8, count_sum: 1 (precise), overall time saved: 1 (adjusted))
        -> evaluation: 0.12, threshold: 500
     not cloning: time saved is not hot
     good_cloning_opportunity_p (time: 129001, size: 20, count_sum: 1 (precise), overall time saved: 129001 (adjusted))
        -> evaluation: 6450.05, threshold: 500
The first call to good_cloning_opportunity_p considers the case where only test1 is cloned. In this case the time saved is 1 (for passing the value around), and since it is called just once (count_sum), the overall time saved is 1, which is not considered hot, and we also get a very low evaluation score. In the second call we consider cloning the chain test1->test2. In this case the time saved is large (129001), since test2 is invoked many times and the value is used to control the loop. We still know that the count is 1, but the overall time saved is 129001, which is already considered relevant, and we clone. I also try to do something sensible when we have calls both with and without IPA profile (which can happen for comdats whose profile went missing, or with LTO if some units were not trained). Instead of checking whether the sum of calls with known profile is nonzero, I keep track of whether there are other calls and, if so, also try the local heuristics that are used without profile feedback. The patch improves SPECint with -Ofast -fprofile-use by approximately 1% by speeding up x264 from 99.3s to 91.3s (9%) and exchange from 99.7s to 95.5s (3.3%). We still get better runtimes without profile (86.4s for x264 and 93.8s for exchange).
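The numbers in the dump can be reproduced with simple arithmetic. In both lines the printed evaluation equals time divided by size (1 / 8 = 0.125, shown as 0.12; 129001 / 20 = 6450.05), and the overall time saved is the per-invocation saving multiplied by the call count. The C sketch below encodes that reading of the dump; it is an inference from these two lines, not the formula in ipa-cp.cc, and the function names and the >= comparison are assumptions.

```c
#include <assert.h>
#include <stdbool.h>

/* Evaluation as it appears in the dump: time benefit per unit of
   size cost.  */
static double
cloning_evaluation (double time_benefit, double size_cost)
{
  return time_benefit / size_cost;
}

/* Assumed decision rule: clone when the evaluation reaches the
   threshold printed in the dump.  */
static bool
worth_cloning_p (double time_benefit, double size_cost, double threshold)
{
  return cloning_evaluation (time_benefit, size_cost) >= threshold;
}

/* Overall time saved as the message describes it: time saved per
   invocation multiplied by the number of executions.  */
static double
overall_time_saved (double time_per_call, double count)
{
  return time_per_call * count;
}
```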
The main problem I see is that ipa-cp has a global growth limit of 10% but does not consider the opportunities in priority order. Consequently, if the limit is hit, some cloning opportunities are effectively dropped at random in favour of others. I dumped unit size changes with an -flto -Ofast build of SPEC2017. Without the patch I get:
   orig      new      growth
 588677   605385  102.838229
   4378     6037  137.894016
 484650   494851  102.104818
   4111     4111  100.000000
  99953   103519  103.567677
 106181   114889  108.201091
  21389    21597  100.972462
  24925    26746  107.305918
  15308    23974  156.610922
  27354    27906  102.017986
    494      494  100.000000
   4631     4631  100.000000
 863216   872729  101.102042
 126604   126604  100.000000
 605138   627156  103.638509
   4112     4112  100.000000
 222006   231293  104.183220
   2952     3384  114.634146
  37584    39807  105.914751
   4111     4111  100.000000
  13226    13226  100.000000
   4111     4111  100.000000
 326215   337396  103.427494
  25240    25433  100.764659
  64644    65972  102.054328
 127223   132300  103.990631
    494      494  100.000000
Small units can grow up to 16000 instructions and the other units are large. So there is only one 156% growth hitting the limits, which is exchange, whose recursive cloning is handled specially.
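The growth column is simply the new size as a percentage of the original. The sketch below computes it and applies a hypothetical cap model built from the two numbers in the text: 10% growth per unit, with small units allowed to reach 16000 instructions. The formula is an assumption, not ipa-cp.cc's code; under it, the 15308 to 23974 row (156.6%) is the only one in the no-FDO table above that exceeds the cap.

```c
#include <assert.h>
#include <stdbool.h>

/* The growth column: new size as a percentage of the original.  */
static double
growth_percent (long orig, long new_size)
{
  return 100.0 * (double) new_size / (double) orig;
}

/* Hypothetical cap: a unit may grow by 10%, but small units may grow
   up to 16000 instructions.  Both constants come from the commit
   text; the exact rule in ipa-cp.cc may differ.  */
static bool
exceeds_growth_cap_p (long orig, long new_size)
{
  double cap = (double) orig * 1.10;
  if (cap < 16000.0)
    cap = 16000.0;
  return (double) new_size > cap;
}
```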
With profile feedback, ipa-cp basically shuts itself off:
   orig      new      growth
 333815   333891  100.022767
   2559     2974  116.217272
 217576   217581  100.002298
   2749     2749  100.000000
  64652    64716  100.098992
  68416    69707  101.886986
  13171    13171  100.000000
  11849    11849  100.000000
  10519    16180  153.816903
  15843    15843  100.000000
    231      231  100.000000
   3624     3624  100.000000
 573385   573386  100.000174
  97623    97623  100.000000
 295673   295676  100.001015
   2750     2750  100.000000
 130723   130726  100.002295
   2334     2334  100.000000
  19313    19313  100.000000
   2749     2749  100.000000
 517331   517331  100.000000
   6707     6707  100.000000
   2749     2749  100.000000
 193638   193638  100.000000
  16425    16425  100.000000
  47154    47154  100.000000
  96422    96422  100.000000
    231      231  100.000000
So we essentially clone only exchange and mcf (116%). With the patch and no FDO I get:
   orig      new      growth
 588677   605385  102.838229
   4378     6037  137.894016
 484519   494698  102.100846
   4111     4111  100.000000
  99953   103519  103.567677
 106181   114889  108.201091
  21389    22632  105.811398
  24854    26620  107.105496
  15308    23974  156.610922
  27354    28039  102.504204
    494      494  100.000000
   4631     4631  100.000000
   4631     4631  100.000000
 126604   126630  100.020536
   4112     4112  100.000000
 222006   231293  104.183220
   2952     3384  114.634146
  37584    39807  105.914751
2760715  2835539  102.710312
   4111     4111  100.000000
  13226    13226  100.000000
   4111     4111  100.000000
 326215   337396  103.427494
  25240    25433  100.764659
  64644    65972  102.054328
 127223   132300  103.990631
    494      494  100.000000
which seems essentially the same as without the patch.
However, with FDO I get:
   orig      new      growth
 333815   350363  104.957237
   2559     3345  130.715123
 217469   220765  101.515618
 485599   488772  100.653420
   2749     2749  100.000000
  64652    74265  114.868836
  68416    87484  127.870674
  13171    20656  156.829398
  11792    11990  101.679104
  10519    17028  161.878506
  15843    16119  101.742094
    231      231  100.000000
 573336   573336  100.000000
  97623    97623  100.000000
 295497   296208  100.240612
   2750     2750  100.000000
 130723   133341  102.002708
   2334     2334  100.000000
  19313    19368  100.284782
   2749     2749  100.000000
   6707     6755  100.715670
   2749     2749  100.000000
 193638   194712  100.554643
  16425    17377  105.796043
  47154    47154  100.000000
  96422    96422  100.000000
    231      231  100.000000
So here we get 114% and 127% growth in x264 (two different binaries), 56% growth in deepsjeng and 61% growth in exchange, all of which are above the 10% cutoff. Bootstrapped/regtested x86_64-linux. gcc/ChangeLog: * ipa-cp.cc (base_count): Remove. (struct caller_statistics): Rename n_hot_calls to n_interesting_calls; add called_without_ipa_profile. (init_caller_stats): Update. (cs_interesting_for_ipcp_p): New function. (gather_caller_stats): Collect n_interesting_calls and called_without_ipa_profile. (ipcp_cloning_candidate_p): Use n_interesting_calls rather than hot. (good_cloning_opportunity_p): Rewrite heuristics when IPA profile is present. (estimate_local_effects): Update. (value_topo_info::propagate_effects): Update. (compare_edge_profile_counts): Remove. (ipcp_propagate_stage): Do not collect base_count. (get_info_about_necessary_edges): Record whether function is called without profile. (decide_about_value): Update. (ipa_cp_cc_finalize): Do not initialize base_count. * profile-count.cc (profile_count::operator): New. (profile_count::operator=): New. * profile-count.h (profile_count::operator): Declare. (profile_count::operator=): Declare. * params.opt: Remove ipa-cp-profile-count-base. * doc/invoke.texi: Likewise.
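The call-site filter and the mixed-profile bookkeeping described in the message can be sketched as follows. This is a simplified model under stated assumptions: the struct and function names are hypothetical, and the real cs_interesting_for_ipcp_p and gather_caller_stats work on cgraph edges and profile_count qualities that are not reproduced here. A call is treated as interesting unless it is known to be unlikely, and the statistics remember whether any caller lacks an IPA profile so the non-FDO heuristics can also be tried.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, simplified call-edge record.  */
struct call_edge
{
  bool has_ipa_profile; /* count comes from IPA profile feedback */
  long count;           /* execution count from the train run */
  bool known_unlikely;  /* e.g. known to lead to cold/noreturn code */
};

/* Idea behind cs_interesting_for_ipcp_p: every call site is
   interesting unless it is known to be unlikely, i.e. it ran zero
   times in the train run or is otherwise known to be cold.  */
static bool
edge_interesting_p (const struct call_edge *e)
{
  if (e->known_unlikely)
    return false;
  if (e->has_ipa_profile && e->count == 0)
    return false;
  return true;
}

struct caller_stats
{
  int n_interesting_calls;
  bool called_without_ipa_profile;
};

/* Idea behind the gather_caller_stats change: count interesting
   callers and note whether any caller lacks an IPA profile.  */
static struct caller_stats
gather_stats (const struct call_edge *edges, size_t n)
{
  struct caller_stats s = { 0, false };
  for (size_t i = 0; i < n; i++)
    {
      if (edge_interesting_p (&edges[i]))
        s.n_interesting_calls++;
      if (!edges[i].has_ipa_profile)
        s.called_without_ipa_profile = true;
    }
  return s;
}
```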
2025-04-23	testsuite: AMDGCN test for vect-early-break_38.c as well to consistent ↵	Tamar Christina	1	-0/+1
	architecture [PR119286] I had missed this one during the AMDGCN test failures. Like vect-early-break_18.c, this test also scalarizes the loads, leading to unexpected vectorization for this testcase. gcc/testsuite/ChangeLog: PR target/119286 * gcc.dg/vect/vect-early-break_38.c: Force -march=gfx908 for amdgcn.