riscv-gnu-toolchain/gcc.git

Age	Commit message	Author	Files	Lines
2024-09-04	c++: add a testcase for [PR 108620]	Arsen Arsenović	1	-0/+95
	Fixed by r15-2540-g32e678b2ed7521. Add a testcase, as the original ones do not cover this particular failure mode. gcc/testsuite/ChangeLog: PR c++/108620 * g++.dg/coroutines/pr108620.C: New test.
2024-09-04	coros: mark .CO_YIELD as LEAF [PR106973]	Arsen Arsenović	2	-1/+23
	We rely on .CO_YIELD calls being followed by an assignment (optionally) and then a switch/if in the same basic block. This implies that a .CO_YIELD can never end a block. However, since a call to .CO_YIELD is still a call, if the function containing it calls setjmp, GCC thinks that the .CO_YIELD can introduce abnormal control flow, and generates an edge for the call. We know this is not the case; .CO_YIELD calls get removed quite early on and have no effect, and result in no other calls, so .CO_YIELD can be considered a leaf function, preventing generating an edge when calling it. PR c++/106973 - coroutine generator and setjmp PR c++/106973 gcc/ChangeLog: * internal-fn.def (CO_YIELD): Mark as ECF_LEAF. gcc/testsuite/ChangeLog: * g++.dg/coroutines/pr106973.C: New test.
2024-09-04	object-size: Use simple_dce_from_worklist in object-size pass	Andrew Pinski	1	-1/+8
	While trying to see if there was a way to improve object-size pass to use the ranger (for pointer plus), I noticed that it leaves around the statement containing __builtin_object_size if it was reduced to a constant. This fixes that by using simple_dce_from_worklist. Bootstrapped and tested on x86_64-linux-gnu. gcc/ChangeLog: * tree-object-size.cc (object_sizes_execute): Mark lhs for maybe dceing if doing a propagate. Call simple_dce_from_worklist. Signed-off-by: Andrew Pinski <quic_apinski@quicinc.com>
2024-09-04	Use dg-additional-options for gfortran.dg/vect/vect-8.f90 and RISC-V	Richard Biener	1	-1/+1
	r14-9122-g67a29f99cc8138 disabled scheduling on a lot of testcases for RISC-V for PR113249 but using dg-options. This makes gfortran.dg/vect/vect-8.f90 UNRESOLVED as it relies on default flags to enable vectorization. The following uses dg-additional-options instead. Tested on riscv64-linux with qemu-user, pushed. I didn't check all the other adjusted tests for similar issues. * gfortran.dg/vect/vect-8.f90: Use dg-additional-options.
2024-09-04	nvptx: Use 'enum ptx_version', 'enum ptx_isa' instead of 'int'	Thomas Schwinge	6	-22/+37
	This allows getting rid of the respective type casts. No change in behavior intended. gcc/ * config/nvptx/gen-opt.sh: Use 'enum ptx_isa' instead of 'int'. * config/nvptx/nvptx-gen.opt: Regenerate. * config/nvptx/nvptx.opt: Use 'enum ptx_version' instead of 'int'. * config/nvptx/nvptx-opts.h (enum ptx_isa): Add 'PTX_ISA_unset'. (enum ptx_version): Add 'PTX_VERSION_unset'. * config/nvptx/nvptx-c.cc (nvptx_cpu_cpp_builtins): Adjust. * config/nvptx/nvptx.cc (default_ptx_version_option) (handle_ptx_version_option, nvptx_option_override) (nvptx_file_start): Likewise.
2024-09-04	Fix branch prediction dump message	Frederik Harwath	1	-1/+1
	Instead of, for instance, "Loop got predicted 1 to iterate 10 times" the message should be "Loop 1 got predicted to iterate 10 times". gcc/ChangeLog: * predict.cc (pass_profile::execute): Fix dump message. Co-authored-by: Thomas Schwinge <tschwinge@baylibre.com>
2024-09-04	Fix gimple_debug_cfg declaration	Frederik Harwath	1	-1/+1
	Silence a warning. The argument type did not match the definition. gcc/ChangeLog: * tree-cfg.h (gimple_debug_cfg): Change argument type from int to dump_flags_t.
2024-09-04	Document 'pass_postreload' vs. 'pass_late_compilation'	Thomas Schwinge	2	-1/+16
	See Subversion r217124 (Git commit 433e4164339f18d0b8798968444a56b681b5232c) "Reorganize post-ra pipeline for targets without register allocation". gcc/ * passes.cc: Document 'pass_postreload' vs. 'pass_late_compilation'. * passes.def: Likewise.
2024-09-04	nvptx: Specify '-mno-alias' for 'gcc.dg/pr60797.c' [PR60797, PR104957]	Thomas Schwinge	1	-2/+4
	2014 Subversion r209299 (Git commit 8330537b5b58bd0532a0a49f9cbd59bf526a7847) "Fix PR60797" added this test case, which we now amend so that it's able to test its thing also in '--target=nvptx-none' configurations with symbol alias support enabled (..., and test nvptx '-mno-alias'). PR middle-end/60797 PR target/104957 gcc/testsuite/ * gcc.dg/pr60797.c: For nvptx, specify '-mno-alias'.
2024-09-04	Add 'gcc.target/nvptx/alias-to-alias-1.c'	Thomas Schwinge	1	-0/+27
	... similar to alias to alias usage in 'libgomp.c-c++-common/pr96390.c'. PR target/104957 gcc/testsuite/ * gcc.target/nvptx/alias-to-alias-1.c: New.
2024-09-04	Add 'gcc.target/nvptx/alias-weak-1.c'	Thomas Schwinge	1	-0/+10
	... testing for the GCC/nvptx "weak alias definitions not supported" error diagnostic (limitation of PTX). gcc/testsuite/ * gcc.target/nvptx/alias-weak-1.c: New.
2024-09-04	rust: avoid clobbering LIBS	Marc Poulhiès	2	-14/+16
	Save LIBS around calls to AC_SEARCH_LIBS to avoid clobbering $LIBS. ChangeLog: * configure: Regenerate. * configure.ac: Save LIBS around calls to AC_SEARCH_LIBS. Signed-off-by: Marc Poulhiès <dkm@kataplop.net> Reviewed-by: Thomas Schwinge <tschwinge@baylibre.com> Tested-by: Thomas Schwinge <tschwinge@baylibre.com>
2024-09-04	Also lower SLP grouped loads with just one consumer	Richard Biener	3	-20/+39
	This makes sure to produce interleaving schemes or load-lanes for single-element interleaving and other permutes that otherwise would use more than three vectors. It exposes the latent issue that single-element interleaving with large gaps can be inefficient - the mitigation in get_group_load_store_type doesn't trigger when we clear the load permutation. It also exposes the fact that not all permutes can be lowered in the best way in a vector length agnostic way so I've added an exception to keep power-of-two size contiguous aligned chunks unlowered (unless we want load-lanes). The optimal handling of load/store vectorization is going to continue to be a learning process. * tree-vect-slp.cc (vect_lower_load_permutations): Also process single-use grouped loads. Avoid lowering contiguous aligned power-of-two sized chunks, those are better handled by the vector size specific SLP code generation. * tree-vect-stmts.cc (get_group_load_store_type): Drop the unrelated requirement of a load permutation for the single-element interleaving limit. * gcc.dg/vect/slp-46.c: Remove XFAIL.
2024-09-04	Zen5 tuning part 5: update instruction latencies in x86-tune-costs	Jan Hubicka	1	-7/+21
	There is nothing exciting in this patch. I measured latencies and also compared them with the newly released optimization guide. There are no dramatic changes compared to zen4. One interesting new bit is that addss is faster and can be 2 cycles when fed by another addss. I also increased the large insn bound since decoders no longer seem to require instructions to be 8 bytes or less. gcc/ChangeLog: * config/i386/x86-tune-costs.h (znver5_cost): Update instruction costs.
2024-09-03	expand: Add dump for costing of positive divides	Andrew Pinski	1	-0/+7
	While trying to understand PR 115910 I found it was useful to print out the two costs of doing a signed and unsigned division just like was added in r15-3272-g3c89c41991d8e8 for popcount==1. Bootstrapped and tested on x86_64-linux-gnu. gcc/ChangeLog: * expr.cc (expand_expr_divmod): Add dump of the two costs for positive division. Signed-off-by: Andrew Pinski <quic_apinski@quicinc.com>
2024-09-04	CRIS: Add new peephole2 "lra_szext_decomposed_indir_plus"	Hans-Peter Nilsson	1	-0/+45
	Exposed when running the test-suite with -flate-combine-instructions. * config/cris/cris.md (lra_szext_decomposed_indir_plus): New peephole2 pattern.
2024-09-04	RISC-V: Allow IMM operand for unsigned scalar .SAT_ADD	Pan Li	8	-9/+9
	This patch allows an IMM operand for the unsigned scalar .SAT_ADD. Like operand 0, operand 1 of .SAT_ADD will be zero extended to Xmode before the underlying code generation. The following test suite passed for this patch: * The rv64gcv full regression test. gcc/ChangeLog: * config/riscv/riscv.cc (riscv_expand_usadd): Zero extend the second operand of usadd as the first operand does. * config/riscv/riscv.md (usadd<m>3): Allow imm operand for scalar usadd pattern. gcc/testsuite/ChangeLog: * gcc.target/riscv/sat_u_add-11.c: Make asm check robust. * gcc.target/riscv/sat_u_add-15.c: Ditto. * gcc.target/riscv/sat_u_add-19.c: Ditto. * gcc.target/riscv/sat_u_add-23.c: Ditto. * gcc.target/riscv/sat_u_add-3.c: Ditto. * gcc.target/riscv/sat_u_add-7.c: Ditto. Signed-off-by: Pan Li <pan2.li@intel.com>
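The kind of source that an unsigned scalar saturating add with a constant operand comes from can be sketched as follows. This is only an illustration: the function names are hypothetical, and whether a given GCC version and target actually matches this form to .SAT_ADD is an assumption, not something the commit message states.

```cpp
#include <cstdint>

// Branchless unsigned saturating add. If x + y wraps around, the
// truncated sum is smaller than x, the comparison yields 1, and OR-ing
// with -1 (all ones) clamps the result to the maximum value.
inline std::uint8_t sat_u_add_u8(std::uint8_t x, std::uint8_t y) {
    std::uint8_t sum = static_cast<std::uint8_t>(x + y);
    return static_cast<std::uint8_t>(sum | -(sum < x));
}

// Same operation with a constant second operand, i.e. the "IMM operand"
// case (the constant 9 is an arbitrary choice for illustration).
inline std::uint8_t sat_u_add_imm9_u8(std::uint8_t x) {
    return sat_u_add_u8(x, 9);
}
```

Calling `sat_u_add_imm9_u8(250)` saturates to 255, while `sat_u_add_imm9_u8(10)` returns the plain sum 19.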
2024-09-03	aarch64: Fix testcase vec-init-22-speed.c [PR116589]	Andrew Pinski	1	-1/+1
	For this testcase, the trunk produces: ``` f_s16: fmov s31, w0 fmov s0, w1 ``` While the testcase was expecting what was produced in GCC 14: ``` f_s16: sxth w0, w0 sxth w1, w1 fmov d31, x0 fmov d0, x1 ``` After r15-1575-gea8061f46a30 the code was: ``` dup v31.4h, w0 dup v0.4h, w1 ``` But when ext-dce was added with r15-1901-g98914f9eba5f19, we get the better code generation now and only fmov's. Pushed as obvious after running the testcase. PR target/116589 gcc/testsuite/ChangeLog: * gcc.target/aarch64/vec-init-22-speed.c: Update scan for better code gen. Signed-off-by: Andrew Pinski <quic_apinski@quicinc.com>
2024-09-03	split-path: Improve ifcvt heuristic for split path [PR112402]	Andrew Pinski	7	-122/+88
	This simplifies the heuristic for split path to see if the join bb is an ifcvt candidate. The predecessor bbs need either to be empty or to have only one statement in them, which could be a decent ifcvt candidate. The previous heuristics would miss that: ``` if (a) goto B else goto C; B: goto C; C: c = PHI<d,e> ``` would be a decent ifcvt candidate. They would also miss: ``` if (a) goto B else goto C; B: d = f + 1; goto C; C: c = PHI<d,e> ``` Also, since currently the max number of cmovs that can be produced is 3, we should only assume `<= 3` phis can be ifcvt candidates. The testcase change for split-path-6.c is that the lookharder function is a true ifcvt case where we would get cmov as expected; it looks like it was not a candidate when the heuristic was added but became one later on. pr88797.C is now rejected via it being an ifcvt candidate rather than being about DCE/const prop. The rest of the testsuite changes are just slight changes in the dump, removing the "diamond" part as it was removed from the print. Bootstrapped and tested on x86_64. PR tree-optimization/112402 gcc/ChangeLog: * gimple-ssa-split-paths.cc (poor_ifcvt_pred): New function. (is_feasible_trace): Remove old heuristics for ifcvt cases. For num_stmts <=1 for both pred check poor_ifcvt_pred on both pred. gcc/testsuite/ChangeLog: * gcc.dg/tree-ssa/split-path-11.c: Update scan. * gcc.dg/tree-ssa/split-path-2.c: Update scan. * gcc.dg/tree-ssa/split-path-5.c: Update scan. * gcc.dg/tree-ssa/split-path-6.c: Update scan. * g++.dg/tree-ssa/pr88797.C: Update scan. * gcc.dg/tree-ssa/split-path-13.c: New test. Signed-off-by: Andrew Pinski <quic_apinski@quicinc.com>
2024-09-03	split-paths: Move check for # of statements in join earlier	Andrew Pinski	1	-6/+13
	This moves the check for # of statements to copy in join to be the first check. This check is the cheapest check so it should be first. Also add a print to the dump file since there was none beforehand. gcc/ChangeLog: * gimple-ssa-split-paths.cc (is_feasible_trace): Move check for # of statements in join earlier and add a debug print. Signed-off-by: Andrew Pinski <quic_apinski@quicinc.com>
2024-09-03	Explicitly document that the "counted_by" attribute is only supported in C.	Qing Zhao	4	-1/+37
	The "counted_by" attribute is currently only supported in C; mention this explicitly in the documentation and also issue a warning when the "counted_by" attribute is seen in C++ with -Wattributes. gcc/c-family/ChangeLog: * c-attribs.cc (handle_counted_by_attribute): Is ignored and issues warning with -Wattributes in C++ for now. gcc/ChangeLog: * doc/extend.texi: Explicitly mentions counted_by is available only in C for now. gcc/testsuite/ChangeLog: * g++.dg/ext/flex-array-counted-by.C: New test. * g++.dg/ext/flex-array-counted-by-2.C: New test.
2024-09-03	c++: support C++11 attributes in C++98	Jason Merrill	7	-17/+17
	I don't see any reason why we can't allow the [[]] attribute syntax in C++98 mode with a pedwarn just like many other C++11 features. In fact, we already do support it in some places in the grammar, but not in places that check cp_nth_tokens_can_be_std_attribute_p. Let's also follow the C front-end's lead in only warning about them when -pedantic. It still isn't necessary for this function to guard against Objective-C message passing syntax; we handle that with tentative parsing in cp_parser_statement, and we don't call this function in that context anyway. gcc/cp/ChangeLog: * parser.cc (cp_nth_tokens_can_be_std_attribute_p): Don't check cxx_dialect. * error.cc (maybe_warn_cpp0x): Only complain about C++11 attributes if pedantic. gcc/testsuite/ChangeLog: * g++.dg/cpp0x/gen-attrs-1.C: Also run in C++98 mode. * g++.dg/cpp0x/gen-attrs-11.C: Likewise. * g++.dg/cpp0x/gen-attrs-13.C: Likewise. * g++.dg/cpp0x/gen-attrs-15.C: Likewise. * g++.dg/cpp0x/gen-attrs-75.C: Don't expect C++98 warning after __extension__.
2024-09-03	PR116080: Fix test suite checks for musttail	Andi Kleen	12	-20/+38
	This is a new attempt to fix PR116080. The previous try was reverted because it just broke a bunch of tests, hiding the problem. - musttail behaves differently than tailcall at -O0. Some of the tests run at -O0, so add separate effective target tests for musttail. - New effective target tests need to use unique file names to make dejagnu caching work - Change the tests to use new targets - Add an external_musttail test to check for the target's ability to do tail calls between translation units. This covers some powerpc ABIs. gcc/testsuite/ChangeLog: PR testsuite/116080 * c-c++-common/musttail1.c: Use musttail target. * c-c++-common/musttail12.c: Use struct_musttail target. * c-c++-common/musttail2.c: Use musttail target. * c-c++-common/musttail3.c: Likewise. * c-c++-common/musttail4.c: Likewise. * c-c++-common/musttail7.c: Likewise. * c-c++-common/musttail8.c: Likewise. * g++.dg/musttail10.C: Likewise. Replace powerpc checks with external_musttail. * g++.dg/musttail11.C: Use musttail target. * g++.dg/musttail6.C: Use musttail target. Replace powerpc checks with external_musttail. * g++.dg/musttail9.C: Use musttail target. * lib/target-supports.exp: Add musttail, struct_musttail, external_musttail targets. Remove optimization for musttail. Use unique file names for musttail.
2024-09-03	pretty-print: split up pretty_printer::format into subroutines	David Malcolm	3	-112/+131
	The body of pretty_printer::format is almost 500 lines long, mostly comprising two distinct phases. This patch splits it up so that there are explicit subroutines for the two different phases, reducing the scope of various locals, and making it easier to e.g. put a breakpoint on phase 2. No functional change intended. gcc/ChangeLog: * pretty-print-markup.h (pp_markup::context::context): Drop params "buf" and "chunk_idx", initializing m_buf from pp. (pp_markup::context::m_chunk_idx): Drop field. * pretty-print.cc (pretty_printer::format): Convert param from a text_info * to a text_info &. Split out phase 1 and phase 2 into subroutines... (format_phase_1): New, from pretty_printer::format. (format_phase_2): Likewise. * pretty-print.h (pretty_printer::format): Convert param from a text_info * to a text_info &. (pp_format): Update for above change. Assert that text_info is non-null. Signed-off-by: David Malcolm <dmalcolm@redhat.com>
2024-09-03	pretty-print: add selftest of pp_format's stack	David Malcolm	2	-0/+81
	gcc/ChangeLog: * pretty-print-format-impl.h (pp_formatted_chunks::get_prev): New accessor. * pretty-print.cc (selftest::push_pp_format): New. (ASSERT_TEXT_TOKEN): New macro. (selftest::test_pp_format_stack): New test. (selftest::pretty_print_cc_tests): New. Signed-off-by: David Malcolm <dmalcolm@redhat.com>
2024-09-03	pretty-print: naming cleanups	David Malcolm	14	-160/+182
	This patch is a followup to r15-3311-ge31b6176996567 making some cleanups to pretty-printing to reflect those changes: - renaming "chunk_info" to "pp_formatted_chunks" - renaming "cur_chunk_array" to "m_cur_formatted_chunks" - rewording/clarifying comments and taking the opportunity to add a "m_" prefix to all fields of output_buffer. No functional change intended. gcc/analyzer/ChangeLog: * analyzer-logging.cc (logger::logger): Prefix all output_buffer fields with "m_". gcc/c-family/ChangeLog: * c-ada-spec.cc (dump_ada_node): Prefix all output_buffer fields with "m_". * c-pretty-print.cc (pp_c_integer_constant): Likewise. (pp_c_integer_constant): Likewise. (pp_c_floating_constant): Likewise. (pp_c_fixed_constant): Likewise. gcc/c/ChangeLog: * c-objc-common.cc (print_type): Prefix all output_buffer fields with "m_". gcc/cp/ChangeLog: * error.cc (type_to_string): Prefix all output_buffer fields with "m_". (append_formatted_chunk): Likewise. Rename "chunk_info" to "pp_formatted_chunks" and field cur_chunk_array with m_cur_formatted_chunks. gcc/fortran/ChangeLog: * error.cc (gfc_move_error_buffer_from_to): Prefix all output_buffer fields with "m_". (gfc_diagnostics_init): Likewise. gcc/ChangeLog: * diagnostic.cc (diagnostic_set_caret_max_width): Prefix all output_buffer fields with "m_". * dumpfile.cc (emit_any_pending_textual_chunks): Likewise. (emit_any_pending_textual_chunks): Likewise. * gimple-pretty-print.cc (gimple_dump_bb_buff): Likewise. * json.cc (value::dump): Likewise. * pretty-print-format-impl.h (class chunk_info): Rename to... (class pp_formatted_chunks): ...this. Add friend class output_buffer. Update comment near end of decl to show the pp_formatted_chunks instance on the chunk_obstack. (pp_formatted_chunks::pop_from_output_buffer): Delete decl. (pp_formatted_chunks::on_begin_quote): Delete decl that should have been removed in r15-3311-ge31b6176996567. (pp_formatted_chunks::on_end_quote): Likewise. (pp_formatted_chunks::m_prev): Update for renaming.
* pretty-print.cc (output_buffer::output_buffer): Prefix all fields with "m_". Rename "cur_chunk_array" to "m_cur_formatted_chunks". (output_buffer::~output_buffer): Prefix all fields with "m_". (output_buffer::push_formatted_chunks): New. (output_buffer::pop_formatted_chunks): New. (pp_write_text_to_stream): Prefix all output_buffer fields with "m_". (pp_write_text_as_dot_label_to_stream): Likewise. (pp_write_text_as_html_like_dot_to_stream): Likewise. (chunk_info::append_formatted_chunk): Rename to... (pp_formatted_chunks::append_formatted_chunk): ...this. (chunk_info::pop_from_output_buffer): Delete. (pretty_printer::format): Update leading comment to mention pushing pp_formatted_chunks, and to reflect changes in r15-3311-ge31b6176996567. Prefix all output_buffer fields with "m_". (pp_output_formatted_text): Update leading comment to mention popping a pp_formatted_chunks, and to reflect the changes in r15-3311-ge31b6176996567. Prefix all output_buffer fields with "m_" and rename "cur_chunk_array" to "m_cur_formatted_chunks". Replace call to chunk_info::pop_from_output_buffer with a call to output_buffer::pop_formatted_chunks. (pp_flush): Prefix all output_buffer fields with "m_". (pp_really_flush): Likewise. (pp_clear_output_area): Likewise. (pp_append_text): Likewise. (pretty_printer::remaining_character_count_for_line): Likewise. (pp_newline): Likewise. (pp_character): Likewise. (pp_markup::context::push_back_any_text): Likewise. * pretty-print.h (class chunk_info): Rename to... (class pp_formatted_chunks): ...this. (class output_buffer): Delete unimplemented rule-of-5 members. (output_buffer::push_formatted_chunks): New decl. (output_buffer::pop_formatted_chunks): New decl. (output_buffer::formatted_obstack): Rename to... (output_buffer::m_formatted_obstack): ...this. (output_buffer::chunk_obstack): Rename to... (output_buffer::m_chunk_obstack): ...this. (output_buffer::obstack): Rename to... (output_buffer::m_obstack): ...this. 
(output_buffer::cur_chunk_array): Rename to... (output_buffer::m_cur_formatted_chunks): ...this. (output_buffer::stream): Rename to... (output_buffer::m_stream): ...this. (output_buffer::line_length): Rename to... (output_buffer::m_line_length): ...this. (output_buffer::digit_buffer): Rename to... (output_buffer::m_digit_buffer): ...this. (output_buffer::flush_p): Rename to... (output_buffer::m_flush_p): ...this. (output_buffer_formatted_text): Prefix all output_buffer fields with "m_". (output_buffer_append_r): Likewise. (output_buffer_last_position_in_text): Likewise. (pretty_printer::set_output_stream): Likewise. (pp_scalar): Likewise. (pp_wide_int): Likewise. * tree-pretty-print.cc (dump_generic_node): Likewise. (dump_generic_node): Likewise. (pp_double_int): Likewise. Signed-off-by: David Malcolm <dmalcolm@redhat.com>
2024-09-03	c++: add fixed test [PR109095]	Marek Polacek	1	-0/+19
	Fixed by r13-6693. PR c++/109095 gcc/testsuite/ChangeLog: * g++.dg/cpp2a/nontype-class66.C: New test.
2024-09-03	Zen5 tuning part 4: update reassociation width	Jan Hubicka	2	-13/+20
	Zen5 has 6 instead of 4 ALUs and the integer multiplication can now execute in 3 of them. FP units can do 2 additions and 2 multiplications with latency 2 and 3. This patch updates reassociation width accordingly. This has the potential of increasing register pressure but, unlike when benchmarking znver1 tuning, I did not notice this actually causing problems on spec, so this patch bumps up reassociation width to 6 for everything except for integer vectors, where there are 4 units with typical latency of 1. Bootstrapped/regtested x86_64-linux, committed. gcc/ChangeLog: * config/i386/i386.cc (ix86_reassociation_width): Update for Znver5. * config/i386/x86-tune-costs.h (znver5_costs): Update reassociation widths.
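What a wider reassociation width buys can be illustrated with a hand-written sketch. The two hypothetical functions below compute the same unsigned sum. The first forms a single dependency chain; the second is the shape a reassociation pass may produce, with independent partial sums that separate ALUs could execute in parallel. The names and the choice of six operands are assumptions for illustration, not taken from the patch.

```cpp
#include <cstdint>

// Serial form: each addition depends on the previous one,
// so the critical path is five additions long.
inline std::uint64_t sum_serial(std::uint64_t a, std::uint64_t b, std::uint64_t c,
                                std::uint64_t d, std::uint64_t e, std::uint64_t f) {
    return ((((a + b) + c) + d) + e) + f;
}

// Reassociated form: the three inner additions are independent of each
// other, so the critical path shrinks to three additions at the price
// of more values being live at once (higher register pressure).
inline std::uint64_t sum_reassociated(std::uint64_t a, std::uint64_t b, std::uint64_t c,
                                      std::uint64_t d, std::uint64_t e, std::uint64_t f) {
    std::uint64_t t0 = a + b;
    std::uint64_t t1 = c + d;
    std::uint64_t t2 = e + f;
    return (t0 + t1) + t2;
}
```

Unsigned integer addition is associative even on overflow, so both forms always agree; for floating point the rewrite changes rounding and is only done when the compiler is allowed to reassociate.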
2024-09-03	Drop file that should not have been committed.	Jeff Law	1	-1064/+0
	* J: Drop file that should not have been committed
2024-09-03	Zen5 tuning part 3: fix typo in previous patch	Jan Hubicka	1	-1/+1
	gcc/ChangeLog: * config/i386/x86-tune-sched.cc (ix86_fuse_mov_alu_p): Fix typo.
2024-09-03	libstdc++: Fix error handling in fs::hard_link_count for Windows	Jonathan Wakely	2	-26/+57
	The recent change to use auto_win_file_handle for std::filesystem::hard_link_count caused a regression. The std::error_code argument should be cleared if no error occurs, but this no longer happens. Add a call to ec.clear() in fs::hard_link_count to fix this. Also change the auto_win_file_handle class to take a reference to the std::error_code and set it if an error occurs, to slightly simplify the control flow in the fs::equiv_files function. libstdc++-v3/ChangeLog: * src/c++17/fs_ops.cc (auto_win_file_handle): Add error_code& member and set it if CreateFileW or GetFileInformationByHandle fails. (fs::equiv_files) [_GLIBCXX_FILESYSTEM_IS_WINDOWS]: Simplify control flow. (fs::hard_link_count) [_GLIBCXX_FILESYSTEM_IS_WINDOWS]: Clear ec on success. * testsuite/27_io/filesystem/operations/hard_link_count.cc: Check error handling.
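The contract being restored here is that the `std::error_code` overload leaves `ec` cleared on success, even if the caller passed in a stale error. A minimal sketch of a check for that behaviour follows; the helper names are hypothetical and this is not the test added by the commit.

```cpp
#include <cstdint>
#include <filesystem>
#include <fstream>
#include <system_error>

namespace fs = std::filesystem;

// Create (or truncate) a small file so that there is something to query.
inline void touch_file(const fs::path& p) {
    std::ofstream out(p);
    out << "x";
}

// Returns true if a successful hard_link_count call reset a stale error.
inline bool hard_link_count_clears_error(const fs::path& p) {
    // Deliberately start with an error set, as a caller reusing `ec` might.
    std::error_code ec = std::make_error_code(std::errc::invalid_argument);
    std::uintmax_t n = fs::hard_link_count(p, ec);
    // On success the count is valid and ec must compare equal to "no error".
    return !ec && n != static_cast<std::uintmax_t>(-1);
}
```

A caller would create a file with `touch_file`, then expect `hard_link_count_clears_error` to return true for it.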
2024-09-03	libstdc++: Specialize std::disable_sized_sentinel_for for std::move_iterator [PR116549]	Jonathan Wakely	2	-0/+60
	LWG 3736 added a partial specialization of this variable template for two std::move_iterator types. This is needed for the case where the types satisfy std::sentinel_for and are subtractable, but do not model the semantics requirements of std::sized_sentinel_for. libstdc++-v3/ChangeLog: PR libstdc++/116549 * include/bits/stl_iterator.h (disable_sized_sentinel_for): Define specialization for two move_iterator types, as per LWG 3736. * testsuite/24_iterators/move_iterator/lwg3736.cc: New test.
2024-09-03	Dump whether a SLP node represents load/store-lanes	Richard Biener	1	-2/+5
	This makes it easier to discover whether SLP load or store nodes participate in load/store-lanes accesses. * tree-vect-slp.cc (vect_print_slp_tree): Annotate load and store-lanes nodes.
2024-09-03	Fix missed peeling for gaps with SLP load-lanes	Richard Biener	1	-0/+1
	The following disables peeling for gap avoidance with using smaller vector accesses when using load-lanes. * tree-vect-stmts.cc (get_group_load_store_type): Only disable peeling for gaps by using smaller vectors when not using load-lanes.
2024-09-03	Zen5 tuning part 3: scheduler tweaks	Jan Hubicka	3	-3/+77
	This patch adds support for the new fusion in znver5 documented in the optimization manual: The Zen5 microarchitecture adds support to fuse reg-reg MOV Instructions with certain ALU instructions. The following conditions need to be met for fusion to happen: - The MOV should be reg-reg mov with Opcode 0x89 or 0x8B - The MOV is followed by an ALU instruction where the MOV and ALU destination register match. - The ALU instruction may source only registers or immediate data. There cannot be any memory source. - The ALU instruction sources either the source or dest of MOV instruction. - If ALU instruction has 2 reg sources, they should be different. - The following ALU instructions can fuse with an older qualified MOV instruction: ADD ADC AND XOR OP SUB SBB INC DEC NOT SAL / SHL SHR SAR (I assume OP is OR) I also increased issue rate from 4 to 6. Theoretically znver5 can do more, but with our model we can't really use it. Increasing issue rate to 8 leads to infinite loop in scheduler. Finally, I also enabled fuse_alu_and_branch since it is supported by znver5 (I think by earlier zens too). The new fusion pattern moves quite a few instructions around in common code: @@ -2210,13 +2210,13 @@ .cfi_offset 3, -32 leaq 63(%rsi), %rbx movq %rbx, %rbp + shrq $6, %rbp + salq $3, %rbp subq $16, %rsp .cfi_def_cfa_offset 48 movq %rdi, %r12 - shrq $6, %rbp - movq %rsi, 8(%rsp) - salq $3, %rbp movq %rbp, %rdi + movq %rsi, 8(%rsp) call _Znwm movq 8(%rsp), %rsi movl $0, 8(%r12) @@ -2224,8 +2224,8 @@ movq %rax, (%r12) movq %rbp, 32(%r12) testq %rsi, %rsi - movq %rsi, %rdx cmovns %rsi, %rbx + movq %rsi, %rdx sarq $63, %rdx shrq $58, %rdx sarq $6, %rbx which should help decoder bandwidth and perhaps also cache, though I was not able to measure off-noise effect on SPEC. gcc/ChangeLog: * config/i386/i386.h (TARGET_FUSE_MOV_AND_ALU): New tune. * config/i386/x86-tune-sched.cc (ix86_issue_rate): Update for znver5. (ix86_adjust_cost): Add TODO about znver5 memory latency. (ix86_fuse_mov_alu_p): New. (ix86_macro_fusion_pair_p): Use it. * config/i386/x86-tune.def (X86_TUNE_FUSE_ALU_AND_BRANCH): Add ZNVER5. (X86_TUNE_FUSE_MOV_AND_ALU): New tune;
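The fusion conditions quoted from the optimization manual can be written down as a predicate over a toy instruction model. Everything in this sketch (the `Op`, `Operand` and `Insn` types, and the function names) is hypothetical and exists only to make the rules concrete; it is not the RTL-based logic of ix86_fuse_mov_alu_p, and the opcode-encoding condition (0x89/0x8B) is not modelled.

```cpp
#include <vector>

// Toy instruction model (hypothetical, for illustration only).
enum class Op { Mov, Add, Adc, And, Xor, Or, Sub, Sbb, Inc, Dec, Not, Shl, Shr, Sar, Mul };
enum class Kind { Reg, Imm, Mem };

struct Operand {
    Kind kind;
    int reg;  // register number; meaningful only when kind == Kind::Reg
};

struct Insn {
    Op op;
    int dest;                   // destination register number
    std::vector<Operand> srcs;  // source operands
};

// ALU opcodes listed as fusible with a preceding reg-reg MOV.
inline bool is_fusible_alu(Op op) {
    switch (op) {
    case Op::Add: case Op::Adc: case Op::And: case Op::Xor: case Op::Or:
    case Op::Sub: case Op::Sbb: case Op::Inc: case Op::Dec: case Op::Not:
    case Op::Shl: case Op::Shr: case Op::Sar:
        return true;
    default:
        return false;
    }
}

inline bool can_fuse_mov_alu(const Insn& mov, const Insn& alu) {
    // The MOV must be register-to-register.
    if (mov.op != Op::Mov || mov.srcs.size() != 1 || mov.srcs[0].kind != Kind::Reg)
        return false;
    // The ALU instruction must be in the fusible set and write the MOV's destination.
    if (!is_fusible_alu(alu.op) || alu.dest != mov.dest)
        return false;

    int reg_sources = 0;
    int first_reg = -1;
    bool uses_mov_reg = false;
    for (const Operand& s : alu.srcs) {
        if (s.kind == Kind::Mem)
            return false;  // no memory sources allowed
        if (s.kind == Kind::Reg) {
            if (reg_sources == 1 && s.reg == first_reg)
                return false;  // two register sources must differ
            if (reg_sources == 0)
                first_reg = s.reg;
            ++reg_sources;
            if (s.reg == mov.dest || s.reg == mov.srcs[0].reg)
                uses_mov_reg = true;  // reads the MOV's source or destination
        }
    }
    return uses_mov_reg;
}
```

In this model the pair `movq %rbx, %rbp; shrq $6, %rbp` from the diff above would be a MOV with destination `rbp` followed by a `Shr` whose sources are register `rbp` and an immediate, which satisfies every condition.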
2024-09-03	libstdc++: Simplify std::any to fix -Wdeprecated-declarations warning	Jonathan Wakely	3	-2/+24
	We don't need to use std::aligned_storage in std::any. We just need a POD type of the right size. The void* union member already ensures the alignment will be correct. Avoiding std::aligned_storage means we don't need to suppress a -Wdeprecated-declarations warning. libstdc++-v3/ChangeLog: * include/experimental/any (experimental::any::_Storage): Use array of unsigned char instead of deprecated std::aligned_storage. * include/std/any (any::_Storage): Likewise. * testsuite/20_util/any/layout.cc: New test.
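The idea of replacing std::aligned_storage with a plain byte array inside a union can be sketched like this. The `Storage` type and `fits_in_place` helper are hypothetical stand-ins, not the libstdc++ definitions: the `void*` member fixes the union's alignment, so the `unsigned char` array only has to provide the size.

```cpp
#include <cstddef>

// Hypothetical small-buffer storage: either a pointer to a heap object
// or raw bytes for an object stored in place.
union Storage {
    constexpr Storage() : ptr(nullptr) {}
    void* ptr;
    unsigned char buffer[sizeof(void*)];
};

// The byte array alone would have alignment 1; the void* member raises
// the alignment of the whole union to that of a pointer.
static_assert(sizeof(Storage) == sizeof(void*), "no padding expected");
static_assert(alignof(Storage) == alignof(void*), "aligned like a pointer");

// Whether a T could be constructed directly in the buffer (size and
// alignment only; a real implementation would check more properties).
template <typename T>
constexpr bool fits_in_place() {
    return sizeof(T) <= sizeof(Storage) && alignof(T) <= alignof(Storage);
}
```

With this layout `fits_in_place<int>()` holds, while a type larger than a pointer would have to be stored on the heap through `ptr`.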
2024-09-03	libstdc++: Add missing feature-test macro in various headers	Dhruv Chawla	21	-0/+100
	version.syn#2 requires various headers to define __cpp_lib_allocator_traits_is_always_equal. Currently, only <memory> was defining this macro. Implement fixes for the other headers as well. Signed-off-by: Dhruv Chawla <dhruvc@nvidia.com> libstdc++-v3/ChangeLog: * include/std/deque: Define macro __glibcxx_want_allocator_traits_is_always_equal. * include/std/forward_list: Likewise. * include/std/list: Likewise. * include/std/map: Likewise. * include/std/scoped_allocator: Likewise. * include/std/set: Likewise. * include/std/string: Likewise. * include/std/unordered_map: Likewise. * include/std/unordered_set: Likewise. * include/std/vector: Likewise. * testsuite/20_util/headers/memory/version.cc: New test. * testsuite/20_util/scoped_allocator/version.cc: Likewise. * testsuite/21_strings/headers/string/version.cc: Likewise. * testsuite/23_containers/deque/version.cc: Likewise. * testsuite/23_containers/forward_list/version.cc: Likewise. * testsuite/23_containers/list/version.cc: Likewise. * testsuite/23_containers/map/version.cc: Likewise. * testsuite/23_containers/set/version.cc: Likewise. * testsuite/23_containers/unordered_map/version.cc: Likewise. * testsuite/23_containers/unordered_set/version.cc: Likewise. * testsuite/23_containers/vector/version.cc: Likewise.
2024-09-03	Zen5 tuning part 2: disable gather and scatter	Jan Hubicka	1	-6/+6
	We disable gathers for zen4. It seems that gather has improved a bit compared to zen4 and the Zen5 optimization manual suggests "Avoid GATHER instructions when the indices are known ahead of time. Vector loads followed by shuffles result in a higher load bandwidth." However, the situation seems to be more complicated. Gather is a 5-10% loss on the parest benchmark as well as a 30% loss on sparse dot products in TSVC. Curiously enough, breaking these out into a microbenchmark reversed the situation and it turns out that the performance depends on how indices are distributed. Gather is a loss if indices are sequential, neutral if they are random and a win for some strides (4, 8). This seems to be similar to earlier zens, so I think (especially for backporting znver5 support) that it makes sense to be consistent and disable gather unless we work out a good heuristic on when to use it. Since we typically do not know the indices in advance, I don't see how that can be done. I opened PR116582 with some examples of wins and losses. gcc/ChangeLog: * config/i386/x86-tune.def (X86_TUNE_USE_GATHER_2PARTS): Disable for ZNVER5. (X86_TUNE_USE_SCATTER_2PARTS): Disable for ZNVER5. (X86_TUNE_USE_GATHER_4PARTS): Disable for ZNVER5. (X86_TUNE_USE_SCATTER_4PARTS): Disable for ZNVER5. (X86_TUNE_USE_GATHER_8PARTS): Disable for ZNVER5. (X86_TUNE_USE_SCATTER_8PARTS): Disable for ZNVER5.
2024-09-03	ipa: Don't disable function parameter analysis for fat LTO	H.J. Lu	1	-2/+2
	Update analyze_parms not to disable function parameter analysis for -ffat-lto-objects. Tested on x86-64, there are no differences in zstd with "-O2 -flto=auto -g" vs "-O2 -flto=auto -g -ffat-lto-objects". PR ipa/116410 * ipa-modref.cc (analyze_parms): Always analyze function parameter for LTO. Signed-off-by: H.J. Lu <hjl.tools@gmail.com>
2024-09-03	[PR target/115921] Improve reassociation for rv64	Jeff Law	3	-4/+1083
	As Jovan pointed out in pr115921, we're not reassociating expressions like this on rv64: (x & 0x3e) << 12 It generates something like this: li a5,258048 slli a0,a0,12 and a0,a0,a5 We have a pattern that's designed to clean this up. Essentially reassociating the operations so that we don't need to load the constant resulting in something like this: andi a0,a0,63 slli a0,a0,12 That pattern wasn't working for certain constants due to its condition. The condition is trying to avoid cases where this kind of reassociation would hinder shadd generation on rv64. That condition was just written poorly. This patch tightens up that condition in a few ways. First, there's no need to worry about shadd cases if ZBA is not enabled. Second we can't use shadd if the shift value isn't 1, 2 or 3. Finally rather than open-coding one of the tests, we can use an existing operand predicate. The net is we'll start performing this transformation in more cases on rv64 while still avoiding reassociation if it would spoil shadd generation. PR target/115921 gcc/ * config/riscv/riscv.md (reassociate bitwise ops): Tighten test for cases we do not want reassociate. gcc/testsuite/ * gcc.target/riscv/pr115921.c: New test.
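The reassociation relies on a simple identity: masking before the shift gives the same result as shifting and then masking with the shifted constant. The sketch below checks that identity using the constants from the assembly listing above (mask 63, shift 12, large constant 258048); the function names are hypothetical.

```cpp
#include <cstdint>

// Shift first, then mask: the mask constant 63 << 12 = 258048 does not
// fit a 12-bit immediate, so it has to be loaded into a register (li).
constexpr std::uint64_t shift_then_mask(std::uint64_t x) {
    return (x << 12) & UINT64_C(258048);
}

// Mask first, then shift: the mask 63 fits the immediate field of andi,
// so no separate constant load is needed.
constexpr std::uint64_t mask_then_shift(std::uint64_t x) {
    return (x & 63) << 12;
}

// Both forms keep exactly bits 0..5 of x, moved to bits 12..17.
static_assert(shift_then_mask(0xffff) == mask_then_shift(0xffff), "forms agree");
static_assert(mask_then_shift(1) == 4096, "bit 0 moves to bit 12");
```

The two functions agree for every input, which is what lets the compiler pick whichever form is cheaper on the target.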
2024-09-03	Zen5 tuning part 1: avoid FMA chains	Jan Hubicka	1	-4/+5
	Testing matrix multiplication benchmarks shows that FMA on a critical chain is a performance loss over separate multiply and add. While the latency of 4 is lower than multiply + add (3+2), the problem is that all values need to be ready before computation starts. While on znver4 AVX512 code fared well with FMA, it was because of the split registers. Znver5 benefits from avoiding FMA on all widths. This may be different with the mobile version though. On naive matrix multiplication benchmark the difference is 8% with -O3 only since with -Ofast loop interchange solves the problem differently. It is 30% win, for example, on S323 from TSVC: real_t s323(struct args_t * func_args) { // recurrences // coupled recurrence initialise_arrays(__func__); gettimeofday(&func_args->t1, NULL); for (int nl = 0; nl < iterations/2; nl++) { for (int i = 1; i < LEN_1D; i++) { a[i] = b[i-1] + c[i] * d[i]; b[i] = a[i] + c[i] * e[i]; } dummy(a, b, c, d, e, aa, bb, cc, 0.); } gettimeofday(&func_args->t2, NULL); return calc_checksum(__func__); } gcc/ChangeLog: * config/i386/x86-tune.def (X86_TUNE_AVOID_128FMA_CHAINS): Enable for znver5. (X86_TUNE_AVOID_256FMA_CHAINS): Likewise. (X86_TUNE_AVOID_512FMA_CHAINS): Likewise.
2024-09-03	LTO/WPA: Ensure that output_offload_tables only writes table once [PR116535]	Tobias Burnus	6	-16/+16
	When ltrans was written concurrently, e.g. via -flto=N (N > 1, assuming sufficient partitions, e.g., via -flto-partition=max), output_offload_tables wrote the output tables once per fork. PR lto/116535 gcc/ChangeLog: * lto-cgraph.cc (output_offload_tables): Remove offload_ frees. * lto-streamer-out.cc (lto_output): Make call to it depend on lto_get_out_decl_state ()->output_offload_tables_p. * lto-streamer.h (struct lto_out_decl_state): Add output_offload_tables_p field. * tree-pass.h (ipa_write_optimization_summaries): Add bool argument. * passes.cc (ipa_write_summaries_1): Add bool output_offload_tables_p arg. (ipa_write_summaries): Update call. (ipa_write_optimization_summaries): Accept output_offload_tables_p. gcc/lto/ChangeLog: * lto.cc (stream_out): Update call to ipa_write_optimization_summaries to pass true for first partition.
2024-09-03	MAINTAINERS: Update my email address	Szabolcs Nagy	1	-1/+2
	* MAINTAINERS: Update my email address and add myself to DCO.
2024-09-03	tree-optimization/116575 - avoid ICE with SLP mask_load_lane	Richard Biener	2	-2/+32
	The following avoids performing re-discovery with single lanes in the attempt to force the use of mask_load_lane, as rediscovery will fail since a single lane of a mask load will appear permuted, which isn't supported. PR tree-optimization/116575 * tree-vect-slp.cc (vect_analyze_slp): Properly compute the mask argument for vect_load/store_lanes_supported. When the load is masked for now avoid rediscovery. * gcc.dg/vect/pr116575.c: New testcase.
2024-09-03	i386: Fix vfpclassph non-optimized intrin	Haochen Jiang	2	-2/+79
	The intrin for non-optimized got a typo in mask type, which will cause the high bits of __mmask32 to be unexpectedly zeroed. The test does not fail under O0 with current 1b since the testcase is wrong. We need to include avx512-mask-type.h after SIZE is defined, or it will always be __mmask8. That problem also happened in AVX10.2 testcases. I will write a separate patch to fix that. gcc/ChangeLog: * config/i386/avx512fp16intrin.h (_mm512_mask_fpclass_ph_mask): Correct mask type to __mmask32. (_mm512_fpclass_ph_mask): Ditto. gcc/testsuite/ChangeLog: * gcc.target/i386/avx512fp16-vfpclassph-1c.c: New test.
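The effect of a too-narrow mask type can be shown without any intrinsics. In the sketch below, mask8_t and mask32_t are hypothetical stand-ins for an 8-bit and a 32-bit mask type, and the functions are illustrative only; they are not the code from avx512fp16intrin.h.

```c
#include <assert.h>
#include <stdint.h>

typedef uint8_t  mask8_t;   /* hypothetical stand-in for an 8-bit mask type */
typedef uint32_t mask32_t;  /* hypothetical stand-in for a 32-bit mask type */

/* Passing a 32-bit mask through an 8-bit type keeps only bits 0..7. */
mask32_t pass_through_narrow(mask32_t m)
{
    return (mask8_t) m;
}

/* Using the full-width type preserves all 32 bits. */
mask32_t pass_through_wide(mask32_t m)
{
    return m;
}

/* Self-check: the narrow path silently zeroes the upper 24 bits. */
void demo_mask_truncation(void)
{
    assert(pass_through_narrow(0xabcd1234u) == 0x34u);
    assert(pass_through_wide(0xabcd1234u) == 0xabcd1234u);
}
```

A test that only ever sets the low eight mask bits would pass on both paths, which is consistent with the message's remark that the existing testcase did not catch the problem.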
2024-09-03	Do not assert NUM_POLY_INT_COEFFS != 1 early	Richard Biener	1	-1/+2
	The following moves the assert on NUM_POLY_INT_COEFFS != 1 after INTEGER_CST processing. * fold-const.cc (poly_int_binop): Move assert on NUM_POLY_INT_COEFFS after INTEGER_CST processing.
2024-09-03	lower-bitint: Fix up __builtin_{add,sub}_overflow{,_p} bitint lowering [PR116501]	Jakub Jelinek	2	-2/+22
	The following testcase is miscompiled. The problem is in the last_ovf step. The second operand has signed _BitInt(513) type but has the MSB clear, so range_to_prec returns 512 for it (i.e. it fits into unsigned _BitInt(512)). Because of that the last step actually doesn't need to get the most significant bit from the second operand, but the code was deciding what to use purely from TYPE_UNSIGNED (type1) - if unsigned, use 0, otherwise sign-extend the last processed bit; but that in this case was set. We don't want to treat the positive operand as if it was negative regardless of the bit below that precision, and precN >= 0 indicates that the operand is in the [0, inf) range. 2024-09-03 Jakub Jelinek <jakub@redhat.com> PR tree-optimization/116501 * gimple-lower-bitint.cc (bitint_large_huge::lower_addsub_overflow): In the last_ovf case, use build_zero_cst operand not just when TYPE_UNSIGNED (typeN), but also when precN >= 0. * gcc.dg/torture/bitint-73.c: New test.
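For readers unfamiliar with the builtins named in the subject, here is a minimal sketch of how __builtin_add_overflow and __builtin_sub_overflow behave on ordinary int operands. It does not reproduce the _BitInt miscompilation or the lowering code; the wrapper names are hypothetical.

```c
#include <assert.h>
#include <limits.h>
#include <stdbool.h>

/* Returns true when a + b does not fit in int; *sum receives the
   wrapped result either way. */
bool add_overflows(int a, int b, int *sum)
{
    return __builtin_add_overflow(a, b, sum);
}

/* Returns true when a - b does not fit in int; *diff receives the
   wrapped result either way. */
bool sub_overflows(int a, int b, int *diff)
{
    return __builtin_sub_overflow(a, b, diff);
}

/* Self-check covering one in-range and two out-of-range cases. */
void demo_overflow_builtins(void)
{
    int r;
    assert(!add_overflows(1, 2, &r) && r == 3);
    assert(add_overflows(INT_MAX, 1, &r));
    assert(sub_overflows(INT_MIN, 1, &r));
}
```

The overflow decision depends on getting the sign of each operand right, which is the part the message says went wrong for a non-negative operand of signed type.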
2024-09-03	ada: Add kludge for quirk of ancient 32-bit ABIs to previous change	Eric Botcazou	1	-2/+14
	Some ancient 32-bit ABIs, most notably that of x86/Linux, misalign double scalars in record types, so comparing DECL_ALIGN with TYPE_ALIGN directly may give the wrong answer for them. gcc/ada/ * gcc-interface/trans.cc (addressable_p) <COMPONENT_REF>: Add kludge to cope with ancient 32-bit ABIs.
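The ABI quirk mentioned here can be observed from C as well. The sketch below uses a hypothetical record type and helper names. On 32-bit x86/Linux with default options the double field is typically placed at offset 4, while on most 64-bit targets it lands at offset 8, so the alignment a field actually gets can be smaller than the size of its type.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical record with a double scalar after a one-byte field. */
struct record {
    char   tag;
    double value;
};

/* Offset the ABI actually gives the double field inside the record. */
size_t double_field_offset(void)
{
    return offsetof(struct record, value);
}

/* Returns 1 when the field offset is a multiple of sizeof(double),
   0 when the ABI places the field at a smaller alignment. */
int double_field_is_size_aligned(void)
{
    return offsetof(struct record, value) % sizeof(double) == 0;
}

/* Self-check that holds on either kind of target. */
void demo_record_alignment(void)
{
    assert(double_field_offset() > 0);
    assert(double_field_offset() <= sizeof(double));
}
```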
2024-09-03	ada: Plug loophole exposed by previous change	Eric Botcazou	1	-0/+3
	The change causes more temporaries to be created at call sites for unaligned actual parameters, thus revealing that the machinery does not properly deal with unconstrained nominal subtypes for them. gcc/ada/ * gcc-interface/trans.cc (create_temporary): Deal with types whose size is self-referential by allocating the maximum size.
2024-09-03	ada: Fix internal error with Atomic Volatile_Full_Access object	Eric Botcazou	1	-4/+6
	The initial implementation of the GNAT aspect/pragma Volatile_Full_Access made it incompatible with Atomic, because it was not decided whether the read-modify-write sequences generated by Volatile_Full_Access would need to be implemented atomically when Atomic was also specified, which would have required a compare-and-swap primitive from the target architecture. But Ada 2022 introduced Full_Access_Only and retrofitted it into Atomic in the process, answering the above question in the negative, so the incompatibility between Volatile_Full_Access and Atomic was lifted in Ada 2012 as well, unfortunately without adjusting the implementation. gcc/ada/ * gcc-interface/trans.cc (get_atomic_access): Deal specifically with nodes that are both Atomic and Volatile_Full_Access in Ada 2012.