4 days  Daily bump.  (GCC Administrator; 5 files, -1/+52)
4 days  target.def: Properly mark up __cxa_atexit as code  (Gerald Pfeifer; 2 files, -6/+8)
gcc: * target.def (dtors_from_cxa_atexit): Properly mark up __cxa_atexit as code. * doc/tm.texi: Regenerate.
5 days  libstdc++: Fix ranges::shuffle for non-sized range [PR121917]  (Patrick Palka; 2 files, -28/+56)
ranges::shuffle has a two-at-a-time PRNG optimization (copied from std::shuffle) that considers the PRNG width vs the size of the range. But in C++20 a random access sentinel isn't always sized so we can't unconditionally do __last - __first to obtain the size in constant time. We could instead use ranges::distance, but that'd take linear time for a non-sized sentinel which makes the optimization less clear of a win. So this patch instead makes us only consider this optimization for sized ranges. PR libstdc++/121917 libstdc++-v3/ChangeLog: * include/bits/ranges_algo.h (__shuffle_fn::operator()): Only consider the two-at-a-time PRNG optimization if the range is sized. * testsuite/25_algorithms/shuffle/constrained.cc (test03): New test. Reviewed-by: Jonathan Wakely <jwakely@redhat.com>
5 days  lra: Stop constraint processing on error [PR121205]  (Stefan Schulze Frielinghaus; 1 file, -1/+4)
It looks like we didn't have a test so far reaching this point which changed with the new hard register constraint tests. Bootstrap and regtest are still running on x86_64. If they succeed, ok for mainline? -- >8 -- As noted by Sam in the PR, with checking enabled tests gcc.target/i386/asm-hard-reg-{1,2}.c fail with an ICE. If an error is detected in curr_insn_transform(), lra_asm_insn_error() is called and deletes the current insn. However, afterwards processing continues with the deleted insn and via lra_process_new_insns() we finally call recog() for NOTE_INSN_DELETED which ICEs in case of a checking build. Thus, in case of an error during curr_insn_transform() bail out and stop processing. gcc/ChangeLog: PR rtl-optimization/121205 * lra-constraints.cc (curr_insn_transform): Stop processing on error.
5 days  doc: Editorial changes around -fprofile-partial-training  (Gerald Pfeifer; 1 file, -10/+10)
gcc: * doc/invoke.texi (Optimize Options): Editorial changes around -fprofile-partial-training.
5 days  testsuite: Port asm-hard-reg tests for PRU  (Dimitar Dimitrov; 3 files, -1/+12)
Add the necessary register definitions for PRU, so that asm-hard-reg tests can pass for PRU. gcc/testsuite/ChangeLog: * gcc.dg/asm-hard-reg-error-1.c: Enable test for PRU, and define registers for PRU. * gcc.dg/asm-hard-reg-error-4.c: Define hard regs for PRU. * gcc.dg/asm-hard-reg-error-5.c: Ditto. Signed-off-by: Dimitar Dimitrov <dimitar@dinux.eu>
5 days  c: Implement C2y N3517 array subscripting without decay  (Joseph Myers; 10 files, -11/+415)
N3517 (array subscripting without decay) has been added to C2y (via a remote vote in May, not at a meeting). Implement this in GCC.

The conceptual change, that the array subscripting operator [] no longer involves an array operand decaying to a pointer, is something GCC has done for a very long time. The main effect in terms of what is made possible in the language, subscripting a register array (undefined behavior in C23 and before), was available as a GNU extension, but only with constant indices. There is also a new constraint that array indices must not be negative when they are integer constant expressions and the array operand has array type (negative indices are fine with pointers) - an access out of bounds of an array (even when contained within a larger object) has undefined behavior at runtime when not a constraint violation.

Thus, the previous GCC extension is adapted to allow the cases of register arrays not previously allowed, clearing DECL_REGISTER on them as needed (similar to what is done with register declarations of structures with volatile members) and restricting the pedwarn to pedwarn_c23. That pedwarn_c23 is also extended to cover the C23 case of register compound literals (although not strictly needed since it was undefined behavior rather than a constraint violation in C23). The new error is added (only for flag_isoc2y) for negative array indices with an operand of array type.

N3517 has some specific wording about the type of the result of non-lvalue array element access. It's unclear what's actually desired there in the case where the array element is itself of array type; see C23 issue 1001 regarding types of qualified members of rvalue structures and unions more generally. Rather than implementing the specific wording about this in N3517, that is deferred until there's an accepted resolution to issue 1001 and can be dealt with as part of implementing such a resolution.
Nothing specific is done about the obsolescence in that paper of writing index[array] or index[pointer] as opposed to array[index] or pointer[index], although that seems like a reasonable enough thing to warn about.

Bootstrapped with no regressions for x86_64-pc-linux-gnu.

gcc/c/
	* c-typeck.cc (c_mark_addressable): New parameter override_register.
	(build_array_ref): Update calls to c_mark_addressable. Give error
	in C2Y mode for negative array indices when array expression is an
	array not a pointer. Use pedwarn_c23 for subscripting register
	array; diagnose that also for register compound literal.
	* c-tree.h (c_mark_addressable): Update prototype.

gcc/testsuite/
	* gcc.dg/c23-array-negative-1.c, gcc.dg/c23-register-array-1.c,
	gcc.dg/c23-register-array-2.c, gcc.dg/c23-register-array-3.c,
	gcc.dg/c23-register-array-4.c, gcc.dg/c2y-array-negative-1.c,
	gcc.dg/c2y-register-array-2.c, gcc.dg/c2y-register-array-3.c:
	New tests.
5 days  Daily bump.  (GCC Administrator; 5 files, -1/+299)
5 days  Fix latent LRA bug  (Jeff Law; 1 file, -0/+1)
Shreya's work to add the addptr pattern on the RISC-V port exposed a latent bug in LRA.

We lazily allocate/reallocate the ira_reg_equiv structure, and when we do (re)allocation we'll over-allocate and zero-fill so that we don't have to actually allocate and relocate the data so often.

In the case exposed by Shreya's work we had N requested entries at the last reallocation step. We actually allocate N+M entries. During LRA we allocate enough new pseudos and thus have N+M+1 pseudos. In get_equiv we read ira_reg_equiv[regno] without bounds checking, so we read past the allocated part of the array and get back junk which we use, and depending on the precise contents we fault in various fun and interesting ways.

We could either arrange to re-allocate ira_reg_equiv again on some path through LRA (possibly in get_equiv itself), or we could just insert the bounds check in get_equiv like is done elsewhere in LRA. Vlad indicated no strong preference in an email last week. So this just adds the bounds check in a manner similar to what's done elsewhere in LRA.

Bootstrapped and regression tested on x86_64 as well as RISC-V with Shreya's work enabled and regtested across the various embedded targets.

gcc/
	* lra-constraints.cc (get_equiv): Bounds check before accessing
	data in ira_reg_equiv.
6 days  libstdc++: ranges::rotate should use ranges::iter_move [PR121913]  (Jonathan Wakely; 2 files, -2/+47)
Using std::move(*it) is incorrect for iterators that use proxy refs, we should use ranges::iter_move(it) instead. libstdc++-v3/ChangeLog: PR libstdc++/121913 * include/bits/ranges_algo.h (__rotate_fn::operator()): Use ranges::iter_move(it) instead of std::move(*it). * testsuite/25_algorithms/rotate/121913.cc: New test. Reviewed-by: Patrick Palka <ppalka@redhat.com>
6 days  libstdc++: Fix algorithms to use iterators' difference_type for arithmetic [PR121890]  (Jonathan Wakely; 20 files, -91/+230)
Whenever we use operator+ or similar operators on random access iterators we need to be careful to use the iterator's difference_type rather than some other integer type. It's not guaranteed that an expression with an arbitrary integer type, such as `it + 1u`, has the same effects as `it + iter_difference_t<It>(1)`. Some of our algorithms need changes to cast values to the correct type, or to use std::next or ranges::next instead of `it + n`. Several tests also need fixes where the arithmetic occurs directly in the test.

The __gnu_test::random_access_iterator_wrapper class template is adjusted to have deleted operators that make programs ill-formed if the argument to relevant operators is not the difference_type. This will make it easier to avoid regressing in future.

libstdc++-v3/ChangeLog: PR libstdc++/121890 * include/bits/ranges_algo.h (ranges::rotate, ranges::shuffle) (__insertion_sort, __unguarded_partition_pivot, __introselect): Use ranges::next to advance iterators. Use local variables in rotate to avoid duplicate expressions. (ranges::push_heap, ranges::pop_heap, ranges::partial_sort) (ranges::partial_sort_copy): Use ranges::prev. (__final_insertion_sort): Use iter_difference_t<Iter> for operand of operator+ on iterator. * include/bits/ranges_base.h (ranges::advance): Use iterator's difference_type for all iterator arithmetic. * include/bits/stl_algo.h (__search_n_aux, __rotate) (__insertion_sort, __unguarded_partition_pivot, __introselect) (__final_insertion_sort, for_each_n, random_shuffle): Likewise. Use local variables in __rotate to avoid duplicate expressions. * include/bits/stl_algobase.h (__fill_n_a, __lc_rai::__newlast1): Likewise. * include/bits/stl_heap.h (push_heap): Likewise. (__is_heap_until): Add static_assert. (__is_heap): Convert distance to difference_type. * include/std/functional (boyer_moore_searcher::operator()): Use iterator's difference_type for iterator arithmetic. 
* testsuite/util/testsuite_iterators.h (random_access_iterator_wrapper): Add deleted overloads of operators that should be called with difference_type. * testsuite/24_iterators/range_operations/advance.cc: Use ranges::next. * testsuite/25_algorithms/heap/constrained.cc: Use ranges::next and ranges::prev. * testsuite/25_algorithms/nth_element/58800.cc: Use std::next. * testsuite/25_algorithms/nth_element/constrained.cc: Use ptrdiff_t for loop variable. * testsuite/25_algorithms/nth_element/random_test.cc: Use iterator's difference_type instead of int. * testsuite/25_algorithms/partial_sort/check_compare_by_value.cc: Use std::next. * testsuite/25_algorithms/partial_sort/constrained.cc: Use ptrdiff_t for loop variable. * testsuite/25_algorithms/partial_sort/random_test.cc: Use iterator's difference_type instead of int. * testsuite/25_algorithms/partial_sort_copy/constrained.cc: Use ptrdiff_t for loop variable. * testsuite/25_algorithms/partial_sort_copy/random_test.cc: Use iterator's difference_type instead of int. * testsuite/std/ranges/adaptors/drop.cc: Use ranges::next. * testsuite/25_algorithms/fill_n/diff_type.cc: New test. * testsuite/25_algorithms/lexicographical_compare/diff_type.cc: New test. Reviewed-by: Patrick Palka <ppalka@redhat.com> Reviewed-by: Tomasz Kamiński <tkaminsk@redhat.com>
6 days  Testsuite: Fix more spurious failure of ACATS-4 tests  (Eric Botcazou; 3 files, -10/+10)
This tentatively applies the same tweak to twin testcases. gcc/testsuite/ PR ada/121532 * ada/acats-4/tests/cxa/cxai034.a: Use Long_Switch_To_New_Task constant instead of Switch_To_New_Task in delay statements. * ada/acats-4/tests/cxa/cxai035.a: Likewise. * ada/acats-4/tests/cxa/cxai036.a: Likewise.
6 days  c++: pack indexing is a non-deduced context [PR121795]  (Patrick Palka; 3 files, -2/+25)
We weren't explicitly treating a pack index specifier as a non-deduced context (as per [temp.deduct.type]/5), leading to an ICE for the first testcase below. PR c++/121795 gcc/cp/ChangeLog: * pt.cc (unify) <case PACK_INDEX_TYPE>: New non-deduced context case. gcc/testsuite/ChangeLog: * g++.dg/cpp26/pack-indexing17.C: New test. * g++.dg/cpp26/pack-indexing17a.C: New test. Reviewed-by: Marek Polacek <polacek@redhat.com> Reviewed-by: Jason Merrill <jason@redhat.com>
6 days  RISC-V: Support vnclip idiom testcase [PR120378]  (Edwin Lu; 4 files, -0/+84)
This patch contains testcases for PR120378 after the change made to support the vnclipu variant of the SAT_TRUNC pattern. PR target/120378 gcc/testsuite/ChangeLog: * gcc.target/riscv/rvv/autovec/pr120378-1.c: New test. * gcc.target/riscv/rvv/autovec/pr120378-2.c: New test. * gcc.target/riscv/rvv/autovec/pr120378-3.c: New test. * gcc.target/riscv/rvv/autovec/pr120378-4.c: New test. Signed-off-by: Edwin Lu <ewlu@rivosinc.com>
6 days  Match: Support SAT_TRUNC variant NARROW_CLIP  (Edwin Lu; 2 files, -0/+58)
This patch tries to add support for a variant of SAT_TRUNC where negative numbers are clipped to 0 instead of NARROW_TYPE_MAX_VALUE. This form is seen in x264, aka

  UT clip (T a) { return a & ~(UT)(-1) ? (-a) >> 31 : a; }

where sizeof(UT) < sizeof(T).

I'm unable to get the SAT_TRUNC pattern to appear on x86_64; however, it does appear when building for riscv as seen below.

Before this patch:

  <bb 3> [local count: 764504183]:
  # i_21 = PHI <i_14(8), 0(15)>
  # vectp_x.10_54 = PHI <vectp_x.10_55(8), x_10(D)(15)>
  # vectp_res.20_66 = PHI <vectp_res.20_67(8), res_11(D)(15)>
  # ivtmp_70 = PHI <ivtmp_71(8), _69(15)>
  _72 = .SELECT_VL (ivtmp_70, POLY_INT_CST [4, 4]);
  _1 = (long unsigned int) i_21;
  _2 = _1 * 4;
  _3 = x_10(D) + _2;
  ivtmp_53 = _72 * 4;
  vect__4.12_57 = .MASK_LEN_LOAD (vectp_x.10_54, 32B, { -1, ... }, _56(D), _72, 0);
  vect_x.13_58 = VIEW_CONVERT_EXPR<vector([4,4]) unsigned int>(vect__4.12_57);
  vect__38.15_60 = -vect_x.13_58;
  vect__15.16_61 = VIEW_CONVERT_EXPR<vector([4,4]) int>(vect__38.15_60);
  vect__16.17_62 = vect__15.16_61 >> 31;
  mask__29.14_59 = vect_x.13_58 > { 255, ... };
  vect__17.18_63 = VEC_COND_EXPR <mask__29.14_59, vect__16.17_62, vect__4.12_57>;
  vect__18.19_64 = (vector([4,4]) unsigned char) vect__17.18_63;
  _4 = *_3;
  _5 = res_11(D) + _1;
  x.0_12 = (unsigned int) _4;
  _38 = -x.0_12;
  _15 = (int) _38;
  _16 = _15 >> 31;
  _29 = x.0_12 > 255;
  _17 = _29 ? _16 : _4;
  _18 = (unsigned char) _17;
  .MASK_LEN_STORE (vectp_res.20_66, 8B, { -1, ... }, _72, 0, vect__18.19_64);
  i_14 = i_21 + 1;
  vectp_x.10_55 = vectp_x.10_54 + ivtmp_53;
  vectp_res.20_67 = vectp_res.20_66 + _72;
  ivtmp_71 = ivtmp_70 - _72;
  if (ivtmp_71 != 0)
    goto <bb 8>; [89.00%]
  else
    goto <bb 17>; [11.00%]

After this patch:

  <bb 3> [local count: 764504183]:
  # i_21 = PHI <i_14(8), 0(15)>
  # vectp_x.10_68 = PHI <vectp_x.10_69(8), x_10(D)(15)>
  # vectp_res.15_75 = PHI <vectp_res.15_76(8), res_11(D)(15)>
  # ivtmp_79 = PHI <ivtmp_80(8), _78(15)>
  _81 = .SELECT_VL (ivtmp_79, POLY_INT_CST [4, 4]);
  _1 = (long unsigned int) i_21;
  _2 = _1 * 4;
  _3 = x_10(D) + _2;
  ivtmp_67 = _81 * 4;
  vect__4.12_71 = .MASK_LEN_LOAD (vectp_x.10_68, 32B, { -1, ... }, _70(D), _81, 0);
  vect_patt_37.13_72 = MAX_EXPR <{ 0, ... }, vect__4.12_71>;
  vect_patt_39.14_73 = .SAT_TRUNC (vect_patt_37.13_72);
  _4 = *_3;
  _5 = res_11(D) + _1;
  x.0_12 = (unsigned int) _4;
  _38 = -x.0_12;
  _15 = (int) _38;
  _16 = _15 >> 31;
  _29 = x.0_12 > 255;
  _17 = _29 ? _16 : _4;
  _18 = (unsigned char) _17;
  .MASK_LEN_STORE (vectp_res.15_75, 8B, { -1, ... }, _81, 0, vect_patt_39.14_73);
  i_14 = i_21 + 1;
  vectp_x.10_69 = vectp_x.10_68 + ivtmp_67;
  vectp_res.15_76 = vectp_res.15_75 + _81;
  ivtmp_80 = ivtmp_79 - _81;
  if (ivtmp_80 != 0)
    goto <bb 8>; [89.00%]
  else
    goto <bb 17>; [11.00%]

gcc/ChangeLog:
	* match.pd: New NARROW_CLIP variant for SAT_TRUNC.
	* tree-vect-patterns.cc (gimple_unsigned_integer_narrow_clip):
	Add new decl for NARROW_CLIP.
	(vect_recog_sat_trunc_pattern): Add NARROW_CLIP check.

Signed-off-by: Edwin Lu <ewlu@rivosinc.com>
6 days  sparc: Compile TLS LD tests with -fPIC  (H.J. Lu; 7 files, -7/+7)
After commit 8cad8f94b450be9b73d07bdeef7fa1778d3f2b96 Author: H.J. Lu <hjl.tools@gmail.com> Date: Fri Sep 5 15:40:51 2025 -0700 c: Update TLS model after processing a TLS variable GCC will upgrade local-dynamic TLS model to local-exec without -fPIC. Compile TLS LD tests with -fPIC to keep local-dynamic TLS model. PR testsuite/121888 * gcc.target/sparc/tls-ld-int16.c: Compile with -fPIC. * gcc.target/sparc/tls-ld-int32.c: Likewise. * gcc.target/sparc/tls-ld-int64.c: Likewise. * gcc.target/sparc/tls-ld-int8.c: Likewise. * gcc.target/sparc/tls-ld-uint16.c: Likewise. * gcc.target/sparc/tls-ld-uint32.c: Likewise. * gcc.target/sparc/tls-ld-uint8.c: Likewise. Signed-off-by: H.J. Lu <hjl.tools@gmail.com>
6 days  diagnostics: handle fatal_error in SARIF output [PR120063]  (David Malcolm; 5 files, -3/+71)
gcc/ChangeLog: PR diagnostics/120063 * diagnostics/context.cc (context::execution_failed_p): Also treat any kind::fatal errors as leading to failed execution. * diagnostics/sarif-sink.cc (maybe_get_sarif_level): Handle kind::fatal as SARIF level "error". gcc/testsuite/ChangeLog: PR diagnostics/120063 * gcc.dg/fatal-error.c: New test. * gcc.dg/fatal-error-html.py: New test. * gcc.dg/fatal-error-sarif.py: New test. Signed-off-by: David Malcolm <dmalcolm@redhat.com>
6 days  diagnostics: fix crash-handling inside nested diagnostics [PR121876]  (David Malcolm; 9 files, -3/+276)
PR diagnostics/121876 tracks an issue inside our crash-handling, where if an ICE happens when we're within a nested diagnostic, an assertion fails inside diagnostic::context::set_diagnostic_buffer, leading to a 2nd ICE. Happily, this does not infinitely recurse, but it obscures the original ICE and the useful part of the backtrace, and any SARIF or HTML sinks we were writing to are left as empty files. This patch tweaks the above so that the assertion doesn't fail, and adds test coverage (via a plugin) to ensure that such ICEs/crashes are gracefully handled and e.g. captured in SARIF/HTML output. gcc/ChangeLog: PR diagnostics/121876 * diagnostics/buffering.cc (context::set_diagnostic_buffer): Add early reject of the no-op case. gcc/testsuite/ChangeLog: PR diagnostics/121876 * gcc.dg/plugin/crash-test-nested-ice-html.py: New test. * gcc.dg/plugin/crash-test-nested-ice-sarif.py: New test. * gcc.dg/plugin/crash-test-nested-ice.c: New test. * gcc.dg/plugin/crash-test-nested-write-through-null-html.py: New test. * gcc.dg/plugin/crash-test-nested-write-through-null-sarif.py: New test. * gcc.dg/plugin/crash-test-nested-write-through-null.c: New test. * gcc.dg/plugin/crash_test_plugin.cc: Add "nested" argument, and when set, inject the problem within a nested diagnostic. * gcc.dg/plugin/plugin.exp: Add crash-test-nested-ice.c and crash-test-nested-write-through-null.c. Signed-off-by: David Malcolm <dmalcolm@redhat.com>
6 days  testsuite: fix typo in name of plugin test file  (David Malcolm; 3 files, -4/+4)
gcc/testsuite/ChangeLog: * gcc.dg/plugin/crash-test-write-though-null-sarif.c: Rename to... * gcc.dg/plugin/crash-test-write-through-null-sarif.c: ...this. * gcc.dg/plugin/crash-test-write-though-null-stderr.c: Rename to... * gcc.dg/plugin/crash-test-write-through-null-stderr.c: ...this. * gcc.dg/plugin/plugin.exp: Update for above renamings. Sort the test files for crash_test_plugin.cc alphabetically. Signed-off-by: David Malcolm <dmalcolm@redhat.com>
6 days  [RISC-V] Adjust ABI specification in recently added Andes tests  (Jeff Law; 32 files, -32/+32)
Another lp64 vs lp64d issue. This time adjusting a #include in the test isn't sufficient. So instead this sets the ABI to lp64d instead of lp64. I don't think that'll impact the test materially. Tested on the BPI and Pioneer systems where it fixes the failures with the Andes tests. Pushing to the trunk. gcc/testsuite * gcc.target/riscv/rvv/xandesvector/non-policy/non-overloaded/nds_vd4dots.c: Adjust ABI specification. * gcc.target/riscv/rvv/xandesvector/non-policy/non-overloaded/nds_vd4dotsu.c: Likewise. * gcc.target/riscv/rvv/xandesvector/non-policy/non-overloaded/nds_vd4dotu.c: Likewise. * gcc.target/riscv/rvv/xandesvector/non-policy/non-overloaded/nds_vfncvtbf16s.c: Likewise. * gcc.target/riscv/rvv/xandesvector/non-policy/non-overloaded/nds_vfpmadb.c: Likewise. * gcc.target/riscv/rvv/xandesvector/non-policy/non-overloaded/nds_vfpmadt.c: Likewise. * gcc.target/riscv/rvv/xandesvector/non-policy/non-overloaded/nds_vfwcvtsbf16.c: Likewise. * gcc.target/riscv/rvv/xandesvector/non-policy/non-overloaded/nds_vln8.c: Likewise. * gcc.target/riscv/rvv/xandesvector/non-policy/overloaded/nds_vd4dots.c: Likewise. * gcc.target/riscv/rvv/xandesvector/non-policy/overloaded/nds_vd4dotsu.c: Likewise. * gcc.target/riscv/rvv/xandesvector/non-policy/overloaded/nds_vd4dotu.c: Likewise. * gcc.target/riscv/rvv/xandesvector/non-policy/overloaded/nds_vfncvtbf16s.c: Likewise. * gcc.target/riscv/rvv/xandesvector/non-policy/overloaded/nds_vfpmadb.c: Likewise. * gcc.target/riscv/rvv/xandesvector/non-policy/overloaded/nds_vfpmadt.c: Likewise. * gcc.target/riscv/rvv/xandesvector/non-policy/overloaded/nds_vfwcvtsbf16.c: Likewise. * gcc.target/riscv/rvv/xandesvector/non-policy/overloaded/nds_vln8.c: Likewise. * gcc.target/riscv/rvv/xandesvector/policy/non-overloaded/nds_vd4dots.c: Likewise. * gcc.target/riscv/rvv/xandesvector/policy/non-overloaded/nds_vd4dotsu.c: Likewise. * gcc.target/riscv/rvv/xandesvector/policy/non-overloaded/nds_vd4dotu.c: Likewise. 
* gcc.target/riscv/rvv/xandesvector/policy/non-overloaded/nds_vfncvtbf16s.c: Likewise. * gcc.target/riscv/rvv/xandesvector/policy/non-overloaded/nds_vfpmadb.c: Likewise. * gcc.target/riscv/rvv/xandesvector/policy/non-overloaded/nds_vfpmadt.c: Likewise. * gcc.target/riscv/rvv/xandesvector/policy/non-overloaded/nds_vfwcvtsbf16.c: Likewise. * gcc.target/riscv/rvv/xandesvector/policy/non-overloaded/nds_vln8.c: Likewise. * gcc.target/riscv/rvv/xandesvector/policy/overloaded/nds_vd4dots.c: Likewise. * gcc.target/riscv/rvv/xandesvector/policy/overloaded/nds_vd4dotsu.c: Likewise. * gcc.target/riscv/rvv/xandesvector/policy/overloaded/nds_vd4dotu.c: Likewise. * gcc.target/riscv/rvv/xandesvector/policy/overloaded/nds_vfncvtbf16s.c: Likewise. * gcc.target/riscv/rvv/xandesvector/policy/overloaded/nds_vfpmadb.c: Likewise. * gcc.target/riscv/rvv/xandesvector/policy/overloaded/nds_vfpmadt.c: Likewise. * gcc.target/riscv/rvv/xandesvector/policy/overloaded/nds_vfwcvtsbf16.c: Likewise. * gcc.target/riscv/rvv/xandesvector/policy/overloaded/nds_vln8.c: Likewise.
6 days  libstdc++: Fix memory leak in PSTL TBB backend [PR117276]  (Jonathan Wakely; 1 file, -3/+9)
Backport of upstream patch: https://github.com/uxlfoundation/oneDPL/pull/1589 libstdc++-v3/ChangeLog: PR libstdc++/117276 * include/pstl/parallel_backend_tbb.h (__func_task::finalize): Make deallocation unconditional.
6 days  libstdc++: Constrain __gnu_debug::bitset(const CharT*) constructor [PR121046]  (Jonathan Wakely; 1 file, -1/+7)
The r16-3435-gbbc0e70b610f19 change (for LWG 4294) needs to be applied to the debug mode __gnu_debug::bitset as well as the normal one. libstdc++-v3/ChangeLog: PR libstdc++/121046 * include/debug/bitset (bitset(const CharT*, ...)): Add constraints on CharT type.
6 days  c++/modules: Fix missed unwrapping of STAT_HACK in ADL [PR121893]  (Nathaniel Shead; 3 files, -1/+33)
My r16-3559-gc2e567a6edb563 reworked ADL for modules, including a change to allow seeing module-linkage declarations if they only exist on the instantiation path. This caused a crash however as I neglected to unwrap the stat hack wrapper when we were happy to see all declarations, allowing search_adl to add non-functions to the overload set. PR c++/121893 gcc/cp/ChangeLog: * name-lookup.cc (name_lookup::adl_namespace_fns): Unwrap the STAT_HACK also when on_inst_path. gcc/testsuite/ChangeLog: * g++.dg/modules/adl-10_a.C: New test. * g++.dg/modules/adl-10_b.C: New test. Signed-off-by: Nathaniel Shead <nathanieloshead@gmail.com>
6 days  ipa-free-lang-data: Don't walk into DECL_CHAIN when finding decls/types [PR121865]  (Nathaniel Shead; 5 files, -5/+23)
On a DECL, TREE_CHAIN will find any other declarations in the same binding level. This caused an ICE in PR121865 because the next entity in the binding level was the uninstantiated unique friend 'foo', for which, after it is found, the compiler tries to generate a mangled name and crashes.

This didn't happen in non-modules testcases only because normally the unique friend function would have been chained after its template_decl, and find_decls_types_r bails on lang-specific nodes so it never saw the uninstantiated decl. With modules however the order of chaining changed, causing the error.

I don't think it's ever necessary to walk into the DECL_CHAIN, from what I can see; other cases where it might be useful (block vars or type fields) are already handled explicitly elsewhere, and only one test fails because of the change, due to accidentally relying on this "walk into the next in-scope declaration" behaviour.

PR c++/121865 gcc/ChangeLog: * ipa-free-lang-data.cc (find_decls_types_r): Don't walk into DECL_CHAIN for any DECL. gcc/testsuite/ChangeLog: * g++.dg/lto/pr101396_0.C: Ensure A will be walked into (and isn't constant-folded out of the GIMPLE for the function). * g++.dg/lto/pr101396_1.C: Add message. * g++.dg/modules/lto-4_a.C: New test. * g++.dg/modules/lto-4_b.C: New test. Signed-off-by: Nathaniel Shead <nathanieloshead@gmail.com> Reviewed-by: Richard Biener <rguenther@suse.de>
6 days  libstdc++: Fix bootstrap failure in atomicity.cc  (Jonathan Wakely; 1 file, -1/+4)
My r16-3810-g6456da6bab8a2c changes broke bootstrap for targets that use the mutex-based atomic helpers. This fixes it by casting away the unnecessary volatile-qualification on the _Atomic_word* before passing it to __exchange_and_add_single. libstdc++-v3/ChangeLog: * config/cpu/generic/atomicity_mutex/atomicity.h (__exchange_and_add): Use const_cast to remove volatile.
6 days  Minor tweaks to ipa-pure-const.cc  (Eric Botcazou; 1 file, -7/+5)
gcc/ * ipa-pure-const.cc (check_stmt): Minor formatting tweaks. (pass_data_nothrow): Fix pasto in description.
6 days  middle-end: Use addhn for compression instead of inclusive OR when reducing comparison values  (Tamar Christina; 9 files, -4/+205)
Given a sequence such as

  int foo ()
  {
  #pragma GCC unroll 4
    for (int i = 0; i < N; i++)
      if (a[i] == 124)
        return 1;
    return 0;
  }

where a[i] is long long, we will unroll the loop and use an OR reduction for early break on Adv. SIMD. Afterwards the sequence is followed by a compression sequence to compress the 128-bit vectors into 64 bits for use by the branch. However, if we have support for add halving and narrowing then instead of using an OR we can use an ADDHN, which will do the combining and narrowing.

Note that for now I only do the last OR; however, if we have more than one level of unrolling we could technically chain them. I will revisit this in another upcoming early break series, as an unroll of 2 is fairly common.

gcc/ChangeLog: * internal-fn.def (VEC_TRUNC_ADD_HIGH): New. * doc/generic.texi: Document it. * optabs.def (vec_trunc_add_high): New. * doc/md.texi: Document it. * tree-vect-stmts.cc (vectorizable_early_exit): Use addhn if supported. gcc/testsuite/ChangeLog: * gcc.target/aarch64/vect-early-break-addhn_1.c: New test. * gcc.target/aarch64/vect-early-break-addhn_2.c: New test. * gcc.target/aarch64/vect-early-break-addhn_3.c: New test. * gcc.target/aarch64/vect-early-break-addhn_4.c: New test.
6 days  Aarch64: Add support for addhn vectorizer optabs for Adv.SIMD  (Tamar Christina; 2 files, -0/+97)
This implements the new vector optabs vec_<su>addh_narrow<mode> adding support for in-vectorizer use for early break. gcc/ChangeLog: * config/aarch64/aarch64-simd.md (vec_addh_narrow<mode>): New. gcc/testsuite/ChangeLog: * gcc.target/aarch64/vect-addhn_1.c: New test.
6 days  middle-end: clear the user unroll flag if the cost model has overridden it  (Tamar Christina; 2 files, -5/+8)
If the user has requested loop unrolling through pragma GCC unroll then at the moment we only set LOOP_VINFO_USER_UNROLL if the vectorizer has not overridden the unroll factor (through backend costing) or if the VF made the requested unroll factor be 1.

When we have a loop of, say, int and a pragma unroll 4, and the vectorizer picks V4SI as the mode, the requested unroll factor ends up exactly matching the VF. As such the requested unroll is 1 and we don't clear the pragma; so the vectorizer did honor the requested unroll factor. However, since we didn't set the unroll amount back and left it at 4, the RTL unroller won't use the RTL cost model at all and will just unroll the vector loop 4 times.

Both of these events are costing related, and so it stands to reason that we should set LOOP_VINFO_USER_UNROLL here as well, so that the RTL unroller returns to using the backend costing for any further unrolling.

gcc/ChangeLog: * tree-vect-loop.cc (vect_analyze_loop_1): If the unroll pragma was set mark it as handled. * doc/extend.texi (pragma GCC unroll): Update documentation.
6 days  Daily bump.  (GCC Administrator; 7 files, -1/+524)
7 days  doc: Correct the return type of float comparison  (Trevor Gross; 1 file, -24/+30)
Documentation for `__cmpsf2` and similar functions currently indicate a return type of `int`. This is not correct however; the `libgcc` functions return `CMPtype`, the size of which is determined by the `libgcc_cmp_return` mode. Update documentation to use `CMPtype` and indicate that this is target-dependent, also mentioning the usual modes. Reported-by: beetrees <b@beetr.ee> Fixes: https://github.com/rust-lang/compiler-builtins/issues/919#issuecomment-2905347318 Signed-off-by: Trevor Gross <tmgross@umich.edu> * doc/libgcc.texi (Comparison functions): Document functions as returning CMPtype.
7 days  Fortran: fix assignment to allocatable scalar polymorphic component [PR121616]  (Harald Anlauf; 2 files, -0/+98)
PR fortran/121616 gcc/fortran/ChangeLog: * primary.cc (gfc_variable_attr): Properly set dimension attribute from a component ref. gcc/testsuite/ChangeLog: * gfortran.dg/alloc_comp_assign_17.f90: New test.
7 days  libstdc++: Trap on std::shared_ptr reference count overflow [PR71945]  (Jonathan Wakely; 1 file, -4/+50)
This adds checks when incrementing the shared count and the weak count, and will trap if either would be incremented past its maximum.

The maximum value is the value at which incrementing it produces an invalid use_count(). So that is either the maximum positive value of _Atomic_word, or, for targets where we now allow the counters to wrap around to negative values, the "maximum" value is -1, because that is the value at which one more increment overflows the usable range and resets the counter to zero. For the weak count the maximum is always -1, as we always allow that count to use negative values, so we only trap if it wraps all the way back to zero.

libstdc++-v3/ChangeLog: PR libstdc++/71945 * include/bits/shared_ptr_base.h (_Sp_counted_base::_S_chk): Trap if a reference count cannot be incremented any higher. (_Sp_counted_base::_M_add_ref_copy): Use _S_chk. (_Sp_counted_base::_M_add_weak_ref): Likewise. (_Sp_counted_base<_S_mutex>::_M_add_ref_lock_nothrow): Likewise. (_Sp_counted_base<_S_atomic>::_M_add_ref_lock_nothrow): Likewise. (_Sp_counted_base<_S_single>::_M_add_ref_copy): Use _S_chk. Reviewed-by: Tomasz Kamiński <tkaminsk@redhat.com>
7 days  libstdc++: Allow std::shared_ptr reference counts to be negative [PR71945]  (Jonathan Wakely; 1 file, -30/+35)
This change doubles the effective range of the std::shared_ptr and std::weak_ptr reference counts for most 64-bit targets.

The counter type, _Atomic_word, is usually a signed 32-bit int (except on Solaris v9 where it is a signed 64-bit long). The return type of std::shared_ptr::use_count() is long.

For targets where long is wider than _Atomic_word (most 64-bit targets) we can treat the _Atomic_word reference counts as unsigned and allow them to wrap around from their most positive value to their most negative value without any problems. The logic that operates on the counts only cares if they are zero or non-zero, and never performs relational comparisons.

The atomic fetch_add operations on integers are required by the standard to behave like unsigned types, so that overflow is well-defined: "the result is as if the object value and parameters were converted to their corresponding unsigned types, the computation performed on those types, and the result converted back to the signed type." So if we allow the counts to wrap around to negative values, all we need to do is cast the value to make_unsigned_t<_Atomic_word> before returning it as long from the use_count() function.

In practice even exceeding INT_MAX is extremely unlikely, as it would require billions of shared_ptr or weak_ptr objects to have been constructed and never destroyed. However, if that happens we now have double the range before the count returns to zero and causes problems.

Some of the member functions for the _Sp_counted_base<_S_single> specialization are adjusted to use the __atomic_add_single and __exchange_and_add_single helpers instead of plain ++ and -- operations. This is done because those helpers use unsigned arithmetic, where the plain increments and decrements would have undefined behaviour on overflow.

libstdc++-v3/ChangeLog: PR libstdc++/71945 * include/bits/shared_ptr_base.h (_Sp_counted_base::_M_get_use_count): Cast _M_use_count to unsigned before returning as long. 
(_Sp_counted_base<_S_single>::_M_add_ref_copy): Use atomic helper function to adjust ref count using unsigned arithmetic. (_Sp_counted_base<_S_single>::_M_weak_release): Likewise. (_Sp_counted_base<_S_single>::_M_get_use_count): Cast _M_use_count to unsigned before returning as long. (_Sp_counted_base<_S_single>::_M_add_ref_lock_nothrow): Use _M_add_ref_copy to do increment using unsigned arithmetic. (_Sp_counted_base<_S_single>::_M_release): Use atomic helper and _M_weak_release to do decrements using unsigned arithmetic. (_Sp_counted_base<_S_mutex>::_M_release): Add comment. (_Sp_counted_base<_S_single>::_M_weak_add_ref): Remove specialization. Reviewed-by: Tomasz Kamiński <tkaminsk@redhat.com>
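The use_count() cast described in the commit can be sketched in isolation (a simplified illustration, not the libstdc++ source; `counter_to_long` is a hypothetical name, and the expected values assume 32-bit int and 64-bit long):

```cpp
#include <cassert>
#include <type_traits>

using Word = int;  // stand-in for libstdc++'s _Atomic_word

// A counter that has wrapped past INT_MAX is negative, but converting
// through the corresponding unsigned type before widening to long
// recovers the logical count.
long counter_to_long(Word count)
{
  return static_cast<std::make_unsigned_t<Word>>(count);
}
```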
7 dayslibstdc++: Make atomicity helpers use unsigned arithmetic [PR121148]Jonathan Wakely4-6/+44
The standard requires that std::atomic<integral-type>::fetch_add does not have undefined behaviour for signed overflow, instead it wraps like unsigned integers. The compiler ensures this is true for the atomic built-ins that std::atomic uses, but it's not currently true for the __gnu_cxx::__exchange_and_add and __gnu_cxx::__atomic_add functions defined in libstdc++, which operate on type _Atomic_word.

For the inline __exchange_and_add_single function (used when there's only one thread in the process), we can copy the value to an unsigned long and do the addition on that, then assign it back to the _Atomic_word variable.

The __exchange_and_add in config/cpu/generic/atomicity_mutex/atomicity.h locks a mutex and then performs exactly the same steps as __exchange_and_add_single. Calling __exchange_and_add_single instead of duplicating the code benefits from the fix just made to __exchange_and_add_single.

The remaining config/cpu/$arch/atomicity.h implementations either use inline assembly which uses wrapping instructions (so no changes needed), or we can fix them by compiling with -fwrapv.

After this change, UBSan no longer gives an error for:

  _Atomic_word i = INT_MAX;
  __gnu_cxx::__exchange_and_add_dispatch(&i, 1);

  /usr/include/c++/14/ext/atomicity.h:85:12: runtime error: signed integer overflow: 2147483647 + 1 cannot be represented in type 'int'

libstdc++-v3/ChangeLog:

        PR libstdc++/121148
        * config/cpu/generic/atomicity_mutex/atomicity.h
        (__exchange_and_add): Call __exchange_and_add_single.
        * include/ext/atomicity.h (__exchange_and_add_single): Use an
        unsigned type for the addition.
        * libsupc++/Makefile.am (atomicity.o): Compile with -fwrapv.
        * libsupc++/Makefile.in: Regenerate.

Reviewed-by: Tomasz Kamiński <tkaminsk@redhat.com>
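The single-threaded fix can be sketched like so (an illustration of the technique, not the actual atomicity.h code; the function name is made up):

```cpp
#include <cassert>
#include <climits>

typedef int Word;  // stand-in for _Atomic_word on most targets

// Perform the addition on unsigned values so that overflow wraps
// instead of being undefined behaviour, then store the wrapped
// result back and return the old value.
inline Word exchange_and_add_sketch(Word* mem, int val)
{
  Word result = *mem;
  *mem = static_cast<Word>(static_cast<unsigned>(*mem)
                           + static_cast<unsigned>(val));
  return result;
}
```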
7 dayspr107421.f90: Require PIE and pass -fPIE for non-x86 targetsH.J. Lu1-0/+4
-mno-direct-extern-access is used to disable direct access to external symbols from executables, with and without PIE, for x86. Require PIE and pass -fPIE to disable direct access to external symbols for other targets.

        PR fortran/107421
        PR testsuite/121848
        * gfortran.dg/gomp/pr107421.f90: Require PIE and pass -fPIE
        for non-x86 targets.

Signed-off-by: H.J. Lu <hjl.tools@gmail.com>
7 dayslibstdc++: Use consteval for _S_noexcept() helper functionsJonathan Wakely3-10/+10
These _S_noexcept() functions are only used in noexcept-specifiers and never need to be called at runtime. They can be immediate functions, i.e. consteval. libstdc++-v3/ChangeLog: * include/bits/iterator_concepts.h (_IterMove::_S_noexcept) (_IterSwap::_S_noexcept): Change constexpr to consteval. * include/bits/ranges_base.h (_Begin::_S_noexcept) (_End::_S_noexcept, _RBegin::_S_noexcept, _REnd::_S_noexcept) (_Size::_S_noexcept, _Empty::_S_noexcept, _Data::_S_noexcept): Likewise. * include/std/concepts (_Swap::_S_noexcept): Likewise. Reviewed-by: Tomasz Kamiński <tkaminsk@redhat.com>
7 dayslibstdc++: Add always_inline to ranges iterator ops and access functionsJonathan Wakely1-17/+30
Most of the basic operations for ranges such as ranges::begin and ranges::next are trivial one-line function bodies, so they can be made always_inline to reduce the abstraction penalty for -O0 code.

Now that we no longer need to support the -fconcepts-ts grammar, we can also move some [[nodiscard]] attributes to the more natural position before the function declaration, instead of between the declarator-id and the function parameters, e.g. we can use:

  template<typename T> requires C<T>
  [[nodiscard]] auto operator()(T&&)

instead of:

  template<typename T> requires C<T>
  auto operator() [[nodiscard]] (T&&)

The latter form was necessary because -fconcepts-ts used a different grammar for the requires-clause, parsing 'C<T>[[x]]' as a subscripting operator with an ill-formed argument '[x]'. In the C++20 grammar you would need to use parentheses to use a subscript in a constraint, so without parentheses it's parsed as an attribute.

libstdc++-v3/ChangeLog:

        * include/bits/ranges_base.h (__detail::__to_unsigned_like)
        (__access::__possible_const_range, __access::__as_const)
        (__distance_fn::operator(), __next_fn::operator())
        (__prev_fn::operator()): Add always_inline attribute.
        (_Begin::operator(), _End::operator(), _RBegin::operator())
        (_REnd::operator(), _Size::operator(), _SSize::operator())
        (_Empty::operator(), _Data::operator(), _SSize::operator()):
        Likewise. Move nodiscard attribute to start of declaration.

Reviewed-by: Tomasz Kamiński <tkaminsk@redhat.com>
7 daystestsuite: Add tests for PR c/107419 and PR c++/107393H.J. Lu3-0/+50
Both C and C++ frontends should set a tentative TLS model in grokvardecl and update the TLS model with the default TLS access model after a TLS variable has been fully processed, if the default TLS access model is stronger.

        PR c/107419
        PR c++/107393
        * c-c++-common/tls-attr-common.c: New test.
        * c-c++-common/tls-attr-le-pic.c: Likewise.
        * c-c++-common/tls-attr-le-pie.c: Likewise.

Signed-off-by: H.J. Lu <hjl.tools@gmail.com>
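The kind of declaration these tests exercise looks like this (a generic sketch; the attribute shown is the documented GNU tls_model attribute, and whether it links depends on how the object is built):

```cpp
#include <cassert>

// A TLS variable with an explicit access model: the frontend must keep
// the attribute's model rather than silently replacing it with the
// default model for the current -fpic/-fpie setting.
__attribute__((tls_model("local-exec")))
static thread_local int counter = 0;

int bump() { return ++counter; }
```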
7 dayslibstdc++: optimize weak_ptr converting constructor/assignmentGiuseppe D'Angelo2-7/+126
Converting a weak_ptr<Derived> to a weak_ptr<Base> requires calling lock() on the source object in the general case. Although the source weak_ptr<Derived> does contain a raw pointer to Derived, we can't just get it and (up)cast it to Base, as that would dereference the pointer in case Base is a virtual base class of Derived. We don't know if the managed object is still alive, and therefore if this operation is safe to do; we therefore temporarily lock() the source weak_ptr, do the cast using the resulting shared_ptr, and then discard this shared_ptr.

Simply checking the strong counter isn't sufficient, because if multiple threads are involved then we'd have a race / TOCTOU problem; the object may get destroyed after we check the strong counter and before we cast the pointer.

However lock() is not necessary if we know that Base is *not* a virtual base class of Derived; in this case we can avoid the relatively expensive call to lock() and just cast the pointer. This commit uses the newly added builtin to detect this case and optimize std::weak_ptr's converting constructors and assignment operations.

Apart from non-virtual bases, there are a couple of other interesting cases where we can also avoid locking. Specifically:

1) converting a weak_ptr<T[N]> to a weak_ptr<T cv[]>;
2) converting a weak_ptr<T*> to a weak_ptr<T const * const> or similar.

Since this logic is going to be used by multiple places, I've centralized it in a new static helper.

libstdc++-v3/ChangeLog:

        * include/bits/shared_ptr_base.h (__weak_ptr): Avoid calling
        lock() when converting or assigning a weak_ptr<Derived> to a
        weak_ptr<Base> in case Base is not a virtual base of Derived.
        This logic is centralized in _S_safe_upcast, called by the
        various converting constructors/assignment operators.
        (_S_safe_upcast): New helper function.
        * testsuite/20_util/weak_ptr/cons/virtual_bases.cc: New test.
Reviewed-by: Jonathan Wakely <jwakely@redhat.com> Reviewed-by: Tomasz Kamiński <tkaminsk@redhat.com> Signed-off-by: Giuseppe D'Angelo <giuseppe.dangelo@kdab.com>
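The effect can be seen from the caller's side (a sketch; `to_weak_base` is a made-up wrapper around the converting constructor the commit optimizes):

```cpp
#include <cassert>
#include <memory>

struct Base { virtual ~Base() = default; int b = 1; };
struct Derived : Base { int d = 2; };  // Base is a non-virtual base

// With a non-virtual base the pointer adjustment is a compile-time
// constant, so the library can upcast the stored pointer directly
// instead of temporarily materializing a shared_ptr via lock().
std::weak_ptr<Base> to_weak_base(const std::weak_ptr<Derived>& wd)
{
  return std::weak_ptr<Base>(wd);  // converting constructor
}
```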
7 daysc++: Don't upgrade TLS model if TLS model isn't set.H.J. Lu2-3/+16
Don't upgrade TLS model when cplus_decl_attributes is called on a thread local variable whose TLS model isn't set yet. gcc/cp/ PR c++/121889 * decl2.cc (cplus_decl_attributes): Don't upgrade TLS model if TLS model isn't set yet. gcc/testsuite/ PR c++/121889 * g++.dg/tls/pr121889.C: New test. Signed-off-by: H.J. Lu <hjl.tools@gmail.com>
7 daysAArch64: Add isfinite expander [PR 66462]Wilco Dijkstra2-0/+48
Add an expander for isfinite using integer arithmetic. This is typically faster and avoids generating spurious exceptions on signaling NaNs. This fixes part of PR66462.

  int isfinite1 (float x) { return __builtin_isfinite (x); }

Before:

  fabs    s0, s0
  mov     w0, 2139095039
  fmov    s31, w0
  fcmp    s0, s31
  cset    w0, hi
  eor     w0, w0, 1
  ret

After:

  fmov    w1, s0
  mov     w0, -16777216
  cmp     w0, w1, lsl 1
  cset    w0, hi
  ret

gcc:

        PR middle-end/66462
        * config/aarch64/aarch64.md (isfinite<mode>2): Add new expander.

gcc/testsuite:

        PR middle-end/66462
        * gcc.target/aarch64/pr66462.c: Add tests for isfinite.
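The "After" sequence corresponds to this bit-level check (a portable C++ rendering of the same idea, assuming IEEE-754 binary32 floats):

```cpp
#include <cassert>
#include <cmath>
#include <cstdint>
#include <cstring>

// A float is finite iff its exponent field is not all-ones. Shifting
// the sign bit out leaves the exponent in the top 8 bits, so a single
// unsigned compare decides - the same trick as the cmp/cset sequence.
bool isfinite_bits(float x)
{
  std::uint32_t u;
  std::memcpy(&u, &x, sizeof u);  // no FP compare, no exceptions raised
  return (u << 1) < 0xFF000000u;
}
```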
7 daystree-optimization/121595 - new fabs(a+0.0) -> fabs(a) patternMatteo Nicoli3-0/+28
With -fno-trapping-math it is safe to optimize fabs(a + 0.0) as fabs (a). PR tree-optimization/121595 * match.pd (fabs(a + 0.0) -> fabs (a)): Optimization pattern limited to the -fno-trapping-math case. * gcc.dg/fabs-plus-zero-1.c: New testcase. * gcc.dg/fabs-plus-zero-2.c: Likewise. Signed-off-by: Matteo Nicoli <matteo.nicoli001@gmail.com> Reviewed-by: Andrew Pinski <andrew.pinski@oss.qualcomm.com>
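Value-wise the transform is sound because x + 0.0 differs from x only for x == -0.0 (where the sum is +0.0), and fabs maps both zeros to +0.0; the -fno-trapping-math restriction exists because the addition could raise an invalid-operation exception for a signaling NaN. A small check of the value-level claim:

```cpp
#include <cassert>
#include <cmath>

// fabs(a + 0.0) and fabs(a) agree for every value of a, including the
// negative-zero case where the addition changes the sign of zero.
double fabs_plus_zero(double a) { return std::fabs(a + 0.0); }
```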
7 daystestsuite: LoongArch: Enable 16B atomic tests if the test machine supports LSX and SCQXi Ruoyao9-2/+32
Enable those tests so we won't make stupid mistakes in the 16B atomic implementation anymore. All these tests passed on a Loongson 3C6000/S except atomic-other-int128.c. With GDB patched to support sc.q (https://sourceware.org/pipermail/gdb-patches/2025-August/220034.html) this test also XPASSes.

gcc/testsuite/ChangeLog:

        * lib/target-supports.exp
        (check_effective_target_loongarch_scq_hw): New.
        (check_effective_target_sync_int_128_runtime): Return 1 on
        loongarch64-*-* if hardware supports both LSX and SCQ.
        * gcc.dg/atomic-compare-exchange-5.c: Pass -mlsx -mscq for
        loongarch64-*-*.
        * gcc.dg/atomic-exchange-5.c: Likewise.
        * gcc.dg/atomic-load-5.c: Likewise.
        * gcc.dg/atomic-op-5.c: Likewise.
        * gcc.dg/atomic-store-5.c: Likewise.
        * gcc.dg/atomic-store-6.c: Likewise.
        * gcc.dg/simulate-thread/atomic-load-int128.c: Likewise.
        * gcc.dg/simulate-thread/atomic-other-int128.c: Likewise.
        (dg-final): xfail on loongarch64-*-* because gdb does not
        handle sc.q properly yet.
7 daysLoongArch: Fix the semantic of 16B CASXi Ruoyao1-41/+63
In a CAS operation, even if expected != *memory we still need to do an atomic load of *memory into the output. But I made a mistake in the initial implementation, causing the output to contain junk in this situation.

Like a normal atomic load, the atomic load embedded in the CAS semantic is required to work on a read-only page. Thus we cannot rely on sc.q to ensure the atomicity of the load. Use LSX to perform the load instead, and also use LSX to compare the 16B values to keep the ll-sc loop body short.

gcc/ChangeLog:

        * config/loongarch/sync.md (atomic_compare_and_swapti_scq):
        Require LSX. Change the operands for the output, the memory,
        and the expected value to LSX vector modes. Add a FCCmode
        output to indicate if CAS has written the desired value into
        memory. Use LSX to atomically load both words of the 16B
        value in memory.
        (atomic_compare_and_swapti): Pun the modes to satisfy the new
        atomic_compare_and_swapti_scq implementation. Read the bool
        return value from the FCC instead of performing a comparison.
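The contract being fixed is the standard compare-exchange semantic, shown here on a 64-bit value (the 16B LoongArch case additionally needs LSX to make the failure-path load atomic, but the required behaviour is the same):

```cpp
#include <atomic>
#include <cassert>
#include <cstdint>

// On failure, compare_exchange must atomically load the current
// contents of 'mem' into 'expected' - it may not leave junk there.
bool cas_demo(std::atomic<std::uint64_t>& mem,
              std::uint64_t& expected, std::uint64_t desired)
{
  return mem.compare_exchange_strong(expected, desired);
}
```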
7 daysLoongArch: Fix the "%t" modifier handling for (const_int 0)Xi Ruoyao1-2/+1
This modifier is intended to output $r0 for (const_int 0), but the logic: GET_MODE (op) != TImode || (op != CONST0_RTX (TImode) && code != REG) will reject (const_int 0) because (const_int 0) actually does not have a mode and GET_MODE will return VOIDmode for it. Use reg_or_0_operand instead to fix the issue. gcc/ChangeLog: * config/loongarch/loongarch.cc (loongarch_print_operand): Call reg_or_0_operand for checking the sanity of %t.
7 dayslibstdc++: Remove trailing whitespace in <syncstream>Jonathan Wakely1-1/+1
libstdc++-v3/ChangeLog: * include/std/syncstream: Remove trailing whitespace.
7 daystree-optimization/121703 - UBSAN error with moving from uninit dataRichard Biener1-2/+2
The PR reports

  vectorizer.h:276:3: runtime error: load of value 32695, which is not a valid value for type 'internal_fn'

which I believe is from

  slp_node->data = new vect_load_store_data (std::move (ls));

where 'ls' can be partly uninitialized (and that data will not be used, but of course the move CTOR doesn't know this). The following tries to fix that by using value-initialization of 'ls'.

        PR tree-optimization/121703
        * tree-vect-stmts.cc (vectorizable_store): Value-initialize ls.
        (vectorizable_load): Likewise.
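The difference between the two initialization forms, reduced to a sketch (`load_store_like` is a made-up stand-in for vect_load_store_data):

```cpp
#include <cassert>

struct load_store_like { int ifn; bool strided; };

// 'load_store_like ls;' would leave both members indeterminate, and
// copying an indeterminate enum-like field is what UBSan flagged.
// Value-initialization ('ls{}') zeroes every member instead.
load_store_like make_zeroed()
{
  load_store_like ls{};
  return ls;
}
```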
7 daysRISC-V: Suppress cross CC sibcall optimization from vectorTsukasa OI4-0/+84
In general, tail call optimization requires that the callee's saved registers are a superset of the caller's.

The Standard Vector Calling Convention Variant (assembler: .variant_cc) requires that a function with this calling convention preserves vector registers v1-v7 and v24-v31 across calls (i.e. callee-saved). However, the same set of registers are (function-local) temporary registers (i.e. caller-saved) in the normal (non-vector) calling convention.

Even if a function with this calling convention variant calls another function with a non-vector calling convention, those vector registers are correctly clobbered -- except when the sibling (tail) call optimization occurs, as it violates the general rule mentioned above. If this happens, the following function body:

1. Save v1-v7 and v24-v31 for clobbering
2. Call another function with a non-vector calling convention
   (which may destroy v1-v7 and/or v24-v31)
3. Restore v1-v7 and v24-v31
4. Return.

may be incorrectly optimized into the following sequence:

1. Save v1-v7 and v24-v31 for clobbering
2. Restore v1-v7 and v24-v31 (?!)
3. Jump to another function with a non-vector calling convention
   (which may destroy v1-v7 and/or v24-v31).

This commit suppresses cross CC sibling call optimization from the vector calling convention variant.

gcc/ChangeLog:

        * config/riscv/riscv.cc (riscv_function_ok_for_sibcall):
        Suppress cross calling convention sibcall optimization from
        the vector calling convention variant.

gcc/testsuite/ChangeLog:

        * gcc.target/riscv/rvv/base/abi-call-variant_cc-sibcall.c: New test.
        * gcc.target/riscv/rvv/base/abi-call-variant_cc-sibcall-indirect-1.c: Ditto.
        * gcc.target/riscv/rvv/base/abi-call-variant_cc-sibcall-indirect-2.c: Ditto.
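The shape of a sibling call, for reference (a generic C++ sketch; whether the call in 'caller' actually becomes a jump depends on the optimization level and on the target's ok-for-sibcall hook, which this commit tightens for RISC-V):

```cpp
#include <cassert>

int callee(int x) { return x * 2; }

// A tail call: the result of 'callee' is returned unchanged, so the
// compiler may replace call+return with a plain jump - but only when
// the callee preserves every register the caller was obliged to keep.
int caller(int x) { return callee(x + 1); }
```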
7 daystree-optimization/121829 - bogus CFG with asm gotoRichard Biener2-1/+32
When the vectorizer removes a forwarder created earlier by split_edge it uses redirect_edge_pred for convenience and efficiency. That breaks down when the split edge originates from an asm goto, as that is a jump that needs adjustments from redirect_edge_and_branch. The following factors out a simple vect_remove_forwarder handling this situation appropriately.

        PR tree-optimization/121829
        * cfgloopmanip.cc (create_preheader): Ensure we can insert at
        the end of a preheader.
        * gcc.dg/torture/pr121829.c: New testcase.