Age  Commit message  Author  Files  Lines
2025-08-08vect: Add target hook to prefer gather/scatter instructionsdevel/omp/gcc-15Andrew Stubbs7-13/+73
For AMD GCN, the instructions available for loading/storing vectors are always scatter/gather operations (i.e. there are separate addresses for each vector lane), so the current heuristic to avoid gather/scatter operations with too many elements in get_group_load_store_type is counterproductive. Avoiding such operations in that function can subsequently lead to a missed vectorization opportunity whereby later analyses in the vectorizer try to use a very wide array type which is not available on this target, and thus it bails out. This patch adds a target hook to override the "single_element_p" heuristic in that function, and activates it for GCN. This allows much better code to be generated for affected loops. Co-authored-by: Julian Brown <julian@codesourcery.com> gcc/ * doc/tm.texi.in (TARGET_VECTORIZE_PREFER_GATHER_SCATTER): Add documentation hook. * doc/tm.texi: Regenerate. * target.def (prefer_gather_scatter): Add target hook under vectorizer. * hooks.cc (hook_bool_mode_int_unsigned_false): New function. * hooks.h (hook_bool_mode_int_unsigned_false): New prototype. * tree-vect-stmts.cc (vect_use_strided_gather_scatters_p): Add parameters group_size and single_element_p, and rework to use targetm.vectorize.prefer_gather_scatter. (get_group_load_store_type): Move some of the condition into vect_use_strided_gather_scatters_p. * config/gcn/gcn.cc (gcn_prefer_gather_scatter): New function. (TARGET_VECTORIZE_PREFER_GATHER_SCATTER): Define hook. (cherry picked from commit 36c5a7aa9a6dbaed07e3a2482c66743ddcb3e776)
2025-07-30Don't pass vector params through to offload targetsAndrew Stubbs3-5/+26
The optimization options are deliberately passed through to the LTO compiler, but when the same mechanism is reused for offloading it ends up forcing the host compiler settings onto the device compiler. Maybe this should be removed completely, but this patch just fixes a few of them. In particular, param_vect_partial_vector_usage is disabled by x86 and this really hurts amdgcn. I also fixed an ambiguous else warning in the generated file by adding braces. gcc/ChangeLog: * config/gcn/gcn.cc (gcn_option_override): Add note to set default for param_vect_partial_vector_usage to "1". * optc-save-gen.awk: Don't pass through options marked "NoOffload". * params.opt (-param=vect-epilogues-nomask): Add NoOffload. (-param=vect-partial-vector-usage): Likewise. (-param=vect-inner-loop-cost-factor): Likewise. (cherry picked from commit b31fa1ce19542e14bea10f46240f39cb37277b80)
2025-07-30amdgcn: add DImode offsets for gather/scatterAndrew Stubbs2-12/+103
Add new variants of the gather_load and scatter_store instructions that take the offsets in DImode. This is not the natural width for offsets in the instruction set, but we can use them to compute a vector of absolute addresses, which does work. This enables the autovectorizer to use gather/scatter in a number of additional scenarios (one of which shows up in the SPEC HPC lbm benchmark). gcc/ChangeLog: * config/gcn/gcn-valu.md (gather_load<mode><vndi>): New. (scatter_store<mode><vndi>): New. (mask_gather_load<mode><vndi>): New. (mask_scatter_store<mode><vndi>): New. * config/gcn/gcn.cc (gcn_expand_scaled_offsets): Support DImode. (cherry picked from commit 351fa55c58a036f148d13bca972e687a0bacd113)
2025-07-30amdgcn: Add ashlvNm, mulvNm macrosAndrew Stubbs1-27/+41
I need some extra shift varieties in the mode-independent code, but the macros don't permit insns that don't have QI/HI variants. This fixes the problem, and adds the new functions for the follow-up patch to use. gcc/ChangeLog: * config/gcn/gcn.cc (GEN_VNM_NOEXEC): Use USE_QHF. (GEN_VNM): Likewise, and call for new ashl and mul variants. (cherry picked from commit f194924984c4eb9c8be5310f78b191b35e576ab8)
2025-07-30amdgcn: add more insn patterns using vec_duplicateAndrew Stubbs2-6/+179
These new insns allow more efficient use of scalar inputs to 64-bit vector add and mul. Also, the patch adjusts the existing mul.._dup because it was actually a dup2 (the vec_duplicate is on the second input), and that was inconveniently inconsistent. The patterns are generally useful, but will be used directly by a follow-up patch. gcc/ChangeLog: * config/gcn/gcn-valu.md (add<mode>3_dup): New. (add<mode>3_dup_exec): New. (<su>mul<mode>3_highpart_dup<exec>): New. (mul<mode>3_dup): Move the vec_duplicate to operand 1. (mul<mode>3_dup_exec): New. (vec_series<mode>): Adjust call to gen_mul<mode>3_dup. * config/gcn/gcn.cc (gcn_expand_vector_init): Likewise. (cherry picked from commit bdc4062a0796788e44d5e6ecd753268a8b453cc7)
2025-07-30amdgcn: Fix various unrecognized pattern issues with add<mode>3_vcc_dupAndrew Stubbs1-11/+11
The patterns did not accept inline immediate constants, even though the hardware instructions do, which has led to some errors in some patches I'm working on. Also the VCC update RTL was using the wrong operands in the wrong places. This appears to have been harmless(?) but is definitely not intended. gcc/ChangeLog: * config/gcn/gcn-valu.md (add<mode>3_vcc_dup<exec_vcc>): Change operand 2 to allow gcn_alu_operand. Swap the operands in the VCC update RTL. (add<mode>3_vcc_zext_dup): Likewise. (add<mode>3_vcc_zext_dup_exec): Likewise. (add<mode>3_vcc_zext_dup2): Likewise. (add<mode>3_vcc_zext_dup2_exec): Likewise. (cherry picked from commit 4a0967f7509b5fad1c9bda432f71deb0d342a879)
2025-07-30amdgcn: fix vec_ucmp infinite recursionAndrew Stubbs1-3/+3
I suppose this pattern doesn't get used much! The unsigned compare was meant to be defined using the signed compare pattern, but actually ended up trying to recursively call itself. This patch fixes the issue in the obvious way. gcc/ChangeLog: * config/gcn/gcn-valu.md (vec_cmpu<mode>di_exec): Call gen_vec_cmp*, not gen_vec_cmpu*. (cherry picked from commit d8680bac95c68002d7e4b13ae1dab1116fdfefc6)
2025-07-30amdgcn: Don't clobber VCC if we don't need toAndrew Stubbs2-30/+21
This is a hold-over from GCN3 where v_add always wrote to the condition register, whether you wanted it or not. This hasn't been true since GCN5, and we dropped support for GCN3 a little while ago, so let's fix it. There was actually a latent bug here because some other post-reload splitters were generating v_add instructions without declaring the VCC clobber (at least mul did this), so this should fix some wrong-code bugs also. gcc/ChangeLog: * config/gcn/gcn-valu.md (add<mode>3<exec_clobber>): Rename ... (add<mode>3<exec>): ... to this, remove the clobber, and change the instruction from v_add_co_u32 to v_add_u32. (add<mode>3_dup<exec_clobber>): Rename ... (add<mode>3_dup<exec>): ... to this, and likewise. (sub<mode>3<exec_clobber>): Rename ... (sub<mode>3<exec>): ... to this, and likewise * config/gcn/gcn.md (addsi3): Remove the DI clobber, and change the instruction from v_add_co_u32 to v_add_u32. (addsi3_scc): Likewise. (subsi3): Likewise, but for v_sub_co_u32. (muldi3): Likewise. (cherry picked from commit 0eee2dd2865faf61d9d74425510421e20434ec03)
2025-07-22ChangeLog.omp bumpThomas Schwinge4-1/+78
2025-07-21GCN, nvptx offloading: Restrain 'WARNING: program timed out.' while in 'dynamic_cast' only for effective-target 'offload_device' [PR119692]Thomas Schwinge6-6/+6
In PR119692 "C++ 'typeinfo', 'vtable' vs. OpenACC, OpenMP 'target' offloading": > --- Comment #8 from Rainer Orth <ro at gcc dot gnu.org> --- > The last commit made things worse on sparc-sun-solaris2.11: since that one > (dg-timeout 10) I regularly get > > WARNING: libgomp.c++/target-exceptions-bad_cast-1.C (test for excess errors) > program timed out. > FAIL: libgomp.c++/target-exceptions-bad_cast-1.C (test for excess errors) > UNRESOLVED: libgomp.c++/target-exceptions-bad_cast-1.C compilation failed to produce executable > UNRESOLVED: libgomp.c++/target-exceptions-bad_cast-1.C scan-tree-dump-times optimized "gimple_call <__cxa_bad_cast, " 1 > > Before that, the test had no issue. Compiling the test on an unloaded system > usually takes less than 1 sec, but when fully loaded, times can go up. To keep things simple, let's restrict this temporary (yeah...) workaround to apply only for effective-target 'offload_device', just like the 'dg-xfail-run-if' itself. PR target/119692 libgomp/ * testsuite/libgomp.c++/pr119692-1-4.C: '{ dg-timeout 10 { target offload_device } }'. * testsuite/libgomp.c++/pr119692-1-5.C: Likewise. * testsuite/libgomp.c++/target-exceptions-bad_cast-1.C: Likewise. * testsuite/libgomp.c++/target-exceptions-bad_cast-2.C: Likewise. * testsuite/libgomp.oacc-c++/exceptions-bad_cast-1.C: Likewise. * testsuite/libgomp.oacc-c++/exceptions-bad_cast-2.C: Likewise. (cherry picked from commit aa143261bdf6db4334b3fcad7768b53e231f998e)
2025-07-21nvptx: Support '-march=sm_61'Thomas Schwinge13-12/+111
gcc/ * config/nvptx/nvptx-sm.def: Add '61'. * config/nvptx/nvptx-gen.h: Regenerate. * config/nvptx/nvptx-gen.opt: Likewise. * config/nvptx/nvptx.cc (first_ptx_version_supporting_sm): Adjust. * config/nvptx/nvptx.opt (-march-map=sm_61, -march-map=sm_62): Likewise. * config.gcc: Likewise. * doc/invoke.texi (Nvidia PTX Options): Document '-march=sm_61'. * config/nvptx/gen-multilib-matches-tests: Extend. gcc/testsuite/ * gcc.target/nvptx/march-map=sm_61.c: Adjust. * gcc.target/nvptx/march-map=sm_62.c: Likewise. * gcc.target/nvptx/march=sm_61.c: New. libgomp/ * testsuite/libgomp.c/declare-variant-3-sm61.c: New. * testsuite/libgomp.c/declare-variant-3.h: Adjust. (cherry picked from commit 7b53b88381179c5c8152bcb890460f66d9c88fac)
2025-07-21nvptx: Support '-mptx=5.0'Thomas Schwinge6-0/+29
gcc/ * config/nvptx/nvptx-opts.h (enum ptx_version): Add 'PTX_VERSION_5_0'. * config/nvptx/nvptx.cc (ptx_version_to_string) (ptx_version_to_number): Adjust. * config/nvptx/nvptx.h (TARGET_PTX_5_0): New. * config/nvptx/nvptx.opt (Enum(ptx_version)): Add 'EnumValue' '5.0' for 'PTX_VERSION_5_0'. * doc/invoke.texi (Nvidia PTX Options): Document '-mptx=5.0'. gcc/testsuite/ * gcc.target/nvptx/mptx=5.0.c: New. (cherry picked from commit 97616687149f115e0ab946b9a05a9f8c1e47429e)
2025-07-21Adjust 'libgomp.c++/target-cdtor-{1,2}.C' for 'targetm.cxx.use_aeabi_atexit' [PR119853, PR119854]Thomas Schwinge2-12/+22
Fix-up for commit aafe942227baf8c2bcd4cac2cb150e49a4b895a9 "GCN, nvptx offloading: Host/device compatibility: Itanium C++ ABI, DSO Object Destruction API [PR119853, PR119854]": we need to adjust for 'targetm.cxx.use_aeabi_atexit': gcc/config/arm/arm.cc:#define TARGET_CXX_USE_AEABI_ATEXIT arm_cxx_use_aeabi_atexit gcc/config/arm/arm.cc:/* The EABI says __aeabi_atexit should be used to register static gcc/config/arm/arm.cc- destructors. */ gcc/config/arm/arm.cc- gcc/config/arm/arm.cc-static bool gcc/config/arm/arm.cc:arm_cxx_use_aeabi_atexit (void) gcc/config/arm/arm.cc-{ gcc/config/arm/arm.cc- return TARGET_AAPCS_BASED; gcc/config/arm/arm.cc-} ..., which 'gcc/cp/decl.cc:get_atexit_node' then acts on: call '__aeabi_atexit' instead of '__cxa_atexit', and swap two arguments. PR target/119853 PR target/119854 libgomp/ * testsuite/libgomp.c++/target-cdtor-1.C: Adjust for 'targetm.cxx.use_aeabi_atexit'. * testsuite/libgomp.c++/target-cdtor-2.C: Likewise. (cherry picked from commit 04b42c4245d85f77aa54ec002ebd7bbe6fde5f11)
2025-07-03ChangeLog.omp bumpThomas Schwinge3-1/+20
2025-07-03OpenMP: Add omp_get_initial_device/omp_get_num_devices builtins: Fix test casesThomas Schwinge2-4/+4
With this fix-up for commit 387209938d2c476a67966c6ddbdbf817626f24a2 "OpenMP: Add omp_get_initial_device/omp_get_num_devices builtins", we progress: PASS: c-c++-common/gomp/omp_get_num_devices_initial_device.c (test for excess errors) PASS: c-c++-common/gomp/omp_get_num_devices_initial_device.c scan-tree-dump-not optimized "abort" -FAIL: c-c++-common/gomp/omp_get_num_devices_initial_device.c scan-tree-dump-times optimized "omp_get_num_devices;" 1 +PASS: c-c++-common/gomp/omp_get_num_devices_initial_device.c scan-tree-dump-times optimized "omp_get_num_devices" 1 PASS: c-c++-common/gomp/omp_get_num_devices_initial_device.c scan-tree-dump optimized "_1 = __builtin_omp_get_num_devices \\(\\);[\\r\\n]+[ ]+return _1;" ... etc. for offloading configurations. gcc/testsuite/ * c-c++-common/gomp/omp_get_num_devices_initial_device.c: Fix. * gfortran.dg/gomp/omp_get_num_devices_initial_device.f90: Likewise. (cherry picked from commit 13c766066e23eb6ddf6bad7a5664b9d3ca8c1974)
2025-07-03libgomp: Fix up omp_target_memset-3.c test for C++ [PR120444]Jakub Jelinek1-2/+2
The test PASSes for C, but FAILs for C++: .../libgomp.c-c++-common/omp_target_memset-3.c: In function 'void test_it(void*, int, size_t)': .../libgomp.c-c++-common/omp_target_memset-3.c:31:7: warning: pointer of type 'void *' used in arithmetic [-Wpointer-arith] .../libgomp.c-c++-common/omp_target_memset-3.c:33:13: error: invalid conversion from 'void*' to 'int8_t*' {aka 'signed char*'} [-fpermissive] .../libgomp.c-c++-common/omp_target_memset-3.c:10:19: note: initializing argument 1 of 'void init_val(int8_t*, int, size_t)' .../libgomp.c-c++-common/omp_target_memset-3.c:37:14: error: invalid conversion from 'void*' to 'int8_t*' {aka 'signed char*'} [-fpermissive] .../libgomp.c-c++-common/omp_target_memset-3.c:17:20: note: initializing argument 1 of 'void check_val(int8_t*, int, size_t)' .../libgomp.c-c++-common/omp_target_memset-3.c:38:18: warning: pointer of type 'void *' used in arithmetic [-Wpointer-arith] .../libgomp.c-c++-common/omp_target_memset-3.c:38:18: error: invalid conversion from 'void*' to 'int8_t*' {aka 'signed char*'} [-fpermissive] .../libgomp.c-c++-common/omp_target_memset-3.c:17:20: note: initializing argument 1 of 'void check_val(int8_t*, int, size_t)' .../libgomp.c-c++-common/omp_target_memset-3.c: In function 'int main()': .../libgomp.c-c++-common/omp_target_memset-3.c:46:7: warning: pointer of type 'void *' used in arithmetic [-Wpointer-arith] The following two-liner fixes that, tested on x86_64-linux and i686-linux. 2025-06-03 Jakub Jelinek <jakub@redhat.com> PR libgomp/120444 * testsuite/libgomp.c-c++-common/omp_target_memset-3.c (test_it): Change ptr argument type from void * to int8_t *. (main): Change ptr variable type from void * to int8_t * and cast omp_target_alloc result to the latter type. (cherry picked from commit a8c03f056f4070a618bc59afcae2290cf21456ea)
2025-06-17ChangeLog.omp bumpTobias Burnus3-1/+17
2025-06-17OpenMP: Fix implicit 'declare target' for <ostream>Tobias Burnus2-1/+27
libstdc++-v3/include/std/ostream contains: namespace std _GLIBCXX_VISIBILITY(default) { ... template<typename _CharT, typename _Traits> inline basic_ostream<_CharT, _Traits>& endl(basic_ostream<_CharT, _Traits>& __os) { return flush(__os.put(__os.widen('\n'))); } ... #include <bits/ostream.tcc> and the latter, libstdc++-v3/include/bits/ostream.tcc, has: // Inhibit implicit instantiations for required instantiations, // which are defined via explicit instantiations elsewhere. #if _GLIBCXX_EXTERN_TEMPLATE extern template class basic_ostream<char>; extern template ostream& endl(ostream&); Before this commit, omp_discover_declare_target_tgt_fn_r marked 'endl' as (implicitly) declare target - but not the calls in it due to the 'extern' (DECL_EXTERNAL). Thanks to inlining, 'endl' is (therefore) not used and, hence, discarded by the linker; hence, it works with -O0 and -O1. However, as the (unused) function still exists, IPA CP (enabled by -O2) will try to do constant-value propagation and fails as the definition of 'widen' is not available. The solution is to still walk 'endl' despite being an 'extern(al)' decl; this has been restricted for now to DECL_DECLARED_INLINE_P. gcc/ChangeLog: * omp-offload.cc (omp_discover_declare_target_tgt_fn_r): Also walk external functions that are declared inline (and have a DECL_SAVED_TREE). libgomp/ChangeLog: * testsuite/libgomp.c++/declare_target-2.C: New test. (cherry picked from commit ea43b99537591b1103da3961c61f1cbfae968859)
2025-06-17Merge branch 'releases/gcc-15' into devel/omp/gcc-15Tobias Burnus32-470/+974
Merge up to r15-9840-g9803e23a212962 (June 17, 2025)
2025-06-17Daily bump.GCC Administrator1-1/+1
2025-06-16Daily bump.GCC Administrator1-1/+1
2025-06-15Daily bump.GCC Administrator3-1/+16
2025-06-14AVR: Fix PR120423 / PR116389.Georg-Johann Lay4-0/+116
The problem with PR120423 and PR116389 is that reload might assign an invalid hard register to a paradoxical subreg. For example with the test case from the PR, it assigns (REG:QI 31) to the inner of (subreg:HI (QI) 0) which is valid, but the subreg will be turned into (REG:HI 31) which is invalid and triggers an ICE in postreload. The problem only occurs with the old reload pass. The patch maps the paradoxical subregs to zero-extends which will be allocated correctly. For the 120423 testcases, the code is the same as with -mlra (which doesn't implement the fix), so the patch doesn't even introduce a performance penalty. The patch is only needed for v15: v14 is not affected, and in v16 reload will be removed. PR rtl-optimization/120423 PR rtl-optimization/116389 gcc/ * config/avr/avr.md [-mno-lra]: Add pre-reload split to transform (left shift of) a paradoxical subreg to a (left shift of) zero-extend. gcc/testsuite/ * gcc.target/avr/torture/pr120423-1.c: New test. * gcc.target/avr/torture/pr120423-2.c: New test. * gcc.target/avr/torture/pr120423-116389.c: New test.
2025-06-14Daily bump.GCC Administrator4-1/+70
2025-06-13Fix test case for PR117811 which failed for int < 32 bit.Georg-Johann Lay1-0/+5
PR middle-end/117811 PR testsuite/52641 gcc/testsuite/ * gcc.dg/torture/pr117811.c: Fix for int < 32 bit. (cherry picked from commit 07f229c2d7ee6b604e5a86092e675d5d36c1ba4e)
2025-06-13recip: Reset range info when replacing sqrt with rsqrt [PR120638]Jakub Jelinek2-0/+32
This pass reuses an SSA_NAME on the lhs of sqrt etc. call as lhs of .RSQRT etc. call. The following testcase is miscompiled since my recent ranger cast changes, because we compute (correct) range for sqrtf argument as well as result but then recip pass keeps using that range for the .RSQRT call which returns 1. / sqrt, so the function then returns 0.5f unconditionally. Note, on foo this is a regression from GCC 15, but on bar it regressed already with the r14-536 change. 2025-06-12 Jakub Jelinek <jakub@redhat.com> PR tree-optimization/120638 * tree-ssa-math-opts.cc (pass_cse_reciprocals::execute): Call reset_flow_sensitive_info on arg1. * gcc.dg/pr120638.c: New test. (cherry picked from commit 8804e5b5b127b27d099d0c361fa2161d0b13edef)
2025-06-13real: Fix up real_from_integer [PR120547]Jakub Jelinek2-12/+41
The function has 2 problems, one is _BitInt specific and the other is most likely also reproducible only with it. The first issue is that I've missed updating the function for _BitInt, maxbitlen as MAX_BITSIZE_MODE_ANY_INT + HOST_BITS_PER_WIDE_INT obviously isn't guaranteed to be larger than any integral type we might want to convert at compile time from wide_int to REAL_VALUE_FORMAT. Just using len instead of it works fine, at least when used after HOST_BITS_PER_WIDE_INT is added to it and it is truncated to multiples of HOST_BITS_PER_WIDE_INT. The other bug is that if the value has too many significant bits (formerly maxbitlen - cnt_l_z, now len - cnt_l_z), the code just shifts it right and adds the shift count to the future exponent. That isn't correct for rounding as the testcase attempts to show, the internal real format has more bits than any precision in a supported format, but we still need to distinguish between values exactly half way between representable floating point values (those should be rounded to even) and the case when we've shifted away some non-zero bits, so the value was a tiny bit larger than half way and then we should round up. The patch uses something like what soft-fp uses in these cases, right shift with sticky bit in the least significant bit. 2025-06-05 Jakub Jelinek <jakub@redhat.com> PR middle-end/120547 * real.cc (real_from_integer): Remove maxbitlen variable, use len instead of that. When shifting right, or in 1 if any of the shifted away bits are non-zero. Formatting fix. * gcc.dg/bitint-123.c: New test. (cherry picked from commit ea9ea72e448e391d4be781b74956a0190f93afc8)
2025-06-13tree-chrec: Use signed_type_for in convert_affine_scevJakub Jelinek1-1/+1
On s390x-linux I've run into the gcc.dg/torture/bitint-27.c test ICEing in build_nonstandard_integer_type called from convert_affine_scev (not sure why it doesn't trigger on x86_64/aarch64). The problem is clear: when ct is a BITINT_TYPE with some large TYPE_PRECISION, build_nonstandard_integer_type won't really work on it. The patch fixes it similarly to what has been done for GCC 14 in various other spots. 2025-05-20 Jakub Jelinek <jakub@redhat.com> * tree-chrec.cc (convert_affine_scev): Use signed_type_for instead of build_nonstandard_integer_type. (cherry picked from commit e38027c8ff449ffadaca449004bb891b9094ad00)
2025-06-13Fortran: Fix missing substring ref for allocatable saved vars [PR120483]Andre Vehreschild2-3/+26
Compute a substring ref on an allocatable static character array using pointer arithmetic. Using an array type corrupts type layouting and crashes omp generation. PR fortran/120483 gcc/fortran/ChangeLog: * trans-expr.cc (gfc_conv_substring): Use pointer arithmetic on static allocatable char arrays. gcc/testsuite/ChangeLog: * gfortran.dg/save_8.f90: New test. (cherry picked from commit 5c9bdfd2748b8159856a37404ab7b34d977242ce)
2025-06-13Daily bump.GCC Administrator6-1/+69
2025-06-12Update gcc es.poJoseph Myers1-106/+87
* es.po: Update.
2025-06-12libstdc++: Do not specialize std::formatter for incomplete type [PR120625]Jonathan Wakely2-7/+30
Using an incomplete type as the template argument for std::formatter specializations causes problems for program-defined specializations of std::formatter which have constraints. When the compiler has to find which specialization of std::formatter to use for the incomplete type it considers the program-defined specializations and checks to see if their constraints are satisfied, which can give errors if the constraints cannot be checked for incomplete types. This replaces the base class of the disabled specializations with a concrete class __formatter_disabled, so there is no need to match a specialization and no more incomplete type. libstdc++-v3/ChangeLog: PR libstdc++/120625 * include/std/format (__format::__disabled): Remove. (__formatter_disabled): New type. (formatter<char*, wchar_t>, formatter<const char*, wchar_t>) (formatter<char[N], wchar_t>, formatter<string, wchar_t>) (formatter<string_view, wchar_t>): Use __formatter_disabled as base class instead of formatter<__disabled, wchar_t>. * testsuite/std/format/formatter/120625.cc: New test. Reviewed-by: Tomasz Kamiński <tkaminsk@redhat.com> (cherry picked from commit 76bf78d32c683af3bf88f4aef595048edbd82372)
2025-06-12ipa: When inlining, don't combine PT JFs changing signedness (PR120295)Martin Jambor2-0/+94
In GCC 15 we allowed jump-function generation code to skip over a type-cast converting one integer to another as long as the latter can hold all the values of the former or has at least the same precision. This works well for IPA-CP where we do then evaluate each jump function as we propagate values and value-ranges. However, the test-case in PR 120295 shows a problem with inlining, where we combine pass-through jump-functions so that they are always relative to the function which is the root of the inline tree. Unfortunately, we are happy to combine also those with type-casts to a different signedness which makes us use zero extension for the expected value ranges where we should have used sign extension. When the value-range which then leads to wrong insertion of a call to builtin_unreachable is being computed, the information about the existence of an intermediary signed type has already been lost during previous inlining. This patch simply blocks combining such jump-functions so that it is back-portable to GCC 15. Once we switch pass-through jump functions to use a vector of operations rather than having room for just one, we will be able to address this situation with adding an extra conversion instead. gcc/ChangeLog: 2025-05-19 Martin Jambor <mjambor@suse.cz> PR ipa/120295 * ipa-prop.cc (update_jump_functions_after_inlining): Do not combine pass-through jump functions with type-casts changing signedness. gcc/testsuite/ChangeLog: 2025-05-19 Martin Jambor <mjambor@suse.cz> PR ipa/120295 * gcc.dg/ipa/pr120295.c: New test. (cherry picked from commit 0b004c92f5ea239936a403a2a757e12ca82ce6d8)
2025-06-12ada: Fix documentation of Generalized Finalization extensionEric Botcazou2-259/+162
The current documentation does not reflect the implementation present in the compiler and contains various other inaccuracies. gcc/ada/ChangeLog: * doc/gnat_rm/gnat_language_extensions.rst (Generalized Finalization): Document the actual implementation. (No_Raise): Move to separate section. * gnat_rm.texi: Regenerate.
2025-06-12ada: Fix wrong visibility over discriminantsRonan Desplanques1-4/+12
This patch fixes an issue where the compiler was incorrectly allowing references to discriminants of the ancestor type in private type extensions. gcc/ada/ChangeLog: * sem_ch3.adb (Build_Derived_Private_Type): Fix test. (Build_Derived_Record_Type): Adjust error recovery paths.
2025-06-12ada: Tweak special handling of synchronized type scopesRonan Desplanques1-8/+20
Exp_Util.Insert_Actions handles scopes of synchronized types specially, but the condition it tested before this patch was not quite correct in some cases, for example during some expansion operations made under Expand_N_Task_Type_Declaration. This patch refines the test. gcc/ada/ChangeLog: * exp_util.adb (Insert_Actions): Refine test.
2025-06-12ada: Small tweak to latest changeEric Botcazou2-5/+3
gcc/ada/ChangeLog: * doc/gnat_ugn/building_executable_programs_with_gnat.rst (Compiler switches) <-O>: Fix long line. * gnat_ugn.texi: Regenerate.
2025-06-12ada: Document supported GCC optimization switchesEric Botcazou4-67/+119
In particular the most recently added ones, namely -Og and -Oz. But -Ofast is not documented because it disregards strict compliance with standards. gcc/ada/ChangeLog: * usage.adb (Usage): Justify the documentation of common switches like that of other switches. Rework that of the -O switch. * doc/gnat_ugn/building_executable_programs_with_gnat.rst (Compiler switches) <-O>: Rework and document 'z' and 'g' operands. * doc/gnat_ugn/gnat_and_program_execution.rst (Optimization Levels): Rework and document -Oz and -Og switches. * gnat_ugn.texi: Regenerate.
2025-06-12Daily bump.GCC Administrator1-1/+1
2025-06-11Daily bump.GCC Administrator4-1/+77
2025-06-10ChangeLog.omp bumpTobias Burnus4-1/+65
2025-06-10gcn: Add experimental MI300 (gfx942) supportTobias Burnus11-76/+208
As gfx942 and gfx950 belong to gfx9-4-generic, the latter two are also added. Note that there are no specific optimizations for MI300, yet. No multilib is built by default for any of the mentioned devices; use '--with-multilib-list=' when configuring GCC to build them alongside. gfx942 was added in LLVM (and its mc assembler, used by GCC) in version 18, generic support in LLVM 19 and gfx950 in LLVM 20. gcc/ChangeLog: * config/gcn/gcn-devices.def: Add gfx942, gfx950 and gfx9-4-generic. * config/gcn/gcn-opts.h (TARGET_CDNA3, TARGET_CDNA3_PLUS, TARGET_GLC_NAME, TARGET_TARGET_SC_CACHE): Define. (TARGET_ARCHITECTED_FLAT_SCRATCH): Use also for CDNA3. * config/gcn/gcn.h (gcn_isa): Add ISA_CDNA3 to the enum. * config/gcn/gcn.cc (print_operand): Update 'g' to use TARGET_GLC_NAME; add 'G' to print TARGET_GLC_NAME unconditionally. * config/gcn/gcn-valu.md (scatter, gather): Use TARGET_GLC_NAME. * config/gcn/gcn.md: Use %G<num> instead of glc; use 'buffer_inv sc1' for TARGET_TARGET_SC_CACHE. * doc/invoke.texi (march): Add gfx942, gfx950 and gfx9-4-generic. * doc/install.texi (amdgcn*-*-*): Add gfx942, gfx950 and gfx9-4-generic. * config/gcn/gcn-tables.opt: Regenerate. libgomp/ChangeLog: * testsuite/libgomp.c/declare-variant-4.h (gfx942): New variant function. * testsuite/libgomp.c/declare-variant-4-gfx942.c: New test. (cherry picked from commit 37b454b7e171bd8a792cbe4c57ea0f9702afa22d)
2025-06-10libgomp: Add OpenMP's omp_target_memset/omp_target_memset_asyncTobias Burnus17-4/+642
PR libgomp/120444 include/ChangeLog: * cuda/cuda.h (cuMemsetD8, cuMemsetD8Async): Declare. libgomp/ChangeLog: * libgomp-plugin.h (GOMP_OFFLOAD_memset): Declare. * libgomp.h (struct gomp_device_descr): Add memset_func. * libgomp.map (GOMP_6.0.1): Add omp_target_memset{,_async}. * libgomp.texi (Device Memory Routines): Document them. * omp.h.in (omp_target_memset, omp_target_memset_async): Declare. * omp_lib.f90.in (omp_target_memset, omp_target_memset_async): Add interfaces. * omp_lib.h.in (omp_target_memset, omp_target_memset_async): Likewise. * plugin/cuda-lib.def: Add cuMemsetD8. * plugin/plugin-gcn.c (struct hsa_runtime_fn_info): Add hsa_amd_memory_fill_fn. (init_hsa_runtime_functions): DLSYM_OPT_FN load it. (GOMP_OFFLOAD_memset): New. * plugin/plugin-nvptx.c (GOMP_OFFLOAD_memset): New. * target.c (omp_target_memset_int, omp_target_memset, omp_target_memset_async_helper, omp_target_memset_async): New. (gomp_load_plugin_for_device): Add DLSYM (memset). * testsuite/libgomp.c-c++-common/omp_target_memset.c: New test. * testsuite/libgomp.c-c++-common/omp_target_memset-2.c: New test. * testsuite/libgomp.c-c++-common/omp_target_memset-3.c: New test. * testsuite/libgomp.fortran/omp_target_memset.f90: New test. * testsuite/libgomp.fortran/omp_target_memset-2.f90: New test. (cherry picked from commit 4e47e2f833732c5d9a3c3e69dc753f99b3a56737)
2025-06-10Merge branch 'releases/gcc-15' into devel/omp/gcc-15Tobias Burnus34-435/+1414
Merge up to r15-9819-g5327eef7b003f6 (June 10, 2025)
2025-06-10libstdc++: Make system_clock::to_time_t always_inline [PR99832]Jonathan Wakely2-0/+16
For some 32-bit targets Glibc supports changing the size of time_t to be 64 bits by defining _TIME_BITS=64. That causes an ABI change which would affect std::chrono::system_clock::to_time_t. Because to_time_t is not a function template, its mangled name does not depend on the return type, so it has the same mangled name whether it returns a 32-bit time_t or a 64-bit time_t. On targets where the size of time_t can be selected at preprocessing time, that can cause ODR violations, e.g. the linker selects a definition of to_time_t that returns a 32-bit value but a caller expects 64-bit and so reads 32 bits of garbage from the stack. This commit adds always_inline to to_time_t so that all callers inline the conversion to time_t, and will do so using whatever type time_t happens to be in that translation unit. Existing objects compiled before this change will either have inlined the function anyway (which is likely if compiled with any optimization enabled) or will contain a COMDAT definition of the inline function and so still be able to find it at link-time. The attribute is also added to system_clock::from_time_t, because that's an equally simple function and it seems reasonable for them to both be always inlined. libstdc++-v3/ChangeLog: PR libstdc++/99832 * include/bits/chrono.h (system_clock::to_time_t): Add always_inline attribute to be agnostic to the underlying type of time_t. (system_clock::from_time_t): Add always_inline for consistency with to_time_t. * testsuite/20_util/system_clock/99832.cc: New test. (cherry picked from commit d045eb13b0b42870a1f081895df3901112a358f0)
2025-06-10libstdc++: Fix std::format thousands separators when sign present [PR120548]Jonathan Wakely2-2/+19
The leading sign character should be skipped when deciding whether to insert thousands separators into a floating-point format. libstdc++-v3/ChangeLog: PR libstdc++/120548 * include/std/format (__formatter_fp::_M_localize): Do not include a leading sign character in the string to be grouped. * testsuite/std/format/functions/format.cc: Check grouping when sign is present in the output. Reviewed-by: Tomasz Kamiński <tkaminsk@redhat.com> (cherry picked from commit 2c3559839d70df6311da18fd93237050405580c3)
2025-06-10vectorizer: Fix riscv build [PR120042]Andrew Pinski1-0/+1
r15-9859-ga6cfde60d8c added a call to dominated_by_p to tree-vectorizer.h but dominance.h is not always included; and you get a build failure on riscv building riscv-vector-costs.cc. Let's add the include of dominance.h to tree-vectorizer.h Pushed as obvious after builds for riscv and x86_64. gcc/ChangeLog: PR target/120042 * tree-vectorizer.h: Include dominance.h. Signed-off-by: Andrew Pinski <quic_apinski@quicinc.com> (cherry picked from commit 299d48ff4a34c00a6ef964b694fb9b1312683049)
2025-06-10ada: Error on subtype with static predicate used in case_expressionGary Dismukes2-4/+6
The compiler improperly flags an error on the use of a subtype with a static predicate as a choice in a case expression alternative, complaining that the subtype has a nonstatic predicate. The fix for this is to add a test for the subtype not having a static predicate. gcc/ada/ChangeLog: * einfo.ads: Revise comment about Dynamic_Predicate flag to make it more accurate. * sem_case.adb (Check_Choices): Test "not Has_Static_Predicate_Aspect" as additional guard for error about use of subtype with nonstatic predicate as a case choice. Improve related error message.
2025-06-10ada: Fix fallout of latest changeEric Botcazou1-1/+7
Freeze_Static_Object needs to deal with the objects that have been created by Insert_Conditional_Object_Declaration. gcc/ada/ChangeLog: * freeze.adb (Freeze_Static_Object): Do not issue any error message for compiler-generated entities.
2025-06-10ada: Fix wrong initialization of library-level object by conditional expressionEric Botcazou2-4/+15
The previous fix was not robust enough in the presence of transient scopes. gcc/ada/ChangeLog: * exp_ch4.adb (Insert_Conditional_Object_Declaration): Deal with a transient scope being created around the declaration. * freeze.adb (Freeze_Entity): Do not call Freeze_Static_Object for a renaming declaration.