|
For AMD GCN, the instructions available for loading/storing vectors are
always scatter/gather operations (i.e. there are separate addresses for
each vector lane), so the current heuristic to avoid gather/scatter
operations with too many elements in get_group_load_store_type is
counterproductive. Avoiding such operations in that function can
subsequently lead to a missed vectorization opportunity whereby later
analyses in the vectorizer try to use a very wide array type which is
not available on this target, and thus it bails out.
This patch adds a target hook to override the "single_element_p"
heuristic in that function, and activates it for GCN. This
allows much better code to be generated for affected loops.
Co-authored-by: Julian Brown <julian@codesourcery.com>
gcc/
* doc/tm.texi.in (TARGET_VECTORIZE_PREFER_GATHER_SCATTER): Add
documentation hook.
* doc/tm.texi: Regenerate.
* target.def (prefer_gather_scatter): Add target hook under vectorizer.
* hooks.cc (hook_bool_mode_int_unsigned_false): New function.
* hooks.h (hook_bool_mode_int_unsigned_false): New prototype.
* tree-vect-stmts.cc (vect_use_strided_gather_scatters_p): Add
parameters group_size and single_element_p, and rework to use
targetm.vectorize.prefer_gather_scatter.
(get_group_load_store_type): Move some of the condition into
vect_use_strided_gather_scatters_p.
* config/gcn/gcn.cc (gcn_prefer_gather_scatter): New function.
(TARGET_VECTORIZE_PREFER_GATHER_SCATTER): Define hook.
(cherry picked from commit 36c5a7aa9a6dbaed07e3a2482c66743ddcb3e776)
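As a rough illustration of why a target whose vector memory instructions always take per-lane addresses gains nothing from avoiding gathers, here is a hypothetical C++ scalar model (the names `gather` and `strided_load` are illustrative, not GCC or GCN code). A strided group access is simply the special case where the per-lane indices form an arithmetic series.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical scalar model of a vector gather: every lane has its own
// address, formed here as a base pointer plus a per-lane 64-bit index.
template <typename T, std::size_t Lanes>
std::array<T, Lanes> gather(const T *base,
                            const std::array<std::int64_t, Lanes> &index) {
  assert(base != nullptr);
  std::array<T, Lanes> out{};
  for (std::size_t lane = 0; lane < Lanes; ++lane)
    out[lane] = base[index[lane]];  // independent address per lane
  return out;
}

// A strided load expressed as a gather whose indices are 0, s, 2s, ...
template <typename T, std::size_t Lanes>
std::array<T, Lanes> strided_load(const T *base, std::int64_t stride) {
  std::array<std::int64_t, Lanes> index{};
  for (std::size_t lane = 0; lane < Lanes; ++lane)
    index[lane] = static_cast<std::int64_t>(lane) * stride;
  return gather<T, Lanes>(base, index);
}
```

On such hardware both functions map to the same kind of instruction, so the number of elements in the group does not make the gather form any more expensive.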
|
|
The optimization options are deliberately passed through to the LTO compiler,
but when the same mechanism is reused for offloading it ends up forcing the
host compiler settings onto the device compiler. Maybe this should be removed
completely, but this patch just fixes a few of them. In particular,
param_vect_partial_vector_usage is disabled by x86 and this really hurts amdgcn.
I also fixed an ambiguous else warning in the generated file by adding braces.
gcc/ChangeLog:
* config/gcn/gcn.cc (gcn_option_override): Add note to set default for
param_vect_partial_vector_usage to "1".
* optc-save-gen.awk: Don't pass through options marked "NoOffload".
* params.opt (-param=vect-epilogues-nomask): Add NoOffload.
(-param=vect-partial-vector-usage): Likewise.
(-param=vect-inner-loop-cost-factor): Likewise.
(cherry picked from commit b31fa1ce19542e14bea10f46240f39cb37277b80)
|
|
Add new variants of the gather_load and scatter_store instructions that take the
offsets in DImode. This is not the natural width for offsets in the
instruction set, but we can use them to compute a vector of absolute addresses,
which does work.
This enables the autovectorizer to use gather/scatter in a number of additional
scenarios (one of which shows up in the SPEC HPC lbm benchmark).
gcc/ChangeLog:
* config/gcn/gcn-valu.md (gather_load<mode><vndi>): New.
(scatter_store<mode><vndi>): New.
(mask_gather_load<mode><vndi>): New.
(mask_scatter_store<mode><vndi>): New.
* config/gcn/gcn.cc (gcn_expand_scaled_offsets): Support DImode.
(cherry picked from commit 351fa55c58a036f148d13bca972e687a0bacd113)
|
|
I need some extra shift varieties in the mode-independent code, but the macros
don't permit insns that don't have QI/HI variants. This fixes the problem, and
adds the new functions for the follow-up patch to use.
gcc/ChangeLog:
* config/gcn/gcn.cc (GEN_VNM_NOEXEC): Use USE_QHF.
(GEN_VNM): Likewise, and call for new ashl and mul variants.
(cherry picked from commit f194924984c4eb9c8be5310f78b191b35e576ab8)
|
|
These new insns allow more efficient use of scalar inputs to 64-bit vector
add and mul. Also, the patch adjusts the existing mul.._dup because it was
actually a dup2 (the vec_duplicate is on the second input), and that was
inconveniently inconsistent.
The patterns are generally useful, but will be used directly by a follow-up
patch.
gcc/ChangeLog:
* config/gcn/gcn-valu.md (add<mode>3_dup): New.
(add<mode>3_dup_exec): New.
(<su>mul<mode>3_highpart_dup<exec>): New.
(mul<mode>3_dup): Move the vec_duplicate to operand 1.
(mul<mode>3_dup_exec): New.
(vec_series<mode>): Adjust call to gen_mul<mode>3_dup.
* config/gcn/gcn.cc (gcn_expand_vector_init): Likewise.
(cherry picked from commit bdc4062a0796788e44d5e6ecd753268a8b453cc7)
|
|
The patterns did not accept inline immediate constants, even though the
hardware instructions do, which has led to some errors in some patches I'm
working on.
Also the VCC update RTL was using the wrong operands in the wrong places. This
appears to have been harmless(?) but is definitely not intended.
gcc/ChangeLog:
* config/gcn/gcn-valu.md (add<mode>3_vcc_dup<exec_vcc>): Change
operand 2 to allow gcn_alu_operand. Swap the operands in the VCC
update RTL.
(add<mode>3_vcc_zext_dup): Likewise.
(add<mode>3_vcc_zext_dup_exec): Likewise.
(add<mode>3_vcc_zext_dup2): Likewise.
(add<mode>3_vcc_zext_dup2_exec): Likewise.
(cherry picked from commit 4a0967f7509b5fad1c9bda432f71deb0d342a879)
|
|
I suppose this pattern doesn't get used much! The unsigned compare was meant to
be defined using the signed compare pattern, but actually ended up trying to
recursively call itself. This patch fixes the issue in the obvious way.
gcc/ChangeLog:
* config/gcn/gcn-valu.md (vec_cmpu<mode>di_exec): Call gen_vec_cmp*,
not gen_vec_cmpu*.
(cherry picked from commit d8680bac95c68002d7e4b13ae1dab1116fdfefc6)
|
|
This is a hold-over from GCN3 where v_add always wrote to the condition
register, whether you wanted it or not. This hasn't been true since GCN5, and
we dropped support for GCN3 a little while ago, so let's fix it.
There was actually a latent bug here because some other post-reload splitters
were generating v_add instructions without declaring the VCC clobber (at least
mul did this), so this should fix some wrong-code bugs also.
gcc/ChangeLog:
* config/gcn/gcn-valu.md (add<mode>3<exec_clobber>): Rename ...
(add<mode>3<exec>): ... to this, remove the clobber, and change the
instruction from v_add_co_u32 to v_add_u32.
(add<mode>3_dup<exec_clobber>): Rename ...
(add<mode>3_dup<exec>): ... to this, and likewise.
(sub<mode>3<exec_clobber>): Rename ...
(sub<mode>3<exec>): ... to this, and likewise.
* config/gcn/gcn.md (addsi3): Remove the DI clobber, and change the
instruction from v_add_co_u32 to v_add_u32.
(addsi3_scc): Likewise.
(subsi3): Likewise, but for v_sub_co_u32.
(muldi3): Likewise.
(cherry picked from commit 0eee2dd2865faf61d9d74425510421e20434ec03)
|
|
|
|
'dynamic_cast' only for effective-target 'offload_device' [PR119692]
In PR119692 "C++ 'typeinfo', 'vtable' vs. OpenACC, OpenMP 'target' offloading":
> --- Comment #8 from Rainer Orth <ro at gcc dot gnu.org> ---
> The last commit made things worse on sparc-sun-solaris2.11: since that one
> (dg-timeout 10) I regularly get
>
> WARNING: libgomp.c++/target-exceptions-bad_cast-1.C (test for excess errors)
> program timed out.
> FAIL: libgomp.c++/target-exceptions-bad_cast-1.C (test for excess errors)
> UNRESOLVED: libgomp.c++/target-exceptions-bad_cast-1.C compilation failed to produce executable
> UNRESOLVED: libgomp.c++/target-exceptions-bad_cast-1.C scan-tree-dump-times optimized "gimple_call <__cxa_bad_cast, " 1
>
> Before that, the test had no issue. Compiling the test on an unloaded system
> usually takes less than 1 sec, but when fully loaded, times can go up.
To keep things simple, let's restrict this temporary (yeah...) workaround to
apply only for effective-target 'offload_device', just like the
'dg-xfail-run-if' itself.
PR target/119692
libgomp/
* testsuite/libgomp.c++/pr119692-1-4.C: '{ dg-timeout 10 { target offload_device } }'.
* testsuite/libgomp.c++/pr119692-1-5.C: Likewise.
* testsuite/libgomp.c++/target-exceptions-bad_cast-1.C: Likewise.
* testsuite/libgomp.c++/target-exceptions-bad_cast-2.C: Likewise.
* testsuite/libgomp.oacc-c++/exceptions-bad_cast-1.C: Likewise.
* testsuite/libgomp.oacc-c++/exceptions-bad_cast-2.C: Likewise.
(cherry picked from commit aa143261bdf6db4334b3fcad7768b53e231f998e)
|
|
gcc/
* config/nvptx/nvptx-sm.def: Add '61'.
* config/nvptx/nvptx-gen.h: Regenerate.
* config/nvptx/nvptx-gen.opt: Likewise.
* config/nvptx/nvptx.cc (first_ptx_version_supporting_sm): Adjust.
* config/nvptx/nvptx.opt (-march-map=sm_61, -march-map=sm_62):
Likewise.
* config.gcc: Likewise.
* doc/invoke.texi (Nvidia PTX Options): Document '-march=sm_61'.
* config/nvptx/gen-multilib-matches-tests: Extend.
gcc/testsuite/
* gcc.target/nvptx/march-map=sm_61.c: Adjust.
* gcc.target/nvptx/march-map=sm_62.c: Likewise.
* gcc.target/nvptx/march=sm_61.c: New.
libgomp/
* testsuite/libgomp.c/declare-variant-3-sm61.c: New.
* testsuite/libgomp.c/declare-variant-3.h: Adjust.
(cherry picked from commit 7b53b88381179c5c8152bcb890460f66d9c88fac)
|
|
gcc/
* config/nvptx/nvptx-opts.h (enum ptx_version): Add
'PTX_VERSION_5_0'.
* config/nvptx/nvptx.cc (ptx_version_to_string)
(ptx_version_to_number): Adjust.
* config/nvptx/nvptx.h (TARGET_PTX_5_0): New.
* config/nvptx/nvptx.opt (Enum(ptx_version)): Add 'EnumValue'
'5.0' for 'PTX_VERSION_5_0'.
* doc/invoke.texi (Nvidia PTX Options): Document '-mptx=5.0'.
gcc/testsuite/
* gcc.target/nvptx/mptx=5.0.c: New.
(cherry picked from commit 97616687149f115e0ab946b9a05a9f8c1e47429e)
|
|
[PR119853, PR119854]
Fix-up for commit aafe942227baf8c2bcd4cac2cb150e49a4b895a9
"GCN, nvptx offloading: Host/device compatibility: Itanium C++ ABI, DSO Object Destruction API [PR119853, PR119854]":
we need to adjust for 'targetm.cxx.use_aeabi_atexit':
gcc/config/arm/arm.cc:#define TARGET_CXX_USE_AEABI_ATEXIT arm_cxx_use_aeabi_atexit
gcc/config/arm/arm.cc:/* The EABI says __aeabi_atexit should be used to register static
gcc/config/arm/arm.cc- destructors. */
gcc/config/arm/arm.cc-
gcc/config/arm/arm.cc-static bool
gcc/config/arm/arm.cc:arm_cxx_use_aeabi_atexit (void)
gcc/config/arm/arm.cc-{
gcc/config/arm/arm.cc- return TARGET_AAPCS_BASED;
gcc/config/arm/arm.cc-}
..., which 'gcc/cp/decl.cc:get_atexit_node' then acts on: call '__aeabi_atexit'
instead of '__cxa_atexit', and swap two arguments.
PR target/119853
PR target/119854
libgomp/
* testsuite/libgomp.c++/target-cdtor-1.C: Adjust for
'targetm.cxx.use_aeabi_atexit'.
* testsuite/libgomp.c++/target-cdtor-2.C: Likewise.
(cherry picked from commit 04b42c4245d85f77aa54ec002ebd7bbe6fde5f11)
|
|
|
|
With this fix-up for commit 387209938d2c476a67966c6ddbdbf817626f24a2
"OpenMP: Add omp_get_initial_device/omp_get_num_devices builtins", we progress:
PASS: c-c++-common/gomp/omp_get_num_devices_initial_device.c (test for excess errors)
PASS: c-c++-common/gomp/omp_get_num_devices_initial_device.c scan-tree-dump-not optimized "abort"
-FAIL: c-c++-common/gomp/omp_get_num_devices_initial_device.c scan-tree-dump-times optimized "omp_get_num_devices;" 1
+PASS: c-c++-common/gomp/omp_get_num_devices_initial_device.c scan-tree-dump-times optimized "omp_get_num_devices" 1
PASS: c-c++-common/gomp/omp_get_num_devices_initial_device.c scan-tree-dump optimized "_1 = __builtin_omp_get_num_devices \\(\\);[\\r\\n]+[ ]+return _1;"
... etc. for offloading configurations.
gcc/testsuite/
* c-c++-common/gomp/omp_get_num_devices_initial_device.c: Fix.
* gfortran.dg/gomp/omp_get_num_devices_initial_device.f90: Likewise.
(cherry picked from commit 13c766066e23eb6ddf6bad7a5664b9d3ca8c1974)
|
|
The test PASSes for C, but FAILs for C++:
.../libgomp.c-c++-common/omp_target_memset-3.c: In function 'void test_it(void*, int, size_t)':
.../libgomp.c-c++-common/omp_target_memset-3.c:31:7: warning: pointer of type 'void *' used in arithmetic [-Wpointer-arith]
.../libgomp.c-c++-common/omp_target_memset-3.c:33:13: error: invalid conversion from 'void*' to 'int8_t*' {aka 'signed char*'} [-fpermissive]
.../libgomp.c-c++-common/omp_target_memset-3.c:10:19: note: initializing argument 1 of 'void init_val(int8_t*, int, size_t)'
.../libgomp.c-c++-common/omp_target_memset-3.c:37:14: error: invalid conversion from 'void*' to 'int8_t*' {aka 'signed char*'} [-fpermissive]
.../libgomp.c-c++-common/omp_target_memset-3.c:17:20: note: initializing argument 1 of 'void check_val(int8_t*, int, size_t)'
.../libgomp.c-c++-common/omp_target_memset-3.c:38:18: warning: pointer of type 'void *' used in arithmetic [-Wpointer-arith]
.../libgomp.c-c++-common/omp_target_memset-3.c:38:18: error: invalid conversion from 'void*' to 'int8_t*' {aka 'signed char*'} [-fpermissive]
.../libgomp.c-c++-common/omp_target_memset-3.c:17:20: note: initializing argument 1 of 'void check_val(int8_t*, int, size_t)'
.../libgomp.c-c++-common/omp_target_memset-3.c: In function 'int main()':
.../libgomp.c-c++-common/omp_target_memset-3.c:46:7: warning: pointer of type 'void *' used in arithmetic [-Wpointer-arith]
The following two-liner fixes that, tested on x86_64-linux and i686-linux.
2025-06-03 Jakub Jelinek <jakub@redhat.com>
PR libgomp/120444
* testsuite/libgomp.c-c++-common/omp_target_memset-3.c (test_it):
Change ptr argument type from void * to int8_t *.
(main): Change ptr variable type from void * to int8_t * and cast
omp_target_alloc result to the latter type.
(cherry picked from commit a8c03f056f4070a618bc59afcae2290cf21456ea)
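A minimal C++ sketch of the same kind of fix, assuming `std::malloc` as a stand-in for `omp_target_alloc` and using hypothetical helpers `fill`, `check` and `alloc_bytes`: typing the pointer as `int8_t *` and converting the allocation result once makes both the pointer arithmetic and the argument passing valid C++.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstdlib>
#include <cstring>

// Hypothetical helpers taking int8_t * instead of void *, so callers can
// write ptr + offset and pass the result without an implicit
// void * -> int8_t * conversion (which C++ rejects).
void fill(std::int8_t *ptr, int val, std::size_t n) {
  assert(ptr != nullptr);
  std::memset(ptr, val, n);
}

bool check(const std::int8_t *ptr, int val, std::size_t n) {
  for (std::size_t i = 0; i < n; ++i)
    if (ptr[i] != static_cast<std::int8_t>(val))
      return false;
  return true;
}

// Convert the void * result explicitly, once, at the allocation site
// (std::malloc stands in for omp_target_alloc here).
std::int8_t *alloc_bytes(std::size_t n) {
  return static_cast<std::int8_t *>(std::malloc(n));
}
```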
|
|
|
|
libstdc++-v3/include/std/ostream contains:
namespace std _GLIBCXX_VISIBILITY(default)
{
...
template<typename _CharT, typename _Traits>
inline basic_ostream<_CharT, _Traits>&
endl(basic_ostream<_CharT, _Traits>& __os)
{ return flush(__os.put(__os.widen('\n'))); }
...
#include <bits/ostream.tcc>
and the latter, libstdc++-v3/include/bits/ostream.tcc, has:
// Inhibit implicit instantiations for required instantiations,
// which are defined via explicit instantiations elsewhere.
#if _GLIBCXX_EXTERN_TEMPLATE
extern template class basic_ostream<char>;
extern template ostream& endl(ostream&);
Before this commit, omp_discover_declare_target_tgt_fn_r marked 'endl'
as (implicitly) declare target - but not the calls in it due to the
'extern' (DECL_EXTERNAL).
Thanks to inlining, 'endl' is not used and is therefore discarded by the
linker; hence, it works with -O0 and -O1. However, as the (unused)
function still exists, IPA CP (enabled by -O2) tries to do
constant-value propagation and fails because the definition of 'widen'
is not available.
The solution is to still walk 'endl' despite it being an 'extern(al)'
decl; for now, this is restricted to DECL_DECLARED_INLINE_P.
gcc/ChangeLog:
* omp-offload.cc (omp_discover_declare_target_tgt_fn_r): Also
walk external functions that are declared inline (and have a
DECL_SAVED_TREE).
libgomp/ChangeLog:
* testsuite/libgomp.c++/declare_target-2.C: New test.
(cherry picked from commit ea43b99537591b1103da3961c61f1cbfae968859)
|
|
Merge up to r15-9840-g9803e23a212962 (June 17, 2025)
|
|
|
|
|
|
|
|
The problem with PR120423 and PR116389 is that reload might assign an invalid
hard register to a paradoxical subreg. For example with the test case from
the PR, it assigns (REG:QI 31) to the inner of (subreg:HI (QI) 0) which is
valid, but the subreg will be turned into (REG:HI 31) which is invalid
and triggers an ICE in postreload.
The problem only occurs with the old reload pass.
The patch maps the paradoxical subregs to zero-extends, which will be
allocated correctly. For the 120423 testcases, the code is the same as
with -mlra (which doesn't implement the fix), so the patch doesn't even
introduce a performance penalty.
The patch is only needed for v15: v14 is not affected, and in v16 reload
will be removed.
PR rtl-optimization/120423
PR rtl-optimization/116389
gcc/
* config/avr/avr.md [-mno-lra]: Add pre-reload split to transform
(left shift of) a paradoxical subreg to a (left shift of) zero-extend.
gcc/testsuite/
* gcc.target/avr/torture/pr120423-1.c: New test.
* gcc.target/avr/torture/pr120423-2.c: New test.
* gcc.target/avr/torture/pr120423-116389.c: New test.
|
|
|
|
PR middle-end/117811
PR testsuite/52641
gcc/testsuite/
* gcc.dg/torture/pr117811.c: Fix for int < 32 bit.
(cherry picked from commit 07f229c2d7ee6b604e5a86092e675d5d36c1ba4e)
|
|
This pass reuses an SSA_NAME on the lhs of a sqrt etc. call as the lhs
of the .RSQRT etc. call. The following testcase is miscompiled since my recent
ranger cast changes, because we compute a (correct) range for the sqrtf argument
as well as for the result, but then the recip pass keeps using that range for
the .RSQRT call, which returns 1. / sqrt, so the function then returns 0.5f
unconditionally.
Note, on foo this is a regression from GCC 15, but on bar it regressed
already with the r14-536 change.
2025-06-12 Jakub Jelinek <jakub@redhat.com>
PR tree-optimization/120638
* tree-ssa-math-opts.cc (pass_cse_reciprocals::execute): Call
reset_flow_sensitive_info on arg1.
* gcc.dg/pr120638.c: New test.
(cherry picked from commit 8804e5b5b127b27d099d0c361fa2161d0b13edef)
|
|
The function has 2 problems: one is _BitInt specific and the other is
most likely also reproducible only with it.
The first issue is that I've missed updating the function for _BitInt,
maxbitlen as MAX_BITSIZE_MODE_ANY_INT + HOST_BITS_PER_WIDE_INT
obviously isn't guaranteed to be larger than any integral type we might
want to convert at compile time from wide_int to REAL_VALUE_FORMAT.
Just using len instead of it works fine, at least when used after
HOST_BITS_PER_WIDE_INT is added to it and it is truncated to multiples
of HOST_BITS_PER_WIDE_INT.
The other bug is that if the value has too many significant bits (formerly
maxbitlen - cnt_l_z, now len - cnt_l_z), the code just shifts it right and
adds the shift count to the future exponent. That isn't correct for
rounding, as the testcase attempts to show: the internal real format has
more bits than the precision of any supported format, but we still need to
distinguish between values exactly halfway between representable floating
point values (those should be rounded to even) and the case when we've
shifted away some non-zero bits, so the value was a tiny bit larger than
halfway and we should round up.
The patch uses what e.g. soft-fp uses in these cases: a right shift
with a sticky bit in the least significant bit.
2025-06-05 Jakub Jelinek <jakub@redhat.com>
PR middle-end/120547
* real.cc (real_from_integer): Remove maxbitlen variable, use
len instead of that. When shifting right, or in 1 if any of the
shifted away bits are non-zero. Formatting fix.
* gcc.dg/bitint-123.c: New test.
(cherry picked from commit ea9ea72e448e391d4be781b74956a0190f93afc8)
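To make the rounding argument concrete, here is a small hypothetical C++ sketch on a 64-bit integer (`shift_right_sticky` and `round_nearest_even` are illustrative helpers, not the real.cc code). 145 is `0b10010001`; rounded to 3 significant bits, the top bits `100` should round up to `101` because the discarded bits `10001` are more than half. If the value is first narrowed by a plain right shift of 3 bits, the remainder looks like an exact tie and round-to-even wrongly keeps `100`; the sticky bit records that non-zero bits were lost.

```cpp
#include <cassert>
#include <cstdint>

// Shift VALUE right by SHIFT bits; if any bits shifted out were non-zero,
// OR a 1 into the least significant bit (the "sticky" bit), so that later
// rounding can tell "exactly half" apart from "slightly more than half".
std::uint64_t shift_right_sticky(std::uint64_t value, unsigned shift) {
  if (shift >= 64)
    return value != 0 ? 1 : 0;
  std::uint64_t lost = value & ((std::uint64_t{1} << shift) - 1);
  return (value >> shift) | (lost != 0 ? 1 : 0);
}

// Drop EXTRA low-order bits of VALUE, rounding to nearest, ties to even.
std::uint64_t round_nearest_even(std::uint64_t value, unsigned extra) {
  assert(extra > 0 && extra < 64);
  std::uint64_t integer = value >> extra;
  std::uint64_t frac = value & ((std::uint64_t{1} << extra) - 1);
  std::uint64_t half = std::uint64_t{1} << (extra - 1);
  if (frac > half || (frac == half && (integer & 1) != 0))
    ++integer;
  return integer;
}
```

Here `round_nearest_even(145, 5)` is the reference result (5), `round_nearest_even(145 >> 3, 2)` shows the old behaviour (4), and `round_nearest_even(shift_right_sticky(145, 3), 2)` gets 5 again.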
|
|
On s390x-linux I've run into the gcc.dg/torture/bitint-27.c test ICEing in
build_nonstandard_integer_type called from convert_affine_scev (not sure
why it doesn't trigger on x86_64/aarch64).
The problem is clear, when ct is a BITINT_TYPE with some large
TYPE_PRECISION, build_nonstandard_integer_type won't really work on it.
The patch fixes it similarly to what has been done for GCC 14 in various
other spots.
2025-05-20 Jakub Jelinek <jakub@redhat.com>
* tree-chrec.cc (convert_affine_scev): Use signed_type_for instead of
build_nonstandard_integer_type.
(cherry picked from commit e38027c8ff449ffadaca449004bb891b9094ad00)
|
|
Compute a substring ref on an allocatable static character array
using pointer arithmetic. Using an array type corrupts the type
layout and crashes omp generation.
PR fortran/120483
gcc/fortran/ChangeLog:
* trans-expr.cc (gfc_conv_substring): Use pointer arithmetic on
static allocatable char arrays.
gcc/testsuite/ChangeLog:
* gfortran.dg/save_8.f90: New test.
(cherry picked from commit 5c9bdfd2748b8159856a37404ab7b34d977242ce)
|
|
|
|
* es.po: Update.
|
|
Using an incomplete type as the template argument for std::formatter
specializations causes problems for program-defined specializations of
std::formatter which have constraints. When the compiler has to find
which specialization of std::formatter to use for the incomplete type it
considers the program-defined specializations and checks to see if their
constraints are satisfied, which can give errors if the constraints
cannot be checked for incomplete types.
This replaces the base class of the disabled specializations with a
concrete class __formatter_disabled, so there is no need to match a
specialization and no more incomplete type.
libstdc++-v3/ChangeLog:
PR libstdc++/120625
* include/std/format (__format::__disabled): Remove.
(__formatter_disabled): New type.
(formatter<char*, wchar_t>, formatter<const char*, wchar_t>)
(formatter<char[N], wchar_t>, formatter<string, wchar_t>)
(formatter<string_view, wchar_t>): Use __formatter_disabled as
base class instead of formatter<__disabled, wchar_t>.
* testsuite/std/format/formatter/120625.cc: New test.
Reviewed-by: Tomasz Kamiński <tkaminsk@redhat.com>
(cherry picked from commit 76bf78d32c683af3bf88f4aef595048edbd82372)
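The design can be sketched in a few lines of C++. This is a hypothetical illustration (`formatter_disabled_sketch` and `formatter_sketch` are stand-ins, not the libstdc++ names): the disabled specializations derive from a complete class whose special members are deleted, so no specialization for an incomplete type ever has to be looked up, and the resulting types are neither default-constructible nor copyable or movable, which is what a disabled formatter specialization is expected to be.

```cpp
#include <type_traits>

// A complete, concrete base for every "disabled" specialization.
struct formatter_disabled_sketch {
  formatter_disabled_sketch() = delete;
  formatter_disabled_sketch(const formatter_disabled_sketch &) = delete;
  formatter_disabled_sketch &
  operator=(const formatter_disabled_sketch &) = delete;
};

// Hypothetical stand-in for std::formatter: enabled unless specialized.
template <typename T, typename CharT>
struct formatter_sketch {};

// Narrow character strings are not formattable into wide output.
template <>
struct formatter_sketch<const char *, wchar_t> : formatter_disabled_sketch {};
template <>
struct formatter_sketch<char *, wchar_t> : formatter_disabled_sketch {};
```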
|
|
In GCC 15 we allowed jump-function generation code to skip over a
type-cast converting one integer to another as long as the latter can
hold all the values of the former or has at least the same precision.
This works well for IPA-CP where we do then evaluate each jump
function as we propagate values and value-ranges. However, the
test-case in PR 120295 shows a problem with inlining, where we combine
pass-through jump-functions so that they are always relative to the
function which is the root of the inline tree. Unfortunately, we are
happy to combine also those with type-casts to a different signedness,
which makes us use zero extension for the expected value ranges
where we should have used sign extension. By the time the value range
that then leads to the wrong insertion of a call to builtin_unreachable
is computed, the information about the existence of an intermediary
signed type has already been lost during previous inlining.
This patch simply blocks combining such jump-functions so that it is
back-portable to GCC 15. Once we switch pass-through jump functions
to use a vector of operations rather than having room for just one, we
will be able to address this situation by adding an extra conversion
instead.
gcc/ChangeLog:
2025-05-19 Martin Jambor <mjambor@suse.cz>
PR ipa/120295
* ipa-prop.cc (update_jump_functions_after_inlining): Do not
combine pass-through jump functions with type-casts changing
signedness.
gcc/testsuite/ChangeLog:
2025-05-19 Martin Jambor <mjambor@suse.cz>
PR ipa/120295
* gcc.dg/ipa/pr120295.c: New test.
(cherry picked from commit 0b004c92f5ea239936a403a2a757e12ca82ce6d8)
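The difference between the two extensions is easy to see in isolation. In this hypothetical C++ sketch (not ipa-prop.cc code), the same incoming value ends up as -1 or as 255 depending only on the signedness of the intermediate 8-bit type, which is the information that is lost once the intermediate conversion is folded away.

```cpp
#include <cstdint>

// Through a signed 8-bit intermediate type: widening back sign-extends.
constexpr std::int32_t via_signed_byte(std::int32_t x) {
  return static_cast<std::int8_t>(x);
}

// Through an unsigned 8-bit intermediate type: widening zero-extends.
constexpr std::int32_t via_unsigned_byte(std::int32_t x) {
  return static_cast<std::uint8_t>(x);
}

// So an input range of [0, 255] becomes [-128, 127] via the signed type,
// but stays [0, 255] via the unsigned one.
static_assert(via_signed_byte(255) == -1, "sign extension");
static_assert(via_unsigned_byte(255) == 255, "zero extension");
```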
|
|
The current documentation does not reflect the implementation present in
the compiler and contains various other inaccuracies.
gcc/ada/ChangeLog:
* doc/gnat_rm/gnat_language_extensions.rst
(Generalized Finalization): Document the actual implementation.
(No_Raise): Move to separate section.
* gnat_rm.texi: Regenerate.
|
|
This patch fixes an issue where the compiler was incorrectly allowing
references to discriminants of the ancestor type in private type
extensions.
gcc/ada/ChangeLog:
* sem_ch3.adb (Build_Derived_Private_Type): Fix test.
(Build_Derived_Record_Type): Adjust error recovery paths.
|
|
Exp_Util.Insert_Actions handles scopes of synchronized types specially,
but the condition it tested before this patch was not quite correct in
some cases, for example during some expansion operations made under
Expand_N_Task_Type_Declaration. This patch refines the test.
gcc/ada/ChangeLog:
* exp_util.adb (Insert_Actions): Refine test.
|
|
gcc/ada/ChangeLog:
* doc/gnat_ugn/building_executable_programs_with_gnat.rst (Compiler
switches) <-O>: Fix long line.
* gnat_ugn.texi: Regenerate.
|
|
In particular the most recently added ones, namely -Og and -Oz. But -Ofast
is not documented because it disregards strict compliance with standards.
gcc/ada/ChangeLog:
* usage.adb (Usage): Justify the documentation of common switches
like that of other switches. Rework that of the -O switch.
* doc/gnat_ugn/building_executable_programs_with_gnat.rst (Compiler
switches) <-O>: Rework and document 'z' and 'g' operands.
* doc/gnat_ugn/gnat_and_program_execution.rst (Optimization Levels):
Rework and document -Oz and -Og switches.
* gnat_ugn.texi: Regenerate.
|
|
|
|
|
|
|
|
As gfx942 and gfx950 belong to gfx9-4-generic, gfx950 and gfx9-4-generic
are also added. Note that there are no specific optimizations for MI300 yet.
No multilib is built by default for any of the mentioned devices; use
'--with-multilib-list=' when configuring GCC to build them alongside.
gfx942 was added in LLVM (and its mc assembler, used by GCC) in version 18,
generic support in LLVM 19 and gfx950 in LLVM 20.
gcc/ChangeLog:
* config/gcn/gcn-devices.def: Add gfx942, gfx950 and gfx9-4-generic.
* config/gcn/gcn-opts.h (TARGET_CDNA3, TARGET_CDNA3_PLUS,
TARGET_GLC_NAME, TARGET_TARGET_SC_CACHE): Define.
(TARGET_ARCHITECTED_FLAT_SCRATCH): Use also for CDNA3.
* config/gcn/gcn.h (gcn_isa): Add ISA_CDNA3 to the enum.
* config/gcn/gcn.cc (print_operand): Update 'g' to use
TARGET_GLC_NAME; add 'G' to print TARGET_GLC_NAME unconditionally.
* config/gcn/gcn-valu.md (scatter, gather): Use TARGET_GLC_NAME.
* config/gcn/gcn.md: Use %G<num> instead of glc; use 'buffer_inv sc1'
for TARGET_TARGET_SC_CACHE.
* doc/invoke.texi (march): Add gfx942, gfx950 and gfx9-4-generic.
* doc/install.texi (amdgcn*-*-*): Add gfx942, gfx950 and gfx9-4-generic.
* config/gcn/gcn-tables.opt: Regenerate.
libgomp/ChangeLog:
* testsuite/libgomp.c/declare-variant-4.h (gfx942): New variant function.
* testsuite/libgomp.c/declare-variant-4-gfx942.c: New test.
(cherry picked from commit 37b454b7e171bd8a792cbe4c57ea0f9702afa22d)
|
|
PR libgomp/120444
include/ChangeLog:
* cuda/cuda.h (cuMemsetD8, cuMemsetD8Async): Declare.
libgomp/ChangeLog:
* libgomp-plugin.h (GOMP_OFFLOAD_memset): Declare.
* libgomp.h (struct gomp_device_descr): Add memset_func.
* libgomp.map (GOMP_6.0.1): Add omp_target_memset{,_async}.
* libgomp.texi (Device Memory Routines): Document them.
* omp.h.in (omp_target_memset, omp_target_memset_async): Declare.
* omp_lib.f90.in (omp_target_memset, omp_target_memset_async):
Add interfaces.
* omp_lib.h.in (omp_target_memset, omp_target_memset_async): Likewise.
* plugin/cuda-lib.def: Add cuMemsetD8.
* plugin/plugin-gcn.c (struct hsa_runtime_fn_info): Add
hsa_amd_memory_fill_fn.
(init_hsa_runtime_functions): DLSYM_OPT_FN load it.
(GOMP_OFFLOAD_memset): New.
* plugin/plugin-nvptx.c (GOMP_OFFLOAD_memset): New.
* target.c (omp_target_memset_int, omp_target_memset,
omp_target_memset_async_helper, omp_target_memset_async): New.
(gomp_load_plugin_for_device): Add DLSYM (memset).
* testsuite/libgomp.c-c++-common/omp_target_memset.c: New test.
* testsuite/libgomp.c-c++-common/omp_target_memset-2.c: New test.
* testsuite/libgomp.c-c++-common/omp_target_memset-3.c: New test.
* testsuite/libgomp.fortran/omp_target_memset.f90: New test.
* testsuite/libgomp.fortran/omp_target_memset-2.f90: New test.
(cherry picked from commit 4e47e2f833732c5d9a3c3e69dc753f99b3a56737)
|
|
Merge up to r15-9819-g5327eef7b003f6 (June 10, 2025)
|
|
For some 32-bit targets Glibc supports changing the size of time_t to be
64 bits by defining _TIME_BITS=64. That causes an ABI change which
would affect std::chrono::system_clock::to_time_t. Because to_time_t is
not a function template, its mangled name does not depend on the return
type, so it has the same mangled name whether it returns a 32-bit time_t
or a 64-bit time_t. On targets where the size of time_t can be selected
at preprocessing time, that can cause ODR violations, e.g. the linker
selects a definition of to_time_t that returns a 32-bit value but a
caller expects 64-bit and so reads 32 bits of garbage from the stack.
This commit adds always_inline to to_time_t so that all callers inline
the conversion to time_t, and will do so using whatever type time_t
happens to be in that translation unit.
Existing objects compiled before this change will either have inlined
the function anyway (which is likely if compiled with any optimization
enabled) or will contain a COMDAT definition of the inline function and
so still be able to find it at link-time.
The attribute is also added to system_clock::from_time_t, because that's
an equally simple function and it seems reasonable for them to both be
always inlined.
libstdc++-v3/ChangeLog:
PR libstdc++/99832
* include/bits/chrono.h (system_clock::to_time_t): Add
always_inline attribute to be agnostic to the underlying type of
time_t.
(system_clock::from_time_t): Add always_inline for consistency
with to_time_t.
* testsuite/20_util/system_clock/99832.cc: New test.
(cherry picked from commit d045eb13b0b42870a1f081895df3901112a358f0)
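The pattern can be sketched with a hypothetical helper (`to_time_t_sketch` is illustrative, not the libstdc++ definition): a non-template inline function returning `std::time_t`, marked `always_inline` so that every caller expands the conversion with its own translation unit's `time_t` rather than relying on whichever out-of-line copy the linker keeps. GCC honours `always_inline` even at -O0.

```cpp
#include <cassert>
#include <chrono>
#include <ctime>

// The return type is not part of the mangled name, so a 32-bit and a
// 64-bit time_t version would collide at link time; forcing inlining
// sidesteps that by never emitting a call to a shared definition.
[[gnu::always_inline]] inline std::time_t
to_time_t_sketch(std::chrono::system_clock::time_point tp) noexcept {
  using std::chrono::duration_cast;
  using std::chrono::seconds;
  return static_cast<std::time_t>(
      duration_cast<seconds>(tp.time_since_epoch()).count());
}
```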
|
|
The leading sign character should be skipped when deciding whether to
insert thousands separators into a floating-point format.
libstdc++-v3/ChangeLog:
PR libstdc++/120548
* include/std/format (__formatter_fp::_M_localize): Do not
include a leading sign character in the string to be grouped.
* testsuite/std/format/functions/format.cc: Check grouping when
sign is present in the output.
Reviewed-by: Tomasz Kamiński <tkaminsk@redhat.com>
(cherry picked from commit 2c3559839d70df6311da18fd93237050405580c3)
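A simplified, hypothetical C++ sketch of the grouping step (`group_thousands` is not the libstdc++ code; it always groups by three with a fixed separator instead of using the locale's grouping rules). The sign is peeled off first, so `-123456.5` becomes `-123,456.5`; grouping the sign together with the digits would instead give `-,123,456.5`.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Insert SEP every three digits of the integer part of S, leaving any
// leading sign character ('-', '+' or ' ') and the fractional part alone.
std::string group_thousands(const std::string &s, char sep = ',') {
  std::size_t start = 0;
  if (!s.empty() && (s[0] == '-' || s[0] == '+' || s[0] == ' '))
    start = 1;  // the sign is not a digit and must not be grouped
  std::size_t end = s.find('.', start);
  if (end == std::string::npos)
    end = s.size();
  std::string digits = s.substr(start, end - start);
  std::string grouped;
  for (std::size_t i = 0; i < digits.size(); ++i) {
    if (i != 0 && (digits.size() - i) % 3 == 0)
      grouped += sep;
    grouped += digits[i];
  }
  return s.substr(0, start) + grouped + s.substr(end);
}
```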
|
|
r15-9859-ga6cfde60d8c added a call to dominated_by_p to tree-vectorizer.h
but dominance.h is not always included, so the build fails on riscv when
compiling riscv-vector-costs.cc.
Let's add the include of dominance.h to tree-vectorizer.h.
Pushed as obvious after builds for riscv and x86_64.
gcc/ChangeLog:
PR target/120042
* tree-vectorizer.h: Include dominance.h.
Signed-off-by: Andrew Pinski <quic_apinski@quicinc.com>
(cherry picked from commit 299d48ff4a34c00a6ef964b694fb9b1312683049)
|
|
The compiler improperly flags an error on the use of a subtype with a
static predicate as a choice in a case expression alternative, complaining
that the subtype has a nonstatic predicate. The fix for this is to add
a test for the subtype not having a static predicate.
gcc/ada/ChangeLog:
* einfo.ads: Revise comment about Dynamic_Predicate flag to make it
more accurate.
* sem_case.adb (Check_Choices): Test "not Has_Static_Predicate_Aspect"
as additional guard for error about use of subtype with nonstatic
predicate as a case choice. Improve related error message.
|
|
Freeze_Static_Object needs to deal with the objects that have been created
by Insert_Conditional_Object_Declaration.
gcc/ada/ChangeLog:
* freeze.adb (Freeze_Static_Object): Do not issue any error message
for compiler-generated entities.
|
|
The previous fix was not robust enough in the presence of transient scopes.
gcc/ada/ChangeLog:
* exp_ch4.adb (Insert_Conditional_Object_Declaration): Deal with a
transient scope being created around the declaration.
* freeze.adb (Freeze_Entity): Do not call Freeze_Static_Object for
a renaming declaration.
|