Age | Commit message (Collapse) | Author | Files | Lines |
|
The following avoids exceeding the maximum object size on 32bit
platforms.
* gcc.dg/pr107554.c: Restrict to lp64.
(cherry picked from commit e7ebdf51ea514ad0b2272ecfa97d6ec72a527e40)
|
|
|
|
While looking at PR 105549, which is about fixing the ABI break
introduced in GCC 9.1 in parameter alignment with bit-fields, we
noticed that the GCC 9.1 warning is not emitted in all the cases where
it should be. This patch fixes that and the next patch in the series
fixes the GCC 9.1 break.
We split this into two patches since patch #2 introduces a new ABI
break starting with GCC 13.1. This way, patch #1 can be back-ported
to release branches if needed to fix the GCC 9.1 warning issue.
The main idea is to add a new global boolean that indicates whether
we're expanding the start of a function, so that aarch64_layout_arg
can emit warnings for callees as well as callers. This removes the
need for aarch64_function_arg_boundary to warn (with its incomplete
information). However, in the first patch there are still cases where
we emit warnings were we should not; this is fixed in patch #2 where
we can distinguish between GCC 9.1 and GCC.13.1 ABI breaks properly.
The fix in aarch64_function_arg_boundary (replacing & with &&) looks
like an oversight of a previous commit in this area which changed
'abi_break' from a boolean to an integer.
We also take the opportunity to fix the comment above
aarch64_function_arg_alignment since the value of the abi_break
parameter was changed in a previous commit, no longer matching the
description.
2022-11-28 Christophe Lyon <christophe.lyon@arm.com>
Richard Sandiford <richard.sandiford@arm.com>
gcc/ChangeLog:
* config/aarch64/aarch64.cc (aarch64_function_arg_alignment): Fix
comment.
(aarch64_layout_arg): Factorize warning conditions.
(aarch64_function_arg_boundary): Fix typo.
* function.cc (currently_expanding_function_start): New variable.
(expand_function_start): Handle
currently_expanding_function_start.
* function.h (currently_expanding_function_start): Declare.
gcc/testsuite/ChangeLog:
* gcc.target/aarch64/bitfield-abi-warning-align16-O2.c: New test.
* gcc.target/aarch64/bitfield-abi-warning-align16-O2-extra.c: New
test.
* gcc.target/aarch64/bitfield-abi-warning-align32-O2.c: New test.
* gcc.target/aarch64/bitfield-abi-warning-align32-O2-extra.c: New
test.
* gcc.target/aarch64/bitfield-abi-warning-align8-O2.c: New test.
* gcc.target/aarch64/bitfield-abi-warning.h: New test.
* g++.target/aarch64/bitfield-abi-warning-align16-O2.C: New test.
* g++.target/aarch64/bitfield-abi-warning-align16-O2-extra.C: New
test.
* g++.target/aarch64/bitfield-abi-warning-align32-O2.C: New test.
* g++.target/aarch64/bitfield-abi-warning-align32-O2-extra.C: New
test.
* g++.target/aarch64/bitfield-abi-warning-align8-O2.C: New test.
* g++.target/aarch64/bitfield-abi-warning.h: New test.
(cherry picked from commit 3df1a115be22caeab3ffe7afb12e71adb54ff132)
|
|
|
|
vect_update_ivs_after_vectorizer can end up emitting a signed
IV update when the loop body performed an unsigned computation.
The following makes sure to perform that update in the type
of the loop update type to avoid undefined behavior on overflow.
PR tree-optimization/108164
* tree-vect-loop-manip.cc (vect_update_ivs_after_vectorizer):
Perform vect_step_op_add update in the appropriate type.
* gcc.dg/pr108164.c: New testcase.
(cherry picked from commit ec459469f8a75d96a9b26694554efcc900d411de)
|
|
When doing if-conversion we simply throw away labels without checking
whether they are possibly targets of non-local gotos or have their
address taken. The following rectifies this and refuses to if-convert
such loops.
PR tree-optimization/108076
* tree-if-conv.cc (if_convertible_loop_p_1): Reject blocks
with non-local or forced labels that we later remove
labels from.
* gcc.dg/torture/pr108076.c: New testcase.
(cherry picked from commit b4fddbe9592e9feb37ce567d90af822b75995531)
|
|
The following avoids passing down error_mark_node to fold_convert.
PR middle-end/107994
* gimplify.cc (gimplify_expr): Catch errorneous comparison
operand.
(cherry picked from commit 845b514e8a150447ba041294586af76a6ac05158)
|
|
The following fixes a wrongly typed variable causing an ICE.
PR tree-optimization/107554
* tree-ssa-strlen.cc (strlen_pass::count_nonzero_bytes):
Use unsigned HOST_WIDE_INT type for the strlen.
* gcc.dg/pr107554.c: New testcase.
Co-Authored-By: Nikita Voronov <nik_1357@mail.ru>
(cherry picked from commit 81de4037454275f8ed6d858fbc129e832c6147ef)
|
|
The bug appeared afte r13-2010-g1270ccda70ca09 "Factor out
jobserver_active_p" slightly changed `putenv()` use from allocating
to non-allocating:
-xputenv (concat ("MAKEFLAGS=", dup, NULL));
+xputenv (jinfo.skipped_makeflags.c_str ());
`xputenv()` (and `putenv()`) don't copy strings and only store the
pointer in the `environ` global table. As a result `environ` got
corrupted as soon as `jinfo.skipped_makeflags` store got deallocated.
This started causing bootstrap crashes in `execv()` calls:
xgcc: fatal error: cannot execute '/build/build/./prev-gcc/collect2': execv: Bad address
The change restores memory allocation for `xputenv()` argument.
gcc/
PR driver/106624
* gcc.cc (driver::detect_jobserver): Allocate storage xputenv()
argument using xstrdup().
(cherry picked from commit 2b403297b111c990c331b5bbb6165b061ad2259b)
|
|
"minimal" mode': 'libgomp/ChangeLog.omp'
|
|
"minimal" mode'
libgomp/
* libgomp.texi (nvptx): Update for
'nvptx, libgfortran: Switch out of "minimal" mode'.
|
|
..., where it currently fails:
[...]/libgcc/config/nvptx/crt0.c:22:10: fatal error: stdlib.h: No such file or directory
22 | #include <stdlib.h>
| ^~~~~~~~~~
Fix-up for "nvptx: Support global constructors/destructors via 'collect2'".
libgcc/
* config/nvptx/crt0.c [!HAVE_STDLIB_H]: Don't '#include <stdlib.h>'.
(atexit): Prototype.
|
|
|
|
gcc/fortran/ChangeLog:
PR fortran/108434
* expr.cc (class_allocatable): Prevent NULL pointer dereference
or invalid read.
(class_pointer): Likewise.
gcc/testsuite/ChangeLog:
PR fortran/108434
* gfortran.dg/pr108434.f90: New test.
(cherry picked from commit 117848f425a3c0eda85517b4bdaf2ebe3bc705c2)
|
|
Merge up to r12-9058-ge1357577e6e39430869e294f94c2c547717b960f (23rd Jan 2023)
|
|
This avoids generating illegal (strict_low_part (reg ...)) RTXs. This
required two changes:
1. Do not use gen_lowpart to generate the inner expression of a
STRICT_LOW_PART. gen_lowpart might fold the SUBREG either because
there is already a paradoxical subreg or because it can directly be
applied to the register. A new wrapper function makes sure that we
always end up having an actual SUBREG.
2. Change the movstrict patterns to enforce a SUBREG as inner operand
of the STRICT_LOW_PARTs. The new predicate introduced for the
destination operand requires a SUBREG expression with a
register_operand as inner operand. However, since reload strips away
the majority of the SUBREGs we have to accept single registers as well
once we reach reload.
Bootstrapped and regression tested on IBM zSystems 64 bit.
gcc/ChangeLog:
PR target/106101
* config/s390/predicates.md (subreg_register_operand): New
predicate.
* config/s390/s390-protos.h (s390_gen_lowpart_subreg): New
function prototype.
* config/s390/s390.cc (s390_gen_lowpart_subreg): New function.
(s390_expand_insv): Use s390_gen_lowpart_subreg instead of
gen_lowpart.
* config/s390/s390.md ("*get_tp_64", "*zero_extendhisi2_31")
("*zero_extendqisi2_31", "*zero_extendqihi2_31"): Likewise.
("movstrictqi", "movstricthi", "movstrictsi"): Use the
subreg_register_operand predicate instead of register_operand.
gcc/testsuite/ChangeLog:
PR target/106101
* gcc.c-torture/compile/pr106101.c: New test.
(cherry picked from commit 585a21bab3ec688c2039bff2922cc372d8558283)
|
|
|
|
|
|
PR fortran/106731
gcc/fortran/ChangeLog:
* trans-array.cc (gfc_trans_auto_array_allocation): Remove gcc_assert (!TREE_STATIC()).
gcc/testsuite/ChangeLog:
* gfortran.dg/pr106731.f90: New test.
|
|
|
|
..., in order to enable (portions of) Fortran I/O, for example.
libgfortran/ChangeLog:
* configure: Regenerate.
* configure.ac: No longer set LIBGFOR_MINIMAL for nvptx.
libgomp/ChangeLog:
* testsuite/libgomp.fortran/target-print-1.f90: Adjust.
* testsuite/libgomp.fortran/target-print-1-nvptx.f90: Remove.
* testsuite/libgomp.oacc-fortran/print-1.f90: Adjust.
* testsuite/libgomp.oacc-fortran/print-1-nvptx.f90: Remove.
* testsuite/libgomp.oacc-fortran/error_stop-2.f: Adjust.
* testsuite/libgomp.oacc-fortran/stop-2.f: Likewise.
Co-authored-by: Andrew Stubbs <ams@codesourcery.com>
|
|
Adding stub '_Unwind_Backtrace', '_Unwind_GetIPInfo' functions is necessary
for linking libbacktrace, as a normal (non-'LIBGFOR_MINIMAL') configuration
of libgfortran wants to do, for example.
The file 'libgcc/config/nvptx/unwind-nvptx.c' is copied from
'libgcc/config/gcn/unwind-gcn.c'.
libgcc/ChangeLog:
* config/nvptx/t-nvptx: Add unwind-nvptx.c.
* config/nvptx/unwind-nvptx.c: New file.
Co-authored-by: Andrew Stubbs <ams@codesourcery.com>
|
|
This extends "nvptx: Support global constructors/destructors via 'collect2'"
for offloading.
libgcc/
* config/nvptx/crtstuff.c ["mgomp"]
(__do_global_ctors__entry__mgomp)
(__do_global_dtors__entry__mgomp): New.
[!"mgomp"] (__do_global_ctors__entry, __do_global_dtors__entry):
New.
libgomp/
* plugin/plugin-nvptx.c (nvptx_do_global_cdtors): New.
(nvptx_close_device, GOMP_OFFLOAD_load_image)
(GOMP_OFFLOAD_unload_image): Call it.
|
|
The function attributes 'constructor', 'destructor', and 'init_priority' now
work, as do the C++ features making use of this. Test cases with effective
target 'global_constructor' and 'init_priority' now generally work, and
'check-gcc-c++' test results greatly improve; no more "sorry, unimplemented:
global constructors not supported on this target".
This depends on <https://github.com/MentorEmbedded/nvptx-tools/pull/40> "'nm'"
generally, and for global destructors support: newlib
<https://inbox.sourceware.org/newlib/878rjqaku5.fsf@dem-tschwing-1.ger.mentorg.com/>
"nvptx: Implement '_exit' instead of 'exit'".
gcc/
* collect2.cc (write_c_file_glob): Allow for
'COLLECT2_MAIN_REFERENCE' override.
* config.gcc <case ${target} in nvptx-*>: Set 'use_collect2=yes'.
* config/nvptx/nvptx.h: Adjust.
gcc/testsuite/
* gcc.dg/no_profile_instrument_function-attr-1.c: GCC/nvptx is
'NO_DOT_IN_LABEL' but not 'NO_DOLLAR_IN_LABEL', so '$' may apper
in identifiers.
* lib/target-supports.exp
(check_effective_target_global_constructor): Enable for nvptx.
libgcc/
* config.host <case ${host} in nvptx-*>: Add 'crtbegin.o',
'crtend.o' to 'extra_parts'.
* config/nvptx/crt0.c: Invoke '__do_global_ctors',
'__do_global_dtors'.
* config/nvptx/crtstuff.c: New.
* config/nvptx/t-nvptx: Adjust.
|
|
'__nvptx_uni'
As I have reported to Nvidia in 2022-12-01 'NVIDIA Incident Report (3891704):
ptxas: Duplicate declaration error: "cannot be resolved by a '.static'"',
'ptxas' has an inscrutable error mode for duplicate declarations:
ptxas softstack-decl-1.o, line 11; error : '.extern' variable '__nvptx_stacks' cannot be resolved by a '.static'
ptxas fatal : Ptx assembly aborted due to errors
nvptx-as: ptxas returned 255 exit status
ptxas uniform-simt-decl-1.o, line 12; error : '.extern' variable '__nvptx_uni' cannot be resolved by a '.static'
ptxas fatal : Ptx assembly aborted due to errors
nvptx-as: ptxas returned 255 exit status
This is inscrutable, because (a) what is "cannot be resolved by a '.static'"
supposed to tell me (there is no '.static' in PTX?), and (b) why arent't
repeated declaration just verified to match the first, but otherwise a no-op
(like in other programming languages)?
gcc/
* config/nvptx/nvptx.cc (nvptx_assemble_undefined_decl): Notice
'__nvptx_stacks', '__nvptx_uni' declarations.
(nvptx_file_end): Don't emit duplicate declarations for those.
gcc/testsuite/
* gcc.target/nvptx/softstack-decl-1.c: Make 'dg-do assemble',
adjust.
* gcc.target/nvptx/uniform-simt-decl-1.c: Likewise.
|
|
'gcc.target/nvptx/uniform-simt-decl-1.c'
... to document the status quo re implicit (via 'need_softstack_decl',
'need_unisimt_decl') and explicit declarations of '__nvptx_stacks',
'__nvptx_uni'.
gcc/testsuite/
* gcc.target/nvptx/softstack-decl-1.c: New.
* gcc.target/nvptx/uniform-simt-decl-1.c: Likewise.
|
|
For example, this allows for '-muniform-simt' code to be executed
single-threaded, which currently fails (device-side 'trap'), as the 0xffffffff
mask isn't correct if not all 32 threads of a warp are active. The same
issue/fix, I suppose but have not verified, would apply if we were to allow for
OpenACC 'vector_length' smaller than 32, for example for OpenACC 'serial'.
We use 'nvptx_uniform_warp_check' only for PTX ISA version less than 6.0.
Otherwise we're using 'nvptx_warpsync', which emits 'bar.warp.sync 0xffffffff',
which evidently appears to do the right thing. (I've tested '-muniform-simt'
code executing single-threaded.)
gcc/
* config/nvptx/nvptx.md (nvptx_uniform_warp_check): Make fit for
non-full-warp execution.
gcc/testsuite/
* gcc.target/nvptx/nvptx.exp
(check_effective_target_default_ptx_isa_version_at_least_6_0):
New.
* gcc.target/nvptx/uniform-simt-5.c: New.
libgomp/
* plugin/plugin-nvptx.c (nvptx_exec): Assert what we know about
'blockDimX'.
|
|
'abort' [GCC PR85463]"
PR target/85463
libgfortran/
* runtime/minimal.c [__nvptx__] (exit): Don't override.
libgomp/
* config/nvptx/error.c (exit): Don't override.
* testsuite/libgomp.oacc-fortran/error_stop-1.f: Update.
* testsuite/libgomp.oacc-fortran/error_stop-2.f: Likewise.
* testsuite/libgomp.oacc-fortran/error_stop-3.f: Likewise.
* testsuite/libgomp.oacc-fortran/stop-1.f: Likewise.
* testsuite/libgomp.oacc-fortran/stop-2.f: Likewise.
* testsuite/libgomp.oacc-fortran/stop-3.f: Likewise.
|
|
Tobias pointed out that as of my recent
og12 commit e7d4bcb974915bfe95be6c385641fc66a4201581
"Fix 'libgomp.c/simd-math-1.c' configuration",
in GCC configurations without GCN offloading configured, we'd get:
xgcc: error: GCC is not configured to support 'amdgcn-amdhsa' as '-foffload=' argument
("Interestingly", GCC doesn't complain for '-foffload-options=-lm' if there are
no offload targets configured...)
libgomp/
* testsuite/libgomp.c/simd-math-1.c: Fix configuration, again.
|
|
'libgomp.oacc-c-c++-common/abort-3.c'
libgomp/
* testsuite/libgomp.oacc-c-c++-common/abort-3.c: Force
'--param openacc-kernels=parloops'.
|
|
If nvptx offloading is configured in addition to GCN, we see:
FAIL: libgomp.c/simd-math-1.c (test for excess errors)
UNRESOLVED: libgomp.c/simd-math-1.c compilation failed to produce executable
x86_64-pc-linux-gnu-accel-nvptx-none-gcc: error: unrecognized command-line option '-mstack-size=3000000'
Thus, restrict that ooption to GCN offloading compilation, and on the other
hand, there's no reason to skip this test for non-GCN offloading execution:
even if not SIMD-vectorized there, we still benefit from correctness testing.
libgomp/
* testsuite/libgomp.c/simd-math-1.c: Fix configuration.
|
|
|
|
Merge up to r12-9052-g61ef24af3ce8ec9c5eb65770f8047d98f42a93bf (19th Jan 2023)
|
|
expand_omp_for_init_counts was using for the case where collapse(2)
inner loop has init expression dependent on non-constant multiple of
the outer iterator and the condition upper bound expression doesn't
depend on the outer iterator fold_unary (NEGATE_EXPR, ...). This
will just return NULL if it can't be folded, we need fold_build1
instead.
2023-01-19 Jakub Jelinek <jakub@redhat.com>
PR middle-end/108459
* omp-expand.cc (expand_omp_for_init_counts): Use fold_build1 rather
than fold_unary for NEGATE_EXPR.
* testsuite/libgomp.c/pr108459.c: New test.
(cherry picked from commit 46644ec99cb355845b23bb1d02775c057ed8ee88)
|
|
|
|
The commit r12-5877-g9e18a25331fa25 removed the incorrect
noexcept-specifier from std::condition_variable::wait and gave the new
symbol version @@GLIBCXX_3.4.30. It also redefined the original symbol
std::condition_variable::wait(unique_lock<mutex>&)@GLIBCXX_3.4.11 as an
alias for a new symbol, __gnu_cxx::__nothrow_wait_cv::wait, which still
has the incorrect noexcept guarantee. That __nothrow_wait_cv::wait is
just a wrapper around the real condition_variable::wait which adds
noexcept and so terminates on a __forced_unwind exception.
This doesn't work on uclibc, possibly due to a dynamic linker bug. When
__nothrow_wait_cv::wait calls the condition_variable::wait function it
binds to the alias symbol, which means it just calls itself recursively
until the stack overflows.
This change avoids the possibility of a recursive call by changing the
__nothrow_wait_cv::wait function so that instead of calling
condition_variable::wait it re-implements it. This requires accessing
the private _M_cond member of condition_variable, so we need to use the
trick of instantiating a template with the member-pointer of the private
member.
libstdc++-v3/ChangeLog:
PR libstdc++/105730
* src/c++11/compatibility-condvar.cc (__nothrow_wait_cv::wait):
Access private data member of base class and call its wait
member.
(cherry picked from commit ee4af2ed0b7322884ec4ff537564683c3749b813)
|
|
|
|
|
|
|
|
* ka.po: New.
|
|
When using a mutex and condition variable, the notifying thread needs to
increment _M_ver while holding the mutex lock, and the waiting thread
needs to re-check after locking the mutex. This avoids a missed
notification as described in the PR.
By moving the increment of _M_ver to the base _M_notify we can make the
use of the mutex local to the use of the condition variable, and
simplify the code a little. We can use a relaxed store because the mutex
already provides sequential consistency. Also we don't need to check
whether __addr == &_M_ver because we know that's always true for
platforms that use a condition variable, and so we also know that we
always need to use notify_all() not notify_one().
Reviewed-by: Thomas Rodgers <trodgers@redhat.com>
libstdc++-v3/ChangeLog:
PR libstdc++/106183
* include/bits/atomic_wait.h (__waiter_pool_base::_M_notify):
Move increment of _M_ver here.
[!_GLIBCXX_HAVE_PLATFORM_WAIT]: Lock mutex around increment.
Use relaxed memory order and always notify all waiters.
(__waiter_base::_M_do_wait) [!_GLIBCXX_HAVE_PLATFORM_WAIT]:
Check value again after locking mutex.
(__waiter_base::_M_notify): Remove increment of _M_ver.
(cherry picked from commit af98cb88eb4be6a1668ddf966e975149bf8610b1)
|
|
gcc/fortran/ChangeLog:
PR fortran/107706
* openmp.cc (gfc_resolve_omp_assumptions): Reject nonscalars.
gcc/testsuite/ChangeLog:
PR fortran/107706
* gfortran.dg/gomp/assume-2.f90: Update dg-error.
* gfortran.dg/gomp/assumes-2.f90: Likewise.
* gfortran.dg/gomp/assume-5.f90: New test.
(cherry picked from commit 2ce55247a8bf32985a96ed63a7a92d36746723dc)
|
|
Merge up to r12-9046-gd369eb486bdc720e4c50563226dbbb11a0226b5d (16th Jan 2023)
|
|
|
|
|
|
|
|
This should optimize cache-lines on the AMD GPUs somewhat.
libgomp/ChangeLog:
* usm-allocator.c (ALIGN): Use 128-byte alignment.
|
|
|
|
|
|
There were problems with critical driver data sharing pages with USM data, so
this new allocator implementation moves USM to entirely different pages.
libgomp/ChangeLog:
* plugin/plugin-gcn.c: Include sys/mman.h and unistd.h.
(usm_heap_create): New function.
(struct usm_splay_tree_key_s): Delete function.
(usm_splay_compare): Delete function.
(splay_tree_prefix): Delete define.
(GOMP_OFFLOAD_usm_alloc): Use new allocator.
(GOMP_OFFLOAD_usm_free): Likewise.
(GOMP_OFFLOAD_is_usm_ptr): Likewise.
(gomp_fatal): Delete macro.
(splay_tree_c): Delete.
* usm-allocator.c: New file.
|