Age | Commit message (Collapse) | Author | Files | Lines |
|
It's an orthogonal concern why these diagnostics do appear at all for
non-offloaded OpenACC constructs (where they're not relevant at all); PR90115.
Depending on how 'assert' is implemented, it may cause temporaries to be
created, and/or may lower into 'COND_EXPR's, and
'gcc/gimplify.cc:gimplify_cond_expr' uses 'create_tmp_var (type, "iftmp")'.
Fix-up for commit 11b8286a83289f5b54e813f14ff56d730c3f3185
"[OpenACC privatization] Largely extend diagnostics and
corresponding testsuite coverage [PR90115]".
PR testsuite/102841
libgomp/
* testsuite/libgomp.oacc-c-c++-common/host_data-7.c: Adjust.
|
|
making certain variables addressable [PR100280, PR104892]
Currently in OpenACC 'kernels' decomposition, there is special handling of
'GOMP_MAP_FORCE_TOFROM', documented to be done to avoid "internal compiler
errors in later passes". For performance reasons, the current repetitive
to/from device copying for every region is not ideal, compared to using
'present' clauses, as done for almost all other 'GOMP_MAP_*'. Also, the
current special handling (incomplete, evidently) is the reason for the PR104892
misbehavior. For PR100280 etc. we've resolved all such known ICEs -- removing
the special handling for 'GOMP_MAP_FORCE_TOFROM' now resolves PR104892.
PR middle-end/100280
PR middle-end/104892
gcc/
* omp-oacc-kernels-decompose.cc (omp_oacc_kernels_decompose_1):
Remove special handling of 'GOMP_MAP_FORCE_TOFROM'.
gcc/testsuite/
* c-c++-common/goacc/kernels-decompose-2.c: Adjust.
* c-c++-common/goacc/kernels-decompose-pr100400-1-1.c: Likewise.
* c-c++-common/goacc/kernels-decompose-pr100400-1-2.c: Likewise.
* c-c++-common/goacc/kernels-decompose-pr100400-1-3.c: Likewise.
* c-c++-common/goacc/kernels-decompose-pr100400-1-4.c: Likewise.
* c-c++-common/goacc/kernels-decompose-pr104061-1-1.c: Likewise.
* c-c++-common/goacc/kernels-decompose-pr104061-1-2.c: Likewise.
* c-c++-common/goacc/kernels-decompose-pr104061-1-3.c: Likewise.
* c-c++-common/goacc/kernels-decompose-pr104061-1-4.c: Likewise.
* c-c++-common/goacc/kernels-decompose-pr104132-1.c: Likewise.
* c-c++-common/goacc/kernels-decompose-pr104133-1.c: Likewise.
* c-c++-common/goacc/kernels-decompose-pr104774-1.c: Likewise.
* gfortran.dg/goacc/classify-kernels.f95: Likewise.
* gfortran.dg/goacc/kernels-decompose-2.f95: Likewise.
libgomp/
* testsuite/libgomp.oacc-c-c++-common/declare-vla.c: Adjust.
* testsuite/libgomp.oacc-c-c++-common/default-1.c: Likewise.
* testsuite/libgomp.oacc-c-c++-common/kernels-decompose-1.c:
Likewise.
* testsuite/libgomp.oacc-c-c++-common/kernels-reduction-1.c:
Likewise.
* testsuite/libgomp.oacc-c-c++-common/parallel-dims.c: Likewise.
* testsuite/libgomp.oacc-fortran/asyncwait-1.f90: Likewise.
* testsuite/libgomp.oacc-fortran/kernels-reduction-1.f90:
Likewise.
|
|
certain variables addressable [PR104892]
Document a few examples of the status quo.
PR middle-end/104892
libgomp/
* testsuite/libgomp.oacc-c-c++-common/kernels-decompose-1.c: Point
to PR104892.
* testsuite/libgomp.oacc-c-c++-common/default-1.c: Likewise,
enable '--param=openacc-kernels=decompose' and adjust.
* testsuite/libgomp.oacc-c-c++-common/kernels-reduction-1.c:
Likewise.
* testsuite/libgomp.oacc-c-c++-common/parallel-dims.c: Likewise.
* testsuite/libgomp.oacc-fortran/kernels-reduction-1.f90:
Likewise.
|
|
[PR90115]
As originally introduced in commit 11b8286a83289f5b54e813f14ff56d730c3f3185
"[OpenACC privatization] Largely extend diagnostics and corresponding testsuite
coverage [PR90115]".
PR middle-end/90115
libgomp/
* testsuite/libgomp.oacc-c-c++-common/default-1.c: Enhance.
* testsuite/libgomp.oacc-c-c++-common/kernels-reduction-1.c: Likewise.
* testsuite/libgomp.oacc-c-c++-common/parallel-dims.c: Likewise.
* testsuite/libgomp.oacc-fortran/kernels-reduction-1.f90: Likewise.
|
|
addressable [PR100280, PR104086]
... like in recent commit 9b32c1669aad5459dd053424f9967011348add83
"OpenACC 'kernels' decomposition: Mark variables used in synthesized
data clauses as addressable [PR100280]". Otherwise, we may run into
'gcc/omp-low.cc:lower_omp_target':
13125 else if (is_gimple_reg (var))
13126 {
13127 gcc_assert (offloaded);
PR middle-end/100280
PR middle-end/104086
gcc/
* omp-oacc-kernels-decompose.cc (omp_oacc_kernels_decompose_1):
Mark variables used in 'present' clauses as addressable.
* omp-low.cc (scan_sharing_clauses) <OMP_CLAUSE_MAP>: Gracefully
handle duplicate 'OMP_CLAUSE_MAP_DECL_MAKE_ADDRESSABLE'.
gcc/testsuite/
* c-c++-common/goacc/kernels-decompose-pr104086-1.c: Adjust,
extend.
libgomp/
* testsuite/libgomp.oacc-c-c++-common/declare-vla-kernels-decompose-ice-1.c:
Merge this...
* testsuite/libgomp.oacc-c-c++-common/declare-vla-kernels-decompose.c:
..., and this...
* testsuite/libgomp.oacc-c-c++-common/declare-vla.c: ... into
this, and adjust.
* testsuite/libgomp.oacc-c-c++-common/kernels-decompose-1.c:
Extend.
|
|
1. Thomas reported in
https://gcc.gnu.org/pipermail/gcc-patches/2022-January/589039.html
that this testcase is randomly failing. The problem was fixed pool
size which was exhausted when there were a lot of threads. Fixed it
by removing pool_size trait which causes default pool size to be used
which should be big enough.
2. Array indices have been changed to check the last element in the
array.
3. Remove a redundant assignment and move some code to better match
C testcase.
libgomp/ChangeLog:
* testsuite/libgomp.fortran/allocate-1.f90: Remove pool_size
trait. Test last index in w and v array. Remove redundant
assignment to V(1). Move alignment checks at the end of
parallel region.
|
|
I ran into a hang for this code:
...
#pragma omp target map(tofrom: counter_N0)
#pragma omp simd
for (int i = 0 ; i < 1 ; i++ )
{
#pragma omp atomic update
counter_N0 = counter_N0 + 1 ;
}
...
This has to do with the nature of -muniform-simt. It has two modes of
operation: inside and outside an SIMT region.
Outside an SIMT region, a warp pretends to execute a single thread, but
actually executes in all threads, to keep the local registers in all threads
consistent. This approach works unless the insn that is executed is a syscall
or an atomic insn. In that case, the insn is predicated, such that it
executes in only one thread. If the predicated insn writes a result to a
register, then that register is propagated to the other threads, after which
the local registers in all threads are consistent again.
Inside an SIMT region, a warp executes in all threads. However, the
predication and propagation for syscalls and atomic insns is also present
here, because nvptx_reorg_uniform_simt works on all code. Care has been taken
though to ensure that the predication and propagation is a nop. That is,
inside an SIMT region:
- the predicate evalutes to true for each thread, and
- the propagation insn copies a register from each thread to the same thread.
That works fine, until we use -mptx=6.0, and instead of using the deprecated
warp propagation insn shfl, we start using shfl.sync:
...
@%r33 atom.add.u32 _, [%r29], 1;
shfl.sync.idx.b32 %r30, %r30, %r32, 31, 0xffffffff;
...
The shfl.sync specifies a member mask indicating all threads, but given that
the loop only has a single iteration, only thread 0 will execute the insn,
where it will hang waiting for the other threads.
Fix this by predicating the shfl.sync (and likewise, bar.warp.sync and the
uniform warp check) such that it only executes outside the SIMT region.
Tested on x86_64 with nvptx accelerator.
gcc/ChangeLog:
2022-03-08 Tom de Vries <tdevries@suse.de>
PR target/104783
* config/nvptx/nvptx.cc (nvptx_init_unisimt_predicate)
(nvptx_output_unisimt_switch): Handle unisimt_outside_simt_predicate.
(nvptx_get_unisimt_outside_simt_predicate): New function.
(predicate_insn): New function, factored out of ...
(nvptx_reorg_uniform_simt): ... here. Predicate all emitted insns.
* config/nvptx/nvptx.h (struct machine_function): Add
unisimt_outside_simt_predicate field.
* config/nvptx/nvptx.md (define_insn "nvptx_warpsync")
(define_insn "nvptx_uniform_warp_check"): Make predicable.
libgomp/ChangeLog:
2022-03-10 Tom de Vries <tdevries@suse.de>
* testsuite/libgomp.c/pr104783.c: New test.
|
|
PR102330, PR104774]
... so that it matches what we analyze and what we action on.
Fix-up for commit 29a2f51806c5b30e17a8d0e9ba7915a3c53c34ff "openacc:
Add support for gang local storage allocation in shared memory [PR90115]".
PR middle-end/90115
PR middle-end/102330
PR middle-end/104774
gcc/
* omp-low.cc (oacc_privatization_candidate_p)
(oacc_privatization_scan_clause_chain)
(oacc_privatization_scan_decl_chain, lower_oacc_private_marker):
Analyze 'lookup_decl'-translated DECL.
gcc/testsuite/
* c-c++-common/goacc/kernels-decompose-pr104061-1-3.c: Adjust.
* c-c++-common/goacc/kernels-decompose-pr104061-1-4.c: Likewise.
* c-c++-common/goacc/kernels-decompose-pr104132-1.c: Likewise.
* c-c++-common/goacc/kernels-decompose-pr104133-1.c: Likewise.
* c-c++-common/goacc/kernels-decompose-pr104774-1.c: Likewise.
* c-c++-common/goacc/privatization-1-compute-loop.c: Likewise.
* c-c++-common/goacc/privatization-1-compute.c: Likewise.
* c-c++-common/goacc/privatization-1-routine_gang-loop.c:
Likewise.
* c-c++-common/goacc/privatization-1-routine_gang.c: Likewise.
* gfortran.dg/goacc-gomp/pr102330-1.f90: Likewise, and subsume...
* gfortran.dg/goacc-gomp/pr102330-2.f90: ... this file, and...
* gfortran.dg/goacc-gomp/pr102330-3.f90: ... this file.
* gfortran.dg/goacc/privatization-1-compute-loop.f90: Adjust.
* gfortran.dg/goacc/privatization-1-compute.f90: Likewise.
* gfortran.dg/goacc/privatization-1-routine_gang-loop.f90:
Likewise.
* gfortran.dg/goacc/privatization-1-routine_gang.f90: Likewise.
libgomp/
* testsuite/libgomp.oacc-c-c++-common/kernels-decompose-1.c:
Enhance.
* testsuite/libgomp.oacc-c-c++-common/kernels-private-vars-local-worker-1.c:
Adjust.
* testsuite/libgomp.oacc-c-c++-common/kernels-private-vars-local-worker-2.c:
Likewise.
* testsuite/libgomp.oacc-c-c++-common/kernels-private-vars-local-worker-3.c:
Likewise.
* testsuite/libgomp.oacc-c-c++-common/kernels-private-vars-local-worker-4.c:
Likewise.
* testsuite/libgomp.oacc-c-c++-common/kernels-private-vars-local-worker-5.c:
Likewise.
* testsuite/libgomp.oacc-c-c++-common/kernels-private-vars-loop-vector-1.c:
Likewise.
* testsuite/libgomp.oacc-c-c++-common/kernels-private-vars-loop-vector-2.c:
Likewise.
* testsuite/libgomp.oacc-c-c++-common/kernels-private-vars-loop-worker-2.c:
Likewise.
* testsuite/libgomp.oacc-c-c++-common/kernels-private-vars-loop-worker-3.c:
Likewise.
* testsuite/libgomp.oacc-c-c++-common/kernels-private-vars-loop-worker-4.c:
Likewise.
* testsuite/libgomp.oacc-c-c++-common/kernels-private-vars-loop-worker-5.c:
Likewise.
* testsuite/libgomp.oacc-c-c++-common/kernels-private-vars-loop-worker-6.c:
Likewise.
* testsuite/libgomp.oacc-c-c++-common/kernels-private-vars-loop-worker-7.c:
Likewise.
* testsuite/libgomp.oacc-fortran/optional-private.f90: Likewise.
* testsuite/libgomp.oacc-fortran/privatized-ref-1.f95: Likewise.
* testsuite/libgomp.oacc-fortran/privatized-ref-2.f90: Likewise.
|
|
[PR90115]
As originally introduced in commit 11b8286a83289f5b54e813f14ff56d730c3f3185
"[OpenACC privatization] Largely extend diagnostics and corresponding testsuite
coverage [PR90115]".
PR middle-end/90115
gcc/testsuite/
* c-c++-common/goacc/nesting-1.c: Enhance.
* gcc.dg/goacc/nested-function-1.c: Likewise.
* gcc.dg/goacc/nested-function-2.c: Likewise.
* gfortran.dg/goacc/nested-function-1.f90: Likewise.
libgomp/
* testsuite/libgomp.oacc-fortran/routine-1.f90: Enhance.
* testsuite/libgomp.oacc-fortran/routine-2.f90: Likewise.
* testsuite/libgomp.oacc-fortran/routine-3.f90: Likewise.
* testsuite/libgomp.oacc-fortran/routine-9.f90: Likewise.
|
|
Fix-up for recent commit 8935589b496f755e08cadf26d8ceddf0dd6e0968
"OMP lowering: Regimplify 'OMP_CLAUSE_MAP_DECL_MAKE_ADDRESSABLE' DECLs
[PR100280, PR104132, PR104133]": adjust for a GCN offloading workaround
added just before commit: '(volatile void *) &f1;'.
PR testsuite/104791
libgomp/
* testsuite/libgomp.oacc-c-c++-common/kernels-decompose-1.c: Fix
expected diagnostics.
|
|
'--param=openacc-kernels=decompose' [PR104784]
Before recent commit 8935589b496f755e08cadf26d8ceddf0dd6e0968
"OMP lowering: Regimplify 'OMP_CLAUSE_MAP_DECL_MAKE_ADDRESSABLE' DECLs
[PR100280, PR104132, PR104133]", 'libgomp.oacc-c' testing already worked fine,
but 'libgomp.oacc-c++' testing ICEed. Via the commit mentioned, the C++
testing ICEs are now resolved, but the underlying issue remains to be looked
into: PR104784 "OpenACC 'kernels' decomposition: C vs. C++ differences".
PR middle-end/104784
libgomp/
* testsuite/libgomp.oacc-c-c++-common/kernels-private-vars-local-worker-1.c:
Test with '--param=openacc-kernels=decompose'.
* testsuite/libgomp.oacc-c-c++-common/kernels-private-vars-local-worker-2.c:
Likewise.
* testsuite/libgomp.oacc-c-c++-common/kernels-private-vars-local-worker-3.c:
Likewise.
* testsuite/libgomp.oacc-c-c++-common/kernels-private-vars-local-worker-4.c:
Likewise.
* testsuite/libgomp.oacc-c-c++-common/kernels-private-vars-local-worker-5.c:
Likewise.
* testsuite/libgomp.oacc-c-c++-common/kernels-private-vars-loop-gang-1.c:
Likewise.
* testsuite/libgomp.oacc-c-c++-common/kernels-private-vars-loop-gang-2.c:
Likewise.
* testsuite/libgomp.oacc-c-c++-common/kernels-private-vars-loop-gang-3.c:
Likewise.
* testsuite/libgomp.oacc-c-c++-common/kernels-private-vars-loop-gang-4.c:
Likewise.
* testsuite/libgomp.oacc-c-c++-common/kernels-private-vars-loop-gang-5.c:
Likewise.
* testsuite/libgomp.oacc-c-c++-common/kernels-private-vars-loop-gang-6.c:
Likewise.
* testsuite/libgomp.oacc-c-c++-common/kernels-private-vars-loop-vector-1.c:
Likewise.
* testsuite/libgomp.oacc-c-c++-common/kernels-private-vars-loop-vector-2.c:
Likewise.
* testsuite/libgomp.oacc-c-c++-common/kernels-private-vars-loop-worker-1.c:
Likewise.
* testsuite/libgomp.oacc-c-c++-common/kernels-private-vars-loop-worker-2.c:
Likewise.
* testsuite/libgomp.oacc-c-c++-common/kernels-private-vars-loop-worker-3.c:
Likewise.
* testsuite/libgomp.oacc-c-c++-common/kernels-private-vars-loop-worker-4.c:
Likewise.
* testsuite/libgomp.oacc-c-c++-common/kernels-private-vars-loop-worker-5.c:
Likewise.
* testsuite/libgomp.oacc-c-c++-common/kernels-private-vars-loop-worker-6.c:
Likewise.
* testsuite/libgomp.oacc-c-c++-common/kernels-private-vars-loop-worker-7.c:
Likewise.
* testsuite/libgomp.oacc-fortran/kernels-private-vars-loop-gang-1.f90:
Likewise.
* testsuite/libgomp.oacc-fortran/kernels-private-vars-loop-gang-2.f90:
Likewise.
* testsuite/libgomp.oacc-fortran/kernels-private-vars-loop-gang-3.f90:
Likewise.
* testsuite/libgomp.oacc-fortran/kernels-private-vars-loop-gang-6.f90:
Likewise.
* testsuite/libgomp.oacc-fortran/kernels-private-vars-loop-vector-1.f90:
Likewise.
* testsuite/libgomp.oacc-fortran/kernels-private-vars-loop-vector-2.f90:
Likewise.
* testsuite/libgomp.oacc-fortran/kernels-private-vars-loop-worker-1.f90:
Likewise.
* testsuite/libgomp.oacc-fortran/kernels-private-vars-loop-worker-2.f90:
Likewise.
* testsuite/libgomp.oacc-fortran/kernels-private-vars-loop-worker-3.f90:
Likewise.
* testsuite/libgomp.oacc-fortran/kernels-private-vars-loop-worker-4.f90:
Likewise.
* testsuite/libgomp.oacc-fortran/kernels-private-vars-loop-worker-5.f90:
Likewise.
* testsuite/libgomp.oacc-fortran/kernels-private-vars-loop-worker-6.f90:
Likewise.
* testsuite/libgomp.oacc-fortran/kernels-private-vars-loop-worker-7.f90:
Likewise.
|
|
libgomp/
* testsuite/libgomp.oacc-c-c++-common/kernels-private-vars-local-worker-1.c:
Test '-fopt-info-omp-all'.
* testsuite/libgomp.oacc-c-c++-common/kernels-private-vars-local-worker-2.c:
Likewise.
* testsuite/libgomp.oacc-c-c++-common/kernels-private-vars-local-worker-3.c:
Likewise.
* testsuite/libgomp.oacc-c-c++-common/kernels-private-vars-local-worker-4.c:
Likewise.
* testsuite/libgomp.oacc-c-c++-common/kernels-private-vars-local-worker-5.c:
Likewise.
* testsuite/libgomp.oacc-c-c++-common/kernels-private-vars-loop-gang-1.c:
Likewise.
* testsuite/libgomp.oacc-c-c++-common/kernels-private-vars-loop-gang-2.c:
Likewise.
* testsuite/libgomp.oacc-c-c++-common/kernels-private-vars-loop-gang-3.c:
Likewise.
* testsuite/libgomp.oacc-c-c++-common/kernels-private-vars-loop-gang-4.c:
Likewise.
* testsuite/libgomp.oacc-c-c++-common/kernels-private-vars-loop-gang-5.c:
Likewise.
* testsuite/libgomp.oacc-c-c++-common/kernels-private-vars-loop-gang-6.c:
Likewise.
* testsuite/libgomp.oacc-c-c++-common/kernels-private-vars-loop-vector-1.c:
Likewise.
* testsuite/libgomp.oacc-c-c++-common/kernels-private-vars-loop-vector-2.c:
Likewise.
* testsuite/libgomp.oacc-c-c++-common/kernels-private-vars-loop-worker-1.c:
Likewise.
* testsuite/libgomp.oacc-c-c++-common/kernels-private-vars-loop-worker-2.c:
Likewise.
* testsuite/libgomp.oacc-c-c++-common/kernels-private-vars-loop-worker-3.c:
Likewise.
* testsuite/libgomp.oacc-c-c++-common/kernels-private-vars-loop-worker-4.c:
Likewise.
* testsuite/libgomp.oacc-c-c++-common/kernels-private-vars-loop-worker-5.c:
Likewise.
* testsuite/libgomp.oacc-c-c++-common/kernels-private-vars-loop-worker-6.c:
Likewise.
* testsuite/libgomp.oacc-c-c++-common/kernels-private-vars-loop-worker-7.c:
Likewise.
* testsuite/libgomp.oacc-fortran/kernels-private-vars-loop-gang-1.f90:
Likewise.
* testsuite/libgomp.oacc-fortran/kernels-private-vars-loop-gang-2.f90:
Likewise.
* testsuite/libgomp.oacc-fortran/kernels-private-vars-loop-gang-3.f90:
Likewise.
* testsuite/libgomp.oacc-fortran/kernels-private-vars-loop-gang-6.f90:
Likewise.
* testsuite/libgomp.oacc-fortran/kernels-private-vars-loop-vector-1.f90:
Likewise.
* testsuite/libgomp.oacc-fortran/kernels-private-vars-loop-vector-2.f90:
Likewise.
* testsuite/libgomp.oacc-fortran/kernels-private-vars-loop-worker-1.f90:
Likewise.
* testsuite/libgomp.oacc-fortran/kernels-private-vars-loop-worker-2.f90:
Likewise.
* testsuite/libgomp.oacc-fortran/kernels-private-vars-loop-worker-3.f90:
Likewise.
* testsuite/libgomp.oacc-fortran/kernels-private-vars-loop-worker-4.f90:
Likewise.
* testsuite/libgomp.oacc-fortran/kernels-private-vars-loop-worker-5.f90:
Likewise.
* testsuite/libgomp.oacc-fortran/kernels-private-vars-loop-worker-6.f90:
Likewise.
* testsuite/libgomp.oacc-fortran/kernels-private-vars-loop-worker-7.f90:
Likewise.
|
|
[PR100280, PR104132, PR104133]
... by generalizing the existing 'gcc/omp-low.cc:task_shared_vars'.
Fix-up for commit 9b32c1669aad5459dd053424f9967011348add83
"OpenACC 'kernels' decomposition: Mark variables used in
synthesized data clauses as addressable [PR100280]".
PR middle-end/100280
PR middle-end/104132
PR middle-end/104133
gcc/
* omp-low.cc (task_shared_vars): Rename to
'make_addressable_vars'. Adjust all users.
(scan_sharing_clauses) <OMP_CLAUSE_MAP> Use it for
'OMP_CLAUSE_MAP_DECL_MAKE_ADDRESSABLE' DECLs, too.
gcc/testsuite/
* c-c++-common/goacc/kernels-decompose-pr104061-1-3.c: Adjust.
* c-c++-common/goacc/kernels-decompose-pr104061-1-4.c: Likewise.
* c-c++-common/goacc/kernels-decompose-pr104132-1.c: Likewise.
* c-c++-common/goacc/kernels-decompose-pr104133-1.c: Likewise.
libgomp/
* testsuite/libgomp.oacc-c-c++-common/kernels-decompose-1.c:
Extend.
|
|
lowering [PR100280]
... in preparation for later changes. No functional change.
Follow-up to commit 9b32c1669aad5459dd053424f9967011348add83
"OpenACC 'kernels' decomposition: Mark variables used in
synthesized data clauses as addressable [PR100280]".
PR middle-end/100280
gcc/
* tree.h (OMP_CLAUSE_MAP_DECL_MAKE_ADDRESSABLE): New.
* tree-core.h: Document it.
* omp-low.cc (scan_sharing_clauses) <OMP_CLAUSE_MAP>: Handle
'OMP_CLAUSE_MAP_DECL_MAKE_ADDRESSABLE'.
* omp-oacc-kernels-decompose.cc (maybe_build_inner_data_region):
Set 'OMP_CLAUSE_MAP_DECL_MAKE_ADDRESSABLE' instead of
'TREE_ADDRESSABLE'.
gcc/testsuite/
* c-c++-common/goacc/classify-kernels-unparallelized.c: Adjust.
* c-c++-common/goacc/classify-kernels.c: Likewise.
* c-c++-common/goacc/kernels-decompose-2.c: Likewise.
* c-c++-common/goacc/kernels-decompose-pr100280-1.c: Likewise.
* c-c++-common/goacc/kernels-decompose-pr104061-1-2.c: Likewise.
* c-c++-common/goacc/kernels-decompose-pr104061-1-3.c: Likewise.
* c-c++-common/goacc/kernels-decompose-pr104061-1-4.c: Likewise.
* c-c++-common/goacc/kernels-decompose-pr104132-1.c: Likewise.
* c-c++-common/goacc/kernels-decompose-pr104133-1.c: Likewise.
libgomp/
* testsuite/libgomp.oacc-c-c++-common/f-asyncwait-1.c: Adjust.
* testsuite/libgomp.oacc-c-c++-common/kernels-decompose-1.c:
Likewise.
|
|
declared in block made addressable" [PR100280]
Follow-up to commit 9b32c1669aad5459dd053424f9967011348add83
"OpenACC 'kernels' decomposition: Mark variables used in
synthesized data clauses as addressable [PR100280]".
PR middle-end/100280
gcc/
* omp-oacc-kernels-decompose.cc (maybe_build_inner_data_region):
Add diagnostic: "note: OpenACC 'kernels' decomposition: variable
'[...]' declared in block made addressable".
gcc/testsuite/
* c-c++-common/goacc/classify-kernels-unparallelized.c: Add
'--param=openacc-privatization=noisy'.
* c-c++-common/goacc/classify-kernels.c: Likewise.
* c-c++-common/goacc/kernels-decompose-2.c: Adjust.
* c-c++-common/goacc/kernels-decompose-pr100280-1.c: Likewise.
* c-c++-common/goacc/kernels-decompose-pr104061-1-2.c: Likewise.
* c-c++-common/goacc/kernels-decompose-pr104061-1-3.c: Likewise.
* c-c++-common/goacc/kernels-decompose-pr104061-1-4.c: Likewise.
* c-c++-common/goacc/kernels-decompose-pr104132-1.c: Likewise.
* c-c++-common/goacc/kernels-decompose-pr104133-1.c: Likewise.
libgomp/
* testsuite/libgomp.oacc-c-c++-common/f-asyncwait-1.c: Adjust.
* testsuite/libgomp.oacc-c-c++-common/kernels-decompose-1.c:
Likewise.
|
|
When running with target board unix/-foffload=-mptx=3.1, we run into:
...
lto1: error: PTX version (-mptx) needs to be at least 4.2 to support \
selected -misa (sm_53)^M
mkoffload: fatal error: x86_64-pc-linux-gnu-accel-nvptx-none-gcc returned \
1 exit status^M
compilation terminated.^M
...
FAIL: libgomp.c/declare-variant-3-sm53.c (test for excess errors)
...
Fix this by adding -foffload=-mptx=_ in the libgomp.c/declare-variant-3-sm*.c
test-cases.
Tested on x86_64 with nvptx accelerator.
libgomp/ChangeLog:
2022-02-28 Tom de Vries <tdevries@suse.de>
* testsuite/libgomp.c/declare-variant-3-sm30.c: Add -foffload=-mptx=_.
* testsuite/libgomp.c/declare-variant-3-sm35.c: Same.
* testsuite/libgomp.c/declare-variant-3-sm53.c: Same.
* testsuite/libgomp.c/declare-variant-3-sm70.c: Same.
* testsuite/libgomp.c/declare-variant-3-sm75.c: Same.
* testsuite/libgomp.c/declare-variant-3-sm80.c: Same.
|
|
Add openmp test-cases that test the omp declare variant construct:
...
#pragma omp declare variant (f30) match (device={isa("sm_30")})
...
using the available nvptx isas.
Only the one for sm_30 is a dg-do run test-case, the other ones are dg-do
link.
Tested on x86_64 with nvptx accelerator.
libgomp/ChangeLog:
2022-02-24 Tom de Vries <tdevries@suse.de>
* testsuite/libgomp.c/declare-variant-3-sm30.c: New test.
* testsuite/libgomp.c/declare-variant-3-sm35.c: New test.
* testsuite/libgomp.c/declare-variant-3-sm53.c: New test.
* testsuite/libgomp.c/declare-variant-3-sm70.c: New test.
* testsuite/libgomp.c/declare-variant-3-sm75.c: New test.
* testsuite/libgomp.c/declare-variant-3-sm80.c: New test.
* testsuite/libgomp.c/declare-variant-3.h: New header file.
|
|
'libgomp.oacc-fortran/privatized-ref-2.f90'
This was a latent problem, and this commit here now resolves a regression that
after recent commit a78b1ab1df9ca44acc5638e8f9d0ae2e62bd65ed
"amdgcn: Tune default OpenMP/OpenACC GPU utilization" we had (only) seen on a
GCN offloading '-march=gfx908' system:
{+WARNING: program timed out.+}
[-PASS:-]{+FAIL:+} libgomp.oacc-fortran/privatized-ref-2.f90 -DACC_DEVICE_TYPE_radeon=1 -DACC_MEM_SHARED=0 -foffload=amdgcn-amdhsa -O0 execution test
Same for other optimization levels.
Make sure that we're not executing non-parallelized code in gang-redundant
mode, by putting these parts into their own 'parallel' constructs, which then
default to 'num_gangs(1)'.
libgomp/
* testsuite/libgomp.oacc-fortran/privatized-ref-2.f90: Fix OpenACC
gang-redundant execution.
|
|
Consider the following omp fragment.
...
#pragma omp target
#pragma omp parallel num_threads (2)
#pragma omp task
;
...
This hangs at -O0 for nvptx.
Investigating the behaviour gives us the following trace of events:
- both threads execute GOMP_task, where they:
- deposit a task, and
- execute gomp_team_barrier_wake
- thread 1 executes gomp_team_barrier_wait_end and, not being the last thread,
proceeds to wait at the team barrier
- thread 0 executes gomp_team_barrier_wait_end and, being the last thread, it
calls gomp_barrier_handle_tasks, where it:
- executes both tasks and marks the team barrier done
- executes a gomp_team_barrier_wake which wakes up thread 1
- thread 1 exits the team barrier
- thread 0 returns from gomp_barrier_handle_tasks and goes to wait at
the team barrier.
- thread 0 hangs.
To understand why there is a hang here, it's good to understand how things
are setup for nvptx. The libgomp/config/nvptx/bar.c implementation is
a copy of the libgomp/config/linux/bar.c implementation, with uses of both
futex_wake and do_wait replaced with uses of ptx insn bar.sync:
...
if (bar->total > 1)
asm ("bar.sync 1, %0;" : : "r" (32 * bar->total));
...
The point where thread 0 goes to wait at the team barrier, corresponds in
the linux implementation with a do_wait. In the linux case, the call to
do_wait doesn't hang, because it's waiting for bar->generation to become
a certain value, and if bar->generation already has that value, it just
proceeds, without any need for coordination with other threads.
In the nvtpx case, the bar.sync waits until thread 1 joins it in the same
logical barrier, which never happens: thread 1 is lingering in the
thread pool at the thread pool barrier (using a different logical barrier),
waiting to join a new team.
The easiest way to fix this is to revert to the posix implementation for
bar.{c,h}. That however falls back on a busy-waiting approach, and
does not take advantage of the ptx bar.sync insn.
Instead, we revert to the linux implementation for bar.c,
and implement bar.c local functions futex_wait and futex_wake using the
bar.sync insn.
The bar.sync insn takes an argument specifying how many threads are
participating, and that doesn't play well with the futex syntax where it's
not clear in advance how many threads will be woken up.
This is solved by waking up all waiting threads each time a futex_wait or
futex_wake happens, and possibly going back to sleep with an updated thread
count.
Tested libgomp on x86_64 with nvptx accelerator.
libgomp/ChangeLog:
2021-04-20 Tom de Vries <tdevries@suse.de>
PR target/99555
* config/nvptx/bar.c (generation_to_barrier): New function, copied
from config/rtems/bar.c.
(futex_wait, futex_wake): New function.
(do_spin, do_wait): New function, copied from config/linux/wait.h.
(gomp_barrier_wait_end, gomp_barrier_wait_last)
(gomp_team_barrier_wake, gomp_team_barrier_wait_end):
(gomp_team_barrier_wait_cancel_end, gomp_team_barrier_cancel): Remove
and replace with include of config/linux/bar.c.
* config/nvptx/bar.h (gomp_barrier_t): Add fields waiters and lock.
(gomp_barrier_init): Init new fields.
* testsuite/libgomp.c-c++-common/task-detach-6.c: Remove nvptx-specific
workarounds.
* testsuite/libgomp.c/pr99555-1.c: Same.
* testsuite/libgomp.fortran/task-detach-6.f90: Same.
|
|
When running the libgomp testsuite on x86_64 with nvptx accelerator, we run into:
...
XPASS: libgomp.c/../libgomp.c-c++-common/pr96390.c (test for excess errors)
FAIL: libgomp.c/../libgomp.c-c++-common/pr96390.c execution test
...
The problem is that we're expecting the following ptxas error:
...
XFAIL: libgomp.c/../libgomp.c-c++-common/pr96390.c (test for excess errors)
Excess errors:
ptxas /tmp/ccZYDw8N.o, line 90; error : Call to 'baz' requires call prototype
ptxas /tmp/ccZYDw8N.o, line 90; error : Unknown symbol 'baz'
...
But it's not triggered because ptxas is not in the path, so nvptx-none-as
defaults to --no-verify.
So instead, we run into the same error at execution time.
Fix this by forcing verification using:
...
/* { dg-additional-options "-foffload=-Wa,--verify" \
{ target offload_target_nvptx } } */
...
such that we run into the xfail in this way instead:
...
XFAIL: libgomp.c/../libgomp.c-c++-common/pr96390.c (test for excess errors)
Excess errors:
nvptx-as: error trying to exec 'ptxas': execvp: No such file or directory
nvptx-as: ptxas returned 255 exit status
...
Tested on x86_64-linux with nvptx accelerator.
libgomp/ChangeLog:
2022-02-21 Tom de Vries <tdevries@suse.de>
PR testsuite/104146
* testsuite/libgomp.c++/pr96390.C: Add additional-option
-foffload=-Wa,--verify for nvptx.
* testsuite/libgomp.c-c++-common/pr96390.c: Same.
|
|
gcc/fortran/ChangeLog:
* trans-openmp.cc (gfc_trans_omp_clauses, gfc_trans_omp_depobj):
Depend on the proper addr, for ptr/alloc depend on pointee.
libgomp/ChangeLog:
* testsuite/libgomp.fortran/depend-4.f90: New test.
gcc/testsuite/ChangeLog:
* gfortran.dg/gomp/depend-4.f90: New test.
* gfortran.dg/gomp/depend-5.f90: New test.
|
|
PR c++/102204
gcc/cp/ChangeLog:
* decl2.cc (cp_omp_mappable_type_1): Remove check for virtual
members as those are permitted since OpenMP 5.0.
libgomp/ChangeLog:
* testsuite/libgomp.c++/target-virtual-1.C: New test.
gcc/testsuite/ChangeLog:
* g++.dg/gomp/unmappable-1.C: Remove previously expected dg-message.
|
|
This patch adds the 'has_device_addr' clause to the OpenMP 'target' construct
which was introduced in OpenMP 5.1 (OpenMP API 5.1 specification pp. 197ff):
has_device_addr(list)
"The has_device_addr clause indicates that its list items already have device
addresses and therefore they may be directly accessed from a target device.
If the device address of a list item is not for the device on which the target
region executes, accessing the list item inside the region results in
unspecified behavior. The list items may include array sections." (p. 200)
"A list item may not be specified in both an is_device_ptr clause and a
has_device_addr clause on the directive." (p. 202)
"A list item that appears in an is_device_ptr or a has_device_addr clause must
not be specified in any data-sharing attribute clause on the same target
construct." (p. 203)
gcc/c-family/ChangeLog:
* c-omp.cc (c_omp_split_clauses): Added OMP_CLAUSE_HAS_DEVICE_ADDR case.
* c-pragma.h (enum pragma_kind): Added 5.1 in comment.
(enum pragma_omp_clause): Added PRAGMA_OMP_CLAUSE_HAS_DEVICE_ADDR.
gcc/c/ChangeLog:
* c-parser.cc (c_parser_omp_clause_name): Parse 'has_device_addr'
clause.
(c_parser_omp_variable_list): Handle array sections.
(c_parser_omp_clause_has_device_addr): Added.
(c_parser_omp_all_clauses): Added PRAGMA_OMP_CLAUSE_HAS_DEVICE_ADDR
case.
(c_parser_omp_target_exit_data): Added HAS_DEVICE_ADDR to
OMP_CLAUSE_MASK.
* c-typeck.cc (handle_omp_array_sections): Handle clause restrictions.
(c_finish_omp_clauses): Handle array sections.
gcc/cp/ChangeLog:
* parser.cc (cp_parser_omp_clause_name): Parse 'has_device_addr' clause.
(cp_parser_omp_var_list_no_open): Handle array sections.
(cp_parser_omp_all_clauses): Added PRAGMA_OMP_CLAUSE_HAS_DEVICE_ADDR
case.
(cp_parser_omp_target_update): Added HAS_DEVICE_ADDR to OMP_CLAUSE_MASK.
* semantics.cc (handle_omp_array_sections): Handle clause restrictions.
(finish_omp_clauses): Handle array sections.
gcc/fortran/ChangeLog:
* dump-parse-tree.cc (show_omp_clauses): Added OMP_LIST_HAS_DEVICE_ADDR
case.
* gfortran.h: Added OMP_LIST_HAS_DEVICE_ADDR.
* openmp.cc (enum omp_mask2): Added OMP_CLAUSE_HAS_DEVICE_ADDR.
(gfc_match_omp_clauses): Parse HAS_DEVICE_ADDR clause.
(resolve_omp_clauses): Same.
* trans-openmp.cc (gfc_trans_omp_variable_list): Added
OMP_LIST_HAS_DEVICE_ADDR case.
(gfc_trans_omp_clauses): Firstprivatize of array descriptors.
gcc/ChangeLog:
* gimplify.cc (gimplify_scan_omp_clauses): Added cases for
OMP_CLAUSE_HAS_DEVICE_ADDR
and handle array sections.
(gimplify_adjust_omp_clauses): Added OMP_CLAUSE_HAS_DEVICE_ADDR case.
* omp-low.cc (scan_sharing_clauses): Handle OMP_CLAUSE_HAS_DEVICE_ADDR.
(lower_omp_target): Same.
* tree-core.h (enum omp_clause_code): Same.
* tree-nested.cc (convert_nonlocal_omp_clauses): Same.
(convert_local_omp_clauses): Same.
* tree-pretty-print.cc (dump_omp_clause): Same.
* tree.cc: Same.
libgomp/ChangeLog:
* libgomp.texi: Updated entry for HAS_DEVICE_ADDR.
* target.c (copy_firstprivate_data): Copy only if host address is not
NULL.
* testsuite/libgomp.c++/target-has-device-addr-2.C: New test.
* testsuite/libgomp.c++/target-has-device-addr-4.C: New test.
* testsuite/libgomp.c++/target-has-device-addr-5.C: New test.
* testsuite/libgomp.c++/target-has-device-addr-6.C: New test.
* testsuite/libgomp.c-c++-common/target-has-device-addr-1.c: New test.
* testsuite/libgomp.c/target-has-device-addr-3.c: New test.
* testsuite/libgomp.fortran/target-has-device-addr-1.f90: New test.
* testsuite/libgomp.fortran/target-has-device-addr-2.f90: New test.
* testsuite/libgomp.fortran/target-has-device-addr-3.f90: New test.
* testsuite/libgomp.fortran/target-has-device-addr-4.f90: New test.
gcc/testsuite/ChangeLog:
* c-c++-common/gomp/clauses-1.c: Added has_device_addr to test cases.
* g++.dg/gomp/attrs-1.C: Added has_device_addr to test cases.
* g++.dg/gomp/attrs-2.C: Added has_device_addr to test cases.
* c-c++-common/gomp/target-has-device-addr-1.c: New test.
* c-c++-common/gomp/target-has-device-addr-2.c: New test.
* c-c++-common/gomp/target-is-device-ptr-1.c: New test.
* c-c++-common/gomp/target-is-device-ptr-2.c: New test.
* gfortran.dg/gomp/is_device_ptr-3.f90: New test.
* gfortran.dg/gomp/target-has-device-addr-1.f90: New test.
* gfortran.dg/gomp/target-has-device-addr-2.f90: New test.
|
|
The following patch fixes crashes with posthumous orphan tasks.
When a parent task finishes, gomp_clear_parent clears the parent
pointers of its children tasks present in the parent->children_queue.
But children that are still waiting for dependencies aren't in that
queue yet, they will be added there only when the sibling they are
waiting for exits. Unfortunately we were adding those tasks into
the queues with the original task->parent which then causes crashes
because that task is gone and freed. The following patch fixes that
by clearing the parent field when we schedule such task for running
by adding it into the queues and we know that the sibling task which
is about to finish has NULL parent.
2022-02-08 Jakub Jelinek <jakub@redhat.com>
PR libgomp/104385
* task.c (gomp_task_run_post_handle_dependers): If parent is NULL,
clear task->parent.
* testsuite/libgomp.c/pr104385.c: New test.
|
|
libgomp/ChangeLog:
* testsuite/libgomp.fortran/allocate-1.f90: Remove spurious
STOP of previous commit.
|
|
libgomp/ChangeLog:
* testsuite/libgomp.fortran/allocate-1.c (is_64bit_aligned): Renamed
from is_64bit_aligned_.
* testsuite/libgomp.fortran/allocate-1.f90: Fix interface decl
and use it, more implicit none, remove unused argument.
|
|
The ptx insn atom doesn't support local memory. In case of doing an atomic
operation on local memory, we run into:
...
operation not supported on global/shared address space
...
This is the cuGetErrorString message for CUDA_ERROR_INVALID_ADDRESS_SPACE.
The message is somewhat confusing given that actually the operation is not
supported on local address space.
Fix this by falling back on a non-atomic version when detecting
a frame-related memory operand.
This only solves some cases that are detected at compile-time. It does
however fix the openacc private-atomic-* test-cases.
Tested on x86_64 with nvptx accelerator.
gcc/ChangeLog:
2022-01-27 Tom de Vries <tdevries@suse.de>
* config/nvptx/nvptx.md (define_insn "atomic_compare_and_swap<mode>_1")
(define_insn "atomic_exchange<mode>")
(define_insn "atomic_fetch_add<mode>")
(define_insn "atomic_fetch_addsf")
(define_insn "atomic_fetch_<logic><mode>"): Output non-atomic version
if memory operands is frame-relative.
gcc/testsuite/ChangeLog:
2022-01-31 Tom de Vries <tdevries@suse.de>
* gcc.target/nvptx/stack-atomics-run.c: New test.
libgomp/ChangeLog:
2022-01-27 Tom de Vries <tdevries@suse.de>
* testsuite/libgomp.oacc-c-c++-common/private-atomic-1.c: Remove
PR83812 workaround.
* testsuite/libgomp.oacc-fortran/private-atomic-1-vector.f90: Same.
* testsuite/libgomp.oacc-fortran/private-atomic-1-worker.f90: Same.
|
|
When running libgomp test-case broadcast-many.c on an nvptx accelerator
(T400, driver version 470.86), I run into:
...
libgomp: The Nvidia accelerator has insufficient resources to launch \
'main$_omp_fn$0' with num_workers = 32 and vector_length = 32; \
recompile the program with 'num_workers = x and vector_length = y' on \
that offloaded region or '-fopenacc-dim=:x:y' where x * y <= 896.
FAIL: libgomp.oacc-c/../libgomp.oacc-c-c++-common/broadcast-many.c \
-DACC_DEVICE_TYPE_nvidia=1 -DACC_MEM_SHARED=0 -foffload=nvptx-none \
-O0 execution test
...
The error does not occur when using GOMP_NVPTX_JIT=-O0.
Fix this by using 896 / 32 == 28 workers for ACC_DEVICE_TYPE_nvidia.
Likewise for some other test-cases.
Tested libgomp on x86_64 with nvptx accelerator.
libgomp/ChangeLog:
2022-01-27 Tom de Vries <tdevries@suse.de>
* testsuite/libgomp.oacc-c-c++-common/broadcast-many.c: Reduce
num_workers for nvidia accelerator to fix libgomp error 'insufficient
resources'.
* testsuite/libgomp.oacc-c-c++-common/par-loop-comb-reduction-4.c:
Same.
* testsuite/libgomp.oacc-c-c++-common/reduction-7.c: Same.
|
|
When running the libgomp testsuite with GOMP_NVPTX_JIT=-O0 using an nvptx
accelerator (Nvidia T400, 2GB), I run into:
...
libgomp: cuCtxSynchronize error: unspecified launch failure \
(perhaps abort was called)
libgomp: cuMemFree_v2 error: unspecified launch failure
libgomp: device finalization failed
FAIL: libgomp.fortran/examples-4/declare_target-1.f90 -O0 execution test
...
The test-case contains:
...
! Reduced from 25 to 23, otherwise execution runs out of thread stack on
! Nvidia Titan V.
if (fib (23) /= fib_wrapper (23)) stop 2
...
Fix this by reducing the fib/fib_wrapper argument from 23 to 22.
Same for declare_target-2.f90.
Tested on x86_64 with nvptx accelerator.
libgomp/ChangeLog:
2022-01-27 Tom de Vries <tdevries@suse.de>
* testsuite/libgomp.fortran/examples-4/declare_target-1.f90: Reduce
recursion depth.
* testsuite/libgomp.fortran/examples-4/declare_target-2.f90: Same.
|
|
Rather than rubber-stamp whatever requested vs. actual device kernel launch
configuration happens, actually (again) verify the requested values (modulo
expected variations).
This better highlights that "AMD GCN has an upper limit of 'num_workers(16)'",
and the deficiency that "AMD GCN uses the autovectorizer for the vector
dimension: the use of a function call in vector-partitioned code [...] is not
currently supported".
And, this removes several instances of race conditions, where variables are
concurrently written to in OpenACC gang-redundant mode.
libgomp/
* testsuite/libgomp.oacc-c-c++-common/loop-gwv-1.c: Strengthen.
* testsuite/libgomp.oacc-c-c++-common/loop-gwv-2.c: Likewise.
* testsuite/libgomp.oacc-c-c++-common/loop-red-gwv-1.c: Likewise.
* testsuite/libgomp.oacc-c-c++-common/loop-red-v-1.c: Likewise.
* testsuite/libgomp.oacc-c-c++-common/loop-red-v-2.c: Likewise.
* testsuite/libgomp.oacc-c-c++-common/loop-red-w-1.c: Likewise.
* testsuite/libgomp.oacc-c-c++-common/loop-red-w-2.c: Likewise.
* testsuite/libgomp.oacc-c-c++-common/loop-red-wv-1.c: Likewise.
* testsuite/libgomp.oacc-c-c++-common/loop-v-1.c: Likewise.
* testsuite/libgomp.oacc-c-c++-common/loop-w-1.c: Likewise.
* testsuite/libgomp.oacc-c-c++-common/loop-wv-1.c: Likewise.
* testsuite/libgomp.oacc-c-c++-common/routine-gwv-1.c: Likewise.
* testsuite/libgomp.oacc-c-c++-common/routine-v-1.c: Likewise.
* testsuite/libgomp.oacc-c-c++-common/routine-w-1.c: Likewise.
* testsuite/libgomp.oacc-c-c++-common/routine-wv-1.c: Likewise.
|
|
gcc/ChangeLog:
* config/nvptx/nvptx.cc (nvptx_goacc_validate_dims_1): Update
warning messages.
libgomp/ChangeLog:
* testsuite/libgomp.oacc-c++/privatized-ref-2.C: Update scanning
patterns.
* testsuite/libgomp.oacc-c++/privatized-ref-3.C: Likewise.
* testsuite/libgomp.oacc-c-c++-common/acc_prof-kernels-1.c:
Likewise.
* testsuite/libgomp.oacc-c-c++-common/kernels-loop-2.c: Likewise.
* testsuite/libgomp.oacc-c-c++-common/parallel-dims.c: Likewise.
* testsuite/libgomp.oacc-c-c++-common/pr85486.c: Likewise.
* testsuite/libgomp.oacc-c-c++-common/pr95270-1.c: Likewise.
* testsuite/libgomp.oacc-c-c++-common/routine-nohost-2.c:
Likewise.
* testsuite/libgomp.oacc-c-c++-common/struct-copyout-1.c:
Likewise.
* testsuite/libgomp.oacc-c-c++-common/struct-copyout-2.c:
Likewise.
* testsuite/libgomp.oacc-c-c++-common/vector-length-64-1.c:
Likewise.
* testsuite/libgomp.oacc-fortran/attach-descriptor-1.f90:
Likewise.
* testsuite/libgomp.oacc-fortran/derivedtypes-arrays-1.f90:
Likewise.
* testsuite/libgomp.oacc-fortran/kernels-loop-2.f95: Likewise.
* testsuite/libgomp.oacc-fortran/parallel-dims.f90: Likewise.
* testsuite/libgomp.oacc-fortran/privatized-ref-1.f95: Likewise.
Co-authored-by: Thomas Schwinge <thomas@codesourcery.com>
|
|
gcc/ChangeLog:
* config/nvptx/nvptx.cc (nvptx_goacc_validate_dims_1): Wrap
keyword.
* config/nvptx/nvptx.md: Remove trailing dot.
libgomp/ChangeLog:
* testsuite/libgomp.oacc-c++/privatized-ref-2.C: Update keyword
in dg-warning.
* testsuite/libgomp.oacc-c++/privatized-ref-3.C: Likewise.
* testsuite/libgomp.oacc-c-c++-common/acc_prof-kernels-1.c: Likewise.
* testsuite/libgomp.oacc-c-c++-common/kernels-loop-2.c: Likewise.
* testsuite/libgomp.oacc-c-c++-common/parallel-dims.c: Likewise.
* testsuite/libgomp.oacc-c-c++-common/pr85486.c: Likewise.
* testsuite/libgomp.oacc-c-c++-common/pr95270-1.c: Likewise.
* testsuite/libgomp.oacc-c-c++-common/routine-nohost-2.c: Likewise.
* testsuite/libgomp.oacc-c-c++-common/struct-copyout-1.c: Likewise.
* testsuite/libgomp.oacc-c-c++-common/struct-copyout-2.c: Likewise.
* testsuite/libgomp.oacc-c-c++-common/vector-length-64-1.c: Likewise.
* testsuite/libgomp.oacc-fortran/attach-descriptor-1.f90: Likewise.
* testsuite/libgomp.oacc-fortran/derivedtypes-arrays-1.f90: Likewise.
* testsuite/libgomp.oacc-fortran/kernels-loop-2.f95: Likewise.
* testsuite/libgomp.oacc-fortran/parallel-dims.f90: Likewise.
* testsuite/libgomp.oacc-fortran/privatized-ref-1.f95: Likewise.
|
|
libgomp/
* testsuite/libgomp.oacc-c++/privatized-ref-2.C: Extend.
* testsuite/libgomp.oacc-c++/privatized-ref-3.C: Likewise.
* testsuite/libgomp.oacc-fortran/privatized-ref-1.f95: Likewise.
|
|
libgomp/
* testsuite/libgomp.oacc-fortran/privatized-ref-1.f95: New test.
* testsuite/libgomp.oacc-c++/privatized-ref-2.C: New test.
* testsuite/libgomp.oacc-c++/privatized-ref-3.C: New test.
Co-authored-by: Thomas Schwinge <thomas@codesourcery.com>
|
|
This patch adds support for OpenMP 5.0 allocate clause for fortran. It does not
yet support the allocator-modifier as specified in OpenMP 5.1. The allocate
clause is already supported in C/C++.
gcc/fortran/ChangeLog:
* dump-parse-tree.c (show_omp_clauses): Handle OMP_LIST_ALLOCATE.
* gfortran.h (OMP_LIST_ALLOCATE): New enum value.
* openmp.c (enum omp_mask1): Add OMP_CLAUSE_ALLOCATE.
(gfc_match_omp_clauses): Handle OMP_CLAUSE_ALLOCATE
(OMP_PARALLEL_CLAUSES, OMP_DO_CLAUSES, OMP_SECTIONS_CLAUSES)
(OMP_TASK_CLAUSES, OMP_TASKLOOP_CLAUSES, OMP_TARGET_CLAUSES)
(OMP_TEAMS_CLAUSES, OMP_DISTRIBUTE_CLAUSES)
(OMP_SINGLE_CLAUSES): Add OMP_CLAUSE_ALLOCATE.
(OMP_TASKGROUP_CLAUSES): New.
(gfc_match_omp_taskgroup): Use OMP_TASKGROUP_CLAUSES instead of
OMP_CLAUSE_TASK_REDUCTION.
(resolve_omp_clauses): Handle OMP_LIST_ALLOCATE.
(resolve_omp_do): Avoid warning when loop iteration variable is
in allocate clause.
* trans-openmp.c (gfc_trans_omp_clauses): Handle translation of
allocate clause.
(gfc_split_omp_clauses): Update for OMP_LIST_ALLOCATE.
gcc/testsuite/ChangeLog:
* gfortran.dg/gomp/allocate-1.f90: New test.
* gfortran.dg/gomp/allocate-2.f90: New test.
* gfortran.dg/gomp/allocate-3.f90: New test.
* gfortran.dg/gomp/collapse1.f90: Update error message.
* gfortran.dg/gomp/openmp-simd-4.f90: Likewise.
* gfortran.dg/gomp/clauses-1.f90: Uncomment allocate clause.
libgomp/ChangeLog:
* testsuite/libgomp.fortran/allocate-1.c: New test.
* testsuite/libgomp.fortran/allocate-1.f90: New test.
* libgomp.texi: Remove string that says that allocate clause
support is for C/C++ only.
|
|
After recent commit be661959a6b6d8f9c3c8608a746789e7b2ec3ca4
"libgomp/testsuite: Improve omp_get_device_num() tests", we're now iterating
over all OpenMP target devices. Intel MIC (emulated) offloading still doesn't
properly implement device-side 'omp_get_device_num', and we thus regress:
PASS: libgomp.c/../libgomp.c-c++-common/target-45.c (test for excess errors)
[-PASS:-]{+FAIL:+} libgomp.c/../libgomp.c-c++-common/target-45.c execution test
PASS: libgomp.c++/../libgomp.c-c++-common/target-45.c (test for excess errors)
[-PASS:-]{+FAIL:+} libgomp.c++/../libgomp.c-c++-common/target-45.c execution test
PASS: libgomp.fortran/target10.f90 -O0 (test for excess errors)
[-PASS:-]{+FAIL:+} libgomp.fortran/target10.f90 -O0 execution test
PASS: libgomp.fortran/target10.f90 -O1 (test for excess errors)
[-PASS:-]{+FAIL:+} libgomp.fortran/target10.f90 -O1 execution test
PASS: libgomp.fortran/target10.f90 -O2 (test for excess errors)
[-PASS:-]{+FAIL:+} libgomp.fortran/target10.f90 -O2 execution test
PASS: libgomp.fortran/target10.f90 -O3 -fomit-frame-pointer -funroll-loops -fpeel-loops -ftracer -finline-functions (test for excess errors)
[-PASS:-]{+FAIL:+} libgomp.fortran/target10.f90 -O3 -fomit-frame-pointer -funroll-loops -fpeel-loops -ftracer -finline-functions execution test
PASS: libgomp.fortran/target10.f90 -O3 -g (test for excess errors)
[-PASS:-]{+FAIL:+} libgomp.fortran/target10.f90 -O3 -g execution test
PASS: libgomp.fortran/target10.f90 -Os (test for excess errors)
[-PASS:-]{+FAIL:+} libgomp.fortran/target10.f90 -Os execution test
Improve the XFAILing added in commit bb75b22aba254e8ff144db27b1c8b4804bad73bb
"Allow matching Intel MIC in OpenMP 'declare variant'" for the case that *any*
Intel MIC offload device is available.
libgomp/
* testsuite/libgomp.c-c++-common/on_device_arch.h
(any_device_arch, any_device_arch_intel_mic): New.
* testsuite/lib/libgomp.exp
(check_effective_target_offload_device_any_intel_mic): New.
* testsuite/libgomp.c-c++-common/target-45.c: Use it.
* testsuite/libgomp.fortran/target10.f90: Likewise.
|
|
'libgomp.oacc-fortran/routine-10.f90' [PR102192]
libgomp/
PR tree-optimization/102192
* testsuite/libgomp.oacc-fortran/routine-10.f90: Document current
'-Wuninitialized' diagnostics.
|
|
OpenACC test cases
... including "note: '[...]' was declared here" emitted since recent
commit 9695e1c23be5b5c55d572ced152897313ddb96ae
"Improve -Wuninitialized note location".
For those that seemed incorrect to me, I've placed XFAILed 'dg-bogus'es,
including one more instance of PR77504 etc., and several instances where
for "local variables" of reference-data-type reductions (etc.?) we emit
bogus (?) diagnostics.
For implicit data clauses (including 'firstprivate'), we seem to be missing
diagnostics, so I've placed XFAILed 'dg-warning's.
gcc/testsuite/
* c-c++-common/goacc/builtin-goacc-parlevel-id-size.c: Document
current '-Wuninitialized' diagnostics.
* c-c++-common/goacc/mdc-1.c: Likewise.
* c-c++-common/goacc/nested-reductions-1-kernels.c: Likewise.
* c-c++-common/goacc/nested-reductions-1-parallel.c: Likewise.
* c-c++-common/goacc/nested-reductions-1-routine.c: Likewise.
* c-c++-common/goacc/nested-reductions-2-kernels.c: Likewise.
* c-c++-common/goacc/nested-reductions-2-parallel.c: Likewise.
* c-c++-common/goacc/nested-reductions-2-routine.c: Likewise.
* c-c++-common/goacc/uninit-dim-clause.c: Likewise.
* c-c++-common/goacc/uninit-firstprivate-clause.c: Likewise.
* c-c++-common/goacc/uninit-if-clause.c: Likewise.
* gfortran.dg/goacc/array-with-dt-1.f90: Likewise.
* gfortran.dg/goacc/array-with-dt-2.f90: Likewise.
* gfortran.dg/goacc/array-with-dt-3.f90: Likewise.
* gfortran.dg/goacc/array-with-dt-4.f90: Likewise.
* gfortran.dg/goacc/array-with-dt-5.f90: Likewise.
* gfortran.dg/goacc/derived-chartypes-1.f90: Likewise.
* gfortran.dg/goacc/derived-chartypes-2.f90: Likewise.
* gfortran.dg/goacc/derived-chartypes-3.f90: Likewise.
* gfortran.dg/goacc/derived-chartypes-4.f90: Likewise.
* gfortran.dg/goacc/derived-classtypes-1.f95: Likewise.
* gfortran.dg/goacc/derived-types-2.f90: Likewise.
* gfortran.dg/goacc/host_data-tree.f95: Likewise.
* gfortran.dg/goacc/kernels-tree.f95: Likewise.
* gfortran.dg/goacc/modules.f95: Likewise.
* gfortran.dg/goacc/nested-reductions-1-kernels.f90: Likewise.
* gfortran.dg/goacc/nested-reductions-1-parallel.f90: Likewise.
* gfortran.dg/goacc/nested-reductions-1-routine.f90: Likewise.
* gfortran.dg/goacc/nested-reductions-2-kernels.f90: Likewise.
* gfortran.dg/goacc/nested-reductions-2-parallel.f90: Likewise.
* gfortran.dg/goacc/nested-reductions-2-routine.f90: Likewise.
* gfortran.dg/goacc/parallel-tree.f95: Likewise.
* gfortran.dg/goacc/pr93464.f90: Likewise.
* gfortran.dg/goacc/privatization-1-compute-loop.f90: Likewise.
* gfortran.dg/goacc/privatization-1-compute.f90: Likewise.
* gfortran.dg/goacc/privatization-1-routine_gang-loop.f90:
Likewise.
* gfortran.dg/goacc/privatization-1-routine_gang.f90: Likewise.
* gfortran.dg/goacc/uninit-dim-clause.f95: Likewise.
* gfortran.dg/goacc/uninit-firstprivate-clause.f95: Likewise.
* gfortran.dg/goacc/uninit-if-clause.f95: Likewise.
* gfortran.dg/goacc/uninit-use-device-clause.f95: Likewise.
* gfortran.dg/goacc/wait.f90: Likewise.
libgomp/
* testsuite/libgomp.oacc-c-c++-common/vred2d-128.c: Document
current '-Wuninitialized' diagnostics.
* testsuite/libgomp.oacc-fortran/data-5.f90: Likewise.
* testsuite/libgomp.oacc-fortran/gemm-2.f90: Likewise.
* testsuite/libgomp.oacc-fortran/gemm.f90: Likewise.
* testsuite/libgomp.oacc-fortran/optional-reduction.f90: Likewise.
* testsuite/libgomp.oacc-fortran/parallel-reduction.f90: Likewise.
* testsuite/libgomp.oacc-fortran/pr70643.f90: Likewise.
* testsuite/libgomp.oacc-fortran/pr96628-part1.f90: Likewise.
* testsuite/libgomp.oacc-fortran/privatized-ref-2.f90: Likewise.
* testsuite/libgomp.oacc-fortran/reduction-5.f90: Likewise.
* testsuite/libgomp.oacc-fortran/reduction-7.f90: Likewise.
* testsuite/libgomp.oacc-fortran/reference-reductions.f90:
Likewise.
|
|
gcc/
* tree-streamer-out.c (pack_ts_base_value_fields): Don't pack
'TYPE_ADDR_SPACE' for offloading.
* tree-streamer-in.c (unpack_ts_base_value_fields): Don't unpack
'TYPE_ADDR_SPACE' for offloading.
libgomp/
* testsuite/libgomp.c/address-space-1.c: Remove 'dg-xfail-run-if'
for 'offload_device_intel_mic'.
|
|
In OpenACC 'kernels' decomposition, we're improperly nesting synchronous and
asynchronous data and compute regions, giving rise to data races when the
asynchronicity is actually executed, as is visible in at least on test case
with GCN offloading.
The proper fix is to correctly use the asynchronous interfaces, making the
currently synchronous data regions fully asynchronous (see also
<https://gcc.gnu.org/PR97390> "[OpenACC] 'async' clause on 'data' construct",
which is to share the same implementation), but that's for later; for now add
some more synchronization.
gcc/
* omp-oacc-kernels-decompose.cc (add_wait): New function, split out
of...
(add_async_clauses_and_wait): ...here. Call new outlined function.
(decompose_kernels_region_body): Add wait at the end of
explicitly-asynchronous kernels regions.
libgomp/
* testsuite/libgomp.oacc-c-c++-common/f-asyncwait-1.c: Remove GCN
offloading execution XFAIL.
Co-Authored-By: Thomas Schwinge <thomas@codesourcery.com>
|
|
clauses as addressable [PR100280]
... as otherwise 'gcc/omp-low.c:lower_omp_target' has to create a temporary:
13073 else if (is_gimple_reg (var))
13074 {
13075 gcc_assert (offloaded);
13076 tree avar = create_tmp_var (TREE_TYPE (var));
13077 mark_addressable (avar);
..., which (a) is only implemented for actualy *offloaded* regions (but not
data regions), and (b) the subsequently synthesized code for writing to and
later reading back from the temporary fundamentally conflicts with OpenACC
'async' (as used by OpenACC 'kernels' decomposition). That's all not trivial
to make work, so let's just avoid this case.
gcc/
PR middle-end/100280
* omp-oacc-kernels-decompose.cc (maybe_build_inner_data_region):
Mark variables used in synthesized data clauses as addressable.
gcc/testsuite/
PR middle-end/100280
* c-c++-common/goacc/kernels-decompose-pr100280-1.c: New.
* c-c++-common/goacc/classify-kernels-parloops.c: Likewise.
* c-c++-common/goacc/classify-kernels-unparallelized-parloops.c:
Likewise.
* c-c++-common/goacc/classify-kernels-unparallelized.c: Test
'--param openacc-kernels=decompose'.
* c-c++-common/goacc/classify-kernels.c: Likewise.
* c-c++-common/goacc/kernels-decompose-2.c: Update.
* c-c++-common/goacc/kernels-decompose-ice-1.c: Remove.
* c-c++-common/goacc/kernels-decompose-ice-2.c: Likewise.
* gfortran.dg/goacc/classify-kernels-parloops.f95: New.
* gfortran.dg/goacc/classify-kernels-unparallelized-parloops.f95:
Likewise.
* gfortran.dg/goacc/classify-kernels-unparallelized.f95: Test
'--param openacc-kernels=decompose'.
* gfortran.dg/goacc/classify-kernels.f95: Likewise.
libgomp/
PR middle-end/100280
* testsuite/libgomp.oacc-c-c++-common/declare-vla-kernels-decompose-ice-1.c:
Update.
* testsuite/libgomp.oacc-c-c++-common/f-asyncwait-1.c: Likewise.
* testsuite/libgomp.oacc-c-c++-common/kernels-decompose-1.c:
Likewise.
Suggested-by: Julian Brown <julian@codesourcery.com>
|
|
gcc/testsuite/
* c-c++-common/goacc/kernels-decompose-1.c: Enhance.
* c-c++-common/goacc/kernels-decompose-2.c: Likewise.
* c-c++-common/goacc/kernels-decompose-ice-1.c: Likewise.
* c-c++-common/goacc/kernels-decompose-ice-2.c: Likewise.
* gfortran.dg/goacc/kernels-decompose-1.f95: Likewise.
* gfortran.dg/goacc/kernels-decompose-2.f95: Likewise.
libgomp/
* testsuite/libgomp.oacc-c-c++-common/declare-vla-kernels-decompose-ice-1.c:
Enhance.
* testsuite/libgomp.oacc-c-c++-common/declare-vla-kernels-decompose.c:
Likewise.
* testsuite/libgomp.oacc-c-c++-common/declare-vla.c: Likewise.
* testsuite/libgomp.oacc-c-c++-common/f-asyncwait-1.c: Likewise.
* testsuite/libgomp.oacc-c-c++-common/f-asyncwait-2.c: Likewise.
* testsuite/libgomp.oacc-c-c++-common/f-asyncwait-3.c: Likewise.
* testsuite/libgomp.oacc-c-c++-common/kernels-decompose-1.c:
Likewise.
* testsuite/libgomp.oacc-fortran/asyncwait-1.f90: Likewise.
* testsuite/libgomp.oacc-fortran/asyncwait-2.f90: Likewise.
* testsuite/libgomp.oacc-fortran/asyncwait-3.f90: Likewise.
* testsuite/libgomp.oacc-fortran/pr94358-1.f90: Likewise.
|
|
Related to r12-6208-gebc853deb7cc0487de9ef6e891a007ba853d1933
"libgomp: Fix GOMP_DEVICE_NUM_VAR stringification during offload image load"
That commit fixed an issue with omp_get_device_num() on gcn/nvptx that
resulted in having always the value 0.
This commit modifies the tests to iterate over all devices such that on a
multi-nonhost-device system it had detected that always-zero issue.
libgomp/ChangeLog:
* testsuite/libgomp.c-c++-common/target-45.c: Iterate over all devices.
* testsuite/libgomp.fortran/target10.f90: Likewise.
|
|
|
|
gcc/fortran/ChangeLog:
PR fortran/103576
* openmp.c (is_scalar_intrinsic_expr): Fix condition.
(resolve_omp_atomic): Fix/update checks, accept compare.
* trans-openmp.c (gfc_trans_omp_atomic): Handle compare.
libgomp/ChangeLog:
* libgomp.texi (OpenMP 5.1): Set Fortran support for atomic to 'Y'.
* testsuite/libgomp.fortran/atomic-19.f90: New test.
gcc/testsuite/ChangeLog:
* gfortran.dg/gomp/atomic-25.f90: Remove sorry, fix + add checks.
* gfortran.dg/gomp/atomic-26.f90: Likewise.
* gfortran.dg/gomp/atomic-21.f90: New test.
|
|
Some testcases for libgomp.c++ only works for non-shared address space offloading,
because it exercises the zero-length array section behavior for offloaded
address space, testing for NULL/non-NULL cases.
libgomp/ChangeLog:
* testsuite/libgomp.c++/target-lambda-1.C: Only run under
"target offload_device_nonshared_as"
* testsuite/libgomp.c++/target-this-3.C: Likewise.
* testsuite/libgomp.c++/target-this-4.C: Likewise.
|
|
front-end adjustments
This patch implements three pieces of functionality:
(1) Adjust array section mapping to have standards conforming behavior,
mapping array sections should *NOT* also map the base-pointer:
struct S { int *ptr; ... };
struct S s;
Instead of generating this during gimplify:
map(to:*_1 [len: 400]) map(attach:s.ptr [bias: 0])
Now, adjust to:
(i.e. do not map the base-pointer together. The attach operation is still
generated, and if s.ptr is already mapped prior, attachment will happen)
The correct way of achieving the base-pointer-also-mapped behavior would be to
use:
(A small Fortran front-end patch to trans-openmp.c:gfc_trans_omp_array_section
is also included, which removes generation of a GOMP_MAP_ALWAYS_POINTER for
array types, which appears incorrect and causes a regression in
libgomp.fortranlibgomp.fortran/struct-elem-map-1.f90)
(2) Related to the first item above, are fixes in libgomp/target.c to not
overwrite attached pointers when handling device<->host copies, mainly for the
"always" case.
(3) The third is a set of changes to the C/C++ front-ends to extend the allowed
component access syntax in map clauses. These changes are enabled for both
OpenACC and OpenMP.
gcc/c/ChangeLog:
* c-parser.c (struct omp_dim): New struct type for use inside
c_parser_omp_variable_list.
(c_parser_omp_variable_list): Allow multiple levels of array and
component accesses in array section base-pointer expression.
(c_parser_omp_clause_to): Set 'allow_deref' to true in call to
c_parser_omp_var_list_parens.
(c_parser_omp_clause_from): Likewise.
* c-typeck.c (handle_omp_array_sections_1): Extend allowed range
of base-pointer expressions involving INDIRECT/MEM/ARRAY_REF and
POINTER_PLUS_EXPR.
(c_finish_omp_clauses): Extend allowed ranged of expressions
involving INDIRECT/MEM/ARRAY_REF and POINTER_PLUS_EXPR.
gcc/cp/ChangeLog:
* parser.c (struct omp_dim): New struct type for use inside
cp_parser_omp_var_list_no_open.
(cp_parser_omp_var_list_no_open): Allow multiple levels of array and
component accesses in array section base-pointer expression.
(cp_parser_omp_all_clauses): Set 'allow_deref' to true in call to
cp_parser_omp_var_list for to/from clauses.
* semantics.c (handle_omp_array_sections_1): Extend allowed range
of base-pointer expressions involving INDIRECT/MEM/ARRAY_REF and
POINTER_PLUS_EXPR.
(handle_omp_array_sections): Adjust pointer map generation of
references.
(finish_omp_clauses): Extend allowed ranged of expressions
involving INDIRECT/MEM/ARRAY_REF and POINTER_PLUS_EXPR.
gcc/fortran/ChangeLog:
* trans-openmp.c (gfc_trans_omp_array_section): Do not generate
GOMP_MAP_ALWAYS_POINTER map for main array maps of ARRAY_TYPE type.
gcc/ChangeLog:
* gimplify.c (extract_base_bit_offset): Add 'tree *offsetp' parameter,
accomodate case where 'offset' return of get_inner_reference is
non-NULL.
(is_or_contains_p): Further robustify conditions.
(omp_target_reorder_clauses): In alloc/to/from sorting phase, also
move following GOMP_MAP_ALWAYS_POINTER maps along. Add new sorting
phase where we make sure pointers with an attach/detach map are ordered
correctly.
(gimplify_scan_omp_clauses): Add modifications to avoid creating
GOMP_MAP_STRUCT and associated alloc map for attach/detach maps.
gcc/testsuite/ChangeLog:
* c-c++-common/goacc/deep-copy-arrayofstruct.c: Adjust testcase.
* c-c++-common/gomp/target-enter-data-1.c: New testcase.
* c-c++-common/gomp/target-implicit-map-2.c: New testcase.
libgomp/ChangeLog:
* target.c (gomp_map_vars_existing): Make sure attached pointer is
not overwritten during cross-host/device copying.
(gomp_update): Likewise.
(gomp_exit_data): Likewise.
* testsuite/libgomp.c++/target-11.C: Adjust testcase.
* testsuite/libgomp.c++/target-12.C: Likewise.
* testsuite/libgomp.c++/target-15.C: Likewise.
* testsuite/libgomp.c++/target-16.C: Likewise.
* testsuite/libgomp.c++/target-17.C: Likewise.
* testsuite/libgomp.c++/target-21.C: Likewise.
* testsuite/libgomp.c++/target-23.C: Likewise.
* testsuite/libgomp.c/target-23.c: Likewise.
* testsuite/libgomp.c/target-29.c: Likewise.
* testsuite/libgomp.c-c++-common/target-implicit-map-2.c: New testcase.
|
|
This patch implements several C++ specific mapping capabilities introduced for
OpenMP 5.0, including implicit mapping of this[:1] for non-static member
functions, zero-length array section mapping of pointer-typed members,
lambda captured variable access in target regions, and use of lambda objects
inside target regions.
Several adjustments to the C/C++ front-ends to allow more member-access syntax
as valid is also included.
PR middle-end/92120
gcc/cp/ChangeLog:
* cp-tree.h (finish_omp_target): New declaration.
(finish_omp_target_clauses): Likewise.
* parser.c (cp_parser_omp_clause_map): Adjust call to
cp_parser_omp_var_list_no_open to set 'allow_deref' argument to true.
(cp_parser_omp_target): Factor out code, adjust into calls to new
function finish_omp_target.
* pt.c (tsubst_expr): Add call to finish_omp_target_clauses for
OMP_TARGET case.
* semantics.c (handle_omp_array_sections_1): Add handling to create
'this->member' from 'member' FIELD_DECL. Remove case of rejecting
'this' when not in declare simd.
(handle_omp_array_sections): Likewise.
(finish_omp_clauses): Likewise. Adjust to allow 'this[]' in OpenMP
map clauses. Handle 'A->member' case in map clauses. Remove case of
rejecting 'this' when not in declare simd.
(struct omp_target_walk_data): New struct for walking over
target-directive tree body.
(finish_omp_target_clauses_r): New function for tree walk.
(finish_omp_target_clauses): New function.
(finish_omp_target): New function.
gcc/c/ChangeLog:
* c-parser.c (c_parser_omp_clause_map): Set 'allow_deref' argument in
call to c_parser_omp_variable_list to 'true'.
* c-typeck.c (handle_omp_array_sections_1): Add strip of MEM_REF in
array base handling.
(c_finish_omp_clauses): Handle 'A->member' case in map clauses.
gcc/ChangeLog:
* gimplify.c ("tree-hash-traits.h"): Add include.
(gimplify_scan_omp_clauses): Change struct_map_to_clause to type
hash_map<tree_operand, tree> *. Adjust struct map handling to handle
cases of *A and A->B expressions. Under !DECL_P case of
GOMP_CLAUSE_MAP handling, add STRIP_NOPS for indir_p case, add to
struct_deref_set for map(*ptr_to_struct) cases. Add MEM_REF case when
handling component_ref_p case. Add unshare_expr and gimplification
when created GOMP_MAP_STRUCT is not a DECL. Add code to add
firstprivate pointer for *pointer-to-struct case.
(gimplify_adjust_omp_clauses): Move GOMP_MAP_STRUCT removal code for
exit data directives code to earlier position.
* omp-low.c (lower_omp_target):
Handle GOMP_MAP_ATTACH_ZERO_LENGTH_ARRAY_SECTION, and
GOMP_MAP_POINTER_TO_ZERO_LENGTH_ARRAY_SECTION map kinds.
* tree-pretty-print.c (dump_omp_clause): Likewise.
gcc/testsuite/ChangeLog:
* gcc.dg/gomp/target-3.c: New testcase.
* g++.dg/gomp/target-3.C: New testcase.
* g++.dg/gomp/target-lambda-1.C: New testcase.
* g++.dg/gomp/target-lambda-2.C: New testcase.
* g++.dg/gomp/target-this-1.C: New testcase.
* g++.dg/gomp/target-this-2.C: New testcase.
* g++.dg/gomp/target-this-3.C: New testcase.
* g++.dg/gomp/target-this-4.C: New testcase.
* g++.dg/gomp/target-this-5.C: New testcase.
* g++.dg/gomp/this-2.C: Adjust testcase.
include/ChangeLog:
* gomp-constants.h (enum gomp_map_kind):
Add GOMP_MAP_ATTACH_ZERO_LENGTH_ARRAY_SECTION, and
GOMP_MAP_POINTER_TO_ZERO_LENGTH_ARRAY_SECTION map kinds.
(GOMP_MAP_POINTER_P):
Include GOMP_MAP_POINTER_TO_ZERO_LENGTH_ARRAY_SECTION.
libgomp/ChangeLog:
* libgomp.h (gomp_attach_pointer): Add bool parameter.
* oacc-mem.c (acc_attach_async): Update call to gomp_attach_pointer.
(goacc_enter_data_internal): Likewise.
* target.c (gomp_map_vars_existing): Update assert condition to
include GOMP_MAP_ATTACH_ZERO_LENGTH_ARRAY_SECTION.
(gomp_map_pointer): Add 'bool allow_zero_length_array_sections'
parameter, add support for mapping a pointer with NULL target.
(gomp_attach_pointer): Add 'bool allow_zero_length_array_sections'
parameter, add support for attaching a pointer with NULL target.
(gomp_map_vars_internal): Update calls to gomp_map_pointer and
gomp_attach_pointer, add handling for
GOMP_MAP_ATTACH_ZERO_LENGTH_ARRAY_SECTION, and
GOMP_MAP_POINTER_TO_ZERO_LENGTH_ARRAY_SECTION cases.
* testsuite/libgomp.c++/target-23.C: New testcase.
* testsuite/libgomp.c++/target-lambda-1.C: New testcase.
* testsuite/libgomp.c++/target-lambda-2.C: New testcase.
* testsuite/libgomp.c++/target-this-1.C: New testcase.
* testsuite/libgomp.c++/target-this-2.C: New testcase.
* testsuite/libgomp.c++/target-this-3.C: New testcase.
* testsuite/libgomp.c++/target-this-4.C: New testcase.
* testsuite/libgomp.c++/target-this-5.C: New testcase.
|
|
Fix issue with the Fortran front-end when mapping arrays: when creating the
data MEM_REF for the map clause, there was a convention of casting the
referencing pointer to 'c_char *' by
fold_convert (build_pointer_type (char_type_node), ptr).
This causes the alignment passed to the libgomp runtime for array data
hardwared to '1', and causes alignment errors on the offload target.
This patch fixes this by removing the char_type_node pointer converts, and
adding gcc_asserts to ensure POINTER_TYPE_P (TREE_TYPE (ptr)).
PR fortran/90030
gcc/fortran/ChangeLog:
* trans-openmp.c (gfc_omp_finish_clause): Remove fold_convert to pointer
to char_type_node, add gcc_assert of POINTER_TYPE_P.
(gfc_trans_omp_array_section): Likewise.
(gfc_trans_omp_clauses): Likewise.
gcc/testsuite/ChangeLog:
* gfortran.dg/goacc/finalize-1.f: Adjust scan test.
* gfortran.dg/gomp/affinity-clause-1.f90: Likewise.
* gfortran.dg/gomp/affinity-clause-5.f90: Likewise.
* gfortran.dg/gomp/defaultmap-4.f90: Likewise.
* gfortran.dg/gomp/defaultmap-5.f90: Likewise.
* gfortran.dg/gomp/defaultmap-6.f90: Likewise.
* gfortran.dg/gomp/map-3.f90: Likewise.
* gfortran.dg/gomp/pr78260-2.f90: Likewise.
* gfortran.dg/gomp/pr78260-3.f90: Likewise.
libgomp/ChangeLog:
* testsuite/libgomp.oacc-fortran/pr90030.f90: New test.
* testsuite/libgomp.fortran/pr90030.f90: New test.
|
|
... fixing a regression introduced in the preceding
commit 2b7dac2c0dcb087da9e4018943c023c0678234a3
"Make OpenACC orphan gang reductions errors".
gcc/fortran/
* openmp.c (oacc_is_serial, oacc_is_parallel_or_serial): New.
(resolve_oacc_loop_blocks): Use oacc_is_parallel_or_serial instead of
oacc_is_parallel.
libgomp/
* testsuite/libgomp.oacc-fortran/parallel-dims.f90: Remove
temporary skip.
Co-Authored-By: Thomas Schwinge <thomas@codesourcery.com>
|