Age | Commit message (Collapse) | Author | Files | Lines |
|
For an OpenACC compute construct, we've currently got:
- [...]
- acc_ev_enqueue_launch_start
- launch kernel
- free memory
- acc_ev_free
- acc_ev_enqueue_launch_end
This confused another thing that I'm working on, so I adjusted that to:
- [...]
- acc_ev_enqueue_launch_start
- launch kernel
- acc_ev_enqueue_launch_end
- free memory
- acc_ev_free
Correspondingly, verify 'acc_ev_alloc', 'acc_ev_free' in
'libgomp.oacc-c-c++-common/acc_prof-parallel-1.c'.
libgomp/
* plugin/plugin-gcn.c (gcn_exec): Fix 'acc_ev_enqueue_launch_end'
position.
* testsuite/libgomp.oacc-c-c++-common/acc_prof-parallel-1.c:
Verify 'acc_ev_alloc', 'acc_ev_free'.
|
|
|
|
When OMP_WAIT_POLICY is not specified, current implementation will cause
icv flag GOMP_ICV_WAIT_POLICY unset, so global variable wait_policy
will remain its uninitialized value. Initialize it to -1 to make
GOMP_SPINCOUNT behavior consistent with its description.
libgomp/ChangeLog:
PR libgomp/109062
* env.c (wait_policy): Initialize to -1.
(initialize_icvs): Initialize icvs->wait_policy to -1.
* testsuite/libgomp.c-c++-common/pr109062.c: New test.
|
|
|
|
libgomp/ChangeLog:
* libgomp.texi (Offload-Target Specifics): Mention GCN_STACK_SIZE.
|
|
|
|
Calls to vectorized versions of routines in the math library will now
be inserted when vectorizing code containing supported math functions.
2023-03-02 Kwok Cheung Yeung <kcy@codesourcery.com>
Paul-Antoine Arras <pa@codesourcery.com>
gcc/
* builtins.cc (mathfn_built_in_explicit): New.
* config/gcn/gcn.cc: Include case-cfn-macros.h.
(mathfn_built_in_explicit): Add prototype.
(gcn_vectorize_builtin_vectorized_function): New.
(gcn_libc_has_function): New.
(TARGET_LIBC_HAS_FUNCTION): Define.
(TARGET_VECTORIZE_BUILTIN_VECTORIZED_FUNCTION): Define.
gcc/testsuite/
* gcc.target/gcn/simd-math-1.c: New testcase.
* gcc.target/gcn/simd-math-2.c: New testcase.
libgomp/
* testsuite/libgomp.c/simd-math-1.c: New testcase.
|
|
|
|
For is_device_ptr, optional checks should only be done before calling
libgomp, afterwards they are NULL either because of absent or, by
chance, because it is unallocated or unassociated (for pointers/allocatables).
Additionally, it fixes an issue with explicit mapping for 'type(c_ptr)'.
PR middle-end/108546
gcc/fortran/ChangeLog:
* trans-openmp.cc (gfc_trans_omp_clauses): Fix mapping of
type(C_ptr) variables.
gcc/ChangeLog:
* omp-low.cc (lower_omp_target): Remove optional handling
on the receiver side, i.e. inside target (data), for
use_device_ptr.
libgomp/ChangeLog:
* testsuite/libgomp.fortran/is_device_ptr-3.f90: New test.
* testsuite/libgomp.fortran/use_device_ptr-optional-4.f90: New test.
|
|
|
|
'gcc/testsuite/lib/target-supports.exp:check_compile' and elsewhere
I noticed that GCC/Rust recently lost all LTO variants in torture testing:
PASS: rust/compile/torture/all_doc_comment_line_blocks.rs -O0 (test for excess errors)
PASS: rust/compile/torture/all_doc_comment_line_blocks.rs -O1 (test for excess errors)
PASS: rust/compile/torture/all_doc_comment_line_blocks.rs -O2 (test for excess errors)
-PASS: rust/compile/torture/all_doc_comment_line_blocks.rs -O2 -flto -fno-use-linker-plugin -flto-partition=none (test for excess errors)
-PASS: rust/compile/torture/all_doc_comment_line_blocks.rs -O2 -flto -fuse-linker-plugin -fno-fat-lto-objects (test for excess errors)
PASS: rust/compile/torture/all_doc_comment_line_blocks.rs -O3 -g (test for excess errors)
PASS: rust/compile/torture/all_doc_comment_line_blocks.rs -Os (test for excess errors)
Etc.
The reason is that when probing for availability of LTO, we run into:
spawn [...]/build-gcc/gcc/testsuite/rust/../../gccrs -B[...]/build-gcc/gcc/testsuite/rust/../../ -fdiagnostics-plain-output -frust-incomplete-and-experimental-compiler-do-not-use -flto -c -o lto8274.o lto8274.c
cc1: warning: command-line option '-frust-incomplete-and-experimental-compiler-do-not-use' is valid for Rust but not for C
For GCC/Rust testing, this flag is (as of recently) defaulted in
'gcc/testsuite/lib/rust.exp:rust_init':
lappend ALWAYS_RUSTFLAGS "additional_flags=-frust-incomplete-and-experimental-compiler-do-not-use"
A few more "command-line option [...] is valid for [...] but not for [...]"
instances were found in the test suite logs, when more than one language is
involved.
With '-Wno-complain-wrong-lang' used in
'gcc/testsuite/lib/target-supports.exp:check_compile', we get back:
PASS: rust/compile/torture/all_doc_comment_line_blocks.rs -O0 (test for excess errors)
PASS: rust/compile/torture/all_doc_comment_line_blocks.rs -O1 (test for excess errors)
PASS: rust/compile/torture/all_doc_comment_line_blocks.rs -O2 (test for excess errors)
+PASS: rust/compile/torture/all_doc_comment_line_blocks.rs -O2 -flto -fno-use-linker-plugin -flto-partition=none (test for excess errors)
+PASS: rust/compile/torture/all_doc_comment_line_blocks.rs -O2 -flto -fuse-linker-plugin -fno-fat-lto-objects (test for excess errors)
PASS: rust/compile/torture/all_doc_comment_line_blocks.rs -O3 -g (test for excess errors)
PASS: rust/compile/torture/all_doc_comment_line_blocks.rs -Os (test for excess errors)
Etc., and in total:
=== rust Summary for unix ===
# of expected passes [-4990-]{+6718+}
# of expected failures [-39-]{+51+}
Anything that 'gcc/opts-global.cc:complain_wrong_lang' might do is cut
short by '-Wno-complain-wrong-lang', not just the one 'warning'
diagnostic. This corresponds to what already exists via
'lang_hooks.complain_wrong_lang_p'.
The 'gcc/opts-common.cc:prune_options' changes follow the same rationale
as PR67640 "driver passes -fdiagnostics-color= always last": we need to
process '-Wno-complain-wrong-lang' early, so that it properly affects
other options appearing before it on the command line.
gcc/
* common.opt (-Wcomplain-wrong-lang): New.
* doc/invoke.texi (-Wno-complain-wrong-lang): Document it.
* opts-common.cc (prune_options): Handle it.
* opts-global.cc (complain_wrong_lang): Use it.
gcc/testsuite/
* gcc.dg/Wcomplain-wrong-lang-1.c: New.
* gcc.dg/Wcomplain-wrong-lang-2.c: Likewise.
* gcc.dg/Wcomplain-wrong-lang-3.c: Likewise.
* gcc.dg/Wcomplain-wrong-lang-4.c: Likewise.
* gcc.dg/Wcomplain-wrong-lang-5.c: Likewise.
* lib/target-supports.exp (check_compile): Use
'-Wno-complain-wrong-lang'.
* g++.dg/abi/empty12.C: Likewise.
* g++.dg/abi/empty13.C: Likewise.
* g++.dg/abi/empty14.C: Likewise.
* g++.dg/abi/empty15.C: Likewise.
* g++.dg/abi/empty16.C: Likewise.
* g++.dg/abi/empty17.C: Likewise.
* g++.dg/abi/empty18.C: Likewise.
* g++.dg/abi/empty19.C: Likewise.
* g++.dg/abi/empty22.C: Likewise.
* g++.dg/abi/empty25.C: Likewise.
* g++.dg/abi/empty26.C: Likewise.
* gfortran.dg/bind-c-contiguous-1.f90: Likewise.
* gfortran.dg/bind-c-contiguous-4.f90: Likewise.
* gfortran.dg/bind-c-contiguous-5.f90: Likewise.
libgomp/
* testsuite/libgomp.fortran/alloc-10.f90: Use
'-Wno-complain-wrong-lang'.
* testsuite/libgomp.fortran/alloc-11.f90: Likewise.
* testsuite/libgomp.fortran/alloc-7.f90: Likewise.
* testsuite/libgomp.fortran/alloc-9.f90: Likewise.
* testsuite/libgomp.fortran/allocate-1.f90: Likewise.
* testsuite/libgomp.fortran/depend-4.f90: Likewise.
* testsuite/libgomp.fortran/depend-5.f90: Likewise.
* testsuite/libgomp.fortran/depend-6.f90: Likewise.
* testsuite/libgomp.fortran/depend-7.f90: Likewise.
* testsuite/libgomp.fortran/depend-inoutset-1.f90: Likewise.
* testsuite/libgomp.fortran/examples-4/declare_target-1.f90:
Likewise.
* testsuite/libgomp.fortran/examples-4/declare_target-2.f90:
Likewise.
* testsuite/libgomp.fortran/order-reproducible-1.f90: Likewise.
* testsuite/libgomp.fortran/order-reproducible-2.f90: Likewise.
* testsuite/libgomp.oacc-fortran/parallel-dims.f90: Likewise.
* testsuite/libgomp.fortran/task-detach-6.f90: Remove left-over
'dg-prune-output'.
|
|
|
|
I decided to check for repeated the the in libgomp and noticed
there are several occurrences of a typo theads rather than threads
in libgomp.texi.
2023-02-16 Jakub Jelinek <jakub@redhat.com>
* libgomp.texi: Fix typos - theads -> threads.
|
|
I saw
FAIL: libgomp.fortran/target-nowait-array-section.f90 -O execution test
in my last x86_64-linux bootstrap. From quick skimming, it might be just
unreliable test, which assumes that asynchronous execution wouldn't produce
ordered sequence, but can't it happen even with asynchronous execution?
That said, while skimming the test, I've noticed a comment typo and
this patch fixes that up.
2023-02-16 Jakub Jelinek <jakub@redhat.com>
* testsuite/libgomp.fortran/target-nowait-array-section.f90: Fix
comment typo and improve its wording.
|
|
|
|
libgomp/
* target.c (gomp_target_rev): Dereference ptr
to get device address.
* testsuite/libgomp.fortran/reverse-offload-5.f90: Add test
for unallocated allocatable.
|
|
As GOMP_MAP_ALWAYS_POINTER operates on the previous map item, ensure that
with 'target enter data' both are passed together to gomp_map_vars_internal.
libgomp/ChangeLog:
* target.c (gomp_map_vars_internal): Add 'i > 0' before doing a
kind check.
(GOMP_target_enter_exit_data): If the next map item is
GOMP_MAP_ALWAYS_POINTER map it together with the current item.
* testsuite/libgomp.fortran/target-enter-data-3.f90: New test.
|
|
|
|
This patch ensures that loop bounds depending on outer loop vars use the
proper TREE_VEC format. It additionally gives a sorry if such an outer
var has a non-one/non-minus-one increment as currently a count variable
is used in this case (see PR).
Finally, it avoids 'count' and just uses a local loop variable if the
step increment is +/-1.
PR fortran/107424
gcc/fortran/ChangeLog:
* trans-openmp.cc (struct dovar_init_d): Add 'sym' and
'non_unit_incr' members.
(gfc_nonrect_loop_expr): New.
(gfc_trans_omp_do): Call it; use normal loop bounds
for unit stride - and only create local loop var.
libgomp/ChangeLog:
* testsuite/libgomp.fortran/non-rectangular-loop-1.f90: New test.
* testsuite/libgomp.fortran/non-rectangular-loop-1a.f90: New test.
* testsuite/libgomp.fortran/non-rectangular-loop-2.f90: New test.
* testsuite/libgomp.fortran/non-rectangular-loop-3.f90: New test.
* testsuite/libgomp.fortran/non-rectangular-loop-4.f90: New test.
* testsuite/libgomp.fortran/non-rectangular-loop-5.f90: New test.
gcc/testsuite/ChangeLog:
* gfortran.dg/goacc/privatization-1-compute-loop.f90: Update dg-note.
* gfortran.dg/goacc/privatization-1-routine_gang-loop.f90: Likewise.
|
|
|
|
Fix-up for recent commit 0b1ce70a813b98ef2893779d14ad6c90c5d06a71
"libgomp: Fix reverse offload issues".
libgomp/
* testsuite/libgomp.fortran/reverse-offload-6.f90: Fix nvptx
offloading compilation.
|
|
|
|
If there is nothing to map, skip the mapping and avoid attempting to
copy 0 bytes from addrs, sizes and kinds.
Additionally, it could happen that a non-allocated address was deallocated,
such as a pointer set, leading to a free for the actual data.
libgomp/
* target.c (gomp_target_rev): Handle mapnum == 0 and avoid
freeing not allocated memory.
* testsuite/libgomp.fortran/reverse-offload-6.f90: New test.
|
|
libgomp/ChangeLog:
* libgomp.texi (5.0 Impl. Status, gcn specifics): Update for
reverse offload.
* plugin/plugin-gcn.c (GOMP_OFFLOAD_get_num_devices): Accept
reverse-offload requirement.
|
|
|
|
Switch from using stacks in the "private segment" to using a memory block
allocated on the host side. The primary reason is to permit the reverse
offload implementation to access values located on the device stack, but
there may also be performance benefits, especially with repeated kernel
invocations.
This implementation unifies the stacks with the "team arena" optimization
feature, and now allows both to have run-time configurable sizes.
A new ABI is needed, so all libraries must be rebuilt, and newlib must be
version 4.3.0.20230120 or newer.
gcc/ChangeLog:
* config/gcn/gcn-run.cc: Include libgomp-gcn.h.
(struct kernargs): Replace the common content with kernargs_abi.
(struct heap): Delete.
(main): Read GCN_STACK_SIZE envvar.
Allocate space for the device stacks.
Write the new kernargs fields.
* config/gcn/gcn.cc (gcn_option_override): Remove stack_size_opt.
(default_requested_args): Remove PRIVATE_SEGMENT_BUFFER_ARG and
PRIVATE_SEGMENT_WAVE_OFFSET_ARG.
(gcn_addr_space_convert): Mask the QUEUE_PTR_ARG content.
(gcn_expand_prologue): Move the TARGET_PACKED_WORK_ITEMS to the top.
Set up the stacks from the values in the kernargs, not private.
(gcn_expand_builtin_1): Match the stack configuration in the prologue.
(gcn_hsa_declare_function_name): Turn off the private segment.
(gcn_conditional_register_usage): Ensure QUEUE_PTR is fixed.
* config/gcn/gcn.h (FIXED_REGISTERS): Fix the QUEUE_PTR register.
* config/gcn/gcn.opt (mstack-size): Change the description.
include/ChangeLog:
* gomp-constants.h (GOMP_VERSION_GCN): Bump.
libgomp/ChangeLog:
* config/gcn/libgomp-gcn.h (DEFAULT_GCN_STACK_SIZE): New define.
(DEFAULT_TEAM_ARENA_SIZE): New define.
(struct heap): Move to this file.
(struct kernargs_abi): Likewise.
* config/gcn/team.c (gomp_gcn_enter_kernel): Use team arena size from
the kernargs.
* libgomp.h: Include libgomp-gcn.h.
(TEAM_ARENA_SIZE): Remove.
(team_malloc): Update the error message.
* plugin/plugin-gcn.c (struct kernargs): Move common content to
struct kernargs_abi.
(struct agent_info): Rename team arenas to ephemeral memories.
(struct team_arena_list): Rename ....
(struct ephemeral_memories_list): to this.
(struct heap): Delete.
(team_arena_size): New variable.
(stack_size): New variable.
(print_kernel_dispatch): Update debug messages.
(init_environment_variables): Read GCN_TEAM_ARENA_SIZE.
Read GCN_STACK_SIZE.
(get_team_arena): Rename ...
(configure_ephemeral_memories): ... to this, and set up stacks.
(release_team_arena): Rename ...
(release_ephemeral_memories): ... to this.
(destroy_team_arenas): Rename ...
(destroy_ephemeral_memories): ... to this.
(create_kernel_dispatch): Add num_threads parameter.
Adjust for kernargs_abi refactor and ephemeral memories.
(release_kernel_dispatch): Adjust for ephemeral memories.
(run_kernel): Pass thread-count to create_kernel_dispatch.
(GOMP_OFFLOAD_init_device): Adjust for ephemeral memories.
(GOMP_OFFLOAD_fini_device): Adjust for ephemeral memories.
gcc/testsuite/ChangeLog:
* gcc.c-torture/execute/pr47237.c: Xfail on amdgcn.
* gcc.dg/builtin-apply3.c: Xfail for amdgcn.
* gcc.dg/builtin-apply4.c: Xfail for amdgcn.
* gcc.dg/torture/stackalign/builtin-apply-3.c: Xfail for amdgcn.
* gcc.dg/torture/stackalign/builtin-apply-4.c: Xfail for amdgcn.
|
|
Fix the 'strict' modifier status: it is already listed (as 'Y') for OpenMP
5.1 for num_task and grainsize; only strict on num_threads is new with TR11.
libgomp/
* libgomp.texi (OpenMP TR11): Fix item for 'strict' modifier.
|
|
|
|
gcc/fortran/ChangeLog:
* openmp.cc (resolve_omp_clauses): Check also for
power of two.
libgomp/ChangeLog:
* testsuite/libgomp.fortran/allocate-3.f90: Fix ALIGN
usage, remove unused -fdump-tree-original.
* testsuite/libgomp.fortran/allocate-4.f90: New.
|
|
libgomp/
* libgomp.texi (5.0 Impl. Status): Update 'requires' and 'ancestor'.
(GCN): Add item about 'omp requires'.
(nvptx): Likewise; add item about reverse offload.
|
|
|
|
gcc/fortran/ChangeLog:
PR fortran/108558
* trans-openmp.cc (gfc_split_omp_clauses): Handle has_device_addr.
libgomp/ChangeLog:
PR fortran/108558
* testsuite/libgomp.fortran/has_device_addr.f90: New test.
|
|
|
|
libgomp/
* libgomp.texi (OpenMP 5.0): Set non-rectangular
loop nest back to 'P' as Fortran support is incomplete.
|
|
|
|
expand_omp_for_init_counts was using for the case where collapse(2)
inner loop has init expression dependent on non-constant multiple of
the outer iterator and the condition upper bound expression doesn't
depend on the outer iterator fold_unary (NEGATE_EXPR, ...). This
will just return NULL if it can't be folded, we need fold_build1
instead.
2023-01-19 Jakub Jelinek <jakub@redhat.com>
PR middle-end/108459
* omp-expand.cc (expand_omp_for_init_counts): Use fold_build1 rather
than fold_unary for NEGATE_EXPR.
* testsuite/libgomp.c/pr108459.c: New test.
|
|
|
|
libbacktrace/ChangeLog:
* Makefile.in: Regenerate.
libgomp/ChangeLog:
* Makefile.in: Regenerate.
* configure: Regenerate.
libphobos/ChangeLog:
* Makefile.in: Regenerate.
* libdruntime/Makefile.in: Regenerate.
libstdc++-v3/ChangeLog:
* src/libbacktrace/Makefile.in: Regenerate.
|
|
|
|
|
|
Recently, mingw-w64 has got updated <msxml.h> from Wine which is included
indirectly by <windows.h> if `WIN32_LEAN_AND_MEAN` is not defined. The
`IXMLDOMDocument` class has a member function named `abort()`, which gets
affected by our `abort()` macro in "system.h".
`WIN32_LEAN_AND_MEAN` should, nevertheless, always be defined. This
can exclude 'APIs such as Cryptography, DDE, RPC, Shell, and Windows
Sockets' [1], and speed up compilation of these files a bit.
[1] https://learn.microsoft.com/en-us/windows/win32/winprog/using-the-windows-headers
gcc/
PR middle-end/108300
* config/xtensa/xtensa-dynconfig.c: Define `WIN32_LEAN_AND_MEAN`
before <windows.h>.
* diagnostic-color.cc: Likewise.
* plugin.cc: Likewise.
* prefix.cc: Likewise.
gcc/ada/
PR middle-end/108300
* adaint.c: Define `WIN32_LEAN_AND_MEAN` before `#include
<windows.h>`.
* cio.c: Likewise.
* ctrl_c.c: Likewise.
* expect.c: Likewise.
* gsocket.h: Likewise.
* mingw32.h: Likewise.
* mkdir.c: Likewise.
* rtfinal.c: Likewise.
* rtinit.c: Likewise.
* seh_init.c: Likewise.
* sysdep.c: Likewise.
* terminals.c: Likewise.
* tracebak.c: Likewise.
gcc/jit/
PR middle-end/108300
* jit-w32.h: Define `WIN32_LEAN_AND_MEAN` before <windows.h>.
libatomic/
PR middle-end/108300
* config/mingw/lock.c: Define `WIN32_LEAN_AND_MEAN` before
<windows.h>.
libffi/
PR middle-end/108300
* src/aarch64/ffi.c: Define `WIN32_LEAN_AND_MEAN` before
<windows.h>.
libgcc/
PR middle-end/108300
* config/i386/enable-execute-stack-mingw32.c: Define
`WIN32_LEAN_AND_MEAN` before <windows.h>.
* libgcc2.c: Likewise.
* unwind-generic.h: Likewise.
libgfortran/
PR middle-end/108300
* intrinsics/sleep.c: Define `WIN32_LEAN_AND_MEAN` before
<windows.h>.
libgomp/
PR middle-end/108300
* config/mingw32/proc.c: Define `WIN32_LEAN_AND_MEAN` before
<windows.h>.
libiberty/
PR middle-end/108300
* make-temp-file.c: Define `WIN32_LEAN_AND_MEAN` before <windows.h>.
* pex-win32.c: Likewise.
libssp/
PR middle-end/108300
* ssp.c: Define `WIN32_LEAN_AND_MEAN` before <windows.h>.
libstdc++-v3/
PR middle-end/108300
* src/c++11/system_error.cc: Define `WIN32_LEAN_AND_MEAN` before
<windows.h>.
* src/c++11/thread.cc: Likewise.
* src/c++17/fs_ops.cc: Likewise.
* src/filesystem/ops.cc: Likewise.
libvtv/
PR middle-end/108300
* vtv_malloc.cc: Define `WIN32_LEAN_AND_MEAN` before <windows.h>.
* vtv_rts.cc: Likewise.
* vtv_utils.cc: Likewise.
|
|
|
|
The comment in the loop says that we shouldn't add a map clause if such
a clause exists already, but the loop was actually using OMP_CLAUSE_DECL
on any clause. Target construct can have various clauses which don't
have OMP_CLAUSE_DECL at all (e.g. nowait, device or if) or clause
where it means something different (e.g. privatization clauses, allocate,
depend).
So, only check OMP_CLAUSE_DECL on OMP_CLAUSE_MAP clauses.
2023-01-05 Jakub Jelinek <jakub@redhat.com>
PR c++/108286
* semantics.cc (finish_omp_target_clauses): Ignore clauses other than
OMP_CLAUSE_MAP.
* testsuite/libgomp.c++/pr108286.C: New test.
|
|
|
|
Manual part of copyright year updates.
2023-01-02 Jakub Jelinek <jakub@redhat.com>
gcc/
* gcc.cc (process_command): Update copyright notice dates.
* gcov-dump.cc (print_version): Ditto.
* gcov.cc (print_version): Ditto.
* gcov-tool.cc (print_version): Ditto.
* gengtype.cc (create_file): Ditto.
* doc/cpp.texi: Bump @copying's copyright year.
* doc/cppinternals.texi: Ditto.
* doc/gcc.texi: Ditto.
* doc/gccint.texi: Ditto.
* doc/gcov.texi: Ditto.
* doc/install.texi: Ditto.
* doc/invoke.texi: Ditto.
gcc/ada/
* gnat_ugn.texi: Bump @copying's copyright year.
* gnat_rm.texi: Likewise.
gcc/d/
* gdc.texi: Bump @copyrights-d year.
gcc/fortran/
* gfortranspec.cc (lang_specific_driver): Update copyright notice
dates.
* gfc-internals.texi: Bump @copying's copyright year.
* gfortran.texi: Ditto.
* intrinsic.texi: Ditto.
* invoke.texi: Ditto.
gcc/go/
* gccgo.texi: Bump @copyrights-go year.
libgomp/
* libgomp.texi: Bump @copying's copyright year.
libitm/
* libitm.texi: Bump @copying's copyright year.
libquadmath/
* libquadmath.texi: Bump @copying's copyright year.
|
|
2022 -> 2023
|
|
|
|
Instead of trying to have the GPU do CPU-with-OS-like things, this new barriers
implementation for NVPTX uses simplistic bar.* synchronization instructions.
Tasks are processed after threads have joined, and only if team->task_count != 0
It is noted that: there might be a little bit of performance forfeited for
cases where earlier arriving threads could've been used to process tasks ahead
of other threads, but that has the requirement of implementing complex
futex-wait/wake like behavior, which is what we're try to avoid with this patch.
It is deemed that task processing is not what GPU target offloading is usually
used for.
Implementation highlight notes:
1. gomp_team_barrier_wake() is now an empty function (threads never "wake" in
the usual manner)
2. gomp_team_barrier_cancel() now uses the "exit" PTX instruction.
3. gomp_barrier_wait_last() now is implemented using "bar.arrive"
4. gomp_team_barrier_wait_end()/gomp_team_barrier_wait_cancel_end():
The main synchronization is done using a 'bar.red' instruction. This reduces
across all threads the condition (team->task_count != 0), to enable the task
processing down below if any thread created a task.
(this bar.red usage means that this patch is dependent on the prior NVPTX
bar.red GCC patch)
PR target/99555
libgomp/ChangeLog:
* config/nvptx/bar.c (generation_to_barrier): Remove.
(futex_wait,futex_wake,do_spin,do_wait): Remove.
(GOMP_WAIT_H): Remove.
(#include "../linux/bar.c"): Remove.
(gomp_barrier_wait_end): New function.
(gomp_barrier_wait): Likewise.
(gomp_barrier_wait_last): Likewise.
(gomp_team_barrier_wait_end): Likewise.
(gomp_team_barrier_wait): Likewise.
(gomp_team_barrier_wait_final): Likewise.
(gomp_team_barrier_wait_cancel_end): Likewise.
(gomp_team_barrier_wait_cancel): Likewise.
(gomp_team_barrier_cancel): Likewise.
* config/nvptx/bar.h (gomp_barrier_t): Remove waiters, lock fields.
(gomp_barrier_init): Remove init of waiters, lock fields.
(gomp_team_barrier_wake): Remove prototype, add new static inline
function.
|
|
DECL_OMP_PRIVATIZED_MEMBER vars are artificial vars with DECL_VALUE_EXPR
of this->field used just during gimplification and omp lowering/expansion
to privatize individual fields in methods when needed.
As the following testcase shows, when not in templates, they were handled
right, but in templates we actually called cp_finish_decl on them and
that can result in their destruction, which is obviously undesirable,
we should only destruct the privatized copies of them created in omp
lowering.
Fixed thusly.
2022-12-21 Jakub Jelinek <jakub@redhat.com>
PR c++/108180
* pt.cc (tsubst_expr): Don't call cp_finish_decl on
DECL_OMP_PRIVATIZED_MEMBER vars.
* testsuite/libgomp.c++/pr108180.C: New test.
|
|
|