Age | Commit message (Collapse) | Author | Files | Lines |
|
__builtin_is_initial_device: Revert 'gimple_fold_builtin_acc_on_device' change
The motivation of the 'gimple_fold_builtin_acc_on_device' change in
commit 3269a722b7a03613e9c4e2862bc5088c4a17cc11
"Fortran: Use OpenACC's acc_on_device builtin, fix OpenMP' __builtin_is_initial_device"
is unclear, and it unnecessarily diverges GCC's (default)
'--disable-offload-targets' vs. '--enable-offload-targets=[...]'
configurations.
PR testsuite/82250
gcc/
* gimple-fold.cc (gimple_fold_builtin_acc_on_device): Revert last
change.
libgomp/
* testsuite/libgomp.oacc-c-c++-common/routine-nohost-1.c: Revert
last change.
|
|
C23 default
With commit 55e3bd376b2214e200fa76d12b67ff259b06c212 "c: Default to -std=gnu23"
we've got:
[-PASS:-]{+FAIL:+} libgomp.oacc-c/../libgomp.oacc-c-c++-common/acc_get_property-gcn.c -DACC_DEVICE_TYPE_radeon=1 -DACC_MEM_SHARED=0 -foffload=amdgcn-amdhsa -O0 (test for excess errors)
[-PASS:-]{+UNRESOLVED:+} libgomp.oacc-c/../libgomp.oacc-c-c++-common/acc_get_property-gcn.c -DACC_DEVICE_TYPE_radeon=1 -DACC_MEM_SHARED=0 -foffload=amdgcn-amdhsa -O0 [-execution test-]{+compilation failed to produce executable+}
[Etc.]
..., due to:
[...]/libgomp.oacc-c/../libgomp.oacc-c-c++-common/acc_get_property-gcn.c:16:13: error: two or more data types in declaration specifiers
[...]/libgomp.oacc-c/../libgomp.oacc-c-c++-common/acc_get_property-gcn.c:16:1: warning: useless type name in empty declaration
libgomp/
* testsuite/libgomp.oacc-c-c++-common/acc_get_property-gcn.c
[!__cplusplus]: Don't 'typedef int bool;'.
|
|
__builtin_is_initial_device
It turned out that 'if (omp_is_initial_device() .eqv. true)' gave an ICE
due to comparing 'int' with 'logical(4)'. When digging deeper, it also
turned out that when the procedure pointer is needed, the builtin cannot
be used, either. (Follow up to r15-2799-gf1bfba3a9b3f31 )
Extend the code to also use the builtin acc_on_device with OpenACC,
which was previously only used in C/C++. Additionally, fix folding
when offloading is not enabled.
Fixes additionally the BT_BOOL data type, which was 'char'/integer(1)
instead of bool, backing the booleaness; use bool_type_node as the rest
of GCC.
gcc/fortran/ChangeLog:
* gfortran.h (gfc_option_t): Add disable_acc_on_device.
* options.cc (gfc_handle_option): Handle -fno-builtin-acc_on_device.
* trans-decl.cc (gfc_get_extern_function_decl): Move
__builtin_omp_is_initial_device handling to ...
* trans-expr.cc (get_builtin_fn): ... this new function.
(conv_function_val): Call it.
(update_builtin_function): New.
(gfc_conv_procedure_call): Call it.
* types.def (BT_BOOL): Fix type by using bool_type_node.
gcc/ChangeLog:
* gimple-fold.cc (gimple_fold_builtin_acc_on_device): Also fold
when offloading is not configured.
libgomp/ChangeLog:
* libgomp.texi (TR13): Fix minor typos.
(omp_is_initial_device): Improve wording.
(acc_on_device): Note how to disable the builtin.
* testsuite/libgomp.oacc-fortran/acc_on_device-1-1.f90: Remove TODO.
* testsuite/libgomp.oacc-fortran/acc_on_device-1-2.f: Likewise.
Add -fno-builtin-acc_on_device.
* testsuite/libgomp.oacc-fortran/acc_on_device-1-3.f: Likewise.
* testsuite/libgomp.oacc-c-c++-common/routine-nohost-1.c: Update
dg- as !offloading_enabled now compile-time expands acc_on_device.
* testsuite/libgomp.fortran/target-is-initial-device-3.f90: New test.
* testsuite/libgomp.oacc-fortran/acc_on_device-2.f90: New test.
|
|
counters
This patch adjusts the implementation of acc_map_data/acc_unmap_data API library
routines to more fit the description in the OpenACC 2.7 specification.
Instead of using REFCOUNT_INFINITY, we now define a REFCOUNT_ACC_MAP_DATA
special value to mark acc_map_data-created mappings. Adjustment around
mapping related code to respect OpenACC semantics are also added.
libgomp/ChangeLog:
* libgomp.h (REFCOUNT_ACC_MAP_DATA): Define as (REFCOUNT_SPECIAL | 2).
* oacc-mem.c (acc_map_data): Adjust to use REFCOUNT_ACC_MAP_DATA,
initialize dynamic_refcount as 1.
(acc_unmap_data): Adjust to use REFCOUNT_ACC_MAP_DATA,
(goacc_map_var_existing): Add REFCOUNT_ACC_MAP_DATA case.
(goacc_exit_datum_1): Add REFCOUNT_ACC_MAP_DATA case, respect
REFCOUNT_ACC_MAP_DATA when decrementing/finalizing. Force lowest
dynamic_refcount to be 1 for REFCOUNT_ACC_MAP_DATA.
(goacc_enter_data_internal): Add REFCOUNT_ACC_MAP_DATA case.
* target.c (gomp_increment_refcount): Return early for
REFCOUNT_ACC_MAP_DATA case.
(gomp_decrement_refcount): Likewise.
* testsuite/libgomp.oacc-c-c++-common/lib-96.c: New testcase.
* testsuite/libgomp.oacc-c-c++-common/unmap-infinity-1.c: Adjust
testcase error output scan test.
|
|
acc_{alloc,free,hostptr,deviceptr,memcpy_{to,from}_device*}
These routines map simply to the C counterpart and are meanwhile
defined in OpenACC 3.3. (There are additional routine changes,
including the Fortran addition of acc_attach/acc_detach, that
require more work than a simple addition of an interface and
are therefore excluded.)
libgomp/ChangeLog:
* libgomp.texi (OpenACC Runtime Library Routines): Document new 3.3
routines that simply map to their C counterpart.
* openacc.f90 (openacc): Add them.
* openacc_lib.h: Likewise.
* testsuite/libgomp.oacc-fortran/acc_host_device_ptr.f90: New test.
* testsuite/libgomp.oacc-fortran/acc-memcpy.f90: New test.
* testsuite/libgomp.oacc-fortran/acc-memcpy-2.f90: New test.
* testsuite/libgomp.oacc-c-c++-common/lib-59.c: Crossref to f90 test.
* testsuite/libgomp.oacc-c-c++-common/lib-60.c: Likewise.
* testsuite/libgomp.oacc-c-c++-common/lib-95.c: Likewise.
|
|
libgomp/
* testsuite/libgomp.oacc-c-c++-common/deep-copy-8.c: Add OpenACC
'acc_map_data' variant.
|
|
... on top of what was provided in recent
commit 3a3596389c2e539cb8fd5dc5784a4e2afe193a2a
"OpenACC 2.7: Implement self clause for compute constructs".
gcc/testsuite/
* c-c++-common/goacc/if-clause-2.c: Enhance.
* c-c++-common/goacc/self-clause-1.c: Likewise.
* c-c++-common/goacc/self-clause-2.c: Likewise.
* gfortran.dg/goacc/if.f95: Likewise.
* gfortran.dg/goacc/kernels-tree.f95: Likewise.
* gfortran.dg/goacc/parallel-tree.f95: Likewise.
* gfortran.dg/goacc/self.f95: Likewise.
libgomp/
* testsuite/libgomp.oacc-c-c++-common/if-1.c: Enhance.
* testsuite/libgomp.oacc-c-c++-common/self-1.c: Likewise.
* testsuite/libgomp.oacc-fortran/if-1.f90: Likewise.
* testsuite/libgomp.oacc-c-c++-common/if-self-1.c: New.
* testsuite/libgomp.oacc-fortran/self-1.f90: Likewise.
|
|
This patch implements the 'self' clause for compute constructs: parallel,
kernels, and serial. This clause conditionally uses the local device
(the host mult-core CPU) as the executing device of the compute region.
The actual implementation of the "local device" device type inside libgomp
(presumably using pthreads) is still not yet completed, so the libgomp
side is still implemented the exact same as host-fallback mode. (so as of now,
it essentially behaves like the 'if' clause with the condition inverted)
gcc/c/ChangeLog:
* c-parser.cc (c_parser_oacc_compute_clause_self): New function.
(c_parser_oacc_all_clauses): Add new 'bool compute_p = false'
parameter, add parsing of self clause when compute_p is true.
(OACC_KERNELS_CLAUSE_MASK): Add PRAGMA_OACC_CLAUSE_SELF.
(OACC_PARALLEL_CLAUSE_MASK): Likewise,
(OACC_SERIAL_CLAUSE_MASK): Likewise.
(c_parser_oacc_compute): Adjust call to c_parser_oacc_all_clauses to
set compute_p argument to true.
* c-typeck.cc (c_finish_omp_clauses): Add OMP_CLAUSE_SELF case.
gcc/cp/ChangeLog:
* parser.cc (cp_parser_oacc_compute_clause_self): New function.
(cp_parser_oacc_all_clauses): Add new 'bool compute_p = false'
parameter, add parsing of self clause when compute_p is true.
(OACC_KERNELS_CLAUSE_MASK): Add PRAGMA_OACC_CLAUSE_SELF.
(OACC_PARALLEL_CLAUSE_MASK): Likewise,
(OACC_SERIAL_CLAUSE_MASK): Likewise.
(cp_parser_oacc_compute): Adjust call to c_parser_oacc_all_clauses to
set compute_p argument to true.
* pt.cc (tsubst_omp_clauses): Add OMP_CLAUSE_SELF case.
* semantics.cc (c_finish_omp_clauses): Add OMP_CLAUSE_SELF case, merged
with OMP_CLAUSE_IF case.
gcc/fortran/ChangeLog:
* gfortran.h (typedef struct gfc_omp_clauses): Add self_expr field.
* openmp.cc (enum omp_mask2): Add OMP_CLAUSE_SELF.
(gfc_match_omp_clauses): Add handling for OMP_CLAUSE_SELF.
(OACC_PARALLEL_CLAUSES): Add OMP_CLAUSE_SELF.
(OACC_KERNELS_CLAUSES): Likewise.
(OACC_SERIAL_CLAUSES): Likewise.
(resolve_omp_clauses): Add handling for omp_clauses->self_expr.
* trans-openmp.cc (gfc_trans_omp_clauses): Add handling of
clauses->self_expr and building of OMP_CLAUSE_SELF tree clause.
(gfc_split_omp_clauses): Add handling of self_expr field copy.
gcc/ChangeLog:
* gimplify.cc (gimplify_scan_omp_clauses): Add OMP_CLAUSE_SELF case.
(gimplify_adjust_omp_clauses): Likewise.
* omp-expand.cc (expand_omp_target): Add OMP_CLAUSE_SELF expansion code,
* omp-low.cc (scan_sharing_clauses): Add OMP_CLAUSE_SELF case.
* tree-core.h (enum omp_clause_code): Add OMP_CLAUSE_SELF enum.
* tree-nested.cc (convert_nonlocal_omp_clauses): Add OMP_CLAUSE_SELF
case.
(convert_local_omp_clauses): Likewise.
* tree-pretty-print.cc (dump_omp_clause): Add OMP_CLAUSE_SELF case.
* tree.cc (omp_clause_num_ops): Add OMP_CLAUSE_SELF entry.
(omp_clause_code_name): Likewise.
* tree.h (OMP_CLAUSE_SELF_EXPR): New macro.
gcc/testsuite/ChangeLog:
* c-c++-common/goacc/self-clause-1.c: New test.
* c-c++-common/goacc/self-clause-2.c: New test.
* gfortran.dg/goacc/self.f95: New test.
include/ChangeLog:
* gomp-constants.h (GOACC_FLAG_LOCAL_DEVICE): New flag bit value.
libgomp/ChangeLog:
* oacc-parallel.c (GOACC_parallel_keyed): Add code to handle
GOACC_FLAG_LOCAL_DEVICE case.
* testsuite/libgomp.oacc-c-c++-common/self-1.c: New test.
|
|
Reduce number of enum values passed to libgomp as
GOMP_MAP_PRESENT_{TO,TOFROM,FROM,ALLOC} have the same semantic as
GOMP_MAP_FORCE_PRESENT (i.e. abort if not present, otherwise ignore);
that's different to GOMP_MAP_ALWAYS_PRESENT_{TO,TOFROM,FROM} which also
abort if not present but copy data when present. This is is a follow-up to
the commit r14-1579-g4ede915d5dde93 done 6 days ago.
Additionally, the commit improves a libgomp run-time and a C/C++ compile-time
error wording and extends testcases a tiny bit.
gcc/c/ChangeLog:
* c-parser.cc (c_parser_omp_clause_map): Reword error message for
clearness especially with 'omp target (enter/exit) data.'
gcc/cp/ChangeLog:
* parser.cc (cp_parser_omp_clause_map): Reword error message for
clearness especially with 'omp target (enter/exit) data.'
* semantics.cc (handle_omp_array_sections): Handle
GOMP_MAP_{ALWAYS_,}PRESENT_{TO,TOFROM,FROM,ALLOC} enum values.
gcc/ChangeLog:
* gimplify.cc (gimplify_adjust_omp_clauses_1): Use
GOMP_MAP_FORCE_PRESENT for 'present alloc' implicit mapping.
(gimplify_adjust_omp_clauses): Change
GOMP_MAP_PRESENT_{TO,TOFROM,FROM,ALLOC} to the equivalent
GOMP_MAP_FORCE_PRESENT.
* omp-low.cc (lower_omp_target): Remove handling of no-longer valid
GOMP_MAP_PRESENT_{TO,TOFROM,FROM,ALLOC}; update map kinds used for
to/from clauses with present modifier.
include/ChangeLog:
* gomp-constants.h (enum gomp_map_kind): Change the enum values
GOMP_MAP_PRESENT_{TO,TOFROM,FROM,ALLOC} to be compiler only.
(GOMP_MAP_PRESENT_P): Update to include also GOMP_MAP_FORCE_PRESENT.
libgomp/ChangeLog:
* target.c (gomp_to_device_kind_p, gomp_map_vars_internal): Replace
GOMP_MAP_PRESENT_{FROM,TO,TOFROM,ACLLOC} by GOMP_MAP_FORCE_PRESENT.
(gomp_map_vars_internal, gomp_update): Likewise; unify and improve
error message.
* testsuite/libgomp.c-c++-common/target-present-2.c: Update for
changed error message.
* testsuite/libgomp.fortran/target-present-1.f90: Likewise.
* testsuite/libgomp.fortran/target-present-2.f90: Likewise.
* testsuite/libgomp.oacc-c-c++-common/present-1.c: Likewise.
* testsuite/libgomp.c-c++-common/target-present-1.c: Likewise and
extend testcase to check that data is copied when needed.
* testsuite/libgomp.c-c++-common/target-present-3.c: Likewise.
* testsuite/libgomp.fortran/target-present-3.f90: Likewise.
gcc/testsuite/ChangeLog:
* c-c++-common/gomp/defaultmap-4.c: Update scan-tree-dump.
* c-c++-common/gomp/map-9.c: Likewise.
* gfortran.dg/gomp/defaultmap-8.f90: Likewise.
* gfortran.dg/gomp/map-11.f90: Likewise.
* gfortran.dg/gomp/target-update-1.f90: Likewise.
* gfortran.dg/gomp/map-12.f90: Likewise; also check original dump.
* c-c++-common/gomp/map-6.c: Update dg-error and also check
clause error with 'target (enter/exit) data'.
|
|
The patch that introduced the weak_undefined effective-target keyword
and corresponding dg-add-options support
commit 378ec7b87a5265dbe2d489c245fac98ef37fa638
Author: Alexandre Oliva <oliva@adacore.com>
Date: Thu Mar 23 00:45:05 2023 -0300
[testsuite] test for weak_undefined support and add options
badly broke the affected tests on macOS like so:
ERROR: gcc.dg/addr_equal-1.c: unknown dg option: 89 for " dg-add-options 5 weak_undefined "
ERROR: gcc.dg/addr_equal-1.c: unknown dg option: 89 for " dg-add-options 5 weak_undefined "
add_options_for_weak_undefined tries to call an non-existant proc "89".
Even after fixing this by escaping the brackets, two tests still failed to
link since they lacked the corresponding calls do dg-add-options
weak_undefined.
Tested on x86_64-apple-darwin20.6.0 and i386-pc-solaris2.11.
2023-03-27 Rainer Orth <ro@CeBiTec.Uni-Bielefeld.DE>
gcc/testsuite:
* lib/target-supports.exp (add_options_for_weak_undefined): Escape
brackets.
* gcc.dg/visibility-22.c: Add weak_undefined options.
libgomp:
* testsuite/libgomp.oacc-c-c++-common/routine-nohost-2.c: Add
weak_undefined options.
|
|
Thereby considerably simplify the device plugins' 'GOMP_OFFLOAD_openacc_exec',
'GOMP_OFFLOAD_openacc_async_exec' functions: in terms of lines of code, but in
particular conceptually: no more device memory allocation, host to device data
copying, device memory deallocation -- 'GOMP_MAP_VARS_TARGET' does all that for
us.
This depends on commit 2b2340e236c0bba8aaca358ea25a5accd8249fbd
"Allow libgomp 'cbuf' buffering with OpenACC 'async' for 'ephemeral' data",
where I said that "a use will emerge later", which is this one here.
PR libgomp/90596
libgomp/
* target.c (gomp_map_vars_internal): Allow for
'param_kind == GOMP_MAP_VARS_OPENACC | GOMP_MAP_VARS_TARGET'.
* oacc-parallel.c (GOACC_parallel_keyed): Pass
'GOMP_MAP_VARS_TARGET' to 'goacc_map_vars'.
* plugin/plugin-gcn.c (alloc_by_agent, gcn_exec)
(GOMP_OFFLOAD_openacc_exec, GOMP_OFFLOAD_openacc_async_exec):
Adjust, simplify.
(gomp_offload_free): Remove.
* plugin/plugin-nvptx.c (nvptx_exec, GOMP_OFFLOAD_openacc_exec)
(GOMP_OFFLOAD_openacc_async_exec): Adjust, simplify.
(cuda_free_argmem): Remove.
* testsuite/libgomp.oacc-c-c++-common/acc_prof-parallel-1.c:
Adjust.
|
|
For 'OFFSET_INLINED', 'gomp_map_val' does the right thing, and we may then
simplify the device plugins accordingly.
This is a follow-up to
Subversion r279551 (Git commit a6163563f2ce502bd4ef444bd5de33570bb8eeb1)
"Add OpenACC 2.6's no_create",
Subversion r279622 (Git commit 5bcd470bf0749e1f56d05dd43aa9584ff2e3a090)
"Use gomp_map_val for OpenACC host-to-device address translation".
libgomp/
* target.c (gomp_map_vars_internal): Use 'OFFSET_INLINED' for
'GOMP_MAP_IF_PRESENT'.
* plugin/plugin-gcn.c (gcn_exec, GOMP_OFFLOAD_openacc_exec)
(GOMP_OFFLOAD_openacc_async_exec): Adjust.
* plugin/plugin-nvptx.c (nvptx_exec, GOMP_OFFLOAD_openacc_exec)
(GOMP_OFFLOAD_openacc_async_exec): Likewise.
* testsuite/libgomp.oacc-c-c++-common/no_create-1.c: Add 'async'
testing.
* testsuite/libgomp.oacc-c-c++-common/no_create-2.c: Likewise.
|
|
'libgomp.oacc-c-c++-common/data-3.c'
... that I almost broke with later implementation changes.
libgomp/
* testsuite/libgomp.oacc-c-c++-common/data-3.c: Document/verify
another aspect of OpenACC 'async' semantics.
|
|
For an OpenACC compute construct, we've currently got:
- [...]
- acc_ev_enqueue_launch_start
- launch kernel
- free memory
- acc_ev_free
- acc_ev_enqueue_launch_end
This confused another thing that I'm working on, so I adjusted that to:
- [...]
- acc_ev_enqueue_launch_start
- launch kernel
- acc_ev_enqueue_launch_end
- free memory
- acc_ev_free
Correspondingly, verify 'acc_ev_alloc', 'acc_ev_free' in
'libgomp.oacc-c-c++-common/acc_prof-parallel-1.c'.
libgomp/
* plugin/plugin-gcn.c (gcn_exec): Fix 'acc_ev_enqueue_launch_end'
position.
* testsuite/libgomp.oacc-c-c++-common/acc_prof-parallel-1.c:
Verify 'acc_ev_alloc', 'acc_ev_free'.
|
|
[PR107195, PR107344]
That is, adjust for optimization introduced with recent
commit r13-3217-gc4d15dddf6b9eacb36f535807ad2ee364af46e04
"[PR107195] Set range to zero when nonzero mask is 0", where GCC now
understands that after 'r *= 2;', 'r & 1' will never hold here, and thus
transforms/optimizes/"disturbs" the original code such that GCC/nvptx's later
"Neuter whole SESE regions" optimization no longer is applicable to it:
UNSUPPORTED: libgomp.oacc-c/../libgomp.oacc-c-c++-common/nvptx-sese-1.c -DACC_DEVICE_TYPE_nvidia=1 -DACC_MEM_SHARED=0 -foffload=nvptx-none -O0
PASS: libgomp.oacc-c/../libgomp.oacc-c-c++-common/nvptx-sese-1.c -DACC_DEVICE_TYPE_nvidia=1 -DACC_MEM_SHARED=0 -foffload=nvptx-none -O2 (test for excess errors)
PASS: libgomp.oacc-c/../libgomp.oacc-c-c++-common/nvptx-sese-1.c -DACC_DEVICE_TYPE_nvidia=1 -DACC_MEM_SHARED=0 -foffload=nvptx-none -O2 execution test
[-PASS:-]{+FAIL:+} libgomp.oacc-c/../libgomp.oacc-c-c++-common/nvptx-sese-1.c -DACC_DEVICE_TYPE_nvidia=1 -DACC_MEM_SHARED=0 -foffload=nvptx-none -O2 scan-nvptx-none-offload-rtl-dump mach "SESE regions:.* [0-9]+{[0-9]+->[0-9]+(\\.[0-9]+)+}"
Same for C++.
It's unclear to me if this is an actual "problem", which optimization is "more
important", so I've filed PR107344 "GCC/nvptx SESE region optimization" to
capture this question, and here restore what we intend to be testing (to my
understanding) in 'libgomp.oacc-c-c++-common/nvptx-sese-1.c'.
PR tree-optimization/107195
PR target/107344
libgomp/
* testsuite/libgomp.oacc-c-c++-common/nvptx-sese-1.c: Restore SESE
regions checking.
|
|
After commit r13-3404-g7c55755d4c760de326809636531478fd7419e1e5
"amdgcn: Use FLAT addressing for all functions with pointer arguments [PR105421]",
"big" private data now works for GCN offloading, too.
PR target/105421
libgomp/
* testsuite/libgomp.oacc-c-c++-common/private-big-1.c: New.
|
|
This patch refactors struct sibling-list processing in gimplify.cc, and
adjusts some related mapping-clause processing in the Fortran FE and
omp-low.cc accordingly.
2022-09-13 Julian Brown <julian@codesourcery.com>
gcc/fortran/
* trans-openmp.cc (gfc_trans_omp_clauses): Don't create
GOMP_MAP_TO_PSET mappings for class metadata, nor GOMP_MAP_POINTER
mappings for POINTER_TYPE_P decls.
gcc/
* gimplify.cc (gimplify_omp_var_data): Remove GOVD_MAP_HAS_ATTACHMENTS.
(GOMP_FIRSTPRIVATE_IMPLICIT): Renumber.
(insert_struct_comp_map): Refactor function into...
(build_omp_struct_comp_nodes): This new function. Remove list handling
and improve self-documentation.
(extract_base_bit_offset): Remove BASE_REF, OFFSETP parameters. Move
code to strip outer parts of address out of function, but strip no-op
conversions.
(omp_mapping_group): Add DELETED field for use during reindexing.
(omp_strip_components_and_deref, omp_strip_indirections): New functions.
(omp_group_last, omp_group_base): Add GOMP_MAP_STRUCT handling.
(omp_gather_mapping_groups): Initialise DELETED field for new groups.
(omp_index_mapping_groups): Notice DELETED groups when (re)indexing.
(omp_siblist_insert_node_after, omp_siblist_move_node_after,
omp_siblist_move_nodes_after, omp_siblist_move_concat_nodes_after): New
helper functions.
(omp_accumulate_sibling_list): New function to build up GOMP_MAP_STRUCT
node groups for sibling lists. Outlined from gimplify_scan_omp_clauses.
(omp_build_struct_sibling_lists): New function.
(gimplify_scan_omp_clauses): Remove struct_map_to_clause,
struct_seen_clause, struct_deref_set. Call
omp_build_struct_sibling_lists as pre-pass instead of handling sibling
lists in the function's main processing loop.
(gimplify_adjust_omp_clauses_1): Remove GOVD_MAP_HAS_ATTACHMENTS
handling, unused now.
* omp-low.cc (scan_sharing_clauses): Handle pointer-type indirect
struct references, and references to pointers to structs also.
gcc/testsuite/
* g++.dg/goacc/member-array-acc.C: New test.
* g++.dg/gomp/member-array-omp.C: New test.
* g++.dg/gomp/target-3.C: Update expected output.
* g++.dg/gomp/target-lambda-1.C: Likewise.
* g++.dg/gomp/target-this-2.C: Likewise.
* c-c++-common/goacc/deep-copy-arrayofstruct.c: Move test from here.
* c-c++-common/gomp/target-50.c: New test.
libgomp/
* testsuite/libgomp.oacc-c-c++-common/deep-copy-15.c: New test.
* testsuite/libgomp.oacc-c-c++-common/deep-copy-16.c: New test.
* testsuite/libgomp.oacc-c++/deep-copy-17.C: New test.
* testsuite/libgomp.oacc-c-c++-common/deep-copy-arrayofstruct.c: Move
test to here, make "run" test.
|
|
'libgomp.oacc-c-c++-common/reduction-5.c' [PR101551]
Fix-up for recent commit 06b2a2abe26554c6f9365676683d67368cbba206
"Enhance '_Pragma' diagnostics verification in OMP C/C++ test cases".
Supposedly it's the same issue as in
<https://gcc.gnu.org/bugzilla/show_bug.cgi?id=101551#c2>, where I'd
noted that:
| [...] with an offloading-enabled build of GCC we're losing
| "note: in expansion of macro '[...]'" diagnostics.
| (Effectively '-ftrack-macro-expansion=0'?)
PR middle-end/101551
libgomp/
* testsuite/libgomp.oacc-c-c++-common/reduction-5.c: XFAIL
'offloading_enabled' diagnostics issue.
|
|
Follow-up to recent commit 0587cef3d7962a8b0f44779589ba2920dd3d71e5
"c: Fix location for _Pragma tokens [PR97498]".
gcc/testsuite/
* c-c++-common/gomp/pragma-3.c: Enhance '_Pragma' diagnostics
verification.
* c-c++-common/gomp/pragma-5.c: Likewise.
libgomp/
* testsuite/libgomp.oacc-c-c++-common/reduction-5.c: Enhance
'_Pragma' diagnostics verification.
|
|
The handling of #pragma GCC diagnostic uses input_location, which is not always
as precise as needed; in particular the relative location of some tokens and a
_Pragma directive will crucially determine whether a given diagnostic is enabled
or suppressed in the desired way. PR97498 shows how the C frontend ends up with
input_location pointing to the beginning of the line containing a _Pragma()
directive, resulting in the wrong behavior if the diagnostic to be modified
pertains to some tokens found earlier on the same line. This patch fixes that by
addressing two issues:
a) libcpp was not assigning a valid location to the CPP_PRAGMA token
generated by the _Pragma directive.
b) C frontend was not setting input_location to something reasonable.
With this change, the C frontend is able to change input_location to point to
the _Pragma token as needed.
This is just a two-line fix (one for each of a) and b)), the testsuite changes
were needed only because the location on the tested warnings has been somewhat
improved, so the tests need to look for the new locations.
gcc/c/ChangeLog:
PR preprocessor/97498
* c-parser.cc (c_parser_pragma): Set input_location to the
location of the pragma, rather than the start of the line.
libcpp/ChangeLog:
PR preprocessor/97498
* directives.cc (destringize_and_run): Override the location of
the CPP_PRAGMA token from a _Pragma directive to the location of
the expansion point, as is done for the tokens lexed from it.
gcc/testsuite/ChangeLog:
PR preprocessor/97498
* c-c++-common/pr97498.c: New test.
* c-c++-common/gomp/pragma-3.c: Adapt for improved warning locations.
* c-c++-common/gomp/pragma-5.c: Likewise.
* gcc.dg/pragma-message.c: Likewise.
libgomp/ChangeLog:
* testsuite/libgomp.oacc-c-c++-common/reduction-5.c: Adapt for
improved warning locations.
* testsuite/libgomp.oacc-c-c++-common/vred2d-128.c: Likewise.
|
|
When running test-case libgomp.oacc-c-c++-common/vector-length-128-7.c on an
RTX A2000 (sm_86) with driver 510.60.02 I run into:
...
FAIL: libgomp.oacc-c/../libgomp.oacc-c-c++-common/vector-length-128-7.c \
-DACC_DEVICE_TYPE_nvidia=1 -DACC_MEM_SHARED=0 -foffload=nvptx-none -O0 \
output pattern test
...
The failing check verifies the launch dimensions:
...
/* { dg-output "nvptx_exec: kernel main\\\$_omp_fn\\\$0: \
launch gangs=1, workers=8, vectors=128" } */
...
which fails because (as we can see with GOMP_DEBUG=1) the actual num_workers
is 6:
...
nvptx_exec: kernel main$_omp_fn$0: launch gangs=1, workers=6, vectors=128
...
This is due to the result of cuOccupancyMaxPotentialBlockSize (which suggests
'a launch configuration with reasonable occupancy') printed just before:
...
cuOccupancyMaxPotentialBlockSize: grid = 52, block = 768
...
[ Note: 6 * 128 == 768. ]
Fix this by updating the check to allow num_workers in the range 1 to 8.
Tested on x86_64 with nvptx accelerator.
libgomp/ChangeLog:
2022-04-01 Tom de Vries <tdevries@suse.de>
* testsuite/libgomp.oacc-c-c++-common/vector-length-128-7.c: Fix
num_workers check.
|
|
When a display manager is running on an nvidia card, all CUDA kernel launches
get a 5 seconds watchdog timer.
Consequently, when running the libgomp testsuite with nvptx accelerator and
GOMP_NVPTX_JIT=-O0 we run into a few FAILs like this:
...
libgomp: cuStreamSynchronize error: the launch timed out and was terminated
FAIL: libgomp.oacc-c/../libgomp.oacc-c-c++-common/parallel-dims.c \
-DACC_DEVICE_TYPE_nvidia=1 -DACC_MEM_SHARED=0 -foffload=nvptx-none -O0 \
execution test
...
Fix this by scaling down the failing test-cases by default, and reverting to
the original behaviour for GCC_TEST_RUN_EXPENSIVE=1.
Tested on x86_64-linux with nvptx accelerator.
libgomp/ChangeLog:
2022-03-25 Tom de Vries <tdevries@suse.de>
PR libgomp/105042
* testsuite/libgomp.oacc-c-c++-common/parallel-dims.c: Reduce
execution time.
* testsuite/libgomp.oacc-c-c++-common/vred2d-128.c: Same.
* testsuite/libgomp.oacc-fortran/parallel-dims.f90: Same.
|
|
gcc/testsuite/
* c-c++-common/goacc-gomp/nesting-1.c: Enhance.
* c-c++-common/goacc/kernels-loop-g.c: Likewise.
* c-c++-common/goacc/nesting-1.c: Likewise.
* gcc.dg/goacc/nested-function-1.c: Likewise.
* gfortran.dg/goacc/common-block-3.f90: Likewise.
* gfortran.dg/goacc/nested-function-1.f90: Likewise.
libgomp/
* testsuite/libgomp.oacc-c-c++-common/acc_prof-kernels-1.c:
Enhance.
* testsuite/libgomp.oacc-c-c++-common/kernels-loop-g.c: Likewise.
* testsuite/libgomp.oacc-fortran/if-1.f90: Likewise.
|
|
[PR90115]
As originally introduced in commit 11b8286a83289f5b54e813f14ff56d730c3f3185
"[OpenACC privatization] Largely extend diagnostics and corresponding testsuite
coverage [PR90115]".
PR middle-end/90115
gcc/testsuite/
* c-c++-common/goacc-gomp/nesting-1.c: Enhance.
* gfortran.dg/goacc/common-block-3.f90: Likewise.
libgomp/
* testsuite/libgomp.oacc-c-c++-common/acc_prof-kernels-1.c: Enhance.
* testsuite/libgomp.oacc-fortran/if-1.f90: Likewise.
|
|
It's an orthogonal concern why these diagnostics do appear at all for
non-offloaded OpenACC constructs (where they're not relevant at all); PR90115.
Depending on how 'assert' is implemented, it may cause temporaries to be
created, and/or may lower into 'COND_EXPR's, and
'gcc/gimplify.cc:gimplify_cond_expr' uses 'create_tmp_var (type, "iftmp")'.
Fix-up for commit 11b8286a83289f5b54e813f14ff56d730c3f3185
"[OpenACC privatization] Largely extend diagnostics and
corresponding testsuite coverage [PR90115]".
PR testsuite/102841
libgomp/
* testsuite/libgomp.oacc-c-c++-common/host_data-7.c: Adjust.
|
|
making certain variables addressable [PR100280, PR104892]
Currently in OpenACC 'kernels' decomposition, there is special handling of
'GOMP_MAP_FORCE_TOFROM', documented to be done to avoid "internal compiler
errors in later passes". For performance reasons, the current repetitive
to/from device copying for every region is not ideal, compared to using
'present' clauses, as done for almost all other 'GOMP_MAP_*'. Also, the
current special handling (incomplete, evidently) is the reason for the PR104892
misbehavior. For PR100280 etc. we've resolved all such known ICEs -- removing
the special handling for 'GOMP_MAP_FORCE_TOFROM' now resolves PR104892.
PR middle-end/100280
PR middle-end/104892
gcc/
* omp-oacc-kernels-decompose.cc (omp_oacc_kernels_decompose_1):
Remove special handling of 'GOMP_MAP_FORCE_TOFROM'.
gcc/testsuite/
* c-c++-common/goacc/kernels-decompose-2.c: Adjust.
* c-c++-common/goacc/kernels-decompose-pr100400-1-1.c: Likewise.
* c-c++-common/goacc/kernels-decompose-pr100400-1-2.c: Likewise.
* c-c++-common/goacc/kernels-decompose-pr100400-1-3.c: Likewise.
* c-c++-common/goacc/kernels-decompose-pr100400-1-4.c: Likewise.
* c-c++-common/goacc/kernels-decompose-pr104061-1-1.c: Likewise.
* c-c++-common/goacc/kernels-decompose-pr104061-1-2.c: Likewise.
* c-c++-common/goacc/kernels-decompose-pr104061-1-3.c: Likewise.
* c-c++-common/goacc/kernels-decompose-pr104061-1-4.c: Likewise.
* c-c++-common/goacc/kernels-decompose-pr104132-1.c: Likewise.
* c-c++-common/goacc/kernels-decompose-pr104133-1.c: Likewise.
* c-c++-common/goacc/kernels-decompose-pr104774-1.c: Likewise.
* gfortran.dg/goacc/classify-kernels.f95: Likewise.
* gfortran.dg/goacc/kernels-decompose-2.f95: Likewise.
libgomp/
* testsuite/libgomp.oacc-c-c++-common/declare-vla.c: Adjust.
* testsuite/libgomp.oacc-c-c++-common/default-1.c: Likewise.
* testsuite/libgomp.oacc-c-c++-common/kernels-decompose-1.c:
Likewise.
* testsuite/libgomp.oacc-c-c++-common/kernels-reduction-1.c:
Likewise.
* testsuite/libgomp.oacc-c-c++-common/parallel-dims.c: Likewise.
* testsuite/libgomp.oacc-fortran/asyncwait-1.f90: Likewise.
* testsuite/libgomp.oacc-fortran/kernels-reduction-1.f90:
Likewise.
|
|
certain variables addressable [PR104892]
Document a few examples of the status quo.
PR middle-end/104892
libgomp/
* testsuite/libgomp.oacc-c-c++-common/kernels-decompose-1.c: Point
to PR104892.
* testsuite/libgomp.oacc-c-c++-common/default-1.c: Likewise,
enable '--param=openacc-kernels=decompose' and adjust.
* testsuite/libgomp.oacc-c-c++-common/kernels-reduction-1.c:
Likewise.
* testsuite/libgomp.oacc-c-c++-common/parallel-dims.c: Likewise.
* testsuite/libgomp.oacc-fortran/kernels-reduction-1.f90:
Likewise.
|
|
[PR90115]
As originally introduced in commit 11b8286a83289f5b54e813f14ff56d730c3f3185
"[OpenACC privatization] Largely extend diagnostics and corresponding testsuite
coverage [PR90115]".
PR middle-end/90115
libgomp/
* testsuite/libgomp.oacc-c-c++-common/default-1.c: Enhance.
* testsuite/libgomp.oacc-c-c++-common/kernels-reduction-1.c: Likewise.
* testsuite/libgomp.oacc-c-c++-common/parallel-dims.c: Likewise.
* testsuite/libgomp.oacc-fortran/kernels-reduction-1.f90: Likewise.
|
|
addressable [PR100280, PR104086]
... like in recent commit 9b32c1669aad5459dd053424f9967011348add83
"OpenACC 'kernels' decomposition: Mark variables used in synthesized
data clauses as addressable [PR100280]". Otherwise, we may run into
'gcc/omp-low.cc:lower_omp_target':
13125 else if (is_gimple_reg (var))
13126 {
13127 gcc_assert (offloaded);
PR middle-end/100280
PR middle-end/104086
gcc/
* omp-oacc-kernels-decompose.cc (omp_oacc_kernels_decompose_1):
Mark variables used in 'present' clauses as addressable.
* omp-low.cc (scan_sharing_clauses) <OMP_CLAUSE_MAP>: Gracefully
handle duplicate 'OMP_CLAUSE_MAP_DECL_MAKE_ADDRESSABLE'.
gcc/testsuite/
* c-c++-common/goacc/kernels-decompose-pr104086-1.c: Adjust,
extend.
libgomp/
* testsuite/libgomp.oacc-c-c++-common/declare-vla-kernels-decompose-ice-1.c:
Merge this...
* testsuite/libgomp.oacc-c-c++-common/declare-vla-kernels-decompose.c:
..., and this...
* testsuite/libgomp.oacc-c-c++-common/declare-vla.c: ... into
this, and adjust.
* testsuite/libgomp.oacc-c-c++-common/kernels-decompose-1.c:
Extend.
|
|
PR102330, PR104774]
... so that it matches what we analyze and what we action on.
Fix-up for commit 29a2f51806c5b30e17a8d0e9ba7915a3c53c34ff "openacc:
Add support for gang local storage allocation in shared memory [PR90115]".
PR middle-end/90115
PR middle-end/102330
PR middle-end/104774
gcc/
* omp-low.cc (oacc_privatization_candidate_p)
(oacc_privatization_scan_clause_chain)
(oacc_privatization_scan_decl_chain, lower_oacc_private_marker):
Analyze 'lookup_decl'-translated DECL.
gcc/testsuite/
* c-c++-common/goacc/kernels-decompose-pr104061-1-3.c: Adjust.
* c-c++-common/goacc/kernels-decompose-pr104061-1-4.c: Likewise.
* c-c++-common/goacc/kernels-decompose-pr104132-1.c: Likewise.
* c-c++-common/goacc/kernels-decompose-pr104133-1.c: Likewise.
* c-c++-common/goacc/kernels-decompose-pr104774-1.c: Likewise.
* c-c++-common/goacc/privatization-1-compute-loop.c: Likewise.
* c-c++-common/goacc/privatization-1-compute.c: Likewise.
* c-c++-common/goacc/privatization-1-routine_gang-loop.c:
Likewise.
* c-c++-common/goacc/privatization-1-routine_gang.c: Likewise.
* gfortran.dg/goacc-gomp/pr102330-1.f90: Likewise, and subsume...
* gfortran.dg/goacc-gomp/pr102330-2.f90: ... this file, and...
* gfortran.dg/goacc-gomp/pr102330-3.f90: ... this file.
* gfortran.dg/goacc/privatization-1-compute-loop.f90: Adjust.
* gfortran.dg/goacc/privatization-1-compute.f90: Likewise.
* gfortran.dg/goacc/privatization-1-routine_gang-loop.f90:
Likewise.
* gfortran.dg/goacc/privatization-1-routine_gang.f90: Likewise.
libgomp/
* testsuite/libgomp.oacc-c-c++-common/kernels-decompose-1.c:
Enhance.
* testsuite/libgomp.oacc-c-c++-common/kernels-private-vars-local-worker-1.c:
Adjust.
* testsuite/libgomp.oacc-c-c++-common/kernels-private-vars-local-worker-2.c:
Likewise.
* testsuite/libgomp.oacc-c-c++-common/kernels-private-vars-local-worker-3.c:
Likewise.
* testsuite/libgomp.oacc-c-c++-common/kernels-private-vars-local-worker-4.c:
Likewise.
* testsuite/libgomp.oacc-c-c++-common/kernels-private-vars-local-worker-5.c:
Likewise.
* testsuite/libgomp.oacc-c-c++-common/kernels-private-vars-loop-vector-1.c:
Likewise.
* testsuite/libgomp.oacc-c-c++-common/kernels-private-vars-loop-vector-2.c:
Likewise.
* testsuite/libgomp.oacc-c-c++-common/kernels-private-vars-loop-worker-2.c:
Likewise.
* testsuite/libgomp.oacc-c-c++-common/kernels-private-vars-loop-worker-3.c:
Likewise.
* testsuite/libgomp.oacc-c-c++-common/kernels-private-vars-loop-worker-4.c:
Likewise.
* testsuite/libgomp.oacc-c-c++-common/kernels-private-vars-loop-worker-5.c:
Likewise.
* testsuite/libgomp.oacc-c-c++-common/kernels-private-vars-loop-worker-6.c:
Likewise.
* testsuite/libgomp.oacc-c-c++-common/kernels-private-vars-loop-worker-7.c:
Likewise.
* testsuite/libgomp.oacc-fortran/optional-private.f90: Likewise.
* testsuite/libgomp.oacc-fortran/privatized-ref-1.f95: Likewise.
* testsuite/libgomp.oacc-fortran/privatized-ref-2.f90: Likewise.
|
|
Fix-up for recent commit 8935589b496f755e08cadf26d8ceddf0dd6e0968
"OMP lowering: Regimplify 'OMP_CLAUSE_MAP_DECL_MAKE_ADDRESSABLE' DECLs
[PR100280, PR104132, PR104133]": adjust for a GCN offloading workaround
added just before commit: '(volatile void *) &f1;'.
PR testsuite/104791
libgomp/
* testsuite/libgomp.oacc-c-c++-common/kernels-decompose-1.c: Fix
expected diagnostics.
|
|
'--param=openacc-kernels=decompose' [PR104784]
Before recent commit 8935589b496f755e08cadf26d8ceddf0dd6e0968
"OMP lowering: Regimplify 'OMP_CLAUSE_MAP_DECL_MAKE_ADDRESSABLE' DECLs
[PR100280, PR104132, PR104133]", 'libgomp.oacc-c' testing already worked fine,
but 'libgomp.oacc-c++' testing ICEed. Via the commit mentioned, the C++
testing ICEs are now resolved, but the underlying issue remains to be looked
into: PR104784 "OpenACC 'kernels' decomposition: C vs. C++ differences".
PR middle-end/104784
libgomp/
* testsuite/libgomp.oacc-c-c++-common/kernels-private-vars-local-worker-1.c:
Test with '--param=openacc-kernels=decompose'.
* testsuite/libgomp.oacc-c-c++-common/kernels-private-vars-local-worker-2.c:
Likewise.
* testsuite/libgomp.oacc-c-c++-common/kernels-private-vars-local-worker-3.c:
Likewise.
* testsuite/libgomp.oacc-c-c++-common/kernels-private-vars-local-worker-4.c:
Likewise.
* testsuite/libgomp.oacc-c-c++-common/kernels-private-vars-local-worker-5.c:
Likewise.
* testsuite/libgomp.oacc-c-c++-common/kernels-private-vars-loop-gang-1.c:
Likewise.
* testsuite/libgomp.oacc-c-c++-common/kernels-private-vars-loop-gang-2.c:
Likewise.
* testsuite/libgomp.oacc-c-c++-common/kernels-private-vars-loop-gang-3.c:
Likewise.
* testsuite/libgomp.oacc-c-c++-common/kernels-private-vars-loop-gang-4.c:
Likewise.
* testsuite/libgomp.oacc-c-c++-common/kernels-private-vars-loop-gang-5.c:
Likewise.
* testsuite/libgomp.oacc-c-c++-common/kernels-private-vars-loop-gang-6.c:
Likewise.
* testsuite/libgomp.oacc-c-c++-common/kernels-private-vars-loop-vector-1.c:
Likewise.
* testsuite/libgomp.oacc-c-c++-common/kernels-private-vars-loop-vector-2.c:
Likewise.
* testsuite/libgomp.oacc-c-c++-common/kernels-private-vars-loop-worker-1.c:
Likewise.
* testsuite/libgomp.oacc-c-c++-common/kernels-private-vars-loop-worker-2.c:
Likewise.
* testsuite/libgomp.oacc-c-c++-common/kernels-private-vars-loop-worker-3.c:
Likewise.
* testsuite/libgomp.oacc-c-c++-common/kernels-private-vars-loop-worker-4.c:
Likewise.
* testsuite/libgomp.oacc-c-c++-common/kernels-private-vars-loop-worker-5.c:
Likewise.
* testsuite/libgomp.oacc-c-c++-common/kernels-private-vars-loop-worker-6.c:
Likewise.
* testsuite/libgomp.oacc-c-c++-common/kernels-private-vars-loop-worker-7.c:
Likewise.
* testsuite/libgomp.oacc-fortran/kernels-private-vars-loop-gang-1.f90:
Likewise.
* testsuite/libgomp.oacc-fortran/kernels-private-vars-loop-gang-2.f90:
Likewise.
* testsuite/libgomp.oacc-fortran/kernels-private-vars-loop-gang-3.f90:
Likewise.
* testsuite/libgomp.oacc-fortran/kernels-private-vars-loop-gang-6.f90:
Likewise.
* testsuite/libgomp.oacc-fortran/kernels-private-vars-loop-vector-1.f90:
Likewise.
* testsuite/libgomp.oacc-fortran/kernels-private-vars-loop-vector-2.f90:
Likewise.
* testsuite/libgomp.oacc-fortran/kernels-private-vars-loop-worker-1.f90:
Likewise.
* testsuite/libgomp.oacc-fortran/kernels-private-vars-loop-worker-2.f90:
Likewise.
* testsuite/libgomp.oacc-fortran/kernels-private-vars-loop-worker-3.f90:
Likewise.
* testsuite/libgomp.oacc-fortran/kernels-private-vars-loop-worker-4.f90:
Likewise.
* testsuite/libgomp.oacc-fortran/kernels-private-vars-loop-worker-5.f90:
Likewise.
* testsuite/libgomp.oacc-fortran/kernels-private-vars-loop-worker-6.f90:
Likewise.
* testsuite/libgomp.oacc-fortran/kernels-private-vars-loop-worker-7.f90:
Likewise.
|
|
libgomp/
* testsuite/libgomp.oacc-c-c++-common/kernels-private-vars-local-worker-1.c:
Test '-fopt-info-omp-all'.
* testsuite/libgomp.oacc-c-c++-common/kernels-private-vars-local-worker-2.c:
Likewise.
* testsuite/libgomp.oacc-c-c++-common/kernels-private-vars-local-worker-3.c:
Likewise.
* testsuite/libgomp.oacc-c-c++-common/kernels-private-vars-local-worker-4.c:
Likewise.
* testsuite/libgomp.oacc-c-c++-common/kernels-private-vars-local-worker-5.c:
Likewise.
* testsuite/libgomp.oacc-c-c++-common/kernels-private-vars-loop-gang-1.c:
Likewise.
* testsuite/libgomp.oacc-c-c++-common/kernels-private-vars-loop-gang-2.c:
Likewise.
* testsuite/libgomp.oacc-c-c++-common/kernels-private-vars-loop-gang-3.c:
Likewise.
* testsuite/libgomp.oacc-c-c++-common/kernels-private-vars-loop-gang-4.c:
Likewise.
* testsuite/libgomp.oacc-c-c++-common/kernels-private-vars-loop-gang-5.c:
Likewise.
* testsuite/libgomp.oacc-c-c++-common/kernels-private-vars-loop-gang-6.c:
Likewise.
* testsuite/libgomp.oacc-c-c++-common/kernels-private-vars-loop-vector-1.c:
Likewise.
* testsuite/libgomp.oacc-c-c++-common/kernels-private-vars-loop-vector-2.c:
Likewise.
* testsuite/libgomp.oacc-c-c++-common/kernels-private-vars-loop-worker-1.c:
Likewise.
* testsuite/libgomp.oacc-c-c++-common/kernels-private-vars-loop-worker-2.c:
Likewise.
* testsuite/libgomp.oacc-c-c++-common/kernels-private-vars-loop-worker-3.c:
Likewise.
* testsuite/libgomp.oacc-c-c++-common/kernels-private-vars-loop-worker-4.c:
Likewise.
* testsuite/libgomp.oacc-c-c++-common/kernels-private-vars-loop-worker-5.c:
Likewise.
* testsuite/libgomp.oacc-c-c++-common/kernels-private-vars-loop-worker-6.c:
Likewise.
* testsuite/libgomp.oacc-c-c++-common/kernels-private-vars-loop-worker-7.c:
Likewise.
* testsuite/libgomp.oacc-fortran/kernels-private-vars-loop-gang-1.f90:
Likewise.
* testsuite/libgomp.oacc-fortran/kernels-private-vars-loop-gang-2.f90:
Likewise.
* testsuite/libgomp.oacc-fortran/kernels-private-vars-loop-gang-3.f90:
Likewise.
* testsuite/libgomp.oacc-fortran/kernels-private-vars-loop-gang-6.f90:
Likewise.
* testsuite/libgomp.oacc-fortran/kernels-private-vars-loop-vector-1.f90:
Likewise.
* testsuite/libgomp.oacc-fortran/kernels-private-vars-loop-vector-2.f90:
Likewise.
* testsuite/libgomp.oacc-fortran/kernels-private-vars-loop-worker-1.f90:
Likewise.
* testsuite/libgomp.oacc-fortran/kernels-private-vars-loop-worker-2.f90:
Likewise.
* testsuite/libgomp.oacc-fortran/kernels-private-vars-loop-worker-3.f90:
Likewise.
* testsuite/libgomp.oacc-fortran/kernels-private-vars-loop-worker-4.f90:
Likewise.
* testsuite/libgomp.oacc-fortran/kernels-private-vars-loop-worker-5.f90:
Likewise.
* testsuite/libgomp.oacc-fortran/kernels-private-vars-loop-worker-6.f90:
Likewise.
* testsuite/libgomp.oacc-fortran/kernels-private-vars-loop-worker-7.f90:
Likewise.
|
|
[PR100280, PR104132, PR104133]
... by generalizing the existing 'gcc/omp-low.cc:task_shared_vars'.
Fix-up for commit 9b32c1669aad5459dd053424f9967011348add83
"OpenACC 'kernels' decomposition: Mark variables used in
synthesized data clauses as addressable [PR100280]".
PR middle-end/100280
PR middle-end/104132
PR middle-end/104133
gcc/
* omp-low.cc (task_shared_vars): Rename to
'make_addressable_vars'. Adjust all users.
(scan_sharing_clauses) <OMP_CLAUSE_MAP> Use it for
'OMP_CLAUSE_MAP_DECL_MAKE_ADDRESSABLE' DECLs, too.
gcc/testsuite/
* c-c++-common/goacc/kernels-decompose-pr104061-1-3.c: Adjust.
* c-c++-common/goacc/kernels-decompose-pr104061-1-4.c: Likewise.
* c-c++-common/goacc/kernels-decompose-pr104132-1.c: Likewise.
* c-c++-common/goacc/kernels-decompose-pr104133-1.c: Likewise.
libgomp/
* testsuite/libgomp.oacc-c-c++-common/kernels-decompose-1.c:
Extend.
|
|
lowering [PR100280]
... in preparation for later changes. No functional change.
Follow-up to commit 9b32c1669aad5459dd053424f9967011348add83
"OpenACC 'kernels' decomposition: Mark variables used in
synthesized data clauses as addressable [PR100280]".
PR middle-end/100280
gcc/
* tree.h (OMP_CLAUSE_MAP_DECL_MAKE_ADDRESSABLE): New.
* tree-core.h: Document it.
* omp-low.cc (scan_sharing_clauses) <OMP_CLAUSE_MAP>: Handle
'OMP_CLAUSE_MAP_DECL_MAKE_ADDRESSABLE'.
* omp-oacc-kernels-decompose.cc (maybe_build_inner_data_region):
Set 'OMP_CLAUSE_MAP_DECL_MAKE_ADDRESSABLE' instead of
'TREE_ADDRESSABLE'.
gcc/testsuite/
* c-c++-common/goacc/classify-kernels-unparallelized.c: Adjust.
* c-c++-common/goacc/classify-kernels.c: Likewise.
* c-c++-common/goacc/kernels-decompose-2.c: Likewise.
* c-c++-common/goacc/kernels-decompose-pr100280-1.c: Likewise.
* c-c++-common/goacc/kernels-decompose-pr104061-1-2.c: Likewise.
* c-c++-common/goacc/kernels-decompose-pr104061-1-3.c: Likewise.
* c-c++-common/goacc/kernels-decompose-pr104061-1-4.c: Likewise.
* c-c++-common/goacc/kernels-decompose-pr104132-1.c: Likewise.
* c-c++-common/goacc/kernels-decompose-pr104133-1.c: Likewise.
libgomp/
* testsuite/libgomp.oacc-c-c++-common/f-asyncwait-1.c: Adjust.
* testsuite/libgomp.oacc-c-c++-common/kernels-decompose-1.c:
Likewise.
|
|
declared in block made addressable" [PR100280]
Follow-up to commit 9b32c1669aad5459dd053424f9967011348add83
"OpenACC 'kernels' decomposition: Mark variables used in
synthesized data clauses as addressable [PR100280]".
PR middle-end/100280
gcc/
* omp-oacc-kernels-decompose.cc (maybe_build_inner_data_region):
Add diagnostic: "note: OpenACC 'kernels' decomposition: variable
'[...]' declared in block made addressable".
gcc/testsuite/
* c-c++-common/goacc/classify-kernels-unparallelized.c: Add
'--param=openacc-privatization=noisy'.
* c-c++-common/goacc/classify-kernels.c: Likewise.
* c-c++-common/goacc/kernels-decompose-2.c: Adjust.
* c-c++-common/goacc/kernels-decompose-pr100280-1.c: Likewise.
* c-c++-common/goacc/kernels-decompose-pr104061-1-2.c: Likewise.
* c-c++-common/goacc/kernels-decompose-pr104061-1-3.c: Likewise.
* c-c++-common/goacc/kernels-decompose-pr104061-1-4.c: Likewise.
* c-c++-common/goacc/kernels-decompose-pr104132-1.c: Likewise.
* c-c++-common/goacc/kernels-decompose-pr104133-1.c: Likewise.
libgomp/
* testsuite/libgomp.oacc-c-c++-common/f-asyncwait-1.c: Adjust.
* testsuite/libgomp.oacc-c-c++-common/kernels-decompose-1.c:
Likewise.
|
|
The ptx insn atom doesn't support local memory. In case of doing an atomic
operation on local memory, we run into:
...
operation not supported on global/shared address space
...
This is the cuGetErrorString message for CUDA_ERROR_INVALID_ADDRESS_SPACE.
The message is somewhat confusing given that actually the operation is not
supported on local address space.
Fix this by falling back on a non-atomic version when detecting
a frame-related memory operand.
This only solves some cases that are detected at compile-time. It does
however fix the openacc private-atomic-* test-cases.
Tested on x86_64 with nvptx accelerator.
gcc/ChangeLog:
2022-01-27 Tom de Vries <tdevries@suse.de>
* config/nvptx/nvptx.md (define_insn "atomic_compare_and_swap<mode>_1")
(define_insn "atomic_exchange<mode>")
(define_insn "atomic_fetch_add<mode>")
(define_insn "atomic_fetch_addsf")
(define_insn "atomic_fetch_<logic><mode>"): Output non-atomic version
if memory operands is frame-relative.
gcc/testsuite/ChangeLog:
2022-01-31 Tom de Vries <tdevries@suse.de>
* gcc.target/nvptx/stack-atomics-run.c: New test.
libgomp/ChangeLog:
2022-01-27 Tom de Vries <tdevries@suse.de>
* testsuite/libgomp.oacc-c-c++-common/private-atomic-1.c: Remove
PR83812 workaround.
* testsuite/libgomp.oacc-fortran/private-atomic-1-vector.f90: Same.
* testsuite/libgomp.oacc-fortran/private-atomic-1-worker.f90: Same.
|
|
When running libgomp test-case broadcast-many.c on an nvptx accelerator
(T400, driver version 470.86), I run into:
...
libgomp: The Nvidia accelerator has insufficient resources to launch \
'main$_omp_fn$0' with num_workers = 32 and vector_length = 32; \
recompile the program with 'num_workers = x and vector_length = y' on \
that offloaded region or '-fopenacc-dim=:x:y' where x * y <= 896.
FAIL: libgomp.oacc-c/../libgomp.oacc-c-c++-common/broadcast-many.c \
-DACC_DEVICE_TYPE_nvidia=1 -DACC_MEM_SHARED=0 -foffload=nvptx-none \
-O0 execution test
...
The error does not occur when using GOMP_NVPTX_JIT=-O0.
Fix this by using 896 / 32 == 28 workers for ACC_DEVICE_TYPE_nvidia.
Likewise for some other test-cases.
Tested libgomp on x86_64 with nvptx accelerator.
libgomp/ChangeLog:
2022-01-27 Tom de Vries <tdevries@suse.de>
* testsuite/libgomp.oacc-c-c++-common/broadcast-many.c: Reduce
num_workers for nvidia accelerator to fix libgomp error 'insufficient
resources'.
* testsuite/libgomp.oacc-c-c++-common/par-loop-comb-reduction-4.c:
Same.
* testsuite/libgomp.oacc-c-c++-common/reduction-7.c: Same.
|
|
Rather than rubber-stamp whatever requested vs. actual device kernel launch
configuration happens, actually (again) verify the requested values (modulo
expected variations).
This better highlights that "AMD GCN has an upper limit of 'num_workers(16)'",
and the deficiency that "AMD GCN uses the autovectorizer for the vector
dimension: the use of a function call in vector-partitioned code [...] is not
currently supported".
And, this removes several instances of race conditions, where variables are
concurrently written to in OpenACC gang-redundant mode.
libgomp/
* testsuite/libgomp.oacc-c-c++-common/loop-gwv-1.c: Strengthen.
* testsuite/libgomp.oacc-c-c++-common/loop-gwv-2.c: Likewise.
* testsuite/libgomp.oacc-c-c++-common/loop-red-gwv-1.c: Likewise.
* testsuite/libgomp.oacc-c-c++-common/loop-red-v-1.c: Likewise.
* testsuite/libgomp.oacc-c-c++-common/loop-red-v-2.c: Likewise.
* testsuite/libgomp.oacc-c-c++-common/loop-red-w-1.c: Likewise.
* testsuite/libgomp.oacc-c-c++-common/loop-red-w-2.c: Likewise.
* testsuite/libgomp.oacc-c-c++-common/loop-red-wv-1.c: Likewise.
* testsuite/libgomp.oacc-c-c++-common/loop-v-1.c: Likewise.
* testsuite/libgomp.oacc-c-c++-common/loop-w-1.c: Likewise.
* testsuite/libgomp.oacc-c-c++-common/loop-wv-1.c: Likewise.
* testsuite/libgomp.oacc-c-c++-common/routine-gwv-1.c: Likewise.
* testsuite/libgomp.oacc-c-c++-common/routine-v-1.c: Likewise.
* testsuite/libgomp.oacc-c-c++-common/routine-w-1.c: Likewise.
* testsuite/libgomp.oacc-c-c++-common/routine-wv-1.c: Likewise.
|
|
gcc/ChangeLog:
* config/nvptx/nvptx.cc (nvptx_goacc_validate_dims_1): Update
warning messages.
libgomp/ChangeLog:
* testsuite/libgomp.oacc-c++/privatized-ref-2.C: Update scanning
patterns.
* testsuite/libgomp.oacc-c++/privatized-ref-3.C: Likewise.
* testsuite/libgomp.oacc-c-c++-common/acc_prof-kernels-1.c:
Likewise.
* testsuite/libgomp.oacc-c-c++-common/kernels-loop-2.c: Likewise.
* testsuite/libgomp.oacc-c-c++-common/parallel-dims.c: Likewise.
* testsuite/libgomp.oacc-c-c++-common/pr85486.c: Likewise.
* testsuite/libgomp.oacc-c-c++-common/pr95270-1.c: Likewise.
* testsuite/libgomp.oacc-c-c++-common/routine-nohost-2.c:
Likewise.
* testsuite/libgomp.oacc-c-c++-common/struct-copyout-1.c:
Likewise.
* testsuite/libgomp.oacc-c-c++-common/struct-copyout-2.c:
Likewise.
* testsuite/libgomp.oacc-c-c++-common/vector-length-64-1.c:
Likewise.
* testsuite/libgomp.oacc-fortran/attach-descriptor-1.f90:
Likewise.
* testsuite/libgomp.oacc-fortran/derivedtypes-arrays-1.f90:
Likewise.
* testsuite/libgomp.oacc-fortran/kernels-loop-2.f95: Likewise.
* testsuite/libgomp.oacc-fortran/parallel-dims.f90: Likewise.
* testsuite/libgomp.oacc-fortran/privatized-ref-1.f95: Likewise.
Co-authored-by: Thomas Schwinge <thomas@codesourcery.com>
|
|
gcc/ChangeLog:
* config/nvptx/nvptx.cc (nvptx_goacc_validate_dims_1): Wrap
keyword.
* config/nvptx/nvptx.md: Remove trailing dot.
libgomp/ChangeLog:
* testsuite/libgomp.oacc-c++/privatized-ref-2.C: Update keyword
in dg-warning.
* testsuite/libgomp.oacc-c++/privatized-ref-3.C: Likewise.
* testsuite/libgomp.oacc-c-c++-common/acc_prof-kernels-1.c: Likewise.
* testsuite/libgomp.oacc-c-c++-common/kernels-loop-2.c: Likewise.
* testsuite/libgomp.oacc-c-c++-common/parallel-dims.c: Likewise.
* testsuite/libgomp.oacc-c-c++-common/pr85486.c: Likewise.
* testsuite/libgomp.oacc-c-c++-common/pr95270-1.c: Likewise.
* testsuite/libgomp.oacc-c-c++-common/routine-nohost-2.c: Likewise.
* testsuite/libgomp.oacc-c-c++-common/struct-copyout-1.c: Likewise.
* testsuite/libgomp.oacc-c-c++-common/struct-copyout-2.c: Likewise.
* testsuite/libgomp.oacc-c-c++-common/vector-length-64-1.c: Likewise.
* testsuite/libgomp.oacc-fortran/attach-descriptor-1.f90: Likewise.
* testsuite/libgomp.oacc-fortran/derivedtypes-arrays-1.f90: Likewise.
* testsuite/libgomp.oacc-fortran/kernels-loop-2.f95: Likewise.
* testsuite/libgomp.oacc-fortran/parallel-dims.f90: Likewise.
* testsuite/libgomp.oacc-fortran/privatized-ref-1.f95: Likewise.
|
|
OpenACC test cases
... including "note: '[...]' was declared here" emitted since recent
commit 9695e1c23be5b5c55d572ced152897313ddb96ae
"Improve -Wuninitialized note location".
For those that seemed incorrect to me, I've placed XFAILed 'dg-bogus'es,
including one more instance of PR77504 etc., and several instances where
for "local variables" of reference-data-type reductions (etc.?) we emit
bogus (?) diagnostics.
For implicit data clauses (including 'firstprivate'), we seem to be missing
diagnostics, so I've placed XFAILed 'dg-warning's.
gcc/testsuite/
* c-c++-common/goacc/builtin-goacc-parlevel-id-size.c: Document
current '-Wuninitialized' diagnostics.
* c-c++-common/goacc/mdc-1.c: Likewise.
* c-c++-common/goacc/nested-reductions-1-kernels.c: Likewise.
* c-c++-common/goacc/nested-reductions-1-parallel.c: Likewise.
* c-c++-common/goacc/nested-reductions-1-routine.c: Likewise.
* c-c++-common/goacc/nested-reductions-2-kernels.c: Likewise.
* c-c++-common/goacc/nested-reductions-2-parallel.c: Likewise.
* c-c++-common/goacc/nested-reductions-2-routine.c: Likewise.
* c-c++-common/goacc/uninit-dim-clause.c: Likewise.
* c-c++-common/goacc/uninit-firstprivate-clause.c: Likewise.
* c-c++-common/goacc/uninit-if-clause.c: Likewise.
* gfortran.dg/goacc/array-with-dt-1.f90: Likewise.
* gfortran.dg/goacc/array-with-dt-2.f90: Likewise.
* gfortran.dg/goacc/array-with-dt-3.f90: Likewise.
* gfortran.dg/goacc/array-with-dt-4.f90: Likewise.
* gfortran.dg/goacc/array-with-dt-5.f90: Likewise.
* gfortran.dg/goacc/derived-chartypes-1.f90: Likewise.
* gfortran.dg/goacc/derived-chartypes-2.f90: Likewise.
* gfortran.dg/goacc/derived-chartypes-3.f90: Likewise.
* gfortran.dg/goacc/derived-chartypes-4.f90: Likewise.
* gfortran.dg/goacc/derived-classtypes-1.f95: Likewise.
* gfortran.dg/goacc/derived-types-2.f90: Likewise.
* gfortran.dg/goacc/host_data-tree.f95: Likewise.
* gfortran.dg/goacc/kernels-tree.f95: Likewise.
* gfortran.dg/goacc/modules.f95: Likewise.
* gfortran.dg/goacc/nested-reductions-1-kernels.f90: Likewise.
* gfortran.dg/goacc/nested-reductions-1-parallel.f90: Likewise.
* gfortran.dg/goacc/nested-reductions-1-routine.f90: Likewise.
* gfortran.dg/goacc/nested-reductions-2-kernels.f90: Likewise.
* gfortran.dg/goacc/nested-reductions-2-parallel.f90: Likewise.
* gfortran.dg/goacc/nested-reductions-2-routine.f90: Likewise.
* gfortran.dg/goacc/parallel-tree.f95: Likewise.
* gfortran.dg/goacc/pr93464.f90: Likewise.
* gfortran.dg/goacc/privatization-1-compute-loop.f90: Likewise.
* gfortran.dg/goacc/privatization-1-compute.f90: Likewise.
* gfortran.dg/goacc/privatization-1-routine_gang-loop.f90:
Likewise.
* gfortran.dg/goacc/privatization-1-routine_gang.f90: Likewise.
* gfortran.dg/goacc/uninit-dim-clause.f95: Likewise.
* gfortran.dg/goacc/uninit-firstprivate-clause.f95: Likewise.
* gfortran.dg/goacc/uninit-if-clause.f95: Likewise.
* gfortran.dg/goacc/uninit-use-device-clause.f95: Likewise.
* gfortran.dg/goacc/wait.f90: Likewise.
libgomp/
* testsuite/libgomp.oacc-c-c++-common/vred2d-128.c: Document
current '-Wuninitialized' diagnostics.
* testsuite/libgomp.oacc-fortran/data-5.f90: Likewise.
* testsuite/libgomp.oacc-fortran/gemm-2.f90: Likewise.
* testsuite/libgomp.oacc-fortran/gemm.f90: Likewise.
* testsuite/libgomp.oacc-fortran/optional-reduction.f90: Likewise.
* testsuite/libgomp.oacc-fortran/parallel-reduction.f90: Likewise.
* testsuite/libgomp.oacc-fortran/pr70643.f90: Likewise.
* testsuite/libgomp.oacc-fortran/pr96628-part1.f90: Likewise.
* testsuite/libgomp.oacc-fortran/privatized-ref-2.f90: Likewise.
* testsuite/libgomp.oacc-fortran/reduction-5.f90: Likewise.
* testsuite/libgomp.oacc-fortran/reduction-7.f90: Likewise.
* testsuite/libgomp.oacc-fortran/reference-reductions.f90:
Likewise.
|
|
In OpenACC 'kernels' decomposition, we're improperly nesting synchronous and
asynchronous data and compute regions, giving rise to data races when the
asynchronicity is actually executed, as is visible in at least on test case
with GCN offloading.
The proper fix is to correctly use the asynchronous interfaces, making the
currently synchronous data regions fully asynchronous (see also
<https://gcc.gnu.org/PR97390> "[OpenACC] 'async' clause on 'data' construct",
which is to share the same implementation), but that's for later; for now add
some more synchronization.
gcc/
* omp-oacc-kernels-decompose.cc (add_wait): New function, split out
of...
(add_async_clauses_and_wait): ...here. Call new outlined function.
(decompose_kernels_region_body): Add wait at the end of
explicitly-asynchronous kernels regions.
libgomp/
* testsuite/libgomp.oacc-c-c++-common/f-asyncwait-1.c: Remove GCN
offloading execution XFAIL.
Co-Authored-By: Thomas Schwinge <thomas@codesourcery.com>
|
|
clauses as addressable [PR100280]
... as otherwise 'gcc/omp-low.c:lower_omp_target' has to create a temporary:
13073 else if (is_gimple_reg (var))
13074 {
13075 gcc_assert (offloaded);
13076 tree avar = create_tmp_var (TREE_TYPE (var));
13077 mark_addressable (avar);
..., which (a) is only implemented for actualy *offloaded* regions (but not
data regions), and (b) the subsequently synthesized code for writing to and
later reading back from the temporary fundamentally conflicts with OpenACC
'async' (as used by OpenACC 'kernels' decomposition). That's all not trivial
to make work, so let's just avoid this case.
gcc/
PR middle-end/100280
* omp-oacc-kernels-decompose.cc (maybe_build_inner_data_region):
Mark variables used in synthesized data clauses as addressable.
gcc/testsuite/
PR middle-end/100280
* c-c++-common/goacc/kernels-decompose-pr100280-1.c: New.
* c-c++-common/goacc/classify-kernels-parloops.c: Likewise.
* c-c++-common/goacc/classify-kernels-unparallelized-parloops.c:
Likewise.
* c-c++-common/goacc/classify-kernels-unparallelized.c: Test
'--param openacc-kernels=decompose'.
* c-c++-common/goacc/classify-kernels.c: Likewise.
* c-c++-common/goacc/kernels-decompose-2.c: Update.
* c-c++-common/goacc/kernels-decompose-ice-1.c: Remove.
* c-c++-common/goacc/kernels-decompose-ice-2.c: Likewise.
* gfortran.dg/goacc/classify-kernels-parloops.f95: New.
* gfortran.dg/goacc/classify-kernels-unparallelized-parloops.f95:
Likewise.
* gfortran.dg/goacc/classify-kernels-unparallelized.f95: Test
'--param openacc-kernels=decompose'.
* gfortran.dg/goacc/classify-kernels.f95: Likewise.
libgomp/
PR middle-end/100280
* testsuite/libgomp.oacc-c-c++-common/declare-vla-kernels-decompose-ice-1.c:
Update.
* testsuite/libgomp.oacc-c-c++-common/f-asyncwait-1.c: Likewise.
* testsuite/libgomp.oacc-c-c++-common/kernels-decompose-1.c:
Likewise.
Suggested-by: Julian Brown <julian@codesourcery.com>
|
|
gcc/testsuite/
* c-c++-common/goacc/kernels-decompose-1.c: Enhance.
* c-c++-common/goacc/kernels-decompose-2.c: Likewise.
* c-c++-common/goacc/kernels-decompose-ice-1.c: Likewise.
* c-c++-common/goacc/kernels-decompose-ice-2.c: Likewise.
* gfortran.dg/goacc/kernels-decompose-1.f95: Likewise.
* gfortran.dg/goacc/kernels-decompose-2.f95: Likewise.
libgomp/
* testsuite/libgomp.oacc-c-c++-common/declare-vla-kernels-decompose-ice-1.c:
Enhance.
* testsuite/libgomp.oacc-c-c++-common/declare-vla-kernels-decompose.c:
Likewise.
* testsuite/libgomp.oacc-c-c++-common/declare-vla.c: Likewise.
* testsuite/libgomp.oacc-c-c++-common/f-asyncwait-1.c: Likewise.
* testsuite/libgomp.oacc-c-c++-common/f-asyncwait-2.c: Likewise.
* testsuite/libgomp.oacc-c-c++-common/f-asyncwait-3.c: Likewise.
* testsuite/libgomp.oacc-c-c++-common/kernels-decompose-1.c:
Likewise.
* testsuite/libgomp.oacc-fortran/asyncwait-1.f90: Likewise.
* testsuite/libgomp.oacc-fortran/asyncwait-2.f90: Likewise.
* testsuite/libgomp.oacc-fortran/asyncwait-3.f90: Likewise.
* testsuite/libgomp.oacc-fortran/pr94358-1.f90: Likewise.
|
|
Some systems do not have <alloca.h> but provide alloca differently, e.g.
via stdlib.h. Do it like other testcases do and use __builtin_alloca.
libgomp/ChangeLog:
PR testsuite/102910
* testsuite/libgomp.oacc-c-c++-common/loop-gwv-2.c: Use __builtin_alloca
instead of #include <alloca.h> + alloca.
|
|
This patch implements an algorithm to lay out local data-share (LDS)
space. It currently works for AMD GCN. At the moment, LDS is used for
three things:
1. Gang-private variables
2. Reduction temporaries (accumulators)
3. Broadcasting for worker partitioning
After the patch is applied, (2) and (3) are placed at preallocated
locations in LDS, and (1) continues to be handled by the backend (as it
is at present prior to this patch being applied). LDS now looks like this:
+--------------+ (gang-private size + 1024, = 1536)
| free space |
| ... |
| - - - - - - -|
| worker bcast |
+--------------+
| reductions |
+--------------+ <<< -mgang-private-size=<number> (def. 512)
| gang-private |
| vars |
+--------------+ (32)
| low LDS vars |
+--------------+ LDS base
So, gang-private space is fixed at a constant amount at compile time
(which can be increased with a command-line switch if necessary
for some given code). The layout algorithm takes out a slice of the
remainder of usable space for reduction vars, and uses the rest for
worker partitioning.
The partitioning algorithm works as follows.
1. An "adjacency" set is built up for each basic block that might
do a broadcast. This is calculated by starting at each such block,
and doing a recursive DFS walk over successors to find the next
block (or blocks) that *also* does a broadcast
(dfs_broadcast_reachable_1).
2. The adjacency set is inverted to get adjacent predecessor blocks also.
3. Blocks that will perform a broadcast are sorted by size of that
broadcast: the biggest blocks are handled first.
4. A splay tree structure is used to calculate the spans of LDS memory
that are already allocated by the blocks adjacent to this one
(merge_ranges{,_1}.
5. The current block's broadcast space is allocated from the first free
span not allocated in the splay tree structure calculated above
(first_fit_range). This seems to work quite nicely and efficiently
with the splay tree structure.
6. Continue with the next-biggest broadcast block until we're done.
In this way, "adjacent" broadcasts will not use the same piece of
LDS memory.
PR96334 "openacc: Unshare reduction temporaries for GCN" got merged in:
The GCN backend uses tree nodes like MEM((__lds TYPE *) <constant>)
for reduction temporaries. Unlike e.g. var decls and SSA names, these
nodes cannot be shared during gimplification, but are so in some
circumstances. This is detected when appropriate --enable-checking
options are used. This patch unshares such nodes when they are reused
more than once.
gcc/
* config/gcn/gcn-protos.h
(gcn_goacc_create_worker_broadcast_record): Update prototype.
* config/gcn/gcn-tree.c (gcn_goacc_get_worker_red_decl): Use
preallocated block of LDS memory. Do not cache/share decls for
reduction temporaries between invocations.
(gcn_goacc_reduction_teardown): Unshare VAR on second use.
(gcn_goacc_create_worker_broadcast_record): Add OFFSET parameter
and return temporary LDS space at that offset. Return pointer in
"sender" case.
* config/gcn/gcn.c (acc_lds_size, gang_private_hwm, lds_allocs):
New global vars.
(ACC_LDS_SIZE): Define as acc_lds_size.
(gcn_init_machine_status): Don't initialise lds_allocated,
lds_allocs, reduc_decls fields of machine function struct.
(gcn_option_override): Handle default size for gang-private
variables and -mgang-private-size option.
(gcn_expand_prologue): Use LDS_SIZE instead of LDS_SIZE-1 when
initialising M0_REG.
(gcn_shared_mem_layout): New function.
(gcn_print_lds_decl): Update comment. Use global lds_allocs map and
gang_private_hwm variable.
(TARGET_GOACC_SHARED_MEM_LAYOUT): Define target hook.
* config/gcn/gcn.h (machine_function): Remove lds_allocated,
lds_allocs, reduc_decls. Add reduction_base, reduction_limit.
* config/gcn/gcn.opt (gang_private_size_opt): New global.
(mgang-private-size=): New option.
* doc/tm.texi.in (TARGET_GOACC_SHARED_MEM_LAYOUT): Place
documentation hook.
* doc/tm.texi: Regenerate.
* omp-oacc-neuter-broadcast.cc (targhooks.h, diagnostic-core.h):
Add includes.
(build_sender_ref): Handle sender_decl being pointer.
(worker_single_copy): Add PLACEMENT and ISOLATE_BROADCASTS
parameters. Pass placement argument to
create_worker_broadcast_record hook invocations. Handle
sender_decl being pointer and isolate_broadcasts inserting extra
barriers.
(blk_offset_map_t): Add typedef.
(neuter_worker_single): Add BLK_OFFSET_MAP parameter. Pass
preallocated range to worker_single_copy call.
(dfs_broadcast_reachable_1): New function.
(idx_decl_pair_t, used_range_vec_t): New typedefs.
(sort_size_descending): New function.
(addr_range): New class.
(splay_tree_compare_addr_range, splay_tree_free_key)
(first_fit_range, merge_ranges_1, merge_ranges): New functions.
(execute_omp_oacc_neuter_broadcast): Rename to...
(oacc_do_neutering): ... this. Add BOUNDS_LO, BOUNDS_HI
parameters. Arrange layout of shared memory for broadcast
operations.
(execute_omp_oacc_neuter_broadcast): New function.
(pass_omp_oacc_neuter_broadcast::gate): Remove num_workers==1
handling from here. Enable pass for all OpenACC routines in order
to call shared memory-layout hook.
* target.def (create_worker_broadcast_record): Add OFFSET
parameter.
(shared_mem_layout): New hook.
libgomp/
* testsuite/libgomp.oacc-c-c++-common/broadcast-many.c: Update.
|
|
libgomp/
* testsuite/libgomp.oacc-c-c++-common/broadcast-many.c: New test.
|
|
[...]/libgomp.oacc-c-c++-common/mode-transitions.c: In function ‘t3’:
[...]/libgomp.oacc-c-c++-common/mode-transitions.c:127:43: warning: ‘?:’ using integer constants in boolean context, the expression will always evaluate to ‘true’ [-Wint-in-bool-context]
127 | assert (arr[i] == ((i % 64) < 32) ? 1 : -1);
| ^
[...]/libgomp.oacc-c-c++-common/mode-transitions.c: In function ‘t9’:
[...]/libgomp.oacc-c-c++-common/mode-transitions.c:359:46: warning: ‘?:’ using integer constants in boolean context, the expression will always evaluate to ‘true’ [-Wint-in-bool-context]
359 | assert (arr[i] == ((i % 3) == 0) ? 1 : 2);
| ^
..., and PR101862 "[C, C++] Potential '?:' diagnostic for always-true
expressions in boolean context".
libgomp/
* testsuite/libgomp.oacc-c-c++-common/mode-transitions.c: Address
'?:' issues.
|
|
... for 'gcc/gimplify.c:gimplify_scan_omp_clauses' changes in recent
commit d0befed793b94f3f407be44e6f69f81a02f5f073 "openmp: Add support
for OpenMP 5.1 masked construct".
libgomp/
* testsuite/libgomp.oacc-c-c++-common/static-variable-1.c: Adjust.
|