aboutsummaryrefslogtreecommitdiff
path: root/libgomp/testsuite/libgomp.oacc-c-c++-common
AgeCommit message (Collapse)AuthorFilesLines
2024-12-06Fortran: Use OpenACC's acc_on_device builtin, fix OpenMP' ↵Thomas Schwinge1-1/+2
__builtin_is_initial_device: Revert 'gimple_fold_builtin_acc_on_device' change The motivation of the 'gimple_fold_builtin_acc_on_device' change in commit 3269a722b7a03613e9c4e2862bc5088c4a17cc11 "Fortran: Use OpenACC's acc_on_device builtin, fix OpenMP' __builtin_is_initial_device" is unclear, and it unnecessarily diverges GCC's (default) '--disable-offload-targets' vs. '--enable-offload-targets=[...]' configurations. PR testsuite/82250 gcc/ * gimple-fold.cc (gimple_fold_builtin_acc_on_device): Revert last change. libgomp/ * testsuite/libgomp.oacc-c-c++-common/routine-nohost-1.c: Revert last change.
2024-11-28Fix 'libgomp.oacc-c/../libgomp.oacc-c-c++-common/acc_get_property-gcn.c' for ↵Thomas Schwinge1-3/+0
C23 default With commit 55e3bd376b2214e200fa76d12b67ff259b06c212 "c: Default to -std=gnu23" we've got: [-PASS:-]{+FAIL:+} libgomp.oacc-c/../libgomp.oacc-c-c++-common/acc_get_property-gcn.c -DACC_DEVICE_TYPE_radeon=1 -DACC_MEM_SHARED=0 -foffload=amdgcn-amdhsa -O0 (test for excess errors) [-PASS:-]{+UNRESOLVED:+} libgomp.oacc-c/../libgomp.oacc-c-c++-common/acc_get_property-gcn.c -DACC_DEVICE_TYPE_radeon=1 -DACC_MEM_SHARED=0 -foffload=amdgcn-amdhsa -O0 [-execution test-]{+compilation failed to produce executable+} [Etc.] ..., due to: [...]/libgomp.oacc-c/../libgomp.oacc-c-c++-common/acc_get_property-gcn.c:16:13: error: two or more data types in declaration specifiers [...]/libgomp.oacc-c/../libgomp.oacc-c-c++-common/acc_get_property-gcn.c:16:1: warning: useless type name in empty declaration libgomp/ * testsuite/libgomp.oacc-c-c++-common/acc_get_property-gcn.c [!__cplusplus]: Don't 'typedef int bool;'.
2024-10-13Fortran: Use OpenACC's acc_on_device builtin, fix OpenMP' ↵Tobias Burnus1-2/+1
__builtin_is_initial_device It turned out that 'if (omp_is_initial_device() .eqv. true)' gave an ICE due to comparing 'int' with 'logical(4)'. When digging deeper, it also turned out that when the procedure pointer is needed, the builtin cannot be used, either. (Follow up to r15-2799-gf1bfba3a9b3f31 ) Extend the code to also use the builtin acc_on_device with OpenACC, which was previously only used in C/C++. Additionally, fix folding when offloading is not enabled. Fixes additionally the BT_BOOL data type, which was 'char'/integer(1) instead of bool, backing the booleaness; use bool_type_node as the rest of GCC. gcc/fortran/ChangeLog: * gfortran.h (gfc_option_t): Add disable_acc_on_device. * options.cc (gfc_handle_option): Handle -fno-builtin-acc_on_device. * trans-decl.cc (gfc_get_extern_function_decl): Move __builtin_omp_is_initial_device handling to ... * trans-expr.cc (get_builtin_fn): ... this new function. (conv_function_val): Call it. (update_builtin_function): New. (gfc_conv_procedure_call): Call it. * types.def (BT_BOOL): Fix type by using bool_type_node. gcc/ChangeLog: * gimple-fold.cc (gimple_fold_builtin_acc_on_device): Also fold when offloading is not configured. libgomp/ChangeLog: * libgomp.texi (TR13): Fix minor typos. (omp_is_initial_device): Improve wording. (acc_on_device): Note how to disable the builtin. * testsuite/libgomp.oacc-fortran/acc_on_device-1-1.f90: Remove TODO. * testsuite/libgomp.oacc-fortran/acc_on_device-1-2.f: Likewise. Add -fno-builtin-acc_on_device. * testsuite/libgomp.oacc-fortran/acc_on_device-1-3.f: Likewise. * testsuite/libgomp.oacc-c-c++-common/routine-nohost-1.c: Update dg- as !offloading_enabled now compile-time expands acc_on_device. * testsuite/libgomp.fortran/target-is-initial-device-3.f90: New test. * testsuite/libgomp.oacc-fortran/acc_on_device-2.f90: New test.
2024-04-16OpenACC 2.7: Adjust acc_map_data/acc_unmap_data interaction with reference ↵Chung-Lin Tang2-1/+37
counters This patch adjusts the implementation of acc_map_data/acc_unmap_data API library routines to more fit the description in the OpenACC 2.7 specification. Instead of using REFCOUNT_INFINITY, we now define a REFCOUNT_ACC_MAP_DATA special value to mark acc_map_data-created mappings. Adjustment around mapping related code to respect OpenACC semantics are also added. libgomp/ChangeLog: * libgomp.h (REFCOUNT_ACC_MAP_DATA): Define as (REFCOUNT_SPECIAL | 2). * oacc-mem.c (acc_map_data): Adjust to use REFCOUNT_ACC_MAP_DATA, initialize dynamic_refcount as 1. (acc_unmap_data): Adjust to use REFCOUNT_ACC_MAP_DATA, (goacc_map_var_existing): Add REFCOUNT_ACC_MAP_DATA case. (goacc_exit_datum_1): Add REFCOUNT_ACC_MAP_DATA case, respect REFCOUNT_ACC_MAP_DATA when decrementing/finalizing. Force lowest dynamic_refcount to be 1 for REFCOUNT_ACC_MAP_DATA. (goacc_enter_data_internal): Add REFCOUNT_ACC_MAP_DATA case. * target.c (gomp_increment_refcount): Return early for REFCOUNT_ACC_MAP_DATA case. (gomp_decrement_refcount): Likewise. * testsuite/libgomp.oacc-c-c++-common/lib-96.c: New testcase. * testsuite/libgomp.oacc-c-c++-common/unmap-infinity-1.c: Adjust testcase error output scan test.
2024-02-27OpenACC: Add Fortran routines ↵Tobias Burnus3-0/+6
acc_{alloc,free,hostptr,deviceptr,memcpy_{to,from}_device*} These routines map simply to the C counterpart and are meanwhile defined in OpenACC 3.3. (There are additional routine changes, including the Fortran addition of acc_attach/acc_detach, that require more work than a simple addition of an interface and are therefore excluded.) libgomp/ChangeLog: * libgomp.texi (OpenACC Runtime Library Routines): Document new 3.3 routines that simply map to their C counterpart. * openacc.f90 (openacc): Add them. * openacc_lib.h: Likewise. * testsuite/libgomp.oacc-fortran/acc_host_device_ptr.f90: New test. * testsuite/libgomp.oacc-fortran/acc-memcpy.f90: New test. * testsuite/libgomp.oacc-fortran/acc-memcpy-2.f90: New test. * testsuite/libgomp.oacc-c-c++-common/lib-59.c: Crossref to f90 test. * testsuite/libgomp.oacc-c-c++-common/lib-60.c: Likewise. * testsuite/libgomp.oacc-c-c++-common/lib-95.c: Likewise.
2023-10-31Add OpenACC 'acc_map_data' variant to 'libgomp.oacc-c-c++-common/deep-copy-8.c'Thomas Schwinge1-2/+27
libgomp/ * testsuite/libgomp.oacc-c-c++-common/deep-copy-8.c: Add OpenACC 'acc_map_data' variant.
2023-10-25Extend test suite coverage for OpenACC 'self' clause for compute constructsThomas Schwinge3-0/+45
... on top of what was provided in recent commit 3a3596389c2e539cb8fd5dc5784a4e2afe193a2a "OpenACC 2.7: Implement self clause for compute constructs". gcc/testsuite/ * c-c++-common/goacc/if-clause-2.c: Enhance. * c-c++-common/goacc/self-clause-1.c: Likewise. * c-c++-common/goacc/self-clause-2.c: Likewise. * gfortran.dg/goacc/if.f95: Likewise. * gfortran.dg/goacc/kernels-tree.f95: Likewise. * gfortran.dg/goacc/parallel-tree.f95: Likewise. * gfortran.dg/goacc/self.f95: Likewise. libgomp/ * testsuite/libgomp.oacc-c-c++-common/if-1.c: Enhance. * testsuite/libgomp.oacc-c-c++-common/self-1.c: Likewise. * testsuite/libgomp.oacc-fortran/if-1.f90: Likewise. * testsuite/libgomp.oacc-c-c++-common/if-self-1.c: New. * testsuite/libgomp.oacc-fortran/self-1.f90: Likewise.
2023-10-25OpenACC 2.7: Implement self clause for compute constructsChung-Lin Tang1-0/+962
This patch implements the 'self' clause for compute constructs: parallel, kernels, and serial. This clause conditionally uses the local device (the host mult-core CPU) as the executing device of the compute region. The actual implementation of the "local device" device type inside libgomp (presumably using pthreads) is still not yet completed, so the libgomp side is still implemented the exact same as host-fallback mode. (so as of now, it essentially behaves like the 'if' clause with the condition inverted) gcc/c/ChangeLog: * c-parser.cc (c_parser_oacc_compute_clause_self): New function. (c_parser_oacc_all_clauses): Add new 'bool compute_p = false' parameter, add parsing of self clause when compute_p is true. (OACC_KERNELS_CLAUSE_MASK): Add PRAGMA_OACC_CLAUSE_SELF. (OACC_PARALLEL_CLAUSE_MASK): Likewise, (OACC_SERIAL_CLAUSE_MASK): Likewise. (c_parser_oacc_compute): Adjust call to c_parser_oacc_all_clauses to set compute_p argument to true. * c-typeck.cc (c_finish_omp_clauses): Add OMP_CLAUSE_SELF case. gcc/cp/ChangeLog: * parser.cc (cp_parser_oacc_compute_clause_self): New function. (cp_parser_oacc_all_clauses): Add new 'bool compute_p = false' parameter, add parsing of self clause when compute_p is true. (OACC_KERNELS_CLAUSE_MASK): Add PRAGMA_OACC_CLAUSE_SELF. (OACC_PARALLEL_CLAUSE_MASK): Likewise, (OACC_SERIAL_CLAUSE_MASK): Likewise. (cp_parser_oacc_compute): Adjust call to c_parser_oacc_all_clauses to set compute_p argument to true. * pt.cc (tsubst_omp_clauses): Add OMP_CLAUSE_SELF case. * semantics.cc (c_finish_omp_clauses): Add OMP_CLAUSE_SELF case, merged with OMP_CLAUSE_IF case. gcc/fortran/ChangeLog: * gfortran.h (typedef struct gfc_omp_clauses): Add self_expr field. * openmp.cc (enum omp_mask2): Add OMP_CLAUSE_SELF. (gfc_match_omp_clauses): Add handling for OMP_CLAUSE_SELF. (OACC_PARALLEL_CLAUSES): Add OMP_CLAUSE_SELF. (OACC_KERNELS_CLAUSES): Likewise. (OACC_SERIAL_CLAUSES): Likewise. (resolve_omp_clauses): Add handling for omp_clauses->self_expr. * trans-openmp.cc (gfc_trans_omp_clauses): Add handling of clauses->self_expr and building of OMP_CLAUSE_SELF tree clause. (gfc_split_omp_clauses): Add handling of self_expr field copy. gcc/ChangeLog: * gimplify.cc (gimplify_scan_omp_clauses): Add OMP_CLAUSE_SELF case. (gimplify_adjust_omp_clauses): Likewise. * omp-expand.cc (expand_omp_target): Add OMP_CLAUSE_SELF expansion code, * omp-low.cc (scan_sharing_clauses): Add OMP_CLAUSE_SELF case. * tree-core.h (enum omp_clause_code): Add OMP_CLAUSE_SELF enum. * tree-nested.cc (convert_nonlocal_omp_clauses): Add OMP_CLAUSE_SELF case. (convert_local_omp_clauses): Likewise. * tree-pretty-print.cc (dump_omp_clause): Add OMP_CLAUSE_SELF case. * tree.cc (omp_clause_num_ops): Add OMP_CLAUSE_SELF entry. (omp_clause_code_name): Likewise. * tree.h (OMP_CLAUSE_SELF_EXPR): New macro. gcc/testsuite/ChangeLog: * c-c++-common/goacc/self-clause-1.c: New test. * c-c++-common/goacc/self-clause-2.c: New test. * gfortran.dg/goacc/self.f95: New test. include/ChangeLog: * gomp-constants.h (GOACC_FLAG_LOCAL_DEVICE): New flag bit value. libgomp/ChangeLog: * oacc-parallel.c (GOACC_parallel_keyed): Add code to handle GOACC_FLAG_LOCAL_DEVICE case. * testsuite/libgomp.oacc-c-c++-common/self-1.c: New test.
2023-06-12OpenMP: Cleanups related to the 'present' modifierTobias Burnus1-1/+1
Reduce number of enum values passed to libgomp as GOMP_MAP_PRESENT_{TO,TOFROM,FROM,ALLOC} have the same semantic as GOMP_MAP_FORCE_PRESENT (i.e. abort if not present, otherwise ignore); that's different to GOMP_MAP_ALWAYS_PRESENT_{TO,TOFROM,FROM} which also abort if not present but copy data when present. This is is a follow-up to the commit r14-1579-g4ede915d5dde93 done 6 days ago. Additionally, the commit improves a libgomp run-time and a C/C++ compile-time error wording and extends testcases a tiny bit. gcc/c/ChangeLog: * c-parser.cc (c_parser_omp_clause_map): Reword error message for clearness especially with 'omp target (enter/exit) data.' gcc/cp/ChangeLog: * parser.cc (cp_parser_omp_clause_map): Reword error message for clearness especially with 'omp target (enter/exit) data.' * semantics.cc (handle_omp_array_sections): Handle GOMP_MAP_{ALWAYS_,}PRESENT_{TO,TOFROM,FROM,ALLOC} enum values. gcc/ChangeLog: * gimplify.cc (gimplify_adjust_omp_clauses_1): Use GOMP_MAP_FORCE_PRESENT for 'present alloc' implicit mapping. (gimplify_adjust_omp_clauses): Change GOMP_MAP_PRESENT_{TO,TOFROM,FROM,ALLOC} to the equivalent GOMP_MAP_FORCE_PRESENT. * omp-low.cc (lower_omp_target): Remove handling of no-longer valid GOMP_MAP_PRESENT_{TO,TOFROM,FROM,ALLOC}; update map kinds used for to/from clauses with present modifier. include/ChangeLog: * gomp-constants.h (enum gomp_map_kind): Change the enum values GOMP_MAP_PRESENT_{TO,TOFROM,FROM,ALLOC} to be compiler only. (GOMP_MAP_PRESENT_P): Update to include also GOMP_MAP_FORCE_PRESENT. libgomp/ChangeLog: * target.c (gomp_to_device_kind_p, gomp_map_vars_internal): Replace GOMP_MAP_PRESENT_{FROM,TO,TOFROM,ACLLOC} by GOMP_MAP_FORCE_PRESENT. (gomp_map_vars_internal, gomp_update): Likewise; unify and improve error message. * testsuite/libgomp.c-c++-common/target-present-2.c: Update for changed error message. * testsuite/libgomp.fortran/target-present-1.f90: Likewise. * testsuite/libgomp.fortran/target-present-2.f90: Likewise. * testsuite/libgomp.oacc-c-c++-common/present-1.c: Likewise. * testsuite/libgomp.c-c++-common/target-present-1.c: Likewise and extend testcase to check that data is copied when needed. * testsuite/libgomp.c-c++-common/target-present-3.c: Likewise. * testsuite/libgomp.fortran/target-present-3.f90: Likewise. gcc/testsuite/ChangeLog: * c-c++-common/gomp/defaultmap-4.c: Update scan-tree-dump. * c-c++-common/gomp/map-9.c: Likewise. * gfortran.dg/gomp/defaultmap-8.f90: Likewise. * gfortran.dg/gomp/map-11.f90: Likewise. * gfortran.dg/gomp/target-update-1.f90: Likewise. * gfortran.dg/gomp/map-12.f90: Likewise; also check original dump. * c-c++-common/gomp/map-6.c: Update dg-error and also check clause error with 'target (enter/exit) data'.
2023-03-28testsuite: Fix weak_undefined handling on DarwinRainer Orth1-0/+1
The patch that introduced the weak_undefined effective-target keyword and corresponding dg-add-options support commit 378ec7b87a5265dbe2d489c245fac98ef37fa638 Author: Alexandre Oliva <oliva@adacore.com> Date: Thu Mar 23 00:45:05 2023 -0300 [testsuite] test for weak_undefined support and add options badly broke the affected tests on macOS like so: ERROR: gcc.dg/addr_equal-1.c: unknown dg option: 89 for " dg-add-options 5 weak_undefined " ERROR: gcc.dg/addr_equal-1.c: unknown dg option: 89 for " dg-add-options 5 weak_undefined " add_options_for_weak_undefined tries to call an non-existant proc "89". Even after fixing this by escaping the brackets, two tests still failed to link since they lacked the corresponding calls do dg-add-options weak_undefined. Tested on x86_64-apple-darwin20.6.0 and i386-pc-solaris2.11. 2023-03-27 Rainer Orth <ro@CeBiTec.Uni-Bielefeld.DE> gcc/testsuite: * lib/target-supports.exp (add_options_for_weak_undefined): Escape brackets. * gcc.dg/visibility-22.c: Add weak_undefined options. libgomp: * testsuite/libgomp.oacc-c-c++-common/routine-nohost-2.c: Add weak_undefined options.
2023-03-10Use 'GOMP_MAP_VARS_TARGET' for OpenACC compute constructs [PR90596]Thomas Schwinge1-44/+14
Thereby considerably simplify the device plugins' 'GOMP_OFFLOAD_openacc_exec', 'GOMP_OFFLOAD_openacc_async_exec' functions: in terms of lines of code, but in particular conceptually: no more device memory allocation, host to device data copying, device memory deallocation -- 'GOMP_MAP_VARS_TARGET' does all that for us. This depends on commit 2b2340e236c0bba8aaca358ea25a5accd8249fbd "Allow libgomp 'cbuf' buffering with OpenACC 'async' for 'ephemeral' data", where I said that "a use will emerge later", which is this one here. PR libgomp/90596 libgomp/ * target.c (gomp_map_vars_internal): Allow for 'param_kind == GOMP_MAP_VARS_OPENACC | GOMP_MAP_VARS_TARGET'. * oacc-parallel.c (GOACC_parallel_keyed): Pass 'GOMP_MAP_VARS_TARGET' to 'goacc_map_vars'. * plugin/plugin-gcn.c (alloc_by_agent, gcn_exec) (GOMP_OFFLOAD_openacc_exec, GOMP_OFFLOAD_openacc_async_exec): Adjust, simplify. (gomp_offload_free): Remove. * plugin/plugin-nvptx.c (nvptx_exec, GOMP_OFFLOAD_openacc_exec) (GOMP_OFFLOAD_openacc_async_exec): Adjust, simplify. (cuda_free_argmem): Remove. * testsuite/libgomp.oacc-c-c++-common/acc_prof-parallel-1.c: Adjust.
2023-03-10Simplify OpenACC 'no_create' clause implementationThomas Schwinge2-6/+36
For 'OFFSET_INLINED', 'gomp_map_val' does the right thing, and we may then simplify the device plugins accordingly. This is a follow-up to Subversion r279551 (Git commit a6163563f2ce502bd4ef444bd5de33570bb8eeb1) "Add OpenACC 2.6's no_create", Subversion r279622 (Git commit 5bcd470bf0749e1f56d05dd43aa9584ff2e3a090) "Use gomp_map_val for OpenACC host-to-device address translation". libgomp/ * target.c (gomp_map_vars_internal): Use 'OFFSET_INLINED' for 'GOMP_MAP_IF_PRESENT'. * plugin/plugin-gcn.c (gcn_exec, GOMP_OFFLOAD_openacc_exec) (GOMP_OFFLOAD_openacc_async_exec): Adjust. * plugin/plugin-nvptx.c (nvptx_exec, GOMP_OFFLOAD_openacc_exec) (GOMP_OFFLOAD_openacc_async_exec): Likewise. * testsuite/libgomp.oacc-c-c++-common/no_create-1.c: Add 'async' testing. * testsuite/libgomp.oacc-c-c++-common/no_create-2.c: Likewise.
2023-03-10Document/verify another aspect of OpenACC 'async' semantics in ↵Thomas Schwinge1-2/+2
'libgomp.oacc-c-c++-common/data-3.c' ... that I almost broke with later implementation changes. libgomp/ * testsuite/libgomp.oacc-c-c++-common/data-3.c: Document/verify another aspect of OpenACC 'async' semantics.
2023-03-10Fix OpenACC/GCN 'acc_ev_enqueue_launch_end' positionThomas Schwinge1-19/+183
For an OpenACC compute construct, we've currently got: - [...] - acc_ev_enqueue_launch_start - launch kernel - free memory - acc_ev_free - acc_ev_enqueue_launch_end This confused another thing that I'm working on, so I adjusted that to: - [...] - acc_ev_enqueue_launch_start - launch kernel - acc_ev_enqueue_launch_end - free memory - acc_ev_free Correspondingly, verify 'acc_ev_alloc', 'acc_ev_free' in 'libgomp.oacc-c-c++-common/acc_prof-parallel-1.c'. libgomp/ * plugin/plugin-gcn.c (gcn_exec): Fix 'acc_ev_enqueue_launch_end' position. * testsuite/libgomp.oacc-c-c++-common/acc_prof-parallel-1.c: Verify 'acc_ev_alloc', 'acc_ev_free'.
2022-10-21Restore 'libgomp.oacc-c-c++-common/nvptx-sese-1.c' SESE regions checking ↵Thomas Schwinge1-1/+1
[PR107195, PR107344] That is, adjust for optimization introduced with recent commit r13-3217-gc4d15dddf6b9eacb36f535807ad2ee364af46e04 "[PR107195] Set range to zero when nonzero mask is 0", where GCC now understands that after 'r *= 2;', 'r & 1' will never hold here, and thus transforms/optimizes/"disturbs" the original code such that GCC/nvptx's later "Neuter whole SESE regions" optimization no longer is applicable to it: UNSUPPORTED: libgomp.oacc-c/../libgomp.oacc-c-c++-common/nvptx-sese-1.c -DACC_DEVICE_TYPE_nvidia=1 -DACC_MEM_SHARED=0 -foffload=nvptx-none -O0 PASS: libgomp.oacc-c/../libgomp.oacc-c-c++-common/nvptx-sese-1.c -DACC_DEVICE_TYPE_nvidia=1 -DACC_MEM_SHARED=0 -foffload=nvptx-none -O2 (test for excess errors) PASS: libgomp.oacc-c/../libgomp.oacc-c-c++-common/nvptx-sese-1.c -DACC_DEVICE_TYPE_nvidia=1 -DACC_MEM_SHARED=0 -foffload=nvptx-none -O2 execution test [-PASS:-]{+FAIL:+} libgomp.oacc-c/../libgomp.oacc-c-c++-common/nvptx-sese-1.c -DACC_DEVICE_TYPE_nvidia=1 -DACC_MEM_SHARED=0 -foffload=nvptx-none -O2 scan-nvptx-none-offload-rtl-dump mach "SESE regions:.* [0-9]+{[0-9]+->[0-9]+(\\.[0-9]+)+}" Same for C++. It's unclear to me if this is an actual "problem", which optimization is "more important", so I've filed PR107344 "GCC/nvptx SESE region optimization" to capture this question, and here restore what we intend to be testing (to my understanding) in 'libgomp.oacc-c-c++-common/nvptx-sese-1.c'. PR tree-optimization/107195 PR target/107344 libgomp/ * testsuite/libgomp.oacc-c-c++-common/nvptx-sese-1.c: Restore SESE regions checking.
2022-10-20Add 'libgomp.oacc-c-c++-common/private-big-1.c' [PR105421]Thomas Schwinge1-0/+100
After commit r13-3404-g7c55755d4c760de326809636531478fd7419e1e5 "amdgcn: Use FLAT addressing for all functions with pointer arguments [PR105421]", "big" private data now works for GCN offloading, too. PR target/105421 libgomp/ * testsuite/libgomp.oacc-c-c++-common/private-big-1.c: New.
2022-09-14OpenMP/OpenACC struct sibling list gimplification extension and reworkJulian Brown3-0/+382
This patch refactors struct sibling-list processing in gimplify.cc, and adjusts some related mapping-clause processing in the Fortran FE and omp-low.cc accordingly. 2022-09-13 Julian Brown <julian@codesourcery.com> gcc/fortran/ * trans-openmp.cc (gfc_trans_omp_clauses): Don't create GOMP_MAP_TO_PSET mappings for class metadata, nor GOMP_MAP_POINTER mappings for POINTER_TYPE_P decls. gcc/ * gimplify.cc (gimplify_omp_var_data): Remove GOVD_MAP_HAS_ATTACHMENTS. (GOMP_FIRSTPRIVATE_IMPLICIT): Renumber. (insert_struct_comp_map): Refactor function into... (build_omp_struct_comp_nodes): This new function. Remove list handling and improve self-documentation. (extract_base_bit_offset): Remove BASE_REF, OFFSETP parameters. Move code to strip outer parts of address out of function, but strip no-op conversions. (omp_mapping_group): Add DELETED field for use during reindexing. (omp_strip_components_and_deref, omp_strip_indirections): New functions. (omp_group_last, omp_group_base): Add GOMP_MAP_STRUCT handling. (omp_gather_mapping_groups): Initialise DELETED field for new groups. (omp_index_mapping_groups): Notice DELETED groups when (re)indexing. (omp_siblist_insert_node_after, omp_siblist_move_node_after, omp_siblist_move_nodes_after, omp_siblist_move_concat_nodes_after): New helper functions. (omp_accumulate_sibling_list): New function to build up GOMP_MAP_STRUCT node groups for sibling lists. Outlined from gimplify_scan_omp_clauses. (omp_build_struct_sibling_lists): New function. (gimplify_scan_omp_clauses): Remove struct_map_to_clause, struct_seen_clause, struct_deref_set. Call omp_build_struct_sibling_lists as pre-pass instead of handling sibling lists in the function's main processing loop. (gimplify_adjust_omp_clauses_1): Remove GOVD_MAP_HAS_ATTACHMENTS handling, unused now. * omp-low.cc (scan_sharing_clauses): Handle pointer-type indirect struct references, and references to pointers to structs also. gcc/testsuite/ * g++.dg/goacc/member-array-acc.C: New test. * g++.dg/gomp/member-array-omp.C: New test. * g++.dg/gomp/target-3.C: Update expected output. * g++.dg/gomp/target-lambda-1.C: Likewise. * g++.dg/gomp/target-this-2.C: Likewise. * c-c++-common/goacc/deep-copy-arrayofstruct.c: Move test from here. * c-c++-common/gomp/target-50.c: New test. libgomp/ * testsuite/libgomp.oacc-c-c++-common/deep-copy-15.c: New test. * testsuite/libgomp.oacc-c-c++-common/deep-copy-16.c: New test. * testsuite/libgomp.oacc-c++/deep-copy-17.C: New test. * testsuite/libgomp.oacc-c-c++-common/deep-copy-arrayofstruct.c: Move test to here, make "run" test.
2022-07-12XFAIL 'offloading_enabled' diagnostics issue in ↵Thomas Schwinge1-3/+4
'libgomp.oacc-c-c++-common/reduction-5.c' [PR101551] Fix-up for recent commit 06b2a2abe26554c6f9365676683d67368cbba206 "Enhance '_Pragma' diagnostics verification in OMP C/C++ test cases". Supposedly it's the same issue as in <https://gcc.gnu.org/bugzilla/show_bug.cgi?id=101551#c2>, where I'd noted that: | [...] with an offloading-enabled build of GCC we're losing | "note: in expansion of macro '[...]'" diagnostics. | (Effectively '-ftrack-macro-expansion=0'?) PR middle-end/101551 libgomp/ * testsuite/libgomp.oacc-c-c++-common/reduction-5.c: XFAIL 'offloading_enabled' diagnostics issue.
2022-07-11Enhance '_Pragma' diagnostics verification in OMP C/C++ test casesThomas Schwinge1-3/+5
Follow-up to recent commit 0587cef3d7962a8b0f44779589ba2920dd3d71e5 "c: Fix location for _Pragma tokens [PR97498]". gcc/testsuite/ * c-c++-common/gomp/pragma-3.c: Enhance '_Pragma' diagnostics verification. * c-c++-common/gomp/pragma-5.c: Likewise. libgomp/ * testsuite/libgomp.oacc-c-c++-common/reduction-5.c: Enhance '_Pragma' diagnostics verification.
2022-07-10c: Fix location for _Pragma tokens [PR97498]Lewis Hyatt2-11/+11
The handling of #pragma GCC diagnostic uses input_location, which is not always as precise as needed; in particular the relative location of some tokens and a _Pragma directive will crucially determine whether a given diagnostic is enabled or suppressed in the desired way. PR97498 shows how the C frontend ends up with input_location pointing to the beginning of the line containing a _Pragma() directive, resulting in the wrong behavior if the diagnostic to be modified pertains to some tokens found earlier on the same line. This patch fixes that by addressing two issues: a) libcpp was not assigning a valid location to the CPP_PRAGMA token generated by the _Pragma directive. b) C frontend was not setting input_location to something reasonable. With this change, the C frontend is able to change input_location to point to the _Pragma token as needed. This is just a two-line fix (one for each of a) and b)), the testsuite changes were needed only because the location on the tested warnings has been somewhat improved, so the tests need to look for the new locations. gcc/c/ChangeLog: PR preprocessor/97498 * c-parser.cc (c_parser_pragma): Set input_location to the location of the pragma, rather than the start of the line. libcpp/ChangeLog: PR preprocessor/97498 * directives.cc (destringize_and_run): Override the location of the CPP_PRAGMA token from a _Pragma directive to the location of the expansion point, as is done for the tokens lexed from it. gcc/testsuite/ChangeLog: PR preprocessor/97498 * c-c++-common/pr97498.c: New test. * c-c++-common/gomp/pragma-3.c: Adapt for improved warning locations. * c-c++-common/gomp/pragma-5.c: Likewise. * gcc.dg/pragma-message.c: Likewise. libgomp/ChangeLog: * testsuite/libgomp.oacc-c-c++-common/reduction-5.c: Adapt for improved warning locations. * testsuite/libgomp.oacc-c-c++-common/vred2d-128.c: Likewise.
2022-04-01[libgomp, testsuite, nvptx] Fix dg-output test in vector-length-128-7.cTom de Vries1-1/+1
When running test-case libgomp.oacc-c-c++-common/vector-length-128-7.c on an RTX A2000 (sm_86) with driver 510.60.02 I run into: ... FAIL: libgomp.oacc-c/../libgomp.oacc-c-c++-common/vector-length-128-7.c \ -DACC_DEVICE_TYPE_nvidia=1 -DACC_MEM_SHARED=0 -foffload=nvptx-none -O0 \ output pattern test ... The failing check verifies the launch dimensions: ... /* { dg-output "nvptx_exec: kernel main\\\$_omp_fn\\\$0: \ launch gangs=1, workers=8, vectors=128" } */ ... which fails because (as we can see with GOMP_DEBUG=1) the actual num_workers is 6: ... nvptx_exec: kernel main$_omp_fn$0: launch gangs=1, workers=6, vectors=128 ... This is due to the result of cuOccupancyMaxPotentialBlockSize (which suggests 'a launch configuration with reasonable occupancy') printed just before: ... cuOccupancyMaxPotentialBlockSize: grid = 52, block = 768 ... [ Note: 6 * 128 == 768. ] Fix this by updating the check to allow num_workers in the range 1 to 8. Tested on x86_64 with nvptx accelerator. libgomp/ChangeLog: 2022-04-01 Tom de Vries <tdevries@suse.de> * testsuite/libgomp.oacc-c-c++-common/vector-length-128-7.c: Fix num_workers check.
2022-03-25[libgomp, testsuite] Scale down some OpenACC test-casesTom de Vries2-19/+32
When a display manager is running on an nvidia card, all CUDA kernel launches get a 5 seconds watchdog timer. Consequently, when running the libgomp testsuite with nvptx accelerator and GOMP_NVPTX_JIT=-O0 we run into a few FAILs like this: ... libgomp: cuStreamSynchronize error: the launch timed out and was terminated FAIL: libgomp.oacc-c/../libgomp.oacc-c-c++-common/parallel-dims.c \ -DACC_DEVICE_TYPE_nvidia=1 -DACC_MEM_SHARED=0 -foffload=nvptx-none -O0 \ execution test ... Fix this by scaling down the failing test-cases by default, and reverting to the original behaviour for GCC_TEST_RUN_EXPENSIVE=1. Tested on x86_64-linux with nvptx accelerator. libgomp/ChangeLog: 2022-03-25 Tom de Vries <tdevries@suse.de> PR libgomp/105042 * testsuite/libgomp.oacc-c-c++-common/parallel-dims.c: Reduce execution time. * testsuite/libgomp.oacc-c-c++-common/vred2d-128.c: Same. * testsuite/libgomp.oacc-fortran/parallel-dims.f90: Same.
2022-03-17Enhance further testcases to verify Openacc 'kernels' decompositionThomas Schwinge2-4/+18
gcc/testsuite/ * c-c++-common/goacc-gomp/nesting-1.c: Enhance. * c-c++-common/goacc/kernels-loop-g.c: Likewise. * c-c++-common/goacc/nesting-1.c: Likewise. * gcc.dg/goacc/nested-function-1.c: Likewise. * gfortran.dg/goacc/common-block-3.f90: Likewise. * gfortran.dg/goacc/nested-function-1.f90: Likewise. libgomp/ * testsuite/libgomp.oacc-c-c++-common/acc_prof-kernels-1.c: Enhance. * testsuite/libgomp.oacc-c-c++-common/kernels-loop-g.c: Likewise. * testsuite/libgomp.oacc-fortran/if-1.f90: Likewise.
2022-03-17Enhance further testcases to verify handling of OpenACC privatization level ↵Thomas Schwinge1-5/+30
[PR90115] As originally introduced in commit 11b8286a83289f5b54e813f14ff56d730c3f3185 "[OpenACC privatization] Largely extend diagnostics and corresponding testsuite coverage [PR90115]". PR middle-end/90115 gcc/testsuite/ * c-c++-common/goacc-gomp/nesting-1.c: Enhance. * gfortran.dg/goacc/common-block-3.f90: Likewise. libgomp/ * testsuite/libgomp.oacc-c-c++-common/acc_prof-kernels-1.c: Enhance. * testsuite/libgomp.oacc-fortran/if-1.f90: Likewise.
2022-03-16OpenACC privatization diagnostics vs. 'assert' [PR102841]Thomas Schwinge1-3/+3
It's an orthogonal concern why these diagnostics do appear at all for non-offloaded OpenACC constructs (where they're not relevant at all); PR90115. Depending on how 'assert' is implemented, it may cause temporaries to be created, and/or may lower into 'COND_EXPR's, and 'gcc/gimplify.cc:gimplify_cond_expr' uses 'create_tmp_var (type, "iftmp")'. Fix-up for commit 11b8286a83289f5b54e813f14ff56d730c3f3185 "[OpenACC privatization] Largely extend diagnostics and corresponding testsuite coverage [PR90115]". PR testsuite/102841 libgomp/ * testsuite/libgomp.oacc-c-c++-common/host_data-7.c: Adjust.
2022-03-12OpenACC 'kernels' decomposition: resolve wrong-code cases unless manually ↵Thomas Schwinge5-24/+40
making certain variables addressable [PR100280, PR104892] Currently in OpenACC 'kernels' decomposition, there is special handling of 'GOMP_MAP_FORCE_TOFROM', documented to be done to avoid "internal compiler errors in later passes". For performance reasons, the current repetitive to/from device copying for every region is not ideal, compared to using 'present' clauses, as done for almost all other 'GOMP_MAP_*'. Also, the current special handling (incomplete, evidently) is the reason for the PR104892 misbehavior. For PR100280 etc. we've resolved all such known ICEs -- removing the special handling for 'GOMP_MAP_FORCE_TOFROM' now resolves PR104892. PR middle-end/100280 PR middle-end/104892 gcc/ * omp-oacc-kernels-decompose.cc (omp_oacc_kernels_decompose_1): Remove special handling of 'GOMP_MAP_FORCE_TOFROM'. gcc/testsuite/ * c-c++-common/goacc/kernels-decompose-2.c: Adjust. * c-c++-common/goacc/kernels-decompose-pr100400-1-1.c: Likewise. * c-c++-common/goacc/kernels-decompose-pr100400-1-2.c: Likewise. * c-c++-common/goacc/kernels-decompose-pr100400-1-3.c: Likewise. * c-c++-common/goacc/kernels-decompose-pr100400-1-4.c: Likewise. * c-c++-common/goacc/kernels-decompose-pr104061-1-1.c: Likewise. * c-c++-common/goacc/kernels-decompose-pr104061-1-2.c: Likewise. * c-c++-common/goacc/kernels-decompose-pr104061-1-3.c: Likewise. * c-c++-common/goacc/kernels-decompose-pr104061-1-4.c: Likewise. * c-c++-common/goacc/kernels-decompose-pr104132-1.c: Likewise. * c-c++-common/goacc/kernels-decompose-pr104133-1.c: Likewise. * c-c++-common/goacc/kernels-decompose-pr104774-1.c: Likewise. * gfortran.dg/goacc/classify-kernels.f95: Likewise. * gfortran.dg/goacc/kernels-decompose-2.f95: Likewise. libgomp/ * testsuite/libgomp.oacc-c-c++-common/declare-vla.c: Adjust. * testsuite/libgomp.oacc-c-c++-common/default-1.c: Likewise. * testsuite/libgomp.oacc-c-c++-common/kernels-decompose-1.c: Likewise. * testsuite/libgomp.oacc-c-c++-common/kernels-reduction-1.c: Likewise. * testsuite/libgomp.oacc-c-c++-common/parallel-dims.c: Likewise. * testsuite/libgomp.oacc-fortran/asyncwait-1.f90: Likewise. * testsuite/libgomp.oacc-fortran/kernels-reduction-1.f90: Likewise.
2022-03-12OpenACC 'kernels' decomposition: wrong-code cases unless manually making ↵Thomas Schwinge4-15/+45
certain variables addressable [PR104892] Document a few examples of the status quo. PR middle-end/104892 libgomp/ * testsuite/libgomp.oacc-c-c++-common/kernels-decompose-1.c: Point to PR104892. * testsuite/libgomp.oacc-c-c++-common/default-1.c: Likewise, enable '--param=openacc-kernels=decompose' and adjust. * testsuite/libgomp.oacc-c-c++-common/kernels-reduction-1.c: Likewise. * testsuite/libgomp.oacc-c-c++-common/parallel-dims.c: Likewise. * testsuite/libgomp.oacc-fortran/kernels-reduction-1.f90: Likewise.
2022-03-12Enhance further testcases to verify handling of OpenACC privatization level ↵Thomas Schwinge3-53/+254
[PR90115] As originally introduced in commit 11b8286a83289f5b54e813f14ff56d730c3f3185 "[OpenACC privatization] Largely extend diagnostics and corresponding testsuite coverage [PR90115]". PR middle-end/90115 libgomp/ * testsuite/libgomp.oacc-c-c++-common/default-1.c: Enhance. * testsuite/libgomp.oacc-c-c++-common/kernels-reduction-1.c: Likewise. * testsuite/libgomp.oacc-c-c++-common/parallel-dims.c: Likewise. * testsuite/libgomp.oacc-fortran/kernels-reduction-1.f90: Likewise.
2022-03-12OpenACC 'kernels' decomposition: Mark variables used in 'present' clauses as ↵Thomas Schwinge4-67/+88
addressable [PR100280, PR104086] ... like in recent commit 9b32c1669aad5459dd053424f9967011348add83 "OpenACC 'kernels' decomposition: Mark variables used in synthesized data clauses as addressable [PR100280]". Otherwise, we may run into 'gcc/omp-low.cc:lower_omp_target': 13125 else if (is_gimple_reg (var)) 13126 { 13127 gcc_assert (offloaded); PR middle-end/100280 PR middle-end/104086 gcc/ * omp-oacc-kernels-decompose.cc (omp_oacc_kernels_decompose_1): Mark variables used in 'present' clauses as addressable. * omp-low.cc (scan_sharing_clauses) <OMP_CLAUSE_MAP>: Gracefully handle duplicate 'OMP_CLAUSE_MAP_DECL_MAKE_ADDRESSABLE'. gcc/testsuite/ * c-c++-common/goacc/kernels-decompose-pr104086-1.c: Adjust, extend. libgomp/ * testsuite/libgomp.oacc-c-c++-common/declare-vla-kernels-decompose-ice-1.c: Merge this... * testsuite/libgomp.oacc-c-c++-common/declare-vla-kernels-decompose.c: ..., and this... * testsuite/libgomp.oacc-c-c++-common/declare-vla.c: ... into this, and adjust. * testsuite/libgomp.oacc-c-c++-common/kernels-decompose-1.c: Extend.
2022-03-10[OpenACC privatization] Analyze 'lookup_decl'-translated DECL [PR90115, ↵Thomas Schwinge14-48/+78
PR102330, PR104774] ... so that it matches what we analyze and what we action on. Fix-up for commit 29a2f51806c5b30e17a8d0e9ba7915a3c53c34ff "openacc: Add support for gang local storage allocation in shared memory [PR90115]". PR middle-end/90115 PR middle-end/102330 PR middle-end/104774 gcc/ * omp-low.cc (oacc_privatization_candidate_p) (oacc_privatization_scan_clause_chain) (oacc_privatization_scan_decl_chain, lower_oacc_private_marker): Analyze 'lookup_decl'-translated DECL. gcc/testsuite/ * c-c++-common/goacc/kernels-decompose-pr104061-1-3.c: Adjust. * c-c++-common/goacc/kernels-decompose-pr104061-1-4.c: Likewise. * c-c++-common/goacc/kernels-decompose-pr104132-1.c: Likewise. * c-c++-common/goacc/kernels-decompose-pr104133-1.c: Likewise. * c-c++-common/goacc/kernels-decompose-pr104774-1.c: Likewise. * c-c++-common/goacc/privatization-1-compute-loop.c: Likewise. * c-c++-common/goacc/privatization-1-compute.c: Likewise. * c-c++-common/goacc/privatization-1-routine_gang-loop.c: Likewise. * c-c++-common/goacc/privatization-1-routine_gang.c: Likewise. * gfortran.dg/goacc-gomp/pr102330-1.f90: Likewise, and subsume... * gfortran.dg/goacc-gomp/pr102330-2.f90: ... this file, and... * gfortran.dg/goacc-gomp/pr102330-3.f90: ... this file. * gfortran.dg/goacc/privatization-1-compute-loop.f90: Adjust. * gfortran.dg/goacc/privatization-1-compute.f90: Likewise. * gfortran.dg/goacc/privatization-1-routine_gang-loop.f90: Likewise. * gfortran.dg/goacc/privatization-1-routine_gang.f90: Likewise. libgomp/ * testsuite/libgomp.oacc-c-c++-common/kernels-decompose-1.c: Enhance. * testsuite/libgomp.oacc-c-c++-common/kernels-private-vars-local-worker-1.c: Adjust. * testsuite/libgomp.oacc-c-c++-common/kernels-private-vars-local-worker-2.c: Likewise. * testsuite/libgomp.oacc-c-c++-common/kernels-private-vars-local-worker-3.c: Likewise. * testsuite/libgomp.oacc-c-c++-common/kernels-private-vars-local-worker-4.c: Likewise. * testsuite/libgomp.oacc-c-c++-common/kernels-private-vars-local-worker-5.c: Likewise. * testsuite/libgomp.oacc-c-c++-common/kernels-private-vars-loop-vector-1.c: Likewise. * testsuite/libgomp.oacc-c-c++-common/kernels-private-vars-loop-vector-2.c: Likewise. * testsuite/libgomp.oacc-c-c++-common/kernels-private-vars-loop-worker-2.c: Likewise. * testsuite/libgomp.oacc-c-c++-common/kernels-private-vars-loop-worker-3.c: Likewise. * testsuite/libgomp.oacc-c-c++-common/kernels-private-vars-loop-worker-4.c: Likewise. * testsuite/libgomp.oacc-c-c++-common/kernels-private-vars-loop-worker-5.c: Likewise. * testsuite/libgomp.oacc-c-c++-common/kernels-private-vars-loop-worker-6.c: Likewise. * testsuite/libgomp.oacc-c-c++-common/kernels-private-vars-loop-worker-7.c: Likewise. * testsuite/libgomp.oacc-fortran/optional-private.f90: Likewise. * testsuite/libgomp.oacc-fortran/privatized-ref-1.f95: Likewise. * testsuite/libgomp.oacc-fortran/privatized-ref-2.f90: Likewise.
2022-03-04Fix 'libgomp.oacc-c-c++-common/kernels-decompose-1.c' expected diagnosticsThomas Schwinge1-0/+2
Fix-up for recent commit 8935589b496f755e08cadf26d8ceddf0dd6e0968 "OMP lowering: Regimplify 'OMP_CLAUSE_MAP_DECL_MAKE_ADDRESSABLE' DECLs [PR100280, PR104132, PR104133]": adjust for a GCN offloading workaround added just before commit: '(volatile void *) &f1;'. PR testsuite/104791 libgomp/ * testsuite/libgomp.oacc-c-c++-common/kernels-decompose-1.c: Fix expected diagnostics.
2022-03-04Test 'libgomp.oacc-*/kernels-private-vars-*' with ↵Thomas Schwinge20-75/+270
'--param=openacc-kernels=decompose' [PR104784] Before recent commit 8935589b496f755e08cadf26d8ceddf0dd6e0968 "OMP lowering: Regimplify 'OMP_CLAUSE_MAP_DECL_MAKE_ADDRESSABLE' DECLs [PR100280, PR104132, PR104133]", 'libgomp.oacc-c' testing already worked fine, but 'libgomp.oacc-c++' testing ICEed. Via the commit mentioned, the C++ testing ICEs are now resolved, but the underlying issue remains to be looked into: PR104784 "OpenACC 'kernels' decomposition: C vs. C++ differences". PR middle-end/104784 libgomp/ * testsuite/libgomp.oacc-c-c++-common/kernels-private-vars-local-worker-1.c: Test with '--param=openacc-kernels=decompose'. * testsuite/libgomp.oacc-c-c++-common/kernels-private-vars-local-worker-2.c: Likewise. * testsuite/libgomp.oacc-c-c++-common/kernels-private-vars-local-worker-3.c: Likewise. * testsuite/libgomp.oacc-c-c++-common/kernels-private-vars-local-worker-4.c: Likewise. * testsuite/libgomp.oacc-c-c++-common/kernels-private-vars-local-worker-5.c: Likewise. * testsuite/libgomp.oacc-c-c++-common/kernels-private-vars-loop-gang-1.c: Likewise. * testsuite/libgomp.oacc-c-c++-common/kernels-private-vars-loop-gang-2.c: Likewise. * testsuite/libgomp.oacc-c-c++-common/kernels-private-vars-loop-gang-3.c: Likewise. * testsuite/libgomp.oacc-c-c++-common/kernels-private-vars-loop-gang-4.c: Likewise. * testsuite/libgomp.oacc-c-c++-common/kernels-private-vars-loop-gang-5.c: Likewise. * testsuite/libgomp.oacc-c-c++-common/kernels-private-vars-loop-gang-6.c: Likewise. * testsuite/libgomp.oacc-c-c++-common/kernels-private-vars-loop-vector-1.c: Likewise. * testsuite/libgomp.oacc-c-c++-common/kernels-private-vars-loop-vector-2.c: Likewise. * testsuite/libgomp.oacc-c-c++-common/kernels-private-vars-loop-worker-1.c: Likewise. * testsuite/libgomp.oacc-c-c++-common/kernels-private-vars-loop-worker-2.c: Likewise. * testsuite/libgomp.oacc-c-c++-common/kernels-private-vars-loop-worker-3.c: Likewise. * testsuite/libgomp.oacc-c-c++-common/kernels-private-vars-loop-worker-4.c: Likewise. * testsuite/libgomp.oacc-c-c++-common/kernels-private-vars-loop-worker-5.c: Likewise. * testsuite/libgomp.oacc-c-c++-common/kernels-private-vars-loop-worker-6.c: Likewise. * testsuite/libgomp.oacc-c-c++-common/kernels-private-vars-loop-worker-7.c: Likewise. * testsuite/libgomp.oacc-fortran/kernels-private-vars-loop-gang-1.f90: Likewise. * testsuite/libgomp.oacc-fortran/kernels-private-vars-loop-gang-2.f90: Likewise. * testsuite/libgomp.oacc-fortran/kernels-private-vars-loop-gang-3.f90: Likewise. * testsuite/libgomp.oacc-fortran/kernels-private-vars-loop-gang-6.f90: Likewise. * testsuite/libgomp.oacc-fortran/kernels-private-vars-loop-vector-1.f90: Likewise. * testsuite/libgomp.oacc-fortran/kernels-private-vars-loop-vector-2.f90: Likewise. * testsuite/libgomp.oacc-fortran/kernels-private-vars-loop-worker-1.f90: Likewise. * testsuite/libgomp.oacc-fortran/kernels-private-vars-loop-worker-2.f90: Likewise. * testsuite/libgomp.oacc-fortran/kernels-private-vars-loop-worker-3.f90: Likewise. * testsuite/libgomp.oacc-fortran/kernels-private-vars-loop-worker-4.f90: Likewise. * testsuite/libgomp.oacc-fortran/kernels-private-vars-loop-worker-5.f90: Likewise. * testsuite/libgomp.oacc-fortran/kernels-private-vars-loop-worker-6.f90: Likewise. * testsuite/libgomp.oacc-fortran/kernels-private-vars-loop-worker-7.f90: Likewise.
2022-03-04Test '-fopt-info-omp-all' in 'libgomp.oacc-*/kernels-private-vars-*'Thomas Schwinge20-366/+546
libgomp/ * testsuite/libgomp.oacc-c-c++-common/kernels-private-vars-local-worker-1.c: Test '-fopt-info-omp-all'. * testsuite/libgomp.oacc-c-c++-common/kernels-private-vars-local-worker-2.c: Likewise. * testsuite/libgomp.oacc-c-c++-common/kernels-private-vars-local-worker-3.c: Likewise. * testsuite/libgomp.oacc-c-c++-common/kernels-private-vars-local-worker-4.c: Likewise. * testsuite/libgomp.oacc-c-c++-common/kernels-private-vars-local-worker-5.c: Likewise. * testsuite/libgomp.oacc-c-c++-common/kernels-private-vars-loop-gang-1.c: Likewise. * testsuite/libgomp.oacc-c-c++-common/kernels-private-vars-loop-gang-2.c: Likewise. * testsuite/libgomp.oacc-c-c++-common/kernels-private-vars-loop-gang-3.c: Likewise. * testsuite/libgomp.oacc-c-c++-common/kernels-private-vars-loop-gang-4.c: Likewise. * testsuite/libgomp.oacc-c-c++-common/kernels-private-vars-loop-gang-5.c: Likewise. * testsuite/libgomp.oacc-c-c++-common/kernels-private-vars-loop-gang-6.c: Likewise. * testsuite/libgomp.oacc-c-c++-common/kernels-private-vars-loop-vector-1.c: Likewise. * testsuite/libgomp.oacc-c-c++-common/kernels-private-vars-loop-vector-2.c: Likewise. * testsuite/libgomp.oacc-c-c++-common/kernels-private-vars-loop-worker-1.c: Likewise. * testsuite/libgomp.oacc-c-c++-common/kernels-private-vars-loop-worker-2.c: Likewise. * testsuite/libgomp.oacc-c-c++-common/kernels-private-vars-loop-worker-3.c: Likewise. * testsuite/libgomp.oacc-c-c++-common/kernels-private-vars-loop-worker-4.c: Likewise. * testsuite/libgomp.oacc-c-c++-common/kernels-private-vars-loop-worker-5.c: Likewise. * testsuite/libgomp.oacc-c-c++-common/kernels-private-vars-loop-worker-6.c: Likewise. * testsuite/libgomp.oacc-c-c++-common/kernels-private-vars-loop-worker-7.c: Likewise. * testsuite/libgomp.oacc-fortran/kernels-private-vars-loop-gang-1.f90: Likewise. * testsuite/libgomp.oacc-fortran/kernels-private-vars-loop-gang-2.f90: Likewise. * testsuite/libgomp.oacc-fortran/kernels-private-vars-loop-gang-3.f90: Likewise. * testsuite/libgomp.oacc-fortran/kernels-private-vars-loop-gang-6.f90: Likewise. * testsuite/libgomp.oacc-fortran/kernels-private-vars-loop-vector-1.f90: Likewise. * testsuite/libgomp.oacc-fortran/kernels-private-vars-loop-vector-2.f90: Likewise. * testsuite/libgomp.oacc-fortran/kernels-private-vars-loop-worker-1.f90: Likewise. * testsuite/libgomp.oacc-fortran/kernels-private-vars-loop-worker-2.f90: Likewise. * testsuite/libgomp.oacc-fortran/kernels-private-vars-loop-worker-3.f90: Likewise. * testsuite/libgomp.oacc-fortran/kernels-private-vars-loop-worker-4.f90: Likewise. * testsuite/libgomp.oacc-fortran/kernels-private-vars-loop-worker-5.f90: Likewise. * testsuite/libgomp.oacc-fortran/kernels-private-vars-loop-worker-6.f90: Likewise. * testsuite/libgomp.oacc-fortran/kernels-private-vars-loop-worker-7.f90: Likewise.
2022-03-04OMP lowering: Regimplify 'OMP_CLAUSE_MAP_DECL_MAKE_ADDRESSABLE' DECLs ↵Thomas Schwinge1-4/+58
[PR100280, PR104132, PR104133] ... by generalizing the existing 'gcc/omp-low.cc:task_shared_vars'. Fix-up for commit 9b32c1669aad5459dd053424f9967011348add83 "OpenACC 'kernels' decomposition: Mark variables used in synthesized data clauses as addressable [PR100280]". PR middle-end/100280 PR middle-end/104132 PR middle-end/104133 gcc/ * omp-low.cc (task_shared_vars): Rename to 'make_addressable_vars'. Adjust all users. (scan_sharing_clauses) <OMP_CLAUSE_MAP> Use it for 'OMP_CLAUSE_MAP_DECL_MAKE_ADDRESSABLE' DECLs, too. gcc/testsuite/ * c-c++-common/goacc/kernels-decompose-pr104061-1-3.c: Adjust. * c-c++-common/goacc/kernels-decompose-pr104061-1-4.c: Likewise. * c-c++-common/goacc/kernels-decompose-pr104132-1.c: Likewise. * c-c++-common/goacc/kernels-decompose-pr104133-1.c: Likewise. libgomp/ * testsuite/libgomp.oacc-c-c++-common/kernels-decompose-1.c: Extend.
2022-03-04OpenACC 'kernels' decomposition: Move 'TREE_ADDRESSABLE' setting into OMP ↵Thomas Schwinge2-4/+8
lowering [PR100280] ... in preparation for later changes. No functional change. Follow-up to commit 9b32c1669aad5459dd053424f9967011348add83 "OpenACC 'kernels' decomposition: Mark variables used in synthesized data clauses as addressable [PR100280]". PR middle-end/100280 gcc/ * tree.h (OMP_CLAUSE_MAP_DECL_MAKE_ADDRESSABLE): New. * tree-core.h: Document it. * omp-low.cc (scan_sharing_clauses) <OMP_CLAUSE_MAP>: Handle 'OMP_CLAUSE_MAP_DECL_MAKE_ADDRESSABLE'. * omp-oacc-kernels-decompose.cc (maybe_build_inner_data_region): Set 'OMP_CLAUSE_MAP_DECL_MAKE_ADDRESSABLE' instead of 'TREE_ADDRESSABLE'. gcc/testsuite/ * c-c++-common/goacc/classify-kernels-unparallelized.c: Adjust. * c-c++-common/goacc/classify-kernels.c: Likewise. * c-c++-common/goacc/kernels-decompose-2.c: Likewise. * c-c++-common/goacc/kernels-decompose-pr100280-1.c: Likewise. * c-c++-common/goacc/kernels-decompose-pr104061-1-2.c: Likewise. * c-c++-common/goacc/kernels-decompose-pr104061-1-3.c: Likewise. * c-c++-common/goacc/kernels-decompose-pr104061-1-4.c: Likewise. * c-c++-common/goacc/kernels-decompose-pr104132-1.c: Likewise. * c-c++-common/goacc/kernels-decompose-pr104133-1.c: Likewise. libgomp/ * testsuite/libgomp.oacc-c-c++-common/f-asyncwait-1.c: Adjust. * testsuite/libgomp.oacc-c-c++-common/kernels-decompose-1.c: Likewise.
2022-03-04Add diagnostic: "note: OpenACC 'kernels' decomposition: variable '[...]' ↵Thomas Schwinge2-0/+4
declared in block made addressable" [PR100280] Follow-up to commit 9b32c1669aad5459dd053424f9967011348add83 "OpenACC 'kernels' decomposition: Mark variables used in synthesized data clauses as addressable [PR100280]". PR middle-end/100280 gcc/ * omp-oacc-kernels-decompose.cc (maybe_build_inner_data_region): Add diagnostic: "note: OpenACC 'kernels' decomposition: variable '[...]' declared in block made addressable". gcc/testsuite/ * c-c++-common/goacc/classify-kernels-unparallelized.c: Add '--param=openacc-privatization=noisy'. * c-c++-common/goacc/classify-kernels.c: Likewise. * c-c++-common/goacc/kernels-decompose-2.c: Adjust. * c-c++-common/goacc/kernels-decompose-pr100280-1.c: Likewise. * c-c++-common/goacc/kernels-decompose-pr104061-1-2.c: Likewise. * c-c++-common/goacc/kernels-decompose-pr104061-1-3.c: Likewise. * c-c++-common/goacc/kernels-decompose-pr104061-1-4.c: Likewise. * c-c++-common/goacc/kernels-decompose-pr104132-1.c: Likewise. * c-c++-common/goacc/kernels-decompose-pr104133-1.c: Likewise. libgomp/ * testsuite/libgomp.oacc-c-c++-common/f-asyncwait-1.c: Adjust. * testsuite/libgomp.oacc-c-c++-common/kernels-decompose-1.c: Likewise.
2022-02-01[nvptx] Add some support for .local atomicsTom de Vries1-7/+0
The ptx insn atom doesn't support local memory. In case of doing an atomic operation on local memory, we run into: ... operation not supported on global/shared address space ... This is the cuGetErrorString message for CUDA_ERROR_INVALID_ADDRESS_SPACE. The message is somewhat confusing given that actually the operation is not supported on local address space. Fix this by falling back on a non-atomic version when detecting a frame-related memory operand. This only solves some cases that are detected at compile-time. It does however fix the openacc private-atomic-* test-cases. Tested on x86_64 with nvptx accelerator. gcc/ChangeLog: 2022-01-27 Tom de Vries <tdevries@suse.de> * config/nvptx/nvptx.md (define_insn "atomic_compare_and_swap<mode>_1") (define_insn "atomic_exchange<mode>") (define_insn "atomic_fetch_add<mode>") (define_insn "atomic_fetch_addsf") (define_insn "atomic_fetch_<logic><mode>"): Output non-atomic version if memory operands is frame-relative. gcc/testsuite/ChangeLog: 2022-01-31 Tom de Vries <tdevries@suse.de> * gcc.target/nvptx/stack-atomics-run.c: New test. libgomp/ChangeLog: 2022-01-27 Tom de Vries <tdevries@suse.de> * testsuite/libgomp.oacc-c-c++-common/private-atomic-1.c: Remove PR83812 workaround. * testsuite/libgomp.oacc-fortran/private-atomic-1-vector.f90: Same. * testsuite/libgomp.oacc-fortran/private-atomic-1-worker.f90: Same.
2022-02-01[libgomp, testsuite] Fix insufficient resources in test-casesTom de Vries3-3/+25
When running libgomp test-case broadcast-many.c on an nvptx accelerator (T400, driver version 470.86), I run into: ... libgomp: The Nvidia accelerator has insufficient resources to launch \ 'main$_omp_fn$0' with num_workers = 32 and vector_length = 32; \ recompile the program with 'num_workers = x and vector_length = y' on \ that offloaded region or '-fopenacc-dim=:x:y' where x * y <= 896. FAIL: libgomp.oacc-c/../libgomp.oacc-c-c++-common/broadcast-many.c \ -DACC_DEVICE_TYPE_nvidia=1 -DACC_MEM_SHARED=0 -foffload=nvptx-none \ -O0 execution test ... The error does not occur when using GOMP_NVPTX_JIT=-O0. Fix this by using 896 / 32 == 28 workers for ACC_DEVICE_TYPE_nvidia. Likewise for some other test-cases. Tested libgomp on x86_64 with nvptx accelerator. libgomp/ChangeLog: 2022-01-27 Tom de Vries <tdevries@suse.de> * testsuite/libgomp.oacc-c-c++-common/broadcast-many.c: Reduce num_workers for nvidia accelerator to fix libgomp error 'insufficient resources'. * testsuite/libgomp.oacc-c-c++-common/par-loop-comb-reduction-4.c: Same. * testsuite/libgomp.oacc-c-c++-common/reduction-7.c: Same.
2022-01-21Strengthen a few OpenACC test casesThomas Schwinge15-57/+202
Rather than rubber-stamp whatever requested vs. actual device kernel launch configuration happens, actually (again) verify the requested values (modulo expected variations). This better highlights that "AMD GCN has an upper limit of 'num_workers(16)'", and the deficiency that "AMD GCN uses the autovectorizer for the vector dimension: the use of a function call in vector-partitioned code [...] is not currently supported". And, this removes several instances of race conditions, where variables are concurrently written to in OpenACC gang-redundant mode. libgomp/ * testsuite/libgomp.oacc-c-c++-common/loop-gwv-1.c: Strengthen. * testsuite/libgomp.oacc-c-c++-common/loop-gwv-2.c: Likewise. * testsuite/libgomp.oacc-c-c++-common/loop-red-gwv-1.c: Likewise. * testsuite/libgomp.oacc-c-c++-common/loop-red-v-1.c: Likewise. * testsuite/libgomp.oacc-c-c++-common/loop-red-v-2.c: Likewise. * testsuite/libgomp.oacc-c-c++-common/loop-red-w-1.c: Likewise. * testsuite/libgomp.oacc-c-c++-common/loop-red-w-2.c: Likewise. * testsuite/libgomp.oacc-c-c++-common/loop-red-wv-1.c: Likewise. * testsuite/libgomp.oacc-c-c++-common/loop-v-1.c: Likewise. * testsuite/libgomp.oacc-c-c++-common/loop-w-1.c: Likewise. * testsuite/libgomp.oacc-c-c++-common/loop-wv-1.c: Likewise. * testsuite/libgomp.oacc-c-c++-common/routine-gwv-1.c: Likewise. * testsuite/libgomp.oacc-c-c++-common/routine-v-1.c: Likewise. * testsuite/libgomp.oacc-c-c++-common/routine-w-1.c: Likewise. * testsuite/libgomp.oacc-c-c++-common/routine-wv-1.c: Likewise.
2022-01-19nvptx: update fix for -Wformat-diagMartin Liska9-18/+18
gcc/ChangeLog: * config/nvptx/nvptx.cc (nvptx_goacc_validate_dims_1): Update warning messages. libgomp/ChangeLog: * testsuite/libgomp.oacc-c++/privatized-ref-2.C: Update scanning patterns. * testsuite/libgomp.oacc-c++/privatized-ref-3.C: Likewise. * testsuite/libgomp.oacc-c-c++-common/acc_prof-kernels-1.c: Likewise. * testsuite/libgomp.oacc-c-c++-common/kernels-loop-2.c: Likewise. * testsuite/libgomp.oacc-c-c++-common/parallel-dims.c: Likewise. * testsuite/libgomp.oacc-c-c++-common/pr85486.c: Likewise. * testsuite/libgomp.oacc-c-c++-common/pr95270-1.c: Likewise. * testsuite/libgomp.oacc-c-c++-common/routine-nohost-2.c: Likewise. * testsuite/libgomp.oacc-c-c++-common/struct-copyout-1.c: Likewise. * testsuite/libgomp.oacc-c-c++-common/struct-copyout-2.c: Likewise. * testsuite/libgomp.oacc-c-c++-common/vector-length-64-1.c: Likewise. * testsuite/libgomp.oacc-fortran/attach-descriptor-1.f90: Likewise. * testsuite/libgomp.oacc-fortran/derivedtypes-arrays-1.f90: Likewise. * testsuite/libgomp.oacc-fortran/kernels-loop-2.f95: Likewise. * testsuite/libgomp.oacc-fortran/parallel-dims.f90: Likewise. * testsuite/libgomp.oacc-fortran/privatized-ref-1.f95: Likewise. Co-authored-by: Thomas Schwinge <thomas@codesourcery.com>
2022-01-18nvptx: fix -Wformat-diag warningsMartin Liska9-18/+18
gcc/ChangeLog: * config/nvptx/nvptx.cc (nvptx_goacc_validate_dims_1): Wrap keyword. * config/nvptx/nvptx.md: Remove trailing dot. libgomp/ChangeLog: * testsuite/libgomp.oacc-c++/privatized-ref-2.C: Update keyword in dg-warning. * testsuite/libgomp.oacc-c++/privatized-ref-3.C: Likewise. * testsuite/libgomp.oacc-c-c++-common/acc_prof-kernels-1.c: Likewise. * testsuite/libgomp.oacc-c-c++-common/kernels-loop-2.c: Likewise. * testsuite/libgomp.oacc-c-c++-common/parallel-dims.c: Likewise. * testsuite/libgomp.oacc-c-c++-common/pr85486.c: Likewise. * testsuite/libgomp.oacc-c-c++-common/pr95270-1.c: Likewise. * testsuite/libgomp.oacc-c-c++-common/routine-nohost-2.c: Likewise. * testsuite/libgomp.oacc-c-c++-common/struct-copyout-1.c: Likewise. * testsuite/libgomp.oacc-c-c++-common/struct-copyout-2.c: Likewise. * testsuite/libgomp.oacc-c-c++-common/vector-length-64-1.c: Likewise. * testsuite/libgomp.oacc-fortran/attach-descriptor-1.f90: Likewise. * testsuite/libgomp.oacc-fortran/derivedtypes-arrays-1.f90: Likewise. * testsuite/libgomp.oacc-fortran/kernels-loop-2.f95: Likewise. * testsuite/libgomp.oacc-fortran/parallel-dims.f90: Likewise. * testsuite/libgomp.oacc-fortran/privatized-ref-1.f95: Likewise.
2022-01-13Document current '-Wuninitialized'/'-Wmaybe-uninitialized' diagnostics for ↵Thomas Schwinge1-1/+35
OpenACC test cases ... including "note: '[...]' was declared here" emitted since recent commit 9695e1c23be5b5c55d572ced152897313ddb96ae "Improve -Wuninitialized note location". For those that seemed incorrect to me, I've placed XFAILed 'dg-bogus'es, including one more instance of PR77504 etc., and several instances where for "local variables" of reference-data-type reductions (etc.?) we emit bogus (?) diagnostics. For implicit data clauses (including 'firstprivate'), we seem to be missing diagnostics, so I've placed XFAILed 'dg-warning's. gcc/testsuite/ * c-c++-common/goacc/builtin-goacc-parlevel-id-size.c: Document current '-Wuninitialized' diagnostics. * c-c++-common/goacc/mdc-1.c: Likewise. * c-c++-common/goacc/nested-reductions-1-kernels.c: Likewise. * c-c++-common/goacc/nested-reductions-1-parallel.c: Likewise. * c-c++-common/goacc/nested-reductions-1-routine.c: Likewise. * c-c++-common/goacc/nested-reductions-2-kernels.c: Likewise. * c-c++-common/goacc/nested-reductions-2-parallel.c: Likewise. * c-c++-common/goacc/nested-reductions-2-routine.c: Likewise. * c-c++-common/goacc/uninit-dim-clause.c: Likewise. * c-c++-common/goacc/uninit-firstprivate-clause.c: Likewise. * c-c++-common/goacc/uninit-if-clause.c: Likewise. * gfortran.dg/goacc/array-with-dt-1.f90: Likewise. * gfortran.dg/goacc/array-with-dt-2.f90: Likewise. * gfortran.dg/goacc/array-with-dt-3.f90: Likewise. * gfortran.dg/goacc/array-with-dt-4.f90: Likewise. * gfortran.dg/goacc/array-with-dt-5.f90: Likewise. * gfortran.dg/goacc/derived-chartypes-1.f90: Likewise. * gfortran.dg/goacc/derived-chartypes-2.f90: Likewise. * gfortran.dg/goacc/derived-chartypes-3.f90: Likewise. * gfortran.dg/goacc/derived-chartypes-4.f90: Likewise. * gfortran.dg/goacc/derived-classtypes-1.f95: Likewise. * gfortran.dg/goacc/derived-types-2.f90: Likewise. * gfortran.dg/goacc/host_data-tree.f95: Likewise. * gfortran.dg/goacc/kernels-tree.f95: Likewise. * gfortran.dg/goacc/modules.f95: Likewise. * gfortran.dg/goacc/nested-reductions-1-kernels.f90: Likewise. * gfortran.dg/goacc/nested-reductions-1-parallel.f90: Likewise. * gfortran.dg/goacc/nested-reductions-1-routine.f90: Likewise. * gfortran.dg/goacc/nested-reductions-2-kernels.f90: Likewise. * gfortran.dg/goacc/nested-reductions-2-parallel.f90: Likewise. * gfortran.dg/goacc/nested-reductions-2-routine.f90: Likewise. * gfortran.dg/goacc/parallel-tree.f95: Likewise. * gfortran.dg/goacc/pr93464.f90: Likewise. * gfortran.dg/goacc/privatization-1-compute-loop.f90: Likewise. * gfortran.dg/goacc/privatization-1-compute.f90: Likewise. * gfortran.dg/goacc/privatization-1-routine_gang-loop.f90: Likewise. * gfortran.dg/goacc/privatization-1-routine_gang.f90: Likewise. * gfortran.dg/goacc/uninit-dim-clause.f95: Likewise. * gfortran.dg/goacc/uninit-firstprivate-clause.f95: Likewise. * gfortran.dg/goacc/uninit-if-clause.f95: Likewise. * gfortran.dg/goacc/uninit-use-device-clause.f95: Likewise. * gfortran.dg/goacc/wait.f90: Likewise. libgomp/ * testsuite/libgomp.oacc-c-c++-common/vred2d-128.c: Document current '-Wuninitialized' diagnostics. * testsuite/libgomp.oacc-fortran/data-5.f90: Likewise. * testsuite/libgomp.oacc-fortran/gemm-2.f90: Likewise. * testsuite/libgomp.oacc-fortran/gemm.f90: Likewise. * testsuite/libgomp.oacc-fortran/optional-reduction.f90: Likewise. * testsuite/libgomp.oacc-fortran/parallel-reduction.f90: Likewise. * testsuite/libgomp.oacc-fortran/pr70643.f90: Likewise. * testsuite/libgomp.oacc-fortran/pr96628-part1.f90: Likewise. * testsuite/libgomp.oacc-fortran/privatized-ref-2.f90: Likewise. * testsuite/libgomp.oacc-fortran/reduction-5.f90: Likewise. * testsuite/libgomp.oacc-fortran/reduction-7.f90: Likewise. * testsuite/libgomp.oacc-fortran/reference-reductions.f90: Likewise.
2022-01-13Wait at end of OpenACC asynchronous kernels regionsJulian Brown1-1/+0
In OpenACC 'kernels' decomposition, we're improperly nesting synchronous and asynchronous data and compute regions, giving rise to data races when the asynchronicity is actually executed, as is visible in at least on test case with GCN offloading. The proper fix is to correctly use the asynchronous interfaces, making the currently synchronous data regions fully asynchronous (see also <https://gcc.gnu.org/PR97390> "[OpenACC] 'async' clause on 'data' construct", which is to share the same implementation), but that's for later; for now add some more synchronization. gcc/ * omp-oacc-kernels-decompose.cc (add_wait): New function, split out of... (add_async_clauses_and_wait): ...here. Call new outlined function. (decompose_kernels_region_body): Add wait at the end of explicitly-asynchronous kernels regions. libgomp/ * testsuite/libgomp.oacc-c-c++-common/f-asyncwait-1.c: Remove GCN offloading execution XFAIL. Co-Authored-By: Thomas Schwinge <thomas@codesourcery.com>
2022-01-13OpenACC 'kernels' decomposition: Mark variables used in synthesized data ↵Thomas Schwinge3-28/+33
clauses as addressable [PR100280] ... as otherwise 'gcc/omp-low.c:lower_omp_target' has to create a temporary: 13073 else if (is_gimple_reg (var)) 13074 { 13075 gcc_assert (offloaded); 13076 tree avar = create_tmp_var (TREE_TYPE (var)); 13077 mark_addressable (avar); ..., which (a) is only implemented for actualy *offloaded* regions (but not data regions), and (b) the subsequently synthesized code for writing to and later reading back from the temporary fundamentally conflicts with OpenACC 'async' (as used by OpenACC 'kernels' decomposition). That's all not trivial to make work, so let's just avoid this case. gcc/ PR middle-end/100280 * omp-oacc-kernels-decompose.cc (maybe_build_inner_data_region): Mark variables used in synthesized data clauses as addressable. gcc/testsuite/ PR middle-end/100280 * c-c++-common/goacc/kernels-decompose-pr100280-1.c: New. * c-c++-common/goacc/classify-kernels-parloops.c: Likewise. * c-c++-common/goacc/classify-kernels-unparallelized-parloops.c: Likewise. * c-c++-common/goacc/classify-kernels-unparallelized.c: Test '--param openacc-kernels=decompose'. * c-c++-common/goacc/classify-kernels.c: Likewise. * c-c++-common/goacc/kernels-decompose-2.c: Update. * c-c++-common/goacc/kernels-decompose-ice-1.c: Remove. * c-c++-common/goacc/kernels-decompose-ice-2.c: Likewise. * gfortran.dg/goacc/classify-kernels-parloops.f95: New. * gfortran.dg/goacc/classify-kernels-unparallelized-parloops.f95: Likewise. * gfortran.dg/goacc/classify-kernels-unparallelized.f95: Test '--param openacc-kernels=decompose'. * gfortran.dg/goacc/classify-kernels.f95: Likewise. libgomp/ PR middle-end/100280 * testsuite/libgomp.oacc-c-c++-common/declare-vla-kernels-decompose-ice-1.c: Update. * testsuite/libgomp.oacc-c-c++-common/f-asyncwait-1.c: Likewise. * testsuite/libgomp.oacc-c-c++-common/kernels-decompose-1.c: Likewise. Suggested-by: Julian Brown <julian@codesourcery.com>
2022-01-13Enhance OpenACC 'kernels' decomposition testingThomas Schwinge7-60/+265
gcc/testsuite/ * c-c++-common/goacc/kernels-decompose-1.c: Enhance. * c-c++-common/goacc/kernels-decompose-2.c: Likewise. * c-c++-common/goacc/kernels-decompose-ice-1.c: Likewise. * c-c++-common/goacc/kernels-decompose-ice-2.c: Likewise. * gfortran.dg/goacc/kernels-decompose-1.f95: Likewise. * gfortran.dg/goacc/kernels-decompose-2.f95: Likewise. libgomp/ * testsuite/libgomp.oacc-c-c++-common/declare-vla-kernels-decompose-ice-1.c: Enhance. * testsuite/libgomp.oacc-c-c++-common/declare-vla-kernels-decompose.c: Likewise. * testsuite/libgomp.oacc-c-c++-common/declare-vla.c: Likewise. * testsuite/libgomp.oacc-c-c++-common/f-asyncwait-1.c: Likewise. * testsuite/libgomp.oacc-c-c++-common/f-asyncwait-2.c: Likewise. * testsuite/libgomp.oacc-c-c++-common/f-asyncwait-3.c: Likewise. * testsuite/libgomp.oacc-c-c++-common/kernels-decompose-1.c: Likewise. * testsuite/libgomp.oacc-fortran/asyncwait-1.f90: Likewise. * testsuite/libgomp.oacc-fortran/asyncwait-2.f90: Likewise. * testsuite/libgomp.oacc-fortran/asyncwait-3.f90: Likewise. * testsuite/libgomp.oacc-fortran/pr94358-1.f90: Likewise.
2021-10-25libgomp.oacc-c-c++-common/loop-gwv-2.c: Use __builtin_allocaTobias Burnus1-4/+3
Some systems do not have <alloca.h> but provide alloca differently, e.g. via stdlib.h. Do it like other testcases do and use __builtin_alloca. libgomp/ChangeLog: PR testsuite/102910 * testsuite/libgomp.oacc-c-c++-common/loop-gwv-2.c: Use __builtin_alloca instead of #include <alloca.h> + alloca.
2021-09-17openacc: Shared memory layout optimisationJulian Brown1-0/+4
This patch implements an algorithm to lay out local data-share (LDS) space. It currently works for AMD GCN. At the moment, LDS is used for three things: 1. Gang-private variables 2. Reduction temporaries (accumulators) 3. Broadcasting for worker partitioning After the patch is applied, (2) and (3) are placed at preallocated locations in LDS, and (1) continues to be handled by the backend (as it is at present prior to this patch being applied). LDS now looks like this: +--------------+ (gang-private size + 1024, = 1536) | free space | | ... | | - - - - - - -| | worker bcast | +--------------+ | reductions | +--------------+ <<< -mgang-private-size=<number> (def. 512) | gang-private | | vars | +--------------+ (32) | low LDS vars | +--------------+ LDS base So, gang-private space is fixed at a constant amount at compile time (which can be increased with a command-line switch if necessary for some given code). The layout algorithm takes out a slice of the remainder of usable space for reduction vars, and uses the rest for worker partitioning. The partitioning algorithm works as follows. 1. An "adjacency" set is built up for each basic block that might do a broadcast. This is calculated by starting at each such block, and doing a recursive DFS walk over successors to find the next block (or blocks) that *also* does a broadcast (dfs_broadcast_reachable_1). 2. The adjacency set is inverted to get adjacent predecessor blocks also. 3. Blocks that will perform a broadcast are sorted by size of that broadcast: the biggest blocks are handled first. 4. A splay tree structure is used to calculate the spans of LDS memory that are already allocated by the blocks adjacent to this one (merge_ranges{,_1}. 5. The current block's broadcast space is allocated from the first free span not allocated in the splay tree structure calculated above (first_fit_range). This seems to work quite nicely and efficiently with the splay tree structure. 6. Continue with the next-biggest broadcast block until we're done. In this way, "adjacent" broadcasts will not use the same piece of LDS memory. PR96334 "openacc: Unshare reduction temporaries for GCN" got merged in: The GCN backend uses tree nodes like MEM((__lds TYPE *) <constant>) for reduction temporaries. Unlike e.g. var decls and SSA names, these nodes cannot be shared during gimplification, but are so in some circumstances. This is detected when appropriate --enable-checking options are used. This patch unshares such nodes when they are reused more than once. gcc/ * config/gcn/gcn-protos.h (gcn_goacc_create_worker_broadcast_record): Update prototype. * config/gcn/gcn-tree.c (gcn_goacc_get_worker_red_decl): Use preallocated block of LDS memory. Do not cache/share decls for reduction temporaries between invocations. (gcn_goacc_reduction_teardown): Unshare VAR on second use. (gcn_goacc_create_worker_broadcast_record): Add OFFSET parameter and return temporary LDS space at that offset. Return pointer in "sender" case. * config/gcn/gcn.c (acc_lds_size, gang_private_hwm, lds_allocs): New global vars. (ACC_LDS_SIZE): Define as acc_lds_size. (gcn_init_machine_status): Don't initialise lds_allocated, lds_allocs, reduc_decls fields of machine function struct. (gcn_option_override): Handle default size for gang-private variables and -mgang-private-size option. (gcn_expand_prologue): Use LDS_SIZE instead of LDS_SIZE-1 when initialising M0_REG. (gcn_shared_mem_layout): New function. (gcn_print_lds_decl): Update comment. Use global lds_allocs map and gang_private_hwm variable. (TARGET_GOACC_SHARED_MEM_LAYOUT): Define target hook. * config/gcn/gcn.h (machine_function): Remove lds_allocated, lds_allocs, reduc_decls. Add reduction_base, reduction_limit. * config/gcn/gcn.opt (gang_private_size_opt): New global. (mgang-private-size=): New option. * doc/tm.texi.in (TARGET_GOACC_SHARED_MEM_LAYOUT): Place documentation hook. * doc/tm.texi: Regenerate. * omp-oacc-neuter-broadcast.cc (targhooks.h, diagnostic-core.h): Add includes. (build_sender_ref): Handle sender_decl being pointer. (worker_single_copy): Add PLACEMENT and ISOLATE_BROADCASTS parameters. Pass placement argument to create_worker_broadcast_record hook invocations. Handle sender_decl being pointer and isolate_broadcasts inserting extra barriers. (blk_offset_map_t): Add typedef. (neuter_worker_single): Add BLK_OFFSET_MAP parameter. Pass preallocated range to worker_single_copy call. (dfs_broadcast_reachable_1): New function. (idx_decl_pair_t, used_range_vec_t): New typedefs. (sort_size_descending): New function. (addr_range): New class. (splay_tree_compare_addr_range, splay_tree_free_key) (first_fit_range, merge_ranges_1, merge_ranges): New functions. (execute_omp_oacc_neuter_broadcast): Rename to... (oacc_do_neutering): ... this. Add BOUNDS_LO, BOUNDS_HI parameters. Arrange layout of shared memory for broadcast operations. (execute_omp_oacc_neuter_broadcast): New function. (pass_omp_oacc_neuter_broadcast::gate): Remove num_workers==1 handling from here. Enable pass for all OpenACC routines in order to call shared memory-layout hook. * target.def (create_worker_broadcast_record): Add OFFSET parameter. (shared_mem_layout): New hook. libgomp/ * testsuite/libgomp.oacc-c-c++-common/broadcast-many.c: Update.
2021-09-17Add 'libgomp.oacc-c-c++-common/broadcast-many.c'Julian Brown1-0/+77
libgomp/ * testsuite/libgomp.oacc-c-c++-common/broadcast-many.c: New test.
2021-08-16Address '?:' issues in 'libgomp.oacc-c-c++-common/mode-transitions.c'Thomas Schwinge1-3/+3
[...]/libgomp.oacc-c-c++-common/mode-transitions.c: In function ‘t3’: [...]/libgomp.oacc-c-c++-common/mode-transitions.c:127:43: warning: ‘?:’ using integer constants in boolean context, the expression will always evaluate to ‘true’ [-Wint-in-bool-context] 127 | assert (arr[i] == ((i % 64) < 32) ? 1 : -1); | ^ [...]/libgomp.oacc-c-c++-common/mode-transitions.c: In function ‘t9’: [...]/libgomp.oacc-c-c++-common/mode-transitions.c:359:46: warning: ‘?:’ using integer constants in boolean context, the expression will always evaluate to ‘true’ [-Wint-in-bool-context] 359 | assert (arr[i] == ((i % 3) == 0) ? 1 : 2); | ^ ..., and PR101862 "[C, C++] Potential '?:' diagnostic for always-true expressions in boolean context". libgomp/ * testsuite/libgomp.oacc-c-c++-common/mode-transitions.c: Address '?:' issues.
2021-08-13Adjust 'libgomp.oacc-c-c++-common/static-variable-1.c'Thomas Schwinge1-1/+4
... for 'gcc/gimplify.c:gimplify_scan_omp_clauses' changes in recent commit d0befed793b94f3f407be44e6f69f81a02f5f073 "openmp: Add support for OpenMP 5.1 masked construct". libgomp/ * testsuite/libgomp.oacc-c-c++-common/static-variable-1.c: Adjust.