path: root/libgomp/testsuite
Age | Commit message | Author | Files | Lines
2023-12-13 | OpenMP/OpenACC: Rework clause expansion and nested struct handling | Julian Brown | 23 | -9/+4284
This patch reworks clause expansion in the C, C++ and (to a lesser extent) Fortran front ends for OpenMP and OpenACC mapping nodes used in GPU offloading support.

At present a single clause may be turned into several mapping nodes, or have its mapping type changed, in several places scattered through the front- and middle-end. The analysis relating to which particular transformations are needed for some given expression has become quite hard to follow. Briefly, we manipulate clause types in the following places:

1. During parsing, in c_omp_adjust_map_clauses. Depending on a set of rules, we may change a FIRSTPRIVATE_POINTER (etc.) mapping into ATTACH_DETACH, or mark the decl addressable.

2. In semantics.cc or c-typeck.cc, clauses are expanded in handle_omp_array_sections (called via {c_}finish_omp_clauses), or in finish_omp_clauses itself. The two cases are for processing array sections (the former), or non-array sections (the latter).

3. In gimplify.cc, we build sibling lists for struct accesses, which groups and sorts accesses along with their struct base, creating new ALLOC/RELEASE nodes for pointers.

4. In gimplify.cc:gimplify_adjust_omp_clauses, mapping nodes may be adjusted or created.

This patch doesn't completely disrupt this scheme, though clause types are no longer adjusted in c_omp_adjust_map_clauses (step 1). Clause expansion in step 2 (for C and C++) now uses a single, unified mechanism, parts of which are also reused for analysis in step 3.

Rather than the somewhat "ad-hoc" pattern matching on addresses that is used to expand clauses at present, a new method for analysing addresses is introduced. This does a recursive-descent tree walk on expression nodes, and emits a vector of tokens describing each "part" of the address. This tokenized address can then be translated directly into mapping nodes, with the assurance that no part of the expression has been inadvertently skipped or misinterpreted.
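The tokenizing walk described above can be illustrated with a toy model. This is only a sketch, not GCC code: the node kinds, the `struct node` layout and the functions `tokenize` and `describe` are hypothetical, and only the four token names are taken from the commit message. It assumes an address is a simple chain of unary accesses ending in a declaration.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical node kinds for a toy expression chain.  GCC's real tree
   nodes are far richer; this only models the idea.  */
enum node_kind { NODE_DECL, NODE_DEREF, NODE_COMPONENT, NODE_ARRAY_INDEX };

struct node
{
  enum node_kind kind;
  const char *name;         /* Decl or field name, NULL otherwise.  */
  const struct node *inner; /* Operand; NULL for NODE_DECL.  */
};

/* Token names follow the examples given in the commit message.  */
enum token
{
  TOK_BASE_DECL,
  TOK_ACCESS_POINTER,
  TOK_COMPONENT_SELECTOR,
  TOK_ACCESS_INDEXED_ARRAY
};

static const char *const token_names[] = {
  "base-decl", "access-pointer", "component-selector", "access-indexed-array"
};

#define MAX_TOKENS 16

/* Recursive descent: tokenize the operand first so that tokens come out
   base-first.  Returns the new token count, or -1 if the chain is
   malformed or too long.  */
static int
tokenize (const struct node *n, enum token *out, int count)
{
  if (n == NULL)
    return -1;
  if (n->kind != NODE_DECL)
    {
      count = tokenize (n->inner, out, count);
      if (count < 0)
        return -1;
    }
  if (count >= MAX_TOKENS)
    return -1;
  switch (n->kind)
    {
    case NODE_DECL: out[count] = TOK_BASE_DECL; break;
    case NODE_DEREF: out[count] = TOK_ACCESS_POINTER; break;
    case NODE_COMPONENT: out[count] = TOK_COMPONENT_SELECTOR; break;
    case NODE_ARRAY_INDEX: out[count] = TOK_ACCESS_INDEXED_ARRAY; break;
    default: return -1;
    }
  return count + 1;
}

/* Render the token vector as a space-separated string in BUF.
   Returns the number of tokens, or -1 on error.  */
static int
describe (const struct node *n, char *buf, size_t len)
{
  enum token toks[MAX_TOKENS];
  int count = tokenize (n, toks, 0);
  if (count < 0 || len == 0)
    return -1;
  buf[0] = '\0';
  for (int i = 0; i < count; i++)
    {
      size_t used = strlen (buf);
      int w = snprintf (buf + used, len - used, "%s%s",
                        i ? " " : "", token_names[toks[i]]);
      if (w < 0 || (size_t) w >= len - used)
        return -1;
    }
  return count;
}
```

Because every node in the chain contributes exactly one token, a consumer of the token vector knows the whole address has been seen before it selects a code path, which is the property the commit message emphasises.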
In this way, all the ways in which pointers, arrays, references and component accesses might be combined can be teased apart into easily-understood cases - and we know we've "parsed" the whole address before we start analysis, so the right code paths can easily be selected.

For example, a simple access "arr[idx]" might parse as:

  base-decl access-indexed-array

or "mystruct->foo[x]" with a pointer "foo" component might parse as:

  base-decl access-pointer component-selector access-pointer

A key observation is that "array" bases, e.g. accesses whose root nodes are not structures, but describe scalars or arrays, and also *one-level deep* structure accesses, have first-class support in gimplify and beyond. Expressions that use deeper struct accesses or e.g. multiple indirections were more problematic: some cases worked, but lots of cases didn't. This patch reimplements the support for those in gimplify.cc, again using the new "address tokenization" support.

An expression like "mystruct->foo->bar[0:10]" used in a mapping node will translate the right-hand access directly in the front-end. The base for the access will be "mystruct->foo". This is handled recursively in gimplify.cc -- there may be several accesses of "mystruct"'s members on the same directive, so the sibling-list building machinery can be used again. (This was already being done for OpenACC, but the new implementation differs somewhat in details, and is more robust.)

For OpenMP, in the case where the base pointer itself, i.e. "mystruct->foo" here, is NOT mapped on the same directive, we create a "fragile" mapping. This turns the "foo" component access into a zero-length allocation (which is a new feature for the runtime, so support has been added there too).

A couple of changes have been made to how mapping clauses are turned into mapping nodes:

The first change is based on the observation that it is probably never correct to use GOMP_MAP_ALWAYS_POINTER for component accesses (e.g.
for references), because if the containing struct is already mapped on the target then the host version of the pointer in question will be corrupted if the struct is copied back from the target. This patch removes all such uses, across each of C, C++ and Fortran.

The second change is to the way that GOMP_MAP_ATTACH_DETACH nodes are processed during sibling-list creation. For OpenMP, for pointer components, we must map the base pointer separately from an array section that uses the base pointer, so e.g. we must have both "map(mystruct.base)" and "map(mystruct.base[0:10])" mappings. These create nodes such as:

  GOMP_MAP_TOFROM mystruct.base
  G_M_TOFROM *mystruct.base [len: 10*elemsize]
  G_M_ATTACH_DETACH mystruct.base

Instead of using the first of these directly when building the struct sibling list then skipping the group using GOMP_MAP_ATTACH_DETACH, leading to:

  GOMP_MAP_STRUCT mystruct [len: 1]
  GOMP_MAP_TOFROM mystruct.base

we now introduce a new "mini-pass", omp_resolve_clause_dependencies, that drops the GOMP_MAP_TOFROM for the base pointer, marks the second group as having had a base-pointer mapping, then omp_build_struct_sibling_lists can create:

  GOMP_MAP_STRUCT mystruct [len: 1]
  GOMP_MAP_ALLOC mystruct.base [len: ptrsize]

This ends up working better in many cases, particularly those involving references. (The "alloc" space is immediately overwritten by a pointer attachment, so this is also mildly more efficient than a redundant TO mapping at runtime.)

There is support in the address tokenizer for "arbitrary" base expressions which aren't rooted at a decl, but that is not used at present because such addresses are disallowed at parse time.

In the front-ends, the address tokenization machinery is mostly only used for clause expansion and not for diagnostics at present. It could be used for those too, which would allow more of my previous "address inspector" implementation to be removed.

The new bits in gimplify.cc work with OpenACC also.
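For readers unfamiliar with the source-level form behind these mapping nodes, the following is a minimal sketch of the "map(mystruct.base)" plus "map(mystruct.base[0:10])" pattern the message describes. The `struct vec` type and the function `double_all` are hypothetical names chosen for illustration, and whether both map items are strictly required depends on the OpenMP version and compiler in use. Without -fopenmp the pragma is ignored and the loop simply runs on the host.

```c
#include <assert.h>

/* Hypothetical struct with a pointer component, standing in for the
   commit message's "mystruct.base".  */
struct vec
{
  int *base;
  int len;
};

/* Double every element of the array that mystruct.base points to.
   Both the base pointer and the array section appear in the map
   clause, as in the commit message's example.  */
static void
double_all (struct vec mystruct)
{
  int n = mystruct.len;
#pragma omp target map(tofrom: mystruct.base, mystruct.base[0:n])
  for (int i = 0; i < n; i++)
    mystruct.base[i] *= 2;
}
```

A caller would fill an array, wrap it in a `struct vec`, and call `double_all`; on return the host array holds the doubled values whether or not an offload device was used.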
This version of the patch addresses several first-pass review comments from Tobias, and fixes a few previously-missed cases for manually-managed ragged array mappings (including cases using references). Some arbitrary differences between handling of clause expansion for C vs. C++ have also been fixed, and some fragments from later in the patch series have been moved forward (where they were useful for fixing bugs). Several new test cases have been added. 2023-11-29 Julian Brown <julian@codesourcery.com> gcc/c-family/ * c-common.h (c_omp_region_type): Add C_ORT_EXIT_DATA, C_ORT_OMP_EXIT_DATA and C_ORT_ACC_TARGET. (omp_addr_token): Add forward declaration. (c_omp_address_inspector): New class. * c-omp.cc (c_omp_adjust_map_clauses): Mark decls addressable here, but do not change any mapping node types. (c_omp_address_inspector::unconverted_ref_origin, c_omp_address_inspector::component_access_p, c_omp_address_inspector::check_clause, c_omp_address_inspector::get_root_term, c_omp_address_inspector::map_supported_p, c_omp_address_inspector::get_origin, c_omp_address_inspector::maybe_unconvert_ref, c_omp_address_inspector::maybe_zero_length_array_section, c_omp_address_inspector::expand_array_base, c_omp_address_inspector::expand_component_selector, c_omp_address_inspector::expand_map_clause): New methods. (omp_expand_access_chain): New function. gcc/c/ * c-parser.cc (c_parser_oacc_all_clauses): Add TARGET_P parameter. Use to select region type for c_finish_omp_clauses call. (c_parser_oacc_loop): Update calls to c_parser_oacc_all_clauses. (c_parser_oacc_compute): Likewise. (c_parser_omp_target_data, c_parser_omp_target_enter_data): Support ATTACH kind. (c_parser_omp_target_exit_data): Support DETACH kind. (check_clauses): Handle GOMP_MAP_POINTER and GOMP_MAP_ATTACH here. * c-typeck.cc (handle_omp_array_sections_1, handle_omp_array_sections, c_finish_omp_clauses): Use c_omp_address_inspector class and OMP address tokenizer to analyze and expand map clause expressions. 
Fix some diagnostics. Fix "is OpenACC" condition for C_ORT_ACC_TARGET addition. gcc/cp/ * parser.cc (cp_parser_oacc_all_clauses): Add TARGET_P parameter. Use to select region type for finish_omp_clauses call. (cp_parser_omp_target_data, cp_parser_omp_target_enter_data): Support GOMP_MAP_ATTACH kind. (cp_parser_omp_target_exit_data): Support GOMP_MAP_DETACH kind. (cp_parser_oacc_declare): Update call to cp_parser_oacc_all_clauses. (cp_parser_oacc_loop): Update calls to cp_parser_oacc_all_clauses. (cp_parser_oacc_compute): Likewise. * pt.cc (tsubst_expr): Use C_ORT_ACC_TARGET for call to tsubst_omp_clauses for OpenACC compute regions. * semantics.cc (cp_omp_address_inspector): New class, derived from c_omp_address_inspector. (handle_omp_array_sections_1, handle_omp_array_sections, finish_omp_clauses): Use cp_omp_address_inspector class and OMP address tokenizer to analyze and expand OpenMP map clause expressions. Fix some diagnostics. Support C_ORT_ACC_TARGET. (finish_omp_target): Handle GOMP_MAP_POINTER. gcc/fortran/ * trans-openmp.cc (gfc_trans_omp_array_section): Add OPENMP parameter. Use GOMP_MAP_ATTACH_DETACH instead of GOMP_MAP_ALWAYS_POINTER for derived type components. (gfc_trans_omp_clauses): Update calls to gfc_trans_omp_array_section. gcc/ * gimplify.cc (build_struct_comp_nodes): Don't process GOMP_MAP_ATTACH_DETACH "middle" nodes here. (omp_mapping_group): Add REPROCESS_STRUCT and FRAGILE booleans for nested struct handling. (omp_strip_components_and_deref, omp_strip_indirections): Remove functions. (omp_get_attachment): Handle GOMP_MAP_DETACH here. (omp_group_last): Handle GOMP_MAP_*, GOMP_MAP_DETACH, GOMP_MAP_ATTACH_DETACH groups for "exit data" of reference-to-pointer component array sections. (omp_gather_mapping_groups_1): Initialise reprocess_struct and fragile fields. (omp_group_base): Handle GOMP_MAP_ATTACH_DETACH after GOMP_MAP_STRUCT. (omp_index_mapping_groups_1): Skip reprocess_struct groups. 
(omp_get_nonfirstprivate_group, omp_directive_maps_explicitly, omp_resolve_clause_dependencies, omp_first_chained_access_token): New functions. (omp_check_mapping_compatibility): Adjust accepted node combinations for "from" clauses using release instead of alloc. (omp_accumulate_sibling_list): Add GROUP_MAP, ADDR_TOKENS, FRAGILE_P, REPROCESSING_STRUCT, ADDED_TAIL parameters. Use OMP address tokenizer to analyze addresses. Reimplement nested struct handling, and implement "fragile groups". (omp_build_struct_sibling_lists): Adjust for changes to omp_accumulate_sibling_list. Recalculate bias for ATTACH_DETACH nodes after GOMP_MAP_STRUCT nodes. (gimplify_scan_omp_clauses): Call omp_resolve_clause_dependencies. Use OMP address tokenizer. (gimplify_adjust_omp_clauses_1): Use build_fold_indirect_ref_loc instead of build_simple_mem_ref_loc. * omp-general.cc (omp-general.h, tree-pretty-print.h): Include. (omp_addr_tokenizer): New namespace. (omp_addr_tokenizer::omp_addr_token): New. (omp_addr_tokenizer::omp_parse_component_selector, omp_addr_tokenizer::omp_parse_ref, omp_addr_tokenizer::omp_parse_pointer, omp_addr_tokenizer::omp_parse_access_method, omp_addr_tokenizer::omp_parse_access_methods, omp_addr_tokenizer::omp_parse_structure_base, omp_addr_tokenizer::omp_parse_structured_expr, omp_addr_tokenizer::omp_parse_array_expr, omp_addr_tokenizer::omp_access_chain_p, omp_addr_tokenizer::omp_accessed_addr): New functions. (omp_parse_expr, debug_omp_tokenized_addr): New functions. * omp-general.h (omp_addr_tokenizer::access_method_kinds, omp_addr_tokenizer::structure_base_kinds, omp_addr_tokenizer::token_type, omp_addr_tokenizer::omp_addr_token, omp_addr_tokenizer::omp_access_chain_p, omp_addr_tokenizer::omp_accessed_addr): New. (omp_addr_token, omp_parse_expr): New. * omp-low.cc (scan_sharing_clauses): Skip error check for references to pointers. * tree.h (OMP_CLAUSE_ATTACHMENT_MAPPING_ERASED): New macro. gcc/testsuite/ * c-c++-common/gomp/clauses-2.c: Fix error output. 
* c-c++-common/gomp/target-implicit-map-2.c: Adjust scan output. * c-c++-common/gomp/target-50.c: Adjust scan output. * c-c++-common/gomp/target-enter-data-1.c: Adjust scan output. * g++.dg/gomp/static-component-1.C: New test. * gcc.dg/gomp/target-3.c: Adjust scan output. * gfortran.dg/gomp/map-9.f90: Adjust scan output. libgomp/ * target.c (gomp_map_pointer): Modify zero-length array section pointer handling. (gomp_attach_pointer): Likewise. (gomp_map_fields_existing): Use gomp_map_0len_lookup. (gomp_attach_pointer): Allow attaching null pointers (or Fortran "unassociated" pointers). (gomp_map_vars_internal): Handle zero-sized struct members. Add diagnostic for unmapped struct pointer members. * testsuite/libgomp.c-c++-common/baseptrs-1.c: New test. * testsuite/libgomp.c-c++-common/baseptrs-2.c: New test. * testsuite/libgomp.c-c++-common/baseptrs-6.c: New test. * testsuite/libgomp.c-c++-common/baseptrs-7.c: New test. * testsuite/libgomp.c-c++-common/ptr-attach-2.c: New test. * testsuite/libgomp.c-c++-common/target-implicit-map-2.c: Fix missing "free". * testsuite/libgomp.c-c++-common/target-implicit-map-5.c: New test. * testsuite/libgomp.c-c++-common/target-map-zlas-1.c: New test. * testsuite/libgomp.c++/class-array-1.C: New test. * testsuite/libgomp.c++/baseptrs-3.C: New test. * testsuite/libgomp.c++/baseptrs-4.C: New test. * testsuite/libgomp.c++/baseptrs-5.C: New test. * testsuite/libgomp.c++/baseptrs-8.C: New test. * testsuite/libgomp.c++/baseptrs-9.C: New test. * testsuite/libgomp.c++/ref-mapping-1.C: New test. * testsuite/libgomp.c++/target-48.C: New test. * testsuite/libgomp.c++/target-49.C: New test. * testsuite/libgomp.c++/target-exit-data-reftoptr-1.C: New test. * testsuite/libgomp.c++/target-lambda-1.C: Update for OpenMP 5.2 semantics. * testsuite/libgomp.c++/target-this-3.C: Likewise. * testsuite/libgomp.c++/target-this-4.C: Likewise. * testsuite/libgomp.fortran/struct-elem-map-1.f90: Add temporary XFAIL. 
* testsuite/libgomp.fortran/target-enter-data-6.f90: Likewise.
2023-12-13 | libgomp: basic pinned memory on Linux | Andrew Stubbs | 4 | -0/+541
Implement the OpenMP pinned memory trait on Linux hosts using the mlock syscall. Pinned allocations are performed using mmap, not malloc, to ensure that they can be unpinned safely when freed. This implementation will work OK for page-scale allocations, and finer-grained allocations will be implemented in a future patch. libgomp/ChangeLog: * allocator.c (MEMSPACE_ALLOC): Add PIN. (MEMSPACE_CALLOC): Add PIN. (MEMSPACE_REALLOC): Add PIN. (MEMSPACE_FREE): Add PIN. (MEMSPACE_VALIDATE): Add PIN. (omp_init_allocator): Use MEMSPACE_VALIDATE to check pinning. (omp_aligned_alloc): Add pinning to all MEMSPACE_* calls. (omp_aligned_calloc): Likewise. (omp_realloc): Likewise. (omp_free): Likewise. * config/linux/allocator.c: New file. * config/nvptx/allocator.c (MEMSPACE_ALLOC): Add PIN. (MEMSPACE_CALLOC): Add PIN. (MEMSPACE_REALLOC): Add PIN. (MEMSPACE_FREE): Add PIN. (MEMSPACE_VALIDATE): Add PIN. * config/gcn/allocator.c (MEMSPACE_ALLOC): Add PIN. (MEMSPACE_CALLOC): Add PIN. (MEMSPACE_REALLOC): Add PIN. (MEMSPACE_FREE): Add PIN. * libgomp.texi: Switch pinned trait to supported. (MEMSPACE_VALIDATE): Add PIN. * testsuite/libgomp.c/alloc-pinned-1.c: New test. * testsuite/libgomp.c/alloc-pinned-2.c: New test. * testsuite/libgomp.c/alloc-pinned-3.c: New test. * testsuite/libgomp.c/alloc-pinned-4.c: New test. Co-Authored-By: Thomas Schwinge <thomas@codesourcery.com>
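The mmap-plus-mlock approach named above can be sketched as follows. This is not the code from config/linux/allocator.c; `pinned_alloc` and `pinned_free` are hypothetical names, and the sketch assumes a Linux host where MAP_ANONYMOUS is available. Using a dedicated mapping means no unrelated data shares the locked pages, so releasing them cannot unpin someone else's memory.

```c
#define _DEFAULT_SOURCE
#include <assert.h>
#include <stddef.h>
#include <string.h>
#include <sys/mman.h>

/* Allocate SIZE bytes of page-locked memory, or return NULL.
   mlock can fail, for example when RLIMIT_MEMLOCK is exceeded, so
   callers must be prepared for a NULL result.  */
static void *
pinned_alloc (size_t size)
{
  if (size == 0)
    return NULL;
  void *p = mmap (NULL, size, PROT_READ | PROT_WRITE,
                  MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
  if (p == MAP_FAILED)
    return NULL;
  if (mlock (p, size) != 0)
    {
      munmap (p, size);
      return NULL;
    }
  return p;
}

/* Release memory obtained from pinned_alloc.  Unmapping the range
   also removes the lock on its pages.  */
static void
pinned_free (void *p, size_t size)
{
  if (p != NULL)
    munmap (p, size);
}
```

Each call consumes at least one whole page, which matches the commit message's remark that this approach suits page-scale allocations.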
2023-12-11 | libgfortran: Replace mutex with rwlock | Lipeng Zhu | 3 | -0/+73
This patch introduces a rwlock and splits reads and writes of the unit_root tree and unit_cache, using the rwlock instead of the mutex, to increase CPU efficiency. In the get_gfc_unit function, only around 30% of calls step into the insert_unit function; in most instances, we can get the unit while reading the unit_cache or unit_root tree. Splitting the read and write phases with a rwlock is therefore an approach to make this path more parallel. The IPC metric improves by around 9x on our test server with 220 cores. The benchmark we used is https://github.com/rwesson/NEAT

libgcc/ChangeLog: * gthr-posix.h (__GTHREAD_RWLOCK_INIT): New macro. (__gthrw): New function. (__gthread_rwlock_rdlock): New function. (__gthread_rwlock_tryrdlock): New function. (__gthread_rwlock_wrlock): New function. (__gthread_rwlock_trywrlock): New function. (__gthread_rwlock_unlock): New function. libgfortran/ChangeLog: * io/async.c (DEBUG_LINE): New macro. * io/async.h (RWLOCK_DEBUG_ADD): New macro. (CHECK_RDLOCK): New macro. (CHECK_WRLOCK): New macro. (TAIL_RWLOCK_DEBUG_QUEUE): New macro. (IN_RWLOCK_DEBUG_QUEUE): New macro. (RDLOCK): New macro. (WRLOCK): New macro. (RWUNLOCK): New macro. (RD_TO_WRLOCK): New macro. (INTERN_RDLOCK): New macro. (INTERN_WRLOCK): New macro. (INTERN_RWUNLOCK): New macro. * io/io.h (struct gfc_unit): Change UNIT_LOCK to UNIT_RWLOCK in a comment. (unit_lock): Remove including associated internal_proto. (unit_rwlock): New declarations including associated internal_proto. (dec_waiting_unlocked): Use WRLOCK and RWUNLOCK on unit_rwlock instead of __gthread_mutex_lock and __gthread_mutex_unlock on unit_lock. * io/transfer.c (st_read_done_worker): Use WRLOCK and RWUNLOCK on unit_rwlock instead of LOCK and UNLOCK on unit_lock. (st_write_done_worker): Likewise. * io/unit.c: Change UNIT_LOCK to UNIT_RWLOCK in 'IO locking rules' comment. Use unit_rwlock variable instead of unit_lock variable. (get_gfc_unit_from_unit_root): New function. 
(get_gfc_unit): Use RDLOCK, WRLOCK and RWUNLOCK on unit_rwlock instead of LOCK and UNLOCK on unit_lock. (close_unit_1): Use WRLOCK and RWUNLOCK on unit_rwlock instead of LOCK and UNLOCK on unit_lock. (close_units): Likewise. (newunit_alloc): Use RWUNLOCK on unit_rwlock instead of UNLOCK on unit_lock. * io/unix.c (find_file): Use RDLOCK and RWUNLOCK on unit_rwlock instead of LOCK and UNLOCK on unit_lock. (flush_all_units): Use WRLOCK and RWUNLOCK on unit_rwlock instead of LOCK and UNLOCK on unit_lock.
2023-12-11 | aarch64: enable mixed-types for aarch64 simdclones | Andre Vieira | 2 | -3/+13
This patch enables the use of mixed-types for simd clones for AArch64, adds aarch64 as a target_vect_simd_clones and corrects the way the simdlen is chosen for non-specified simdlen clauses according to the 'Vector Function Application Binary Interface Specification for AArch64'. Additionally this patch also restricts combinations of simdlen and return/argument types that map to vectors larger than 128 bits as we currently do not have a way to represent these types in a way that is consistent internally and externally. gcc/ChangeLog: * config/aarch64/aarch64.cc (lane_size): New function. (aarch64_simd_clone_compute_vecsize_and_simdlen): Determine simdlen according to NDS rule and reject combination of simdlen and types that lead to vectors larger than 128bits. gcc/testsuite/ChangeLog: * lib/target-supports.exp: Add aarch64 targets to vect_simd_clones. * c-c++-common/gomp/declare-variant-14.c: Adapt test for aarch64. * c-c++-common/gomp/pr60823-1.c: Likewise. * c-c++-common/gomp/pr60823-2.c: Likewise. * c-c++-common/gomp/pr60823-3.c: Likewise. * g++.dg/gomp/attrs-10.C: Likewise. * g++.dg/gomp/declare-simd-1.C: Likewise. * g++.dg/gomp/declare-simd-3.C: Likewise. * g++.dg/gomp/declare-simd-4.C: Likewise. * g++.dg/gomp/declare-simd-7.C: Likewise. * g++.dg/gomp/declare-simd-8.C: Likewise. * g++.dg/gomp/pr88182.C: Likewise. * gcc.dg/declare-simd.c: Likewise. * gcc.dg/gomp/declare-simd-1.c: Likewise. * gcc.dg/gomp/declare-simd-3.c: Likewise. * gcc.dg/gomp/pr87887-1.c: Likewise. * gcc.dg/gomp/pr87895-1.c: Likewise. * gcc.dg/gomp/pr89246-1.c: Likewise. * gcc.dg/gomp/pr99542.c: Likewise. * gcc.dg/gomp/simd-clones-2.c: Likewise. * gcc.dg/vect/vect-simd-clone-1.c: Likewise. * gcc.dg/vect/vect-simd-clone-2.c: Likewise. * gcc.dg/vect/vect-simd-clone-4.c: Likewise. * gcc.dg/vect/vect-simd-clone-5.c: Likewise. * gcc.dg/vect/vect-simd-clone-6.c: Likewise. * gcc.dg/vect/vect-simd-clone-7.c: Likewise. * gcc.dg/vect/vect-simd-clone-8.c: Likewise. 
* gfortran.dg/gomp/declare-simd-2.f90: Likewise. * gfortran.dg/gomp/declare-simd-coarray-lib.f90: Likewise. * gfortran.dg/gomp/declare-variant-14.f90: Likewise. * gfortran.dg/gomp/pr79154-1.f90: Likewise. * gfortran.dg/gomp/pr83977.f90: Likewise. libgomp/ChangeLog: * testsuite/libgomp.c/declare-variant-1.c: Adapt test for aarch64. * testsuite/libgomp.fortran/declare-simd-1.f90: Likewise.
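As an illustration of what "mixed types" means for a simd clone, here is a minimal sketch of a `declare simd` function whose arguments have different widths. The functions `scale_add` and `apply` are hypothetical and are not taken from the adapted tests; which clones a compiler actually emits, and with what simdlen, depends on the target and options. Without -fopenmp or -fopenmp-simd the pragmas are ignored and the code runs as scalar C.

```c
#include <assert.h>

/* A function mixing a 32-bit float argument with a 64-bit double
   argument and return value.  With OpenMP enabled, the compiler may
   generate vector clones of it.  */
#pragma omp declare simd notinbranch
double
scale_add (float x, double y)
{
  return 2.0 * x + y;
}

/* Call the function from a simd loop, where a vectorizer may
   substitute a vector clone for the scalar call.  */
void
apply (const float *x, const double *y, double *out, int n)
{
#pragma omp simd
  for (int i = 0; i < n; i++)
    out[i] = scale_add (x[i], y[i]);
}
```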
2023-12-08 | OpenMP/Fortran: Implement omp allocators/allocate for ptr/allocatables | Tobias Burnus | 5 | -0/+278
This commit adds -fopenmp-allocators which enables support for 'omp allocators' and 'omp allocate' that are associated with a Fortran allocate-stmt. If such a construct is encountered, an error is shown, unless the -fopenmp-allocators flag is present.

With -fopenmp -fopenmp-allocators, those constructs get turned into GOMP_alloc allocations, while -fopenmp-allocators (also without -fopenmp) ensures deallocation and reallocation (via intrinsic assignments) are properly directed to GOMP_free/omp_realloc. Normal Fortran allocations are still processed by free/realloc.

In order to distinguish 'malloc'ed memory from 'GOMP_alloc'ed memory, the version field of the Fortran array descriptor is (mis)used: 0 indicates the normal Fortran allocation while 1 denotes GOMP_alloc. For scalars, there is record keeping in libgomp: GOMP_add_alloc(ptr) will add the pointer address to a splay_tree, while GOMP_is_alloc(ptr) will return true if it was previously added but also removes it from the list.

Besides Fortran FE work, BUILT_IN_GOMP_REALLOC is now part of omp-builtins.def and libgomp gains the two new functions mentioned above.

gcc/ChangeLog: * builtin-types.def (BT_FN_PTR_PTR_SIZE_PTRMODE_PTRMODE): New. * omp-builtins.def (BUILT_IN_GOMP_REALLOC): New. * builtins.cc (builtin_fnspec): Handle it. * gimple-ssa-warn-access.cc (fndecl_alloc_p, matching_alloc_calls_p): Likewise. * gimple.cc (nonfreeing_call_p): Likewise. * predict.cc (expr_expected_value_1): Likewise. * tree-ssa-ccp.cc (evaluate_stmt): Likewise. * tree.cc (fndecl_dealloc_argno): Likewise. gcc/fortran/ChangeLog: * dump-parse-tree.cc (show_omp_node): Handle EXEC_OMP_ALLOCATE and EXEC_OMP_ALLOCATORS. * f95-lang.cc (ATTR_ALLOC_WARN_UNUSED_RESULT_SIZE_2_NOTHROW_LIST): Add 'ECF_LEAF | ECF_MALLOC' to existing 'ECF_NOTHROW'. (ATTR_ALLOC_WARN_UNUSED_RESULT_SIZE_2_NOTHROW_LEAF_LIST): Define. * gfortran.h (gfc_omp_clauses): Add contained_in_target_construct. * invoke.texi (-fopenacc, -fopenmp): Update based on C version. 
(-fopenmp-simd): New, based on C version. (-fopenmp-allocators): New. * lang.opt (fopenmp-allocators): Add. * openmp.cc (resolve_omp_clauses): For allocators/allocate directive, add target and no dynamic_allocators diagnostic and more invalid diagnostic. * parse.cc (decode_omp_directive): Set contains_teams_construct. * trans-array.h (gfc_array_allocate): Update prototype. (gfc_conv_descriptor_version): New prototype. * trans-decl.cc (gfc_init_default_dt): Fix comment. * trans-array.cc (gfc_conv_descriptor_version): New. (gfc_array_allocate): Support GOMP_alloc allocation. (gfc_alloc_allocatable_for_assignment, structure_alloc_comps): Handle GOMP_free/omp_realloc as needed. * trans-expr.cc (gfc_conv_procedure_call): Likewise. (alloc_scalar_allocatable_for_assignment): Likewise. * trans-intrinsic.cc (conv_intrinsic_move_alloc): Likewise. * trans-openmp.cc (gfc_trans_omp_allocators, gfc_trans_omp_directive): Handle allocators/allocate directive. (gfc_omp_call_add_alloc, gfc_omp_call_is_alloc): New. * trans-stmt.h (gfc_trans_allocate): Update prototype. * trans-stmt.cc (gfc_trans_allocate): Support GOMP_alloc. * trans-types.cc (gfc_get_dtype_rank_type): Set version field. * trans.cc (gfc_allocate_using_malloc, gfc_allocate_allocatable): Update to handle GOMP_alloc. (gfc_deallocate_with_status, gfc_deallocate_scalar_with_status): Handle GOMP_free. (trans_code): Update call. * trans.h (gfc_allocate_allocatable, gfc_allocate_using_malloc): Update prototype. (gfc_omp_call_add_alloc, gfc_omp_call_is_alloc): New prototype. * types.def (BT_FN_PTR_PTR_SIZE_PTRMODE_PTRMODE): New. libgomp/ChangeLog: * allocator.c (struct fort_alloc_splay_tree_key_s, fort_alloc_splay_compare, GOMP_add_alloc, GOMP_is_alloc): New. * libgomp.h: Define splay_tree_static for 'reverse' splay tree. * libgomp.map (GOMP_5.1.2): New; add GOMP_add_alloc and GOMP_is_alloc; move GOMP_target_map_indirect_ptr from ... (GOMP_5.1.1): ... here. * libgomp.texi (Impl. 
Status, Memory management): Update for allocators/allocate directives. * splay-tree.c: Handle splay_tree_static define to declare all functions as static. (splay_tree_lookup_node): New. * splay-tree.h: Handle splay_tree_decl_only define. (splay_tree_lookup_node): New prototype. * target.c: Define splay_tree_static for 'reverse'. * testsuite/libgomp.fortran/allocators-1.f90: New test. * testsuite/libgomp.fortran/allocators-2.f90: New test. * testsuite/libgomp.fortran/allocators-3.f90: New test. * testsuite/libgomp.fortran/allocators-4.f90: New test. * testsuite/libgomp.fortran/allocators-5.f90: New test. gcc/testsuite/ChangeLog: * gfortran.dg/gomp/allocate-14.f90: Add coarray and not-listed tests. * gfortran.dg/gomp/allocate-5.f90: Remove sorry dg-message. * gfortran.dg/bind_c_array_params_2.f90: Update expected dump for dtype '.version=0'. * gfortran.dg/gomp/allocate-16.f90: New test. * gfortran.dg/gomp/allocators-3.f90: New test. * gfortran.dg/gomp/allocators-4.f90: New test.
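The scalar record keeping described in this commit (add a pointer, later ask whether it was added and forget it in the same step) can be sketched as below. This is an illustration only: libgomp uses a splay tree, whereas this sketch swaps in a small fixed-size array for brevity, and `record_add_alloc` and `record_is_alloc` are hypothetical names rather than the real GOMP_add_alloc and GOMP_is_alloc. The sketch is not thread-safe.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define MAX_RECORDS 128

/* Addresses currently recorded as coming from the special allocator.  */
static const void *records[MAX_RECORDS];
static size_t n_records;

/* Remember PTR.  Returns false if PTR is NULL or the table is full.  */
static bool
record_add_alloc (const void *ptr)
{
  if (ptr == NULL || n_records == MAX_RECORDS)
    return false;
  records[n_records++] = ptr;
  return true;
}

/* Return true if PTR was recorded, and remove it so that a later
   query for the same address returns false.  */
static bool
record_is_alloc (const void *ptr)
{
  for (size_t i = 0; i < n_records; i++)
    if (records[i] == ptr)
      {
        /* Fill the hole with the last entry; order does not matter.  */
        records[i] = records[--n_records];
        return true;
      }
  return false;
}
```

A deallocation path can then call the query once per pointer to decide between the special free routine and the ordinary one, and the record disappears at the same time as the memory.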
2023-12-06 | amdgcn, libgomp: low-latency allocator | Andrew Stubbs | 1 | -1/+1
This implements the OpenMP low-latency memory allocator for AMD GCN using the small per-team LDS memory (Local Data Store). Since addresses can now refer to LDS space, the "Global" address space is no longer compatible. This patch therefore switches the backend to use entirely "Flat" addressing (which supports both memories). A future patch will re-enable "global" instructions for cases where it is known to be safe to do so.

gcc/ChangeLog: * config/gcn/gcn-builtins.def (DISPATCH_PTR): New built-in. * config/gcn/gcn.cc (gcn_init_machine_status): Disable global addressing. (gcn_expand_builtin_1): Implement GCN_BUILTIN_DISPATCH_PTR. libgomp/ChangeLog: * config/gcn/libgomp-gcn.h (TEAM_ARENA_START): Move to here. (TEAM_ARENA_FREE): Likewise. (TEAM_ARENA_END): Likewise. (GCN_LOWLAT_HEAP): New. * config/gcn/team.c (LITTLEENDIAN_CPU): New, and import hsa.h. (__gcn_lowlat_init): New prototype. (gomp_gcn_enter_kernel): Initialize the low-latency heap. * libgomp.h (TEAM_ARENA_START): Move to libgomp.h. (TEAM_ARENA_FREE): Likewise. (TEAM_ARENA_END): Likewise. * plugin/plugin-gcn.c (lowlat_size): New variable. (print_kernel_dispatch): Label the group_segment_size purpose. (init_environment_variables): Read GOMP_GCN_LOWLAT_POOL. (create_kernel_dispatch): Pass low-latency head allocation to kernel. (run_kernel): Use shadow; don't assume values. * testsuite/libgomp.c/omp_alloc-traits.c: Enable for amdgcn. * config/gcn/allocator.c: New file. * libgomp.texi: Document low-latency implementation details.
2023-12-06 | openmp, nvptx: low-lat memory access traits | Andrew Stubbs | 7 | -6/+107
The NVPTX low latency memory is not accessible outside the team that allocates it, and therefore should be unavailable for allocators with the access trait "all". This change means that the omp_low_lat_mem_alloc predefined allocator no longer works (but omp_cgroup_mem_alloc still does). libgomp/ChangeLog: * allocator.c (MEMSPACE_VALIDATE): New macro. (omp_init_allocator): Use MEMSPACE_VALIDATE. (omp_aligned_alloc): Use OMP_LOW_LAT_MEM_ALLOC_INVALID. (omp_aligned_calloc): Likewise. (omp_realloc): Likewise. * config/nvptx/allocator.c (nvptx_memspace_validate): New function. (MEMSPACE_VALIDATE): New macro. (OMP_LOW_LAT_MEM_ALLOC_INVALID): New define. * libgomp.texi: Document low-latency implementation details. * testsuite/libgomp.c/omp_alloc-1.c (main): Add gnu_lowlat. * testsuite/libgomp.c/omp_alloc-2.c (main): Add gnu_lowlat. * testsuite/libgomp.c/omp_alloc-3.c (main): Add gnu_lowlat. * testsuite/libgomp.c/omp_alloc-4.c (main): Add access trait. * testsuite/libgomp.c/omp_alloc-5.c (main): Add gnu_lowlat. * testsuite/libgomp.c/omp_alloc-6.c (main): Add access trait. * testsuite/libgomp.c/omp_alloc-traits.c: New test.
2023-12-06 | libgomp, nvptx: low-latency memory allocator | Andrew Stubbs | 6 | -0/+544
This patch adds support for allocating low-latency ".shared" memory on NVPTX GPU device, via the omp_low_lat_mem_space and omp_alloc. The memory can be allocated, reallocated, and freed using a basic but fast algorithm, is thread safe and the size of the low-latency heap can be configured using the GOMP_NVPTX_LOWLAT_POOL environment variable. The use of the PTX dynamic_smem_size feature means that low-latency allocator will not work with the PTX 3.1 multilib. For now, the omp_low_lat_mem_alloc allocator also works, but that will change when I implement the access traits. libgomp/ChangeLog: * allocator.c (MEMSPACE_ALLOC): New macro. (MEMSPACE_CALLOC): New macro. (MEMSPACE_REALLOC): New macro. (MEMSPACE_FREE): New macro. (predefined_alloc_mapping): New array. Add _Static_assert to match. (ARRAY_SIZE): New macro. (omp_aligned_alloc): Use MEMSPACE_ALLOC. Implement fall-backs for predefined allocators. Simplify existing fall-backs. (omp_free): Use MEMSPACE_FREE. (omp_calloc): Use MEMSPACE_CALLOC. Implement fall-backs for predefined allocators. Simplify existing fall-backs. (omp_realloc): Use MEMSPACE_REALLOC, MEMSPACE_ALLOC, and MEMSPACE_FREE. Implement fall-backs for predefined allocators. Simplify existing fall-backs. * config/nvptx/team.c (__nvptx_lowlat_pool): New asm variable. (__nvptx_lowlat_init): New prototype. (gomp_nvptx_main): Call __nvptx_lowlat_init. * libgomp.texi: Update memory space table. * plugin/plugin-nvptx.c (lowlat_pool_size): New variable. (GOMP_OFFLOAD_init_device): Read the GOMP_NVPTX_LOWLAT_POOL envvar. (GOMP_OFFLOAD_run): Apply lowlat_pool_size. * basic-allocator.c: New file. * config/nvptx/allocator.c: New file. * testsuite/libgomp.c/omp_alloc-1.c: New test. * testsuite/libgomp.c/omp_alloc-2.c: New test. * testsuite/libgomp.c/omp_alloc-3.c: New test. * testsuite/libgomp.c/omp_alloc-4.c: New test. * testsuite/libgomp.c/omp_alloc-5.c: New test. * testsuite/libgomp.c/omp_alloc-6.c: New test. 
Co-authored-by: Kwok Cheung Yeung <kcy@codesourcery.com> Co-Authored-By: Thomas Schwinge <thomas@codesourcery.com>
2023-11-30 | Fix 'libgomp.c/declare-variant-4-*.c', add 'libgomp.c/declare-variant-4.c' | Thomas Schwinge | 8 | -12/+31
These test cases being 'dg-skip-if [...] { ! amdgcn-*-* }' meant they were only ever considered for the 'amdgcn-*-*' target -- that is, GCN *target*, not GCN *offload target*, which is what is intended to be tested here, and for which they thus always were UNSUPPORTED. Use the same style of 'dg-[...]' directives as for the nvptx offloading test cases, 'libgomp.c/declare-variant-3*.c'. Fix-up for commit 1fd508744eccda9ad9c6d6fcce5b2ea9c568818d "amdgcn: Support AMD-specific 'isa' traits in OpenMP context selectors". libgomp/ * testsuite/libgomp.c/declare-variant-4-fiji.c: Adjust. * testsuite/libgomp.c/declare-variant-4-gfx803.c: Likewise. * testsuite/libgomp.c/declare-variant-4-gfx900.c: Likewise. * testsuite/libgomp.c/declare-variant-4-gfx906.c: Likewise. * testsuite/libgomp.c/declare-variant-4-gfx908.c: Likewise. * testsuite/libgomp.c/declare-variant-4-gfx90a.c: Likewise. * testsuite/libgomp.c/declare-variant-4.h: Likewise. * testsuite/libgomp.c/declare-variant-4.c: New.
2023-11-30 | Spin 'dg-do run' part of 'libgomp.c/declare-variant-3-sm30.c' off into new 'libgomp.c/declare-variant-3.c' | Thomas Schwinge | 3 | -1/+14
Having nvptx offloading configured doesn't imply being able to run nvptx offloading test cases on the test host. Also, make 'libgomp.c/declare-variant-3.c' work for all non-offloading and offloading cases. Fix-up for commit 59b8ade88774b4dcf1691a8f650cdbb86cc30862 "[libgomp, testsuite, nvptx] Add libgomp.c/declare-variant-3-sm*.c". libgomp/ * testsuite/libgomp.c/declare-variant-3-sm30.c: Turn 'dg-do run' into 'dg-do link'. * testsuite/libgomp.c/declare-variant-3.c: New. * testsuite/libgomp.c/declare-variant-3.h: Extend.
2023-11-30In 'libgomp.c/declare-variant-{3,4}-*.c', restrict 'scan-offload-tree-dump's ↵Thomas Schwinge12-12/+12
to 'only_for_offload_target [...]' ... to care for the case where not just one but both of GCN and nvptx offloading are enabled. In that case, we currently get: UNRESOLVED: libgomp.c/declare-variant-3-sm30.c scan-amdgcn-amdhsa-offload-tree-dump optimized "= f30 \\(\\);" ... in addition to: PASS: libgomp.c/declare-variant-3-sm30.c scan-nvptx-none-offload-tree-dump optimized "= f30 \\(\\);" Etc. Fix-up for commit 59b8ade88774b4dcf1691a8f650cdbb86cc30862 "[libgomp, testsuite, nvptx] Add libgomp.c/declare-variant-3-sm*.c", and commit 1fd508744eccda9ad9c6d6fcce5b2ea9c568818d "amdgcn: Support AMD-specific 'isa' traits in OpenMP context selectors". libgomp/ * testsuite/libgomp.c/declare-variant-3-sm30.c: Restrict 'scan-offload-tree-dump' to 'only_for_offload_target nvptx-none'. * testsuite/libgomp.c/declare-variant-3-sm35.c: Likewise. * testsuite/libgomp.c/declare-variant-3-sm53.c: Likewise. * testsuite/libgomp.c/declare-variant-3-sm70.c: Likewise. * testsuite/libgomp.c/declare-variant-3-sm75.c: Likewise. * testsuite/libgomp.c/declare-variant-3-sm80.c: Likewise. * testsuite/libgomp.c/declare-variant-4-fiji.c: Restrict 'scan-offload-tree-dump' to 'only_for_offload_target amdgcn-amdhsa'. * testsuite/libgomp.c/declare-variant-4-gfx803.c: Likewise. * testsuite/libgomp.c/declare-variant-4-gfx900.c: Likewise. * testsuite/libgomp.c/declare-variant-4-gfx906.c: Likewise. * testsuite/libgomp.c/declare-variant-4-gfx908.c: Likewise. * testsuite/libgomp.c/declare-variant-4-gfx90a.c: Likewise.
2023-11-30Fix 'libgomp.c/declare-variant-3-*.c' compilation for configurations where ↵Thomas Schwinge6-0/+6
GCN offloading is enabled in addition to nvptx The GCN offloading compiler doesn't like '-misa=sm_30' etc.; restrict to '-foffload=nvptx-none' compilation only. Fix-up for commit 59b8ade88774b4dcf1691a8f650cdbb86cc30862 "[libgomp, testsuite, nvptx] Add libgomp.c/declare-variant-3-sm*.c". libgomp/ * testsuite/libgomp.c/declare-variant-3-sm30.c: 'dg-additional-options -foffload=nvptx-none'. * testsuite/libgomp.c/declare-variant-3-sm35.c: Likewise. * testsuite/libgomp.c/declare-variant-3-sm53.c: Likewise. * testsuite/libgomp.c/declare-variant-3-sm70.c: Likewise. * testsuite/libgomp.c/declare-variant-3-sm75.c: Likewise. * testsuite/libgomp.c/declare-variant-3-sm80.c: Likewise.
2023-11-29In 'libgomp.c/target-simd-clone-{1,2,3}.c', restrict ↵Thomas Schwinge3-5/+5
'scan-offload-ipa-dump's to 'only_for_offload_target amdgcn-amdhsa' This gets rid of UNRESOLVEDs if nvptx offloading compilation is enabled in addition to GCN: PASS: libgomp.c/target-simd-clone-1.c (test for excess errors) PASS: libgomp.c/target-simd-clone-1.c scan-amdgcn-amdhsa-offload-ipa-dump simdclone "Generated local clone _ZGV.*N.*_addit" -UNRESOLVED: libgomp.c/target-simd-clone-1.c scan-nvptx-none-offload-ipa-dump simdclone "Generated local clone _ZGV.*N.*_addit" PASS: libgomp.c/target-simd-clone-1.c scan-amdgcn-amdhsa-offload-ipa-dump simdclone "Generated local clone _ZGV.*M.*_addit" -UNRESOLVED: libgomp.c/target-simd-clone-1.c scan-nvptx-none-offload-ipa-dump simdclone "Generated local clone _ZGV.*M.*_addit" PASS: libgomp.c/target-simd-clone-2.c (test for excess errors) PASS: libgomp.c/target-simd-clone-2.c scan-amdgcn-amdhsa-offload-ipa-dump-not simdclone "Generated .* clone" -UNRESOLVED: libgomp.c/target-simd-clone-2.c scan-nvptx-none-offload-ipa-dump-not simdclone "Generated .* clone" PASS: libgomp.c/target-simd-clone-3.c (test for excess errors) PASS: libgomp.c/target-simd-clone-3.c scan-amdgcn-amdhsa-offload-ipa-dump simdclone "device doesn't match" -UNRESOLVED: libgomp.c/target-simd-clone-3.c scan-nvptx-none-offload-ipa-dump simdclone "device doesn't match" PASS: libgomp.c/target-simd-clone-3.c scan-amdgcn-amdhsa-offload-ipa-dump-not simdclone "Generated .* clone" -UNRESOLVED: libgomp.c/target-simd-clone-3.c scan-nvptx-none-offload-ipa-dump-not simdclone "Generated .* clone" Minor fix-up for commit 309e2d95e3b930c6f15c8a5346b913158404c76d 'OpenMP: Generate SIMD clones for functions with "declare target"'. libgomp/ * testsuite/libgomp.c/target-simd-clone-1.c: Restrict 'scan-offload-ipa-dump's to 'only_for_offload_target amdgcn-amdhsa'. * testsuite/libgomp.c/target-simd-clone-2.c: Likewise. * testsuite/libgomp.c/target-simd-clone-3.c: Likewise.
2023-11-22Adjust 'libgomp.c/declare-variant-{3,4}-[...]' for inter-procedural value ↵Thomas Schwinge2-0/+15
range propagation ..., that is, commit 53ba8d669550d3a1f809048428b97ca607f95cf5 "inter-procedural value range propagation", after which we see: [-PASS:-]{+FAIL:+} libgomp.c/declare-variant-3-sm30.c scan-nvptx-none-offload-tree-dump optimized "= f30 \\(\\);" Etc. That's due to: @@ -144,13 +144,11 @@ __attribute__((omp target entrypoint, noclone)) void main._omp_fn.0 (const struct .omp_data_t.3 & restrict .omp_data_i) { - int _3; int * _5; <bb 2> [local count: 1073741824]: - _3 = f30 (); _5 = *.omp_data_i_4(D).v; - *_5 = _3; + *_5 = 30; return; It's nice to see this optimization work here, too, but it does interfere with how we're currently testing OpenMP 'declare variant'. libgomp/ * testsuite/libgomp.c/declare-variant-3.h (f30, f35, f53, f70) (f75, f80, f): Add '__attribute__ ((noipa))'. * testsuite/libgomp.c/declare-variant-4.h (gfx803, gfx900, gfx906) (gfx908, gfx90a, f): Likewise.
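The effect of the 'noipa' attribute described above can be sketched in plain C. This is only an illustration: the function name f30 follows the commit message, but its body and the helper store_result are assumptions, not the contents of 'declare-variant-3.h'.

```c
/* Sketch only: f30's body and store_result are hypothetical stand-ins.  */

/* Without 'noipa', inter-procedural value range propagation may replace
   the call below with the constant 30, so a tree-dump scan looking for
   "= f30 ();" would no longer match.  'noipa' is a GCC-specific
   attribute that keeps the call opaque to inter-procedural analysis.  */
__attribute__ ((noipa)) int
f30 (void)
{
  return 30;
}

/* Stand-in for the body of the offloaded region: store the call result.  */
void
store_result (int *v)
{
  *v = f30 ();
}
```

The runtime result is the same with or without the attribute; only the shape of the optimized dump differs, which is what the 'scan-offload-tree-dump' directives check.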
2023-11-07openmp: Add support for the 'indirect' clause in C/C++Kwok Cheung Yeung3-0/+77
This adds support for the 'indirect' clause in the 'declare target' directive. Functions declared as indirect may be called via function pointers passed from the host in offloaded code. Virtual calls to member functions via the object pointer in C++ are currently not supported in target regions. 2023-11-07 Kwok Cheung Yeung <kcy@codesourcery.com> gcc/c-family/ * c-attribs.cc (c_common_attribute_table): Add attribute for indirect functions. * c-pragma.h (enum pragma_omp_clause): Add entry for indirect clause. gcc/c/ * c-decl.cc (c_decl_attributes): Add attribute for indirect functions. * c-lang.h (c_omp_declare_target_attr): Add indirect field. * c-parser.cc (c_parser_omp_clause_name): Handle indirect clause. (c_parser_omp_clause_indirect): New. (c_parser_omp_all_clauses): Handle indirect clause. (OMP_DECLARE_TARGET_CLAUSE_MASK): Add indirect clause to mask. (c_parser_omp_declare_target): Handle indirect clause. Emit error message if device_type or indirect clauses used alone. Emit error if indirect clause used with device_type that is not 'any'. (OMP_BEGIN_DECLARE_TARGET_CLAUSE_MASK): Add indirect clause to mask. (c_parser_omp_begin): Handle indirect clause. * c-typeck.cc (c_finish_omp_clauses): Handle indirect clause. gcc/cp/ * cp-tree.h (cp_omp_declare_target_attr): Add indirect field. * decl2.cc (cplus_decl_attributes): Add attribute for indirect functions. * parser.cc (cp_parser_omp_clause_name): Handle indirect clause. (cp_parser_omp_clause_indirect): New. (cp_parser_omp_all_clauses): Handle indirect clause. (handle_omp_declare_target_clause): Add extra parameter. Add indirect attribute for indirect functions. (OMP_DECLARE_TARGET_CLAUSE_MASK): Add indirect clause to mask. (cp_parser_omp_declare_target): Handle indirect clause. Emit error message if device_type or indirect clauses used alone. Emit error if indirect clause used with device_type that is not 'any'. (OMP_BEGIN_DECLARE_TARGET_CLAUSE_MASK): Add indirect clause to mask. 
(cp_parser_omp_begin): Handle indirect clause. * semantics.cc (finish_omp_clauses): Handle indirect clause. gcc/ * lto-cgraph.cc (enum LTO_symtab_tags): Add tag for indirect functions. (output_offload_tables): Write indirect functions. (input_offload_tables): read indirect functions. * lto-section-names.h (OFFLOAD_IND_FUNC_TABLE_SECTION_NAME): New. * omp-builtins.def (BUILT_IN_GOMP_TARGET_MAP_INDIRECT_PTR): New. * omp-offload.cc (offload_ind_funcs): New. (omp_discover_implicit_declare_target): Add functions marked with 'omp declare target indirect' to indirect functions list. (omp_finish_file): Add indirect functions to section for offload indirect functions. (execute_omp_device_lower): Redirect indirect calls on target by passing function pointer to BUILT_IN_GOMP_TARGET_MAP_INDIRECT_PTR. (pass_omp_device_lower::gate): Run pass_omp_device_lower if indirect functions are present on an accelerator device. * omp-offload.h (offload_ind_funcs): New. * tree-core.h (omp_clause_code): Add OMP_CLAUSE_INDIRECT. * tree.cc (omp_clause_num_ops): Add entry for OMP_CLAUSE_INDIRECT. (omp_clause_code_name): Likewise. * tree.h (OMP_CLAUSE_INDIRECT_EXPR): New. * config/gcn/mkoffload.cc (process_asm): Process offload_ind_funcs section. Count number of indirect functions. (process_obj): Emit number of indirect functions. * config/nvptx/mkoffload.cc (ind_func_ids, ind_funcs_tail): New. (process): Emit offload_ind_func_table in PTX code. Emit indirect function names and count in image. * config/nvptx/nvptx.cc (nvptx_record_offload_symbol): Mark indirect functions in PTX code with IND_FUNC_MAP. gcc/testsuite/ * c-c++-common/gomp/declare-target-7.c: Update expected error message. * c-c++-common/gomp/declare-target-indirect-1.c: New. * c-c++-common/gomp/declare-target-indirect-2.c: New. * g++.dg/gomp/attrs-21.C (v12): Update expected error message. * g++.dg/gomp/declare-target-indirect-1.C: New. * gcc.dg/gomp/attrs-21.c (v12): Update expected error message. 
include/ * gomp-constants.h (GOMP_VERSION): Increment to 3. (GOMP_VERSION_SUPPORTS_INDIRECT_FUNCS): New. libgcc/ * offloadstuff.c (OFFLOAD_IND_FUNC_TABLE_SECTION_NAME): New. (__offload_ind_func_table): New. (__offload_ind_funcs_end): New. (__OFFLOAD_TABLE__): Add entries for indirect functions. libgomp/ * Makefile.am (libgomp_la_SOURCES): Add target-indirect.c. * Makefile.in: Regenerate. * libgomp-plugin.h (GOMP_INDIRECT_ADDR_MAP): New define. (GOMP_OFFLOAD_load_image): Add extra argument. * libgomp.h (struct indirect_splay_tree_key_s): New. (indirect_splay_tree_node, indirect_splay_tree, indirect_splay_tree_key): New. (indirect_splay_compare): New. * libgomp.map (GOMP_5.1.1): Add GOMP_target_map_indirect_ptr. * libgomp.texi (OpenMP 5.1): Update documentation on indirect calls in target region and on indirect clause. (Other new OpenMP 5.2 features): Add entry for virtual function calls. * libgomp_g.h (GOMP_target_map_indirect_ptr): Add prototype. * oacc-host.c (host_load_image): Add extra argument. * target.c (gomp_load_image_to_device): If the GOMP_VERSION is high enough, read host indirect functions table and pass to load_image_func. * config/accel/target-indirect.c: New. * config/linux/target-indirect.c: New. * config/gcn/team.c (build_indirect_map): Add prototype. (gomp_gcn_enter_kernel): Initialize support for indirect function calls on GCN target. * config/nvptx/team.c (build_indirect_map): Add prototype. (gomp_nvptx_main): Initialize support for indirect function calls on NVPTX target. * plugin/plugin-gcn.c (struct gcn_image_desc): Add field for indirect functions count. (GOMP_OFFLOAD_load_image): Add extra argument. If the GOMP_VERSION is high enough, build address translation table and copy it to target memory. * plugin/plugin-nvptx.c (nvptx_tdata): Add field for indirect functions count. (GOMP_OFFLOAD_load_image): Add extra argument. If the GOMP_VERSION is high enough, Build address translation table and copy it to target memory. 
* testsuite/libgomp.c-c++-common/declare-target-indirect-1.c: New. * testsuite/libgomp.c-c++-common/declare-target-indirect-2.c: New. * testsuite/libgomp.c++/declare-target-indirect-1.C: New.
2023-10-31Add OpenACC 'acc_map_data' variant to 'libgomp.oacc-c-c++-common/deep-copy-8.c'Thomas Schwinge1-2/+27
libgomp/ * testsuite/libgomp.oacc-c-c++-common/deep-copy-8.c: Add OpenACC 'acc_map_data' variant.
2023-10-25Handle OpenACC 'self' clause for compute constructs in OpenACC 'kernels' ↵Thomas Schwinge1-8/+7
decomposition ... to fix up recent commit 3a3596389c2e539cb8fd5dc5784a4e2afe193a2a "OpenACC 2.7: Implement self clause for compute constructs" for that case. gcc/ * omp-oacc-kernels-decompose.cc (omp_oacc_kernels_decompose_1): Handle 'OMP_CLAUSE_SELF' like 'OMP_CLAUSE_IF'. * omp-expand.cc (expand_omp_target): Handle 'OMP_CLAUSE_SELF' for 'GF_OMP_TARGET_KIND_OACC_DATA_KERNELS'. gcc/testsuite/ * c-c++-common/goacc/self-clause-2.c: Verify '--param=openacc-kernels=decompose'. * gfortran.dg/goacc/kernels-tree.f95: Adjust. libgomp/ * oacc-parallel.c (GOACC_data_start): Handle 'GOACC_FLAG_LOCAL_DEVICE'. (GOACC_parallel_keyed): Simplify accordingly. * testsuite/libgomp.oacc-fortran/self-1.f90: Adjust.
2023-10-25Extend test suite coverage for OpenACC 'self' clause for compute constructsThomas Schwinge5-0/+1045
... on top of what was provided in recent commit 3a3596389c2e539cb8fd5dc5784a4e2afe193a2a "OpenACC 2.7: Implement self clause for compute constructs". gcc/testsuite/ * c-c++-common/goacc/if-clause-2.c: Enhance. * c-c++-common/goacc/self-clause-1.c: Likewise. * c-c++-common/goacc/self-clause-2.c: Likewise. * gfortran.dg/goacc/if.f95: Likewise. * gfortran.dg/goacc/kernels-tree.f95: Likewise. * gfortran.dg/goacc/parallel-tree.f95: Likewise. * gfortran.dg/goacc/self.f95: Likewise. libgomp/ * testsuite/libgomp.oacc-c-c++-common/if-1.c: Enhance. * testsuite/libgomp.oacc-c-c++-common/self-1.c: Likewise. * testsuite/libgomp.oacc-fortran/if-1.f90: Likewise. * testsuite/libgomp.oacc-c-c++-common/if-self-1.c: New. * testsuite/libgomp.oacc-fortran/self-1.f90: Likewise.
2023-10-25OpenACC 2.7: Implement self clause for compute constructsChung-Lin Tang1-0/+962
This patch implements the 'self' clause for compute constructs: parallel, kernels, and serial. This clause conditionally uses the local device (the host multi-core CPU) as the executing device of the compute region. The actual implementation of the "local device" device type inside libgomp (presumably using pthreads) is not yet completed, so the libgomp side is still implemented exactly the same as host-fallback mode (so, as of now, it essentially behaves like the 'if' clause with the condition inverted). gcc/c/ChangeLog: * c-parser.cc (c_parser_oacc_compute_clause_self): New function. (c_parser_oacc_all_clauses): Add new 'bool compute_p = false' parameter, add parsing of self clause when compute_p is true. (OACC_KERNELS_CLAUSE_MASK): Add PRAGMA_OACC_CLAUSE_SELF. (OACC_PARALLEL_CLAUSE_MASK): Likewise, (OACC_SERIAL_CLAUSE_MASK): Likewise. (c_parser_oacc_compute): Adjust call to c_parser_oacc_all_clauses to set compute_p argument to true. * c-typeck.cc (c_finish_omp_clauses): Add OMP_CLAUSE_SELF case. gcc/cp/ChangeLog: * parser.cc (cp_parser_oacc_compute_clause_self): New function. (cp_parser_oacc_all_clauses): Add new 'bool compute_p = false' parameter, add parsing of self clause when compute_p is true. (OACC_KERNELS_CLAUSE_MASK): Add PRAGMA_OACC_CLAUSE_SELF. (OACC_PARALLEL_CLAUSE_MASK): Likewise, (OACC_SERIAL_CLAUSE_MASK): Likewise. (cp_parser_oacc_compute): Adjust call to c_parser_oacc_all_clauses to set compute_p argument to true. * pt.cc (tsubst_omp_clauses): Add OMP_CLAUSE_SELF case. * semantics.cc (c_finish_omp_clauses): Add OMP_CLAUSE_SELF case, merged with OMP_CLAUSE_IF case. gcc/fortran/ChangeLog: * gfortran.h (typedef struct gfc_omp_clauses): Add self_expr field. * openmp.cc (enum omp_mask2): Add OMP_CLAUSE_SELF. (gfc_match_omp_clauses): Add handling for OMP_CLAUSE_SELF. (OACC_PARALLEL_CLAUSES): Add OMP_CLAUSE_SELF. (OACC_KERNELS_CLAUSES): Likewise. (OACC_SERIAL_CLAUSES): Likewise. (resolve_omp_clauses): Add handling for omp_clauses->self_expr. 
* trans-openmp.cc (gfc_trans_omp_clauses): Add handling of clauses->self_expr and building of OMP_CLAUSE_SELF tree clause. (gfc_split_omp_clauses): Add handling of self_expr field copy. gcc/ChangeLog: * gimplify.cc (gimplify_scan_omp_clauses): Add OMP_CLAUSE_SELF case. (gimplify_adjust_omp_clauses): Likewise. * omp-expand.cc (expand_omp_target): Add OMP_CLAUSE_SELF expansion code, * omp-low.cc (scan_sharing_clauses): Add OMP_CLAUSE_SELF case. * tree-core.h (enum omp_clause_code): Add OMP_CLAUSE_SELF enum. * tree-nested.cc (convert_nonlocal_omp_clauses): Add OMP_CLAUSE_SELF case. (convert_local_omp_clauses): Likewise. * tree-pretty-print.cc (dump_omp_clause): Add OMP_CLAUSE_SELF case. * tree.cc (omp_clause_num_ops): Add OMP_CLAUSE_SELF entry. (omp_clause_code_name): Likewise. * tree.h (OMP_CLAUSE_SELF_EXPR): New macro. gcc/testsuite/ChangeLog: * c-c++-common/goacc/self-clause-1.c: New test. * c-c++-common/goacc/self-clause-2.c: New test. * gfortran.dg/goacc/self.f95: New test. include/ChangeLog: * gomp-constants.h (GOACC_FLAG_LOCAL_DEVICE): New flag bit value. libgomp/ChangeLog: * oacc-parallel.c (GOACC_parallel_keyed): Add code to handle GOACC_FLAG_LOCAL_DEVICE case. * testsuite/libgomp.oacc-c-c++-common/self-1.c: New test.
2023-10-14libgomp.fortran/allocate-6.f90: Run with -fdump-tree-gimpleTobias Burnus1-2/+3
libgomp/ * testsuite/libgomp.fortran/allocate-6.f90: Add missing dg-additional-options "-fdump-tree-gimple"; fix scan.
2023-10-14Fortran: Support OpenMP's 'allocate' directive for stack varsTobias Burnus4-0/+651
gcc/fortran/ChangeLog: * gfortran.h (ext_attr_t): Add omp_allocate flag. * match.cc (gfc_free_omp_namelist): Avoid deleting same u2.allocator multiple times now that a sequence can use the same one. * openmp.cc (gfc_match_omp_clauses, gfc_match_omp_allocate): Use same allocator expr multiple times. (is_predefined_allocator): Make static. (gfc_resolve_omp_allocate): Update/extend restriction checks; remove sorry message. (resolve_omp_clauses): Reject coarrays in allocate/allocators directive. * parse.cc (check_omp_allocate_stmt): Permit procedure pointers here (rejected later) for less misleading diagnostic. * trans-array.cc (gfc_trans_auto_array_allocation): Propagate size for GOMP_alloc and location to which it should be added. * trans-decl.cc (gfc_trans_deferred_vars): Handle 'omp allocate' for stack variables; sorry for static variables/common blocks. * trans-openmp.cc (gfc_trans_omp_clauses): Evaluate 'allocate' clause's allocator only once; fix adding expressions to the block. (gfc_trans_omp_single): Pass a block to gfc_trans_omp_clauses. gcc/ChangeLog: * gimplify.cc (gimplify_bind_expr): Handle Fortran's 'omp allocate' for stack variables. libgomp/ChangeLog: * libgomp.texi (OpenMP Impl. Status): Mention that Fortran now supports the allocate directive for stack variables. * testsuite/libgomp.fortran/allocate-5.f90: New test. * testsuite/libgomp.fortran/allocate-6.f90: New test. * testsuite/libgomp.fortran/allocate-7.f90: New test. * testsuite/libgomp.fortran/allocate-8.f90: New test. gcc/testsuite/ChangeLog: * c-c++-common/gomp/allocate-14.c: Fix directive name. * c-c++-common/gomp/allocate-15.c: Likewise. * c-c++-common/gomp/allocate-9.c: Fix comment typo. * gfortran.dg/gomp/allocate-4.f90: Remove sorry dg-error. * gfortran.dg/gomp/allocate-7.f90: Likewise. * gfortran.dg/gomp/allocate-10.f90: New test. * gfortran.dg/gomp/allocate-11.f90: New test. * gfortran.dg/gomp/allocate-12.f90: New test. * gfortran.dg/gomp/allocate-13.f90: New test. 
* gfortran.dg/gomp/allocate-14.f90: New test. * gfortran.dg/gomp/allocate-15.f90: New test. * gfortran.dg/gomp/allocate-8.f90: New test. * gfortran.dg/gomp/allocate-9.f90: New test.
2023-10-08Fortran/OpenMP: Fix handling of strictly structured blocksTobias Burnus1-0/+22
For strictly structured blocks, a BLOCK was created, but the code was placed after the block, in the outer structured block. Additionally, labelled blocks were mishandled. Placing the code properly inside the BLOCK also resolves additional issues. gcc/fortran/ChangeLog: * parse.cc (parse_omp_structured_block): Make the user code end up inside of BLOCK construct for strictly structured blocks; fix fallout for 'section' and 'teams'. * openmp.cc (resolve_omp_target): Fix changed BLOCK handling for teams in target checking. libgomp/ChangeLog: * testsuite/libgomp.fortran/strictly-structured-block-1.f90: New test. gcc/testsuite/ChangeLog: * gfortran.dg/block_17.f90: New test. * gfortran.dg/gomp/strictly-structured-block-5.f90: New test.
2023-09-20OpenMP: Add ME support for 'omp allocate' stack variablesTobias Burnus3-0/+529
Call GOMP_alloc/free for 'omp allocate' allocated variables. This is for C only as C++ and Fortran show a sorry already in the FE. Note that this only applies to stack variables as the C FE shows a sorry for static variables. gcc/ChangeLog: * gimplify.cc (gimplify_bind_expr): Call GOMP_alloc/free for 'omp allocate' variables; move stack cleanup after other cleanup. (omp_notice_variable): Process original decl when decl of the value-expression for a 'omp allocate' variable is passed. * omp-low.cc (scan_omp_1_op): Handle 'omp allocate' variables libgomp/ChangeLog: * libgomp.texi (OpenMP 5.1 Impl.): Mark 'omp allocate' as implemented for C only. * testsuite/libgomp.c/allocate-4.c: New test. * testsuite/libgomp.c/allocate-5.c: New test. * testsuite/libgomp.c/allocate-6.c: New test. gcc/testsuite/ChangeLog: * c-c++-common/gomp/allocate-11.c: Remove C-only dg-message for 'sorry, unimplemented'. * c-c++-common/gomp/allocate-12.c: Likewise. * c-c++-common/gomp/allocate-15.c: Likewise. * c-c++-common/gomp/allocate-9.c: Likewise. * c-c++-common/gomp/allocate-10.c: New test. * c-c++-common/gomp/allocate-17.c: New test.
2023-09-12libgomp: Consider '--with-build-sysroot=[...]' for target libraries' ↵Thomas Schwinge5-17/+19
build-tree testing (instead of build-time 'CC' etc.) [PR91884, PR109951] This is commit c8e759b4215ba4b376c9d468aeffe163b3d520f0 (Subversion r279708) "libgomp/test: Fix compilation for build sysroot" and follow-up commit 749bd22ddc50b5112e5ed506ffef7249bf8e6fb3 "libgomp/test: Remove a build sysroot fix regression" done differently, avoiding build-tree testing use of any random gunk that may appear in build-time 'CC', 'CXX', 'FC'. PR testsuite/91884 PR testsuite/109951 libgomp/ * configure.ac: Revert earlier changes, instead 'AC_SUBST(SYSROOT_CFLAGS_FOR_TARGET)'. * Makefile.in: Regenerate. * configure: Likewise. * testsuite/Makefile.in: Likewise. * testsuite/lib/libgomp.exp (libgomp_init): Remove "Fix up '-funconfigured-libstdc++-v3' in 'GXX_UNDER_TEST'" code. If '--with-build-sysroot=[...]' was specified, use it for build-tree testing. * testsuite/libgomp-site-extra.exp.in (GCC_UNDER_TEST) (GXX_UNDER_TEST, GFORTRAN_UNDER_TEST): Don't set. (SYSROOT_CFLAGS_FOR_TARGET): Set. * testsuite/libgomp.c++/c++.exp (lang_source_re) (lang_include_flags): Set for build-tree testing. * testsuite/libgomp.oacc-c++/c++.exp (lang_source_re) (lang_include_flags): Likewise. Co-authored-by: Chung-Lin Tang <cltang@codesourcery.com>
2023-09-04Add 'libgomp.c-c++-common/pr100059-1.c'Tobias Burnus1-0/+55
For nvptx offloading, it'll FAIL its execution test until nvptx-tools is updated to include commit 1b5946d78ef5dcfb640e9f545a7c791b7f623911 "Merge commit '26095fd01232061de9f79decb3e8222ef7b46191' into HEAD [#29]", <https://github.com/MentorEmbedded/nvptx-tools/commit/1b5946d78ef5dcfb640e9f545a7c791b7f623911>. libgomp/ * testsuite/libgomp.c-c++-common/pr100059-1.c: New. Co-authored-by: Thomas Schwinge <thomas@codesourcery.com>
2023-08-25OpenMP: Fortran support for imperfectly-nested loopsSandra Loosemore9-0/+966
OpenMP 5.0 removed the restriction that multiple collapsed loops must be perfectly nested, allowing "intervening code" (including nested BLOCKs) before or after each nested loop. In GCC this code is moved into the inner loop body by the respective front ends. In the Fortran front end, most of the semantic processing happens during the translation phase, so the parse phase just collects the intervening statements, checks them for errors, and splices them around the loop body. gcc/fortran/ChangeLog * gfortran.h (struct gfc_namespace): Add omp_structured_block bit. * openmp.cc: Include omp-api.h. (resolve_omp_clauses): Consolidate inscan reduction clause conflict checking here. (find_nested_loop_in_chain): New. (find_nested_loop_in_block): New. (gfc_resolve_omp_do_blocks): Set omp_current_do_collapse properly. Handle imperfectly-nested loops when looking for nested omp scan. Refactor to move inscan reduction clause conflict checking to resolve_omp_clauses. (gfc_resolve_do_iterator): Handle imperfectly-nested loops. (struct icode_error_state): New. (icode_code_error_callback): New. (icode_expr_error_callback): New. (diagnose_intervening_code_errors_1): New. (diagnose_intervening_code_errors): New. (make_structured_block): New. (restructure_intervening_code): New. (is_outer_iteration_variable): Do not assume loops are perfectly nested. (check_nested_loop_in_chain): New. (check_nested_loop_in_block_state): New. (check_nested_loop_in_block_symbol): New. (check_nested_loop_in_block): New. (expr_uses_intervening_var): New. (is_intervening_var): New. (expr_is_invariant): Do not assume loops are perfectly nested. (resolve_omp_do): Handle imperfectly-nested loops. * trans-stmt.cc (gfc_trans_block_construct): Generate OMP_STRUCTURED_BLOCK if magic bit is set on block namespace. gcc/testsuite/ChangeLog * gfortran.dg/gomp/collapse1.f90: Adjust expected errors. * gfortran.dg/gomp/collapse2.f90: Likewise. * gfortran.dg/gomp/imperfect-gotos.f90: New. 
* gfortran.dg/gomp/imperfect-invalid-scope.f90: New. * gfortran.dg/gomp/imperfect1.f90: New. * gfortran.dg/gomp/imperfect2.f90: New. * gfortran.dg/gomp/imperfect3.f90: New. * gfortran.dg/gomp/imperfect4.f90: New. * gfortran.dg/gomp/imperfect5.f90: New. libgomp/ChangeLog * testsuite/libgomp.fortran/imperfect-destructor.f90: New. * testsuite/libgomp.fortran/imperfect1.f90: New. * testsuite/libgomp.fortran/imperfect2.f90: New. * testsuite/libgomp.fortran/imperfect3.f90: New. * testsuite/libgomp.fortran/imperfect4.f90: New. * testsuite/libgomp.fortran/target-imperfect1.f90: New. * testsuite/libgomp.fortran/target-imperfect2.f90: New. * testsuite/libgomp.fortran/target-imperfect3.f90: New. * testsuite/libgomp.fortran/target-imperfect4.f90: New.
2023-08-25OpenMP: New C/C++ testcases for imperfectly nested loops.Sandra Loosemore10-0/+1040
gcc/testsuite/ChangeLog * c-c++-common/gomp/imperfect-attributes.c: New. * c-c++-common/gomp/imperfect-badloops.c: New. * c-c++-common/gomp/imperfect-blocks.c: New. * c-c++-common/gomp/imperfect-extension.c: New. * c-c++-common/gomp/imperfect-gotos.c: New. * c-c++-common/gomp/imperfect-invalid-scope.c: New. * c-c++-common/gomp/imperfect-labels.c: New. * c-c++-common/gomp/imperfect-legacy-syntax.c: New. * c-c++-common/gomp/imperfect-pragmas.c: New. * c-c++-common/gomp/imperfect1.c: New. * c-c++-common/gomp/imperfect2.c: New. * c-c++-common/gomp/imperfect3.c: New. * c-c++-common/gomp/imperfect4.c: New. * c-c++-common/gomp/imperfect5.c: New. libgomp/ChangeLog * testsuite/libgomp.c-c++-common/imperfect1.c: New. * testsuite/libgomp.c-c++-common/imperfect2.c: New. * testsuite/libgomp.c-c++-common/imperfect3.c: New. * testsuite/libgomp.c-c++-common/imperfect4.c: New. * testsuite/libgomp.c-c++-common/imperfect5.c: New. * testsuite/libgomp.c-c++-common/imperfect6.c: New. * testsuite/libgomp.c-c++-common/target-imperfect1.c: New. * testsuite/libgomp.c-c++-common/target-imperfect2.c: New. * testsuite/libgomp.c-c++-common/target-imperfect3.c: New. * testsuite/libgomp.c-c++-common/target-imperfect4.c: New.
2023-08-25OpenMP: C++ support for imperfectly-nested loopsSandra Loosemore13-0/+1740
OpenMP 5.0 removed the restriction that multiple collapsed loops must be perfectly nested, allowing "intervening code" (including nested BLOCKs) before or after each nested loop. In GCC this code is moved into the inner loop body by the respective front ends. This patch changes the C++ front end to use recursive descent parsing on nested loops within an "omp for" construct, rather than an iterative approach, in order to preserve proper nesting of compound statements. Preserving cleanups (destructors) for class objects declared in intervening code and loop initializers complicates moving the former into the body of the loop; this is handled by parsing the entire construct before reassembling any of it. gcc/cp/ChangeLog * cp-tree.h (cp_convert_omp_range_for): Adjust declaration. * parser.cc (struct omp_for_parse_data): New. (cp_parser_postfix_expression): Diagnose calls to OpenMP runtime in intervening code. (check_omp_intervening_code): New. (cp_parser_statement_seq_opt): Special-case nested loops, blocks, and other constructs for OpenMP loops. (cp_parser_iteration_statement): Reject loops in intervening code. (cp_parser_omp_for_loop_init): Expand comments and tweak the interface slightly to better distinguish input/output parameters. (cp_convert_omp_range_for): Likewise. (cp_parser_omp_loop_nest): New, split from cp_parser_omp_for_loop and largely rewritten. Add more comments. (insert_structured_blocks): New. (find_structured_blocks): New. (struct sit_data, substitute_in_tree_walker, substitute_in_tree): New. (fixup_blocks_walker): New. (cp_parser_omp_for_loop): Rewrite to use recursive descent instead of a loop. Add logic to reshuffle the bits of code collected during parsing so intervening code gets moved to the loop body. (cp_parser_omp_loop): Remove call to finish_omp_for_block, which is now redundant. (cp_parser_omp_simd): Likewise. (cp_parser_omp_for): Likewise. (cp_parser_omp_distribute): Likewise. (cp_parser_oacc_loop): Likewise. 
(cp_parser_omp_taskloop): Likewise. (cp_parser_pragma): Reject OpenMP pragmas in intervening code. * parser.h (struct cp_parser): Add omp_for_parse_state field. * pt.cc (tsubst_omp_for_iterator): Adjust call to cp_convert_omp_range_for. * semantics.cc (finish_omp_for): Try harder to preserve location of loop variable init expression for use in diagnostics. (struct fofb_data, finish_omp_for_block_walker): New. (finish_omp_for_block): Allow variables to be bound in a BIND_EXPR nested inside BIND instead of directly in BIND itself. gcc/testsuite/ChangeLog * c-c++-common/goacc/tile-2.c: Adjust expected error patterns. * g++.dg/gomp/attrs-imperfect1.C: New test. * g++.dg/gomp/attrs-imperfect2.C: New test. * g++.dg/gomp/attrs-imperfect3.C: New test. * g++.dg/gomp/attrs-imperfect4.C: New test. * g++.dg/gomp/attrs-imperfect5.C: New test. * g++.dg/gomp/pr41967.C: Adjust expected error patterns. * g++.dg/gomp/tpl-imperfect-gotos.C: New test. * g++.dg/gomp/tpl-imperfect-invalid-scope.C: New test. libgomp/ChangeLog * testsuite/libgomp.c++/attrs-imperfect1.C: New test. * testsuite/libgomp.c++/attrs-imperfect2.C: New test. * testsuite/libgomp.c++/attrs-imperfect3.C: New test. * testsuite/libgomp.c++/attrs-imperfect4.C: New test. * testsuite/libgomp.c++/attrs-imperfect5.C: New test. * testsuite/libgomp.c++/attrs-imperfect6.C: New test. * testsuite/libgomp.c++/imperfect-class-1.C: New test. * testsuite/libgomp.c++/imperfect-class-2.C: New test. * testsuite/libgomp.c++/imperfect-class-3.C: New test. * testsuite/libgomp.c++/imperfect-destructor.C: New test. * testsuite/libgomp.c++/imperfect-template-1.C: New test. * testsuite/libgomp.c++/imperfect-template-2.C: New test. * testsuite/libgomp.c++/imperfect-template-3.C: New test.
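As a rough illustration of what "intervening code" in a collapsed loop nest looks like, here is a minimal C sketch. The function weighted_sum and its parameters are hypothetical and not taken from the new test cases. Compiled without -fopenmp the pragma is ignored and the loops run serially; compiling it with -fopenmp assumes a compiler that implements the OpenMP 5.0 relaxation described above.

```c
/* Sketch only: weighted_sum is a hypothetical example, not a test case.  */
long
weighted_sum (int n, int m, const int *row_weight)
{
  long total = 0;

  /* collapse(2) associates both loops with the construct even though
     the outer loop body contains more than just the inner loop.  */
  #pragma omp parallel for collapse(2) reduction(+:total)
  for (int i = 0; i < n; i++)
    {
      /* Intervening code: it sits between the two collapsed loops and
         may be executed more often than in the sequential program, so
         it should have no side effects beyond setting up 'w'.  */
      int w = row_weight[i];

      for (int j = 0; j < m; j++)
        total += (long) w * j;
    }

  return total;
}
```

For row weights {1, 2} and m = 3, each row contributes w * (0 + 1 + 2), so the result is 9 regardless of how the iterations are distributed.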
2023-08-23libgomp, testsuite: Do not call nonstandard functionsFrancois-Xavier Coudert2-0/+28
The following functions are not standard, and not always available (e.g., on darwin). They should not be called unless available: gamma, gammaf, scalb, scalbf, significand, and significandf. libgomp/ChangeLog: * testsuite/lib/libgomp.exp: Add effective target. * testsuite/libgomp.c/simd-math-1.c: Avoid calling nonstandard functions.
2023-08-19omp-expand.cc: Fix wrong code with non-rectangular loop nest [PR111017]Tobias Burnus1-0/+72
Before commit r12-5295-g47de0b56ee455e, all gimple_build_cond in expand_omp_for_* were inserted with gsi_insert_before (gsi_p, cond_stmt, GSI_SAME_STMT); except the one dealing with the multiplicative factor that was gsi_insert_after (gsi, cond_stmt, GSI_CONTINUE_LINKING); That commit for PR103208 fixed the issue of some missing regimplify of operands of GIMPLE_CONDs by moving the condition handling to the new function expand_omp_build_cond. That function has a 'bool after = false' argument to switch between the two variants; however, all callers omitted this argument. This commit reinstates the prior behavior by passing 'true' for the factor != 0 condition, fixing the included testcase. PR middle-end/111017 gcc/ * omp-expand.cc (expand_omp_for_init_vars): Pass after=true to expand_omp_build_cond for 'factor != 0' condition, resulting in pre-r12-5295-g47de0b56ee455e code for the gimple insert. libgomp/ * testsuite/libgomp.c-c++-common/non-rect-loop-1.c: New test.
2023-07-26 | OpenMP: Call cuMemcpy2D/cuMemcpy3D for nvptx for omp_target_memcpy_rect | Tobias Burnus | 3 | -6/+537
When copying a 2D or 3D rectangular memory block, the performance is better when using CUDA's cuMemcpy2D/cuMemcpy3D instead of copying the data one by one. That's what this commit does. Additionally, it permits device-to-device copies, if necessary using a temporary variable on the host. include/ChangeLog: * cuda/cuda.h (CUlimit): Add CUDA_ERROR_NOT_INITIALIZED, CUDA_ERROR_DEINITIALIZED, CUDA_ERROR_INVALID_HANDLE. (CUarray, CUmemorytype, CUDA_MEMCPY2D, CUDA_MEMCPY3D, CUDA_MEMCPY3D_PEER): New typedefs. (cuMemcpy2D, cuMemcpy2DAsync, cuMemcpy2DUnaligned, cuMemcpy3D, cuMemcpy3DAsync, cuMemcpy3DPeer, cuMemcpy3DPeerAsync): New prototypes. libgomp/ChangeLog: * libgomp-plugin.h (GOMP_OFFLOAD_memcpy2d, GOMP_OFFLOAD_memcpy3d): New prototypes. * libgomp.h (struct gomp_device_descr): Add memcpy2d_func and memcpy3d_func. * libgomp.texi (nvptx): Document when cuMemcpy2D/cuMemcpy3D is used. * oacc-host.c (memcpy2d_func, .memcpy3d_func): Init with NULL. * plugin/cuda-lib.def (cuMemcpy2D, cuMemcpy2DUnaligned, cuMemcpy3D): Invoke via CUDA_ONE_CALL. * plugin/plugin-nvptx.c (GOMP_OFFLOAD_memcpy2d, GOMP_OFFLOAD_memcpy3d): New. * target.c (omp_target_memcpy_rect_worker): (omp_target_memcpy_rect_check, omp_target_memcpy_rect_copy): Permit all device-to-device copies; invoke new plugins for 2D and 3D copying when available. (gomp_load_plugin_for_device): DLSYM the new plugin functions. * testsuite/libgomp.c/target-12.c: Fix dimension bug. * testsuite/libgomp.fortran/target-12.f90: Likewise. * testsuite/libgomp.fortran/target-memcpy-rect-1.f90: New test.
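To make "rectangular block" concrete, here is a host-only sketch of a 2D rectangular copy between two row-major arrays with different row lengths. It is a conceptual model only: it is neither libgomp's omp_target_memcpy_rect implementation nor the CUDA routine, and the name copy_rect_2d and its parameters are assumptions made for this illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Copy a rows x cols sub-block of doubles between two row-major arrays.
   dst_pitch and src_pitch are the full row lengths, in elements, of the
   enclosing arrays.  copy_rect_2d is a hypothetical helper, not a libgomp
   or CUDA function.  */
static void copy_rect_2d(double *dst, size_t dst_pitch,
                         const double *src, size_t src_pitch,
                         size_t rows, size_t cols)
{
  assert(cols <= dst_pitch && cols <= src_pitch);
  /* One contiguous copy per row instead of one copy per element.  */
  for (size_t r = 0; r < rows; r++)
    memcpy(dst + r * dst_pitch, src + r * src_pitch, cols * sizeof(double));
}
```

A dedicated 2D/3D copy routine can handle the strided layout in a single request, which is the motivation the commit message gives for calling cuMemcpy2D/cuMemcpy3D.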
2023-07-19 | OpenMP/Fortran: Non-rectangular loops with constant steps other than 1 or -1 [PR107424] | Tobias Burnus | 4 | -648/+481
Before this commit, gfortran produced with OpenMP for 'do i = 1,10,2' the code for (count.0 = 0; count.0 < 5; count.0 = count.0 + 1) i = count.0 * 2 + 1; While such an inner loop can be collapsed, a non-rectangular loop could not. With this commit and for all constant loop steps, a simple loop such as 'for (i = 1; i <= 10; i = i + 2)' is created. (Before, only for the constant steps of 1 and -1.) The constant step makes it possible to know the direction (increasing/decreasing), which is required for the loop condition. The new code is only valid if one assumes no overflow of the loop variable. However, the Fortran standard can be read such that this must be ensured by the user. Namely, the Fortran standard requires (F2023, 10.1.5.2.4): "The execution of any numeric operation whose result is not defined by the arithmetic used by the processor is prohibited." And, for DO loops, F2023's "11.1.7.4.3 The execution cycle" has the following: The number of loop iterations is handled by an iteration count, which would permit code like 'do i = huge(i)-5, huge(i),4'. However, in step (3), this count is not only decremented by one but also: "... The DO variable, if any, is incremented by the value of the incrementation parameter m3." And for the example above, 'i' would be 'huge(i)+3' in the last execution cycle, which exceeds the largest model number and should render the example as invalid. PR fortran/107424 gcc/fortran/ChangeLog: * trans-openmp.cc (gfc_nonrect_loop_expr): Accept all constant loop steps. (gfc_trans_omp_do): Likewise; use sign to determine loop direction. libgomp/ChangeLog: * libgomp.texi (Impl. Status 5.0): Add link to new PR110735. * testsuite/libgomp.fortran/non-rectangular-loop-1.f90: Enable commented tests. * testsuite/libgomp.fortran/non-rectangular-loop-1a.f90: Remove test file; tests are in non-rectangular-loop-1.f90. * testsuite/libgomp.fortran/non-rectangular-loop-5.f90: Change testcase to use a non-constant step to retain the 'sorry' test. 
* testsuite/libgomp.fortran/non-rectangular-loop-6.f90: New test. gcc/testsuite/ChangeLog: * gfortran.dg/gomp/linear-2.f90: Update dump to remove the additional count variable.
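The two loop shapes described in this commit message can be compared side by side. The sketch below models them in plain C for a positive constant step; it is an illustration of the message's example, not the code gfortran emits, and both function names are invented here.

```c
#include <assert.h>
#include <stddef.h>

/* Old shape: iterate over a zero-based trip count and derive i from it.
   Hypothetical helper; assumes step > 0 and first <= last.  */
static size_t loop_count_form(int first, int last, int step, int *out)
{
  assert(step > 0 && first <= last);
  int trips = (last - first + step) / step;
  size_t n = 0;
  for (int count = 0; count < trips; count++)
    out[n++] = count * step + first;
  return n;
}

/* New shape: iterate on i directly.  This is only valid when i + step
   cannot overflow, which is the assumption the commit message discusses.  */
static size_t loop_direct_form(int first, int last, int step, int *out)
{
  assert(step > 0);
  size_t n = 0;
  for (int i = first; i <= last; i += step)
    out[n++] = i;
  return n;
}
```

For 'do i = 1,10,2' both forms visit 1, 3, 5, 7, 9. They differ only near the largest representable integer, where the direct form would increment i past the limit.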
2023-07-17 | OpenMP/Fortran: Parsing support for 'uses_allocators' | Tobias Burnus | 2 | -0/+267
The 'uses_allocators' clause to the 'target' construct accepts predefined allocators and can also be used to define a new allocator for a target region. As predefined allocators in GCC do not require special handling, those can and are ignored after parsing, such that this feature now works. On the other hand, defining a new allocator will fail for now with a 'sorry, unimplemented'. Note that both the OpenMP 5.0/5.1 and 5.2 syntax for uses_allocators is supported by this commit. 2023-07-17 Tobias Burnus <tobias@codesoucery.com> Chung-Lin Tang <cltang@codesourcery.com> gcc/fortran/ChangeLog: * dump-parse-tree.cc (show_omp_namelist, show_omp_clauses): Dump uses_allocators clause. * gfortran.h (gfc_free_omp_namelist): Add memspace_sym to u union and traits_sym to u2 union. (OMP_LIST_USES_ALLOCATORS): New enum value. (gfc_free_omp_namelist): Add 'bool free_mem_traits_space' arg. * match.cc (gfc_free_omp_namelist): Likewise. * openmp.cc (gfc_free_omp_clauses, gfc_match_omp_variable_list, gfc_match_omp_to_link, gfc_match_omp_doacross_sink, gfc_match_omp_clause_reduction, gfc_match_omp_allocate, gfc_match_omp_flush): Update call. (gfc_match_omp_clauses): Likewise. Parse uses_allocators clause. (gfc_match_omp_clause_uses_allocators): New. (enum omp_mask2): Add new OMP_CLAUSE_USES_ALLOCATORS. (OMP_TARGET_CLAUSES): Accept it. (resolve_omp_clauses): Resolve uses_allocators clause * st.cc (gfc_free_statement): Update gfc_free_omp_namelist call. * trans-openmp.cc (gfc_trans_omp_clauses): Handle OMP_LIST_USES_ALLOCATORS; fail with sorry unless predefined allocator. (gfc_split_omp_clauses): Handle uses_allocators. libgomp/ChangeLog: * testsuite/libgomp.fortran/uses_allocators_1.f90: New test. * testsuite/libgomp.fortran/uses_allocators_2.f90: New test. Co-authored-by: Chung-Lin Tang <cltang@codesourcery.com>
2023-07-13 | testsuite: dg-require LTO for libgomp LTO tests | David Edelsohn | 3 | -0/+3
Some test cases in libgomp testsuite pass -flto as an option, but the testcases do not require LTO target support. This patch adds the necessary DejaGNU requirement for LTO support to the testcases. libgomp/ChangeLog: * testsuite/libgomp.c++/target-map-class-2.C: Require LTO. * testsuite/libgomp.c-c++-common/requires-4.c: Require LTO. * testsuite/libgomp.c-c++-common/requires-4a.c: Require LTO. Signed-off-by: David Edelsohn <dje.gcc@gmail.com>
2023-07-12 | libgomp: Use libnuma for OpenMP's partition=nearest allocation trait | Tobias Burnus | 2 | -0/+502
As with the memkind library, it is only used when found at runtime; it does not need to be present when building GCC. The included testcase does not check whether the memory has been placed on the nearest node as the Linux kernel memory handling too often ignores that hint, using a different node for the allocation. However, when running with 'numactl --preferred=<node> ./executable', it is clearly visible that the feature works by comparing malloc/default vs. nearest placement (using get_mempolicy to obtain the node for a mem addr). libgomp/ChangeLog: * allocator.c: Add ifdef for LIBGOMP_USE_LIBNUMA. (enum gomp_numa_memkind_kind): Renamed from gomp_memkind_kind; add GOMP_MEMKIND_LIBNUMA. (struct gomp_libnuma_data, gomp_init_libnuma, gomp_get_libnuma): New. (omp_init_allocator): Handle partition=nearest with libnuma if avail. (omp_aligned_alloc, omp_free, omp_aligned_calloc, omp_realloc): Add numa_alloc_local (+ memset), numa_free, and numa_realloc calls as needed. * config/linux/allocator.c (LIBGOMP_USE_LIBNUMA): Define * libgomp.texi: Fix a typo; use 'fi' instead of its ligature char. (Memory allocation): Renamed from 'Memory allocation with libmemkind'; updated for libnuma usage. * testsuite/libgomp.c-c++-common/alloc-11.c: New test. * testsuite/libgomp.c-c++-common/alloc-12.c: New test.
2023-06-19 | Fix DejaGnu directive syntax error in 'libgomp.c/target-51.c' | Thomas Schwinge | 1 | -1/+1
ERROR: libgomp.c/target-51.c: unknown dg option: \} for "}" Fix-up for recent commit 01fe115ba7eafebcf97bbac9e157038a003d0c85 "libgomp.c/target-51.c: Accept more error-msg variants in dg-output". libgomp/ * testsuite/libgomp.c/target-51.c: Fix DejaGnu directive syntax error.
2023-06-19 | libgomp.c/target-51.c: Accept more error-msg variants in dg-output | Tobias Burnus | 1 | -2/+1
Depending on the details, the testcase can fail with different but related messages; all of the following could be observed for this testcase: libgomp: OMP_TARGET_OFFLOAD is set to MANDATORY, but device cannot be used for offloading libgomp: OMP_TARGET_OFFLOAD is set to MANDATORY, but device not found libgomp: OMP_TARGET_OFFLOAD is set to MANDATORY, but only the host device is available Before, the last two were tested for with 'target offload_device' and '! offload_device', respectively. Now, all three are accepted by matching '.*' already after 'but' and without distinguishing whether the effective target is an offload_device or not. (For completeness, there is a fourth error that follows this pattern: 'OMP_TARGET_OFFLOAD is set to MANDATORY, but device is finalized'.) libgomp/ * testsuite/libgomp.c/target-51.c: Accept more error msg variants as expected dg-output.
2023-06-19 | OpenMP (C/C++): Keep pointer value of unmapped ptr with default mapping [PR110270] | Tobias Burnus | 6 | -14/+390
For C/C++ pointers, default implicit mapping firstprivatizes the pointer but if the memory it points to is mapped, then it is updated to point to the device memory (by attaching a zero sized array section of the pointed-to storage). However, if the pointed-to storage wasn't mapped, the pointer was set to NULL on the device side (OpenMP 5.0/5.1 semantic). With this commit, the pointer retains the on-host address in that case (OpenMP 5.2 semantic). The new semantic avoids an explicit map/firstprivate/is_device_ptr in the following sensible cases: Special values (e.g. pointer or 0x1, 0x2 etc.), explicitly device allocated memory (e.g. omp_target_alloc), and with (unified) shared memory. (Note: With (U)SM, mappings still must be tracked, at least when omp_target_associate_ptr does not fail when passing in two distinct pointers.) libgomp/ PR middle-end/110270 * target.c (gomp_map_vars_internal): Copy host value instead of NULL for GOMP_MAP_ZERO_LEN_ARRAY_SECTION if not mapped. * libgomp.texi (OpenMP 5.2 Impl.): Mark as 'Y'. * testsuite/libgomp.c/target-19.c: Update expected value. * testsuite/libgomp.c++/target-18.C: Likewise. * testsuite/libgomp.c++/target-19.C: Likewise. * testsuite/libgomp.c-c++-common/requires-unified-addr-2.c: New test. * testsuite/libgomp.c-c++-common/target-implicit-map-3.c: New test. * testsuite/libgomp.c-c++-common/target-implicit-map-4.c: New test.
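The behaviour change can be observed with a small check like the one below. This is a sketch written for this log, not one of the testcases listed above; the function name pointer_value_survives is hypothetical. It relies only on implicit mapping of the pointer and never dereferences it. When built without -fopenmp, or when the region falls back to the host, the comparison trivially holds.

```c
#include <stdint.h>

/* Returns 1 if a pointer whose target storage is not mapped keeps its
   host value inside a target region (the OpenMP 5.2 semantic described
   above), and 0 if it was replaced, e.g. by NULL (5.0/5.1 semantic).
   pointer_value_survives is a hypothetical name for this sketch.  */
static int pointer_value_survives(void *host_ptr)
{
  uintptr_t host_value = (uintptr_t) host_ptr;  /* scalar: firstprivate */
  int same = 0;
  /* host_ptr is implicitly mapped; it is compared, never dereferenced.  */
  #pragma omp target map(tofrom: same)
  same = ((uintptr_t) host_ptr == host_value);
  return same;
}
```

Passing a special value such as (void *) (uintptr_t) 0x1 exercises the "special values" case named in the commit message; the result on a real offload device depends on whether the runtime implements the newer semantic.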
2023-06-16 | libgomp: Fix OMP_TARGET_OFFLOAD=mandatory | Tobias Burnus | 2 | -0/+43
It turned out that gomp_init_targets_once() was not run when directly calling 'omp target' or 'omp target (enter/exit) data' causing an abort with OMP_TARGET_OFFLOAD=mandatory wrongly claiming that no device is available. It was called a tiny bit later but a few lines too late for updating the default-device-var. libgomp/ChangeLog: * target.c (resolve_device): Call gomp_get_num_devices early to ensure gomp_init_targets_once was called before using default-device-var. * testsuite/libgomp.c/target-55.c: New test. * testsuite/libgomp.c/target-55a.c: New test.
2023-06-15 | libgomp: Extend OMP_ALLOCATOR, add affinity env var doc | Tobias Burnus | 6 | -0/+104
Support OpenMP 5.1's syntax for OMP_ALLOCATOR as well, which permits besides predefined allocators also predefined memspaces optionally followed by traits. Additionally, this commit adds the previously lacking documentation for OMP_ALLOCATOR, OMP_AFFINITY_FORMAT and OMP_DISPLAY_AFFINITY. libgomp/ChangeLog: * env.c (gomp_def_allocator_envvar): New var. (parse_allocator): Handle OpenMP 5.1 syntax. (cleanup_env): New. (omp_display_env): Output gomp_def_allocator_envvar for an allocator with traits. * libgomp.texi (OMP_ALLOCATOR, OMP_AFFINITY_FORMAT, OMP_DISPLAY_AFFINITY): New. * testsuite/libgomp.c/allocator-1.c: New test. * testsuite/libgomp.c/allocator-2.c: New test. * testsuite/libgomp.c/allocator-3.c: New test. * testsuite/libgomp.c/allocator-4.c: New test. * testsuite/libgomp.c/allocator-5.c: New test. * testsuite/libgomp.c/allocator-6.c: New test.
2023-06-14 | Align a 'OMP_TARGET_OFFLOAD=mandatory' diagnostic with others | Thomas Schwinge | 1 | -1/+1
On 2023-06-14T11:42:22+0200, Tobias Burnus <tobias@codesourcery.com> wrote: > On 14.06.23 10:09, Thomas Schwinge wrote: >> Let me know if I should also adjust the new 'target { ! offload_device }' >> diagnostic "[...] MANDATORY but only the host device is available" to >> include a comma before 'but', for consistency with the other existing >> diagnostics (cited above)? > > I think it makes sense to be consistent. Thus: Yes, please add the commas. Fix-up for recent commit 18c8b56c7d67a9e37acf28822587786f0fc0efbc "OpenMP: Set default-device-var with OMP_TARGET_OFFLOAD=mandatory". libgomp/ * target.c (resolve_device): Align a 'OMP_TARGET_OFFLOAD=mandatory' diagnostic with others. * testsuite/libgomp.c/target-51.c: Adjust.
2023-06-14 | driver: Forward '-lgfortran', '-lm' to offloading compilation | Thomas Schwinge | 5 | -7/+0
..., so that users don't manually need to specify '-foffload-options=-lgfortran', '-foffload-options=-lm' in addition to '-lgfortran', '-lm' (specified manually, or implicitly by the driver). gcc/ * gcc.cc (driver_handle_option): Forward host '-lgfortran', '-lm' to offloading compilation. * config/gcn/mkoffload.cc (main): Adjust. * config/nvptx/mkoffload.cc (main): Likewise. * doc/invoke.texi (foffload-options): Update example. libgomp/ * testsuite/libgomp.fortran/fortran.exp (lang_link_flags): Don't set. * testsuite/libgomp.oacc-fortran/fortran.exp (lang_link_flags): Likewise. * testsuite/libgomp.c/simd-math-1.c: Remove '-foffload-options=-lm'. * testsuite/libgomp.fortran/fortran-torture_execute_math.f90: Likewise. * testsuite/libgomp.oacc-fortran/fortran-torture_execute_math.f90: Likewise.
2023-06-14 | Add 'libgomp.{,oacc-}fortran/fortran-torture_execute_math.f90' | Thomas Schwinge | 2 | -0/+9
..., via 'include'ing the existing 'gfortran.fortran-torture/execute/math.f90', which therefore is enhanced for optional OpenACC 'serial', OpenMP 'target' usage. gcc/testsuite/ * gfortran.fortran-torture/execute/math.f90: Enhance for optional OpenACC 'serial', OpenMP 'target' usage. libgomp/ * testsuite/libgomp.fortran/fortran-torture_execute_math.f90: New. * testsuite/libgomp.oacc-fortran/fortran-torture_execute_math.f90: Likewise.
2023-06-14 | Fix typo in 'libgomp.c/target-51.c' | Thomas Schwinge | 1 | -1/+1
..., and therefore, given 'target offload_device': PASS: libgomp.c/target-51.c (test for excess errors) PASS: libgomp.c/target-51.c execution test [-FAIL:-]{+PASS:+} libgomp.c/target-51.c output pattern test Fix-up for recent commit 18c8b56c7d67a9e37acf28822587786f0fc0efbc "OpenMP: Set default-device-var with OMP_TARGET_OFFLOAD=mandatory". libgomp/ * testsuite/libgomp.c/target-51.c: Fix typo.
2023-06-14 | OpenMP: Set default-device-var with OMP_TARGET_OFFLOAD=mandatory | Tobias Burnus | 8 | -0/+210
OMP_TARGET_OFFLOAD=mandatory handling was previously inconsistent. Hence, in OpenMP 5.2 it was clarified/extended by having implications on the default-device-var; additionally, omp_initial_device and omp_invalid_device enum values/PARAMETERs were added; support for it was added in r13-1066-g1158fe43407568 including aborting for omp_invalid_device and non-conforming device numbers. Only the mandatory handling was missing. Namely, while the default-device-var is usually initialized to value 0, with 'mandatory' it must have the value 'omp_invalid_device' if and only if zero non-host devices are available. (The OMP_DEFAULT_DEVICE env var overrides this as it comes semantically after the initialization.) To achieve this, default-device-var is now initialized to INT_MIN. If there is no 'mandatory', it is set to 0 directly after env var parsing. Otherwise, it is updated in gomp_target_init to either 0 or omp_invalid_device. To ensure INT_MIN is never seen by the user, both the omp_get_default_device API routine and omp_display_env (user call and OMP_DISPLAY_ENV env var) call gomp_init_targets_once() in that case. libgomp/ChangeLog: * env.c (gomp_default_icv_values): Init default_device_var to a nonconforming value - INT_MIN. (initialize_env): After env-var parsing, set default_device_var to device 0 unless OMP_TARGET_OFFLOAD=mandatory. (omp_display_env): If default_device_var is INT_MIN, call gomp_init_targets_once. * icv-device.c (omp_get_default_device): Likewise. * libgomp.texi (OMP_DEFAULT_DEVICE): Update init description. (OpenMP 5.2 Impl. Status): Mark OMP_TARGET_OFFLOAD=mandatory as 'Y'. * target.c (resolve_device): Improve error message device-num < 0 with 'mandatory' and no non-host devices available. (gomp_target_init): Set default-device-var if INT_MIN. * testsuite/libgomp.c/target-48.c: New test. * testsuite/libgomp.c/target-49.c: New test. * testsuite/libgomp.c/target-50.c: New test. * testsuite/libgomp.c/target-50a.c: New test. 
* testsuite/libgomp.c/target-51.c: New test. * testsuite/libgomp.c/target-52.c: New test. * testsuite/libgomp.c/target-53.c: New test. * testsuite/libgomp.c/target-54.c: New test.
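The two-stage initialization of default-device-var described in this commit message can be modelled as below. This is a simplified sketch, not libgomp's code: the function names, the SKETCH_ constants, and the use of -4 as a stand-in for omp_invalid_device are all assumptions made for illustration.

```c
#include <assert.h>
#include <limits.h>
#include <stdbool.h>

/* Hypothetical stand-ins for this sketch only.  */
enum { SKETCH_INVALID_DEVICE = -4 };   /* assumed value of omp_invalid_device */
#define SKETCH_UNSET INT_MIN           /* "not decided yet" marker */

/* Stage 1: value of default-device-var right after env-var parsing.  */
static int after_env_parsing(bool mandatory, bool default_device_env_set,
                             int env_value)
{
  if (default_device_env_set)
    return env_value;                  /* OMP_DEFAULT_DEVICE overrides */
  return mandatory ? SKETCH_UNSET : 0;
}

/* Stage 2: resolve the marker once the number of non-host devices is known.  */
static int after_target_init(int default_device_var, int num_nonhost_devices)
{
  assert(num_nonhost_devices >= 0);
  if (default_device_var != SKETCH_UNSET)
    return default_device_var;
  return num_nonhost_devices > 0 ? 0 : SKETCH_INVALID_DEVICE;
}
```

Deferring the decision with a marker value is what makes it necessary to run target initialization before the value is shown to the user, as the message notes for omp_get_default_device and omp_display_env.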
2023-06-13 | libgomp/testsuite: Add requires-unified-addr-1.{c,f90} [PR109837] | Tobias Burnus | 2 | -0/+185
Add a testcase for 'omp requires unified_address' that is currently supported by all devices but was not tested for. libgomp/ PR libgomp/109837 * testsuite/libgomp.c-c++-common/requires-unified-addr-1.c: New test. * testsuite/libgomp.fortran/requires-unified-addr-1.f90: New test.
2023-06-12 | OpenMP: Cleanups related to the 'present' modifier | Tobias Burnus | 7 | -13/+49
Reduce number of enum values passed to libgomp as GOMP_MAP_PRESENT_{TO,TOFROM,FROM,ALLOC} have the same semantic as GOMP_MAP_FORCE_PRESENT (i.e. abort if not present, otherwise ignore); that's different to GOMP_MAP_ALWAYS_PRESENT_{TO,TOFROM,FROM} which also abort if not present but copy data when present. This is a follow-up to the commit r14-1579-g4ede915d5dde93 done 6 days ago. Additionally, the commit improves a libgomp run-time and a C/C++ compile-time error wording and extends testcases a tiny bit. gcc/c/ChangeLog: * c-parser.cc (c_parser_omp_clause_map): Reword error message for clearness especially with 'omp target (enter/exit) data.' gcc/cp/ChangeLog: * parser.cc (cp_parser_omp_clause_map): Reword error message for clearness especially with 'omp target (enter/exit) data.' * semantics.cc (handle_omp_array_sections): Handle GOMP_MAP_{ALWAYS_,}PRESENT_{TO,TOFROM,FROM,ALLOC} enum values. gcc/ChangeLog: * gimplify.cc (gimplify_adjust_omp_clauses_1): Use GOMP_MAP_FORCE_PRESENT for 'present alloc' implicit mapping. (gimplify_adjust_omp_clauses): Change GOMP_MAP_PRESENT_{TO,TOFROM,FROM,ALLOC} to the equivalent GOMP_MAP_FORCE_PRESENT. * omp-low.cc (lower_omp_target): Remove handling of no-longer valid GOMP_MAP_PRESENT_{TO,TOFROM,FROM,ALLOC}; update map kinds used for to/from clauses with present modifier. include/ChangeLog: * gomp-constants.h (enum gomp_map_kind): Change the enum values GOMP_MAP_PRESENT_{TO,TOFROM,FROM,ALLOC} to be compiler only. (GOMP_MAP_PRESENT_P): Update to include also GOMP_MAP_FORCE_PRESENT. libgomp/ChangeLog: * target.c (gomp_to_device_kind_p, gomp_map_vars_internal): Replace GOMP_MAP_PRESENT_{FROM,TO,TOFROM,ALLOC} by GOMP_MAP_FORCE_PRESENT. (gomp_map_vars_internal, gomp_update): Likewise; unify and improve error message. * testsuite/libgomp.c-c++-common/target-present-2.c: Update for changed error message. * testsuite/libgomp.fortran/target-present-1.f90: Likewise. * testsuite/libgomp.fortran/target-present-2.f90: Likewise. 
* testsuite/libgomp.oacc-c-c++-common/present-1.c: Likewise. * testsuite/libgomp.c-c++-common/target-present-1.c: Likewise and extend testcase to check that data is copied when needed. * testsuite/libgomp.c-c++-common/target-present-3.c: Likewise. * testsuite/libgomp.fortran/target-present-3.f90: Likewise. gcc/testsuite/ChangeLog: * c-c++-common/gomp/defaultmap-4.c: Update scan-tree-dump. * c-c++-common/gomp/map-9.c: Likewise. * gfortran.dg/gomp/defaultmap-8.f90: Likewise. * gfortran.dg/gomp/map-11.f90: Likewise. * gfortran.dg/gomp/target-update-1.f90: Likewise. * gfortran.dg/gomp/map-12.f90: Likewise; also check original dump. * c-c++-common/gomp/map-6.c: Update dg-error and also check clause error with 'target (enter/exit) data'.
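The distinction drawn in this commit message between the plain 'present' map kinds and the 'always, present' kinds can be summarized as a small decision function. This is a conceptual sketch only; the enum and function names below are invented for illustration and do not correspond to identifiers in gomp-constants.h or target.c.

```c
#include <stdbool.h>

/* Hypothetical names used only in this sketch.  */
typedef enum { ACT_ERROR, ACT_NO_COPY, ACT_COPY } sketch_action;
typedef enum { KIND_FORCE_PRESENT, KIND_ALWAYS_PRESENT } sketch_kind;

/* What the runtime does for a 'present' mapping, per the description:
   both kinds fail if the data is absent; only 'always, present'
   transfers data when it is already there.  */
static sketch_action present_action(sketch_kind kind, bool is_present)
{
  if (!is_present)
    return ACT_ERROR;
  return kind == KIND_ALWAYS_PRESENT ? ACT_COPY : ACT_NO_COPY;
}
```

Because the plain 'present' variants never copy, their to/from/tofrom/alloc flavours are indistinguishable at run time, which is why the commit folds them into a single kind.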
2023-06-07 | testsuite/libgomp.*/target-present-*.{c,f90}: Improve and fix | Tobias Burnus | 6 | -25/+35
One of the testcases lacked variables in a map clause such that the failure occurred too early. Additionally, it would have failed for all those non-host devices where 'present' is always true, i.e. non-host devices which can access all of the host memory (shared-memory devices). [There are currently none.] The commit now runs the code on all devices, which should succeed for host fallback and for shared-memory devices, finding potential issues that way. Additionally, a checkpoint (required stdout output) is used to ensure that the execution won't fail (with the same error) before reaching the expected fail location. 2023-06-07 Thomas Schwinge <thomas@codesourcery.com> Tobias Burnus <tobias@codesourcery.com> libgomp/ * testsuite/libgomp.c-c++-common/target-present-1.c: Run code also for non-offload_device targets; check that it runs successfully for those and for all until a checkpoint for all * testsuite/libgomp.c-c++-common/target-present-2.c: Likewise. * testsuite/libgomp.c-c++-common/target-present-3.c: Likewise. * testsuite/libgomp.fortran/target-present-1.f90: Likewise. * testsuite/libgomp.fortran/target-present-3.f90: Likewise. * testsuite/libgomp.fortran/target-present-2.f90: Likewise; add missing vars to map clause.
2023-06-06 | openmp: Add support for the 'present' modifier | Tobias Burnus | 6 | -0/+163
This implements support for the OpenMP 5.1 'present' modifier, which can be used in map clauses in the 'target', 'target data', 'target enter data' and 'target exit data' constructs, and in the 'to' and 'from' clauses of the 'target update' construct. It is also supported in defaultmap. The modifier triggers a fatal runtime error if the data specified by the clause is not already present on the target device. It can also be combined with 'always' in map clauses. 2023-06-06 Kwok Cheung Yeung <kcy@codesourcery.com> Tobias Burnus <tobias@codesourcery.com> gcc/c/ * c-parser.cc (c_parser_omp_clause_defaultmap, c_parser_omp_clause_map): Parse 'present'. (c_parser_omp_clause_to, c_parser_omp_clause_from): Remove. (c_parser_omp_clause_from_to): New; parse to/from clauses with optional 'present' modifier. (c_parser_omp_all_clauses): Update call. (c_parser_omp_target_data, c_parser_omp_target_enter_data, c_parser_omp_target_exit_data): Handle new map enum values for 'present' mapping. gcc/cp/ * parser.cc (cp_parser_omp_clause_defaultmap, cp_parser_omp_clause_map): Parse 'present'. (cp_parser_omp_clause_from_to): New; parse to/from clauses with optional 'present' modifier. (cp_parser_omp_all_clauses): Update call. (cp_parser_omp_target_data, cp_parser_omp_target_enter_data, cp_parser_omp_target_exit_data): Handle new enum value for 'present' mapping. * semantics.cc (finish_omp_target): Likewise. gcc/fortran/ * dump-parse-tree.cc (show_omp_namelist): Display 'present' map modifier. (show_omp_clauses): Display 'present' motion modifier for 'to' and 'from' clauses. * gfortran.h (enum gfc_omp_map_op): Add entries with 'present' modifiers. (struct gfc_omp_namelist): Add 'present_modifer'. * openmp.cc (gfc_match_motion_var_list): New, handles optional 'present' modifier for to/from clauses. (gfc_match_omp_clauses): Call it for to/from clauses; parse 'present' in defaultmap and map clauses. 
(resolve_omp_clauses): Allow 'present' modifiers on 'target', 'target data', 'target enter' and 'target exit' directives. * trans-openmp.cc (gfc_trans_omp_clauses): Apply 'present' modifiers to tree node for 'map', 'to' and 'from' clauses. Apply 'present' for defaultmap. gcc/ * gimplify.cc (omp_notice_variable): Apply GOVD_MAP_ALLOC_ONLY flag and defaultmap flags if the defaultmap has GOVD_MAP_FORCE_PRESENT flag set. (omp_get_attachment): Handle map clauses with 'present' modifier. (omp_group_base): Likewise. (gimplify_scan_omp_clauses): Reorder present maps to come first. Set GOVD flags for present defaultmaps. (gimplify_adjust_omp_clauses_1): Set map kind for present defaultmaps. * omp-low.cc (scan_sharing_clauses): Handle 'always, present' map clauses. (lower_omp_target): Handle map clauses with 'present' modifier. Handle 'to' and 'from' clauses with 'present'. * tree-core.h (enum omp_clause_defaultmap_kind): Add OMP_CLAUSE_DEFAULTMAP_PRESENT defaultmap kind. * tree-pretty-print.cc (dump_omp_clause): Handle 'map', 'to' and 'from' clauses with 'present' modifier. Handle present defaultmap. * tree.h (OMP_CLAUSE_MOTION_PRESENT): New #define. include/ * gomp-constants.h (GOMP_MAP_FLAG_SPECIAL_5): New. (GOMP_MAP_FLAG_FORCE): Redefine. (GOMP_MAP_FLAG_PRESENT, GOMP_MAP_FLAG_ALWAYS_PRESENT): New. (enum gomp_map_kind): Add map kinds with 'present' modifiers. (GOMP_MAP_COPY_TO_P, GOMP_MAP_COPY_FROM_P): Evaluate to true for map variants with 'present' (GOMP_MAP_ALWAYS_TO_P, GOMP_MAP_ALWAYS_FROM_P): Evaluate to true for map variants with 'always, present' modifiers. (GOMP_MAP_ALWAYS): Redefine. (GOMP_MAP_FORCE_P, GOMP_MAP_PRESENT_P): New. libgomp/ * libgomp.texi (OpenMP 5.1 Impl. status): Set 'present' support for defaultmap to 'Y', add 'Y' entry for 'present' on to/from/map clauses. * target.c (gomp_to_device_kind_p): Add map kinds with 'present' modifier. (gomp_map_vars_existing): Use new GOMP_MAP_FORCE_P macro. 
(gomp_map_vars_internal, gomp_update, gomp_target_rev): Emit runtime error if memory region not present. * testsuite/libgomp.c-c++-common/target-present-1.c: New test. * testsuite/libgomp.c-c++-common/target-present-2.c: New test. * testsuite/libgomp.c-c++-common/target-present-3.c: New test. * testsuite/libgomp.fortran/target-present-1.f90: New test. * testsuite/libgomp.fortran/target-present-2.f90: New test. * testsuite/libgomp.fortran/target-present-3.f90: New test. gcc/testsuite/ * c-c++-common/gomp/map-6.c: Update dg-error, extend to test for duplicated 'present' and extend scan-dump tests for 'present'. * gfortran.dg/gomp/defaultmap-1.f90: Update dg-error. * gfortran.dg/gomp/map-7.f90: Extend parse and dump test for 'present'. * gfortran.dg/gomp/map-8.f90: Extend for duplicate 'present' modifier checking. * c-c++-common/gomp/defaultmap-4.c: New test. * c-c++-common/gomp/map-9.c: New test. * c-c++-common/gomp/target-update-1.c: New test. * gfortran.dg/gomp/defaultmap-8.f90: New test. * gfortran.dg/gomp/map-11.f90: New test. * gfortran.dg/gomp/map-12.f90: New test. * gfortran.dg/gomp/target-update-1.f90: New test.
2023-06-02 | Support parallel testing in libgomp: fallback Perl 'flock' [PR66005] | Thomas Schwinge | 2 | -1/+20
Follow-up to commit 6c3b30ef9e0578509bdaf59c13da4a212fe6c2ba "Support parallel testing in libgomp, part II [PR66005]" ("..., and enable if 'flock' is available for serializing execution testing"), where we saw: > On my Dell Precision 7530 laptop: > > $ uname -srvi > Linux 5.15.0-71-generic #78-Ubuntu SMP Tue Apr 18 09:00:29 UTC 2023 x86_64 > $ grep '^model name' < /proc/cpuinfo | uniq -c > 12 model name : Intel(R) Core(TM) i7-8850H CPU @ 2.60GHz > $ nvidia-smi -L > GPU 0: Quadro P1000 (UUID: GPU-e043973b-b52a-d02b-c066-a8fdbf64e8ea) > > ... [...]: case (c) standard configuration, no offloading > configured, [...] > $ \time make check-target-libgomp > > Case (c), baseline; [...]: > > 1180.98user 110.80system 19:36.40elapsed 109%CPU (0avgtext+0avgdata 505148maxresident)k > 1133.22user 111.08system 19:35.75elapsed 105%CPU (0avgtext+0avgdata 505212maxresident)k > > Case (c), parallelized [using 'flock']: > > [...] > -j12 GCC_TEST_PARALLEL_SLOTS=12 > 2591.04user 192.64system 4:44.98elapsed 976%CPU (0avgtext+0avgdata 505216maxresident)k > 2581.23user 195.21system 4:47.51elapsed 965%CPU (0avgtext+0avgdata 505212maxresident)k Quite the same when instead of 'flock' using this fallback Perl 'flock': 2565.23user 194.35system 4:46.77elapsed 962%CPU (0avgtext+0avgdata 505216maxresident)k 2549.38user 200.20system 4:46.08elapsed 961%CPU (0avgtext+0avgdata 505216maxresident)k PR testsuite/66005 gcc/ * doc/install.texi: Document (optional) Perl usage for parallel testing of libgomp. libgomp/ * testsuite/lib/libgomp.exp: 'flock' through stdout. * testsuite/flock: New. * configure.ac (FLOCK): Point to that if no 'flock' available, but 'perl' is. * configure: Regenerate.