aboutsummaryrefslogtreecommitdiff
path: root/libgomp/target.c
AgeCommit message (Collapse)AuthorFilesLines
2024-07-19libgomp: Document 'GOMP_teams4'Thomas Schwinge1-0/+9
For reference: - <https://inbox.sourceware.org/20211111190313.GV2710@tucnak> "[PATCH] openmp: Honor OpenMP 5.1 num_teams lower bound" - <https://inbox.sourceware.org/20211112132023.GC2710@tucnak> "[PATCH] libgomp, nvptx: Honor OpenMP 5.1 num_teams lower bound" libgomp/ * config/gcn/target.c (GOMP_teams4): Document. * config/nvptx/target.c (GOMP_teams4): Likewise. * target.c (GOMP_teams4): Likewise.
2024-05-29libgomp: Enable USM for some nvptx devicesTobias Burnus1-1/+23
A few high-end nvptx devices support the attribute CU_DEVICE_ATTRIBUTE_PAGEABLE_MEMORY_ACCESS; for those, unified shared memory is supported in hardware. This patch enables support for those - if all installed nvptx devices have this feature (as the capabilities are per device type). This exposes a bug in gomp_copy_back_icvs as it did before use omp_get_mapped_ptr to find mapped variables, but that returns the unchanged pointer in cased of shared memory. But in this case, we have a few actually mapped pointers - like the ICV variables. Additionally, there was a mismatch with regards to '-1' for the device number as gomp_copy_back_icvs and omp_get_mapped_ptr count differently. Hence, do the lookup manually. include/ChangeLog: * cuda/cuda.h (CU_DEVICE_ATTRIBUTE_PAGEABLE_MEMORY_ACCESS): Add. libgomp/ChangeLog: * libgomp.texi (nvptx): Update USM description. * plugin/plugin-nvptx.c (GOMP_OFFLOAD_get_num_devices): Claim support when requesting USM and all devices support CU_DEVICE_ATTRIBUTE_PAGEABLE_MEMORY_ACCESS. * target.c (gomp_copy_back_icvs): Fix device ptr lookup. (gomp_target_init): Set GOMP_OFFLOAD_CAP_SHARED_MEM is the devices supports USM.
2024-04-16OpenACC 2.7: Adjust acc_map_data/acc_unmap_data interaction with reference ↵Chung-Lin Tang1-2/+6
counters This patch adjusts the implementation of acc_map_data/acc_unmap_data API library routines to more fit the description in the OpenACC 2.7 specification. Instead of using REFCOUNT_INFINITY, we now define a REFCOUNT_ACC_MAP_DATA special value to mark acc_map_data-created mappings. Adjustment around mapping related code to respect OpenACC semantics are also added. libgomp/ChangeLog: * libgomp.h (REFCOUNT_ACC_MAP_DATA): Define as (REFCOUNT_SPECIAL | 2). * oacc-mem.c (acc_map_data): Adjust to use REFCOUNT_ACC_MAP_DATA, initialize dynamic_refcount as 1. (acc_unmap_data): Adjust to use REFCOUNT_ACC_MAP_DATA, (goacc_map_var_existing): Add REFCOUNT_ACC_MAP_DATA case. (goacc_exit_datum_1): Add REFCOUNT_ACC_MAP_DATA case, respect REFCOUNT_ACC_MAP_DATA when decrementing/finalizing. Force lowest dynamic_refcount to be 1 for REFCOUNT_ACC_MAP_DATA. (goacc_enter_data_internal): Add REFCOUNT_ACC_MAP_DATA case. * target.c (gomp_increment_refcount): Return early for REFCOUNT_ACC_MAP_DATA case. (gomp_decrement_refcount): Likewise. * testsuite/libgomp.oacc-c-c++-common/lib-96.c: New testcase. * testsuite/libgomp.oacc-c-c++-common/unmap-infinity-1.c: Adjust testcase error output scan test.
2024-03-04libgomp: Use void (*) (void *) rather than void (*)() for host_fn type ↵Jakub Jelinek1-1/+1
[PR114216] For the type of the target callbacks we use elsehwere void (*) (void *) and IMHO should use that for the reverse offload fallback as well (where the actual callback is emitted using the same code as for host fallback or device kernel entry routines), even when it is also ok to use void (*) () before C23 and we aren't building libgomp with C23 yet. On some arches perhaps void (*) () could result in worse code generation because calls in that case like casts to unprototyped functions need to sometimes pass argument in two different spots etc. so that it deals with both passing it through ... and as a named argument. 2024-03-04 Jakub Jelinek <jakub@redhat.com> PR libgomp/114216 * target.c (gomp_target_rev): Change host_fn type and corresponding cast from void (*)() to void (*) (void *).
2024-01-03Update copyright years.Jakub Jelinek1-1/+1
2023-12-15OpenMP/OpenACC: Unordered/non-constant component offset runtime diagnosticJulian Brown1-6/+54
This patch adds support for non-constant component offsets in "map" clauses for OpenMP (and the equivalants for OpenACC), which are not able to be sorted into order at compile time. Normally struct accesses in such clauses are gathered together and sorted into increasing address order after a "GOMP_MAP_STRUCT" node: if we have variable indices, that is no longer possible. This version of the patch scales back the previously-posted version to merely add a diagnostic for incorrect usage of component accesses with variably-indexed arrays of structs: the only permitted variant is where we have multiple indices that are the same, but we could not prove so at compile time. Rather than silently producing the wrong result for cases where the indices are in fact different, we error out (e.g., "map(dtarr(i)%arrptr, dtarr(j)%arrptr(4:8))", for different i/j). For now, multiple *constant* array indices are still supported (see map-arrayofstruct-1.c). That could perhaps be addressed with a follow-up patch, if necessary. This version of the patch renumbers the GOMP_MAP_STRUCT_UNORD kind to avoid clashing with the OpenACC "non-contiguous" dynamic array support (though that is not yet applied to mainline). 2023-08-18 Julian Brown <julian@codesourcery.com> gcc/ * gimplify.cc (extract_base_bit_offset): Add VARIABLE_OFFSET parameter. (omp_get_attachment, omp_group_last, omp_group_base, omp_directive_maps_explicitly): Add GOMP_MAP_STRUCT_UNORD support. (omp_accumulate_sibling_list): Update calls to extract_base_bit_offset. Support GOMP_MAP_STRUCT_UNORD. (omp_build_struct_sibling_lists, gimplify_scan_omp_clauses, gimplify_adjust_omp_clauses, gimplify_omp_target_update): Add GOMP_MAP_STRUCT_UNORD support. * omp-low.cc (lower_omp_target): Add GOMP_MAP_STRUCT_UNORD support. * tree-pretty-print.cc (dump_omp_clause): Likewise. include/ * gomp-constants.h (gomp_map_kind): Add GOMP_MAP_STRUCT_UNORD. libgomp/ * oacc-mem.c (find_group_last, goacc_enter_data_internal, goacc_exit_data_internal, GOACC_enter_exit_data): Add GOMP_MAP_STRUCT_UNORD support. * target.c (gomp_map_vars_internal): Add GOMP_MAP_STRUCT_UNORD support. Detect incorrect use of variable indexing of arrays of structs. (GOMP_target_enter_exit_data, gomp_target_task_fn): Add GOMP_MAP_STRUCT_UNORD support. * testsuite/libgomp.c-c++-common/map-arrayofstruct-1.c: New test. * testsuite/libgomp.c-c++-common/map-arrayofstruct-2.c: New test. * testsuite/libgomp.c-c++-common/map-arrayofstruct-3.c: New test. * testsuite/libgomp.fortran/map-subarray-5.f90: New test.
2023-12-13OpenMP/OpenACC: Rework clause expansion and nested struct handlingJulian Brown1-8/+30
This patch reworks clause expansion in the C, C++ and (to a lesser extent) Fortran front ends for OpenMP and OpenACC mapping nodes used in GPU offloading support. At present a single clause may be turned into several mapping nodes, or have its mapping type changed, in several places scattered through the front- and middle-end. The analysis relating to which particular transformations are needed for some given expression has become quite hard to follow. Briefly, we manipulate clause types in the following places: 1. During parsing, in c_omp_adjust_map_clauses. Depending on a set of rules, we may change a FIRSTPRIVATE_POINTER (etc.) mapping into ATTACH_DETACH, or mark the decl addressable. 2. In semantics.cc or c-typeck.cc, clauses are expanded in handle_omp_array_sections (called via {c_}finish_omp_clauses, or in finish_omp_clauses itself. The two cases are for processing array sections (the former), or non-array sections (the latter). 3. In gimplify.cc, we build sibling lists for struct accesses, which groups and sorts accesses along with their struct base, creating new ALLOC/RELEASE nodes for pointers. 4. In gimplify.cc:gimplify_adjust_omp_clauses, mapping nodes may be adjusted or created. This patch doesn't completely disrupt this scheme, though clause types are no longer adjusted in c_omp_adjust_map_clauses (step 1). Clause expansion in step 2 (for C and C++) now uses a single, unified mechanism, parts of which are also reused for analysis in step 3. Rather than the kind-of "ad-hoc" pattern matching on addresses used to expand clauses used at present, a new method for analysing addresses is introduced. This does a recursive-descent tree walk on expression nodes, and emits a vector of tokens describing each "part" of the address. This tokenized address can then be translated directly into mapping nodes, with the assurance that no part of the expression has been inadvertently skipped or misinterpreted. In this way, all the variations of ways pointers, arrays, references and component accesses might be combined can be teased apart into easily-understood cases - and we know we've "parsed" the whole address before we start analysis, so the right code paths can easily be selected. For example, a simple access "arr[idx]" might parse as: base-decl access-indexed-array or "mystruct->foo[x]" with a pointer "foo" component might parse as: base-decl access-pointer component-selector access-pointer A key observation is that support for "array" bases, e.g. accesses whose root nodes are not structures, but describe scalars or arrays, and also *one-level deep* structure accesses, have first-class support in gimplify and beyond. Expressions that use deeper struct accesses or e.g. multiple indirections were more problematic: some cases worked, but lots of cases didn't. This patch reimplements the support for those in gimplify.cc, again using the new "address tokenization" support. An expression like "mystruct->foo->bar[0:10]" used in a mapping node will translate the right-hand access directly in the front-end. The base for the access will be "mystruct->foo". This is handled recursively in gimplify.cc -- there may be several accesses of "mystruct"'s members on the same directive, so the sibling-list building machinery can be used again. (This was already being done for OpenACC, but the new implementation differs somewhat in details, and is more robust.) For OpenMP, in the case where the base pointer itself, i.e. "mystruct->foo" here, is NOT mapped on the same directive, we create a "fragile" mapping. This turns the "foo" component access into a zero-length allocation (which is a new feature for the runtime, so support has been added there too). A couple of changes have been made to how mapping clauses are turned into mapping nodes: The first change is based on the observation that it is probably never correct to use GOMP_MAP_ALWAYS_POINTER for component accesses (e.g. for references), because if the containing struct is already mapped on the target then the host version of the pointer in question will be corrupted if the struct is copied back from the target. This patch removes all such uses, across each of C, C++ and Fortran. The second change is to the way that GOMP_MAP_ATTACH_DETACH nodes are processed during sibling-list creation. For OpenMP, for pointer components, we must map the base pointer separately from an array section that uses the base pointer, so e.g. we must have both "map(mystruct.base)" and "map(mystruct.base[0:10])" mappings. These create nodes such as: GOMP_MAP_TOFROM mystruct.base G_M_TOFROM *mystruct.base [len: 10*elemsize] G_M_ATTACH_DETACH mystruct.base Instead of using the first of these directly when building the struct sibling list then skipping the group using GOMP_MAP_ATTACH_DETACH, leading to: GOMP_MAP_STRUCT mystruct [len: 1] GOMP_MAP_TOFROM mystruct.base we now introduce a new "mini-pass", omp_resolve_clause_dependencies, that drops the GOMP_MAP_TOFROM for the base pointer, marks the second group as having had a base-pointer mapping, then omp_build_struct_sibling_lists can create: GOMP_MAP_STRUCT mystruct [len: 1] GOMP_MAP_ALLOC mystruct.base [len: ptrsize] This ends up working better in many cases, particularly those involving references. (The "alloc" space is immediately overwritten by a pointer attachment, so this is mildly more efficient than a redundant TO mapping at runtime also.) There is support in the address tokenizer for "arbitrary" base expressions which aren't rooted at a decl, but that is not used as present because such addresses are disallowed at parse time. In the front-ends, the address tokenization machinery is mostly only used for clause expansion and not for diagnostics at present. It could be used for those too, which would allow more of my previous "address inspector" implementation to be removed. The new bits in gimplify.cc work with OpenACC also. This version of the patch addresses several first-pass review comments from Tobias, and fixes a few previously-missed cases for manually-managed ragged array mappings (including cases using references). Some arbitrary differences between handling of clause expansion for C vs. C++ have also been fixed, and some fragments from later in the patch series have been moved forward (where they were useful for fixing bugs). Several new test cases have been added. 2023-11-29 Julian Brown <julian@codesourcery.com> gcc/c-family/ * c-common.h (c_omp_region_type): Add C_ORT_EXIT_DATA, C_ORT_OMP_EXIT_DATA and C_ORT_ACC_TARGET. (omp_addr_token): Add forward declaration. (c_omp_address_inspector): New class. * c-omp.cc (c_omp_adjust_map_clauses): Mark decls addressable here, but do not change any mapping node types. (c_omp_address_inspector::unconverted_ref_origin, c_omp_address_inspector::component_access_p, c_omp_address_inspector::check_clause, c_omp_address_inspector::get_root_term, c_omp_address_inspector::map_supported_p, c_omp_address_inspector::get_origin, c_omp_address_inspector::maybe_unconvert_ref, c_omp_address_inspector::maybe_zero_length_array_section, c_omp_address_inspector::expand_array_base, c_omp_address_inspector::expand_component_selector, c_omp_address_inspector::expand_map_clause): New methods. (omp_expand_access_chain): New function. gcc/c/ * c-parser.cc (c_parser_oacc_all_clauses): Add TARGET_P parameter. Use to select region type for c_finish_omp_clauses call. (c_parser_oacc_loop): Update calls to c_parser_oacc_all_clauses. (c_parser_oacc_compute): Likewise. (c_parser_omp_target_data, c_parser_omp_target_enter_data): Support ATTACH kind. (c_parser_omp_target_exit_data): Support DETACH kind. (check_clauses): Handle GOMP_MAP_POINTER and GOMP_MAP_ATTACH here. * c-typeck.cc (handle_omp_array_sections_1, handle_omp_array_sections, c_finish_omp_clauses): Use c_omp_address_inspector class and OMP address tokenizer to analyze and expand map clause expressions. Fix some diagnostics. Fix "is OpenACC" condition for C_ORT_ACC_TARGET addition. gcc/cp/ * parser.cc (cp_parser_oacc_all_clauses): Add TARGET_P parameter. Use to select region type for finish_omp_clauses call. (cp_parser_omp_target_data, cp_parser_omp_target_enter_data): Support GOMP_MAP_ATTACH kind. (cp_parser_omp_target_exit_data): Support GOMP_MAP_DETACH kind. (cp_parser_oacc_declare): Update call to cp_parser_oacc_all_clauses. (cp_parser_oacc_loop): Update calls to cp_parser_oacc_all_clauses. (cp_parser_oacc_compute): Likewise. * pt.cc (tsubst_expr): Use C_ORT_ACC_TARGET for call to tsubst_omp_clauses for OpenACC compute regions. * semantics.cc (cp_omp_address_inspector): New class, derived from c_omp_address_inspector. (handle_omp_array_sections_1, handle_omp_array_sections, finish_omp_clauses): Use cp_omp_address_inspector class and OMP address tokenizer to analyze and expand OpenMP map clause expressions. Fix some diagnostics. Support C_ORT_ACC_TARGET. (finish_omp_target): Handle GOMP_MAP_POINTER. gcc/fortran/ * trans-openmp.cc (gfc_trans_omp_array_section): Add OPENMP parameter. Use GOMP_MAP_ATTACH_DETACH instead of GOMP_MAP_ALWAYS_POINTER for derived type components. (gfc_trans_omp_clauses): Update calls to gfc_trans_omp_array_section. gcc/ * gimplify.cc (build_struct_comp_nodes): Don't process GOMP_MAP_ATTACH_DETACH "middle" nodes here. (omp_mapping_group): Add REPROCESS_STRUCT and FRAGILE booleans for nested struct handling. (omp_strip_components_and_deref, omp_strip_indirections): Remove functions. (omp_get_attachment): Handle GOMP_MAP_DETACH here. (omp_group_last): Handle GOMP_MAP_*, GOMP_MAP_DETACH, GOMP_MAP_ATTACH_DETACH groups for "exit data" of reference-to-pointer component array sections. (omp_gather_mapping_groups_1): Initialise reprocess_struct and fragile fields. (omp_group_base): Handle GOMP_MAP_ATTACH_DETACH after GOMP_MAP_STRUCT. (omp_index_mapping_groups_1): Skip reprocess_struct groups. (omp_get_nonfirstprivate_group, omp_directive_maps_explicitly, omp_resolve_clause_dependencies, omp_first_chained_access_token): New functions. (omp_check_mapping_compatibility): Adjust accepted node combinations for "from" clauses using release instead of alloc. (omp_accumulate_sibling_list): Add GROUP_MAP, ADDR_TOKENS, FRAGILE_P, REPROCESSING_STRUCT, ADDED_TAIL parameters. Use OMP address tokenizer to analyze addresses. Reimplement nested struct handling, and implement "fragile groups". (omp_build_struct_sibling_lists): Adjust for changes to omp_accumulate_sibling_list. Recalculate bias for ATTACH_DETACH nodes after GOMP_MAP_STRUCT nodes. (gimplify_scan_omp_clauses): Call omp_resolve_clause_dependencies. Use OMP address tokenizer. (gimplify_adjust_omp_clauses_1): Use build_fold_indirect_ref_loc instead of build_simple_mem_ref_loc. * omp-general.cc (omp-general.h, tree-pretty-print.h): Include. (omp_addr_tokenizer): New namespace. (omp_addr_tokenizer::omp_addr_token): New. (omp_addr_tokenizer::omp_parse_component_selector, omp_addr_tokenizer::omp_parse_ref, omp_addr_tokenizer::omp_parse_pointer, omp_addr_tokenizer::omp_parse_access_method, omp_addr_tokenizer::omp_parse_access_methods, omp_addr_tokenizer::omp_parse_structure_base, omp_addr_tokenizer::omp_parse_structured_expr, omp_addr_tokenizer::omp_parse_array_expr, omp_addr_tokenizer::omp_access_chain_p, omp_addr_tokenizer::omp_accessed_addr): New functions. (omp_parse_expr, debug_omp_tokenized_addr): New functions. * omp-general.h (omp_addr_tokenizer::access_method_kinds, omp_addr_tokenizer::structure_base_kinds, omp_addr_tokenizer::token_type, omp_addr_tokenizer::omp_addr_token, omp_addr_tokenizer::omp_access_chain_p, omp_addr_tokenizer::omp_accessed_addr): New. (omp_addr_token, omp_parse_expr): New. * omp-low.cc (scan_sharing_clauses): Skip error check for references to pointers. * tree.h (OMP_CLAUSE_ATTACHMENT_MAPPING_ERASED): New macro. gcc/testsuite/ * c-c++-common/gomp/clauses-2.c: Fix error output. * c-c++-common/gomp/target-implicit-map-2.c: Adjust scan output. * c-c++-common/gomp/target-50.c: Adjust scan output. * c-c++-common/gomp/target-enter-data-1.c: Adjust scan output. * g++.dg/gomp/static-component-1.C: New test. * gcc.dg/gomp/target-3.c: Adjust scan output. * gfortran.dg/gomp/map-9.f90: Adjust scan output. libgomp/ * target.c (gomp_map_pointer): Modify zero-length array section pointer handling. (gomp_attach_pointer): Likewise. (gomp_map_fields_existing): Use gomp_map_0len_lookup. (gomp_attach_pointer): Allow attaching null pointers (or Fortran "unassociated" pointers). (gomp_map_vars_internal): Handle zero-sized struct members. Add diagnostic for unmapped struct pointer members. * testsuite/libgomp.c-c++-common/baseptrs-1.c: New test. * testsuite/libgomp.c-c++-common/baseptrs-2.c: New test. * testsuite/libgomp.c-c++-common/baseptrs-6.c: New test. * testsuite/libgomp.c-c++-common/baseptrs-7.c: New test. * testsuite/libgomp.c-c++-common/ptr-attach-2.c: New test. * testsuite/libgomp.c-c++-common/target-implicit-map-2.c: Fix missing "free". * testsuite/libgomp.c-c++-common/target-implicit-map-5.c: New test. * testsuite/libgomp.c-c++-common/target-map-zlas-1.c: New test. * testsuite/libgomp.c++/class-array-1.C: New test. * testsuite/libgomp.c++/baseptrs-3.C: New test. * testsuite/libgomp.c++/baseptrs-4.C: New test. * testsuite/libgomp.c++/baseptrs-5.C: New test. * testsuite/libgomp.c++/baseptrs-8.C: New test. * testsuite/libgomp.c++/baseptrs-9.C: New test. * testsuite/libgomp.c++/ref-mapping-1.C: New test. * testsuite/libgomp.c++/target-48.C: New test. * testsuite/libgomp.c++/target-49.C: New test. * testsuite/libgomp.c++/target-exit-data-reftoptr-1.C: New test. * testsuite/libgomp.c++/target-lambda-1.C: Update for OpenMP 5.2 semantics. * testsuite/libgomp.c++/target-this-3.C: Likewise. * testsuite/libgomp.c++/target-this-4.C: Likewise. * testsuite/libgomp.fortran/struct-elem-map-1.f90: Add temporary XFAIL. * testsuite/libgomp.fortran/target-enter-data-6.f90: Likewise.
2023-12-08OpenMP/Fortran: Implement omp allocators/allocate for ptr/allocatablesTobias Burnus1-0/+1
This commit adds -fopenmp-allocators which enables support for 'omp allocators' and 'omp allocate' that are associated with a Fortran allocate-stmt. If such a construct is encountered, an error is shown, unless the -fopenmp-allocators flag is present. With -fopenmp -fopenmp-allocators, those constructs get turned into GOMP_alloc allocations, while -fopenmp-allocators (also without -fopenmp) ensures deallocation and reallocation (via intrinsic assignments) are properly directed to GOMP_free/omp_realloc - while normal Fortran allocations are processed by free/realloc. In order to distinguish a 'malloc'ed from a 'GOMP_alloc'ed memory, the version field of the Fortran array discriptor is (mis)used: 0 indicates the normal Fortran allocation while 1 denotes GOMP_alloc. For scalars, there is record keeping in libgomp: GOMP_add_alloc(ptr) will add the pointer address to a splay_tree while GOMP_is_alloc(ptr) will return true it was previously added but also removes it from the list. Besides Fortran FE work, BUILT_IN_GOMP_REALLOC is no part of omp-builtins.def and libgomp gains the mentioned two new function. gcc/ChangeLog: * builtin-types.def (BT_FN_PTR_PTR_SIZE_PTRMODE_PTRMODE): New. * omp-builtins.def (BUILT_IN_GOMP_REALLOC): New. * builtins.cc (builtin_fnspec): Handle it. * gimple-ssa-warn-access.cc (fndecl_alloc_p, matching_alloc_calls_p): Likewise. * gimple.cc (nonfreeing_call_p): Likewise. * predict.cc (expr_expected_value_1): Likewise. * tree-ssa-ccp.cc (evaluate_stmt): Likewise. * tree.cc (fndecl_dealloc_argno): Likewise. gcc/fortran/ChangeLog: * dump-parse-tree.cc (show_omp_node): Handle EXEC_OMP_ALLOCATE and EXEC_OMP_ALLOCATORS. * f95-lang.cc (ATTR_ALLOC_WARN_UNUSED_RESULT_SIZE_2_NOTHROW_LIST): Add 'ECF_LEAF | ECF_MALLOC' to existing 'ECF_NOTHROW'. (ATTR_ALLOC_WARN_UNUSED_RESULT_SIZE_2_NOTHROW_LEAF_LIST): Define. * gfortran.h (gfc_omp_clauses): Add contained_in_target_construct. * invoke.texi (-fopenacc, -fopenmp): Update based on C version. (-fopenmp-simd): New, based on C version. (-fopenmp-allocators): New. * lang.opt (fopenmp-allocators): Add. * openmp.cc (resolve_omp_clauses): For allocators/allocate directive, add target and no dynamic_allocators diagnostic and more invalid diagnostic. * parse.cc (decode_omp_directive): Set contains_teams_construct. * trans-array.h (gfc_array_allocate): Update prototype. (gfc_conv_descriptor_version): New prototype. * trans-decl.cc (gfc_init_default_dt): Fix comment. * trans-array.cc (gfc_conv_descriptor_version): New. (gfc_array_allocate): Support GOMP_alloc allocation. (gfc_alloc_allocatable_for_assignment, structure_alloc_comps): Handle GOMP_free/omp_realloc as needed. * trans-expr.cc (gfc_conv_procedure_call): Likewise. (alloc_scalar_allocatable_for_assignment): Likewise. * trans-intrinsic.cc (conv_intrinsic_move_alloc): Likewise. * trans-openmp.cc (gfc_trans_omp_allocators, gfc_trans_omp_directive): Handle allocators/allocate directive. (gfc_omp_call_add_alloc, gfc_omp_call_is_alloc): New. * trans-stmt.h (gfc_trans_allocate): Update prototype. * trans-stmt.cc (gfc_trans_allocate): Support GOMP_alloc. * trans-types.cc (gfc_get_dtype_rank_type): Set version field. * trans.cc (gfc_allocate_using_malloc, gfc_allocate_allocatable): Update to handle GOMP_alloc. (gfc_deallocate_with_status, gfc_deallocate_scalar_with_status): Handle GOMP_free. (trans_code): Update call. * trans.h (gfc_allocate_allocatable, gfc_allocate_using_malloc): Update prototype. (gfc_omp_call_add_alloc, gfc_omp_call_is_alloc): New prototype. * types.def (BT_FN_PTR_PTR_SIZE_PTRMODE_PTRMODE): New. libgomp/ChangeLog: * allocator.c (struct fort_alloc_splay_tree_key_s, fort_alloc_splay_compare, GOMP_add_alloc, GOMP_is_alloc): New. * libgomp.h: Define splay_tree_static for 'reverse' splay tree. * libgomp.map (GOMP_5.1.2): New; add GOMP_add_alloc and GOMP_is_alloc; move GOMP_target_map_indirect_ptr from ... (GOMP_5.1.1): ... here. * libgomp.texi (Impl. Status, Memory management): Update for allocators/allocate directives. * splay-tree.c: Handle splay_tree_static define to declare all functions as static. (splay_tree_lookup_node): New. * splay-tree.h: Handle splay_tree_decl_only define. (splay_tree_lookup_node): New prototype. * target.c: Define splay_tree_static for 'reverse'. * testsuite/libgomp.fortran/allocators-1.f90: New test. * testsuite/libgomp.fortran/allocators-2.f90: New test. * testsuite/libgomp.fortran/allocators-3.f90: New test. * testsuite/libgomp.fortran/allocators-4.f90: New test. * testsuite/libgomp.fortran/allocators-5.f90: New test. gcc/testsuite/ChangeLog: * gfortran.dg/gomp/allocate-14.f90: Add coarray and not-listed tests. * gfortran.dg/gomp/allocate-5.f90: Remove sorry dg-message. * gfortran.dg/bind_c_array_params_2.f90: Update expected dump for dtype '.version=0'. * gfortran.dg/gomp/allocate-16.f90: New test. * gfortran.dg/gomp/allocators-3.f90: New test. * gfortran.dg/gomp/allocators-4.f90: New test.
2023-11-07openmp: Add support for the 'indirect' clause in C/C++Kwok Cheung Yeung1-3/+14
This adds support for the 'indirect' clause in the 'declare target' directive. Functions declared as indirect may be called via function pointers passed from the host in offloaded code. Virtual calls to member functions via the object pointer in C++ are currently not supported in target regions. 2023-11-07 Kwok Cheung Yeung <kcy@codesourcery.com> gcc/c-family/ * c-attribs.cc (c_common_attribute_table): Add attribute for indirect functions. * c-pragma.h (enum parma_omp_clause): Add entry for indirect clause. gcc/c/ * c-decl.cc (c_decl_attributes): Add attribute for indirect functions. * c-lang.h (c_omp_declare_target_attr): Add indirect field. * c-parser.cc (c_parser_omp_clause_name): Handle indirect clause. (c_parser_omp_clause_indirect): New. (c_parser_omp_all_clauses): Handle indirect clause. (OMP_DECLARE_TARGET_CLAUSE_MASK): Add indirect clause to mask. (c_parser_omp_declare_target): Handle indirect clause. Emit error message if device_type or indirect clauses used alone. Emit error if indirect clause used with device_type that is not 'any'. (OMP_BEGIN_DECLARE_TARGET_CLAUSE_MASK): Add indirect clause to mask. (c_parser_omp_begin): Handle indirect clause. * c-typeck.cc (c_finish_omp_clauses): Handle indirect clause. gcc/cp/ * cp-tree.h (cp_omp_declare_target_attr): Add indirect field. * decl2.cc (cplus_decl_attributes): Add attribute for indirect functions. * parser.cc (cp_parser_omp_clause_name): Handle indirect clause. (cp_parser_omp_clause_indirect): New. (cp_parser_omp_all_clauses): Handle indirect clause. (handle_omp_declare_target_clause): Add extra parameter. Add indirect attribute for indirect functions. (OMP_DECLARE_TARGET_CLAUSE_MASK): Add indirect clause to mask. (cp_parser_omp_declare_target): Handle indirect clause. Emit error message if device_type or indirect clauses used alone. Emit error if indirect clause used with device_type that is not 'any'. (OMP_BEGIN_DECLARE_TARGET_CLAUSE_MASK): Add indirect clause to mask. (cp_parser_omp_begin): Handle indirect clause. * semantics.cc (finish_omp_clauses): Handle indirect clause. gcc/ * lto-cgraph.cc (enum LTO_symtab_tags): Add tag for indirect functions. (output_offload_tables): Write indirect functions. (input_offload_tables): read indirect functions. * lto-section-names.h (OFFLOAD_IND_FUNC_TABLE_SECTION_NAME): New. * omp-builtins.def (BUILT_IN_GOMP_TARGET_MAP_INDIRECT_PTR): New. * omp-offload.cc (offload_ind_funcs): New. (omp_discover_implicit_declare_target): Add functions marked with 'omp declare target indirect' to indirect functions list. (omp_finish_file): Add indirect functions to section for offload indirect functions. (execute_omp_device_lower): Redirect indirect calls on target by passing function pointer to BUILT_IN_GOMP_TARGET_MAP_INDIRECT_PTR. (pass_omp_device_lower::gate): Run pass_omp_device_lower if indirect functions are present on an accelerator device. * omp-offload.h (offload_ind_funcs): New. * tree-core.h (omp_clause_code): Add OMP_CLAUSE_INDIRECT. * tree.cc (omp_clause_num_ops): Add entry for OMP_CLAUSE_INDIRECT. (omp_clause_code_name): Likewise. * tree.h (OMP_CLAUSE_INDIRECT_EXPR): New. * config/gcn/mkoffload.cc (process_asm): Process offload_ind_funcs section. Count number of indirect functions. (process_obj): Emit number of indirect functions. * config/nvptx/mkoffload.cc (ind_func_ids, ind_funcs_tail): New. (process): Emit offload_ind_func_table in PTX code. Emit indirect function names and count in image. * config/nvptx/nvptx.cc (nvptx_record_offload_symbol): Mark indirect functions in PTX code with IND_FUNC_MAP. gcc/testsuite/ * c-c++-common/gomp/declare-target-7.c: Update expected error message. * c-c++-common/gomp/declare-target-indirect-1.c: New. * c-c++-common/gomp/declare-target-indirect-2.c: New. * g++.dg/gomp/attrs-21.C (v12): Update expected error message. * g++.dg/gomp/declare-target-indirect-1.C: New. * gcc.dg/gomp/attrs-21.c (v12): Update expected error message. include/ * gomp-constants.h (GOMP_VERSION): Increment to 3. (GOMP_VERSION_SUPPORTS_INDIRECT_FUNCS): New. libgcc/ * offloadstuff.c (OFFLOAD_IND_FUNC_TABLE_SECTION_NAME): New. (__offload_ind_func_table): New. (__offload_ind_funcs_end): New. (__OFFLOAD_TABLE__): Add entries for indirect functions. libgomp/ * Makefile.am (libgomp_la_SOURCES): Add target-indirect.c. * Makefile.in: Regenerate. * libgomp-plugin.h (GOMP_INDIRECT_ADDR_MAP): New define. (GOMP_OFFLOAD_load_image): Add extra argument. * libgomp.h (struct indirect_splay_tree_key_s): New. (indirect_splay_tree_node, indirect_splay_tree, indirect_splay_tree_key): New. (indirect_splay_compare): New. * libgomp.map (GOMP_5.1.1): Add GOMP_target_map_indirect_ptr. * libgomp.texi (OpenMP 5.1): Update documentation on indirect calls in target region and on indirect clause. (Other new OpenMP 5.2 features): Add entry for virtual function calls. * libgomp_g.h (GOMP_target_map_indirect_ptr): Add prototype. * oacc-host.c (host_load_image): Add extra argument. * target.c (gomp_load_image_to_device): If the GOMP_VERSION is high enough, read host indirect functions table and pass to load_image_func. * config/accel/target-indirect.c: New. * config/linux/target-indirect.c: New. * config/gcn/team.c (build_indirect_map): Add prototype. (gomp_gcn_enter_kernel): Initialize support for indirect function calls on GCN target. * config/nvptx/team.c (build_indirect_map): Add prototype. (gomp_nvptx_main): Initialize support for indirect function calls on NVPTX target. * plugin/plugin-gcn.c (struct gcn_image_desc): Add field for indirect functions count. (GOMP_OFFLOAD_load_image): Add extra argument. If the GOMP_VERSION is high enough, build address translation table and copy it to target memory. * plugin/plugin-nvptx.c (nvptx_tdata): Add field for indirect functions count. (GOMP_OFFLOAD_load_image): Add extra argument. If the GOMP_VERSION is high enough, Build address translation table and copy it to target memory. * testsuite/libgomp.c-c++-common/declare-target-indirect-1.c: New. * testsuite/libgomp.c-c++-common/declare-target-indirect-2.c: New. * testsuite/libgomp.c++/declare-target-indirect-1.C: New.
2023-09-07Revert: "Another revert test with a bogus hash"Tobias Burnus1-1/+1
This reverts commit ffffffffffffffffffffffffffffffffffffffff. This should get rejected because of the invalid hash. If it still is accepted, it does something sensible: It removes tailing white space from a line in libgomp/target.c.
2023-07-29libgomp: cuda.h and omp_target_memcpy_rect cleanupTobias Burnus1-35/+17
Fixes for commit r14-2792-g25072a477a56a727b369bf9b20f4d18198ff5894 "OpenMP: Call cuMemcpy2D/cuMemcpy3D for nvptx for omp_target_memcpy_rect", namely: In that commit, the code was changed to handle shared-memory devices; however, as pointed out, omp_target_memcpy_check already set the pointer to NULL in that case. Hence, this commit reverts to the prior version. In cuda.h, it adds cuMemcpyPeer{,Async} for symmetry for cuMemcpy3DPeer (all currently unused) and in three structs, fixes reserved-member names and remove a bogus 'const' in three structs. And it changes a DLSYM to DLSYM_OPT as not all plugins support the new functions, yet. include/ChangeLog: * cuda/cuda.h (CUDA_MEMCPY2D, CUDA_MEMCPY3D, CUDA_MEMCPY3D_PEER): Remove bogus 'const' from 'const void *dst' and fix reserved-name name in those structs. (cuMemcpyPeer, cuMemcpyPeerAsync): Add. libgomp/ChangeLog: * target.c (omp_target_memcpy_rect_worker): Undo dim=1 change for GOMP_OFFLOAD_CAP_SHARED_MEM. (omp_target_memcpy_rect_copy): Likewise for lock condition. (gomp_load_plugin_for_device): Use DLSYM_OPT not DLSYM for memcpy3d/memcpy2d. * plugin/plugin-nvptx.c (GOMP_OFFLOAD_memcpy2d, GOMP_OFFLOAD_memcpy3d): Use memset 0 to nullify reserved and unused src/dst fields for that mem type; remove '{src,dst}LOD = 0'.
2023-07-26OpenMP: Call cuMemcpy2D/cuMemcpy3D for nvptx for omp_target_memcpy_rectTobias Burnus1-25/+127
When copying a 2D or 3D rectangular memmory block, the performance is better when using CUDA's cuMemcpy2D/cuMemcpy3D instead of copying the data one by one. That's what this commit does. Additionally, it permits device-to-device copies, if neccessary using a temporary variable on the host. include/ChangeLog: * cuda/cuda.h (CUlimit): Add CUDA_ERROR_NOT_INITIALIZED, CUDA_ERROR_DEINITIALIZED, CUDA_ERROR_INVALID_HANDLE. (CUarray, CUmemorytype, CUDA_MEMCPY2D, CUDA_MEMCPY3D, CUDA_MEMCPY3D_PEER): New typdefs. (cuMemcpy2D, cuMemcpy2DAsync, cuMemcpy2DUnaligned, cuMemcpy3D, cuMemcpy3DAsync, cuMemcpy3DPeer, cuMemcpy3DPeerAsync): New prototypes. libgomp/ChangeLog: * libgomp-plugin.h (GOMP_OFFLOAD_memcpy2d, GOMP_OFFLOAD_memcpy3d): New prototypes. * libgomp.h (struct gomp_device_descr): Add memcpy2d_func and memcpy3d_func. * libgomp.texi (nvtpx): Document when cuMemcpy2D/cuMemcpy3D is used. * oacc-host.c (memcpy2d_func, .memcpy3d_func): Init with NULL. * plugin/cuda-lib.def (cuMemcpy2D, cuMemcpy2DUnaligned, cuMemcpy3D): Invoke via CUDA_ONE_CALL. * plugin/plugin-nvptx.c (GOMP_OFFLOAD_memcpy2d, GOMP_OFFLOAD_memcpy3d): New. * target.c (omp_target_memcpy_rect_worker): (omp_target_memcpy_rect_check, omp_target_memcpy_rect_copy): Permit all device-to-device copyies; invoke new plugins for 2D and 3D copying when available. (gomp_load_plugin_for_device): DLSYM the new plugin functions. * testsuite/libgomp.c/target-12.c: Fix dimension bug. * testsuite/libgomp.fortran/target-12.f90: Likewise. * testsuite/libgomp.fortran/target-memcpy-rect-1.f90: New test.
2023-06-19OpenMP (C/C++): Keep pointer value of unmapped ptr with default mapping ↵Tobias Burnus1-1/+1
[PR110270] For C/C++ pointers, default implicit mapping firstprivatizes the pointer but if the memory it points to is mapped, the it is updated to point to the device memory (by attaching a zero sized array section of the pointed-to storage). However, if the pointed-to storage wasn't mapped, the pointer was set to NULL on the device side (OpenMP 5.0/5.1 semantic). With this commit, the pointer retains the on-host address in that case (OpenMP 5.2 semantic). The new semantic avoids an explicit map/firstprivate/is_device_ptr in the following sensible cases: Special values (e.g. pointer or 0x1, 0x2 etc.), explicitly device allocated memory (e.g. omp_target_alloc), and with (unified) shared memory. (Note: With (U)SM, mappings still must be tracked, at least when omp_target_associate_ptr does not fail when passing in two destinct pointers.) libgomp/ PR middle-end/110270 * target.c (gomp_map_vars_internal): Copy host value instead of NULL for GOMP_MAP_ZERO_LEN_ARRAY_SECTION if not mapped. * libgomp.texi (OpenMP 5.2 Impl.): Mark as 'Y'. * testsuite/libgomp.c/target-19.c: Update expected value. * testsuite/libgomp.c++/target-18.C: Likewise. * testsuite/libgomp.c++/target-19.C: Likewise. * testsuite/libgomp.c-c++-common/requires-unified-addr-2.c: New test. * testsuite/libgomp.c-c++-common/target-implicit-map-3.c: New test. * testsuite/libgomp.c-c++-common/target-implicit-map-4.c: New test.
2023-06-16libgomp: Fix OMP_TARGET_OFFLOAD=mandatoryTobias Burnus1-3/+7
It turned out that gomp_init_targets_once() was not run when directly calling 'omp target' or 'omp target (enter/exit) data' causing an abort with OMP_TARGET_OFFLOAD=mandatory wrongly claiming that no device is available. It was called a tiny bit later but few lines too late for updating the default-device-var. libgomp/ChangeLog: * target.c (resolve_device): Call gomp_get_num_devices early to ensure gomp_init_targets_once was called before using default-device-var. * testsuite/libgomp.c/target-55.c: New test. * testsuite/libgomp.c/target-55a.c: New test.
2023-06-14Align a 'OMP_TARGET_OFFLOAD=mandatory' diagnostic with othersThomas Schwinge1-2/+2
On 2023-06-14T11:42:22+0200, Tobias Burnus <tobias@codesourcery.com> wrote: > On 14.06.23 10:09, Thomas Schwinge wrote: >> Let me know if I should also adjust the new 'target { ! offload_device }' >> diagnostic "[...] MANDATORY but only the host device is available" to >> include a comma before 'but', for consistency with the other existing >> diagnostics (cited above)? > > I think it makes sense to be consistent. Thus: Yes, please add the commas. Fix-up for recent commit 18c8b56c7d67a9e37acf28822587786f0fc0efbc "OpenMP: Set default-device-var with OMP_TARGET_OFFLOAD=mandatory". libgomp/ * target.c (resolve_device): Align a 'OMP_TARGET_OFFLOAD=mandatory' diagnostic with others. * testsuite/libgomp.c/target-51.c: Adjust.
2023-06-14OpenMP: Set default-device-var with OMP_TARGET_OFFLOAD=mandatoryTobias Burnus1-1/+14
OMP_TARGET_OFFLOAD=mandatory handling was before inconsistent. Hence, in OpenMP 5.2 it was clarified/extended by having implications on the default-device-var; additionally, omp_initial_device and omp_invalid_device enum values/PARAMETERs were added; support for it was added in r13-1066-g1158fe43407568 including aborting for omp_invalid_device and non-conforming device numbers. Only the mandatory handling was missing. Namely, while the default-device-var is usually initialized to value 0, with 'mandatory' it must have the value 'omp_invalid_device' if and only if zero non-host devices are available. (The OMP_DEFAULT_DEVICE env var overrides this as it comes semantically after the initialization.) To achieve this, default-device-var is now initialized to MIN_INT. If there is no 'mandatory', it is set to 0 directly after env var parsing. Otherwise, it is updated in gomp_target_init to either 0 or omp_invalid_device. To ensure INT_MIN is never seen by the user, both the omp_get_default_device API routine and omp_display_env (user call and OMP_DISPLAY_ENV env var) call gomp_init_targets_once() in that case. libgomp/ChangeLog: * env.c (gomp_default_icv_values): Init default_device_var to an nonconforming value - INT_MIN. (initialize_env): After env-var parsing, set default_device_var to device 0 unless OMP_TARGET_OFFLOAD=mandatory. (omp_display_env): If default_device_var is INT_MIN, call gomp_init_targets_once. * icv-device.c (omp_get_default_device): Likewise. * libgomp.texi (OMP_DEFAULT_DEVICE): Update init description. (OpenMP 5.2 Impl. Status): Mark OMP_TARGET_OFFLOAD=mandatory as 'Y'. * target.c (resolve_device): Improve error message device-num < 0 with 'mandatory' and no no-host devices available. (gomp_target_init): Set default-device-var if INT_MIN. * testsuite/libgomp.c/target-48.c: New test. * testsuite/libgomp.c/target-49.c: New test. * testsuite/libgomp.c/target-50.c: New test. * testsuite/libgomp.c/target-50a.c: New test. * testsuite/libgomp.c/target-51.c: New test. * testsuite/libgomp.c/target-52.c: New test. * testsuite/libgomp.c/target-53.c: New test. * testsuite/libgomp.c/target-54.c: New test.
2023-06-12OpenMP: Cleanups related to the 'present' modifierTobias Burnus1-29/+26
Reduce number of enum values passed to libgomp as GOMP_MAP_PRESENT_{TO,TOFROM,FROM,ALLOC} have the same semantic as GOMP_MAP_FORCE_PRESENT (i.e. abort if not present, otherwise ignore); that's different to GOMP_MAP_ALWAYS_PRESENT_{TO,TOFROM,FROM} which also abort if not present but copy data when present. This is is a follow-up to the commit r14-1579-g4ede915d5dde93 done 6 days ago. Additionally, the commit improves a libgomp run-time and a C/C++ compile-time error wording and extends testcases a tiny bit. gcc/c/ChangeLog: * c-parser.cc (c_parser_omp_clause_map): Reword error message for clearness especially with 'omp target (enter/exit) data.' gcc/cp/ChangeLog: * parser.cc (cp_parser_omp_clause_map): Reword error message for clearness especially with 'omp target (enter/exit) data.' * semantics.cc (handle_omp_array_sections): Handle GOMP_MAP_{ALWAYS_,}PRESENT_{TO,TOFROM,FROM,ALLOC} enum values. gcc/ChangeLog: * gimplify.cc (gimplify_adjust_omp_clauses_1): Use GOMP_MAP_FORCE_PRESENT for 'present alloc' implicit mapping. (gimplify_adjust_omp_clauses): Change GOMP_MAP_PRESENT_{TO,TOFROM,FROM,ALLOC} to the equivalent GOMP_MAP_FORCE_PRESENT. * omp-low.cc (lower_omp_target): Remove handling of no-longer valid GOMP_MAP_PRESENT_{TO,TOFROM,FROM,ALLOC}; update map kinds used for to/from clauses with present modifier. include/ChangeLog: * gomp-constants.h (enum gomp_map_kind): Change the enum values GOMP_MAP_PRESENT_{TO,TOFROM,FROM,ALLOC} to be compiler only. (GOMP_MAP_PRESENT_P): Update to include also GOMP_MAP_FORCE_PRESENT. libgomp/ChangeLog: * target.c (gomp_to_device_kind_p, gomp_map_vars_internal): Replace GOMP_MAP_PRESENT_{FROM,TO,TOFROM,ACLLOC} by GOMP_MAP_FORCE_PRESENT. (gomp_map_vars_internal, gomp_update): Likewise; unify and improve error message. * testsuite/libgomp.c-c++-common/target-present-2.c: Update for changed error message. * testsuite/libgomp.fortran/target-present-1.f90: Likewise. * testsuite/libgomp.fortran/target-present-2.f90: Likewise. * testsuite/libgomp.oacc-c-c++-common/present-1.c: Likewise. * testsuite/libgomp.c-c++-common/target-present-1.c: Likewise and extend testcase to check that data is copied when needed. * testsuite/libgomp.c-c++-common/target-present-3.c: Likewise. * testsuite/libgomp.fortran/target-present-3.f90: Likewise. gcc/testsuite/ChangeLog: * c-c++-common/gomp/defaultmap-4.c: Update scan-tree-dump. * c-c++-common/gomp/map-9.c: Likewise. * gfortran.dg/gomp/defaultmap-8.f90: Likewise. * gfortran.dg/gomp/map-11.f90: Likewise. * gfortran.dg/gomp/target-update-1.f90: Likewise. * gfortran.dg/gomp/map-12.f90: Likewise; also check original dump. * c-c++-common/gomp/map-6.c: Update dg-error and also check clause error with 'target (enter/exit) data'.
2023-06-06openmp: Add support for the 'present' modifierTobias Burnus1-5/+61
This implements support for the OpenMP 5.1 'present' modifier, which can be used in map clauses in the 'target', 'target data', 'target data enter' and 'target data exit' constructs, and in the 'to' and 'from' clauses of the 'target update' construct. It is also supported in defaultmap. The modifier triggers a fatal runtime error if the data specified by the clause is not already present on the target device. It can also be combined with 'always' in map clauses. 2023-06-06 Kwok Cheung Yeung <kcy@codesourcery.com> Tobias Burnus <tobias@codesourcery.com> gcc/c/ * c-parser.cc (c_parser_omp_clause_defaultmap, c_parser_omp_clause_map): Parse 'present'. (c_parser_omp_clause_to, c_parser_omp_clause_from): Remove. (c_parser_omp_clause_from_to): New; parse to/from clauses with optional present modifer. (c_parser_omp_all_clauses): Update call. (c_parser_omp_target_data, c_parser_omp_target_enter_data, c_parser_omp_target_exit_data): Handle new map enum values for 'present' mapping. gcc/cp/ * parser.cc (cp_parser_omp_clause_defaultmap, cp_parser_omp_clause_map): Parse 'present'. (cp_parser_omp_clause_from_to): New; parse to/from clauses with optional 'present' modifier. (cp_parser_omp_all_clauses): Update call. (cp_parser_omp_target_data, cp_parser_omp_target_enter_data, cp_parser_omp_target_exit_data): Handle new enum value for 'present' mapping. * semantics.cc (finish_omp_target): Likewise. gcc/fortran/ * dump-parse-tree.cc (show_omp_namelist): Display 'present' map modifier. (show_omp_clauses): Display 'present' motion modifier for 'to' and 'from' clauses. * gfortran.h (enum gfc_omp_map_op): Add entries with 'present' modifiers. (struct gfc_omp_namelist): Add 'present_modifer'. * openmp.cc (gfc_match_motion_var_list): New, handles optional 'present' modifier for to/from clauses. (gfc_match_omp_clauses): Call it for to/from clauses; parse 'present' in defaultmap and map clauses. (resolve_omp_clauses): Allow 'present' modifiers on 'target', 'target data', 'target enter' and 'target exit' directives. * trans-openmp.cc (gfc_trans_omp_clauses): Apply 'present' modifiers to tree node for 'map', 'to' and 'from' clauses. Apply 'present' for defaultmap. gcc/ * gimplify.cc (omp_notice_variable): Apply GOVD_MAP_ALLOC_ONLY flag and defaultmap flags if the defaultmap has GOVD_MAP_FORCE_PRESENT flag set. (omp_get_attachment): Handle map clauses with 'present' modifier. (omp_group_base): Likewise. (gimplify_scan_omp_clauses): Reorder present maps to come first. Set GOVD flags for present defaultmaps. (gimplify_adjust_omp_clauses_1): Set map kind for present defaultmaps. * omp-low.cc (scan_sharing_clauses): Handle 'always, present' map clauses. (lower_omp_target): Handle map clauses with 'present' modifier. Handle 'to' and 'from' clauses with 'present'. * tree-core.h (enum omp_clause_defaultmap_kind): Add OMP_CLAUSE_DEFAULTMAP_PRESENT defaultmap kind. * tree-pretty-print.cc (dump_omp_clause): Handle 'map', 'to' and 'from' clauses with 'present' modifier. Handle present defaultmap. * tree.h (OMP_CLAUSE_MOTION_PRESENT): New #define. include/ * gomp-constants.h (GOMP_MAP_FLAG_SPECIAL_5): New. (GOMP_MAP_FLAG_FORCE): Redefine. (GOMP_MAP_FLAG_PRESENT, GOMP_MAP_FLAG_ALWAYS_PRESENT): New. (enum gomp_map_kind): Add map kinds with 'present' modifiers. (GOMP_MAP_COPY_TO_P, GOMP_MAP_COPY_FROM_P): Evaluate to true for map variants with 'present' (GOMP_MAP_ALWAYS_TO_P, GOMP_MAP_ALWAYS_FROM_P): Evaluate to true for map variants with 'always, present' modifiers. (GOMP_MAP_ALWAYS): Redefine. (GOMP_MAP_FORCE_P, GOMP_MAP_PRESENT_P): New. libgomp/ * libgomp.texi (OpenMP 5.1 Impl. status): Set 'present' support for defaultmap to 'Y', add 'Y' entry for 'present' on to/from/map clauses. * target.c (gomp_to_device_kind_p): Add map kinds with 'present' modifier. (gomp_map_vars_existing): Use new GOMP_MAP_FORCE_P macro. (gomp_map_vars_internal, gomp_update, gomp_target_rev): Emit runtime error if memory region not present. * testsuite/libgomp.c-c++-common/target-present-1.c: New test. * testsuite/libgomp.c-c++-common/target-present-2.c: New test. * testsuite/libgomp.c-c++-common/target-present-3.c: New test. * testsuite/libgomp.fortran/target-present-1.f90: New test. * testsuite/libgomp.fortran/target-present-2.f90: New test. * testsuite/libgomp.fortran/target-present-3.f90: New test. gcc/testsuite/ * c-c++-common/gomp/map-6.c: Update dg-error, extend to test for duplicated 'present' and extend scan-dump tests for 'present'. * gfortran.dg/gomp/defaultmap-1.f90: Update dg-error. * gfortran.dg/gomp/map-7.f90: Extend parse and dump test for 'present'. * gfortran.dg/gomp/map-8.f90: Extend for duplicate 'present' modifier checking. * c-c++-common/gomp/defaultmap-4.c: New test. * c-c++-common/gomp/map-9.c: New test. * c-c++-common/gomp/target-update-1.c: New test. * gfortran.dg/gomp/defaultmap-8.f90: New test. * gfortran.dg/gomp/map-11.f90: New test. * gfortran.dg/gomp/map-12.f90: New test. * gfortran.dg/gomp/target-update-1.f90: New test.
2023-05-08libgomp: Simplify OpenMP reverse offload host <-> device memory copy ↵Thomas Schwinge1-56/+46
implementation ... by using the existing 'goacc_asyncqueue' instead of re-coding parts of it. Follow-up to commit 131d18e928a3ea1ab2d3bf61aa92d68a8a254609 "libgomp/nvptx: Prepare for reverse-offload callback handling", and commit ea4b23d9c82d9be3b982c3519fe5e8e9d833a6a8 "libgomp: Handle OpenMP's reverse offloads". libgomp/ * target.c (gomp_target_rev): Instead of 'dev_to_host_cpy', 'host_to_dev_cpy', 'token', take a single 'goacc_asyncqueue'. * libgomp.h (gomp_target_rev): Adjust. * libgomp-plugin.c (GOMP_PLUGIN_target_rev): Adjust. * libgomp-plugin.h (GOMP_PLUGIN_target_rev): Adjust. * plugin/plugin-gcn.c (process_reverse_offload): Adjust. * plugin/plugin-nvptx.c (rev_off_dev_to_host_cpy) (rev_off_host_to_dev_cpy): Remove. (GOMP_OFFLOAD_run): Adjust.
2023-03-24Add caveat/safeguard to OpenMP: Handle descriptors in target's firstprivate ↵Thomas Schwinge1-0/+5
[PR104949] Follow-up to commit 49d1a2f91325fa8cc011149e27e5093a988b3a49 "OpenMP: Handle descriptors in target's firstprivate [PR104949]". PR fortran/104949 libgomp/ * target.c (gomp_map_vars_internal) <GOMP_MAP_FIRSTPRIVATE>: Add caveat/safeguard.
2023-03-10Use 'GOMP_MAP_VARS_TARGET' for OpenACC compute constructs [PR90596]Thomas Schwinge1-5/+5
Thereby considerably simplify the device plugins' 'GOMP_OFFLOAD_openacc_exec', 'GOMP_OFFLOAD_openacc_async_exec' functions: in terms of lines of code, but in particular conceptually: no more device memory allocation, host to device data copying, device memory deallocation -- 'GOMP_MAP_VARS_TARGET' does all that for us. This depends on commit 2b2340e236c0bba8aaca358ea25a5accd8249fbd "Allow libgomp 'cbuf' buffering with OpenACC 'async' for 'ephemeral' data", where I said that "a use will emerge later", which is this one here. PR libgomp/90596 libgomp/ * target.c (gomp_map_vars_internal): Allow for 'param_kind == GOMP_MAP_VARS_OPENACC | GOMP_MAP_VARS_TARGET'. * oacc-parallel.c (GOACC_parallel_keyed): Pass 'GOMP_MAP_VARS_TARGET' to 'goacc_map_vars'. * plugin/plugin-gcn.c (alloc_by_agent, gcn_exec) (GOMP_OFFLOAD_openacc_exec, GOMP_OFFLOAD_openacc_async_exec): Adjust, simplify. (gomp_offload_free): Remove. * plugin/plugin-nvptx.c (nvptx_exec, GOMP_OFFLOAD_openacc_exec) (GOMP_OFFLOAD_openacc_async_exec): Adjust, simplify. (cuda_free_argmem): Remove. * testsuite/libgomp.oacc-c-c++-common/acc_prof-parallel-1.c: Adjust.
2023-03-10Allow libgomp 'cbuf' buffering with OpenACC 'async' for 'ephemeral' dataThomas Schwinge1-34/+36
This does *allow*, but under no circumstances is this currently going to be used: all potentially applicable data is non-'ephemeral', and thus not considered for 'gomp_coalesce_buf_add' for OpenACC 'async'. (But a use will emerge later.) Follow-up to commit r12-2530-gd88a6951586c7229b25708f4486eaaf4bf4b5bbe "Don't use libgomp 'cbuf' buffering with OpenACC 'async'", addressing this TODO comment: TODO ... but we could allow CBUF usage for EPHEMERAL data? (Open question: is it more performant to use libgomp CBUF buffering or individual device asyncronous copying?) Ephemeral data is small, and therefore individual device asyncronous copying does seem dubious -- in particular given that for all those, we'd individually have to allocate and queue for deallocation a temporary buffer to capture the ephemeral data. Instead, just let the 'cbuf' *be* the temporary buffer. libgomp/ * target.c (gomp_copy_host2dev, gomp_map_vars_internal): Allow libgomp 'cbuf' buffering with OpenACC 'async' for 'ephemeral' data.
2023-03-10Simplify OpenACC 'no_create' clause implementationThomas Schwinge1-1/+1
For 'OFFSET_INLINED', 'gomp_map_val' does the right thing, and we may then simplify the device plugins accordingly. This is a follow-up to Subversion r279551 (Git commit a6163563f2ce502bd4ef444bd5de33570bb8eeb1) "Add OpenACC 2.6's no_create", Subversion r279622 (Git commit 5bcd470bf0749e1f56d05dd43aa9584ff2e3a090) "Use gomp_map_val for OpenACC host-to-device address translation". libgomp/ * target.c (gomp_map_vars_internal): Use 'OFFSET_INLINED' for 'GOMP_MAP_IF_PRESENT'. * plugin/plugin-gcn.c (gcn_exec, GOMP_OFFLOAD_openacc_exec) (GOMP_OFFLOAD_openacc_async_exec): Adjust. * plugin/plugin-nvptx.c (nvptx_exec, GOMP_OFFLOAD_openacc_exec) (GOMP_OFFLOAD_openacc_async_exec): Likewise. * testsuite/libgomp.oacc-c-c++-common/no_create-1.c: Add 'async' testing. * testsuite/libgomp.oacc-c-c++-common/no_create-2.c: Likewise.
2023-02-15libgomp: Fix reverse-offload for GOMP_MAP_TO_PSETTobias Burnus1-1/+7
libgomp/ * target.c (gomp_target_rev): Dereference ptr to get device address. * testsuite/libgomp.fortran/reverse-offload-5.f90: Add test for unallocated allocatable.
2023-02-15libgomp: Fix 'target enter data' with always pointerTobias Burnus1-3/+7
As GOMP_MAP_ALWAYS_POINTER operates on the previous map item, ensure that with 'target enter data' both are passed together to gomp_map_vars_internal. libgomp/ChangeLog: * target.c (gomp_map_vars_internal): Add 'i > 0' before doing a kind check. (GOMP_target_enter_exit_data): If the next map item is GOMP_MAP_ALWAYS_POINTER map it together with the current item. * testsuite/libgomp.fortran/target-enter-data-3.f90: New test.
2023-02-03libgomp: Fix reverse offload issuesTobias Burnus1-4/+4
If there is nothing to map, skip the mapping and avoid attempting to copy 0 bytes from addrs, sizes and kinds. Additionally, it could happen that a non-allocated address was deallocated, such as a pointer set, leading to a free for the actual data. libgomp/ * target.c (gomp_target_rev): Handle mapnum == 0 and avoid freeing not allocated memory. * testsuite/libgomp.fortran/reverse-offload-6.f90: New test.
2023-01-16Update copyright years.Jakub Jelinek1-1/+1
2022-12-10libgomp: Handle OpenMP's reverse offloadsTobias Burnus1-15/+584
This commit enabled reverse offload for nvptx such that gomp_target_rev actually gets called. And it fills the latter function to do all of the following: finding the host function to the device func ptr and copying the arguments to the host, processing the mapping/firstprivate, calling the host function, copying back the data and freeing as needed. The data handling is made easier by assuming that all host variables either existed before (and are in the mapping) or that those are devices variables not yet available on the host. Thus, the reverse mapping can do without refcounts etc. Note that the spec disallows inside a target region device-affecting constructs other than target plus ancestor device-modifier and it also limits the clauses permitted on this construct. For the function addresses, an additional splay tree is used; for the lookup of mapped variables, the existing splay-tree is used. Unfortunately, its data structure requires a full walk of the tree; Additionally, the just mapped variables are recorded in a separate data structure an extra lookup. While the lookup is slow, assuming that only few variables get mapped in each reverse offload construct and that reverse offload is the exception and not performance critical, this seems to be acceptable. libgomp/ChangeLog: * libgomp.h (struct target_mem_desc): Predeclare; move below after 'reverse_splay_tree_node' and add rev_array member. (struct reverse_splay_tree_key_s, reverse_splay_compare): New. (reverse_splay_tree_node, reverse_splay_tree, reverse_splay_tree_key): New typedef. (struct gomp_device_descr): Add mem_map_rev member. * oacc-host.c (host_dispatch): NULL init .mem_map_rev. * plugin/plugin-nvptx.c (GOMP_OFFLOAD_get_num_devices): Claim support for GOMP_REQUIRES_REVERSE_OFFLOAD. * splay-tree.h (splay_tree_callback_stop): New typedef; like splay_tree_callback but returning int not void. (splay_tree_foreach_lazy): Define; like splay_tree_foreach but taking splay_tree_callback_stop as argument. * splay-tree.c (splay_tree_foreach_internal_lazy, splay_tree_foreach_lazy): New; but early exit if callback returns nonzero. * target.c: Instatiate splay_tree_c with splay_tree_prefix 'reverse'. (gomp_map_lookup_rev): New. (gomp_load_image_to_device): Handle reverse-offload function lookup table. (gomp_unload_image_from_device): Free devicep->mem_map_rev. (struct gomp_splay_tree_rev_lookup_data, gomp_splay_tree_rev_lookup, gomp_map_rev_lookup, struct cpy_data, gomp_map_cdata_lookup_int, gomp_map_cdata_lookup): New auxiliary structs and functions for gomp_target_rev. (gomp_target_rev): Implement reverse offloading and its mapping. (gomp_target_init): Init current_device.mem_map_rev.root. * testsuite/libgomp.fortran/reverse-offload-2.f90: New test. * testsuite/libgomp.fortran/reverse-offload-3.f90: New test. * testsuite/libgomp.fortran/reverse-offload-4.f90: New test. * testsuite/libgomp.fortran/reverse-offload-5.f90: New test. * testsuite/libgomp.fortran/reverse-offload-5a.f90: New test without mapping of on-device allocated variables.
2022-12-06OpenMP: omp_get_max_teams, omp_set_num_teams, and ↵Marcel Vollweiler1-4/+184
omp_{gs}et_teams_thread_limit on offload devices This patch adds support for omp_get_max_teams, omp_set_num_teams, and omp_{gs}et_teams_thread_limit on offload devices. That includes the usage of device-specific ICV values (specified as environment variables or changed on a device). In order to reuse device-specific ICV values, a copy back mechanism is implemented that copies ICV values back from device to the host. Additionally, a limitation of the number of teams on gcn offload devices is implemented. The number of teams is limited by twice the number of compute units (one team is executed on one compute unit). This avoids queueing unnessecary many teams and a corresponding allocation of large amounts of memory. Without that limitation the memory allocation for a large number of user-specified teams can result in an "memory access fault". A limitation of the number of teams is already also implemented for nvptx devices (see nvptx_adjust_launch_bounds in libgomp/plugin/plugin-nvptx.c). gcc/ChangeLog: * gimplify.cc (optimize_target_teams): Set initial num_teams_upper to "-2" instead of "1" for non-existing num_teams clause in order to disambiguate from the case of an existing num_teams clause with value 1. libgomp/ChangeLog: * config/gcn/icv-device.c (omp_get_teams_thread_limit): Added to allow processing of device-specific values. (omp_set_teams_thread_limit): Likewise. (ialias): Likewise. * config/nvptx/icv-device.c (omp_get_teams_thread_limit): Likewise. (omp_set_teams_thread_limit): Likewise. (ialias): Likewise. * icv-device.c (omp_get_teams_thread_limit): Likewise. (ialias): Likewise. (omp_set_teams_thread_limit): Likewise. * icv.c (omp_set_teams_thread_limit): Removed. (omp_get_teams_thread_limit): Likewise. (ialias): Likewise. * libgomp.texi: Updated documentation for nvptx and gcn corresponding to the limitation of the number of teams. * plugin/plugin-gcn.c (limit_teams): New helper function that limits the number of teams by twice the number of compute units. (parse_target_attributes): Limit the number of teams on gcn offload devices. * target.c (get_gomp_offload_icvs): Added teams_thread_limit_var handling. (gomp_load_image_to_device): Added a size check for the ICVs struct variable. (gomp_copy_back_icvs): New function that is used in GOMP_target_ext to copy back the ICV values from device to host. (GOMP_target_ext): Update the number of teams and threads in the kernel args also considering device-specific values. * testsuite/libgomp.c-c++-common/icv-4.c: Fixed an error in the reading of OMP_TEAMS_THREAD_LIMIT from the environment. * testsuite/libgomp.c-c++-common/icv-5.c: Extended. * testsuite/libgomp.c-c++-common/icv-6.c: Extended. * testsuite/libgomp.c-c++-common/icv-7.c: Extended. * testsuite/libgomp.c-c++-common/icv-9.c: New test. * testsuite/libgomp.fortran/icv-5.f90: New test. * testsuite/libgomp.fortran/icv-6.f90: New test. gcc/testsuite/ChangeLog: * c-c++-common/gomp/target-teams-1.c: Adapt expected values for num_teams from "1" to "-2" in cases without num_teams clause. * g++.dg/gomp/target-teams-1.C: Likewise. * gfortran.dg/gomp/defaultmap-4.f90: Likewise. * gfortran.dg/gomp/defaultmap-5.f90: Likewise. * gfortran.dg/gomp/defaultmap-6.f90: Likewise.
2022-10-24libgomp/nvptx: Prepare for reverse-offload callback handlingTobias Burnus1-0/+19
This patch adds a stub 'gomp_target_rev' in the host's target.c, which will later handle the reverse offload. For nvptx, it adds support for forwarding the offload gomp_target_ext call to the host by setting values in a struct on the device and querying it on the host - invoking gomp_target_rev on the result. include/ChangeLog: * cuda/cuda.h (enum CUdevice_attribute): Add CU_DEVICE_ATTRIBUTE_UNIFIED_ADDRESSING. (CU_MEMHOSTALLOC_DEVICEMAP): Define. (cuMemHostAlloc): Add prototype. libgomp/ChangeLog: * config/nvptx/icv-device.c (GOMP_DEVICE_NUM_VAR): Remove 'static' for this variable. * config/nvptx/libgomp-nvptx.h: New file. * config/nvptx/target.c: Include it. (GOMP_ADDITIONAL_ICVS): Declare extern var. (GOMP_REV_OFFLOAD_VAR): Declare var. (GOMP_target_ext): Handle reverse offload. * libgomp-plugin.h (GOMP_PLUGIN_target_rev): New prototype. * libgomp-plugin.c (GOMP_PLUGIN_target_rev): New, call ... * target.c (gomp_target_rev): ... this new stub function. * libgomp.h (gomp_target_rev): Declare. * libgomp.map (GOMP_PLUGIN_1.4): New; add GOMP_PLUGIN_target_rev. * plugin/cuda-lib.def (cuMemHostAlloc): Add. * plugin/plugin-nvptx.c: Include libgomp-nvptx.h. (struct ptx_device): Add rev_data member. (nvptx_open_device): Remove async_engines query, last used in r10-304-g1f4c5b9b; add unified-address assert check. (GOMP_OFFLOAD_get_num_devices): Claim unified address support. (GOMP_OFFLOAD_load_image): Free rev_fn_table if no offload functions exist. Make offload var available on host and device. (rev_off_dev_to_host_cpy, rev_off_host_to_dev_cpy): New. (GOMP_OFFLOAD_run): Handle reverse offload.
2022-10-12libgomp: Fix up creation of artificial teamsJakub Jelinek1-3/+12
When not in explicit parallel/target/teams construct, we in some cases create an artificial parallel with a single thread (either to handle target nowait or for task reduction purposes). In those cases, it handled again artificially created implicit task (created by gomp_new_icv for cases where we needed to write to some ICVs), but as the testcases show, didn't take into account possibility of this being done from explicit task(s). The code would destroy/free the previous task and replace it with the new implicit task. If task is an explicit task (when teams is NULL, all explicit tasks behave like if (0)), it is a pointer to a local stack variable, so freeing it doesn't work, and additionally we shouldn't lose the explicit tasks - the new implicit task should instead replace the ancestor task which is the first implicit one. 2022-10-12 Jakub Jelinek <jakub@redhat.com> * task.c (gomp_create_artificial_team): Fix up handling of invocations from within explicit task. * target.c (GOMP_target_ext): Likewise. * testsuite/libgomp.c/task-7.c: New test. * testsuite/libgomp.c/task-8.c: New test. * testsuite/libgomp.c-c++-common/task-reduction-17.c: New test. * testsuite/libgomp.c-c++-common/task-reduction-18.c: New test.
2022-09-09libgomp: Prepare for reverse offload fn lookupTobias Burnus1-1/+1
Prepare for reverse-offloading function-pointer lookup by passing a rev_fn_table argument to GOMP_OFFLOAD_load_image. The argument will be NULL, unless GOMP_REQUIRES_REVERSE_OFFLOAD is requested and devices not supported it, are filtered out. (Up to and including this commit, no non-host device claims such support and the caller currently always passes NULL.) libgomp/ChangeLog: * libgomp-plugin.h (GOMP_OFFLOAD_load_image): Add 'uint64_t **rev_fn_table' argument. * oacc-host.c (host_load_image): Likewise. * plugin/plugin-gcn.c (GOMP_OFFLOAD_load_image): Likewise; currently unused. * plugin/plugin-nvptx.c (GOMP_OFFLOAD_load_image): Likewise. * target.c (gomp_load_image_to_device): Update call but pass NULL for now. liboffloadmic/ChangeLog: * plugin/libgomp-plugin-intelmic.cpp (GOMP_OFFLOAD_load_image): Add (unused) uint64_t **rev_fn_table argument.
2022-09-08OpenMP, libgomp: Environment variable syntax extensionMarcel Vollweiler1-26/+95
This patch considers the environment variable syntax extension for device-specific variants of environment variables from OpenMP 5.1 (see OpenMP 5.1 specification, p. 75 and p. 639). An environment variable (e.g. OMP_NUM_TEAMS) can have different suffixes: _DEV (e.g. OMP_NUM_TEAMS_DEV): affects all devices but not the host. _DEV_<device> (e.g. OMP_NUM_TEAMS_DEV_42): affects only device with number <device>. no suffix (e.g. OMP_NUM_TEAMS): affects only the host. In future OpenMP versions also suffix _ALL will be introduced (see discussion https://github.com/OpenMP/spec/issues/3179). This is also considered in this patch: _ALL (e.g. OMP_NUM_TEAMS_ALL): affects all devices and the host. The precedence is as follows (descending). For the host: 1. no suffix 2. _ALL For devices: 1. _DEV_<device> 2. _DEV 3. _ALL That means, _DEV_<device> is used whenever available. Otherwise _DEV is used if available, and at last _ALL. If there is no value for any of the variable variants, default values are used as already implemented before. This patch concerns parsing (a), storing (b), output (c) and transmission to the device (d): (a) The actual number of devices and the numbering are not known when parsing the environment variables. Thus all environment variables are iterated and searched for device-specific ones. (b) Only configured device-specific variables are stored. Thus, a linked list is used. (c) The output is done in omp_display_env (see specification p. 468f). Global ICVs are tagged with [all], see https://github.com/OpenMP/spec/issues/3179. ICVs which are not global but aren't handled device-specific yet are tagged with [host]. omp_display_env outputs the initial values of the ICVs. That is why a dedicated data structure is introduced for the inital values only (gomp_initial_icv_list). (d) Device-specific ICVs are transmitted to the device via GOMP_ADDITIONAL_ICVS. libgomp/ChangeLog: * config/gcn/icv-device.c (omp_get_default_device): Return device- specific ICV. (omp_get_max_teams): Added for GCN devices. (omp_set_num_teams): Likewise. (ialias): Likewise. * config/nvptx/icv-device.c (omp_get_default_device): Return device- specific ICV. (omp_get_max_teams): Added for NVPTX devices. (omp_set_num_teams): Likewise. (ialias): Likewise. * env.c (struct gomp_icv_list): New struct to store entries of initial ICV values. (struct gomp_offload_icv_list): New struct to store entries of device- specific ICV values that are copied to the device and back. (struct gomp_default_icv_values): New struct to store default values of ICVs according to the OpenMP standard. (parse_schedule): Generalized for different variants of OMP_SCHEDULE. (print_env_var_error): Function that prints an error for invalid values for ICVs. (parse_unsigned_long_1): Removed getenv. Generalized. (parse_unsigned_long): Likewise. (parse_int_1): Likewise. (parse_int): Likewise. (parse_int_secure): Likewise. (parse_unsigned_long_list): Likewise. (parse_target_offload): Likewise. (parse_bind_var): Likewise. (parse_stacksize): Likewise. (parse_boolean): Likewise. (parse_wait_policy): Likewise. (parse_allocator): Likewise. (omp_display_env): Extended to output different variants of environment variables. (print_schedule): New helper function for omp_display_env which prints the values of run_sched_var. (print_proc_bind): New helper function for omp_display_env which prints the values of proc_bind_var. (enum gomp_parse_type): Collection of types used for parsing environment variables. (ENTRY): Preprocess string lengths of environment variables. (OMP_VAR_CNT): Preprocess table size. (OMP_HOST_VAR_CNT): Likewise. (INT_MAX_STR_LEN): Constant for the maximal number of digits of a device number. (gomp_get_icv_flag): Returns if a flag for a particular ICV is set. (gomp_set_icv_flag): Sets a flag for a particular ICV. (print_device_specific_icvs): New helper function for omp_display_env to print device specific ICV values. (get_device_num): New helper function for parse_device_specific. Extracts the device number from an environment variable name. (get_icv_member_addr): Gets the memory address for a particular member of an ICV struct. (gomp_get_initial_icv_item): Get a list item of gomp_initial_icv_list. (initialize_icvs): New function to initialize a gomp_initial_icvs struct. (add_initial_icv_to_list): Adds an ICV struct to gomp_initial_icv_list. (startswith): Checks if a string starts with a given prefix. (initialize_env): Extended to parse the new syntax of environment variables. * icv-device.c (omp_get_max_teams): Added. (ialias): Likewise. (omp_set_num_teams): Likewise. * icv.c (omp_set_num_teams): Moved to icv-device.c. (omp_get_max_teams): Likewise. (ialias): Likewise. * libgomp-plugin.h (GOMP_DEVICE_NUM_VAR): Removed. (GOMP_ADDITIONAL_ICVS): New target-side struct that holds the designated ICVs of the target device. * libgomp.h (enum gomp_icvs): Collection of ICVs. (enum gomp_device_num): Definition of device numbers for _ALL, _DEV, and no suffix. (enum gomp_env_suffix): Collection of possible suffixes of environment variables. (struct gomp_initial_icvs): Contains all ICVs for which we need to store initial values. (struct gomp_default_icv):New struct to hold ICVs for which we need to store initial values. (struct gomp_icv_list): Definition of a linked list that is used for storing ICVs for the devices and also for _DEV, _ALL, and without suffix. (struct gomp_offload_icvs): New struct to hold ICVs that are copied to a device. (struct gomp_offload_icv_list): Definition of a linked list that holds device-specific ICVs that are copied to devices. (gomp_get_initial_icv_item): Get a list item of gomp_initial_icv_list. (gomp_get_icv_flag): Returns if a flag for a particular ICV is set. * libgomp.texi: Updated. * plugin/plugin-gcn.c (GOMP_OFFLOAD_load_image): Extended to read further ICVs from the offload image. * plugin/plugin-nvptx.c (GOMP_OFFLOAD_load_image): Likewise. * target.c (gomp_get_offload_icv_item): Get a list item of gomp_offload_icv_list. (get_gomp_offload_icvs): New. Returns the ICV values depending on the device num and the variable hierarchy. (gomp_load_image_to_device): Extended to copy further ICVs to a device. * testsuite/libgomp.c-c++-common/icv-5.c: New test. * testsuite/libgomp.c-c++-common/icv-6.c: New test. * testsuite/libgomp.c-c++-common/icv-7.c: New test. * testsuite/libgomp.c-c++-common/icv-8.c: New test. * testsuite/libgomp.c-c++-common/omp-display-env-1.c: New test. * testsuite/libgomp.c-c++-common/omp-display-env-2.c: New test.
2022-07-12libgomp: Add tailing \n to gomp_debugTobias Burnus1-1/+1
Contrary to gomp_{error,warning,fatal}, no tailing '\n' is added with gomp_debug; only affected was a 'requires'-related output. libgomp/ChangeLog: * target.c (gomp_target_init): Added tailing '\n' to gomp_debug.
2022-07-07Fix Intel MIC 'mkoffload' for OpenMP 'requires'Thomas Schwinge1-0/+4
Similar to how the other 'mkoffload's got changed in recent commit 683f11843974f0bdf42f79cdcbb0c2b43c7b81b0 "OpenMP: Move omp requires checks to libgomp". This also means finally switching Intel MIC 'mkoffload' to 'GOMP_offload_register_ver', 'GOMP_offload_unregister_ver', making 'GOMP_offload_register', 'GOMP_offload_unregister' legacy entry points. gcc/ * config/i386/intelmic-mkoffload.cc (generate_host_descr_file) (prepare_target_image, main): Handle OpenMP 'requires'. (generate_host_descr_file): Switch to 'GOMP_offload_register_ver', 'GOMP_offload_unregister_ver'. libgomp/ * target.c (GOMP_offload_register, GOMP_offload_unregister): Denote as legacy entry points. * testsuite/lib/libgomp.exp (check_effective_target_offload_target_any): New proc. * testsuite/libgomp.c-c++-common/requires-1.c: Enable for 'offload_target_any'. * testsuite/libgomp.c-c++-common/requires-3.c: Likewise. * testsuite/libgomp.c-c++-common/requires-7.c: Likewise. * testsuite/libgomp.fortran/requires-1.f90: Likewise.
2022-07-06Restore 'GOMP_offload_unregister_ver' functionalityThomas Schwinge1-7/+23
The recent commit 683f11843974f0bdf42f79cdcbb0c2b43c7b81b0 "OpenMP: Move omp requires checks to libgomp" changed the 'GOMP_offload_register_ver' interface but didn't change 'GOMP_offload_unregister_ver' accordingly, so we're no longer actually unregistering. gcc/ * config/gcn/mkoffload.cc (process_obj): Clarify 'target_data' -> '[...]_data'. * config/nvptx/mkoffload.cc (process): Likewise. libgomp/ * target.c (GOMP_offload_register_ver): Clarify 'target_data' -> 'data'. (GOMP_offload_unregister_ver): Likewise. Fix up 'target_data'.
2022-07-04OpenMP: Move omp requires checks to libgompTobias Burnus1-3/+73
Handle reverse_offload, unified_address, and unified_shared_memory requirements in libgomp by saving them alongside the offload table. When the device lto1 runs, it extracts the data for mkoffload. The latter than passes the value on to GOMP_offload_register_ver. lto1 (either the host one, with -flto [+ ENABLE_OFFLOADING], or in the offload-device lto1) also does the the consistency check is done, erroring out when the 'omp requires' clause use is inconsistent. For all in-principle supported devices, if a requirement cannot be fulfilled, the device is excluded from the (supported) devices list. Currently, none of those requirements are marked as supported for any of the non-host devices. gcc/c/ChangeLog: * c-parser.cc (c_parser_omp_target_data, c_parser_omp_target_update, c_parser_omp_target_enter_data, c_parser_omp_target_exit_data): Set OMP_REQUIRES_TARGET_USED. (c_parser_omp_requires): Remove sorry. gcc/ChangeLog: * config/gcn/mkoffload.cc (process_asm): Write '#include <stdint.h>'. (process_obj): Pass omp_requires_mask to GOMP_offload_register_ver. (main): Ask lto1 to obtain omp_requires_mask and pass it on. * config/nvptx/mkoffload.cc (process, main): Likewise. * lto-cgraph.cc (omp_requires_to_name): New. (input_offload_tables): Save omp_requires_mask. (output_offload_tables): Read it, check for consistency, save value for mkoffload. * omp-low.cc (lower_omp_target): Force output_offloadtables call for OMP_REQUIRES_TARGET_USED. gcc/cp/ChangeLog: * parser.cc (cp_parser_omp_target_data, cp_parser_omp_target_enter_data, cp_parser_omp_target_exit_data, cp_parser_omp_target_update): Set OMP_REQUIRES_TARGET_USED. (cp_parser_omp_requires): Remove sorry. gcc/fortran/ChangeLog: * openmp.cc (gfc_match_omp_requires): Remove sorry. * parse.cc (decode_omp_directive): Don't regard 'declare target' as target usage for 'omp requires'; add more flags to omp_requires_mask. include/ChangeLog: * gomp-constants.h (GOMP_VERSION): Bump to 2. (GOMP_REQUIRES_UNIFIED_ADDRESS, GOMP_REQUIRES_UNIFIED_SHARED_MEMORY, GOMP_REQUIRES_REVERSE_OFFLOAD, GOMP_REQUIRES_TARGET_USED): New defines. libgomp/ChangeLog: * libgomp-plugin.h (GOMP_OFFLOAD_get_num_devices): Add omp_requires_mask arg. * plugin/plugin-gcn.c (GOMP_OFFLOAD_get_num_devices): Likewise; return -1 when device available but omp_requires_mask != 0. * plugin/plugin-nvptx.c (GOMP_OFFLOAD_get_num_devices): Likewise. * oacc-host.c (host_get_num_devices, host_openacc_get_property): Update call. * oacc-init.c (resolve_device, acc_init_1, acc_shutdown_1, goacc_attach_host_thread_to_device, acc_get_num_devices, acc_set_device_num, get_property_any): Likewise. * target.c (omp_requires_mask): New global var. (gomp_requires_to_name): New. (GOMP_offload_register_ver): Handle passed omp_requires_mask. (gomp_target_init): Handle omp_requires_mask. * libgomp.texi (OpenMP 5.0): Update requires impl. status. (OpenMP 5.1): Add a missed item. (OpenMP 5.2): Mark linear-clause change as supported in C/C++. * testsuite/libgomp.c-c++-common/requires-1-aux.c: New test. * testsuite/libgomp.c-c++-common/requires-1.c: New test. * testsuite/libgomp.c-c++-common/requires-2-aux.c: New test. * testsuite/libgomp.c-c++-common/requires-2.c: New test. * testsuite/libgomp.c-c++-common/requires-3-aux.c: New test. * testsuite/libgomp.c-c++-common/requires-3.c: New test. * testsuite/libgomp.c-c++-common/requires-4-aux.c: New test. * testsuite/libgomp.c-c++-common/requires-4.c: New test. * testsuite/libgomp.c-c++-common/requires-5-aux.c: New test. * testsuite/libgomp.c-c++-common/requires-5.c: New test. * testsuite/libgomp.c-c++-common/requires-6.c: New test. * testsuite/libgomp.c-c++-common/requires-7-aux.c: New test. * testsuite/libgomp.c-c++-common/requires-7.c: New test. * testsuite/libgomp.fortran/requires-1-aux.f90: New test. * testsuite/libgomp.fortran/requires-1.f90: New test. liboffloadmic/ChangeLog: * plugin/libgomp-plugin-intelmic.cpp (GOMP_OFFLOAD_get_num_devices): Return -1 when device available but omp_requires_mask != 0. gcc/testsuite/ChangeLog: * c-c++-common/gomp/requires-4.c: Update dg-*. * c-c++-common/gomp/reverse-offload-1.c: Likewise. * c-c++-common/gomp/target-device-ancestor-2.c: Likewise. * c-c++-common/gomp/target-device-ancestor-3.c: Likewise. * c-c++-common/gomp/target-device-ancestor-4.c: Likewise. * c-c++-common/gomp/target-device-ancestor-5.c: Likewise. * gfortran.dg/gomp/target-device-ancestor-3.f90: Likewise. * gfortran.dg/gomp/target-device-ancestor-4.f90: Likewise. * gfortran.dg/gomp/target-device-ancestor-5.f90: Likewise. * gfortran.dg/gomp/target-device-ancestor-2.f90: Likewise. Move post-FE checks to ... * gfortran.dg/gomp/target-device-ancestor-2a.f90: ... this new file. * gfortran.dg/gomp/requires-8.f90: Update as we don't regard 'declare target' for the 'requires' usage requirement. Co-authored-by: Chung-Lin Tang <cltang@codesourcery.com> Co-authored-by: Thomas Schwinge <thomas@codesourcery.com>
2022-06-13openmp: Conforming device numbers and omp_{initial,invalid}_deviceJakub Jelinek1-67/+60
OpenMP 5.2 changed once more what device numbers are allowed. In 5.1, valid device numbers were [0, omp_get_num_devices()]. 5.2 makes also -1 valid (calls it omp_initial_device), which is equivalent in behavior to omp_get_num_devices() number but has the advantage that it is a constant. And it also introduces omp_invalid_device which is also a constant with implementation defined value < -1. That value should act like sNaN, any time any device construct (GOMP_target*) or OpenMP runtime API routine is asked for such a device, the program is terminated. And if OMP_TARGET_OFFLOAD=mandatory, all non-conforming device numbers (which is all but [-1, omp_get_num_devices()] other than omp_invalid_device) must be treated like omp_invalid_device. For device constructs, we have a compatibility problem, we've historically used 2 magic negative values to mean something special. GOMP_DEVICE_ICV (-1) means device clause wasn't present, pick the omp_get_default_device () number GOMP_DEVICE_FALLBACK (-2) means the host device (this is used e.g. for #pragma omp target if (cond) where if cond is false, we pass -2 But 5.2 requires that omp_initial_device is -1 (there were discussions about it, advantage of -1 is that one can say iterate over the [-1, omp_get_num_devices()-1] range to get all devices starting with the host/initial one. And also, if user passes -2, unless it is omp_invalid_device, we need to treat it like non-conforming with OMP_TARGET_OFFLOAD=mandatory. So, the patch does on the compiler side some number remapping, user_device_num >= -2U ? user_device_num - 1 : user_device_num. This remapping is done at compile time if device clause has constant argument, otherwise at runtime, and means that for user -1 (omp_initial_device) we pass -2 to GOMP_* in the runtime library where it treats it like host fallback, while -2 is remapped to -3 (one of the non-conforming device numbers, for those it doesn't matter which one is which). omp_invalid_device is then -4. For the OpenMP device runtime APIs, no remapping is done. This patch doesn't deal with the initial default-device-var for OMP_TARGET_OFFLOAD=mandatory , the spec says that the inital ICV value for that should in that case depend on whether there are any offloading devices or not (if not, should be omp_invalid_device), but that means we can't determine the number of devices lazily (and let libraries have the possibility to register their offloading data etc.). 2022-06-13 Jakub Jelinek <jakub@redhat.com> gcc/ * omp-expand.cc (expand_omp_target): Remap user provided device clause arguments, -1 to -2 and -2 to -3, either at compile time if constant, or at runtime. include/ * gomp-constants.h (GOMP_DEVICE_INVALID): Define. libgomp/ * omp.h.in (omp_initial_device, omp_invalid_device): New enumerators. * omp_lib.f90.in (omp_initial_device, omp_invalid_device): New parameters. * omp_lib.h.in (omp_initial_device, omp_invalid_device): Likewise. * target.c (resolve_device): Add remapped argument, handle GOMP_DEVICE_ICV only if remapped is true (and clear remapped), for negative values, treat GOMP_DEVICE_FALLBACK as fallback only if remapped, otherwise treat omp_initial_device that way. For omp_invalid_device, always emit gomp_fatal, even when OMP_TARGET_OFFLOAD isn't mandatory. (GOMP_target, GOMP_target_ext, GOMP_target_data, GOMP_target_data_ext, GOMP_target_update, GOMP_target_update_ext, GOMP_target_enter_exit_data): Pass true as remapped argument to resolve_device. (omp_target_alloc, omp_target_free, omp_target_is_present, omp_target_memcpy_check, omp_target_associate_ptr, omp_target_disassociate_ptr, omp_get_mapped_ptr, omp_target_is_accessible): Pass false as remapped argument to resolve_device. Treat omp_initial_device the same as gomp_get_num_devices (). Don't bypass resolve_device calls if device_num is negative. (omp_pause_resource): Treat omp_initial_device the same as gomp_get_num_devices (). Call resolve_device. * icv-device.c (omp_set_default_device): Always set to device_num even when it is negative. * libgomp.texi: Document that Conforming device numbers, omp_initial_device and omp_invalid_device is implemented. * testsuite/libgomp.c/target-41.c (main): Add test with omp_initial_device. * testsuite/libgomp.c/target-45.c: New test. * testsuite/libgomp.c/target-46.c: New test. * testsuite/libgomp.c/target-47.c: New test. * testsuite/libgomp.c-c++-common/target-is-accessible-1.c (main): Add test with omp_initial_device. Use -5 instead of -1 for negative value test. * testsuite/libgomp.fortran/target-is-accessible-1.f90 (main): Likewise. Reorder stop numbers.
2022-05-23OpenMP: Handle descriptors in target's firstprivate [PR104949]Tobias Burnus1-0/+22
For allocatable/pointer arrays, a firstprivate to a device not only needs to privatize the descriptor but also the actual data. This is implemented as: firstprivate(x) firstprivate(x.data) attach(x [bias: &x.data-&x) where the address of x in device memory is saved in hostaddrs[i] by libgomp and the middle end actually passes hostaddrs[i]' to attach. As side effect, has_device_addr(array_desc) had to be changed: before, it was converted to firstprivate in the front end; now it is handled in omp-low.cc as has_device_addr requires a shallow firstprivate (not touching the data pointer) while the normal firstprivate requires (now) a deep firstprivate. gcc/fortran/ChangeLog: PR fortran/104949 * f95-lang.cc (LANG_HOOKS_OMP_ARRAY_SIZE): Redefine. * trans-openmp.cc (gfc_omp_array_size): New. (gfc_trans_omp_variable_list): Never turn has_device_addr to firstprivate. * trans.h (gfc_omp_array_size): New. gcc/ChangeLog: PR fortran/104949 * langhooks-def.h (lhd_omp_array_size): New. (LANG_HOOKS_OMP_ARRAY_SIZE): Define. (LANG_HOOKS_DECLS): Add it. * langhooks.cc (lhd_omp_array_size): New. * langhooks.h (struct lang_hooks_for_decls): Add hook. * omp-low.cc (scan_sharing_clauses, lower_omp_target): Handle GOMP_MAP_FIRSTPRIVATE for array descriptors. libgomp/ChangeLog: PR fortran/104949 * target.c (gomp_map_vars_internal, copy_firstprivate_data): Support attach for GOMP_MAP_FIRSTPRIVATE. * testsuite/libgomp.fortran/target-firstprivate-1.f90: New test. * testsuite/libgomp.fortran/target-firstprivate-2.f90: New test. * testsuite/libgomp.fortran/target-firstprivate-3.f90: New test.
2022-05-20libgomp: Add new runtime routines omp_target_memcpy_async and ↵Marcel Vollweiler1-54/+236
omp_target_memcpy_rect_async This patch adds two new OpenMP runtime routines: omp_target_memcpy_async and omp_target_memcpy_rect_async. Both functions are introduced in OpenMP 5.1 as asynchronous variants of omp_target_memcpy and omp_target_memcpy_rect. In contrast to the synchronous variants, the asynchronous functions have two additional function parameters to allow the specification of task dependences: int depobj_count omp_depend_t *depobj_list integer(c_int), value :: depobj_count integer(omp_depend_kind), optional :: depobj_list(*) The implementation splits the synchronous functions into two parts: (a) check and (b) copy. Then (a) is used in the asynchronous functions for the sequential part, and the actual copy process (b) is executed in a new created task. The sequential part (a) takes into account the requirements for the return values: "The routine returns zero if successful. Otherwise, it returns a non-zero value." (omp_target_memcpy_async, OpenMP 5.1 spec, section 3.8.7) "An application can determine the number of inclusive dimensions supported by an implementation by passing NULL pointers (or C_NULL_PTR, for Fortran) for both dst and src. The routine returns the number of dimensions supported by the implementation for the specified device numbers. No copy operation is performed." (omp_target_memcpy_rect_async, OpenMP 5.1 spec, section 3.8.8) Due to asynchronicity an error is thrown if the asynchronous memcpy is not successful (in contrast to the synchronous functions which use a return value unequal to zero). gcc/ChangeLog: * omp-low.cc (omp_runtime_api_call): Added target_memcpy_async and target_memcpy_rect_async to omp_runtime_apis array. libgomp/ChangeLog: * libgomp.map: Added omp_target_memcpy_async and omp_target_memcpy_rect_async. * libgomp.texi: Both functions are now supported. * omp.h.in: Added omp_target_memcpy_async and omp_target_memcpy_rect_async. * omp_lib.f90.in: Added interfaces for both new functions. * omp_lib.h.in: Likewise. * target.c (ialias_redirect): Added for GOMP_task. (omp_target_memcpy): Restructured into check and copy part. (omp_target_memcpy_check): New helper function for omp_target_memcpy and omp_target_memcpy_async that checks requirements. (omp_target_memcpy_copy): New helper function for omp_target_memcpy and omp_target_memcpy_async that performs the memcpy. (omp_target_memcpy_async_helper): New helper function that is used in omp_target_memcpy_async for the asynchronous task. (omp_target_memcpy_async): Added. (omp_target_memcpy_rect): Restructured into check and copy part. (omp_target_memcpy_rect_check): New helper function for omp_target_memcpy_rect and omp_target_memcpy_rect_async that checks requirements. (omp_target_memcpy_rect_copy): New helper function for omp_target_memcpy_rect and omp_target_memcpy_rect_async that performs the memcpy. (omp_target_memcpy_rect_async_helper): New helper function that is used in omp_target_memcpy_rect_async for the asynchronous task. (omp_target_memcpy_rect_async): Added. * task.c (ialias): Added for GOMP_task. * testsuite/libgomp.c-c++-common/target-memcpy-async-1.c: New test. * testsuite/libgomp.c-c++-common/target-memcpy-async-2.c: New test. * testsuite/libgomp.c-c++-common/target-memcpy-rect-async-1.c: New test. * testsuite/libgomp.c-c++-common/target-memcpy-rect-async-2.c: New test. * testsuite/libgomp.fortran/target-memcpy-async-1.f90: New test. * testsuite/libgomp.fortran/target-memcpy-async-2.f90: New test. * testsuite/libgomp.fortran/target-memcpy-rect-async-1.f90: New test. * testsuite/libgomp.fortran/target-memcpy-rect-async-2.f90: New test.
2022-05-06OpenMP, libgomp: Add new runtime routine omp_target_is_accessible.Marcel Vollweiler1-0/+18
gcc/ChangeLog: * omp-low.cc (omp_runtime_api_call): Added target_is_accessible to omp_runtime_apis array. libgomp/ChangeLog: * libgomp.map: Added omp_target_is_accessible. * libgomp.texi: Tagged omp_target_is_accessible as supported. * omp.h.in: Added omp_target_is_accessible. * omp_lib.f90.in: Added interface for omp_target_is_accessible. * omp_lib.h.in: Likewise. * target.c (omp_target_is_accessible): Added implementation of omp_target_is_accessible. * testsuite/libgomp.c-c++-common/target-is-accessible-1.c: New test. * testsuite/libgomp.fortran/target-is-accessible-1.f90: New test.
2022-05-02OpenMP, libgomp: Add new runtime routine omp_get_mapped_ptr.Marcel Vollweiler1-0/+38
This patch adds the OpenMP runtime routine "omp_get_mapped_ptr" which was introduced in OpenMP 5.1. gcc/ChangeLog: * omp-low.cc (omp_runtime_api_call): Added get_mapped_ptr to omp_runtime_apis array. libgomp/ChangeLog: * libgomp.map: Added omp_get_mapped_ptr. * libgomp.texi: Tagged omp_get_mapped_ptr as supported. * omp.h.in: Added omp_get_mapped_ptr. * omp_lib.f90.in: Added interface for omp_get_mapped_ptr. * omp_lib.h.in: Likewise. * target.c (omp_get_mapped_ptr): Added implementation of omp_get_mapped_ptr. * testsuite/libgomp.c-c++-common/get-mapped-ptr-1.c: New test. * testsuite/libgomp.c-c++-common/get-mapped-ptr-2.c: New test. * testsuite/libgomp.c-c++-common/get-mapped-ptr-3.c: New test. * testsuite/libgomp.c-c++-common/get-mapped-ptr-4.c: New test. * testsuite/libgomp.fortran/get-mapped-ptr-1.f90: New test. * testsuite/libgomp.fortran/get-mapped-ptr-2.f90: New test. * testsuite/libgomp.fortran/get-mapped-ptr-3.f90: New test. * testsuite/libgomp.fortran/get-mapped-ptr-4.f90: New test.
2022-02-09C, C++, Fortran, OpenMP: Add 'has_device_addr' clause to 'target' construct.Marcel Vollweiler1-1/+1
This patch adds the 'has_device_addr' clause to the OpenMP 'target' construct which was introduced in OpenMP 5.1 (OpenMP API 5.1 specification pp. 197ff): has_device_addr(list) "The has_device_addr clause indicates that its list items already have device addresses and therefore they may be directly accessed from a target device. If the device address of a list item is not for the device on which the target region executes, accessing the list item inside the region results in unspecified behavior. The list items may include array sections." (p. 200) "A list item may not be specified in both an is_device_ptr clause and a has_device_addr clause on the directive." (p. 202) "A list item that appears in an is_device_ptr or a has_device_addr clause must not be specified in any data-sharing attribute clause on the same target construct." (p. 203) gcc/c-family/ChangeLog: * c-omp.cc (c_omp_split_clauses): Added OMP_CLAUSE_HAS_DEVICE_ADDR case. * c-pragma.h (enum pragma_kind): Added 5.1 in comment. (enum pragma_omp_clause): Added PRAGMA_OMP_CLAUSE_HAS_DEVICE_ADDR. gcc/c/ChangeLog: * c-parser.cc (c_parser_omp_clause_name): Parse 'has_device_addr' clause. (c_parser_omp_variable_list): Handle array sections. (c_parser_omp_clause_has_device_addr): Added. (c_parser_omp_all_clauses): Added PRAGMA_OMP_CLAUSE_HAS_DEVICE_ADDR case. (c_parser_omp_target_exit_data): Added HAS_DEVICE_ADDR to OMP_CLAUSE_MASK. * c-typeck.cc (handle_omp_array_sections): Handle clause restrictions. (c_finish_omp_clauses): Handle array sections. gcc/cp/ChangeLog: * parser.cc (cp_parser_omp_clause_name): Parse 'has_device_addr' clause. (cp_parser_omp_var_list_no_open): Handle array sections. (cp_parser_omp_all_clauses): Added PRAGMA_OMP_CLAUSE_HAS_DEVICE_ADDR case. (cp_parser_omp_target_update): Added HAS_DEVICE_ADDR to OMP_CLAUSE_MASK. * semantics.cc (handle_omp_array_sections): Handle clause restrictions. (finish_omp_clauses): Handle array sections. gcc/fortran/ChangeLog: * dump-parse-tree.cc (show_omp_clauses): Added OMP_LIST_HAS_DEVICE_ADDR case. * gfortran.h: Added OMP_LIST_HAS_DEVICE_ADDR. * openmp.cc (enum omp_mask2): Added OMP_CLAUSE_HAS_DEVICE_ADDR. (gfc_match_omp_clauses): Parse HAS_DEVICE_ADDR clause. (resolve_omp_clauses): Same. * trans-openmp.cc (gfc_trans_omp_variable_list): Added OMP_LIST_HAS_DEVICE_ADDR case. (gfc_trans_omp_clauses): Firstprivatize of array descriptors. gcc/ChangeLog: * gimplify.cc (gimplify_scan_omp_clauses): Added cases for OMP_CLAUSE_HAS_DEVICE_ADDR and handle array sections. (gimplify_adjust_omp_clauses): Added OMP_CLAUSE_HAS_DEVICE_ADDR case. * omp-low.cc (scan_sharing_clauses): Handle OMP_CLAUSE_HAS_DEVICE_ADDR. (lower_omp_target): Same. * tree-core.h (enum omp_clause_code): Same. * tree-nested.cc (convert_nonlocal_omp_clauses): Same. (convert_local_omp_clauses): Same. * tree-pretty-print.cc (dump_omp_clause): Same. * tree.cc: Same. libgomp/ChangeLog: * libgomp.texi: Updated entry for HAS_DEVICE_ADDR. * target.c (copy_firstprivate_data): Copy only if host address is not NULL. * testsuite/libgomp.c++/target-has-device-addr-2.C: New test. * testsuite/libgomp.c++/target-has-device-addr-4.C: New test. * testsuite/libgomp.c++/target-has-device-addr-5.C: New test. * testsuite/libgomp.c++/target-has-device-addr-6.C: New test. * testsuite/libgomp.c-c++-common/target-has-device-addr-1.c: New test. * testsuite/libgomp.c/target-has-device-addr-3.c: New test. * testsuite/libgomp.fortran/target-has-device-addr-1.f90: New test. * testsuite/libgomp.fortran/target-has-device-addr-2.f90: New test. * testsuite/libgomp.fortran/target-has-device-addr-3.f90: New test. * testsuite/libgomp.fortran/target-has-device-addr-4.f90: New test. gcc/testsuite/ChangeLog: * c-c++-common/gomp/clauses-1.c: Added has_device_addr to test cases. * g++.dg/gomp/attrs-1.C: Added has_device_addr to test cases. * g++.dg/gomp/attrs-2.C: Added has_device_addr to test cases. * c-c++-common/gomp/target-has-device-addr-1.c: New test. * c-c++-common/gomp/target-has-device-addr-2.c: New test. * c-c++-common/gomp/target-is-device-ptr-1.c: New test. * c-c++-common/gomp/target-is-device-ptr-2.c: New test. * gfortran.dg/gomp/is_device_ptr-3.f90: New test. * gfortran.dg/gomp/target-has-device-addr-1.f90: New test. * gfortran.dg/gomp/target-has-device-addr-2.f90: New test.
2022-01-03Update copyright years.Jakub Jelinek1-1/+1
2021-12-09OpenMP 5.0: Remove array section base-pointer mapping semantics and other ↵Chung-Lin Tang1-21/+88
front-end adjustments This patch implements three pieces of functionality: (1) Adjust array section mapping to have standards conforming behavior, mapping array sections should *NOT* also map the base-pointer: struct S { int *ptr; ... }; struct S s; Instead of generating this during gimplify: map(to:*_1 [len: 400]) map(attach:s.ptr [bias: 0]) Now, adjust to: (i.e. do not map the base-pointer together. The attach operation is still generated, and if s.ptr is already mapped prior, attachment will happen) The correct way of achieving the base-pointer-also-mapped behavior would be to use: (A small Fortran front-end patch to trans-openmp.c:gfc_trans_omp_array_section is also included, which removes generation of a GOMP_MAP_ALWAYS_POINTER for array types, which appears incorrect and causes a regression in libgomp.fortranlibgomp.fortran/struct-elem-map-1.f90) (2) Related to the first item above, are fixes in libgomp/target.c to not overwrite attached pointers when handling device<->host copies, mainly for the "always" case. (3) The third is a set of changes to the C/C++ front-ends to extend the allowed component access syntax in map clauses. These changes are enabled for both OpenACC and OpenMP. gcc/c/ChangeLog: * c-parser.c (struct omp_dim): New struct type for use inside c_parser_omp_variable_list. (c_parser_omp_variable_list): Allow multiple levels of array and component accesses in array section base-pointer expression. (c_parser_omp_clause_to): Set 'allow_deref' to true in call to c_parser_omp_var_list_parens. (c_parser_omp_clause_from): Likewise. * c-typeck.c (handle_omp_array_sections_1): Extend allowed range of base-pointer expressions involving INDIRECT/MEM/ARRAY_REF and POINTER_PLUS_EXPR. (c_finish_omp_clauses): Extend allowed ranged of expressions involving INDIRECT/MEM/ARRAY_REF and POINTER_PLUS_EXPR. gcc/cp/ChangeLog: * parser.c (struct omp_dim): New struct type for use inside cp_parser_omp_var_list_no_open. (cp_parser_omp_var_list_no_open): Allow multiple levels of array and component accesses in array section base-pointer expression. (cp_parser_omp_all_clauses): Set 'allow_deref' to true in call to cp_parser_omp_var_list for to/from clauses. * semantics.c (handle_omp_array_sections_1): Extend allowed range of base-pointer expressions involving INDIRECT/MEM/ARRAY_REF and POINTER_PLUS_EXPR. (handle_omp_array_sections): Adjust pointer map generation of references. (finish_omp_clauses): Extend allowed ranged of expressions involving INDIRECT/MEM/ARRAY_REF and POINTER_PLUS_EXPR. gcc/fortran/ChangeLog: * trans-openmp.c (gfc_trans_omp_array_section): Do not generate GOMP_MAP_ALWAYS_POINTER map for main array maps of ARRAY_TYPE type. gcc/ChangeLog: * gimplify.c (extract_base_bit_offset): Add 'tree *offsetp' parameter, accomodate case where 'offset' return of get_inner_reference is non-NULL. (is_or_contains_p): Further robustify conditions. (omp_target_reorder_clauses): In alloc/to/from sorting phase, also move following GOMP_MAP_ALWAYS_POINTER maps along. Add new sorting phase where we make sure pointers with an attach/detach map are ordered correctly. (gimplify_scan_omp_clauses): Add modifications to avoid creating GOMP_MAP_STRUCT and associated alloc map for attach/detach maps. gcc/testsuite/ChangeLog: * c-c++-common/goacc/deep-copy-arrayofstruct.c: Adjust testcase. * c-c++-common/gomp/target-enter-data-1.c: New testcase. * c-c++-common/gomp/target-implicit-map-2.c: New testcase. libgomp/ChangeLog: * target.c (gomp_map_vars_existing): Make sure attached pointer is not overwritten during cross-host/device copying. (gomp_update): Likewise. (gomp_exit_data): Likewise. * testsuite/libgomp.c++/target-11.C: Adjust testcase. * testsuite/libgomp.c++/target-12.C: Likewise. * testsuite/libgomp.c++/target-15.C: Likewise. * testsuite/libgomp.c++/target-16.C: Likewise. * testsuite/libgomp.c++/target-17.C: Likewise. * testsuite/libgomp.c++/target-21.C: Likewise. * testsuite/libgomp.c++/target-23.C: Likewise. * testsuite/libgomp.c/target-23.c: Likewise. * testsuite/libgomp.c/target-29.c: Likewise. * testsuite/libgomp.c-c++-common/target-implicit-map-2.c: New testcase.
2021-12-08openmp: Improve OpenMP target support for C++ (PR92120)Chung-Lin Tang1-24/+52
This patch implements several C++ specific mapping capabilities introduced for OpenMP 5.0, including implicit mapping of this[:1] for non-static member functions, zero-length array section mapping of pointer-typed members, lambda captured variable access in target regions, and use of lambda objects inside target regions. Several adjustments to the C/C++ front-ends to allow more member-access syntax as valid is also included. PR middle-end/92120 gcc/cp/ChangeLog: * cp-tree.h (finish_omp_target): New declaration. (finish_omp_target_clauses): Likewise. * parser.c (cp_parser_omp_clause_map): Adjust call to cp_parser_omp_var_list_no_open to set 'allow_deref' argument to true. (cp_parser_omp_target): Factor out code, adjust into calls to new function finish_omp_target. * pt.c (tsubst_expr): Add call to finish_omp_target_clauses for OMP_TARGET case. * semantics.c (handle_omp_array_sections_1): Add handling to create 'this->member' from 'member' FIELD_DECL. Remove case of rejecting 'this' when not in declare simd. (handle_omp_array_sections): Likewise. (finish_omp_clauses): Likewise. Adjust to allow 'this[]' in OpenMP map clauses. Handle 'A->member' case in map clauses. Remove case of rejecting 'this' when not in declare simd. (struct omp_target_walk_data): New struct for walking over target-directive tree body. (finish_omp_target_clauses_r): New function for tree walk. (finish_omp_target_clauses): New function. (finish_omp_target): New function. gcc/c/ChangeLog: * c-parser.c (c_parser_omp_clause_map): Set 'allow_deref' argument in call to c_parser_omp_variable_list to 'true'. * c-typeck.c (handle_omp_array_sections_1): Add strip of MEM_REF in array base handling. (c_finish_omp_clauses): Handle 'A->member' case in map clauses. gcc/ChangeLog: * gimplify.c ("tree-hash-traits.h"): Add include. (gimplify_scan_omp_clauses): Change struct_map_to_clause to type hash_map<tree_operand, tree> *. Adjust struct map handling to handle cases of *A and A->B expressions. Under !DECL_P case of GOMP_CLAUSE_MAP handling, add STRIP_NOPS for indir_p case, add to struct_deref_set for map(*ptr_to_struct) cases. Add MEM_REF case when handling component_ref_p case. Add unshare_expr and gimplification when created GOMP_MAP_STRUCT is not a DECL. Add code to add firstprivate pointer for *pointer-to-struct case. (gimplify_adjust_omp_clauses): Move GOMP_MAP_STRUCT removal code for exit data directives code to earlier position. * omp-low.c (lower_omp_target): Handle GOMP_MAP_ATTACH_ZERO_LENGTH_ARRAY_SECTION, and GOMP_MAP_POINTER_TO_ZERO_LENGTH_ARRAY_SECTION map kinds. * tree-pretty-print.c (dump_omp_clause): Likewise. gcc/testsuite/ChangeLog: * gcc.dg/gomp/target-3.c: New testcase. * g++.dg/gomp/target-3.C: New testcase. * g++.dg/gomp/target-lambda-1.C: New testcase. * g++.dg/gomp/target-lambda-2.C: New testcase. * g++.dg/gomp/target-this-1.C: New testcase. * g++.dg/gomp/target-this-2.C: New testcase. * g++.dg/gomp/target-this-3.C: New testcase. * g++.dg/gomp/target-this-4.C: New testcase. * g++.dg/gomp/target-this-5.C: New testcase. * g++.dg/gomp/this-2.C: Adjust testcase. include/ChangeLog: * gomp-constants.h (enum gomp_map_kind): Add GOMP_MAP_ATTACH_ZERO_LENGTH_ARRAY_SECTION, and GOMP_MAP_POINTER_TO_ZERO_LENGTH_ARRAY_SECTION map kinds. (GOMP_MAP_POINTER_P): Include GOMP_MAP_POINTER_TO_ZERO_LENGTH_ARRAY_SECTION. libgomp/ChangeLog: * libgomp.h (gomp_attach_pointer): Add bool parameter. * oacc-mem.c (acc_attach_async): Update call to gomp_attach_pointer. (goacc_enter_data_internal): Likewise. * target.c (gomp_map_vars_existing): Update assert condition to include GOMP_MAP_ATTACH_ZERO_LENGTH_ARRAY_SECTION. (gomp_map_pointer): Add 'bool allow_zero_length_array_sections' parameter, add support for mapping a pointer with NULL target. (gomp_attach_pointer): Add 'bool allow_zero_length_array_sections' parameter, add support for attaching a pointer with NULL target. (gomp_map_vars_internal): Update calls to gomp_map_pointer and gomp_attach_pointer, add handling for GOMP_MAP_ATTACH_ZERO_LENGTH_ARRAY_SECTION, and GOMP_MAP_POINTER_TO_ZERO_LENGTH_ARRAY_SECTION cases. * testsuite/libgomp.c++/target-23.C: New testcase. * testsuite/libgomp.c++/target-lambda-1.C: New testcase. * testsuite/libgomp.c++/target-lambda-2.C: New testcase. * testsuite/libgomp.c++/target-this-1.C: New testcase. * testsuite/libgomp.c++/target-this-2.C: New testcase. * testsuite/libgomp.c++/target-this-3.C: New testcase. * testsuite/libgomp.c++/target-this-4.C: New testcase. * testsuite/libgomp.c++/target-this-5.C: New testcase.
2021-11-15openmp: Add support for thread_limit clause on targetJakub Jelinek1-4/+24
OpenMP 5.1 says that thread_limit clause can also appear on target, and similarly to teams should affect the thread-limit-var ICV. On combined target teams, the clause goes to both. We actually passed thread_limit internally on target already before, but only used it for gcn/ptx offloading to hint how many threads should be created and for ptx didn't set thread_limit_var in that case. Similarly for host fallback. Also, I found that we weren't copying the args array that contains encoded thread_limit and num_teams clause for target (etc.) for async target. 2021-11-15 Jakub Jelinek <jakub@redhat.com> gcc/ * gimplify.c (optimize_target_teams): Only add OMP_CLAUSE_THREAD_LIMIT to OMP_TARGET_CLAUSES if it isn't there already. gcc/c-family/ * c-omp.c (c_omp_split_clauses) <case OMP_CLAUSE_THREAD_LIMIT>: Duplicate to both OMP_TARGET and OMP_TEAMS. gcc/c/ * c-parser.c (OMP_TARGET_CLAUSE_MASK): Add PRAGMA_OMP_CLAUSE_THREAD_LIMIT. gcc/cp/ * parser.c (OMP_TARGET_CLAUSE_MASK): Add PRAGMA_OMP_CLAUSE_THREAD_LIMIT. libgomp/ * task.c (gomp_create_target_task): Copy args array as well. * target.c (gomp_target_fallback): Add args argument. Set gomp_icv (true)->thread_limit_var if thread_limit is present. (GOMP_target): Adjust gomp_target_fallback caller. (GOMP_target_ext): Likewise. (gomp_target_task_fn): Likewise. * config/nvptx/team.c (gomp_nvptx_main): Set gomp_global_icv.thread_limit_var. * testsuite/libgomp.c-c++-common/thread-limit-1.c: New test.
2021-11-12openmp: Relax handling of implicit map vs. existing device mappingsChung-Lin Tang1-17/+57
This patch implements relaxing the requirements when a map with the implicit attribute encounters an overlapping existing map. As the OpenMP 5.0 spec describes on page 320, lines 18-27 (and 5.1 spec, page 352, lines 13-22): "If a single contiguous part of the original storage of a list item with an implicit data-mapping attribute has corresponding storage in the device data environment prior to a task encountering the construct that is associated with the map clause, only that part of the original storage will have corresponding storage in the device data environment as a result of the map clause." 2021-11-12 Chung-Lin Tang <cltang@codesourcery.com> include/ChangeLog: * gomp-constants.h (GOMP_MAP_FLAG_SPECIAL_3): Define special bit macro. (GOMP_MAP_IMPLICIT): New special map kind bits value. (GOMP_MAP_FLAG_SPECIAL_BITS): Define helper mask for whole set of special map kind bits. (GOMP_MAP_IMPLICIT_P): New predicate macro for implicit map kinds. gcc/ChangeLog: * tree.h (OMP_CLAUSE_MAP_RUNTIME_IMPLICIT_P): New access macro for 'implicit' bit, using 'base.deprecated_flag' field of tree_node. * tree-pretty-print.c (dump_omp_clause): Add support for printing implicit attribute in tree dumping. * gimplify.c (gimplify_adjust_omp_clauses_1): Set OMP_CLAUSE_MAP_RUNTIME_IMPLICIT_P to 1 if map clause is implicitly created. (gimplify_adjust_omp_clauses): Adjust place of adding implicitly created clauses, from simple append, to starting of list, after non-map clauses. * omp-low.c (lower_omp_target): Add GOMP_MAP_IMPLICIT bits into kind values passed to libgomp for implicit maps. gcc/testsuite/ChangeLog: * c-c++-common/gomp/target-implicit-map-1.c: New test. * c-c++-common/goacc/combined-reduction.c: Adjust scan test pattern. * c-c++-common/goacc/firstprivate-mappings-1.c: Likewise. * c-c++-common/goacc/mdc-1.c: Likewise. * g++.dg/goacc/firstprivate-mappings-1.C: Likewise. libgomp/ChangeLog: * target.c (gomp_map_vars_existing): Add 'bool implicit' parameter, add implicit map handling to allow a "superset" existing map as valid case. (get_kind): Adjust to filter out GOMP_MAP_IMPLICIT bits in return value. (get_implicit): New function to extract implicit status. (gomp_map_fields_existing): Adjust arguments in calls to gomp_map_vars_existing, and add uses of get_implicit. (gomp_map_vars_internal): Likewise. * testsuite/libgomp.c-c++-common/target-implicit-map-1.c: New test.
2021-11-12openmp: Honor OpenMP 5.1 num_teams lower boundJakub Jelinek1-0/+26
The following patch implements what I've been talking about earlier, honor that for explicit num_teams clause we create at least the lower-bound (if not specified, upper-bound) teams in the league. For host fallback, it still means we only have one thread doing all the teams, sequentially one after another. For PTX and GCN, I think the new teams-2.c test and maybe teams-4.c too will or might fail. For these offloads, I think it is ok to remove symbols no longer used from libgomp.a. If num_teams_lower is bigger than the provided num_blocks or num_workgroups, we should arrange for gomp_num_teams_var to be num_teams_lower - 1, stop using the %ctaid.x or __builtin_gcn_dim_pos (0) for omp_get_team_num () and instead use for it some .shared var that GOMP_teams4 initializes to %ctaid.x or __builtin_gcn_dim_pos (0) when first and for !first increment that by num_blocks or num_workgroups each time and only return false when we are above num_teams_lower. Any help with actually implementing this for the 2 architectures highly appreciated. 2021-11-12 Jakub Jelinek <jakub@redhat.com> gcc/ * omp-builtins.def (BUILT_IN_GOMP_TEAMS): Remove. (BUILT_IN_GOMP_TEAMS4): New. * builtin-types.def (BT_FN_VOID_UINT_UINT): Remove. (BT_FN_BOOL_UINT_UINT_UINT_BOOL): New. * omp-low.c (lower_omp_teams): Use GOMP_teams4 instead of GOMP_teams, pass to it also num_teams lower-bound expression or a dup of upper-bound if it is missing and a flag whether it is the first call or not. gcc/fortran/ * types.def (BT_FN_VOID_UINT_UINT): Remove. (BT_FN_BOOL_UINT_UINT_UINT_BOOL): New. libgomp/ * libgomp_g.h (GOMP_teams4): Declare. * libgomp.map (GOMP_5.1): Export GOMP_teams4. * target.c (GOMP_teams4): New function. * config/nvptx/target.c (GOMP_teams): Remove. (GOMP_teams4): New function. * config/gcn/target.c (GOMP_teams): Remove. (GOMP_teams4): New function. * testsuite/libgomp.c/teams-4.c (main): Expect exactly 2 teams instead of <= 2. * testsuite/libgomp.c-c++-common/teams-2.c: New test.
2021-10-12libgomp: Release device lock on cbuf error pathJulian Brown1-1/+4
This patch releases the device lock on a sanity-checking error path in transfer combining (cbuf) handling in libgomp:target.c. This shouldn't happen when handling well-formed mapping clauses, but erroneous clauses can currently cause a hang if the condition triggers. 2021-12-10 Julian Brown <julian@codesourcery.com> libgomp/ * target.c (gomp_copy_host2dev): Release device lock on cbuf error path.