aboutsummaryrefslogtreecommitdiff
path: root/libgomp/libgomp.h
AgeCommit message (Collapse)AuthorFilesLines
2023-07-26OpenMP: Call cuMemcpy2D/cuMemcpy3D for nvptx for omp_target_memcpy_rectTobias Burnus1-0/+2
When copying a 2D or 3D rectangular memmory block, the performance is better when using CUDA's cuMemcpy2D/cuMemcpy3D instead of copying the data one by one. That's what this commit does. Additionally, it permits device-to-device copies, if neccessary using a temporary variable on the host. include/ChangeLog: * cuda/cuda.h (CUlimit): Add CUDA_ERROR_NOT_INITIALIZED, CUDA_ERROR_DEINITIALIZED, CUDA_ERROR_INVALID_HANDLE. (CUarray, CUmemorytype, CUDA_MEMCPY2D, CUDA_MEMCPY3D, CUDA_MEMCPY3D_PEER): New typdefs. (cuMemcpy2D, cuMemcpy2DAsync, cuMemcpy2DUnaligned, cuMemcpy3D, cuMemcpy3DAsync, cuMemcpy3DPeer, cuMemcpy3DPeerAsync): New prototypes. libgomp/ChangeLog: * libgomp-plugin.h (GOMP_OFFLOAD_memcpy2d, GOMP_OFFLOAD_memcpy3d): New prototypes. * libgomp.h (struct gomp_device_descr): Add memcpy2d_func and memcpy3d_func. * libgomp.texi (nvtpx): Document when cuMemcpy2D/cuMemcpy3D is used. * oacc-host.c (memcpy2d_func, .memcpy3d_func): Init with NULL. * plugin/cuda-lib.def (cuMemcpy2D, cuMemcpy2DUnaligned, cuMemcpy3D): Invoke via CUDA_ONE_CALL. * plugin/plugin-nvptx.c (GOMP_OFFLOAD_memcpy2d, GOMP_OFFLOAD_memcpy3d): New. * target.c (omp_target_memcpy_rect_worker): (omp_target_memcpy_rect_check, omp_target_memcpy_rect_copy): Permit all device-to-device copyies; invoke new plugins for 2D and 3D copying when available. (gomp_load_plugin_for_device): DLSYM the new plugin functions. * testsuite/libgomp.c/target-12.c: Fix dimension bug. * testsuite/libgomp.fortran/target-12.f90: Likewise. * testsuite/libgomp.fortran/target-memcpy-rect-1.f90: New test.
2023-05-08libgomp: Simplify OpenMP reverse offload host <-> device memory copy ↵Thomas Schwinge1-4/+1
implementation ... by using the existing 'goacc_asyncqueue' instead of re-coding parts of it. Follow-up to commit 131d18e928a3ea1ab2d3bf61aa92d68a8a254609 "libgomp/nvptx: Prepare for reverse-offload callback handling", and commit ea4b23d9c82d9be3b982c3519fe5e8e9d833a6a8 "libgomp: Handle OpenMP's reverse offloads". libgomp/ * target.c (gomp_target_rev): Instead of 'dev_to_host_cpy', 'host_to_dev_cpy', 'token', take a single 'goacc_asyncqueue'. * libgomp.h (gomp_target_rev): Adjust. * libgomp-plugin.c (GOMP_PLUGIN_target_rev): Adjust. * libgomp-plugin.h (GOMP_PLUGIN_target_rev): Adjust. * plugin/plugin-gcn.c (process_reverse_offload): Adjust. * plugin/plugin-nvptx.c (rev_off_dev_to_host_cpy) (rev_off_host_to_dev_cpy): Remove. (GOMP_OFFLOAD_run): Adjust.
2023-02-02amdgcn, libgomp: Manually allocated stacksAndrew Stubbs1-2/+3
Switch from using stacks in the "private segment" to using a memory block allocated on the host side. The primary reason is to permit the reverse offload implementation to access values located on the device stack, but there may also be performance benefits, especially with repeated kernel invocations. This implementation unifies the stacks with the "team arena" optimization feature, and now allows both to have run-time configurable sizes. A new ABI is needed, so all libraries must be rebuilt, and newlib must be version 4.3.0.20230120 or newer. gcc/ChangeLog: * config/gcn/gcn-run.cc: Include libgomp-gcn.h. (struct kernargs): Replace the common content with kernargs_abi. (struct heap): Delete. (main): Read GCN_STACK_SIZE envvar. Allocate space for the device stacks. Write the new kernargs fields. * config/gcn/gcn.cc (gcn_option_override): Remove stack_size_opt. (default_requested_args): Remove PRIVATE_SEGMENT_BUFFER_ARG and PRIVATE_SEGMENT_WAVE_OFFSET_ARG. (gcn_addr_space_convert): Mask the QUEUE_PTR_ARG content. (gcn_expand_prologue): Move the TARGET_PACKED_WORK_ITEMS to the top. Set up the stacks from the values in the kernargs, not private. (gcn_expand_builtin_1): Match the stack configuration in the prologue. (gcn_hsa_declare_function_name): Turn off the private segment. (gcn_conditional_register_usage): Ensure QUEUE_PTR is fixed. * config/gcn/gcn.h (FIXED_REGISTERS): Fix the QUEUE_PTR register. * config/gcn/gcn.opt (mstack-size): Change the description. include/ChangeLog: * gomp-constants.h (GOMP_VERSION_GCN): Bump. libgomp/ChangeLog: * config/gcn/libgomp-gcn.h (DEFAULT_GCN_STACK_SIZE): New define. (DEFAULT_TEAM_ARENA_SIZE): New define. (struct heap): Move to this file. (struct kernargs_abi): Likewise. * config/gcn/team.c (gomp_gcn_enter_kernel): Use team arena size from the kernargs. * libgomp.h: Include libgomp-gcn.h. (TEAM_ARENA_SIZE): Remove. (team_malloc): Update the error message. * plugin/plugin-gcn.c (struct kernargs): Move common content to struct kernargs_abi. (struct agent_info): Rename team arenas to ephemeral memories. (struct team_arena_list): Rename .... (struct ephemeral_memories_list): to this. (struct heap): Delete. (team_arena_size): New variable. (stack_size): New variable. (print_kernel_dispatch): Update debug messages. (init_environment_variables): Read GCN_TEAM_ARENA_SIZE. Read GCN_STACK_SIZE. (get_team_arena): Rename ... (configure_ephemeral_memories): ... to this, and set up stacks. (release_team_arena): Rename ... (release_ephemeral_memories): ... to this. (destroy_team_arenas): Rename ... (destroy_ephemeral_memories): ... to this. (create_kernel_dispatch): Add num_threads parameter. Adjust for kernargs_abi refactor and ephemeral memories. (release_kernel_dispatch): Adjust for ephemeral memories. (run_kernel): Pass thread-count to create_kernel_dispatch. (GOMP_OFFLOAD_init_device): Adjust for ephemeral memories. (GOMP_OFFLOAD_fini_device): Adjust for ephemeral memories. gcc/testsuite/ChangeLog: * gcc.c-torture/execute/pr47237.c: Xfail on amdgcn. * gcc.dg/builtin-apply3.c: Xfail for amdgcn. * gcc.dg/builtin-apply4.c: Xfail for amdgcn. * gcc.dg/torture/stackalign/builtin-apply-3.c: Xfail for amdgcn. * gcc.dg/torture/stackalign/builtin-apply-4.c: Xfail for amdgcn.
2023-01-16Update copyright years.Jakub Jelinek1-1/+1
2022-12-10libgomp: Handle OpenMP's reverse offloadsTobias Burnus1-23/+54
This commit enabled reverse offload for nvptx such that gomp_target_rev actually gets called. And it fills the latter function to do all of the following: finding the host function to the device func ptr and copying the arguments to the host, processing the mapping/firstprivate, calling the host function, copying back the data and freeing as needed. The data handling is made easier by assuming that all host variables either existed before (and are in the mapping) or that those are devices variables not yet available on the host. Thus, the reverse mapping can do without refcounts etc. Note that the spec disallows inside a target region device-affecting constructs other than target plus ancestor device-modifier and it also limits the clauses permitted on this construct. For the function addresses, an additional splay tree is used; for the lookup of mapped variables, the existing splay-tree is used. Unfortunately, its data structure requires a full walk of the tree; Additionally, the just mapped variables are recorded in a separate data structure an extra lookup. While the lookup is slow, assuming that only few variables get mapped in each reverse offload construct and that reverse offload is the exception and not performance critical, this seems to be acceptable. libgomp/ChangeLog: * libgomp.h (struct target_mem_desc): Predeclare; move below after 'reverse_splay_tree_node' and add rev_array member. (struct reverse_splay_tree_key_s, reverse_splay_compare): New. (reverse_splay_tree_node, reverse_splay_tree, reverse_splay_tree_key): New typedef. (struct gomp_device_descr): Add mem_map_rev member. * oacc-host.c (host_dispatch): NULL init .mem_map_rev. * plugin/plugin-nvptx.c (GOMP_OFFLOAD_get_num_devices): Claim support for GOMP_REQUIRES_REVERSE_OFFLOAD. * splay-tree.h (splay_tree_callback_stop): New typedef; like splay_tree_callback but returning int not void. (splay_tree_foreach_lazy): Define; like splay_tree_foreach but taking splay_tree_callback_stop as argument. * splay-tree.c (splay_tree_foreach_internal_lazy, splay_tree_foreach_lazy): New; but early exit if callback returns nonzero. * target.c: Instatiate splay_tree_c with splay_tree_prefix 'reverse'. (gomp_map_lookup_rev): New. (gomp_load_image_to_device): Handle reverse-offload function lookup table. (gomp_unload_image_from_device): Free devicep->mem_map_rev. (struct gomp_splay_tree_rev_lookup_data, gomp_splay_tree_rev_lookup, gomp_map_rev_lookup, struct cpy_data, gomp_map_cdata_lookup_int, gomp_map_cdata_lookup): New auxiliary structs and functions for gomp_target_rev. (gomp_target_rev): Implement reverse offloading and its mapping. (gomp_target_init): Init current_device.mem_map_rev.root. * testsuite/libgomp.fortran/reverse-offload-2.f90: New test. * testsuite/libgomp.fortran/reverse-offload-3.f90: New test. * testsuite/libgomp.fortran/reverse-offload-4.f90: New test. * testsuite/libgomp.fortran/reverse-offload-5.f90: New test. * testsuite/libgomp.fortran/reverse-offload-5a.f90: New test without mapping of on-device allocated variables.
2022-10-24libgomp/nvptx: Prepare for reverse-offload callback handlingTobias Burnus1-0/+5
This patch adds a stub 'gomp_target_rev' in the host's target.c, which will later handle the reverse offload. For nvptx, it adds support for forwarding the offload gomp_target_ext call to the host by setting values in a struct on the device and querying it on the host - invoking gomp_target_rev on the result. include/ChangeLog: * cuda/cuda.h (enum CUdevice_attribute): Add CU_DEVICE_ATTRIBUTE_UNIFIED_ADDRESSING. (CU_MEMHOSTALLOC_DEVICEMAP): Define. (cuMemHostAlloc): Add prototype. libgomp/ChangeLog: * config/nvptx/icv-device.c (GOMP_DEVICE_NUM_VAR): Remove 'static' for this variable. * config/nvptx/libgomp-nvptx.h: New file. * config/nvptx/target.c: Include it. (GOMP_ADDITIONAL_ICVS): Declare extern var. (GOMP_REV_OFFLOAD_VAR): Declare var. (GOMP_target_ext): Handle reverse offload. * libgomp-plugin.h (GOMP_PLUGIN_target_rev): New prototype. * libgomp-plugin.c (GOMP_PLUGIN_target_rev): New, call ... * target.c (gomp_target_rev): ... this new stub function. * libgomp.h (gomp_target_rev): Declare. * libgomp.map (GOMP_PLUGIN_1.4): New; add GOMP_PLUGIN_target_rev. * plugin/cuda-lib.def (cuMemHostAlloc): Add. * plugin/plugin-nvptx.c: Include libgomp-nvptx.h. (struct ptx_device): Add rev_data member. (nvptx_open_device): Remove async_engines query, last used in r10-304-g1f4c5b9b; add unified-address assert check. (GOMP_OFFLOAD_get_num_devices): Claim unified address support. (GOMP_OFFLOAD_load_image): Free rev_fn_table if no offload functions exist. Make offload var available on host and device. (rev_off_dev_to_host_cpy, rev_off_host_to_dev_cpy): New. (GOMP_OFFLOAD_run): Handle reverse offload.
2022-09-08OpenMP, libgomp: Environment variable syntax extensionMarcel Vollweiler1-0/+114
This patch considers the environment variable syntax extension for device-specific variants of environment variables from OpenMP 5.1 (see OpenMP 5.1 specification, p. 75 and p. 639). An environment variable (e.g. OMP_NUM_TEAMS) can have different suffixes: _DEV (e.g. OMP_NUM_TEAMS_DEV): affects all devices but not the host. _DEV_<device> (e.g. OMP_NUM_TEAMS_DEV_42): affects only device with number <device>. no suffix (e.g. OMP_NUM_TEAMS): affects only the host. In future OpenMP versions also suffix _ALL will be introduced (see discussion https://github.com/OpenMP/spec/issues/3179). This is also considered in this patch: _ALL (e.g. OMP_NUM_TEAMS_ALL): affects all devices and the host. The precedence is as follows (descending). For the host: 1. no suffix 2. _ALL For devices: 1. _DEV_<device> 2. _DEV 3. _ALL That means, _DEV_<device> is used whenever available. Otherwise _DEV is used if available, and at last _ALL. If there is no value for any of the variable variants, default values are used as already implemented before. This patch concerns parsing (a), storing (b), output (c) and transmission to the device (d): (a) The actual number of devices and the numbering are not known when parsing the environment variables. Thus all environment variables are iterated and searched for device-specific ones. (b) Only configured device-specific variables are stored. Thus, a linked list is used. (c) The output is done in omp_display_env (see specification p. 468f). Global ICVs are tagged with [all], see https://github.com/OpenMP/spec/issues/3179. ICVs which are not global but aren't handled device-specific yet are tagged with [host]. omp_display_env outputs the initial values of the ICVs. That is why a dedicated data structure is introduced for the inital values only (gomp_initial_icv_list). (d) Device-specific ICVs are transmitted to the device via GOMP_ADDITIONAL_ICVS. libgomp/ChangeLog: * config/gcn/icv-device.c (omp_get_default_device): Return device- specific ICV. (omp_get_max_teams): Added for GCN devices. (omp_set_num_teams): Likewise. (ialias): Likewise. * config/nvptx/icv-device.c (omp_get_default_device): Return device- specific ICV. (omp_get_max_teams): Added for NVPTX devices. (omp_set_num_teams): Likewise. (ialias): Likewise. * env.c (struct gomp_icv_list): New struct to store entries of initial ICV values. (struct gomp_offload_icv_list): New struct to store entries of device- specific ICV values that are copied to the device and back. (struct gomp_default_icv_values): New struct to store default values of ICVs according to the OpenMP standard. (parse_schedule): Generalized for different variants of OMP_SCHEDULE. (print_env_var_error): Function that prints an error for invalid values for ICVs. (parse_unsigned_long_1): Removed getenv. Generalized. (parse_unsigned_long): Likewise. (parse_int_1): Likewise. (parse_int): Likewise. (parse_int_secure): Likewise. (parse_unsigned_long_list): Likewise. (parse_target_offload): Likewise. (parse_bind_var): Likewise. (parse_stacksize): Likewise. (parse_boolean): Likewise. (parse_wait_policy): Likewise. (parse_allocator): Likewise. (omp_display_env): Extended to output different variants of environment variables. (print_schedule): New helper function for omp_display_env which prints the values of run_sched_var. (print_proc_bind): New helper function for omp_display_env which prints the values of proc_bind_var. (enum gomp_parse_type): Collection of types used for parsing environment variables. (ENTRY): Preprocess string lengths of environment variables. (OMP_VAR_CNT): Preprocess table size. (OMP_HOST_VAR_CNT): Likewise. (INT_MAX_STR_LEN): Constant for the maximal number of digits of a device number. (gomp_get_icv_flag): Returns if a flag for a particular ICV is set. (gomp_set_icv_flag): Sets a flag for a particular ICV. (print_device_specific_icvs): New helper function for omp_display_env to print device specific ICV values. (get_device_num): New helper function for parse_device_specific. Extracts the device number from an environment variable name. (get_icv_member_addr): Gets the memory address for a particular member of an ICV struct. (gomp_get_initial_icv_item): Get a list item of gomp_initial_icv_list. (initialize_icvs): New function to initialize a gomp_initial_icvs struct. (add_initial_icv_to_list): Adds an ICV struct to gomp_initial_icv_list. (startswith): Checks if a string starts with a given prefix. (initialize_env): Extended to parse the new syntax of environment variables. * icv-device.c (omp_get_max_teams): Added. (ialias): Likewise. (omp_set_num_teams): Likewise. * icv.c (omp_set_num_teams): Moved to icv-device.c. (omp_get_max_teams): Likewise. (ialias): Likewise. * libgomp-plugin.h (GOMP_DEVICE_NUM_VAR): Removed. (GOMP_ADDITIONAL_ICVS): New target-side struct that holds the designated ICVs of the target device. * libgomp.h (enum gomp_icvs): Collection of ICVs. (enum gomp_device_num): Definition of device numbers for _ALL, _DEV, and no suffix. (enum gomp_env_suffix): Collection of possible suffixes of environment variables. (struct gomp_initial_icvs): Contains all ICVs for which we need to store initial values. (struct gomp_default_icv):New struct to hold ICVs for which we need to store initial values. (struct gomp_icv_list): Definition of a linked list that is used for storing ICVs for the devices and also for _DEV, _ALL, and without suffix. (struct gomp_offload_icvs): New struct to hold ICVs that are copied to a device. (struct gomp_offload_icv_list): Definition of a linked list that holds device-specific ICVs that are copied to devices. (gomp_get_initial_icv_item): Get a list item of gomp_initial_icv_list. (gomp_get_icv_flag): Returns if a flag for a particular ICV is set. * libgomp.texi: Updated. * plugin/plugin-gcn.c (GOMP_OFFLOAD_load_image): Extended to read further ICVs from the offload image. * plugin/plugin-nvptx.c (GOMP_OFFLOAD_load_image): Likewise. * target.c (gomp_get_offload_icv_item): Get a list item of gomp_offload_icv_list. (get_gomp_offload_icvs): New. Returns the ICV values depending on the device num and the variable hierarchy. (gomp_load_image_to_device): Extended to copy further ICVs to a device. * testsuite/libgomp.c-c++-common/icv-5.c: New test. * testsuite/libgomp.c-c++-common/icv-6.c: New test. * testsuite/libgomp.c-c++-common/icv-7.c: New test. * testsuite/libgomp.c-c++-common/icv-8.c: New test. * testsuite/libgomp.c-c++-common/omp-display-env-1.c: New test. * testsuite/libgomp.c-c++-common/omp-display-env-2.c: New test.
2022-05-28libgomp: Don't define GOMP_HAVE_EFFICIENT_ALIGNED_ALLOC for _aligned_malloc ↵Jakub Jelinek1-1/+0
[PR105745] since apparently _aligned_malloc requires freeing with _aligned_free and: /* Defined if gomp_aligned_alloc doesn't use fallback version and free can be used instead of gomp_aligned_free. */ #define GOMP_HAVE_EFFICIENT_ALIGNED_ALLOC 1 so the second condition isn't satisfied. For uses inside of the OpenMP allocators we can still use _aligned_malloc but we need to call _aligned_free in gomp_aligned_free. 2022-05-28 Jakub Jelinek <jakub@redhat.com> PR libgomp/105745 * libgomp.h (GOMP_HAVE_EFFICIENT_ALIGNED_ALLOC): Don't define for defined(HAVE__ALIGNED_MALLOC) case. * alloc.c (gomp_aligned_alloc): Move defined(HAVE__ALIGNED_MALLOC) handling as last option before fallback instead of first. (gomp_aligned_free): For defined(HAVE__ALIGNED_MALLOC) call _aligned_free.
2022-05-17openmp: Add support for inoutset depend-kindJakub Jelinek1-2/+2
This patch adds support for inoutset depend-kind in depend clauses. It is very similar to the in depend-kind in that a task with a dependency with that depend-kind is dependent on all previously created sibling tasks with matching address unless they have the same depend-kind. In the in depend-kind case everything is dependent except for in -> in dependency, for inoutset everything is dependent except for inoutset -> inoutset dependency. mutexinoutset is also similar (everything is dependent except for mutexinoutset -> mutexinoutset dependency), but there is also the additional restriction that only one task with mutexinoutset for each address can be scheduled at once (i.e. mutual exclusitivty). For now we support mutexinoutset the same as inout/out, but the inoutset support is full. In order not to bump the ABI for dependencies each time (we've bumped it already once, the old ABI supports only inout/out and in depend-kind, the new ABI supports inout/out, mutexinoutset, in and depobj), this patch arranges for inoutset to be at least for the time being always handled as if it was specified through depobj even when it is not. So it uses the new ABI for that and inoutset are represented like depobj - pointer to a pair of pointers where the first one will be the actual address of the object mentioned in depend clause and second pointer will be (void *) GOMP_DEPEND_INOUTSET. 2022-05-17 Jakub Jelinek <jakub@redhat.com> gcc/ * tree-core.h (enum omp_clause_depend_kind): Add OMP_CLAUSE_DEPEND_INOUTSET. * tree-pretty-print.cc (dump_omp_clause): Handle OMP_CLAUSE_DEPEND_INOUTSET. * gimplify.cc (gimplify_omp_depend): Likewise. * omp-low.cc (lower_depend_clauses): Likewise. gcc/c-family/ * c-omp.cc (c_finish_omp_depobj): Handle OMP_CLAUSE_DEPEND_INOUTSET. gcc/c/ * c-parser.cc (c_parser_omp_clause_depend): Parse inoutset depend-kind. (c_parser_omp_depobj): Likewise. gcc/cp/ * parser.cc (cp_parser_omp_clause_depend): Parse inoutset depend-kind. (cp_parser_omp_depobj): Likewise. * cxx-pretty-print.cc (cxx_pretty_printer::statement): Handle OMP_CLAUSE_DEPEND_INOUTSET. gcc/testsuite/ * c-c++-common/gomp/all-memory-1.c (boo): Add test with inoutset depend-kind. * c-c++-common/gomp/all-memory-2.c (boo): Likewise. * c-c++-common/gomp/depobj-1.c (f1): Likewise. (f2): Adjusted expected diagnostics. * g++.dg/gomp/depobj-1.C (f4): Adjust expected diagnostics. include/ * gomp-constants.h (GOMP_DEPEND_INOUTSET): Define. libgomp/ * libgomp.h (struct gomp_task_depend_entry): Change is_in type from bool to unsigned char. * task.c (gomp_task_handle_depend): Handle GOMP_DEPEND_INOUTSET. Ignore dependencies where task->depend[i].is_in && task->depend[i].is_in == ent->is_in rather than just task->depend[i].is_in && ent->is_in. Remember whether GOMP_DEPEND_IN loop is needed and guard the loop with that conditional. (gomp_task_maybe_wait_for_dependencies): Handle GOMP_DEPEND_INOUTSET. Ignore dependencies where elem.is_in && elem.is_in == ent->is_in rather than just elem.is_in && ent->is_in. * testsuite/libgomp.c-c++-common/depend-1.c (test): Add task with inoutset depend-kind. * testsuite/libgomp.c-c++-common/depend-2.c (test): Likewise. * testsuite/libgomp.c-c++-common/depend-3.c (test): Likewise. * testsuite/libgomp.c-c++-common/depend-inoutset-1.c: New test.
2022-05-12openmp: Add omp_all_memory support (C/C++ only so far)Jakub Jelinek1-0/+2
The ugly part is that OpenMP 5.1 made omp_all_memory a reserved identifier which isn't allowed to be used anywhere but in the depend clause, this is against how everything else has been handled in OpenMP so far (where some identifiers could have special meaning in some OpenMP clauses or pragmas but not elsewhere). The patch handles it by making it a conditional keyword (for -fopenmp only) and emitting a better diagnostics when it is used in a primary expression. Having a nicer diagnostics when e.g. trying to do int omp_all_memory; or int *omp_all_memory[10]; etc. would mean changing too many spots and hooking into name lookups to reject declaring any such symbols would be too ugly and I'm afraid there are way too many spots where one can introduce a name (variables, functions, namespaces, struct, enum, enumerators, template arguments, ...). Otherwise, the handling is quite simple, normal depend clauses lower into addresses of variables being handed over to the library, for omp_all_memory I'm using NULL pointers. omp_all_memory can only be used with inout or out depend kinds and means that a task is dependent on all previously created sibling tasks that have any dependency (of any depend kind) and that any later created sibling tasks will be dependent on it if they have any dependency. 2022-05-12 Jakub Jelinek <jakub@redhat.com> gcc/ * gimplify.cc (gimplify_omp_depend): Don't build_fold_addr_expr if null_pointer_node. (gimplify_scan_omp_clauses): Likewise. * tree-pretty-print.cc (dump_omp_clause): Print null_pointer_node as omp_all_memory. gcc/c-family/ * c-common.h (enum rid): Add RID_OMP_ALL_MEMORY. * c-omp.cc (c_finish_omp_depobj): Don't build_fold_addr_expr if null_pointer_node. gcc/c/ * c-parser.cc (c_parse_init): Register omp_all_memory as keyword if flag_openmp. (c_parser_postfix_expression): Diagnose uses of omp_all_memory in postfix expressions. (c_parser_omp_variable_list): Handle omp_all_memory in depend clause. * c-typeck.cc (c_finish_omp_clauses): Handle omp_all_memory keyword in depend clause as null_pointer_node, diagnose invalid uses. gcc/cp/ * lex.cc (init_reswords): Register omp_all_memory as keyword if flag_openmp. * parser.cc (cp_parser_primary_expression): Diagnose uses of omp_all_memory in postfix expressions. (cp_parser_omp_var_list_no_open): Handle omp_all_memory in depend clause. * semantics.cc (finish_omp_clauses): Handle omp_all_memory keyword in depend clause as null_pointer_node, diagnose invalid uses. * pt.cc (tsubst_omp_clause_decl): Pass through omp_all_memory. gcc/testsuite/ * c-c++-common/gomp/all-memory-1.c: New test. * c-c++-common/gomp/all-memory-2.c: New test. * c-c++-common/gomp/all-memory-3.c: New test. * g++.dg/gomp/all-memory-1.C: New test. * g++.dg/gomp/all-memory-2.C: New test. libgomp/ * libgomp.h (struct gomp_task): Add depend_all_memory member. * task.c (gomp_init_task): Initialize depend_all_memory. (gomp_task_handle_depend): Handle omp_all_memory. (gomp_task_run_post_handle_depend_hash): Clear parent->depend_all_memory if equal to current task. (gomp_task_maybe_wait_for_dependencies): Handle omp_all_memory. * testsuite/libgomp.c-c++-common/depend-1.c: New test. * testsuite/libgomp.c-c++-common/depend-2.c: New test. * testsuite/libgomp.c-c++-common/depend-3.c: New test.
2022-01-03Update copyright years.Jakub Jelinek1-1/+1
2021-12-08openmp: Improve OpenMP target support for C++ (PR92120)Chung-Lin Tang1-1/+1
This patch implements several C++ specific mapping capabilities introduced for OpenMP 5.0, including implicit mapping of this[:1] for non-static member functions, zero-length array section mapping of pointer-typed members, lambda captured variable access in target regions, and use of lambda objects inside target regions. Several adjustments to the C/C++ front-ends to allow more member-access syntax as valid is also included. PR middle-end/92120 gcc/cp/ChangeLog: * cp-tree.h (finish_omp_target): New declaration. (finish_omp_target_clauses): Likewise. * parser.c (cp_parser_omp_clause_map): Adjust call to cp_parser_omp_var_list_no_open to set 'allow_deref' argument to true. (cp_parser_omp_target): Factor out code, adjust into calls to new function finish_omp_target. * pt.c (tsubst_expr): Add call to finish_omp_target_clauses for OMP_TARGET case. * semantics.c (handle_omp_array_sections_1): Add handling to create 'this->member' from 'member' FIELD_DECL. Remove case of rejecting 'this' when not in declare simd. (handle_omp_array_sections): Likewise. (finish_omp_clauses): Likewise. Adjust to allow 'this[]' in OpenMP map clauses. Handle 'A->member' case in map clauses. Remove case of rejecting 'this' when not in declare simd. (struct omp_target_walk_data): New struct for walking over target-directive tree body. (finish_omp_target_clauses_r): New function for tree walk. (finish_omp_target_clauses): New function. (finish_omp_target): New function. gcc/c/ChangeLog: * c-parser.c (c_parser_omp_clause_map): Set 'allow_deref' argument in call to c_parser_omp_variable_list to 'true'. * c-typeck.c (handle_omp_array_sections_1): Add strip of MEM_REF in array base handling. (c_finish_omp_clauses): Handle 'A->member' case in map clauses. gcc/ChangeLog: * gimplify.c ("tree-hash-traits.h"): Add include. (gimplify_scan_omp_clauses): Change struct_map_to_clause to type hash_map<tree_operand, tree> *. Adjust struct map handling to handle cases of *A and A->B expressions. Under !DECL_P case of GOMP_CLAUSE_MAP handling, add STRIP_NOPS for indir_p case, add to struct_deref_set for map(*ptr_to_struct) cases. Add MEM_REF case when handling component_ref_p case. Add unshare_expr and gimplification when created GOMP_MAP_STRUCT is not a DECL. Add code to add firstprivate pointer for *pointer-to-struct case. (gimplify_adjust_omp_clauses): Move GOMP_MAP_STRUCT removal code for exit data directives code to earlier position. * omp-low.c (lower_omp_target): Handle GOMP_MAP_ATTACH_ZERO_LENGTH_ARRAY_SECTION, and GOMP_MAP_POINTER_TO_ZERO_LENGTH_ARRAY_SECTION map kinds. * tree-pretty-print.c (dump_omp_clause): Likewise. gcc/testsuite/ChangeLog: * gcc.dg/gomp/target-3.c: New testcase. * g++.dg/gomp/target-3.C: New testcase. * g++.dg/gomp/target-lambda-1.C: New testcase. * g++.dg/gomp/target-lambda-2.C: New testcase. * g++.dg/gomp/target-this-1.C: New testcase. * g++.dg/gomp/target-this-2.C: New testcase. * g++.dg/gomp/target-this-3.C: New testcase. * g++.dg/gomp/target-this-4.C: New testcase. * g++.dg/gomp/target-this-5.C: New testcase. * g++.dg/gomp/this-2.C: Adjust testcase. include/ChangeLog: * gomp-constants.h (enum gomp_map_kind): Add GOMP_MAP_ATTACH_ZERO_LENGTH_ARRAY_SECTION, and GOMP_MAP_POINTER_TO_ZERO_LENGTH_ARRAY_SECTION map kinds. (GOMP_MAP_POINTER_P): Include GOMP_MAP_POINTER_TO_ZERO_LENGTH_ARRAY_SECTION. libgomp/ChangeLog: * libgomp.h (gomp_attach_pointer): Add bool parameter. * oacc-mem.c (acc_attach_async): Update call to gomp_attach_pointer. (goacc_enter_data_internal): Likewise. * target.c (gomp_map_vars_existing): Update assert condition to include GOMP_MAP_ATTACH_ZERO_LENGTH_ARRAY_SECTION. (gomp_map_pointer): Add 'bool allow_zero_length_array_sections' parameter, add support for mapping a pointer with NULL target. (gomp_attach_pointer): Add 'bool allow_zero_length_array_sections' parameter, add support for attaching a pointer with NULL target. (gomp_map_vars_internal): Update calls to gomp_map_pointer and gomp_attach_pointer, add handling for GOMP_MAP_ATTACH_ZERO_LENGTH_ARRAY_SECTION, and GOMP_MAP_POINTER_TO_ZERO_LENGTH_ARRAY_SECTION cases. * testsuite/libgomp.c++/target-23.C: New testcase. * testsuite/libgomp.c++/target-lambda-1.C: New testcase. * testsuite/libgomp.c++/target-lambda-2.C: New testcase. * testsuite/libgomp.c++/target-this-1.C: New testcase. * testsuite/libgomp.c++/target-this-2.C: New testcase. * testsuite/libgomp.c++/target-this-3.C: New testcase. * testsuite/libgomp.c++/target-this-4.C: New testcase. * testsuite/libgomp.c++/target-this-5.C: New testcase.
2021-11-18libgomp: Ensure that either gomp_team is properly aligned [PR102838]Jakub Jelinek1-1/+5
struct gomp_team has struct gomp_work_share array inside of it. If that latter structure has 64-byte aligned member in the middle, the whole struct gomp_team needs to be 64-byte aligned, but we weren't allocating it using gomp_aligned_alloc. This patch fixes that, except that on gcn team_malloc is special, so I've instead decided at least for now to avoid using aligned member and use the padding instead on gcn. 2021-11-18 Jakub Jelinek <jakub@redhat.com> PR libgomp/102838 * libgomp.h (GOMP_USE_ALIGNED_WORK_SHARES): Define if GOMP_HAVE_EFFICIENT_ALIGNED_ALLOC is defined and __AMDGCN__ is not. (struct gomp_work_share): Use GOMP_USE_ALIGNED_WORK_SHARES instead of GOMP_HAVE_EFFICIENT_ALIGNED_ALLOC. * work.c (alloc_work_share, gomp_work_share_start): Likewise. * team.c (gomp_new_team): If GOMP_USE_ALIGNED_WORK_SHARES, use gomp_aligned_alloc instead of team_malloc.
2021-11-11libgomp: Use TLS storage for omp_get_num_teams()/omp_get_team_num() valuesJakub Jelinek1-0/+8
When thinking about GOMP_teams3, I've realized that using global variables for the values returned by omp_get_num_teams()/omp_get_team_num() calls is incorrect even with our right now dumb way of implementing host teams. The problems are two, one is if host teams is used from multiple pthread_create created threads - the spec says that host teams can't be nested inside of explicit parallel or other teams constructs, but with pthread_create the standard says obviously nothing about it. Another more important thing is host fallback, right now we don't do anything for omp_get_num_teams() or omp_get_team_num() which was fine before host teams was introduced and the 5.1 requirement that num_teams clause specifies minimum of teams, but with the global vars it means inside of target teams num_teams (2) we happily return omp_get_num_teams() == 4 if the target teams is inside of host teams with num_teams(4). With target fallback being invoked from parallel regions global vars simply can't work right on the host. So, this patch moves them to struct gomp_thread and propagates those for parallel to child threads. For host fallback, the implicit zeroing of *thr results in us returning omp_get_num_teams () == 1 and omp_get_team_num () == 0 which is fine for target teams without num_teams clause, for target teams with num_teams clause something to work on and for target without teams nested in it I've asked on omp-lang what should be done. 2021-11-11 Jakub Jelinek <jakub@redhat.com> * libgomp.h (struct gomp_thread): Add num_teams and team_num members. * team.c (struct gomp_thread_start_data): Likewise. (gomp_thread_start): Initialize thr->num_teams and thr->team_num. (gomp_team_start): Initialize start_data->num_teams and start_data->team_num. Update nthr->num_teams and nthr->team_num. * teams.c (gomp_num_teams, gomp_team_num): Remove. (GOMP_teams_reg): Set and restore thr->num_teams and thr->team_num instead of gomp_num_teams and gomp_team_num. (omp_get_num_teams): Use thr->num_teams + 1 instead of gomp_num_teams. (omp_get_team_num): Use thr->team_num instead of gomp_team_num. * testsuite/libgomp.c/teams-4.c: New test.
2021-10-20openmp: Fix up struct gomp_work_share handling [PR102838]Jakub Jelinek1-0/+35
If GOMP_HAVE_EFFICIENT_ALIGNED_ALLOC is not defined, the intent was to treat the split of the structure between first cacheline (64 bytes) as mostly write-once, use afterwards and second cacheline as rw just as an optimization. But as has been reported, with vectorization enabled at -O2 it can now result in aligned vector 16-byte or larger stores. When not having posix_memalign/aligned_alloc/memalign or other similar API, alloc.c emulates it but it needs to allocate extra memory for the dynamic realignment. So, for the GOMP_HAVE_EFFICIENT_ALIGNED_ALLOC not defined case, this patch stops using aligned (64) attribute in the middle of the structure and instead inserts padding that puts the second half of the structure at offset 64 bytes. And when GOMP_HAVE_EFFICIENT_ALIGNED_ALLOC is defined, usually it was allocated as aligned, but for the orphaned case it could still be allocated just with gomp_malloc without guaranteed proper alignment. 2021-10-20 Jakub Jelinek <jakub@redhat.com> PR libgomp/102838 * libgomp.h (struct gomp_work_share_1st_cacheline): New type. (struct gomp_work_share): Only use aligned(64) attribute if GOMP_HAVE_EFFICIENT_ALIGNED_ALLOC is defined, otherwise just add padding before lock to ensure lock is at offset 64 bytes into the structure. (gomp_workshare_struct_check1, gomp_workshare_struct_check2): New poor man's static assertions. * work.c (gomp_work_share_start): Use gomp_aligned_alloc instead of gomp_malloc if GOMP_HAVE_EFFICIENT_ALIGNED_ALLOC.
2021-10-11openmp: Add omp_set_num_teams, omp_get_max_teams, omp_[gs]et_teams_thread_limitJakub Jelinek1-0/+2
OpenMP 5.1 adds env vars and functions to set and query new ICVs used as fallback if thread_limit or num_teams clauses aren't specified on teams construct. The following patch implements those, though further work will be needed: 1) OpenMP 5.1 also changed the num_teams clause, so that it can specify both lower and upper limit for how many teams should be created and changed the meaning when only one expression is provided, instead of num_teams(expr) in 5.0 meaning num_teams(1:expr) in 5.1, it now means num_teams(expr:expr), i.e. while previously we could create 1 to expr teams, in 5.1 we have some low limit by default equal to the single expression provided and may not create fewer teams. For host teams (which we don't currently implement efficiently for NUMA hosts) we trivially satisfy it now by always honoring what the user asked for, but for the offloading teams I think we'll need to rethink the APIs; currently teams construct is just a call that returns and possibly lowers the number of teams; and whenever possible we try to evaluate num_teams/thread_limit already on the target construct and the GOMP_teams call just sets the number of teams to the minimum of provided and requested teams; for some cases e.g. where target is not combined with teams and num_teams expression calls some functions etc., we need to call those functions in the target region and so it is late to figure number of teams, but also hw could just limit what it is willing to create; in that case I'm afraid we need to run the target body multiple times and arrange for omp_get_team_num () returning the right values 2) we need to finally implement the NUMA handling for GOMP_teams_reg 3) I now realize I haven't added some testcase coverage, will do that incrementally 4) libgomp.texi needs updates for these new APIs, but also others like the allocator 2021-10-11 Jakub Jelinek <jakub@redhat.com> gcc/ * omp-low.c (omp_runtime_api_call): Handle omp_get_max_teams, omp_[sg]et_teams_thread_limit and omp_set_num_teams. libgomp/ * omp.h.in (omp_set_num_teams, omp_get_max_teams, omp_set_teams_thread_limit, omp_get_teams_thread_limit): Declare. * omp_lib.f90.in (omp_set_num_teams, omp_get_max_teams, omp_set_teams_thread_limit, omp_get_teams_thread_limit): Declare. * omp_lib.h.in (omp_set_num_teams, omp_get_max_teams, omp_set_teams_thread_limit, omp_get_teams_thread_limit): Declare. * libgomp.h (gomp_nteams_var, gomp_teams_thread_limit_var): Declare. * libgomp.map (OMP_5.1): Export omp_get_max_teams{,_}, omp_get_teams_thread_limit{,_}, omp_set_num_teams{,_,_8_} and omp_set_teams_thread_limit{,_,_8_}. * icv.c (omp_set_num_teams, omp_get_max_teams, omp_set_teams_thread_limit, omp_get_teams_thread_limit): New functions. * env.c (gomp_nteams_var, gomp_teams_thread_limit_var): Define. (omp_display_env): Print OMP_NUM_TEAMS and OMP_TEAMS_THREAD_LIMIT. (initialize_env): Handle OMP_NUM_TEAMS and OMP_TEAMS_THREAD_LIMIT env vars. * teams.c (GOMP_teams_reg): If thread_limit is not specified, use gomp_teams_thread_limit_var as fallback if not zero. If num_teams is not specified, use gomp_nteams_var. * fortran.c (omp_set_num_teams, omp_get_max_teams, omp_set_teams_thread_limit, omp_get_teams_thread_limit): Add ialias_redirect. (omp_set_num_teams_, omp_set_num_teams_8_, omp_get_max_teams_, omp_set_teams_thread_limit_, omp_set_teams_thread_limit_8_, omp_get_teams_thread_limit_): New functions.
2021-07-27Fix OpenACC "ephemeral" asynchronous host-to-device copiesJulian Brown1-1/+1
This patch fixes several places in libgomp/target.c where "ephemeral" data (on the stack or in temporary heap locations) may be used as the source of an asynchronous host-to-device copy that may not complete before the host data disappears. An existing, but flawed, workaround for this problem in the AMD GCN libgomp offloading plugin is currently present on mainline, and was posted for the og9 branch here: https://gcc.gnu.org/legacy-ml/gcc-patches/2019-08/msg00901.html and previous versions of this patch were posted here (for mainline/og9): https://gcc.gnu.org/legacy-ml/gcc-patches/2019-11/msg01482.html https://gcc.gnu.org/legacy-ml/gcc-patches/2019-09/msg01026.html libgomp/ * libgomp.h (gomp_copy_host2dev): Update prototype. * oacc-mem.c (memcpy_tofrom_device, update_dev_host): Add new argument to gomp_copy_host2dev (false). * plugin/plugin-gcn.c (struct copy_data): Remove free_src field. (copy_data): Don't free src. (queue_push_copy): Remove free_src handling. (GOMP_OFFLOAD_dev2dev): Update call to queue_push_copy. (GOMP_OFFLOAD_openacc_async_host2dev): Remove source-data snapshotting. (GOMP_OFFLOAD_openacc_async_dev2host): Update call to queue_push_copy. * target.c (goacc_device_copy_async): Add SRCADDR_ORIG parameter. (gomp_copy_host2dev): Add EPHEMERAL parameter. Snapshot source data when true, and set up deferred freeing of temporary buffer. (gomp_copy_dev2host): Update call to goacc_device_copy_async. (gomp_map_vars_existing, gomp_map_pointer, gomp_attach_pointer) (gomp_detach_pointer, gomp_map_vars_internal, gomp_update): Update calls to gomp_copy_host2dev with appropriate ephemeral argument. * testsuite/libgomp.oacc-c-c++-common/async-data-1-1.c: Remove XFAIL. Co-Authored-By: Thomas Schwinge <thomas@codesourcery.com>
2021-07-20[gcn] Work-around libgomp 'error: array subscript 0 is outside array bounds ↵Thomas Schwinge1-12/+0
of ‘__lds struct gomp_thread * __lds[0]’ [-Werror=array-bounds]' some more [PR101484] With yesterday's commit 9f2bc5077debef2b046b6c10d38591ac324ad8b5 "[gcn] Work-around libgomp 'error: array subscript 0 is outside array bounds of ‘__lds struct gomp_thread * __lds[0]’ [-Werror=array-bounds]' [PR101484]", I did defuse the "unexpected" '-Werror=array-bounds' diagnostics that we see as of commit a110855667782dac7b674d3e328b253b3b3c919b "Correct handling of variable offset minus constant in -Warray-bounds [PR100137]". However, these '#pragma GCC diagnostic [...]' directives cause some code generation changes (that seems unexpected, problematic!), which results in a lot (ten thousands) of 'GCN team arena exhausted' run-time diagnostics, also leading to a few FAILs: PASS: libgomp.c/../libgomp.c-c++-common/for-11.c (test for excess errors) [-PASS:-]{+FAIL:+} libgomp.c/../libgomp.c-c++-common/for-11.c execution test PASS: libgomp.c/../libgomp.c-c++-common/for-12.c (test for excess errors) [-PASS:-]{+FAIL:+} libgomp.c/../libgomp.c-c++-common/for-12.c execution test PASS: libgomp.c/../libgomp.c-c++-common/for-3.c (test for excess errors) [-PASS:-]{+FAIL:+} libgomp.c/../libgomp.c-c++-common/for-3.c execution test PASS: libgomp.c/../libgomp.c-c++-common/for-5.c (test for excess errors) [-PASS:-]{+FAIL:+} libgomp.c/../libgomp.c-c++-common/for-5.c execution test PASS: libgomp.c/../libgomp.c-c++-common/for-6.c (test for excess errors) [-PASS:-]{+FAIL:+} libgomp.c/../libgomp.c-c++-common/for-6.c execution test PASS: libgomp.c/../libgomp.c-c++-common/for-9.c (test for excess errors) [-PASS:-]{+FAIL:+} libgomp.c/../libgomp.c-c++-common/for-9.c execution test Same for 'libgomp.c++'. It remains to be analyzed how '#pragma GCC diagnostic [...]' directives can cause code generation changes; for now I'm working around the "unexpected" '-Werror=array-bounds' diagnostics differently. Overall, still awaiting a different solution, of course. libgomp/ PR target/101484 * configure.tgt [amdgcn*-*-*] (XCFLAGS): Add '-Wno-error=array-bounds'. * config/gcn/team.c: Remove '-Werror=array-bounds' work-around. * libgomp.h [__AMDGCN__]: Likewise.
2021-07-19[gcn] Work-around libgomp 'error: array subscript 0 is outside array bounds ↵Thomas Schwinge1-0/+12
of ‘__lds struct gomp_thread * __lds[0]’ [-Werror=array-bounds]' [PR101484] ... seen as of commit a110855667782dac7b674d3e328b253b3b3c919b "Correct handling of variable offset minus constant in -Warray-bounds [PR100137]". Awaiting a different solution, of course. libgomp/ PR target/101484 * config/gcn/team.c: Apply '-Werror=array-bounds' work-around. * libgomp.h [__AMDGCN__]: Likewise.
2021-06-17libgomp: Structure element mapping for OpenMP 5.0Chung-Lin Tang1-17/+49
This patch implement OpenMP 5.0 requirements of incrementing/decrementing the reference count of a mapped structure at most once (across all elements) on a construct. This is implemented by pulling in libgomp/hashtab.h and using htab_t as a pointer set. Structure element list siblings also have pointers-to-refcounts linked together, to naturally achieve uniform increment/decrement without repeating. There are still some questions on whether using such a htab_t based set is faster/slower than using a sorted pointer array based implementation. This is to be researched on later. libgomp/ChangeLog: * hashtab.h (htab_clear): New function with initialization code factored out from... (htab_create): ...here, adjust to use htab_clear function. * libgomp.h (REFCOUNT_SPECIAL): New symbol to denote range of special refcount values, add comments. (REFCOUNT_INFINITY): Adjust definition to use REFCOUNT_SPECIAL. (REFCOUNT_LINK): Likewise. (REFCOUNT_STRUCTELEM): New special refcount range for structure element siblings. (REFCOUNT_STRUCTELEM_P): Macro for testing for structure element sibling maps. (REFCOUNT_STRUCTELEM_FLAG_FIRST): Flag to indicate first sibling. (REFCOUNT_STRUCTELEM_FLAG_LAST): Flag to indicate last sibling. (REFCOUNT_STRUCTELEM_FIRST_P): Macro to test _FIRST flag. (REFCOUNT_STRUCTELEM_LAST_P): Macro to test _LAST flag. (struct splay_tree_key_s): Add structelem_refcount and structelem_refcount_ptr fields into a union with dynamic_refcount. Add comments. (gomp_map_vars): Delete declaration. (gomp_map_vars_async): Likewise. (gomp_unmap_vars): Likewise. (gomp_unmap_vars_async): Likewise. (goacc_map_vars): New declaration. (goacc_unmap_vars): Likewise. * oacc-mem.c (acc_map_data): Adjust to use goacc_map_vars. (goacc_enter_datum): Likewise. (goacc_enter_data_internal): Likewise. * oacc-parallel.c (GOACC_parallel_keyed): Adjust to use goacc_map_vars and goacc_unmap_vars. (GOACC_data_start): Adjust to use goacc_map_vars. (GOACC_data_end): Adjust to use goacc_unmap_vars. * target.c (hash_entry_type): New typedef. (htab_alloc): New function hook for hashtab.h. (htab_free): Likewise. (htab_hash): Likewise. (htab_eq): Likewise. (hashtab.h): Add file include. (gomp_increment_refcount): New function. (gomp_decrement_refcount): Likewise. (gomp_map_vars_existing): Add refcount_set parameter, adjust to use gomp_increment_refcount. (gomp_map_fields_existing): Add refcount_set parameter, adjust calls to gomp_map_vars_existing. (gomp_map_vars_internal): Add refcount_set parameter, add local openmp_p variable to guard OpenMP specific paths, adjust calls to gomp_map_vars_existing, add structure element sibling splay_tree_key sequence creation code, adjust Fortran map case to avoid increment under OpenMP. (gomp_map_vars): Adjust to static, add refcount_set parameter, manage local refcount_set if caller passed in NULL, adjust call to gomp_map_vars_internal. (gomp_map_vars_async): Adjust and rename into... (goacc_map_vars): ...this new function, adjust call to gomp_map_vars_internal. (gomp_remove_splay_tree_key): New function with code factored out from gomp_remove_var_internal. (gomp_remove_var_internal): Add code to handle removing multiple splay_tree_key sequence for structure elements, adjust code to use gomp_remove_splay_tree_key for splay-tree key removal. (gomp_unmap_vars_internal): Add refcount_set parameter, adjust to use gomp_decrement_refcount. (gomp_unmap_vars): Adjust to static, add refcount_set parameter, manage local refcount_set if caller passed in NULL, adjust call to gomp_unmap_vars_internal. (gomp_unmap_vars_async): Adjust and rename into... (goacc_unmap_vars): ...this new function, adjust call to gomp_unmap_vars_internal. (GOMP_target): Manage refcount_set and adjust calls to gomp_map_vars and gomp_unmap_vars. (GOMP_target_ext): Likewise. (gomp_target_data_fallback): Adjust call to gomp_map_vars. (GOMP_target_data): Likewise. (GOMP_target_data_ext): Likewise. (GOMP_target_end_data): Adjust call to gomp_unmap_vars. (gomp_exit_data): Add refcount_set parameter, adjust to use gomp_decrement_refcount, adjust to queue splay-tree keys for removal after main loop. (GOMP_target_enter_exit_data): Manage refcount_set and adjust calls to gomp_map_vars and gomp_exit_data. (gomp_target_task_fn): Likewise. * testsuite/libgomp.c-c++-common/refcount-1.c: New testcase. * testsuite/libgomp.c-c++-common/struct-elem-1.c: New testcase. * testsuite/libgomp.c-c++-common/struct-elem-2.c: New testcase. * testsuite/libgomp.c-c++-common/struct-elem-3.c: New testcase. * testsuite/libgomp.c-c++-common/struct-elem-4.c: New testcase. * testsuite/libgomp.c-c++-common/struct-elem-5.c: New testcase.
2021-02-25openmp: Fix intermittent hanging of task-detach-6 libgomp tests [PR98738]Kwok Cheung Yeung1-6/+15
This adds support for the task detach clause to taskwait and taskgroup, and simplifies the handling of the detach clause by moving most of the extra handling required for detach tasks to omp_fulfill_event. 2021-02-25 Kwok Cheung Yeung <kcy@codesourcery.com> Jakub Jelinek <jakub@redhat.com> libgomp/ PR libgomp/98738 * libgomp.h (enum gomp_task_kind): Add GOMP_TASK_DETACHED. (struct gomp_task): Replace detach and completion_sem fields with union containing completion_sem and detach_team. Add deferred_p field. (struct gomp_team): Remove task_detach_queue. * task.c: Include assert.h. (gomp_init_task): Initialize deferred_p and completion_sem fields. Rearrange initialization order of fields. (task_fulfilled_p): Delete. (GOMP_task): Use address of task as the event handle. Remove initialization of detach field. Initialize deferred_p field. Use automatic local for completion_sem. Initialize detach_team field for deferred tasks. (gomp_barrier_handle_tasks): Remove handling of task_detach_queue. Set kind of suspended detach task to GOMP_TASK_DETACHED and decrement task_running_count. Move finish_cancelled block out of else branch. Relocate call to gomp_team_barrier_done. (GOMP_taskwait): Handle tasks with completion events that have not been fulfilled. (GOMP_taskgroup_end): Likewise. (omp_fulfill_event): Use address of task as event handle. Post to completion_sem for undeferred tasks. Clear detach_team if task has not finished. For finished tasks, handle post-execution tasks, call gomp_team_barrier_wake if necessary, and free task. * team.c (gomp_new_team): Remove initialization of task_detach_queue. (free_team): Remove free of task_detach_queue. * testsuite/libgomp.c-c++-common/task-detach-1.c: Fix formatting. * testsuite/libgomp.c-c++-common/task-detach-2.c: Fix formatting. * testsuite/libgomp.c-c++-common/task-detach-3.c: Fix formatting. * testsuite/libgomp.c-c++-common/task-detach-4.c: Fix formatting. * testsuite/libgomp.c-c++-common/task-detach-5.c: Fix formatting. Change data-sharing of detach events on enclosing parallel to private. * testsuite/libgomp.c-c++-common/task-detach-6.c: Likewise. Remove taskwait directive. * testsuite/libgomp.c-c++-common/task-detach-7.c: New. * testsuite/libgomp.c-c++-common/task-detach-8.c: New. * testsuite/libgomp.c-c++-common/task-detach-9.c: New. * testsuite/libgomp.c-c++-common/task-detach-10.c: New. * testsuite/libgomp.c-c++-common/task-detach-11.c: New. * testsuite/libgomp.fortran/task-detach-1.f90: Fix formatting. * testsuite/libgomp.fortran/task-detach-2.f90: Fix formatting. * testsuite/libgomp.fortran/task-detach-3.f90: Fix formatting. * testsuite/libgomp.fortran/task-detach-4.f90: Fix formatting. * testsuite/libgomp.fortran/task-detach-5.f90: Fix formatting. Change data-sharing of detach events on enclosing parallel to private. * testsuite/libgomp.fortran/task-detach-6.f90: Likewise. Remove taskwait directive. * testsuite/libgomp.fortran/task-detach-7.f90: New. * testsuite/libgomp.fortran/task-detach-8.f90: New. * testsuite/libgomp.fortran/task-detach-9.f90: New. * testsuite/libgomp.fortran/task-detach-10.f90: New. * testsuite/libgomp.fortran/task-detach-11.f90: New.
2021-01-16openmp: Add support for the OpenMP 5.0 task detach clauseKwok Cheung Yeung1-0/+7
2021-01-16 Kwok Cheung Yeung <kcy@codesourcery.com> gcc/ * builtin-types.def (BT_FN_VOID_OMPFN_PTR_OMPCPYFN_LONG_LONG_BOOL_UINT_PTR_INT): Rename to... (BT_FN_VOID_OMPFN_PTR_OMPCPYFN_LONG_LONG_BOOL_UINT_PTR_INT_PTR): ...this. Add extra argument. * gimplify.c (omp_default_clause): Ensure that event handle is firstprivate in a task region. (gimplify_scan_omp_clauses): Handle OMP_CLAUSE_DETACH. (gimplify_adjust_omp_clauses): Likewise. * omp-builtins.def (BUILT_IN_GOMP_TASK): Change function type to BT_FN_VOID_OMPFN_PTR_OMPCPYFN_LONG_LONG_BOOL_UINT_PTR_INT_PTR. * omp-expand.c (expand_task_call): Add GOMP_TASK_FLAG_DETACH to flags if detach clause specified. Add detach argument when generating call to GOMP_task. * omp-low.c (scan_sharing_clauses): Setup data environment for detach clause. (finish_taskreg_scan): Move field for variable containing the event handle to the front of the struct. * tree-core.h (enum omp_clause_code): Add OMP_CLAUSE_DETACH. Fix ordering. * tree-nested.c (convert_nonlocal_omp_clauses): Handle OMP_CLAUSE_DETACH clause. (convert_local_omp_clauses): Handle OMP_CLAUSE_DETACH clause. * tree-pretty-print.c (dump_omp_clause): Handle OMP_CLAUSE_DETACH. * tree.c (omp_clause_num_ops): Add entry for OMP_CLAUSE_DETACH. Fix ordering. (omp_clause_code_name): Add entry for OMP_CLAUSE_DETACH. Fix ordering. (walk_tree_1): Handle OMP_CLAUSE_DETACH. gcc/c-family/ * c-pragma.h (pragma_omp_clause): Add PRAGMA_OMP_CLAUSE_DETACH. Redefine PRAGMA_OACC_CLAUSE_DETACH. gcc/c/ * c-parser.c (c_parser_omp_clause_detach): New. (c_parser_omp_all_clauses): Handle PRAGMA_OMP_CLAUSE_DETACH clause. (OMP_TASK_CLAUSE_MASK): Add mask for PRAGMA_OMP_CLAUSE_DETACH. * c-typeck.c (c_finish_omp_clauses): Handle PRAGMA_OMP_CLAUSE_DETACH clause. Prevent use of detach with mergeable and overriding the data sharing mode of the event handle. gcc/cp/ * parser.c (cp_parser_omp_clause_detach): New. (cp_parser_omp_all_clauses): Handle PRAGMA_OMP_CLAUSE_DETACH. (OMP_TASK_CLAUSE_MASK): Add mask for PRAGMA_OMP_CLAUSE_DETACH. * pt.c (tsubst_omp_clauses): Handle OMP_CLAUSE_DETACH clause. * semantics.c (finish_omp_clauses): Handle OMP_CLAUSE_DETACH clause. Prevent use of detach with mergeable and overriding the data sharing mode of the event handle. gcc/fortran/ * dump-parse-tree.c (show_omp_clauses): Handle detach clause. * frontend-passes.c (gfc_code_walker): Walk detach expression. * gfortran.h (struct gfc_omp_clauses): Add detach field. (gfc_c_intptr_kind): New. * openmp.c (gfc_free_omp_clauses): Free detach clause. (gfc_match_omp_detach): New. (enum omp_mask1): Add OMP_CLAUSE_DETACH. (enum omp_mask2): Remove OMP_CLAUSE_DETACH. (gfc_match_omp_clauses): Handle OMP_CLAUSE_DETACH for OpenMP. (OMP_TASK_CLAUSES): Add OMP_CLAUSE_DETACH. (resolve_omp_clauses): Prevent use of detach with mergeable and overriding the data sharing mode of the event handle. * trans-openmp.c (gfc_trans_omp_clauses): Handle detach clause. * trans-types.c (gfc_c_intptr_kind): New. (gfc_init_kinds): Initialize gfc_c_intptr_kind. * types.def (BT_FN_VOID_OMPFN_PTR_OMPCPYFN_LONG_LONG_BOOL_UINT_PTR_INT): Rename to... (BT_FN_VOID_OMPFN_PTR_OMPCPYFN_LONG_LONG_BOOL_UINT_PTR_INT_PTR): ...this. Add extra argument. gcc/testsuite/ * c-c++-common/gomp/task-detach-1.c: New. * g++.dg/gomp/task-detach-1.C: New. * gcc.dg/gomp/task-detach-1.c: New. * gfortran.dg/gomp/task-detach-1.f90: New. include/ * gomp-constants.h (GOMP_TASK_FLAG_DETACH): New. libgomp/ * fortran.c (omp_fulfill_event_): New. * libgomp.h (struct gomp_task): Add detach and completion_sem fields. (struct gomp_team): Add task_detach_queue and task_detach_count fields. * libgomp.map (OMP_5.0.1): Add omp_fulfill_event and omp_fulfill_event_. * libgomp_g.h (GOMP_task): Add extra argument. * omp.h.in (enum omp_event_handle_t): New. (omp_fulfill_event): New. * omp_lib.f90.in (omp_event_handle_kind): New. (omp_fulfill_event): New. * omp_lib.h.in (omp_event_handle_kind): New. (omp_fulfill_event): Declare. * priority_queue.c (priority_tree_find): New. (priority_list_find): New. (priority_queue_find): New. * priority_queue.h (priority_queue_predicate): New. (priority_queue_find): New. * task.c (gomp_init_task): Initialize detach field. (task_fulfilled_p): New. (GOMP_task): Add detach argument. Ignore detach argument if GOMP_TASK_FLAG_DETACH not set in flags. Initialize completion_sem field. Copy address of completion_sem into detach argument and into the start of the data record. Wait for detach event if task not deferred. (gomp_barrier_handle_tasks): Queue tasks with unfulfilled events. Remove completed tasks and requeue dependent tasks. (omp_fulfill_event): New. * team.c (gomp_new_team): Initialize task_detach_queue and task_detach_count fields. (free_team): Free task_detach_queue field. * testsuite/libgomp.c-c++-common/task-detach-1.c: New testcase. * testsuite/libgomp.c-c++-common/task-detach-2.c: New testcase. * testsuite/libgomp.c-c++-common/task-detach-3.c: New testcase. * testsuite/libgomp.c-c++-common/task-detach-4.c: New testcase. * testsuite/libgomp.c-c++-common/task-detach-5.c: New testcase. * testsuite/libgomp.c-c++-common/task-detach-6.c: New testcase. * testsuite/libgomp.fortran/task-detach-1.f90: New testcase. * testsuite/libgomp.fortran/task-detach-2.f90: New testcase. * testsuite/libgomp.fortran/task-detach-3.f90: New testcase. * testsuite/libgomp.fortran/task-detach-4.f90: New testcase. * testsuite/libgomp.fortran/task-detach-5.f90: New testcase. * testsuite/libgomp.fortran/task-detach-6.f90: New testcase.
2021-01-04Update copyright years.Jakub Jelinek1-1/+1
2020-11-18openmp: Retire nest-var ICV for OpenMP 5.1Kwok Cheung Yeung1-3/+2
This removes the nest-var ICV, expressing nesting in terms of the max-active-levels-var ICV instead. The max-active-levels-var ICV is now per data environment rather than per device. 2020-11-18 Kwok Cheung Yeung <kcy@codesourcery.com> libgomp/ * env.c (gomp_global_icv): Remove nest_var field. Add max_active_levels_var field. (gomp_max_active_levels_var): Remove. (parse_boolean): Return true on success. (handle_omp_display_env): Express OMP_NESTED in terms of max_active_levels_var. Change format specifier for max_active_levels_var. (initialize_env): Set max_active_levels_var from OMP_MAX_ACTIVE_LEVELS, OMP_NESTED, OMP_NUM_THREADS and OMP_PROC_BIND. * icv.c (omp_set_nested): Express in terms of max_active_levels_var. (omp_get_nested): Likewise. (omp_set_max_active_levels): Use max_active_levels_var field instead of gomp_max_active_levels_var. (omp_get_max_active_levels): Likewise. * libgomp.h (struct gomp_task_icv): Remove nest_var field. Add max_active_levels_var field. (gomp_supported_active_levels): Set to UCHAR_MAX. (gomp_max_active_levels_var): Delete. * libgomp.texi (omp_get_nested): Update documentation. (omp_set_nested): Likewise. (OMP_MAX_ACTIVE_LEVELS): Likewise. (OMP_NESTED): Likewise. (OMP_NUM_THREADS): Likewise. (OMP_PROC_BIND): Likewise. * parallel.c (gomp_resolve_num_threads): Replace reference to nest_var with max_active_levels_var. Use max_active_levels_var field instead of gomp_max_active_levels_var.
2020-11-10openmp: Implement OpenMP 5.0 base-pointer attachement and clause orderingChung-Lin Tang1-4/+4
This patch implements some parts of the target variable mapping changes specified in OpenMP 5.0, including base-pointer attachment/detachment behavior for array section list-items in map clauses, and ordering of map clauses according to map kind. 2020-11-10 Chung-Lin Tang <cltang@codesourcery.com> gcc/c-family/ChangeLog: * c-common.h (c_omp_adjust_map_clauses): New declaration. * c-omp.c (struct map_clause): Helper type for c_omp_adjust_map_clauses. (c_omp_adjust_map_clauses): New function. gcc/c/ChangeLog: * c-parser.c (c_parser_omp_target_data): Add use of new c_omp_adjust_map_clauses function. Add GOMP_MAP_ATTACH_DETACH as handled map clause kind. (c_parser_omp_target_enter_data): Likewise. (c_parser_omp_target_exit_data): Likewise. (c_parser_omp_target): Likewise. * c-typeck.c (handle_omp_array_sections): Adjust COMPONENT_REF case to use GOMP_MAP_ATTACH_DETACH map kind for C_ORT_OMP region type. (c_finish_omp_clauses): Adjust bitmap checks to allow struct decl and same struct field access to co-exist on OpenMP construct. gcc/cp/ChangeLog: * parser.c (cp_parser_omp_target_data): Add use of new c_omp_adjust_map_clauses function. Add GOMP_MAP_ATTACH_DETACH as handled map clause kind. (cp_parser_omp_target_enter_data): Likewise. (cp_parser_omp_target_exit_data): Likewise. (cp_parser_omp_target): Likewise. * semantics.c (handle_omp_array_sections): Adjust COMPONENT_REF case to use GOMP_MAP_ATTACH_DETACH map kind for C_ORT_OMP region type. Fix interaction between reference case and attach/detach. (finish_omp_clauses): Adjust bitmap checks to allow struct decl and same struct field access to co-exist on OpenMP construct. gcc/ChangeLog: * gimplify.c (is_or_contains_p): New static helper function. (omp_target_reorder_clauses): New function. (gimplify_scan_omp_clauses): Add use of omp_target_reorder_clauses to reorder clause list according to OpenMP 5.0 rules. Add handling of GOMP_MAP_ATTACH_DETACH for OpenMP cases. * omp-low.c (is_omp_target): New static helper function. (scan_sharing_clauses): Add scan phase handling of GOMP_MAP_ATTACH/DETACH for OpenMP cases. (lower_omp_target): Add lowering handling of GOMP_MAP_ATTACH/DETACH for OpenMP cases. gcc/testsuite/ChangeLog: * c-c++-common/gomp/clauses-2.c: Remove dg-error cases now valid. * gfortran.dg/gomp/map-2.f90: Likewise. * c-c++-common/gomp/map-5.c: New testcase. libgomp/ChangeLog: * libgomp.h (enum gomp_map_vars_kind): Adjust enum values to be bit-flag usable. * oacc-mem.c (acc_map_data): Adjust gomp_map_vars argument flags to 'GOMP_MAP_VARS_OPENACC | GOMP_MAP_VARS_ENTER_DATA'. (goacc_enter_datum): Likewise for call to gomp_map_vars_async. (goacc_enter_data_internal): Likewise. * target.c (gomp_map_vars_internal): Change checks of GOMP_MAP_VARS_ENTER_DATA to use bit-and (&). Adjust use of gomp_attach_pointer for OpenMP cases. (gomp_exit_data): Add handling of GOMP_MAP_DETACH. (GOMP_target_enter_exit_data): Add handling of GOMP_MAP_ATTACH. * testsuite/libgomp.c-c++-common/ptr-attach-1.c: New testcase.
2020-10-20openmp: Implement support for OMP_TARGET_OFFLOAD environment variableKwok Cheung Yeung1-0/+8
This implements support for the OMP_TARGET_OFFLOAD environment variable introduced in the OpenMP 5.0 standard, which controls how offloading is handled. It may be set to MANDATORY (abort if offloading cannot be performed), DISABLED (no offloading to devices) or DEFAULT (offload to device if possible, fall back to host if not). 2020-10-20 Kwok Cheung Yeung <kcy@codesourcery.com> libgomp/ * env.c (gomp_target_offload_var): New. (parse_target_offload): New. (handle_omp_display_env): Print value of OMP_TARGET_OFFLOAD. (initialize_env): Parse OMP_TARGET_OFFLOAD. * libgomp.h (gomp_target_offload_t): New. (gomp_target_offload_var): New. * libgomp.texi (OMP_TARGET_OFFLOAD): New section. * target.c (resolve_device): Generate error if device not found and offloading is mandatory. (gomp_target_fallback): Generate error if offloading is mandatory. (GOMP_target): Add argument in call to gomp_target_fallback. (GOMP_target_ext): Likewise. (gomp_target_data_fallback): Generate error if offloading is mandatory. (GOMP_target_data): Add argument in call to gomp_target_data_fallback. (GOMP_target_data_ext): Likewise. (gomp_target_task_fn): Add argument in call to gomp_target_fallback. (gomp_target_init): Return early if offloading is disabled.
2020-10-13openmp: Add support for the omp_get_supported_active_levels runtime library ↵Kwok Cheung Yeung1-0/+2
routine This patch implements the omp_get_supported_active_levels runtime routine from the OpenMP 5.0 specification, which returns the maximum number of active nested parallel regions supported by this implementation. The current maximum (set using the omp_set_max_active_levels routine or the OMP_MAX_ACTIVE_LEVELS environment variable) cannot exceed this number. 2020-10-13 Kwok Cheung Yeung <kcy@codesourcery.com> libgomp/ * env.c (gomp_max_active_levels_var): Initialize to gomp_supported_active_levels. (initialize_env): Limit gomp_max_active_levels_var to be at most equal to gomp_supported_active_levels. * fortran.c (omp_get_supported_active_levels): Add ialias_redirect. (omp_get_supported_active_levels_): New. * icv.c (omp_set_max_active_levels): Limit gomp_max_active_levels_var to at most equal to gomp_supported_active_levels. (omp_get_supported_active_levels): New. * libgomp.h (gomp_supported_active_levels): New. * libgomp.map (OMP_5.0.1): Add omp_get_supported_active_levels and omp_get_supported_active_levels_. * libgomp.texi (omp_get_supported_active_levels): New. (omp_set_max_active_levels): Update. Add reference to omp_get_supported_active_levels. * omp.h.in (omp_get_supported_active_levels): New. * omp_lib.f90.in (omp_get_supported_active_levels): New. * omp_lib.h.in (omp_get_supported_active_levels): New. * testsuite/libgomp.c/lib-2.c (main): Check omp_get_max_active_levels against omp_get_supported_active_levels. * testsuite/libgomp.fortran/lib4.f90 (lib4): Likewise.
2020-09-15OpenMP/Fortran: Fix (re)mapping of allocatable/pointer arrays [PR96668]Tobias Burnus1-0/+3
gcc/cp/ChangeLog: PR fortran/96668 * cp-gimplify.c (cxx_omp_finish_clause): Add bool openacc arg. * cp-tree.h (cxx_omp_finish_clause): Likewise * semantics.c (handle_omp_for_class_iterator): Update call. gcc/fortran/ChangeLog: PR fortran/96668 * trans.h (gfc_omp_finish_clause): Add bool openacc arg. * trans-openmp.c (gfc_omp_finish_clause): Ditto. Use GOMP_MAP_ALWAYS_POINTER with PSET for pointers. (gfc_trans_omp_clauses): Like the latter and also if the always modifier is used. gcc/ChangeLog: PR fortran/96668 * gimplify.c (gimplify_omp_for): Add 'bool openacc' argument; update omp_finish_clause calls. (gimplify_adjust_omp_clauses_1, gimplify_adjust_omp_clauses, gimplify_expr, gimplify_omp_loop): Update omp_finish_clause and/or gimplify_for calls. * langhooks-def.h (lhd_omp_finish_clause): Add bool openacc arg. * langhooks.c (lhd_omp_finish_clause): Likewise. * langhooks.h (lhd_omp_finish_clause): Likewise. * omp-low.c (scan_sharing_clauses): Keep GOMP_MAP_TO_PSET cause for 'declare target' vars. include/ChangeLog: PR fortran/96668 * gomp-constants.h (GOMP_MAP_ALWAYS_POINTER_P): Define. libgomp/ChangeLog: PR fortran/96668 * libgomp.h (struct target_var_desc): Add has_null_ptr_assoc member. * target.c (gomp_map_vars_existing): Add always_to_flag flag. (gomp_map_vars_existing): Update call to it. (gomp_map_fields_existing): Likewise (gomp_map_vars_internal): Update PSET handling such that if a nullptr is now allocated or if GOMP_MAP_POINTER is used PSET is updated and pointer remapped. (GOMP_target_enter_exit_data): Hanlde GOMP_MAP_ALWAYS_POINTER like GOMP_MAP_POINTER. * testsuite/libgomp.fortran/map-alloc-ptr-1.f90: New test. * testsuite/libgomp.fortran/map-alloc-ptr-2.f90: New test.
2020-07-27openacc: Deep copy attach/detach should not affect reference countsJulian Brown1-2/+2
Attach and detach operations are not supposed to affect structural or dynamic reference counts for OpenACC. Previously they did so, which led to subtle problems in some circumstances. We can avoid reference-counting attach/detach operations by extending and slightly repurposing the do_detach field in target_var_desc. It is now called is_attach to better reflect its new role. 2020-07-27 Julian Brown <julian@codesourcery.com> Thomas Schwinge <thomas@codesourcery.com> libgomp/ * libgomp.h (struct target_var_desc): Rename do_detach field to is_attach. * oacc-mem.c (goacc_exit_datum_1): Add assert. Don't set finalize for GOMP_MAP_FORCE_DETACH. Update checking to use is_attach field. (goacc_enter_data_internal): Don't affect reference counts for attach mappings. (goacc_exit_data_internal): Don't affect reference counts for detach mappings. * target.c (gomp_map_vars_existing): Don't affect reference counts for attach mappings. (gomp_map_vars_internal): Set renamed is_attach flag unconditionally to mark attach mappings. (gomp_unmap_vars_internal): Use is_attach flag to prevent affecting reference count for attach mappings. * testsuite/libgomp.oacc-c-c++-common/mdc-refcount-1.c: New test. * testsuite/libgomp.oacc-c-c++-common/mdc-refcount-2.c: New test. * testsuite/libgomp.oacc-c-c++-common/mdc-refcount-2.c: New test. * testsuite/libgomp.oacc-fortran/deep-copy-6-no_finalize.F90: Mark test as shouldfail. * testsuite/libgomp.oacc-fortran/deep-copy-6.f90: Adjust to fail gracefully in no-finalize mode. Co-Authored-By: Thomas Schwinge <thomas@codesourcery.com>
2020-07-10openacc: Adjust dynamic reference count semanticsJulian Brown1-6/+2
This patch adjusts how dynamic reference counts work so that they match the semantics of the source program more closely, instead of representing "excess" reference counts beyond those that represent pointers in the internal libgomp splay-tree data structure. This allows some corner cases to be handled more gracefully. 2020-07-10 Julian Brown <julian@codesourcery.com> Thomas Schwinge <thomas@codesourcery.com> libgomp/ * libgomp.h (struct splay_tree_key_s): Change virtual_refcount to dynamic_refcount. (struct gomp_device_descr): Remove GOMP_MAP_VARS_OPENACC_ENTER_DATA. * oacc-mem.c (acc_map_data): Substitute virtual_refcount for dynamic_refcount. (acc_unmap_data): Update comment. (goacc_map_var_existing, goacc_enter_datum): Adjust for dynamic_refcount semantics. (goacc_exit_datum_1, goacc_exit_datum): Re-add some error checking. Adjust for dynamic_refcount semantics. (goacc_enter_data_internal): Implement "present" case of dynamic memory-map handling here. Update "non-present" case for dynamic_refcount semantics. (goacc_exit_data_internal): Use goacc_exit_datum_1. * target.c (gomp_map_vars_internal): Remove GOMP_MAP_VARS_OPENACC_ENTER_DATA handling. Update for dynamic_refcount handling. (gomp_unmap_vars_internal): Remove virtual_refcount handling. (gomp_load_image_to_device): Substitute dynamic_refcount for virtual_refcount. * testsuite/libgomp.oacc-c-c++-common/pr92843-1.c: Remove XFAILs. * testsuite/libgomp.oacc-c-c++-common/refcounting-1.c: New test. * testsuite/libgomp.oacc-c-c++-common/refcounting-2.c: New test. * testsuite/libgomp.oacc-c-c++-common/struct-3-1-1.c: New test. * testsuite/libgomp.oacc-fortran/deep-copy-6.f90: Remove XFAILs and trace output. * testsuite/libgomp.oacc-fortran/deep-copy-6-no_finalize.F90: Remove trace output. * testsuite/libgomp.oacc-fortran/dynamic-incr-structural-1.f90: New test. * testsuite/libgomp.oacc-c-c++-common/structured-dynamic-lifetimes-4.c: Remove stale comment. * testsuite/libgomp.oacc-fortran/mdc-refcount-1-1-1.f90: Remove XFAILs. * testsuite/libgomp.oacc-fortran/mdc-refcount-1-1-2.F90: Likewise. * testsuite/libgomp.oacc-fortran/mdc-refcount-1-2-1.f90: Likewise. * testsuite/libgomp.oacc-fortran/mdc-refcount-1-2-2.f90: Likewise. * testsuite/libgomp.oacc-fortran/mdc-refcount-1-3-1.f90: Likewise. * testsuite/libgomp.oacc-fortran/mdc-refcount-1-4-1.f90: Adjust XFAIL. Co-Authored-By: Thomas Schwinge <thomas@codesourcery.com>
2020-05-19openmp: Add basic library allocator support.Jakub Jelinek1-0/+4
This patch adds very basic allocator support (omp_{init,destroy}_allocator, omp_{alloc,free}, omp_[sg]et_default_allocator). The plan is to use memkind (likely dlopened) for high bandwidth memory, but that part isn't implemented yet, probably mlock for pinned memory and see what other options there are for other kinds of memory. For offloading targets, we need to decide if we want to support the dynamic allocators (and on which targets), or if e.g. all we do is at compile time replace omp_alloc/omp_free calls with constexpr predefined allocators with something special. And allocate directive and allocator/uses_allocators clauses are future work too. 2020-05-19 Jakub Jelinek <jakub@redhat.com> * omp.h.in (omp_uintptr_t): New typedef. (__GOMP_UINTPTR_T_ENUM): Define. (omp_memspace_handle_t, omp_allocator_handle_t, omp_alloctrait_key_t, omp_alloctrait_value_t, omp_alloctrait_t): New typedefs. (__GOMP_DEFAULT_NULL_ALLOCATOR): Define. (omp_init_allocator, omp_destroy_allocator, omp_set_default_allocator, omp_get_default_allocator, omp_alloc, omp_free): Declare. * libgomp.h (struct gomp_team_state): Add def_allocator field. (gomp_def_allocator): Declare. * libgomp.map (OMP_5.0.1): Export omp_set_default_allocator, omp_get_default_allocator, omp_init_allocator, omp_destroy_allocator, omp_alloc and omp_free. * team.c (gomp_team_start): Copy over ts.def_allocator. * env.c (gomp_def_allocator): New variable. (parse_wait_policy): Adjust function comment. (parse_allocator): New function. (handle_omp_display_env): Print OMP_ALLOCATOR. (initialize_env): Call parse_allocator. * Makefile.am (libgomp_la_SOURCES): Add allocator.c. * allocator.c: New file. * icv.c (omp_set_default_allocator, omp_get_default_allocator): New functions. * testsuite/libgomp.c-c++-common/alloc-1.c: New test. * testsuite/libgomp.c-c++-common/alloc-2.c: New test. * testsuite/libgomp.c-c++-common/alloc-3.c: New test. * Makefile.in: Regenerated.
2020-01-10OpenACC 'acc_get_property' cleanupThomas Schwinge1-1/+2
include/ * gomp-constants.h (enum gomp_device_property): Remove. libgomp/ * libgomp-plugin.h (enum goacc_property): New. Adjust all users to use this instead of 'enum gomp_device_property'. (GOMP_OFFLOAD_get_property): Rename to... (GOMP_OFFLOAD_openacc_get_property): ... this. Adjust all users. * libgomp.h (struct gomp_device_descr): Move 'GOMP_OFFLOAD_openacc_get_property'... (struct acc_dispatch_t): ... here. Adjust all users. * plugin/plugin-hsa.c (GOMP_OFFLOAD_get_property): Remove. liboffloadmic/ * plugin/libgomp-plugin-intelmic.cpp (GOMP_OFFLOAD_get_property): Remove. From-SVN: r280150
2020-01-10re PR libgomp/93219 (unused return value in affinity-fmt.c)Jakub Jelinek1-1/+1
PR libgomp/93219 * libgomp.h (gomp_print_string): Change return type from void to int. * affinity-fmt.c (gomp_print_string): Likewise. Return true if not all characters have been written. From-SVN: r280137
2020-01-01Update copyright years.Jakub Jelinek1-1/+1
From-SVN: r279813
2019-12-22Add OpenACC 2.6 `acc_get_property' supportMaciej W. Rozycki1-0/+1
Add generic support for the OpenACC 2.6 `acc_get_property' and `acc_get_property_string' routines, as well as full handlers for the host and the NVPTX offload targets and minimal handlers for the HSA, Intel MIC, and AMD GCN offload targets. Included are C/C++ and Fortran tests that, in particular, print the property values for acc_property_vendor, acc_property_memory, acc_property_free_memory, acc_property_name, and acc_property_driver. The output looks as follows: Vendor: GNU Name: GOMP Total memory: 0 Free memory: 0 Driver: 1.0 with the host driver (where the memory related properties are not supported for the host device and yield 0, conforming to the standard) and output like: Vendor: Nvidia Total memory: 12651462656 Free memory: 12202737664 Name: TITAN V Driver: CUDA Driver 9.1 with the NVPTX driver. 2019-12-22 Maciej W. Rozycki <macro@codesourcery.com> Frederik Harwath <frederik@codesourcery.com> Thomas Schwinge <tschwinge@codesourcery.com> include/ * gomp-constants.h (gomp_device_property): New enum. libgomp/ * libgomp.h (gomp_device_descr): Add `get_property_func' member. * libgomp-plugin.h (gomp_device_property_value): New union. (gomp_device_property_value): New prototype. * openacc.h (acc_device_t): Add `acc_device_current' enumeration constant. (acc_device_property_t): New enum. (acc_get_property, acc_get_property_string): New prototypes. * oacc-init.c (acc_get_device_type): Also assert that result is not `acc_device_current'. (get_property_any, acc_get_property, acc_get_property_string): New functions. * openacc.f90 (openacc_kinds): Add `acc_device_current' and `acc_property_memory', `acc_property_free_memory', `acc_property_name', `acc_property_vendor' and `acc_property_driver' constants. Add `acc_device_property' data type. (openacc_internal): Add `acc_get_property' and `acc_get_property_string' interfaces. Add `acc_get_property_h', `acc_get_property_string_h', `acc_get_property_l' and `acc_get_property_string_l'. * oacc-host.c (host_get_property): New function. (host_dispatch): Wire it. * target.c (gomp_load_plugin_for_device): Handle `get_property'. * libgomp.map (OACC_2.6): Add `acc_get_property', `acc_get_property_h_', `acc_get_property_string' and `acc_get_property_string_h_' symbols. * libgomp.texi (OpenACC Runtime Library Routines): Add `acc_get_property'. (acc_get_property): New node. * plugin/plugin-gcn.c (GOMP_OFFLOAD_get_property): New function (stub). * plugin/plugin-hsa.c (GOMP_OFFLOAD_get_property): New function. * plugin/plugin-nvptx.c (CUDA_CALLS): Add `cuDeviceGetName', `cuDeviceTotalMem', `cuDriverGetVersion' and `cuMemGetInfo' calls. (GOMP_OFFLOAD_get_property): New function. (struct ptx_device): Add new field "name". (cuda_driver_version_s): Add new static variable ... (nvptx_init): ... and init from here. * testsuite/libgomp.oacc-c-c++-common/acc_get_property.c: New test. * testsuite/libgomp.oacc-c-c++-common/acc_get_property-2.c: New test. * testsuite/libgomp.oacc-c-c++-common/acc_get_property-3.c: New test. * testsuite/libgomp.oacc-c-c++-common/acc_get_property-aux.c: New file with test helper functions. * testsuite/libgomp.oacc-fortran/acc_get_property.f90: New test. liboffloadmic/ * plugin/libgomp-plugin-intelmic.cpp (GOMP_OFFLOAD_get_property): New function. Reviewed-by: Thomas Schwinge <thomas@codesourcery.com> Co-Authored-By: Frederik Harwath <frederik@codesourcery.com> Co-Authored-By: Thomas Schwinge <tschwinge@codesourcery.com> From-SVN: r279710
2019-12-20OpenACC 2.6 deep copy: libgomp partsJulian Brown1-0/+2
include/ * gomp-constants.h (GOMP_MAP_FLAG_SPECIAL_4, GOMP_MAP_DEEP_COPY): Define. (gomp_map_kind): Add GOMP_MAP_ATTACH, GOMP_MAP_DETACH, GOMP_MAP_FORCE_DETACH. libgomp/ * libgomp.h (struct target_var_desc): Add do_detach flag. * oacc-init.c (acc_shutdown_1): Free aux block if present. * oacc-mem.c (find_group_last): Add SIZES parameter. Support struct components. Tidy up and add some new checks. (goacc_enter_data_internal): Update call to find_group_last. (goacc_exit_data_internal): Support detach operations and GOMP_MAP_STRUCT. (GOACC_enter_exit_data): Handle initial GOMP_MAP_STRUCT or GOMP_MAP_FORCE_PRESENT in finalization detection code. Handle attach/detach in enter/exit data detection code. * target.c (gomp_map_vars_existing): Initialise do_detach field of tgt_var_desc. (gomp_map_vars_internal): Support attach. (gomp_unmap_vars_internal): Support detach. From-SVN: r279625
2019-12-20OpenACC 2.6 deep copy: attach/detach API routinesJulian Brown1-0/+10
libgomp/ * libgomp.h (struct splay_tree_aux): Add attach_count field. (gomp_attach_pointer, gomp_detach_pointer): Add prototypes. * libgomp.map (OACC_2.6): New section. Add acc_attach, acc_attach_async, acc_detach, acc_detach_async, acc_detach_finalize, acc_detach_finalize_async. * oacc-mem.c (acc_attach_async, acc_attach, goacc_detach_internal, acc_detach, acc_detach_async, acc_detach_finalize, acc_detach_finalize_async): New functions. * openacc.h (acc_attach, acc_attach_async, acc_detach, (acc_detach_async, acc_detach_finalize, acc_detach_finalize_async): Add prototypes. * target.c (gomp_attach_pointer, gomp_detach_pointer): New functions. (gomp_remove_var_internal): Free attachment counts if present. * testsuite/libgomp.oacc-c-c++-common/deep-copy-3.c: New test. * testsuite/libgomp.oacc-c-c++-common/deep-copy-5.c: New test. Co-Authored-By: Thomas Schwinge <thomas@codesourcery.com> From-SVN: r279624
2019-12-20Use gomp_map_val for OpenACC host-to-device address translationJulian Brown1-0/+1
libgomp/ * libgomp.h (gomp_map_val): Add prototype. * oacc-parallel.c (GOACC_parallel_keyed): Use gomp_map_val instead of open-coding device-address calculation. * target.c (gomp_map_val): Make global. Use OFFSET_POINTER in non-present case. Co-Authored-By: Cesar Philippidis <cesar@codesourcery.com> From-SVN: r279622
2019-12-20OpenACC reference count overhaulJulian Brown1-3/+6
libgomp/ * libgomp.h (struct splay_tree_key_s): Substitute dynamic_refcount field for virtual_refcount. (enum gomp_map_vars_kind): Add GOMP_MAP_VARS_OPENACC_ENTER_DATA. (gomp_free_memmap): Remove prototype. * oacc-init.c (acc_shutdown_1): Iteratively call gomp_remove_var instead of calling gomp_free_memmap. * oacc-mem.c (acc_map_data): Use virtual_refcount instead of dynamic_refcount. (acc_unmap_data): Open code instead of forcing target_mem_desc's to_free field to NULL then calling gomp_unmap_vars. Handle REFCOUNT_INFINITY on target blocks. (goacc_enter_data): Rename to... (goacc_enter_datum): ...this. Remove MAPNUM parameter and special handling for mapping groups. Use virtual_refcount instead of dynamic_refcount. Use GOMP_MAP_VARS_OPENACC_ENTER_DATA for map_map_vars_async call. Re-do lookup for target pointer return value. (acc_create, acc_create_async, acc_copyin, acc_copyin_async): Call renamed goacc_enter_datum function. (goacc_exit_data): Rename to... (goacc_exit_datum): ...this. Update for virtual_refcount semantics. (acc_delete, acc_delete_async, acc_delete_finalize, acc_delete_finalize_async, acc_copyout, acc_copyout_async, acc_copyout_finalize, acc_copyout_finalize_async): Call renamed goacc_exit_datum function. (gomp_acc_remove_pointer, find_pointer): Remove functions. (find_group_last, goacc_enter_data_internal, goacc_exit_data_internal): New functions. (GOACC_enter_exit_data): Use goacc_enter_data_internal and goacc_exit_data_internal helper functions. * target.c (gomp_map_vars_internal): Handle GOMP_MAP_VARS_OPENACC_ENTER_DATA. Update for virtual_refcount semantics. (gomp_unmap_vars_internal): Update for virtual_refcount semantics. (gomp_load_image_to_device, omp_target_associate_ptr): Zero-initialise virtual_refcount field instead of dynamic_refcount. (gomp_free_memmap): Remove function. * testsuite/libgomp.oacc-c-c++-common/unmap-infinity-1.c: New test. * testsuite/libgomp.c-c++-common/unmap-infinity-2.c: New test. * testsuite/libgomp.oacc-c-c++-common/pr92843-1.c: Add XFAIL. From-SVN: r279621
2019-12-20Use aux struct in libgomp for infrequently-used/API-specific dataJulian Brown1-2/+8
libgomp/ * libgomp.h (struct splay_tree_aux): New. (struct splay_tree_key_s): Replace link_key field with aux pointer. * target.c (gomp_map_vars_internal): Adjust for link_key being moved to aux struct. (gomp_remove_var_internal): Free aux block if present. (gomp_load_image_to_device): Zero-initialise aux field instead of link_key field. (omp_target_associate_pointer): Zero-initialise aux field. Co-Authored-By: Thomas Schwinge <thomas@codesourcery.com> From-SVN: r279620
2019-12-18Make 'libgomp/target.c:gomp_unmap_tgt' 'static' againThomas Schwinge1-1/+0
This got changed to 'attribute_hidden' in r271128, but it's not actually used outside of 'libgomp/target.c'. libgomp/ * target.c (gomp_unmap_tgt): Make it 'static'. * libgomp.h (gomp_unmap_tgt): Remove. From-SVN: r279529
2019-12-13Fix potential race condition in OpenACC "exit data" operationsJulian Brown1-0/+2
PR libgomp/92881 libgomp/ * libgomp.h (gomp_remove_var_async): Add prototype. * oacc-mem.c (delete_copyout): Call gomp_remove_var_async instead of gomp_remove_var. * target.c (gomp_unref_tgt): Change return type to bool, indicating whether target_mem_desc was unmapped. (gomp_unref_tgt_void): New. (gomp_remove_var): Reimplement in terms of... (gomp_remove_var_internal): ...this new helper function. (gomp_remove_var_async): New, implemented using above helper function. (gomp_unmap_vars_internal): Use gomp_unref_tgt_void instead of gomp_unref_tgt. Reviewed-by: Thomas Schwinge <thomas@codesourcery.com> From-SVN: r279388
2019-12-11[OpenACC] Consolidate 'GOACC_enter_exit_data' and its helper functions in ↵Thomas Schwinge1-2/+0
'libgomp/oacc-mem.c' libgomp/ * oacc-parallel.c (find_pointer, GOACC_enter_exit_data): Move... * oacc-mem.c: ... here. (gomp_acc_insert_pointer, gomp_acc_remove_pointer): Rename to 'goacc_insert_pointer', 'goacc_remove_pointer', and make 'static'. * libgomp.h (gomp_acc_insert_pointer, gomp_acc_remove_pointer): Remove. * libgomp_g.h: Update. From-SVN: r279233
2019-12-09[PR92116, PR92877] [OpenACC] Replace 'openacc.data_environ' by standard ↵Thomas Schwinge1-9/+1
libgomp mechanics libgomp/ PR libgomp/92116 PR libgomp/92877 * oacc-mem.c (lookup_dev): Reimplement. Adjust all users. * libgomp.h (struct acc_dispatch_t): Remove 'data_environ' member. Adjust all users. * testsuite/libgomp.oacc-c-c++-common/acc_free-pr92503-4-2.c: Remove XFAIL. * testsuite/libgomp.oacc-c-c++-common/acc_free-pr92503-4.c: Likewise. * testsuite/libgomp.oacc-c-c++-common/pr92877-1.c: New file. Co-Authored-By: Julian Brown <julian@codesourcery.com> From-SVN: r279147
2019-11-13Optimize GCN OpenMP malloc performanceAndrew Stubbs1-0/+63
2019-11-13 Andrew Stubbs <ams@codesourcery.com> libgomp/ * config/gcn/team.c (gomp_gcn_enter_kernel): Set up the team arena and use team_malloc variants. (gomp_gcn_exit_kernel): Use team_free. * libgomp.h (TEAM_ARENA_SIZE): Define. (TEAM_ARENA_START): Define. (TEAM_ARENA_FREE): Define. (TEAM_ARENA_END): Define. (team_malloc): New function. (team_malloc_cleared): New function. (team_free): New function. * team.c (gomp_new_team): Initialize and use team_malloc. (free_team): Use team_free. (gomp_free_thread): Use team_free. (gomp_pause_host): Use team_free. * work.c (gomp_init_work_share): Use team_malloc. (gomp_fini_work_share): Use team_free. From-SVN: r278136
2019-11-13GCN libgomp portAndrew Stubbs1-0/+18
2019-11-13 Andrew Stubbs <ams@codesourcery.com> Kwok Cheung Yeung <kcy@codesourcery.com> Julian Brown <julian@codesourcery.com> Tom de Vries <tom@codesourcery.com> include/ * gomp-constants.h (GOMP_DEVICE_GCN): Define. (GOMP_VERSION_GCN): Define. libgomp/ * Makefile.am (libgomp_la_SOURCES): Add oacc-target.c. * Makefile.in: Regenerate. * config.h.in (PLUGIN_GCN): Add new undef. * config/accel/openacc.f90 (acc_device_gcn): New parameter. * config/gcn/affinity-fmt.c: New file. * config/gcn/bar.c: New file. * config/gcn/bar.h: New file. * config/gcn/doacross.h: New file. * config/gcn/icv-device.c: New file. * config/gcn/oacc-target.c: New file. * config/gcn/simple-bar.h: New file. * config/gcn/target.c: New file. * config/gcn/task.c: New file. * config/gcn/team.c: New file. * config/gcn/time.c: New file. * configure.ac: Add amdgcn*-*-*. * configure: Regenerate. * configure.tgt: Add amdgcn*-*-*. * libgomp-plugin.h (offload_target_type): Add OFFLOAD_TARGET_TYPE_GCN. * libgomp.h (gcn_thrs): Add amdgcn variant. (set_gcn_thrs): Likewise. (gomp_thread): Likewise. * oacc-int.h (goacc_thread): Likewise. * oacc-target.c: New file. * openacc.f90 (acc_device_gcn): New parameter. * openacc.h (acc_device_t): Add acc_device_gcn. * team.c (gomp_free_pool_helper): Add amdgcn support. Co-Authored-By: Julian Brown <julian@codesourcery.com> Co-Authored-By: Kwok Cheung Yeung <kcy@codesourcery.com> Co-Authored-By: Tom de Vries <tom@codesourcery.com> From-SVN: r278135
2019-10-03Libgomp magic offset value self-documentationJulian Brown1-0/+5
2019-10-02 Julian Brown <julian@codesourcery.com> Cesar Philippidis <cesar@codesourcery.com> libgomp/ * libgomp.h (OFFSET_INLINED, OFFSET_POINTER, OFFSET_STRUCT): Define. * target.c (FIELD_TGT_EMPTY): Define. (gomp_map_val): Use OFFSET_* macros instead of magic constants. Write as switch instead of list of ifs. (gomp_map_vars_internal): Use OFFSET_* and FIELD_TGT_EMPTY macros. Co-Authored-By: Cesar Philippidis <cesar@codesourcery.com> From-SVN: r276519
2019-10-01configure.ac: Remove GCC_HEADER_STDINT(gstdint.h).Jakub Jelinek1-1/+1
* configure.ac: Remove GCC_HEADER_STDINT(gstdint.h). * libgomp.h: Include <stdint.h> instead of "gstdint.h". * oacc-parallel.c: Don't include "libgomp_g.h". * plugin/plugin-hsa.c: Include <stdint.h> instead of "gstdint.h". * plugin/plugin-nvptx.c: Don't include "gstdint.h". * aclocal.m4: Regenerated. * config.h.in: Regenerated. * configure: Regenerated. * Makefile.in: Regenerated. From-SVN: r276389
2019-05-132019-05-13 Chung-Lin Tang <cltang@codesourcery.com>Chung-Lin Tang1-15/+38
Reviewed-by: Thomas Schwinge <thomas@codesourcery.com> libgomp/ * libgomp-plugin.h (struct goacc_asyncqueue): Declare. (struct goacc_asyncqueue_list): Likewise. (goacc_aq): Likewise. (goacc_aq_list): Likewise. (GOMP_OFFLOAD_openacc_register_async_cleanup): Remove. (GOMP_OFFLOAD_openacc_async_test): Remove. (GOMP_OFFLOAD_openacc_async_test_all): Remove. (GOMP_OFFLOAD_openacc_async_wait): Remove. (GOMP_OFFLOAD_openacc_async_wait_async): Remove. (GOMP_OFFLOAD_openacc_async_wait_all): Remove. (GOMP_OFFLOAD_openacc_async_wait_all_async): Remove. (GOMP_OFFLOAD_openacc_async_set_async): Remove. (GOMP_OFFLOAD_openacc_exec): Adjust declaration. (GOMP_OFFLOAD_openacc_cuda_get_stream): Likewise. (GOMP_OFFLOAD_openacc_cuda_set_stream): Likewise. (GOMP_OFFLOAD_openacc_async_exec): Declare. (GOMP_OFFLOAD_openacc_async_construct): Declare. (GOMP_OFFLOAD_openacc_async_destruct): Declare. (GOMP_OFFLOAD_openacc_async_test): Declare. (GOMP_OFFLOAD_openacc_async_synchronize): Declare. (GOMP_OFFLOAD_openacc_async_serialize): Declare. (GOMP_OFFLOAD_openacc_async_queue_callback): Declare. (GOMP_OFFLOAD_openacc_async_host2dev): Declare. (GOMP_OFFLOAD_openacc_async_dev2host): Declare. * libgomp.h (struct acc_dispatch_t): Define 'async' sub-struct. (gomp_acc_insert_pointer): Adjust declaration. (gomp_copy_host2dev): New declaration. (gomp_copy_dev2host): Likewise. (gomp_map_vars_async): Likewise. (gomp_unmap_tgt): Likewise. (gomp_unmap_vars_async): Likewise. (gomp_fini_device): Likewise. * oacc-async.c (get_goacc_thread): New function. (get_goacc_thread_device): New function. (lookup_goacc_asyncqueue): New function. (get_goacc_asyncqueue): New function. (acc_async_test): Adjust code to use new async design. (acc_async_test_all): Likewise. (acc_wait): Likewise. (acc_wait_async): Likewise. (acc_wait_all): Likewise. (acc_wait_all_async): Likewise. (goacc_async_free): New function. (goacc_init_asyncqueues): Likewise. (goacc_fini_asyncqueues): Likewise. * oacc-cuda.c (acc_get_cuda_stream): Adjust code to use new async design. (acc_set_cuda_stream): Likewise. * oacc-host.c (host_openacc_exec): Adjust parameters, remove 'async'. (host_openacc_register_async_cleanup): Remove. (host_openacc_async_exec): New function. (host_openacc_async_test): Adjust parameters. (host_openacc_async_test_all): Remove. (host_openacc_async_wait): Remove. (host_openacc_async_wait_async): Remove. (host_openacc_async_wait_all): Remove. (host_openacc_async_wait_all_async): Remove. (host_openacc_async_set_async): Remove. (host_openacc_async_synchronize): New function. (host_openacc_async_serialize): New function. (host_openacc_async_host2dev): New function. (host_openacc_async_dev2host): New function. (host_openacc_async_queue_callback): New function. (host_openacc_async_construct): New function. (host_openacc_async_destruct): New function. (struct gomp_device_descr host_dispatch): Remove initialization of old interface, add intialization of new async sub-struct. * oacc-init.c (acc_shutdown_1): Adjust to use gomp_fini_device. (goacc_attach_host_thread_to_device): Remove old async code usage. * oacc-int.h (goacc_init_asyncqueues): New declaration. (goacc_fini_asyncqueues): Likewise. (goacc_async_copyout_unmap_vars): Likewise. (goacc_async_free): Likewise. (get_goacc_asyncqueue): Likewise. (lookup_goacc_asyncqueue): Likewise. * oacc-mem.c (memcpy_tofrom_device): Adjust code to use new async design. (present_create_copy): Adjust code to use new async design. (delete_copyout): Likewise. (update_dev_host): Likewise. (gomp_acc_insert_pointer): Add async parameter, adjust code to use new async design. (gomp_acc_remove_pointer): Adjust code to use new async design. * oacc-parallel.c (GOACC_parallel_keyed): Adjust code to use new async design. (GOACC_enter_exit_data): Likewise. (goacc_wait): Likewise. (GOACC_update): Likewise. * oacc-plugin.c (GOMP_PLUGIN_async_unmap_vars): Change to assert fail when called, warn as obsolete in comment. * target.c (goacc_device_copy_async): New function. (gomp_copy_host2dev): Remove 'static', add goacc_asyncqueue parameter, add goacc_device_copy_async case. (gomp_copy_dev2host): Likewise. (gomp_map_vars_existing): Add goacc_asyncqueue parameter, adjust code. (gomp_map_pointer): Likewise. (gomp_map_fields_existing): Likewise. (gomp_map_vars_internal): New always_inline function, renamed from gomp_map_vars. (gomp_map_vars): Implement by calling gomp_map_vars_internal. (gomp_map_vars_async): Implement by calling gomp_map_vars_internal, passing goacc_asyncqueue argument. (gomp_unmap_tgt): Remove static, add attribute_hidden. (gomp_unref_tgt): New function. (gomp_unmap_vars_internal): New always_inline function, renamed from gomp_unmap_vars. (gomp_unmap_vars): Implement by calling gomp_unmap_vars_internal. (gomp_unmap_vars_async): Implement by calling gomp_unmap_vars_internal, passing goacc_asyncqueue argument. (gomp_fini_device): New function. (gomp_exit_data): Adjust gomp_copy_dev2host call. (gomp_load_plugin_for_device): Remove old interface, adjust to load new async interface. (gomp_target_fini): Adjust code to call gomp_fini_device. * plugin/plugin-nvptx.c (struct cuda_map): Remove. (struct ptx_stream): Remove. (struct nvptx_thread): Remove current_stream field. (cuda_map_create): Remove. (cuda_map_destroy): Remove. (map_init): Remove. (map_fini): Remove. (map_pop): Remove. (map_push): Remove. (struct goacc_asyncqueue): Define. (struct nvptx_callback): Define. (struct ptx_free_block): Define. (struct ptx_device): Remove null_stream, active_streams, async_streams, stream_lock, and next fields. (enum ptx_event_type): Remove. (struct ptx_event): Remove. (ptx_event_lock): Remove. (ptx_events): Remove. (init_streams_for_device): Remove. (fini_streams_for_device): Remove. (select_stream_for_async): Remove. (nvptx_init): Remove ptx_events and ptx_event_lock references. (nvptx_attach_host_thread_to_device): Remove CUDA_ERROR_NOT_PERMITTED case. (nvptx_open_device): Add free_blocks initialization, remove init_streams_for_device call. (nvptx_close_device): Remove fini_streams_for_device call, add free_blocks destruct code. (event_gc): Remove. (event_add): Remove. (nvptx_exec): Adjust parameters and code. (nvptx_free): Likewise. (nvptx_host2dev): Remove. (nvptx_dev2host): Remove. (nvptx_set_async): Remove. (nvptx_async_test): Remove. (nvptx_async_test_all): Remove. (nvptx_wait): Remove. (nvptx_wait_async): Remove. (nvptx_wait_all): Remove. (nvptx_wait_all_async): Remove. (nvptx_get_cuda_stream): Remove. (nvptx_set_cuda_stream): Remove. (GOMP_OFFLOAD_alloc): Adjust code. (GOMP_OFFLOAD_free): Likewise. (GOMP_OFFLOAD_openacc_register_async_cleanup): Remove. (GOMP_OFFLOAD_openacc_exec): Adjust parameters and code. (GOMP_OFFLOAD_openacc_async_test_all): Remove. (GOMP_OFFLOAD_openacc_async_wait): Remove. (GOMP_OFFLOAD_openacc_async_wait_async): Remove. (GOMP_OFFLOAD_openacc_async_wait_all): Remove. (GOMP_OFFLOAD_openacc_async_wait_all_async): Remove. (GOMP_OFFLOAD_openacc_async_set_async): Remove. (cuda_free_argmem): New function. (GOMP_OFFLOAD_openacc_async_exec): New plugin hook function. (GOMP_OFFLOAD_openacc_create_thread_data): Adjust code. (GOMP_OFFLOAD_openacc_cuda_get_stream): Adjust code. (GOMP_OFFLOAD_openacc_cuda_set_stream): Adjust code. (GOMP_OFFLOAD_openacc_async_construct): New plugin hook function. (GOMP_OFFLOAD_openacc_async_destruct): New plugin hook function. (GOMP_OFFLOAD_openacc_async_test): Remove and re-implement. (GOMP_OFFLOAD_openacc_async_synchronize): New plugin hook function. (GOMP_OFFLOAD_openacc_async_serialize): New plugin hook function. (GOMP_OFFLOAD_openacc_async_queue_callback): New plugin hook function. (cuda_callback_wrapper): New function. (cuda_memcpy_sanity_check): New function. (GOMP_OFFLOAD_host2dev): Remove and re-implement. (GOMP_OFFLOAD_dev2host): Remove and re-implement. (GOMP_OFFLOAD_openacc_async_host2dev): New plugin hook function. (GOMP_OFFLOAD_openacc_async_dev2host): New plugin hook function. From-SVN: r271128
2019-01-01Update copyright years.Jakub Jelinek1-1/+1
From-SVN: r267494