aboutsummaryrefslogtreecommitdiff
path: root/libgomp/plugin
AgeCommit message (Collapse)AuthorFilesLines
2024-06-06nvptx offloading: 'GOMP_NVPTX_NATIVE_GPU_THREAD_STACK_SIZE' environment ↵Thomas Schwinge2-0/+46
variable [PR97384, PR105274] ... as a means to manually set the "native" GPU thread stack size. PR libgomp/97384 PR libgomp/105274 libgomp/ * plugin/cuda-lib.def (cuCtxSetLimit): Add. * plugin/plugin-nvptx.c (nvptx_open_device): Handle 'GOMP_NVPTX_NATIVE_GPU_THREAD_STACK_SIZE' environment variable.
2024-06-06nvptx offloading: Global constructor, destructor support, via nvptx-tools 'ld'Thomas Schwinge1-1/+116
This extends commit d9c90c82d900fdae95df4499bf5f0a4ecb903b53 "nvptx target: Global constructor, destructor support, via nvptx-tools 'ld'" for offloading. libgcc/ * config/nvptx/gbl-ctors.c ["mgomp"] (__do_global_ctors__entry__mgomp) (__do_global_dtors__entry__mgomp): New. [!"mgomp"] (__do_global_ctors__entry, __do_global_dtors__entry): New. libgomp/ * plugin/plugin-nvptx.c (nvptx_do_global_cdtors): New. (nvptx_close_device, GOMP_OFFLOAD_load_image) (GOMP_OFFLOAD_unload_image): Call it.
2024-05-29libgomp: Enable USM for AMD APUs and MI200 devicesTobias Burnus1-0/+17
If HSA_AMD_SYSTEM_INFO_SVM_ACCESSIBLE_BY_DEFAULT is true, all GPUs on the system support unified shared memory. That's the case for APUs and MI200 devices when XNACK is enabled. XNACK can be enabled by setting HSA_XNACK=1 as env var for supported devices; otherwise, if disable, USM code will use host fallback. gcc/ChangeLog: * config/gcn/gcn-hsa.h (gcn_local_sym_hash): Fix typo. include/ChangeLog: * hsa.h (HSA_AMD_SYSTEM_INFO_SVM_ACCESSIBLE_BY_DEFAULT): Add enum value. libgomp/ChangeLog: * libgomp.texi (gcn): Update USM handling * plugin/plugin-gcn.c (GOMP_OFFLOAD_get_num_devices): Handle USM if HSA_AMD_SYSTEM_INFO_SVM_ACCESSIBLE_BY_DEFAULT is true.
2024-05-29libgomp: Enable USM for some nvptx devicesTobias Burnus1-0/+15
A few high-end nvptx devices support the attribute CU_DEVICE_ATTRIBUTE_PAGEABLE_MEMORY_ACCESS; for those, unified shared memory is supported in hardware. This patch enables support for those - if all installed nvptx devices have this feature (as the capabilities are per device type). This exposes a bug in gomp_copy_back_icvs as it did before use omp_get_mapped_ptr to find mapped variables, but that returns the unchanged pointer in cased of shared memory. But in this case, we have a few actually mapped pointers - like the ICV variables. Additionally, there was a mismatch with regards to '-1' for the device number as gomp_copy_back_icvs and omp_get_mapped_ptr count differently. Hence, do the lookup manually. include/ChangeLog: * cuda/cuda.h (CU_DEVICE_ATTRIBUTE_PAGEABLE_MEMORY_ACCESS): Add. libgomp/ChangeLog: * libgomp.texi (nvptx): Update USM description. * plugin/plugin-nvptx.c (GOMP_OFFLOAD_get_num_devices): Claim support when requesting USM and all devices support CU_DEVICE_ATTRIBUTE_PAGEABLE_MEMORY_ACCESS. * target.c (gomp_copy_back_icvs): Fix device ptr lookup. (gomp_target_init): Set GOMP_OFFLOAD_CAP_SHARED_MEM is the devices supports USM.
2024-04-26amdgcn: Add gfx90c targetFrederik Harwath1-0/+9
Add support for gfx90c GCN5 APU integrated graphics devices. The LLVM AMDGPU documentation does not list those devices as supported by rocm-amdhsa, but it passes most libgomp offloading tests. Although they are constrainted compared to dGPUs, they might be interesting for learning, experimentation, and testing. gcc/ChangeLog: * config.gcc: Add gfx90c. * config/gcn/gcn-hsa.h (NO_SRAM_ECC): Likewise. * config/gcn/gcn-opts.h (enum processor_type): Likewise. (TARGET_GFX90c): New macro. * config/gcn/gcn.cc (gcn_option_override): Handle gfx90c. (gcn_omp_device_kind_arch_isa): Likewise. (output_file_start): Likewise. * config/gcn/gcn.h: Add gfx90c. * config/gcn/gcn.opt: Likewise. * config/gcn/mkoffload.cc (EF_AMDGPU_MACH_AMDGCN_GFX90c): New macro. (get_arch): Handle gfx90c. (main): Handle EF_AMDGPU_MACH_AMDGCN_GFX90c * config/gcn/t-omp-device: Add gfx90c. * doc/install.texi: Likewise. * doc/invoke.texi: Likewise. libgomp/ChangeLog: * plugin/plugin-gcn.c (isa_hsa_name): Handle EF_AMDGPU_MACH_AMDGCN_GFX90c. (isa_code): Handle gfx90c. (max_isa_vgprs): Handle EF_AMDGPU_MACH_AMDGCN_GFX90c. Signed-off-by: Frederik Harwath <frederik@harwath.name>
2024-04-08GCN, nvptx: Errors during device probing are fatalThomas Schwinge2-7/+11
Currently, we silently disable libgomp GCN and nvptx plugins/devices in presence of certain error conditions during device probing, thus typically silently resorting to host-fallback execution. Make such errors fatal, similar as for any other device access later on, so that we early and reliably notice when things go wrong. (Keep just two cases non-fatal: (a) libgomp GCN or nvptx plugins are available but 'libhsa-runtime64.so.1' or 'libcuda.so.1' are not, and (b) those are available, but the corresponding devices are not.) This resolves the issue that we've got execution test cases unexpectedly PASSing, despite: libgomp: GCN fatal error: Run-time could not be initialized Runtime message: HSA_STATUS_ERROR_OUT_OF_RESOURCES: The runtime failed to allocate the necessary resources. This error may also occur when the core runtime library needs to spawn threads or create internal OS-specific events. ..., and therefore they were not offloaded to the GCN device, but ran in host-fallback execution mode. What happend in that scenario is that in 'init_hsa_context' during the initial 'GOMP_OFFLOAD_get_num_devices' we ran into 'HSA_STATUS_ERROR_OUT_OF_RESOURCES', but it wasn't fatal, but just silently disabled the libgomp plugin/device. Especially "entertaining" were cases where such unintended host-fallback execution happened during effective-target checks like 'offload_device_available' (host-fallback execution there meaning: no offload device available), but actual test cases then were running with an offload device available, and therefore mis-configured. include/ * cuda/cuda.h (CUresult): Add 'CUDA_ERROR_NO_DEVICE'. libgomp/ * plugin/plugin-gcn.c (init_hsa_context): Add and handle 'bool probe' parameter. Adjust all users; errors during device probing are fatal. * plugin/plugin-nvptx.c (nvptx_get_num_devices): Aside from 'CUDA_ERROR_NO_DEVICE', errors during device probing are fatal.
2024-03-25amdgcn: Add gfx1036 targetRichard Biener1-0/+8
Add support for the gfx1036 RDNA2 APU integrated graphics devices. The ROCm documentation warns that these may not be supported, but it seems to work at least partially. gcc/ChangeLog: * config.gcc (amdgcn): Add gfx1036 entries. * config/gcn/gcn-hsa.h (NO_XNACK): Likewise. (gcn_local_sym_hash): Likewise. * config/gcn/gcn-opts.h (enum processor_type): Likewise. (TARGET_GFX1036): New macro. * config/gcn/gcn.cc (gcn_option_override): Handle gfx1036. (gcn_omp_device_kind_arch_isa): Likewise. (output_file_start): Likewise. * config/gcn/gcn.h (TARGET_CPU_CPP_BUILTINS): Add __gfx1036__. (TARGET_CPU_CPP_BUILTINS): Rename __gfx1030 to __gfx1030__. * config/gcn/gcn.opt: Add gfx1036. * config/gcn/mkoffload.cc (EF_AMDGPU_MACH_AMDGCN_GFX1036): New. (main): Handle gfx1036. * config/gcn/t-omp-device: Add gfx1036 isa. * doc/install.texi (amdgcn): Add gfx1036. * doc/invoke.texi (-march): Likewise. libgomp/ChangeLog: * plugin/plugin-gcn.c (EF_AMDGPU_MACH): GFX1036. (gcn_gfx1103_s): New. (isa_hsa_name): Handle gfx1036. (isa_code): Likewise. (max_isa_vgprs): Likewise.
2024-03-22amdgcn: Add gfx1103 targetAndrew Stubbs1-1/+9
Add support for the gfx1103 RDNA3 APU integrated graphics devices. The ROCm documentation warns that these may not be supported, but it seems to work at least partially. gcc/ChangeLog: * config.gcc (amdgcn): Add gfx1103 entries. * config/gcn/gcn-hsa.h (NO_XNACK): Likewise. (gcn_local_sym_hash): Likewise. * config/gcn/gcn-opts.h (enum processor_type): Likewise. (TARGET_GFX1103): New macro. * config/gcn/gcn.cc (gcn_option_override): Handle gfx1103. (gcn_omp_device_kind_arch_isa): Likewise. (output_file_start): Likewise. (gcn_hsa_declare_function_name): Use TARGET_RDNA3, not just gfx1100. * config/gcn/gcn.h (TARGET_CPU_CPP_BUILTINS): Add __gfx1103__. * config/gcn/gcn.opt: Add gfx1103. * config/gcn/mkoffload.cc (EF_AMDGPU_MACH_AMDGCN_GFX1103): New. (main): Handle gfx1103. * config/gcn/t-omp-device: Add gfx1103 isa. * doc/install.texi (amdgcn): Add gfx1103. * doc/invoke.texi (-march): Likewise. libgomp/ChangeLog: * plugin/plugin-gcn.c (EF_AMDGPU_MACH): GFX1103. (gcn_gfx1103_s): New. (isa_hsa_name): Handle gfx1103. (isa_code): Likewise. (max_isa_vgprs): Likewise.
2024-03-08GCN: The original meaning of 'GCN_SUPPRESS_HOST_FALLBACK' isn't applicable ↵Thomas Schwinge1-9/+5
(non-shared memory system) 'GCN_SUPPRESS_HOST_FALLBACK' originated as 'HSA_SUPPRESS_HOST_FALLBACK' in the libgomp HSA plugin, where the idea was -- in my understanding -- that you wouldn't have device code available for all functions that may be called, and in that case transparently (shared memory system!) do host-fallback execution. Or, with 'HSA_SUPPRESS_HOST_FALLBACK' set, you'd get those diagnosed. This has then been copied into the libgomp GCN plugin as 'GCN_SUPPRESS_HOST_FALLBACK'. However, the original meaning isn't applicable for the libgomp GCN plugin anymore: we assume that we're generating device code for all relevant functions, and we're implementing a non-shared memory system, where we cannot transparently do host-fallback execution for individual functions. However, 'GCN_SUPPRESS_HOST_FALLBACK' has gained an additional meaning, to enforce a fatal error in case that 'libhsa-runtime64.so.1' can't be dynamically loaded; keep that meaning. libgomp/ * plugin/plugin-gcn.c (GOMP_OFFLOAD_can_run): Don't consider 'GCN_SUPPRESS_HOST_FALLBACK' anymore (assume always-'true'). (init_hsa_context): Adjust 'GCN_SUPPRESS_HOST_FALLBACK' error message.
2024-03-08nvptx: 'cuDeviceGetCount' failure is fatalThomas Schwinge1-1/+1
Per commit 683f11843974f0bdf42f79cdcbb0c2b43c7b81b0 "OpenMP: Move omp requires checks to libgomp", we're now using 'return -1' from 'GOMP_OFFLOAD_get_num_devices' for 'omp_requires_mask' purposes. This missed that via 'nvptx_get_num_devices', we could also 'return -1' for 'cuDeviceGetCount' failure. Before, this meant (in 'gomp_target_init') to silently ignore the plugin/device -- which also has been doubtful behavior. Let's instead turn 'cuDeviceGetCount' failure into a fatal error, similar to other errors during device initialization. libgomp/ * plugin/plugin-nvptx.c (nvptx_get_num_devices): 'cuDeviceGetCount' failure is fatal.
2024-03-08GCN, nvptx: Fatal error for missing symbols in 'libhsa-runtime64.so.1', ↵Thomas Schwinge2-2/+3
'libcuda.so.1' If 'libhsa-runtime64.so.1', 'libcuda.so.1' are not available, the corresponding libgomp plugin/device gets disabled, as before. But if they are available, report any inconsistencies such as missing symbols, similar to how we fail in presence of other issues during device initialization. libgomp/ * plugin/plugin-gcn.c (init_hsa_runtime_functions): Fatal error for missing symbols. * plugin/plugin-nvptx.c (init_cuda_lib): Likewise.
2024-01-26Avoid registering unsupported OMP offload devicesRichard Biener1-0/+14
The following avoids registering unsupported GCN offload devices when iterating over available ones. With a Zen4 desktop CPU you will have an IGPU (unspported) which will otherwise be made available. This causes testcases like libgomp.c-c++-common/non-rect-loop-1.c which iterate over all decives to FAIL. libgomp/ * plugin/plugin-gcn.c (suitable_hsa_agent_p): Filter out agents with unsupported ISA.
2024-01-26Fix architecture support in OMP_OFFLOAD_init_device for gcnRichard Biener1-2/+3
The following makes the existing architecture support check work instead of being optimized away (enum vs. -1). This avoids later asserts when we assume such devices are never actually used. libgomp/ * plugin/plugin-gcn.c (EF_AMDGPU_MACH::EF_AMDGPU_MACH_UNSUPPORTED): Add. (isa_code): Return that instead of -1. (GOMP_OFFLOAD_init_device): Adjust.
2024-01-26amdgcn: additional gfx1030/gfx1100 supportAndrew Stubbs1-2/+4
This is enough to get gfx1030 and gfx1100 working; there are still some test failures to investigate, and probably some tuning to do. gcc/ChangeLog: * config/gcn/gcn-opts.h (TARGET_PACKED_WORK_ITEMS): Add TARGET_RDNA3. * config/gcn/gcn-valu.md (all_convert): New iterator. (<convop><V_INT_1REG_ALT:mode><V_INT_1REG:mode>2<exec>): New define_expand, and rename the old one to ... (*<convop><V_INT_1REG_ALT:mode><V_INT_1REG:mode>_sdwa<exec>): ... this. (extend<V_INT_1REG_ALT:mode><V_INT_1REG:mode>2<exec>): Likewise, to ... (extend<V_INT_1REG_ALT:mode><V_INT_1REG:mode>_sdwa<exec>): .. this. (*<convop><V_INT_1REG_ALT:mode><V_INT_1REG:mode>_shift<exec>): New. * config/gcn/gcn.cc (gcn_global_address_p): Use "offsetbits" correctly. (gcn_hsa_declare_function_name): Update the vgpr counting for gfx1100. * config/gcn/gcn.md (<u>mulhisi3): Disable on RDNA3. (<u>mulqihi3_scalar): Likewise. libgcc/ChangeLog: * config/gcn/amdgcn_veclib.h (CDNA3_PLUS): Handle RDNA3. libgomp/ChangeLog: * config/gcn/time.c (RTC_TICKS): Configure RDNA3. (omp_get_wtime): Add RDNA3-compatible variant. * plugin/plugin-gcn.c (max_isa_vgprs): Tune for gfx1030 and gfx1100. Signed-off-by: Andrew Stubbs <ams@baylibre.com>
2024-01-08GCN: Add pre-initial support for gfx1100: 'EF_AMDGPU_MACH_AMDGCN_GFX1100'Thomas Schwinge1-1/+2
../../../source-gcc/libgomp/plugin/plugin-gcn.c: In function ‘isa_hsa_name’: ../../../source-gcc/libgomp/plugin/plugin-gcn.c:1666:10: error: ‘EF_AMDGPU_MACH_AMDGCN_GFX1100’ undeclared (first use in this function); did you mean ‘EF_AMDGPU_MACH_AMDGCN_GFX1030’? 1666 | case EF_AMDGPU_MACH_AMDGCN_GFX1100: | ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~ | EF_AMDGPU_MACH_AMDGCN_GFX1030 ../../../source-gcc/libgomp/plugin/plugin-gcn.c:1666:10: note: each undeclared identifier is reported only once for each function it appears in ../../../source-gcc/libgomp/plugin/plugin-gcn.c: In function ‘isa_code’: ../../../source-gcc/libgomp/plugin/plugin-gcn.c:1711:12: error: ‘EF_AMDGPU_MACH_AMDGCN_GFX1100’ undeclared (first use in this function); did you mean ‘EF_AMDGPU_MACH_AMDGCN_GFX1030’? 1711 | return EF_AMDGPU_MACH_AMDGCN_GFX1100; | ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~ | EF_AMDGPU_MACH_AMDGCN_GFX1030 ../../../source-gcc/libgomp/plugin/plugin-gcn.c: In function ‘max_isa_vgprs’: ../../../source-gcc/libgomp/plugin/plugin-gcn.c:1728:10: error: ‘EF_AMDGPU_MACH_AMDGCN_GFX1100’ undeclared (first use in this function); did you mean ‘EF_AMDGPU_MACH_AMDGCN_GFX1030’? 1728 | case EF_AMDGPU_MACH_AMDGCN_GFX1100: | ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~ | EF_AMDGPU_MACH_AMDGCN_GFX1030 make[4]: *** [Makefile:813: libgomp_plugin_gcn_la-plugin-gcn.lo] Error 1 Fix-up for commit 52a2c659ae6c21f84b6acce0afcb9b93b9dc71a0 "GCN: Add pre-initial support for gfx1100". libgomp/ * plugin/plugin-gcn.c (EF_AMDGPU_MACH): Add 'EF_AMDGPU_MACH_AMDGCN_GFX1100'.
2024-01-08OpenMP: Support accelerated 2D/3D memory copies for AMD GCNJulian Brown1-0/+362
This patch adds support for 2D/3D memory copies for omp_target_memcpy_rect using AMD extensions to the HSA API. This is just the AMD GCN-specific part of the following patch: https://gcc.gnu.org/pipermail/gcc-patches/2023-September/631001.html 2024-01-04 Julian Brown <julian@codesourcery.com> libgomp/ * plugin/plugin-gcn.c (hsa_runtime_fn_info): Add hsa_amd_memory_lock_fn, hsa_amd_memory_unlock_fn, hsa_amd_memory_async_copy_rect_fn function pointers. (init_hsa_runtime_functions): Add above functions, with DLSYM_OPT_FN. (GOMP_OFFLOAD_memcpy2d, GOMP_OFFLOAD_memcpy3d): New functions.
2024-01-08GCN: Add pre-initial support for gfx1100Tobias Burnus1-1/+8
ROCm since 5.7.1 supports gfx1100 (RDNA3) cards. This commit adds support for it, mostly by assuming gfx1100 behaves identical to gfx1030. Like gfx1030, gfx1100 support is neither documented nor the build of the multilib enabled by default. But contrary to gfx1030, gfx1100 has a known issue causing some libraries not to build, including newlib: The sdwa variant of v_mov_b32_sdwa is not supported by the hardware but GCC current does generates this instruction. This will be addressed in a later commit. gcc/ChangeLog: * config.gcc (amdgcn-*-amdhsa): Accept --with-arch=gfx1100. * config/gcn/gcn-hsa.h (NO_XNACK): Add gfx1100: (ASM_SPEC): Handle gfx1100. * config/gcn/gcn-opts.h (enum processor_type): Add PROCESSOR_GFX1100. (enum gcn_isa): Add ISA_RDNA3. (TARGET_GFX1100, TARGET_RDNA2_PLUS, TARGET_RDNA3): Define. * config/gcn/gcn-valu.md: Change TARGET_RDNA2 to TARGET_RDNA2_PLUS. * config/gcn/gcn.cc (gcn_option_override, gcn_omp_device_kind_arch_isa, output_file_start): Handle gfx1100. (gcn_global_address_p, gcn_addr_space_legitimate_address_p): Change TARGET_RDNA2 to TARGET_RDNA2_PLUS. (gcn_hsa_declare_function_name): Don't use '.amdhsa_reserve_flat_scratch' with gfx1100. * config/gcn/gcn.h (ASSEMBLER_DIALECT): Likewise. (TARGET_CPU_CPP_BUILTINS): Define __RDNA3__, __gfx1030__ and __gfx1100__. * config/gcn/gcn.md: Change TARGET_RDNA2 to TARGET_RDNA2_PLUS. * config/gcn/gcn.opt (Enum gpu_type): Add gfx1100. * config/gcn/mkoffload.cc (EF_AMDGPU_MACH_AMDGCN_GFX1100): Define. (isa_has_combined_avgprs, main): Handle gfx1100. * config/gcn/t-omp-device (isa): Add gfx1100. libgomp/ChangeLog: * plugin/plugin-gcn.c (gcn_gfx1100_s): New const string. (gcn_isa_name_len): Fix length. (isa_hsa_name, isa_code, max_isa_vgprs): Handle gfx1100.
2024-01-03Update copyright years.Jakub Jelinek4-4/+4
2023-12-20OpenMP, NVPTX: memcpy[23]D bias correctionJulian Brown1-0/+67
This patch works around behaviour of the 2D and 3D memcpy operations in the CUDA driver runtime. Particularly in Fortran, the "base pointer" of an array (used for either source or destination of a host/device copy) may lie outside of data that is actually stored on the device. The fix is to make sure that we use the first element of data to be transferred instead, and adjust parameters accordingly. 2023-10-02 Julian Brown <julian@codesourcery.com> libgomp/ * plugin/plugin-nvptx.c (GOMP_OFFLOAD_memcpy2d): Adjust parameters to avoid out-of-bounds array checks in CUDA runtime. (GOMP_OFFLOAD_memcpy3d): Likewise. * testsuite/libgomp.c-c++-common/memcpyxd-bias-1.c: New test.
2023-12-06amdgcn, libgomp: low-latency allocatorAndrew Stubbs1-6/+29
This implements the OpenMP low-latency memory allocator for AMD GCN using the small per-team LDS memory (Local Data Store). Since addresses can now refer to LDS space, the "Global" address space is no-longer compatible. This patch therefore switches the backend to use entirely "Flat" addressing (which supports both memories). A future patch will re-enable "global" instructions for cases where it is known to be safe to do so. gcc/ChangeLog: * config/gcn/gcn-builtins.def (DISPATCH_PTR): New built-in. * config/gcn/gcn.cc (gcn_init_machine_status): Disable global addressing. (gcn_expand_builtin_1): Implement GCN_BUILTIN_DISPATCH_PTR. libgomp/ChangeLog: * config/gcn/libgomp-gcn.h (TEAM_ARENA_START): Move to here. (TEAM_ARENA_FREE): Likewise. (TEAM_ARENA_END): Likewise. (GCN_LOWLAT_HEAP): New. * config/gcn/team.c (LITTLEENDIAN_CPU): New, and import hsa.h. (__gcn_lowlat_init): New prototype. (gomp_gcn_enter_kernel): Initialize the low-latency heap. * libgomp.h (TEAM_ARENA_START): Move to libgomp.h. (TEAM_ARENA_FREE): Likewise. (TEAM_ARENA_END): Likewise. * plugin/plugin-gcn.c (lowlat_size): New variable. (print_kernel_dispatch): Label the group_segment_size purpose. (init_environment_variables): Read GOMP_GCN_LOWLAT_POOL. (create_kernel_dispatch): Pass low-latency head allocation to kernel. (run_kernel): Use shadow; don't assume values. * testsuite/libgomp.c/omp_alloc-traits.c: Enable for amdgcn. * config/gcn/allocator.c: New file. * libgomp.texi: Document low-latency implementation details.
2023-12-06libgomp, nvptx: low-latency memory allocatorAndrew Stubbs1-1/+22
This patch adds support for allocating low-latency ".shared" memory on NVPTX GPU device, via the omp_low_lat_mem_space and omp_alloc. The memory can be allocated, reallocated, and freed using a basic but fast algorithm, is thread safe and the size of the low-latency heap can be configured using the GOMP_NVPTX_LOWLAT_POOL environment variable. The use of the PTX dynamic_smem_size feature means that low-latency allocator will not work with the PTX 3.1 multilib. For now, the omp_low_lat_mem_alloc allocator also works, but that will change when I implement the access traits. libgomp/ChangeLog: * allocator.c (MEMSPACE_ALLOC): New macro. (MEMSPACE_CALLOC): New macro. (MEMSPACE_REALLOC): New macro. (MEMSPACE_FREE): New macro. (predefined_alloc_mapping): New array. Add _Static_assert to match. (ARRAY_SIZE): New macro. (omp_aligned_alloc): Use MEMSPACE_ALLOC. Implement fall-backs for predefined allocators. Simplify existing fall-backs. (omp_free): Use MEMSPACE_FREE. (omp_calloc): Use MEMSPACE_CALLOC. Implement fall-backs for predefined allocators. Simplify existing fall-backs. (omp_realloc): Use MEMSPACE_REALLOC, MEMSPACE_ALLOC, and MEMSPACE_FREE. Implement fall-backs for predefined allocators. Simplify existing fall-backs. * config/nvptx/team.c (__nvptx_lowlat_pool): New asm variable. (__nvptx_lowlat_init): New prototype. (gomp_nvptx_main): Call __nvptx_lowlat_init. * libgomp.texi: Update memory space table. * plugin/plugin-nvptx.c (lowlat_pool_size): New variable. (GOMP_OFFLOAD_init_device): Read the GOMP_NVPTX_LOWLAT_POOL envvar. (GOMP_OFFLOAD_run): Apply lowlat_pool_size. * basic-allocator.c: New file. * config/nvptx/allocator.c: New file. * testsuite/libgomp.c/omp_alloc-1.c: New test. * testsuite/libgomp.c/omp_alloc-2.c: New test. * testsuite/libgomp.c/omp_alloc-3.c: New test. * testsuite/libgomp.c/omp_alloc-4.c: New test. * testsuite/libgomp.c/omp_alloc-5.c: New test. * testsuite/libgomp.c/omp_alloc-6.c: New test. Co-authored-by: Kwok Cheung Yeung <kcy@codesourcery.com> Co-Authored-By: Thomas Schwinge <thomas@codesourcery.com>
2023-11-15amdgcn: Add Accelerator VGPR registersAndrew Stubbs1-2/+22
Add the new CDNA register file. We don't support any of the specialized instructions that use these registers, but they're useful to relieve register pressure without spilling to stack. Co-authored-by: Andrew Jenner <andrew@codesourcery.com> gcc/ChangeLog: * config/gcn/constraints.md: Add "a" AVGPR constraint. * config/gcn/gcn-valu.md (*mov<mode>): Add AVGPR alternatives. (*mov<mode>_4reg): Likewise. (@mov<mode>_sgprbase): Likewise. (gather<mode>_insn_1offset<exec>): Likewise. (gather<mode>_insn_1offset_ds<exec>): Likewise. (gather<mode>_insn_2offsets<exec>): Likewise. (scatter<mode>_expr<exec_scatter>): Likewise. (scatter<mode>_insn_1offset_ds<exec_scatter>): Likewise. (scatter<mode>_insn_2offsets<exec_scatter>): Likewise. * config/gcn/gcn.cc (MAX_NORMAL_AVGPR_COUNT): Define. (gcn_class_max_nregs): Handle AVGPR_REGS and ALL_VGPR_REGS. (gcn_hard_regno_mode_ok): Likewise. (gcn_regno_reg_class): Likewise. (gcn_spill_class): Allow spilling to AVGPRs on TARGET_CDNA1_PLUS. (gcn_sgpr_move_p): Handle AVGPRs. (gcn_secondary_reload): Reload AVGPRs via VGPRs. (gcn_conditional_register_usage): Handle AVGPRs. (gcn_vgpr_equivalent_register_operand): New function. (gcn_valid_move_p): Check for validity of AVGPR moves. (gcn_compute_frame_offsets): Handle AVGPRs. (gcn_memory_move_cost): Likewise. (gcn_register_move_cost): Likewise. (gcn_vmem_insn_p): Handle TYPE_VOP3P_MAI. (gcn_md_reorg): Handle AVGPRs. (gcn_hsa_declare_function_name): Likewise. (print_reg): Likewise. (gcn_dwarf_register_number): Likewise. * config/gcn/gcn.h (FIRST_AVGPR_REG): Define. (AVGPR_REGNO): Define. (LAST_AVGPR_REG): Define. (SOFT_ARG_REG): Update. (FRAME_POINTER_REGNUM): Update. (DWARF_LINK_REGISTER): Update. (FIRST_PSEUDO_REGISTER): Update. (AVGPR_REGNO_P): Define. (enum reg_class): Add AVGPR_REGS and ALL_VGPR_REGS. (REG_CLASS_CONTENTS): Add new register classes and add entries for AVGPRs to all classes. (REGISTER_NAMES): Add AVGPRs. * config/gcn/gcn.md (FIRST_AVGPR_REG, LAST_AVGPR_REG): Define. (AP_REGNUM, FP_REGNUM): Update. (define_attr "type"): Add vop3p_mai. (define_attr "unit"): Handle vop3p_mai. (define_attr "gcn_version"): Add "cdna2". (define_attr "enabled"): Handle cdna2. (*mov<mode>_insn): Add AVGPR alternatives. (*movti_insn): Likewise. * config/gcn/mkoffload.cc (isa_has_combined_avgprs): New. (process_asm): Process avgpr_count. * config/gcn/predicates.md (gcn_avgpr_register_operand): New. (gcn_avgpr_hard_register_operand): New. * doc/md.texi: Document the "a" constraint. gcc/testsuite/ChangeLog: * gcc.target/gcn/avgpr-mem-double.c: New test. * gcc.target/gcn/avgpr-mem-int.c: New test. * gcc.target/gcn/avgpr-mem-long.c: New test. * gcc.target/gcn/avgpr-mem-short.c: New test. * gcc.target/gcn/avgpr-spill-double.c: New test. * gcc.target/gcn/avgpr-spill-int.c: New test. * gcc.target/gcn/avgpr-spill-long.c: New test. * gcc.target/gcn/avgpr-spill-short.c: New test. libgomp/ChangeLog: * plugin/plugin-gcn.c (max_isa_vgprs): New. (run_kernel): CDNA2 devices have more VGPRs.
2023-11-07openmp: Add support for the 'indirect' clause in C/C++Kwok Cheung Yeung2-3/+148
This adds support for the 'indirect' clause in the 'declare target' directive. Functions declared as indirect may be called via function pointers passed from the host in offloaded code. Virtual calls to member functions via the object pointer in C++ are currently not supported in target regions. 2023-11-07 Kwok Cheung Yeung <kcy@codesourcery.com> gcc/c-family/ * c-attribs.cc (c_common_attribute_table): Add attribute for indirect functions. * c-pragma.h (enum parma_omp_clause): Add entry for indirect clause. gcc/c/ * c-decl.cc (c_decl_attributes): Add attribute for indirect functions. * c-lang.h (c_omp_declare_target_attr): Add indirect field. * c-parser.cc (c_parser_omp_clause_name): Handle indirect clause. (c_parser_omp_clause_indirect): New. (c_parser_omp_all_clauses): Handle indirect clause. (OMP_DECLARE_TARGET_CLAUSE_MASK): Add indirect clause to mask. (c_parser_omp_declare_target): Handle indirect clause. Emit error message if device_type or indirect clauses used alone. Emit error if indirect clause used with device_type that is not 'any'. (OMP_BEGIN_DECLARE_TARGET_CLAUSE_MASK): Add indirect clause to mask. (c_parser_omp_begin): Handle indirect clause. * c-typeck.cc (c_finish_omp_clauses): Handle indirect clause. gcc/cp/ * cp-tree.h (cp_omp_declare_target_attr): Add indirect field. * decl2.cc (cplus_decl_attributes): Add attribute for indirect functions. * parser.cc (cp_parser_omp_clause_name): Handle indirect clause. (cp_parser_omp_clause_indirect): New. (cp_parser_omp_all_clauses): Handle indirect clause. (handle_omp_declare_target_clause): Add extra parameter. Add indirect attribute for indirect functions. (OMP_DECLARE_TARGET_CLAUSE_MASK): Add indirect clause to mask. (cp_parser_omp_declare_target): Handle indirect clause. Emit error message if device_type or indirect clauses used alone. Emit error if indirect clause used with device_type that is not 'any'. (OMP_BEGIN_DECLARE_TARGET_CLAUSE_MASK): Add indirect clause to mask. (cp_parser_omp_begin): Handle indirect clause. * semantics.cc (finish_omp_clauses): Handle indirect clause. gcc/ * lto-cgraph.cc (enum LTO_symtab_tags): Add tag for indirect functions. (output_offload_tables): Write indirect functions. (input_offload_tables): read indirect functions. * lto-section-names.h (OFFLOAD_IND_FUNC_TABLE_SECTION_NAME): New. * omp-builtins.def (BUILT_IN_GOMP_TARGET_MAP_INDIRECT_PTR): New. * omp-offload.cc (offload_ind_funcs): New. (omp_discover_implicit_declare_target): Add functions marked with 'omp declare target indirect' to indirect functions list. (omp_finish_file): Add indirect functions to section for offload indirect functions. (execute_omp_device_lower): Redirect indirect calls on target by passing function pointer to BUILT_IN_GOMP_TARGET_MAP_INDIRECT_PTR. (pass_omp_device_lower::gate): Run pass_omp_device_lower if indirect functions are present on an accelerator device. * omp-offload.h (offload_ind_funcs): New. * tree-core.h (omp_clause_code): Add OMP_CLAUSE_INDIRECT. * tree.cc (omp_clause_num_ops): Add entry for OMP_CLAUSE_INDIRECT. (omp_clause_code_name): Likewise. * tree.h (OMP_CLAUSE_INDIRECT_EXPR): New. * config/gcn/mkoffload.cc (process_asm): Process offload_ind_funcs section. Count number of indirect functions. (process_obj): Emit number of indirect functions. * config/nvptx/mkoffload.cc (ind_func_ids, ind_funcs_tail): New. (process): Emit offload_ind_func_table in PTX code. Emit indirect function names and count in image. * config/nvptx/nvptx.cc (nvptx_record_offload_symbol): Mark indirect functions in PTX code with IND_FUNC_MAP. gcc/testsuite/ * c-c++-common/gomp/declare-target-7.c: Update expected error message. * c-c++-common/gomp/declare-target-indirect-1.c: New. * c-c++-common/gomp/declare-target-indirect-2.c: New. * g++.dg/gomp/attrs-21.C (v12): Update expected error message. * g++.dg/gomp/declare-target-indirect-1.C: New. * gcc.dg/gomp/attrs-21.c (v12): Update expected error message. include/ * gomp-constants.h (GOMP_VERSION): Increment to 3. (GOMP_VERSION_SUPPORTS_INDIRECT_FUNCS): New. libgcc/ * offloadstuff.c (OFFLOAD_IND_FUNC_TABLE_SECTION_NAME): New. (__offload_ind_func_table): New. (__offload_ind_funcs_end): New. (__OFFLOAD_TABLE__): Add entries for indirect functions. libgomp/ * Makefile.am (libgomp_la_SOURCES): Add target-indirect.c. * Makefile.in: Regenerate. * libgomp-plugin.h (GOMP_INDIRECT_ADDR_MAP): New define. (GOMP_OFFLOAD_load_image): Add extra argument. * libgomp.h (struct indirect_splay_tree_key_s): New. (indirect_splay_tree_node, indirect_splay_tree, indirect_splay_tree_key): New. (indirect_splay_compare): New. * libgomp.map (GOMP_5.1.1): Add GOMP_target_map_indirect_ptr. * libgomp.texi (OpenMP 5.1): Update documentation on indirect calls in target region and on indirect clause. (Other new OpenMP 5.2 features): Add entry for virtual function calls. * libgomp_g.h (GOMP_target_map_indirect_ptr): Add prototype. * oacc-host.c (host_load_image): Add extra argument. * target.c (gomp_load_image_to_device): If the GOMP_VERSION is high enough, read host indirect functions table and pass to load_image_func. * config/accel/target-indirect.c: New. * config/linux/target-indirect.c: New. * config/gcn/team.c (build_indirect_map): Add prototype. (gomp_gcn_enter_kernel): Initialize support for indirect function calls on GCN target. * config/nvptx/team.c (build_indirect_map): Add prototype. (gomp_nvptx_main): Initialize support for indirect function calls on NVPTX target. * plugin/plugin-gcn.c (struct gcn_image_desc): Add field for indirect functions count. (GOMP_OFFLOAD_load_image): Add extra argument. If the GOMP_VERSION is high enough, build address translation table and copy it to target memory. * plugin/plugin-nvptx.c (nvptx_tdata): Add field for indirect functions count. (GOMP_OFFLOAD_load_image): Add extra argument. If the GOMP_VERSION is high enough, Build address translation table and copy it to target memory. * testsuite/libgomp.c-c++-common/declare-target-indirect-1.c: New. * testsuite/libgomp.c-c++-common/declare-target-indirect-2.c: New. * testsuite/libgomp.c++/declare-target-indirect-1.C: New.
2023-10-20amdgcn: add -march=gfx1030 EXPERIMENTALAndrew Stubbs1-1/+8
Accept the architecture configure option and resolve build failures. This is enough to build binaries, but I've not got a device to test it on, so there are probably runtime issues to fix. The cache control instructions might be unsafe (or too conservative), and the kernel metadata might be off. Vector reductions will need to be reworked for RDNA2. In principle, it would be better to use wavefrontsize32 for this architecture, but that would mean switching everything to allow SImode masks, so wavefrontsize64 it is. The multilib is not included in the default configuration so either configure --with-arch=gfx1030 or include it in --with-multilib-list=gfx1030,.... The majority of this patch has no effect on other devices, but changing from using scalar writes for the exit value to vector writes means we don't need the scalar cache write-back instruction anywhere (which doesn't exist in RDNA2). gcc/ChangeLog: * config.gcc: Allow --with-arch=gfx1030. * config/gcn/gcn-hsa.h (NO_XNACK): gfx1030 does not support xnack. (ASM_SPEC): gfx1030 needs -mattr=+wavefrontsize64 set. * config/gcn/gcn-opts.h (enum processor_type): Add PROCESSOR_GFX1030. (TARGET_GFX1030): New. (TARGET_RDNA2): New. * config/gcn/gcn-valu.md (@dpp_move<mode>): Disable for RDNA2. (addc<mode>3<exec_vcc>): Add RDNA2 syntax variant. (subc<mode>3<exec_vcc>): Likewise. (<convop><mode><vndi>2_exec): Add RDNA2 alternatives. (vec_cmp<mode>di): Likewise. (vec_cmp<u><mode>di): Likewise. (vec_cmp<mode>di_exec): Likewise. (vec_cmp<u><mode>di_exec): Likewise. (vec_cmp<mode>di_dup): Likewise. (vec_cmp<mode>di_dup_exec): Likewise. (reduc_<reduc_op>_scal_<mode>): Disable for RDNA2. (*<reduc_op>_dpp_shr_<mode>): Likewise. (*plus_carry_dpp_shr_<mode>): Likewise. (*plus_carry_in_dpp_shr_<mode>): Likewise. * config/gcn/gcn.cc (gcn_option_override): Recognise gfx1030. (gcn_global_address_p): RDNA2 only allows smaller offsets. (gcn_addr_space_legitimate_address_p): Likewise. (gcn_omp_device_kind_arch_isa): Recognise gfx1030. (gcn_expand_epilogue): Use VGPRs instead of SGPRs. (output_file_start): Configure gfx1030. * config/gcn/gcn.h (TARGET_CPU_CPP_BUILTINS): Add __RDNA2__; (ASSEMBLER_DIALECT): New. * config/gcn/gcn.md (rdna): New define_attr. (enabled): Use "rdna" attribute. (gcn_return): Remove s_dcache_wb. (addcsi3_scalar): Add RDNA2 syntax variant. (addcsi3_scalar_zero): Likewise. (addptrdi3): Likewise. (mulsi3): v_mul_lo_i32 should be v_mul_lo_u32 on all ISA. (*memory_barrier): Add RDNA2 syntax variant. (atomic_load<mode>): Add RDNA2 cache control variants, and disable scalar atomics for RDNA2. (atomic_store<mode>): Likewise. (atomic_exchange<mode>): Likewise. * config/gcn/gcn.opt (gpu_type): Add gfx1030. * config/gcn/mkoffload.cc (EF_AMDGPU_MACH_AMDGCN_GFX1030): New. (main): Recognise -march=gfx1030. * config/gcn/t-omp-device: Add gfx1030 isa. libgcc/ChangeLog: * config/gcn/amdgcn_veclib.h (CDNA3_PLUS): Set false for __RDNA2__. libgomp/ChangeLog: * plugin/plugin-gcn.c (EF_AMDGPU_MACH_AMDGCN_GFX1030): New. (isa_hsa_name): Recognise gfx1030. (isa_code): Likewise. * team.c (defined): Remove s_endpgm.
2023-07-29libgomp: cuda.h and omp_target_memcpy_rect cleanupTobias Burnus1-2/+4
Fixes for commit r14-2792-g25072a477a56a727b369bf9b20f4d18198ff5894 "OpenMP: Call cuMemcpy2D/cuMemcpy3D for nvptx for omp_target_memcpy_rect", namely: In that commit, the code was changed to handle shared-memory devices; however, as pointed out, omp_target_memcpy_check already set the pointer to NULL in that case. Hence, this commit reverts to the prior version. In cuda.h, it adds cuMemcpyPeer{,Async} for symmetry for cuMemcpy3DPeer (all currently unused) and in three structs, fixes reserved-member names and remove a bogus 'const' in three structs. And it changes a DLSYM to DLSYM_OPT as not all plugins support the new functions, yet. include/ChangeLog: * cuda/cuda.h (CUDA_MEMCPY2D, CUDA_MEMCPY3D, CUDA_MEMCPY3D_PEER): Remove bogus 'const' from 'const void *dst' and fix reserved-name name in those structs. (cuMemcpyPeer, cuMemcpyPeerAsync): Add. libgomp/ChangeLog: * target.c (omp_target_memcpy_rect_worker): Undo dim=1 change for GOMP_OFFLOAD_CAP_SHARED_MEM. (omp_target_memcpy_rect_copy): Likewise for lock condition. (gomp_load_plugin_for_device): Use DLSYM_OPT not DLSYM for memcpy3d/memcpy2d. * plugin/plugin-nvptx.c (GOMP_OFFLOAD_memcpy2d, GOMP_OFFLOAD_memcpy3d): Use memset 0 to nullify reserved and unused src/dst fields for that mem type; remove '{src,dst}LOD = 0'.
2023-07-26OpenMP: Call cuMemcpy2D/cuMemcpy3D for nvptx for omp_target_memcpy_rectTobias Burnus2-0/+119
When copying a 2D or 3D rectangular memmory block, the performance is better when using CUDA's cuMemcpy2D/cuMemcpy3D instead of copying the data one by one. That's what this commit does. Additionally, it permits device-to-device copies, if neccessary using a temporary variable on the host. include/ChangeLog: * cuda/cuda.h (CUlimit): Add CUDA_ERROR_NOT_INITIALIZED, CUDA_ERROR_DEINITIALIZED, CUDA_ERROR_INVALID_HANDLE. (CUarray, CUmemorytype, CUDA_MEMCPY2D, CUDA_MEMCPY3D, CUDA_MEMCPY3D_PEER): New typdefs. (cuMemcpy2D, cuMemcpy2DAsync, cuMemcpy2DUnaligned, cuMemcpy3D, cuMemcpy3DAsync, cuMemcpy3DPeer, cuMemcpy3DPeerAsync): New prototypes. libgomp/ChangeLog: * libgomp-plugin.h (GOMP_OFFLOAD_memcpy2d, GOMP_OFFLOAD_memcpy3d): New prototypes. * libgomp.h (struct gomp_device_descr): Add memcpy2d_func and memcpy3d_func. * libgomp.texi (nvtpx): Document when cuMemcpy2D/cuMemcpy3D is used. * oacc-host.c (memcpy2d_func, .memcpy3d_func): Init with NULL. * plugin/cuda-lib.def (cuMemcpy2D, cuMemcpy2DUnaligned, cuMemcpy3D): Invoke via CUDA_ONE_CALL. * plugin/plugin-nvptx.c (GOMP_OFFLOAD_memcpy2d, GOMP_OFFLOAD_memcpy3d): New. * target.c (omp_target_memcpy_rect_worker): (omp_target_memcpy_rect_check, omp_target_memcpy_rect_copy): Permit all device-to-device copyies; invoke new plugins for 2D and 3D copying when available. (gomp_load_plugin_for_device): DLSYM the new plugin functions. * testsuite/libgomp.c/target-12.c: Fix dimension bug. * testsuite/libgomp.fortran/target-12.f90: Likewise. * testsuite/libgomp.fortran/target-memcpy-rect-1.f90: New test.
2023-06-06libgomp: plugin-gcn - support 'unified_address'Tobias Burnus1-1/+3
Effectively, for GCN (as for nvptx) there is a common address space between host and device, whether being accessible or not. Thus, this commit permits to use 'omp requires unified_address' with GCN devices. (nvptx accepts this requirement since r13-3460-g131d18e928a3ea.) libgomp/ * plugin/plugin-gcn.c (GOMP_OFFLOAD_get_num_devices): Regard unified_address requirement as supported. * libgomp.texi (OpenMP 5.0, AMD Radeon, nvptx): Remove 'unified_address' from the not-supported requirements.
2023-05-08libgomp: Simplify OpenMP reverse offload host <-> device memory copy ↵Thomas Schwinge2-33/+46
implementation ... by using the existing 'goacc_asyncqueue' instead of re-coding parts of it. Follow-up to commit 131d18e928a3ea1ab2d3bf61aa92d68a8a254609 "libgomp/nvptx: Prepare for reverse-offload callback handling", and commit ea4b23d9c82d9be3b982c3519fe5e8e9d833a6a8 "libgomp: Handle OpenMP's reverse offloads". libgomp/ * target.c (gomp_target_rev): Instead of 'dev_to_host_cpy', 'host_to_dev_cpy', 'token', take a single 'goacc_asyncqueue'. * libgomp.h (gomp_target_rev): Adjust. * libgomp-plugin.c (GOMP_PLUGIN_target_rev): Adjust. * libgomp-plugin.h (GOMP_PLUGIN_target_rev): Adjust. * plugin/plugin-gcn.c (process_reverse_offload): Adjust. * plugin/plugin-nvptx.c (rev_off_dev_to_host_cpy) (rev_off_host_to_dev_cpy): Remove. (GOMP_OFFLOAD_run): Adjust.
2023-03-10Use 'GOMP_MAP_VARS_TARGET' for OpenACC compute constructs [PR90596]Thomas Schwinge2-181/+20
Thereby considerably simplify the device plugins' 'GOMP_OFFLOAD_openacc_exec', 'GOMP_OFFLOAD_openacc_async_exec' functions: in terms of lines of code, but in particular conceptually: no more device memory allocation, host to device data copying, device memory deallocation -- 'GOMP_MAP_VARS_TARGET' does all that for us. This depends on commit 2b2340e236c0bba8aaca358ea25a5accd8249fbd "Allow libgomp 'cbuf' buffering with OpenACC 'async' for 'ephemeral' data", where I said that "a use will emerge later", which is this one here. PR libgomp/90596 libgomp/ * target.c (gomp_map_vars_internal): Allow for 'param_kind == GOMP_MAP_VARS_OPENACC | GOMP_MAP_VARS_TARGET'. * oacc-parallel.c (GOACC_parallel_keyed): Pass 'GOMP_MAP_VARS_TARGET' to 'goacc_map_vars'. * plugin/plugin-gcn.c (alloc_by_agent, gcn_exec) (GOMP_OFFLOAD_openacc_exec, GOMP_OFFLOAD_openacc_async_exec): Adjust, simplify. (gomp_offload_free): Remove. * plugin/plugin-nvptx.c (nvptx_exec, GOMP_OFFLOAD_openacc_exec) (GOMP_OFFLOAD_openacc_async_exec): Adjust, simplify. (cuda_free_argmem): Remove. * testsuite/libgomp.oacc-c-c++-common/acc_prof-parallel-1.c: Adjust.
2023-03-10Simplify OpenACC 'no_create' clause implementationThomas Schwinge2-20/+17
For 'OFFSET_INLINED', 'gomp_map_val' does the right thing, and we may then simplify the device plugins accordingly. This is a follow-up to Subversion r279551 (Git commit a6163563f2ce502bd4ef444bd5de33570bb8eeb1) "Add OpenACC 2.6's no_create", Subversion r279622 (Git commit 5bcd470bf0749e1f56d05dd43aa9584ff2e3a090) "Use gomp_map_val for OpenACC host-to-device address translation". libgomp/ * target.c (gomp_map_vars_internal): Use 'OFFSET_INLINED' for 'GOMP_MAP_IF_PRESENT'. * plugin/plugin-gcn.c (gcn_exec, GOMP_OFFLOAD_openacc_exec) (GOMP_OFFLOAD_openacc_async_exec): Adjust. * plugin/plugin-nvptx.c (nvptx_exec, GOMP_OFFLOAD_openacc_exec) (GOMP_OFFLOAD_openacc_async_exec): Likewise. * testsuite/libgomp.oacc-c-c++-common/no_create-1.c: Add 'async' testing. * testsuite/libgomp.oacc-c-c++-common/no_create-2.c: Likewise.
2023-03-10Fix OpenACC/GCN 'acc_ev_enqueue_launch_end' positionThomas Schwinge1-11/+12
For an OpenACC compute construct, we've currently got: - [...] - acc_ev_enqueue_launch_start - launch kernel - free memory - acc_ev_free - acc_ev_enqueue_launch_end This confused another thing that I'm working on, so I adjusted that to: - [...] - acc_ev_enqueue_launch_start - launch kernel - acc_ev_enqueue_launch_end - free memory - acc_ev_free Correspondingly, verify 'acc_ev_alloc', 'acc_ev_free' in 'libgomp.oacc-c-c++-common/acc_prof-parallel-1.c'. libgomp/ * plugin/plugin-gcn.c (gcn_exec): Fix 'acc_ev_enqueue_launch_end' position. * testsuite/libgomp.oacc-c-c++-common/acc_prof-parallel-1.c: Verify 'acc_ev_alloc', 'acc_ev_free'.
2023-02-03libgomp: enable reverse offload for AMDGCNTobias Burnus1-1/+2
libgomp/ChangeLog: * libgomp.texi (5.0 Impl. Status, gcn specifics): Update for reverse offload. * plugin/plugin-gcn.c (GOMP_OFFLOAD_get_num_devices): Accept reverse-offload requirement.
2023-02-02amdgcn, libgomp: Manually allocated stacksAndrew Stubbs1-115/+137
Switch from using stacks in the "private segment" to using a memory block allocated on the host side. The primary reason is to permit the reverse offload implementation to access values located on the device stack, but there may also be performance benefits, especially with repeated kernel invocations. This implementation unifies the stacks with the "team arena" optimization feature, and now allows both to have run-time configurable sizes. A new ABI is needed, so all libraries must be rebuilt, and newlib must be version 4.3.0.20230120 or newer. gcc/ChangeLog: * config/gcn/gcn-run.cc: Include libgomp-gcn.h. (struct kernargs): Replace the common content with kernargs_abi. (struct heap): Delete. (main): Read GCN_STACK_SIZE envvar. Allocate space for the device stacks. Write the new kernargs fields. * config/gcn/gcn.cc (gcn_option_override): Remove stack_size_opt. (default_requested_args): Remove PRIVATE_SEGMENT_BUFFER_ARG and PRIVATE_SEGMENT_WAVE_OFFSET_ARG. (gcn_addr_space_convert): Mask the QUEUE_PTR_ARG content. (gcn_expand_prologue): Move the TARGET_PACKED_WORK_ITEMS to the top. Set up the stacks from the values in the kernargs, not private. (gcn_expand_builtin_1): Match the stack configuration in the prologue. (gcn_hsa_declare_function_name): Turn off the private segment. (gcn_conditional_register_usage): Ensure QUEUE_PTR is fixed. * config/gcn/gcn.h (FIXED_REGISTERS): Fix the QUEUE_PTR register. * config/gcn/gcn.opt (mstack-size): Change the description. include/ChangeLog: * gomp-constants.h (GOMP_VERSION_GCN): Bump. libgomp/ChangeLog: * config/gcn/libgomp-gcn.h (DEFAULT_GCN_STACK_SIZE): New define. (DEFAULT_TEAM_ARENA_SIZE): New define. (struct heap): Move to this file. (struct kernargs_abi): Likewise. * config/gcn/team.c (gomp_gcn_enter_kernel): Use team arena size from the kernargs. * libgomp.h: Include libgomp-gcn.h. (TEAM_ARENA_SIZE): Remove. (team_malloc): Update the error message. * plugin/plugin-gcn.c (struct kernargs): Move common content to struct kernargs_abi. (struct agent_info): Rename team arenas to ephemeral memories. (struct team_arena_list): Rename .... (struct ephemeral_memories_list): to this. (struct heap): Delete. (team_arena_size): New variable. (stack_size): New variable. (print_kernel_dispatch): Update debug messages. (init_environment_variables): Read GCN_TEAM_ARENA_SIZE. Read GCN_STACK_SIZE. (get_team_arena): Rename ... (configure_ephemeral_memories): ... to this, and set up stacks. (release_team_arena): Rename ... (release_ephemeral_memories): ... to this. (destroy_team_arenas): Rename ... (destroy_ephemeral_memories): ... to this. (create_kernel_dispatch): Add num_threads parameter. Adjust for kernargs_abi refactor and ephemeral memories. (release_kernel_dispatch): Adjust for ephemeral memories. (run_kernel): Pass thread-count to create_kernel_dispatch. (GOMP_OFFLOAD_init_device): Adjust for ephemeral memories. (GOMP_OFFLOAD_fini_device): Adjust for ephemeral memories. gcc/testsuite/ChangeLog: * gcc.c-torture/execute/pr47237.c: Xfail on amdgcn. * gcc.dg/builtin-apply3.c: Xfail for amdgcn. * gcc.dg/builtin-apply4.c: Xfail for amdgcn. * gcc.dg/torture/stackalign/builtin-apply-3.c: Xfail for amdgcn. * gcc.dg/torture/stackalign/builtin-apply-4.c: Xfail for amdgcn.
2023-01-16Update copyright years.Jakub Jelinek4-4/+4
2022-12-10libgomp: Handle OpenMP's reverse offloadsTobias Burnus1-1/+3
This commit enabled reverse offload for nvptx such that gomp_target_rev actually gets called. And it fills the latter function to do all of the following: finding the host function to the device func ptr and copying the arguments to the host, processing the mapping/firstprivate, calling the host function, copying back the data and freeing as needed. The data handling is made easier by assuming that all host variables either existed before (and are in the mapping) or that those are devices variables not yet available on the host. Thus, the reverse mapping can do without refcounts etc. Note that the spec disallows inside a target region device-affecting constructs other than target plus ancestor device-modifier and it also limits the clauses permitted on this construct. For the function addresses, an additional splay tree is used; for the lookup of mapped variables, the existing splay-tree is used. Unfortunately, its data structure requires a full walk of the tree; Additionally, the just mapped variables are recorded in a separate data structure an extra lookup. While the lookup is slow, assuming that only few variables get mapped in each reverse offload construct and that reverse offload is the exception and not performance critical, this seems to be acceptable. libgomp/ChangeLog: * libgomp.h (struct target_mem_desc): Predeclare; move below after 'reverse_splay_tree_node' and add rev_array member. (struct reverse_splay_tree_key_s, reverse_splay_compare): New. (reverse_splay_tree_node, reverse_splay_tree, reverse_splay_tree_key): New typedef. (struct gomp_device_descr): Add mem_map_rev member. * oacc-host.c (host_dispatch): NULL init .mem_map_rev. * plugin/plugin-nvptx.c (GOMP_OFFLOAD_get_num_devices): Claim support for GOMP_REQUIRES_REVERSE_OFFLOAD. * splay-tree.h (splay_tree_callback_stop): New typedef; like splay_tree_callback but returning int not void. (splay_tree_foreach_lazy): Define; like splay_tree_foreach but taking splay_tree_callback_stop as argument. * splay-tree.c (splay_tree_foreach_internal_lazy, splay_tree_foreach_lazy): New; but early exit if callback returns nonzero. * target.c: Instatiate splay_tree_c with splay_tree_prefix 'reverse'. (gomp_map_lookup_rev): New. (gomp_load_image_to_device): Handle reverse-offload function lookup table. (gomp_unload_image_from_device): Free devicep->mem_map_rev. (struct gomp_splay_tree_rev_lookup_data, gomp_splay_tree_rev_lookup, gomp_map_rev_lookup, struct cpy_data, gomp_map_cdata_lookup_int, gomp_map_cdata_lookup): New auxiliary structs and functions for gomp_target_rev. (gomp_target_rev): Implement reverse offloading and its mapping. (gomp_target_init): Init current_device.mem_map_rev.root. * testsuite/libgomp.fortran/reverse-offload-2.f90: New test. * testsuite/libgomp.fortran/reverse-offload-3.f90: New test. * testsuite/libgomp.fortran/reverse-offload-4.f90: New test. * testsuite/libgomp.fortran/reverse-offload-5.f90: New test. * testsuite/libgomp.fortran/reverse-offload-5a.f90: New test without mapping of on-device allocated variables.
2022-12-06OpenMP: omp_get_max_teams, omp_set_num_teams, and ↵Marcel Vollweiler1-1/+13
omp_{gs}et_teams_thread_limit on offload devices This patch adds support for omp_get_max_teams, omp_set_num_teams, and omp_{gs}et_teams_thread_limit on offload devices. That includes the usage of device-specific ICV values (specified as environment variables or changed on a device). In order to reuse device-specific ICV values, a copy back mechanism is implemented that copies ICV values back from device to the host. Additionally, a limitation of the number of teams on gcn offload devices is implemented. The number of teams is limited by twice the number of compute units (one team is executed on one compute unit). This avoids queueing unnessecary many teams and a corresponding allocation of large amounts of memory. Without that limitation the memory allocation for a large number of user-specified teams can result in an "memory access fault". A limitation of the number of teams is already also implemented for nvptx devices (see nvptx_adjust_launch_bounds in libgomp/plugin/plugin-nvptx.c). gcc/ChangeLog: * gimplify.cc (optimize_target_teams): Set initial num_teams_upper to "-2" instead of "1" for non-existing num_teams clause in order to disambiguate from the case of an existing num_teams clause with value 1. libgomp/ChangeLog: * config/gcn/icv-device.c (omp_get_teams_thread_limit): Added to allow processing of device-specific values. (omp_set_teams_thread_limit): Likewise. (ialias): Likewise. * config/nvptx/icv-device.c (omp_get_teams_thread_limit): Likewise. (omp_set_teams_thread_limit): Likewise. (ialias): Likewise. * icv-device.c (omp_get_teams_thread_limit): Likewise. (ialias): Likewise. (omp_set_teams_thread_limit): Likewise. * icv.c (omp_set_teams_thread_limit): Removed. (omp_get_teams_thread_limit): Likewise. (ialias): Likewise. * libgomp.texi: Updated documentation for nvptx and gcn corresponding to the limitation of the number of teams. * plugin/plugin-gcn.c (limit_teams): New helper function that limits the number of teams by twice the number of compute units. (parse_target_attributes): Limit the number of teams on gcn offload devices. * target.c (get_gomp_offload_icvs): Added teams_thread_limit_var handling. (gomp_load_image_to_device): Added a size check for the ICVs struct variable. (gomp_copy_back_icvs): New function that is used in GOMP_target_ext to copy back the ICV values from device to host. (GOMP_target_ext): Update the number of teams and threads in the kernel args also considering device-specific values. * testsuite/libgomp.c-c++-common/icv-4.c: Fixed an error in the reading of OMP_TEAMS_THREAD_LIMIT from the environment. * testsuite/libgomp.c-c++-common/icv-5.c: Extended. * testsuite/libgomp.c-c++-common/icv-6.c: Extended. * testsuite/libgomp.c-c++-common/icv-7.c: Extended. * testsuite/libgomp.c-c++-common/icv-9.c: New test. * testsuite/libgomp.fortran/icv-5.f90: New test. * testsuite/libgomp.fortran/icv-6.f90: New test. gcc/testsuite/ChangeLog: * c-c++-common/gomp/target-teams-1.c: Adapt expected values for num_teams from "1" to "-2" in cases without num_teams clause. * g++.dg/gomp/target-teams-1.C: Likewise. * gfortran.dg/gomp/defaultmap-4.f90: Likewise. * gfortran.dg/gomp/defaultmap-5.f90: Likewise. * gfortran.dg/gomp/defaultmap-6.f90: Likewise.
2022-11-25libgomp: Add no-target-region rev offload test + fix plugin-nvptxTobias Burnus1-12/+24
OpenMP permits that a 'target device(ancestor:1)' is called without being enclosed in a target region - using the current device (i.e. the host) in that case. This commit adds a testcase for this. In case of nvptx, the missing on-device 'GOMP_target_ext' call causes that it and also the associated on-device GOMP_REV_OFFLOAD_VAR variable are not linked in from nvptx's libgomp.a. Thus, handle the failing cuModuleGetGlobal gracefully by disabling reverse offload and assuming that the failure is fine. libgomp/ChangeLog: * plugin/plugin-nvptx.c (GOMP_OFFLOAD_load_image): Use unsigned int for 'i' to match 'fn_entries'; regard absent GOMP_REV_OFFLOAD_VAR as valid and the code having no reverse-offload code. * testsuite/libgomp.c-c++-common/reverse-offload-2.c: New test.
2022-11-21libgomp/gcn: fix/improve struct outputTobias Burnus1-10/+7
output.printf_data.(value union) contains text[128], which has the size of 128 bytes, sufficient for 16 uint64_t variables; hence value_u64[2] could be extended to value_u64[6] - sufficient for all required arguments to gomp_target_rev. Additionally, next_output.printf_data.(msg union) contained msg_u64 which then is no longer needed and also caused 32bit vs 64bit alignment issues. libgomp/ * config/gcn/libgomp-gcn.h (struct output): Remove 'msg_u64' from the union, change value_u64[2] to value_u64[6]. * config/gcn/target.c (GOMP_target_ext): Update accordingly. * plugin/plugin-gcn.c (process_reverse_offload, console_output): Likewise.
2022-11-19libgomp/gcn: Prepare for reverse-offload callback handlingTobias Burnus1-15/+19
libgomp/ChangeLog: * config/gcn/libgomp-gcn.h: New file; contains struct output, declared previously in plugin-gcn.c. * config/gcn/target.c: Include it. (GOMP_ADDITIONAL_ICVS): Declare as extern var. (GOMP_target_ext): Handle reverse offload. * plugin/plugin-gcn.c: Include libgomp-gcn.h. (struct kernargs): Replace struct def by the one from libgomp-gcn.h for output_data. (process_reverse_offload): New. (console_output): Call it.
2022-11-04Remove support for Intel MIC offloadingThomas Schwinge1-3/+0
... after its deprecation in GCC 12. * Makefile.def: Remove module 'liboffloadmic'. * Makefile.in: Regenerate. * configure.ac: Remove 'liboffloadmic' handling. * configure: Regenerate. contrib/ * gcc-changelog/git_commit.py (default_changelog_locations): Remove 'liboffloadmic'. * gcc_update (files_and_dependencies): Remove 'liboffloadmic' files. * update-copyright.py (GCCCmdLine): Remove 'liboffloadmic' comment. gcc/ * config.gcc [target *-intelmic-* | *-intelmicemul-*]: Remove. * config/i386/i386-options.cc (ix86_omp_device_kind_arch_isa) [ACCEL_COMPILER]: Remove. * config/i386/intelmic-mkoffload.cc: Remove. * config/i386/intelmic-offload.h: Likewise. * config/i386/t-intelmic: Likewise. * config/i386/t-omp-device: Likewise. * configure.ac [target *-intelmic-* | *-intelmicemul-*]: Remove. * configure: Regenerate. * doc/install.texi (--enable-offload-targets=[...]): Update. * doc/sourcebuild.texi: Remove 'liboffloadmic' documentation. include/ * gomp-constants.h (GOMP_DEVICE_INTEL_MIC): Comment out. (GOMP_VERSION_INTEL_MIC): Remove. libgomp/ * libgomp-plugin.h (OFFLOAD_TARGET_TYPE_INTEL_MIC): Remove. * libgomp.texi (OpenMP Context Selectors): Remove Intel MIC documentation. * plugin/configfrag.ac <enable_offload_targets> [*-intelmic-* | *-intelmicemul-*]: Remove. * configure: Regenerate. * testsuite/lib/libgomp.exp (libgomp_init): Remove 'liboffloadmic' handling. (offload_target_to_openacc_device_type) [$offload_target = *-intelmic*]: Remove. (check_effective_target_offload_device_intel_mic) (check_effective_target_offload_device_any_intel_mic): Remove. * testsuite/libgomp.c-c++-common/on_device_arch.h (device_arch_intel_mic, on_device_arch_intel_mic, any_device_arch) (any_device_arch_intel_mic): Remove. * testsuite/libgomp.c-c++-common/target-45.c: Remove 'offload_device_any_intel_mic' XFAIL. * testsuite/libgomp.fortran/target10.f90: Likewise. liboffloadmic/ * ChangeLog: Remove. * Makefile.am: Likewise. * Makefile.in: Likewise. * aclocal.m4: Likewise. * configure: Likewise. * configure.ac: Likewise. * configure.tgt: Likewise. * doc/doxygen/config: Likewise. * doc/doxygen/header.tex: Likewise. * include/coi/common/COIEngine_common.h: Likewise. * include/coi/common/COIEvent_common.h: Likewise. * include/coi/common/COIMacros_common.h: Likewise. * include/coi/common/COIPerf_common.h: Likewise. * include/coi/common/COIResult_common.h: Likewise. * include/coi/common/COISysInfo_common.h: Likewise. * include/coi/common/COITypes_common.h: Likewise. * include/coi/sink/COIBuffer_sink.h: Likewise. * include/coi/sink/COIPipeline_sink.h: Likewise. * include/coi/sink/COIProcess_sink.h: Likewise. * include/coi/source/COIBuffer_source.h: Likewise. * include/coi/source/COIEngine_source.h: Likewise. * include/coi/source/COIEvent_source.h: Likewise. * include/coi/source/COIPipeline_source.h: Likewise. * include/coi/source/COIProcess_source.h: Likewise. * liboffloadmic_host.spec.in: Likewise. * liboffloadmic_target.spec.in: Likewise. * plugin/Makefile.am: Likewise. * plugin/Makefile.in: Likewise. * plugin/aclocal.m4: Likewise. * plugin/configure: Likewise. * plugin/configure.ac: Likewise. * plugin/libgomp-plugin-intelmic.cpp: Likewise. * plugin/offload_target_main.cpp: Likewise. * runtime/cean_util.cpp: Likewise. * runtime/cean_util.h: Likewise. * runtime/coi/coi_client.cpp: Likewise. * runtime/coi/coi_client.h: Likewise. * runtime/coi/coi_server.cpp: Likewise. * runtime/coi/coi_server.h: Likewise. * runtime/compiler_if_host.cpp: Likewise. * runtime/compiler_if_host.h: Likewise. * runtime/compiler_if_target.cpp: Likewise. * runtime/compiler_if_target.h: Likewise. * runtime/dv_util.cpp: Likewise. * runtime/dv_util.h: Likewise. * runtime/emulator/coi_common.h: Likewise. * runtime/emulator/coi_device.cpp: Likewise. * runtime/emulator/coi_device.h: Likewise. * runtime/emulator/coi_host.cpp: Likewise. * runtime/emulator/coi_host.h: Likewise. * runtime/emulator/coi_version_asm.h: Likewise. * runtime/emulator/coi_version_linker_script.map: Likewise. * runtime/liboffload_error.c: Likewise. * runtime/liboffload_error_codes.h: Likewise. * runtime/liboffload_msg.c: Likewise. * runtime/liboffload_msg.h: Likewise. * runtime/mic_lib.f90: Likewise. * runtime/offload.h: Likewise. * runtime/offload_common.cpp: Likewise. * runtime/offload_common.h: Likewise. * runtime/offload_engine.cpp: Likewise. * runtime/offload_engine.h: Likewise. * runtime/offload_env.cpp: Likewise. * runtime/offload_env.h: Likewise. * runtime/offload_host.cpp: Likewise. * runtime/offload_host.h: Likewise. * runtime/offload_iterator.h: Likewise. * runtime/offload_omp_host.cpp: Likewise. * runtime/offload_omp_target.cpp: Likewise. * runtime/offload_orsl.cpp: Likewise. * runtime/offload_orsl.h: Likewise. * runtime/offload_table.cpp: Likewise. * runtime/offload_table.h: Likewise. * runtime/offload_target.cpp: Likewise. * runtime/offload_target.h: Likewise. * runtime/offload_target_main.cpp: Likewise. * runtime/offload_timer.h: Likewise. * runtime/offload_timer_host.cpp: Likewise. * runtime/offload_timer_target.cpp: Likewise. * runtime/offload_trace.cpp: Likewise. * runtime/offload_trace.h: Likewise. * runtime/offload_util.cpp: Likewise. * runtime/offload_util.h: Likewise. * runtime/ofldbegin.cpp: Likewise. * runtime/ofldend.cpp: Likewise. * runtime/orsl-lite/include/orsl-lite.h: Likewise. * runtime/orsl-lite/lib/orsl-lite.c: Likewise. * runtime/orsl-lite/version.txt: Likewise.
2022-10-24libgomp/nvptx: Prepare for reverse-offload callback handling, resolve ↵Thomas Schwinge1-0/+2
spurious SIGSEGVs Per commit r13-3460-g131d18e928a3ea1ab2d3bf61aa92d68a8a254609 "libgomp/nvptx: Prepare for reverse-offload callback handling", I'm seeing a lot of libgomp execution test regressions. Random example, 'libgomp.c-c++-common/error-1.c': [...] GOMP_OFFLOAD_run: kernel main$_omp_fn$0: launch [(teams: 1), 1, 1] [(lanes: 32), (threads: 8), 1] Thread 1 "a.out" received signal SIGSEGV, Segmentation fault. 0x00007ffff793b87d in GOMP_OFFLOAD_run (ord=<optimized out>, tgt_fn=<optimized out>, tgt_vars=<optimized out>, args=<optimized out>) at [...]/source-gcc/libgomp/plugin/plugin-nvptx.c:2127 2127 if (__atomic_load_n (&ptx_dev->rev_data->fn, __ATOMIC_ACQUIRE) != 0) (gdb) print ptx_dev $1 = (struct ptx_device *) 0x6a55a0 (gdb) print ptx_dev->rev_data $2 = (struct rev_offload *) 0xffffffff00000000 (gdb) print ptx_dev->rev_data->fn Cannot access memory at address 0xffffffff00000000 libgomp/ * plugin/plugin-nvptx.c (nvptx_open_device): Initialize 'ptx_dev->rev_data'.
2022-10-24libgomp/nvptx: Prepare for reverse-offload callback handlingTobias Burnus2-10/+98
This patch adds a stub 'gomp_target_rev' in the host's target.c, which will later handle the reverse offload. For nvptx, it adds support for forwarding the offload gomp_target_ext call to the host by setting values in a struct on the device and querying it on the host - invoking gomp_target_rev on the result. include/ChangeLog: * cuda/cuda.h (enum CUdevice_attribute): Add CU_DEVICE_ATTRIBUTE_UNIFIED_ADDRESSING. (CU_MEMHOSTALLOC_DEVICEMAP): Define. (cuMemHostAlloc): Add prototype. libgomp/ChangeLog: * config/nvptx/icv-device.c (GOMP_DEVICE_NUM_VAR): Remove 'static' for this variable. * config/nvptx/libgomp-nvptx.h: New file. * config/nvptx/target.c: Include it. (GOMP_ADDITIONAL_ICVS): Declare extern var. (GOMP_REV_OFFLOAD_VAR): Declare var. (GOMP_target_ext): Handle reverse offload. * libgomp-plugin.h (GOMP_PLUGIN_target_rev): New prototype. * libgomp-plugin.c (GOMP_PLUGIN_target_rev): New, call ... * target.c (gomp_target_rev): ... this new stub function. * libgomp.h (gomp_target_rev): Declare. * libgomp.map (GOMP_PLUGIN_1.4): New; add GOMP_PLUGIN_target_rev. * plugin/cuda-lib.def (cuMemHostAlloc): Add. * plugin/plugin-nvptx.c: Include libgomp-nvptx.h. (struct ptx_device): Add rev_data member. (nvptx_open_device): Remove async_engines query, last used in r10-304-g1f4c5b9b; add unified-address assert check. (GOMP_OFFLOAD_get_num_devices): Claim unified address support. (GOMP_OFFLOAD_load_image): Free rev_fn_table if no offload functions exist. Make offload var available on host and device. (rev_off_dev_to_host_cpy, rev_off_host_to_dev_cpy): New. (GOMP_OFFLOAD_run): Handle reverse offload.
2022-09-09nvptx: libgomp+mkoffload.cc: Prepare for reverse offload fn lookupTobias Burnus1-1/+18
Add support to nvptx for reverse lookup of function name to prepare for 'omp target device(ancestor:1)'. gcc/ChangeLog: * config/nvptx/mkoffload.cc (struct id_map): Add 'dim' member. (record_id): Store func name without quotes, store dim separately. (process): For GOMP_REQUIRES_REVERSE_OFFLOAD, check that -march is at least sm_35, create '$offload_func_table' global array and init with reverse-offload function addresses. * config/nvptx/nvptx.cc (write_fn_proto_1, write_fn_proto): New force_public attribute to force .visible. (nvptx_declare_function_name): For "omp target device_ancestor_nohost" attribut, force .visible/TREE_PUBLIC. libgomp/ChangeLog: * plugin/plugin-nvptx.c (GOMP_OFFLOAD_load_image): Read offload function address table '$offload_func_table' if rev_fn_table is not NULL.
2022-09-09GCN: libgomp+mkoffload.cc: Prepare for reverse offload fn lookupTobias Burnus1-1/+25
Add support to GCN for reverse lookup of function name to prepare for 'omp target device(ancestor:1)'. gcc/ChangeLog: * config/gcn/mkoffload.cc (process_asm): Create .offload_func_table, similar to pre-existing .offload_var_table. libgomp/ChangeLog: * plugin/plugin-gcn.c (GOMP_OFFLOAD_load_image): Read .offload_func_table to populate rev_fn_table when requested.
2022-09-09libgomp: Prepare for reverse offload fn lookupTobias Burnus2-4/+10
Prepare for reverse-offloading function-pointer lookup by passing a rev_fn_table argument to GOMP_OFFLOAD_load_image. The argument will be NULL, unless GOMP_REQUIRES_REVERSE_OFFLOAD is requested and devices not supported it, are filtered out. (Up to and including this commit, no non-host device claims such support and the caller currently always passes NULL.) libgomp/ChangeLog: * libgomp-plugin.h (GOMP_OFFLOAD_load_image): Add 'uint64_t **rev_fn_table' argument. * oacc-host.c (host_load_image): Likewise. * plugin/plugin-gcn.c (GOMP_OFFLOAD_load_image): Likewise; currently unused. * plugin/plugin-nvptx.c (GOMP_OFFLOAD_load_image): Likewise. * target.c (gomp_load_image_to_device): Update call but pass NULL for now. liboffloadmic/ChangeLog: * plugin/libgomp-plugin-intelmic.cpp (GOMP_OFFLOAD_load_image): Add (unused) uint64_t **rev_fn_table argument.
2022-09-08OpenMP, libgomp: Environment variable syntax extensionMarcel Vollweiler2-22/+26
This patch considers the environment variable syntax extension for device-specific variants of environment variables from OpenMP 5.1 (see OpenMP 5.1 specification, p. 75 and p. 639). An environment variable (e.g. OMP_NUM_TEAMS) can have different suffixes: _DEV (e.g. OMP_NUM_TEAMS_DEV): affects all devices but not the host. _DEV_<device> (e.g. OMP_NUM_TEAMS_DEV_42): affects only device with number <device>. no suffix (e.g. OMP_NUM_TEAMS): affects only the host. In future OpenMP versions also suffix _ALL will be introduced (see discussion https://github.com/OpenMP/spec/issues/3179). This is also considered in this patch: _ALL (e.g. OMP_NUM_TEAMS_ALL): affects all devices and the host. The precedence is as follows (descending). For the host: 1. no suffix 2. _ALL For devices: 1. _DEV_<device> 2. _DEV 3. _ALL That means, _DEV_<device> is used whenever available. Otherwise _DEV is used if available, and at last _ALL. If there is no value for any of the variable variants, default values are used as already implemented before. This patch concerns parsing (a), storing (b), output (c) and transmission to the device (d): (a) The actual number of devices and the numbering are not known when parsing the environment variables. Thus all environment variables are iterated and searched for device-specific ones. (b) Only configured device-specific variables are stored. Thus, a linked list is used. (c) The output is done in omp_display_env (see specification p. 468f). Global ICVs are tagged with [all], see https://github.com/OpenMP/spec/issues/3179. ICVs which are not global but aren't handled device-specific yet are tagged with [host]. omp_display_env outputs the initial values of the ICVs. That is why a dedicated data structure is introduced for the inital values only (gomp_initial_icv_list). (d) Device-specific ICVs are transmitted to the device via GOMP_ADDITIONAL_ICVS. libgomp/ChangeLog: * config/gcn/icv-device.c (omp_get_default_device): Return device- specific ICV. (omp_get_max_teams): Added for GCN devices. (omp_set_num_teams): Likewise. (ialias): Likewise. * config/nvptx/icv-device.c (omp_get_default_device): Return device- specific ICV. (omp_get_max_teams): Added for NVPTX devices. (omp_set_num_teams): Likewise. (ialias): Likewise. * env.c (struct gomp_icv_list): New struct to store entries of initial ICV values. (struct gomp_offload_icv_list): New struct to store entries of device- specific ICV values that are copied to the device and back. (struct gomp_default_icv_values): New struct to store default values of ICVs according to the OpenMP standard. (parse_schedule): Generalized for different variants of OMP_SCHEDULE. (print_env_var_error): Function that prints an error for invalid values for ICVs. (parse_unsigned_long_1): Removed getenv. Generalized. (parse_unsigned_long): Likewise. (parse_int_1): Likewise. (parse_int): Likewise. (parse_int_secure): Likewise. (parse_unsigned_long_list): Likewise. (parse_target_offload): Likewise. (parse_bind_var): Likewise. (parse_stacksize): Likewise. (parse_boolean): Likewise. (parse_wait_policy): Likewise. (parse_allocator): Likewise. (omp_display_env): Extended to output different variants of environment variables. (print_schedule): New helper function for omp_display_env which prints the values of run_sched_var. (print_proc_bind): New helper function for omp_display_env which prints the values of proc_bind_var. (enum gomp_parse_type): Collection of types used for parsing environment variables. (ENTRY): Preprocess string lengths of environment variables. (OMP_VAR_CNT): Preprocess table size. (OMP_HOST_VAR_CNT): Likewise. (INT_MAX_STR_LEN): Constant for the maximal number of digits of a device number. (gomp_get_icv_flag): Returns if a flag for a particular ICV is set. (gomp_set_icv_flag): Sets a flag for a particular ICV. (print_device_specific_icvs): New helper function for omp_display_env to print device specific ICV values. (get_device_num): New helper function for parse_device_specific. Extracts the device number from an environment variable name. (get_icv_member_addr): Gets the memory address for a particular member of an ICV struct. (gomp_get_initial_icv_item): Get a list item of gomp_initial_icv_list. (initialize_icvs): New function to initialize a gomp_initial_icvs struct. (add_initial_icv_to_list): Adds an ICV struct to gomp_initial_icv_list. (startswith): Checks if a string starts with a given prefix. (initialize_env): Extended to parse the new syntax of environment variables. * icv-device.c (omp_get_max_teams): Added. (ialias): Likewise. (omp_set_num_teams): Likewise. * icv.c (omp_set_num_teams): Moved to icv-device.c. (omp_get_max_teams): Likewise. (ialias): Likewise. * libgomp-plugin.h (GOMP_DEVICE_NUM_VAR): Removed. (GOMP_ADDITIONAL_ICVS): New target-side struct that holds the designated ICVs of the target device. * libgomp.h (enum gomp_icvs): Collection of ICVs. (enum gomp_device_num): Definition of device numbers for _ALL, _DEV, and no suffix. (enum gomp_env_suffix): Collection of possible suffixes of environment variables. (struct gomp_initial_icvs): Contains all ICVs for which we need to store initial values. (struct gomp_default_icv):New struct to hold ICVs for which we need to store initial values. (struct gomp_icv_list): Definition of a linked list that is used for storing ICVs for the devices and also for _DEV, _ALL, and without suffix. (struct gomp_offload_icvs): New struct to hold ICVs that are copied to a device. (struct gomp_offload_icv_list): Definition of a linked list that holds device-specific ICVs that are copied to devices. (gomp_get_initial_icv_item): Get a list item of gomp_initial_icv_list. (gomp_get_icv_flag): Returns if a flag for a particular ICV is set. * libgomp.texi: Updated. * plugin/plugin-gcn.c (GOMP_OFFLOAD_load_image): Extended to read further ICVs from the offload image. * plugin/plugin-nvptx.c (GOMP_OFFLOAD_load_image): Likewise. * target.c (gomp_get_offload_icv_item): Get a list item of gomp_offload_icv_list. (get_gomp_offload_icvs): New. Returns the ICV values depending on the device num and the variable hierarchy. (gomp_load_image_to_device): Extended to copy further ICVs to a device. * testsuite/libgomp.c-c++-common/icv-5.c: New test. * testsuite/libgomp.c-c++-common/icv-6.c: New test. * testsuite/libgomp.c-c++-common/icv-7.c: New test. * testsuite/libgomp.c-c++-common/icv-8.c: New test. * testsuite/libgomp.c-c++-common/omp-display-env-1.c: New test. * testsuite/libgomp.c-c++-common/omp-display-env-2.c: New test.
2022-07-04OpenMP: Move omp requires checks to libgompTobias Burnus2-3/+12
Handle reverse_offload, unified_address, and unified_shared_memory requirements in libgomp by saving them alongside the offload table. When the device lto1 runs, it extracts the data for mkoffload. The latter than passes the value on to GOMP_offload_register_ver. lto1 (either the host one, with -flto [+ ENABLE_OFFLOADING], or in the offload-device lto1) also does the the consistency check is done, erroring out when the 'omp requires' clause use is inconsistent. For all in-principle supported devices, if a requirement cannot be fulfilled, the device is excluded from the (supported) devices list. Currently, none of those requirements are marked as supported for any of the non-host devices. gcc/c/ChangeLog: * c-parser.cc (c_parser_omp_target_data, c_parser_omp_target_update, c_parser_omp_target_enter_data, c_parser_omp_target_exit_data): Set OMP_REQUIRES_TARGET_USED. (c_parser_omp_requires): Remove sorry. gcc/ChangeLog: * config/gcn/mkoffload.cc (process_asm): Write '#include <stdint.h>'. (process_obj): Pass omp_requires_mask to GOMP_offload_register_ver. (main): Ask lto1 to obtain omp_requires_mask and pass it on. * config/nvptx/mkoffload.cc (process, main): Likewise. * lto-cgraph.cc (omp_requires_to_name): New. (input_offload_tables): Save omp_requires_mask. (output_offload_tables): Read it, check for consistency, save value for mkoffload. * omp-low.cc (lower_omp_target): Force output_offloadtables call for OMP_REQUIRES_TARGET_USED. gcc/cp/ChangeLog: * parser.cc (cp_parser_omp_target_data, cp_parser_omp_target_enter_data, cp_parser_omp_target_exit_data, cp_parser_omp_target_update): Set OMP_REQUIRES_TARGET_USED. (cp_parser_omp_requires): Remove sorry. gcc/fortran/ChangeLog: * openmp.cc (gfc_match_omp_requires): Remove sorry. * parse.cc (decode_omp_directive): Don't regard 'declare target' as target usage for 'omp requires'; add more flags to omp_requires_mask. include/ChangeLog: * gomp-constants.h (GOMP_VERSION): Bump to 2. (GOMP_REQUIRES_UNIFIED_ADDRESS, GOMP_REQUIRES_UNIFIED_SHARED_MEMORY, GOMP_REQUIRES_REVERSE_OFFLOAD, GOMP_REQUIRES_TARGET_USED): New defines. libgomp/ChangeLog: * libgomp-plugin.h (GOMP_OFFLOAD_get_num_devices): Add omp_requires_mask arg. * plugin/plugin-gcn.c (GOMP_OFFLOAD_get_num_devices): Likewise; return -1 when device available but omp_requires_mask != 0. * plugin/plugin-nvptx.c (GOMP_OFFLOAD_get_num_devices): Likewise. * oacc-host.c (host_get_num_devices, host_openacc_get_property): Update call. * oacc-init.c (resolve_device, acc_init_1, acc_shutdown_1, goacc_attach_host_thread_to_device, acc_get_num_devices, acc_set_device_num, get_property_any): Likewise. * target.c (omp_requires_mask): New global var. (gomp_requires_to_name): New. (GOMP_offload_register_ver): Handle passed omp_requires_mask. (gomp_target_init): Handle omp_requires_mask. * libgomp.texi (OpenMP 5.0): Update requires impl. status. (OpenMP 5.1): Add a missed item. (OpenMP 5.2): Mark linear-clause change as supported in C/C++. * testsuite/libgomp.c-c++-common/requires-1-aux.c: New test. * testsuite/libgomp.c-c++-common/requires-1.c: New test. * testsuite/libgomp.c-c++-common/requires-2-aux.c: New test. * testsuite/libgomp.c-c++-common/requires-2.c: New test. * testsuite/libgomp.c-c++-common/requires-3-aux.c: New test. * testsuite/libgomp.c-c++-common/requires-3.c: New test. * testsuite/libgomp.c-c++-common/requires-4-aux.c: New test. * testsuite/libgomp.c-c++-common/requires-4.c: New test. * testsuite/libgomp.c-c++-common/requires-5-aux.c: New test. * testsuite/libgomp.c-c++-common/requires-5.c: New test. * testsuite/libgomp.c-c++-common/requires-6.c: New test. * testsuite/libgomp.c-c++-common/requires-7-aux.c: New test. * testsuite/libgomp.c-c++-common/requires-7.c: New test. * testsuite/libgomp.fortran/requires-1-aux.f90: New test. * testsuite/libgomp.fortran/requires-1.f90: New test. liboffloadmic/ChangeLog: * plugin/libgomp-plugin-intelmic.cpp (GOMP_OFFLOAD_get_num_devices): Return -1 when device available but omp_requires_mask != 0. gcc/testsuite/ChangeLog: * c-c++-common/gomp/requires-4.c: Update dg-*. * c-c++-common/gomp/reverse-offload-1.c: Likewise. * c-c++-common/gomp/target-device-ancestor-2.c: Likewise. * c-c++-common/gomp/target-device-ancestor-3.c: Likewise. * c-c++-common/gomp/target-device-ancestor-4.c: Likewise. * c-c++-common/gomp/target-device-ancestor-5.c: Likewise. * gfortran.dg/gomp/target-device-ancestor-3.f90: Likewise. * gfortran.dg/gomp/target-device-ancestor-4.f90: Likewise. * gfortran.dg/gomp/target-device-ancestor-5.f90: Likewise. * gfortran.dg/gomp/target-device-ancestor-2.f90: Likewise. Move post-FE checks to ... * gfortran.dg/gomp/target-device-ancestor-2a.f90: ... this new file. * gfortran.dg/gomp/requires-8.f90: Update as we don't regard 'declare target' for the 'requires' usage requirement. Co-authored-by: Chung-Lin Tang <cltang@codesourcery.com> Co-authored-by: Thomas Schwinge <thomas@codesourcery.com>
2022-06-10libgomp nvptx plugin: Remove '--with-cuda-driver=[...]' etc. configuration ↵Thomas Schwinge2-103/+12
option That means, exposing to the user only the '--without-cuda-driver' behavior: including the GCC-shipped 'include/cuda/cuda.h' (not system <cuda.h>), and 'dlopen'ing the CUDA Driver library (not linking it). For development purposes, the libgomp nvptx plugin developer may still manually override that, to get the previous '--with-cuda-driver' behavior. libgomp/ * plugin/Makefrag.am: Evaluate 'if PLUGIN_NVPTX_DYNAMIC' to true. * plugin/configfrag.ac (--with-cuda-driver) (--with-cuda-driver-include, --with-cuda-driver-lib) (CUDA_DRIVER_INCLUDE, CUDA_DRIVER_LIB, PLUGIN_NVPTX_CPPFLAGS) (PLUGIN_NVPTX_LDFLAGS, PLUGIN_NVPTX_LIBS, PLUGIN_NVPTX_DYNAMIC): Remove. * testsuite/libgomp-test-support.exp.in (cuda_driver_include) (cuda_driver_lib): Remove. * testsuite/lib/libgomp.exp (libgomp_init): Don't consider these. * Makefile.in: Regenerate. * configure: Likewise. * testsuite/Makefile.in: Likewise.
2022-05-24amdgcn: Add gfx90a supportAndrew Stubbs1-1/+8
This adds architecture options and multilibs for the AMD GFX90a GPUs. It also tidies up some of the ISA selection code, and corrects a few small mistake in the gfx908 naming. gcc/ChangeLog: * config.gcc (amdgcn): Accept --with-arch=gfx908 and gfx90a. * config/gcn/gcn-opts.h (enum gcn_isa): New. (TARGET_GCN3): Use enum gcn_isa. (TARGET_GCN3_PLUS): Likewise. (TARGET_GCN5): Likewise. (TARGET_GCN5_PLUS): Likewise. (TARGET_CDNA1): New. (TARGET_CDNA1_PLUS): New. (TARGET_CDNA2): New. (TARGET_CDNA2_PLUS): New. (TARGET_M0_LDS_LIMIT): New. (TARGET_PACKED_WORK_ITEMS): New. * config/gcn/gcn.cc (gcn_isa): Change to enum gcn_isa. (gcn_option_override): Recognise CDNA ISA variants. (gcn_omp_device_kind_arch_isa): Support gfx90a. (gcn_expand_prologue): Make m0 init optional. Add support for packed work items. (output_file_start): Support gfx90a. (gcn_hsa_declare_function_name): Support gfx90a metadata. * config/gcn/gcn.h (TARGET_CPU_CPP_BUILTINS):Add __CDNA1__ and __CDNA2__. * config/gcn/gcn.md (<su>mulsi3_highpart): Use TARGET_GCN5_PLUS. (<su>mulsi3_highpart_imm): Likewise. (<su>mulsidi3): Likewise. (<su>mulsidi3_imm): Likewise. * config/gcn/gcn.opt (gpu_type): Add gfx90a. * config/gcn/mkoffload.cc (EF_AMDGPU_MACH_AMDGCN_GFX90a): New. (main): Support gfx90a. * config/gcn/t-gcn-hsa: Add gfx90a multilib. * config/gcn/t-omp-device: Add gfx90a isa. libgomp/ChangeLog: * plugin/plugin-gcn.c (EF_AMDGPU_MACH): Add EF_AMDGPU_MACH_AMDGCN_GFX90a. (gcn_gfx90a_s): New. (isa_hsa_name): Support gfx90a. (isa_code): Likewise.
2022-05-13libgomp nvptx plugin: Only consider '--with-cuda-driver=[...]' when applicableThomas Schwinge1-2/+3
They're not applicable in 'PLUGIN_NVPTX_DYNAMIC' configurations. libgomp/ * plugin/Makefrag.am (libgomp_plugin_nvptx_la_CPPFLAGS) [PLUGIN_NVPTX_DYNAMIC]: Don't append '$(PLUGIN_NVPTX_CPPFLAGS)'. (libgomp_plugin_nvptx_la_LDFLAGS) [PLUGIN_NVPTX_DYNAMIC]: Don't append '$(PLUGIN_NVPTX_LDFLAGS)'. * Makefile.in: Regenerate.