path: root/openmp
Age | Commit message | Author | Files | Lines
2024-02-23[libc] Remove 'llvm-gpu-none' directory from build (#82816)Joseph Huber1-8/+2
Summary: This directory is leftover from when we handled both AMDGPU and NVPTX in the same build and merged them into a pseudo triple. Now the only thing it contains is the RPC server header. This gets rid of it, but now that it's in the base install directory we should make it clear that it's an LLVM libc header.
2024-02-22[Libomptarget][NFC] Remove concept of optional plugin functions (#82681)Joseph Huber3-41/+43
Summary: Ever since the introduction of the new plugins we haven't exercised the concept of "optional" plugin functions. This is done in preparation for making the plugins use a static interface as it will greatly simplify the implementation if we assert that every function has the entrypoints. Currently some unsupported functions will just return failure or some other default value, so this shouldn't change anything.
2024-02-22[libc] Rework the GPU build to be a regular target (#81921)Joseph Huber4-12/+14
Summary: This is a massive patch because it reworks the entire build and everything that depends on it. This is not split up because various bots would fail otherwise. I will attempt to describe the necessary changes here. This patch completely reworks how the GPU build is built and targeted. Previously, we used a standard runtimes build and handled both NVPTX and AMDGPU in a single build via multi-targeting. This added a lot of divergence in the build system and prevented us from doing various things like building for the CPU / GPU at the same time, or exporting the startup libraries or running tests without a full rebuild. The new approach is to handle the GPU builds as strict cross-compiling runtimes. The first step required https://github.com/llvm/llvm-project/pull/81557 to allow the `LIBC` target to build for the GPU without touching the other targets. This means that the GPU uses all the same handling as the other builds in `libc`. The new expected way to build the GPU libc is with `LLVM_LIBC_RUNTIME_TARGETS=amdgcn-amd-amdhsa;nvptx64-nvidia-cuda`. The second step was reworking how we generated the embedded GPU library by moving it into the library install step. Where we previously had one `libcgpu.a` we now have `libcgpu-amdgpu.a` and `libcgpu-nvptx.a`. This patch includes the necessary clang / OpenMP changes to make that not break the bots when this lands. We unfortunately still require that the NVPTX target has an `internal` target for tests. This is because the NVPTX target needs to do LTO for the provided version (The offloading toolchain can handle it) but cannot use it for the native toolchain which is used for making tests. This approach is vastly superior in every way, allowing us to treat the GPU as a standard cross-compiling target. We can now install the GPU utilities to do things like use the offload tests and other fun things. Certain utilities need to be built with `--target=${LLVM_HOST_TRIPLE}` as well. I think this is a fine workaround as we will always assume that the GPU `libc` is a cross-build with a functioning host. Depends on https://github.com/llvm/llvm-project/pull/81557
2024-02-22Fix build on musl by including stdint.h (#81434)Daniel Martinez1-0/+1
openmp fails to build on musl since it lacks the defines for int32_t Co-authored-by: Daniel Martinez <danielmartinez@cock.li>
2024-02-22[Libomptarget] Remove global ctor and use reference counting (#80499)Joseph Huber7-15/+89
Summary: Currently we rely on global constructors to initialize and shut down the OpenMP runtime library and plugin manager. This causes some issues because we do not have a defined lifetime that we can rely on to release and allocate resources. This patch instead adds some simple reference counted initialization and deinitialization functions. A future patch will use the `deinit` interface to more intelligently handle plugin deinitialization. Right now we do nothing and rely on `atexit` inside of the plugins to tear them down. This isn't great because it limits our ability to control these things. Note that I made the `__tgt_register_lib` functions do the initialization instead of adding calls to the new runtime functions in the linker wrapper. The reason for this is because in the past it's been easier to not introduce a new function call, since sometimes the user's compiler will link against an older `libomptarget`. Maybe if we change the name with offloading in the future we can simplify this. Depends on https://github.com/llvm/llvm-project/pull/80460
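The reference-counted lifetime described above can be sketched roughly as follows. This is only an illustration of the technique, not the code from the patch: the names `initRuntime`, `deinitRuntime`, and `isRuntimeLive` are hypothetical, and a single boolean stands in for the real runtime and plugin manager state.

```cpp
#include <cstdint>
#include <mutex>

// Hypothetical stand-ins for the runtime state; not the libomptarget API.
static std::mutex RuntimeMutex;
static uint32_t RuntimeRefCount = 0;
static bool RuntimeLive = false;

// The first caller brings the runtime up; later callers only bump the count.
void initRuntime() {
  std::lock_guard<std::mutex> Lock(RuntimeMutex);
  if (RuntimeRefCount++ == 0)
    RuntimeLive = true; // a real runtime would create its plugin manager here
}

// The last caller tears the runtime down; unbalanced calls are ignored.
void deinitRuntime() {
  std::lock_guard<std::mutex> Lock(RuntimeMutex);
  if (RuntimeRefCount == 0)
    return;
  if (--RuntimeRefCount == 0)
    RuntimeLive = false; // a real runtime would release its resources here
}

// Query used only to observe the state in this sketch.
bool isRuntimeLive() {
  std::lock_guard<std::mutex> Lock(RuntimeMutex);
  return RuntimeLive;
}
```

With this shape, each registered library would call the init function once and the deinit function once, so the runtime stays alive until the last user is gone.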
2024-02-21[OpenMP] Remove `register_requires` global constructor (#80460)Joseph Huber6-7/+50
Summary: Currently, OpenMP handles the `omp requires` clause by emitting a global constructor into the runtime for every translation unit that requires it. However, this is not a great solution because it prevents us from having a defined order in which the runtime is accessed and used. This patch changes the approach to no longer use global constructors, but to instead group the flag with the other offloading entries that we already handle. This has the effect of still registering each flag per requires TU, but now we have a single constructor that handles everything. This function removes support for the old `__tgt_register_requires` and replaces it with a warning message. We just had a recent release, and the OpenMP policy for the past four releases since we switched to LLVM is that we do not provide strict backwards compatibility between major LLVM releases now that the library is versioned. This means that a user will need to recompile if they have an old binary that relied on `register_requires` having the old behavior. It is important that we actively deprecate this, as otherwise it would not solve the problem of having no defined init and shutdown order for `libomptarget`. The problem of `libomptarget` not having a defined init and shutdown order cascades into a lot of other issues so I have a strong incentive to be rid of it. It is worth noting that the current `__tgt_offload_entry` only has space for a 32-bit integer here. I am planning to overhaul these at some point as well.
2024-02-20[OpenMP][AIX]Add assembly file containing microtasking routines and unnamed ↵Xing Xue1-0/+410
common block definitions (#81770) This patch adds assembly file `z_AIX_asm.S` that contains the 32- and 64-bit XCOFF version of microtasking routines and unnamed common block definitions. This code has been run through the libomp LIT tests and a user package successfully.
2024-02-19[libomptarget][test] Add support for APU testing feature. (#82054)Gheorghe-Teodor Bercea2-0/+70
Add test and support for `// REQUIRES: apu` for the category of tests which exercise APU specific behavior. Note: when running on an actual APU you may have to use the following if the architecture ID is not enough to determine if the underlying device is an APU: ``` IS_APU=1 ninja check-openmp ```
2024-02-19[OpenMP] [test] Skip the -mlong-double-80 test on MSVC ABI (#81115)Martin Storsjö1-2/+2
Within the MSVC ABI, long doubles are the same as regular 64 bit doubles. This test case, which is compiled with -mlong-double-80, cannot work when libomp has been compiled without that flag, as -mlong-double-80 changes the calling convention for the tested functions.
2024-02-16[OpenMP][AIX] Set worker stack size to 2 x KMP_DEFAULT_STKSIZE if system ↵Xing Xue2-0/+9
stack size is too big (#81996) This patch sets the stack size of worker threads to `2 x KMP_DEFAULT_STKSIZE` (2 x 4MB) for AIX if the system stack size is too big. Also defines maximum stack size for 32-bit AIX.
2024-02-13[OpenMP][AIX]Define struct kmp_base_tas_lock with the order of two members ↵Xing Xue4-10/+20
swapped for big-endian (#79188) The direct lock data structure has bit `0` (the least significant bit) of the first 32-bit word set to `1` to indicate it is a direct lock. On the other hand, the first word (in 32-bit mode) or first two words (in 64-bit mode) of an indirect lock are the address of the entry allocated from the indirect lock table. The runtime checks bit `0` of the first 32-bit word to tell if this is a direct or an indirect lock. This works fine for 32-bit and 64-bit little-endian because its memory layout of a 64-bit address is (`low word`, `high word`). However, this causes problems for big-endian where the memory layout of a 64-bit address is (`high word`, `low word`). If an address of the indirect lock table entry is something like `0x110035300`, i.e., (`0x1`, `0x10035300`), it is treated as a direct lock. This patch defines `struct kmp_base_tas_lock` with the ordering of the two 32-bit members flipped for big-endian PPC64 so that when checking/setting tags in member `poll`, the second word (the low word) is used. This patch also changes places where `poll` is not already explicitly specified for checking/setting tags.
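The misclassification described above can be worked through with a small, endian-independent simulation. This is a sketch under stated assumptions, not the runtime's code: `Words`, `layout`, and the two predicates are hypothetical helpers that model how a 64-bit address appears in memory as two 32-bit words.

```cpp
#include <cstdint>

// A 64-bit value as it sits in memory, viewed as two consecutive 32-bit words.
struct Words {
  uint32_t First;
  uint32_t Second;
};

// Lay out Addr the way a little-endian or a big-endian 64-bit target would.
Words layout(uint64_t Addr, bool BigEndian) {
  uint32_t Low = static_cast<uint32_t>(Addr);
  uint32_t High = static_cast<uint32_t>(Addr >> 32);
  return BigEndian ? Words{High, Low} : Words{Low, High};
}

// The tag test described in the commit: bit 0 of the first 32-bit word.
bool looksDirectByFirstWord(Words W) { return (W.First & 1u) != 0; }

// The idea behind the fix: on big-endian, test the word holding the low half.
bool looksDirectByLowWord(Words W, bool BigEndian) {
  uint32_t LowWord = BigEndian ? W.Second : W.First;
  return (LowWord & 1u) != 0;
}
```

For the address `0x110035300`, the big-endian layout is (`0x1`, `0x10035300`), so the first-word test wrongly reports a direct lock, while testing the low word reports an indirect lock on both layouts.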
2024-02-11[OpenMP] Remove -Wno-enum-constexpr-conversion (#81318)Carlos Galvez2-2/+0
This effectively reverts commit 9ff0cc7e0fa7e99163610d2fcb58e96f3315e343. For some reason "git revert" led to "no changes" after fixing conflicts, so a clean revert was not possible. The original issue (#57022) is no longer reproducible even with this patch, so we can remove the suppression. This is in line with our goal to make -Wenum-constexpr-conversion a non-downgradeable error, see #59036. Co-authored-by: Carlos Gálvez <carlos.galvez@zenseact.com>
2024-02-09[OpenMP] Fix libomp debug build. (#81029)Daniil Fukalov1-0/+4
Disable libstdc++ assertions in the runtime library just like in https://reviews.llvm.org/D143168.
2024-02-08[OpenMP] [cmake] Don't use -fno-semantic-interposition on Windows (#81113)Martin Storsjö1-1/+5
This was added in 4b7beab4187ab0766c3d7b272511d5751431a8da. When the flag was added implicitly elsewhere, it was added via llvm/cmake/modules/HandleLLVMOptions.cmake, where it wasn't added on Windows/Cygwin targets. This avoids one warning per object file in OpenMP.
2024-02-08[OpenMP] [cmake] In standalone mode, make Python3_EXECUTABLE available (#80828)Martin Storsjö1-0/+2
When running the tests, we try to invoke them as "${Python3_EXECUTABLE} ${OPENMP_LLVM_LIT_EXECUTABLE}", but when running "find_package(Python3)" within the function "find_standalone_test_dependencies", the variable "Python3_EXECUTABLE" only gets set within the function scope. Tests have worked regardless of this in many cases, where executing the python script directly succeeds. But for consistency, and for working in cases when the python script can't be executed as such, make the Python3_EXECUTABLE variable available as intended.
2024-02-07[libomptarget] [OMPT] Fixed return address computation for OMPT events. (#80498)dhruvachak11-30/+275
Currently, __builtin_return_address is used to generate the return address when the callback invoker is created. However, this may result in the return address pointing to an internal runtime function. This is not what a tool would typically want. A tool would want to know the corresponding user code from where the runtime entry point is invoked. This change adds a thread local variable that is assigned the return address at the OpenMP runtime entry points. An RAII is used to manage the modifications to the thread local variable. Whenever the return address is required for OMPT events, it is read from the thread local variable.
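A minimal sketch of the thread-local plus RAII pattern described above, assuming hypothetical names (`CurrentReturnAddress`, `ReturnAddressSetter`, `runtimeEntryPoint`) that are not taken from the patch. The caller's address is passed in explicitly here to keep the example portable; a real entry point would more likely obtain it with `__builtin_return_address(0)`. Restricting the write to the outermost entry point is also an assumption of this sketch.

```cpp
// Per-thread slot holding the user-code return address for the current call.
thread_local void *CurrentReturnAddress = nullptr;

// RAII guard: the outermost entry point records the address and clears it on
// exit; nested entry points leave the recorded value untouched.
class ReturnAddressSetter {
  bool IsOutermost;

public:
  explicit ReturnAddressSetter(void *RA)
      : IsOutermost(CurrentReturnAddress == nullptr) {
    if (IsOutermost)
      CurrentReturnAddress = RA;
  }
  ~ReturnAddressSetter() {
    if (IsOutermost)
      CurrentReturnAddress = nullptr;
  }
  ReturnAddressSetter(const ReturnAddressSetter &) = delete;
  ReturnAddressSetter &operator=(const ReturnAddressSetter &) = delete;
};

// Internal helper several frames away from user code: it reads the slot
// instead of computing its own return address.
void *addressForToolCallback() { return CurrentReturnAddress; }

// Hypothetical runtime entry point.
void *runtimeEntryPoint(void *CallerRA) {
  ReturnAddressSetter Guard(CallerRA);
  return addressForToolCallback();
}
```

The guard keeps the slot valid for exactly the duration of the entry point, so internal helpers never report an address inside the runtime itself.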
2024-02-07[OpenMP][test]Flip bit-fields in 'struct flags' for big-endian in test cases ↵Xing Xue4-16/+47
(#79895) This patch flips bit-fields in `struct flags` for big-endian in test cases to be consistent with the definition of the structure in libomp `kmp.h`.
2024-02-06[OpenMP] Support for global variables when in auto zero-copy. (#80876)carlobertolli2-1/+87
When building without unified_shared_memory, global variables are declared in the device binary and allocated upon loading onto GPU memory. However, when running in zero-copy mode (same as with unified_shared_memory) D2H and H2D copies for mapped local and global variables are turned off. This patch turns back on H2D and D2H copies when they refer to global variables, enabling an application built without unified_shared_memory to work correctly with global variables when run under automatic zero-copy. Co-authored-by: Doru Bercea <doru.bercea@amd.com> Co-authored-by: Jan-Patrick Lehr <janpatrick.lehr@amd.com>
2024-02-06[OpenMP] HSA_ENABLE_SDMA visible in libomptarget tests (#80860)Jan Patrick Lehr1-0/+3
Enable the environment variable inside the test environment. This allows disabling SDMA engine transfers as a potential mitigation of flaky OpenMP offloading tests on AMDGPU. Motivated by the open ticket https://github.com/ROCm/ROCm/issues/2616 about a missed synchronization signal.
2024-02-06[OMPD] Runtime Entry Point functions for OMPD in libomp.so need C linkage as ↵vigbalu1-0/+8
per standard. (#79246) Adding extern "C" to all the entry point functions to make sure that these functions are not mangled.
2024-02-05[Flang][OpenMP] Initial mapping of Fortran pointers and allocatables for ↵agozillon8-0/+431
target devices (#71766) This patch seeks to add an initial lowering for pointers and allocatable variables captured by implicit and explicit map in Flang OpenMP for Target operations that take map clauses, e.g. Target, Target Update, Target Exit/Enter, etc. Currently this is done by treating the type that lowers to a descriptor (allocatable/pointer/assumed shape) as a map of a record type (e.g. a structure) as that's effectively what descriptor types lower to in LLVM-IR and what they're represented as in the Fortran runtime (written in C/C++). The descriptor effectively lowers to a structure containing scalar and array elements that represent various aspects of the underlying data being mapped (lower bound, upper bound, extent being the main ones of interest in most cases) and a pointer to the allocated data. In this current iteration of the mapping we map the structure in its entirety and then attach the underlying data pointer and map the data to the device, this allows most of the required data to be resident on the device for use. Currently we do not support the addendum (another block of pointer data), but it shouldn't be too difficult to extend this to support it. The MapInfoOp generation for descriptor types is primarily handled in an optimization pass, where it expands BoxType (descriptor types) map captures into two maps, one for the structure (scalar elements) and the other for the pointer data (base address) and links them in a Parent <-> Child relationship. The later lowering processes will then treat them as a conjoined structure with a pointer member map.
2024-02-05[Libomptarget] Remove unused 'SupportsEmptyImages' API function (#80316)Joseph Huber5-17/+1
Summary: This function is always false in the current implementation and is not even considered required. Just remove it and if someone needs it in the future they can add it back in. This is done to simplify the interface prior to other changes
2024-02-03[Libomptarget] Fix data mapping on dynamic loads (#80559)Joseph Huber3-7/+3
Summary: The current logic tries to map target mapping tables to the current device. Right now it assumes that data is only mapped a single time per device. This is only true if we have a single instance of the runtime running on a single program. However, in the case of dynamic library loads or shared libraries, this may happen multiple times. Given a case of a simple dynamic library load which has its own target kernel instruction, the current logic had only the first call to `__tgt_target_kernel` do the data mapping for that device. Then, when the next dynamic library load got called, it would see that the globals were already mapped for that device and skip registering its own entries, even though they were distinct. This resulted in none of the mappings being done and hitting an assertion. This patch simply gets rid of this per-device check. The check should instead be on the host offloading entries. We already have logic that calls `continue` if we already have entries for that pointer, so we can simply rely on that instead.
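The difference between a per-device check and a per-entry check can be sketched as below. This is an illustrative model only: `DeviceSketch` and `registerEntries` are hypothetical, and a set of host pointers stands in for the real mapping tables.

```cpp
#include <cstddef>
#include <set>
#include <vector>

// Hypothetical per-device state: the host pointers that already have a mapping.
struct DeviceSketch {
  std::set<const void *> MappedHostPtrs;
};

// Register the entries of one image. Only host pointers that are already
// present are skipped, so a second library with distinct entries still gets
// its own mappings. Returns the number of newly registered entries.
std::size_t registerEntries(DeviceSketch &Device,
                            const std::vector<const void *> &Entries) {
  std::size_t Added = 0;
  for (const void *HostPtr : Entries) {
    if (Device.MappedHostPtrs.count(HostPtr))
      continue; // this pointer already has an entry on the device
    Device.MappedHostPtrs.insert(HostPtr);
    ++Added;
  }
  return Added;
}
```

A per-device "already mapped" flag would have returned early on the second call; checking each host pointer lets repeated registration stay harmless while new entries are still picked up.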
2024-02-03[openmp] Add a dependency on the separate import library (#80449)Martin Storsjö1-0/+1
Currently, when doing e.g. "ninja check-openmp", the check-openmp target only depends on the target "omp", which builds the library. Thus by doing that, the separate import library "libomp.lib", which is generated directly from a def file, never gets created, unless one does a separate invocation first, that builds all targets. To fix this, make the "omp" target depend on the target for the separate import library, whenever that is created/used.
2024-02-01[OpenMP] Fix typo (NFC) (#80332)Kelvin Li1-1/+1
2024-02-01[OpenMP] Fix build breakage (NFC) (#80313)Kelvin Li1-1/+1
Assign `nullptr` to the pointer instead.
2024-02-01[openmp] On Windows, fix standalone cmake build (#80174)Alexandre Ganea1-0/+8
This fixes: https://github.com/llvm/llvm-project/issues/80117
2024-01-31[Libomptarget] Remove handling of old ctor / dtor entries (#80153)Joseph Huber8-152/+8
Summary: A previous patch removed creating these entries in clang in favor of the backend emitting a callable kernel and having the runtime call that if present. The support for the old style was kept around in LLVM 18.0 but now that we have forked to 19.0 we should remove the support. The effect of this would be that an application linking against a newer libomptarget that still had the old constructors will no longer be called. In that case, they can either recompile or use the `libomptarget.so.18` that comes with the previous release.
2024-01-30[OpenMP52][LIBOMPTARGET] Do not throw error in omp_get_mapped_ptr for the ↵Saiyedul Islam1-1/+1
host (#80038) OpenMP spec 5.2 specifies return value to be the host ptr in case of device_num being same as omp_get_initial_device().
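The host-device case can be illustrated with a small sketch. The function below is hypothetical and is not the libomptarget implementation: `InitialDevice` stands in for the value returned by `omp_get_initial_device()`, and a plain map stands in for one device's mapping table.

```cpp
#include <map>

// Hypothetical lookup. When the requested device is the initial (host)
// device, the host pointer itself is returned instead of reporting an error.
const void *
getMappedPtrSketch(const void *HostPtr, int DeviceNum, int InitialDevice,
                   const std::map<const void *, const void *> &DeviceTable) {
  if (DeviceNum == InitialDevice)
    return HostPtr;
  // Otherwise consult the device's table; an unmapped pointer yields nullptr.
  auto It = DeviceTable.find(HostPtr);
  return It == DeviceTable.end() ? nullptr : It->second;
}
```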
2024-01-30[Libomptarget] Remove remaining inline assembly from the device RTL (#79922)Joseph Huber5-61/+10
Summary: Recent patches have added some missing intrinsic functions for NVPTX. This patch gets rid of all the remaining uses of inline assembly. The one change that wasn't directly replaced with a built-in was the `pack` and `unpack` implementations. However, using the generic C implementation is equivalent to the output SASS when run through PTXAS.
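A generic `pack` / `unpack` pair of the kind mentioned above can be written with plain shifts and masks. The signatures below are an assumption made for illustration and may differ from the device runtime's own.

```cpp
#include <cstdint>

// Combine two 32-bit halves into one 64-bit value.
uint64_t pack(uint32_t LowBits, uint32_t HighBits) {
  return (static_cast<uint64_t>(HighBits) << 32) |
         static_cast<uint64_t>(LowBits);
}

// Split a 64-bit value back into its two 32-bit halves.
void unpack(uint64_t Val, uint32_t &LowBits, uint32_t &HighBits) {
  LowBits = static_cast<uint32_t>(Val & 0xFFFFFFFFull);
  HighBits = static_cast<uint32_t>(Val >> 32);
}
```

Because this uses only integer shifts, the compiler is free to lower it to whatever register moves the target prefers, which is why no inline assembly is needed.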
2024-01-29[libomptarget][NFC] Outline parallel SPMD function (#78642)Gheorghe-Teodor Bercea1-46/+62
This patch outlines the SPMD code path into a separate function that can be called directly.
2024-01-25[OpenMP] Disable LTO build of libomptarget and plugins by default. (#79387)Michael Kruse1-10/+15
CheckIPOSupported is used to test for working LTO since #74520. However, before CMake 3.24 this will test the default linker and ignore options such as LLVM_ENABLE_LLD. As a result, CMake would test whether LTO works with the default linker but builds with another one. In a typical scenario, libomptarget is compiled with the in-tree Clang, but linked with ld.gold, which requires the LLVMgold plugin, when it actually would work with the lld linker (or also fail because the system lld is too old to understand opaque pointers). Using gcc as the compiler would pass the test, but fail when linking with lld since it does not understand gcc's LTO format. Disable LTO by default for now since automatic detection causes too many problems. It causes the openmp-offload-cuda-project buildbot (https://lab.llvm.org/staging/#/builders/151) to fail and LLVM_ENABLE_RUNTIMES=openmp builds will have it implicitly disabled in the vast majority of system configurations anyway.
2024-01-25[openmp] Silence warning when compiling with MSVC targetting x86Alexandre Ganea1-1/+1
This fixes: ``` [3593/7449] Building CXX object projects\openmp\runtime\src\CMakeFiles\omp.dir\kmp_debug.cpp.obj C:\git\llvm-project\openmp\runtime\src\kmp_os.h(471): warning C4163: '_InlineInterlockedExchange64': not available as an intrinsic function ```
2024-01-24[openmp][flang][offloading] Do not use fixed device IDs in checks (#78973)Kareem Ergawy1-2/+2
Fixes a small issue in an offloading test where the test depended on the host and device being assigned certain numeric IDs. This however is not stable and fails in situations where any of the devices is assigned an ID different from the expected value. The fix just checks that offloading succeeded by making sure the IDs are different. The test was failing locally for me.
2024-01-23Bump trunk version to 19.0.0gitllvmorg-19-initTom Stellard1-3/+3
2024-01-23Re-land [openmp] Fix warnings when building on Windows with latest MSVC or ↵Alexandre Ganea8-38/+67
Clang ToT (#77853) This reverts 94f960925b7f609636fc2ffd83053814d5e45ed1 and fixes it.
2024-01-23Revert 10f3296dd7d74c975f208a8569221dc8f96d1db1 - [openmp] Fix warnings when ↵Alexandre Ganea8-60/+34
building on Windows with latest MSVC or Clang ToT (#77853) It broke the AMDGPU buildbot: https://lab.llvm.org/buildbot/#/builders/193/builds/45378
2024-01-23[openmp] Fix warnings when building on Windows with latest MSVC or Clang ToT ↵Alexandre Ganea8-34/+60
(#77853) There were quite a few compilation warnings when building openmp on Windows with the latest Visual Studio 2022 version 17.8.4. Some other warnings were visible with the latest Clang at tip. This commit fixes all of them.
2024-01-22[OpenMP][Fix] Require USM capability in force-usm test (#79059)Jan Patrick Lehr1-0/+2
This should fix the AMDGPU buildbot breakage from #76571
2024-01-22[OpenMP][USM] Introduces -fopenmp-force-usm flag (#76571)Jan Patrick Lehr2-0/+67
This flag forces the compiler to generate code for OpenMP target regions as if the user specified the #pragma omp requires unified_shared_memory in each source file. The option does not have a -fno-* friend since OpenMP requires the unified_shared_memory clause to be present in all source files. Since this flag does no harm if the clause is present, it can be used in conjunction. My understanding is that USM should not be turned off selectively, hence, no -fno- version. This adds a basic test to check the correct generation of double indirect access to declare target globals in USM mode vs non-USM mode. Which I think is the only difference observable in code generation. This runtime test checks for the (non-)occurrence of data movement between host and device. It does one run without the flag and one with the flag to also see that both versions behave as expected. In the case w/o the new flag data movement between host and device is expected. In the case with the flag such data movement should not be present / reported.
2024-01-22[Libomptarget] Move target table handling out of the plugins (#77150)Joseph Huber14-210/+201
Summary: This patch removes the bulk of the handling of the `__tgt_offload_entries` out of the plugins themselves. The reason for this is because the plugins themselves should not be handling this implementation detail of the OpenMP runtime. Instead, we expose two new plugin API functions to get the points to a device pointer for a global as well as a kernel type. This required introducing a new type to represent a binary image that has been loaded on a device. We can then use this to load the addresses as needed. The creation of the mapping table is then handled just in `libomptarget` where we simply look up each address individually. This should allow us to expose these operations more generically when we provide a separate API.
2024-01-22[OpenMP] Enable automatic unified shared memory on MI300A. (#77512)carlobertolli12-28/+219
This patch enables applications that did not request OpenMP unified_shared_memory to run with the same zero-copy behavior, where mapped memory does not result in extra memory allocations and memory copies, but CPU-allocated memory is accessed from the device. The name for this behavior is "automatic zero-copy" and it relies on detecting: that the runtime is running on a MI300A, that the user did not select unified_shared_memory in their program, and that XNACK (unified memory support) is enabled in the current GPU configuration. If all these conditions are met, then automatic zero-copy is triggered. This patch also introduces an environment variable OMPX_APU_MAPS that, if set, triggers automatic zero-copy also on non APU GPUs (e.g., on discrete GPUs). This patch is still missing support for global variables, which will be provided in a subsequent patch. Co-authored-by: Thorsten Blass <thorsten.blass@amd.com>
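The trigger conditions listed above can be written out as a single predicate. This is a sketch under stated assumptions rather than the runtime's logic: `ZeroCopyInputs` and `useAutomaticZeroCopy` are hypothetical names, and the sketch assumes that `OMPX_APU_MAPS` only replaces the MI300A requirement while the other two conditions still apply, which the commit message does not spell out.

```cpp
// Hypothetical summary of the facts the runtime would have to detect.
struct ZeroCopyInputs {
  bool RunningOnMI300A;    // the device is an MI300A APU
  bool UserRequestedUSM;   // program uses `requires unified_shared_memory`
  bool XnackEnabled;       // unified memory support is on for this GPU
  bool ApuMapsEnvSet;      // OMPX_APU_MAPS is set in the environment
};

// Automatic zero-copy applies only when the user did not ask for USM and
// XNACK is enabled, on an MI300A or (assumed) when OMPX_APU_MAPS opts in.
bool useAutomaticZeroCopy(const ZeroCopyInputs &In) {
  if (In.UserRequestedUSM)
    return false; // explicit USM is handled by the existing path
  if (!In.XnackEnabled)
    return false;
  return In.RunningOnMI300A || In.ApuMapsEnvSet;
}
```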
2024-01-22[OpenMP] Fix two usm tests for amdgpus. (#78824)carlobertolli4-8/+15
Some are missing setting of HSA_XNACK=1 environment variable, used to enable unified memory support on amdgpu's when it's not been set at kernel boot time. Some others needed to be marked as supporting unified_shared_memory in the lit test harness. Extend lit test harness to enable unified_shared_memory requirement for AMD GPUs. Reland: #77851
2024-01-22[LLVM][CMake] Add ffi_static target for the FFI static library (#78779)Joseph Huber1-5/+8
Summary: This patch is an attempt to make the `find_package(FFI)` support in LLVM prefer to provide the static library version if present. This is currently an optional library for building `libffi`, and its presence implies that it should likely be used. This patch is an attempt to fix some problems observed with testing programs linked against `libffi` on many different systems that could have conflicting paths. Linking it statically prevents this. This patch adds the `ffi_static` target for this library.
2024-01-22[OpenMP][OMPIRBuilder] Fix LLVM IR codegen for collapsed device loop (#78708)Dominik Adamski1-0/+44
When we generate the loop body function, we need to be sure that all original loop counters are replaced by the new counter. We need to save all items which use the original loop counter and then perform replacement of the original loop counter. If we don't do it, there is a risk that some values are not updated.
2024-01-18[openmp] Revert 64874e5ab5fd102344d43ac9465537a44130bf19 since it was ↵Alexandre Ganea9-84/+15
committed by mistake and the PR (https://github.com/llvm/llvm-project/pull/77853) wasn't approved yet.
2024-01-18[NFC][OpenMP] Fix typo in CHECK line (#78586)Dominik Adamski1-1/+1
Typo in test: openmp/libomptarget/test/offloading/fortran/basic-target-parallel-do.f90
2024-01-18[NFC][OpenMP][Flang] Add test for OpenMP target parallel do (#77776)Dominik Adamski1-0/+33
Added test which proves that end-to-end compilation of `omp target parallel do` construct is successful for Flang compiler.
2024-01-18[OpenMP][omp_lib] Restore compatibility with more restrictive Fortran ↵Paul Osmialowski1-3/+3
compilers (#77780) The most recent changes to `omp_lib.h.var` have re-introduced some compatibility issues that had to be fixed due to the similar changes in the past. Namely: 1. D120707 has removed the "use omp_lib_kinds" statement and replaced it with import 2. D114537 added line continuation to the long lines This patch introduces the same kind of changes in order to restore compatibility with some more restrictive Fortran compilers so their users could still benefit from the LLVM's OpenMP Fortran library.
2024-01-17[openmp] Silence warnings when building the LLVM release with MSVCAlexandre Ganea9-15/+84