aboutsummaryrefslogtreecommitdiff
path: root/offload
AgeCommit message (Collapse)AuthorFilesLines
2025-12-01Reland: [OpenMP] Implement omp_get_uid_from_device() / ↵Robert Imschweiler5-0/+146
omp_get_device_from_uid() (#168554) Reland https://github.com/llvm/llvm-project/pull/164392 with Fortran support moved to follow-up PR
2025-11-26[OpenMP][clang] Register vtables on device for indirect calls runtime (#167011)Jason-VanBeusekom4-19/+167
This is a branch off of https://github.com/llvm/llvm-project/pull/159856, in which consists of the runtime portion of the changes required to support indirect function and virtual function calls on an `omp target device` when the virtual class / indirect function is mapped to the device from the host. Key Changes - Introduced a new flag OMP_DECLARE_TARGET_INDIRECT_VTABLE to mark VTable registrations - Modified setupIndirectCallTable to support both VTable entries and indirect function pointers Details: The setupIndirectCallTable implementation was modified to support this registration type by retrieving the first address of the VTable and inferring the remaining data needed to build the indirect call table. Since the Vtables / Classes registered as indirect can be larger than 8 bytes, and the vtables may not be at the first address we either need to pass the size to __llvm_omp_indirect_call_lookup and have a check at each step of the binary search, or add multiple entries to the indirect table for each address registered. The latter was chosen. Commit: a00def3f20e166d4fb9328e6f0bc0742cd0afa31 is not a part of this PR and is handled / reviewed in: https://github.com/llvm/llvm-project/pull/159856, This is PR (2/3) Register Vtable PR (1/3): https://github.com/llvm/llvm-project/pull/159856, Codegen / _llvm_omp_indirect_call_lookup PR (3/3): https://github.com/llvm/llvm-project/pull/159857
2025-11-26[OFFLOAD] Add support for indexed per-thread containers (#164263)Alex Duran2-59/+208
Split from #158900 it adds a PerThreadContainer that can use STL-like indexed containers based on a slightly refactored PerThreadTable. --------- Co-authored-by: Joseph Huber <huberjn@outlook.com>
2025-11-25[offload][lit] Fix compilation of two offload tests (#169399)Nick Sarnie2-3/+3
These are C tests, not C++, so no function parameters means unspecified number of parameters, not `void`. These compile fine on the current tested offload targets because an error is only [thrown](https://github.com/llvm/llvm-project/blob/main/clang/lib/Sema/SemaDecl.cpp#L10695) if the calling convention doesn't support variadic arguments, which they happen to. When compiling this test for other targets that do not support variadic arguments, we get an error, which does not seem intentional. Just add `void` to the parameter list. --------- Signed-off-by: Nick Sarnie <nick.sarnie@intel.com>
2025-11-24[NFC][OpenMP] Add use_device_ptr/addr tests for when the lookup fails. (#169428)Abhinav Gaba3-0/+77
As per OpenMP 5.1, the pointers are expected to retain their original values when a lookup fails and there is no device pointer to translate to.
2025-11-24[OpenMP][flang] Lowering of OpenMP custom reductions to MLIR (#168417)Jan Leyonberg1-0/+88
This patch add support for lowering of custom reductions to MLIR. It also enhances the capability of the pass to automatically mark functions as "declare target" by traversing custom reduction initializers and combiners.
2025-11-24[Flang][OpenMP][MLIR] Initial declare target to for variables implementation ↵agozillon3-0/+111
(#119589) While the infrastructure for declare target to/enter and link for variables exists in the MLIR dialect and at the Flang level, the current lowering from MLIR -> LLVM IR isn't in place, it's only in place for variables that have the link clause applied. This PR aims to extend that lowering to an initial implementation that incorporates declare target to as well, which primarily requires changes in the OpenMPToLLVMIRTranslation phase. However, a minor addition to the OpenMP dialect was required to extend the declare target enumerator to include a default None field as well. This also requires a minor change to the Flang lowering's MapInfoFinlization.cpp pass to alter the map type for descriptors to deal with cases where a variable is marked declare to. Currently, when a descriptor variable is mapped declare target to the descriptor component can become attatched, and cannot be updated, this results in issues when an unusual allocation range is specified (effectively an off-by X error). The current solution is to map the descriptor always, as we always require an up-to-date version of this data. However, this also requires an interlinked PR that adds a more intricate type of mapping of structures/record types that clang currently implements, to circumvent the overwriting of the pointer in the descriptor. 3/3 required PRs to enable declare target to mapping, this PR should pass all tests and provide an all green CI. Co-authored-by: Raghu Maddhipatla raghu.maddhipatla@amd.com
2025-11-24[MLIR][OpenMP] Introduce overlapped record type map support (#119588)agozillon1-0/+56
This PR introduces a new additional type of map lowering for record types that Clang currently supports, in which a user can map a top-level record type and then individual members with different mapping, effectively creating a sort of "overlapping" mapping that we attempt to cut around. This is currently most predominantly used in Fortran, when mapping descriptors and there data, we map the descriptor and its data with separate map modifiers and "cut around" the pointer data, so that wedo not overwrite it unless the runtime deems it a neccesary action based on its reference counting mechanism. However, it is a mechanism that will come in handy/trigger when a user explitily maps a record type (derived type or structure) and then explicitly maps a member with a different map type. These additions were predominantly in the OpenMPToLLVMIRTranslation.cpp file and phase, however, one Flang test that checks end-to-end IR compilation (as far as we care for now at least) was altered. 2/3 required PRs to enable declare target to mapping, should look at PR 3/3 to check for full green passes (this one will fail a number due to some dependencies). Co-authored-by: Raghu Maddhipatla raghu.maddhipatla@amd.com
2025-11-20[OFFLOAD] Add support for more fine grained debug messages control (#165416)Alex Duran4-2/+285
This PR introduces new debug macros that allow a more fined control of which debug message to output and introduce C++ stream style for debug messages. Changing existing messages (except a few that I changed for testing) will come in subsequent PRs. I also think that we should make debug enabling OpenMP agnostic but, for now, I prioritized maintaing the current libomptarget behavior for now, and we might need more changes further down the line as we we decouple libomptarget.
2025-11-19[Offload] Make the RPC thread sleep briefly when idle (#168596)Joseph Huber1-3/+18
Summary: We start this thread if the RPC client symbol is detected in the loaded binary. We should make this sleep if there's no work to avoid the thread running at high priority when the (scarecely used) RPC call is actually required. So, right now after 25 microseconds we will assume the server is inactive and begin sleeping. This resets once we do find work. AMD supports a more intelligent way to do this. HSA signals can wake a sleeping thread from the kernel, and signals can be sent from the GPU side. This would be nice to have and I'm planning on working with it in the future to make this infrastructure more usable with existing AMD workloads.
2025-11-19[Runtimes] Default build must use its own output dirs (#168266)Michael Kruse3-9/+9
Post-commit fix of #164794 reported at https://github.com/llvm/llvm-project/pull/164794#issuecomment-3536253493 `LLVM_LIBRARY_OUTPUT_INTDIR` and `LLVM_RUNTIME_OUTPUT_INTDIR` is used by `AddLLVM.cmake` as output directories. Unless we are in a bootstrapping-build, It must not point to directories found by `find_package(LLVM)` which may be read-only directories. MLIR for instance sets thesese variables to its own build output directory, so should the runtimes.
2025-11-18Revert "[OpenMP] Implement omp_get_uid_from_device() / ↵Robert Imschweiler5-145/+0
omp_get_device_from_uid()" (#168547) Reverts llvm/llvm-project#164392 due to fortran issues
2025-11-18[OpenMP] Implement omp_get_uid_from_device() / omp_get_device_from_uid() ↵Robert Imschweiler5-0/+145
(#164392) Use the implementation in libomptarget. If libomptarget is not available, always return the UID / device number of the host / the initial device.
2025-11-14[OpenMP][Flang] Emit default declare mappers implicitly for derived types ↵Akash Banerjee1-0/+65
(#140562) This patch adds support to emit default declare mappers for implicit mapping of derived types when not supplied by user. This especially helps tackle mapping of allocatables of derived types.
2025-11-13[Offload] Add device info for shared memory (#167817)Kevin Sala Penades8-4/+47
2025-11-13[offload] defer "---> olInit" trace message (#167893)Łukasz Plewa1-5/+11
Tracing requires liboffload to be initialized, so calling isTracingEnabled() before olInit always returns false. This caused the first trace log to look like: ``` -> OL_SUCCESS ``` instead of: ``` ---> olInit() -> OL_SUCCESS ``` This patch moves the pre-call trace print for olInit so it is emitted only after initialization. It would be possible to add extra logic to detect whether liboffload is already initialized and only postpone the first pre-call print, but this would add unnecessary complexity, especially since this is tablegen code. The difference would matter only in the unlikely case of a crash during a second olInit call. --------- Co-authored-by: Joseph Huber <huberjn@outlook.com>
2025-11-10[PGO][Offload] Fix missing names bug in GPU PGO (#166444)Ethan Luis McDonough4-4/+0
After #163011 was merged, the tests in [`offload/test/offloading/gpupgo`](https://github.com/llvm/llvm-project/compare/main...EthanLuisMcDonough:llvm-project:gpupgo-names-fix-pr?expand=1#diff-f769f6cebd25fa527bd1c1150cc64eb585c41cb8a8b325c2bc80c690e47506a1) broke because the offload plugins were no longer able to find `__llvm_prf_nm`. This pull request explicitly makes `__llvm_prf_nm` visible to the host on GPU targets and reverses the changes made in f7e9968a5ba99521e6e51161f789f0cc1745193f.
2025-11-08[Offload] Remove unused KernelArgsTy instantiation (#167197)Kevin Sala Penades1-4/+0
2025-11-06[OpenMP] Fix tests relying on the heap size variableJoseph Huber4-7/+12
Summary: I made that an unimplemented error, but forgot that it was used for this environment variable.
2025-11-06[Offload] Remove handling for device memory pool (#163629)Joseph Huber9-150/+12
Summary: This was a lot of code that was only used for upstream LLVM builds of AMDGPU offloading. We have a generic and fast `malloc` in `libc` now so just use that. Simplifies code, can be added back if we start providing alternate forms but I don't think there's a single use-case that would justify it yet.
2025-11-04[Offload] Add device UID (#164391)Robert Imschweiler12-6/+91
Introduced in OpenMP 6.0, the device UID shall be a unique identifier of a device on a given system. (Not necessarily a UUID.) Since it is not guaranteed that the (U)UIDs defined by the device vendor libraries, such as HSA, do not overlap with those of other vendors, the device UIDs in offload are always combined with the offload plugin name. In case the vendor library does not specify any device UID for a given device, we fall back to the offload-internal device ID. The device UID can be retrieved using the `llvm-offload-device-info` tool.
2025-10-31[MLIR][OpenMP] Fix and simplify bounds offset calculation for 1-D GEP ↵agozillon1-0/+61
offsets (#165486) Currently this is being calculated incorrectly and will result in incorrect index offsets in more complicated array slices. This PR tries to address it by refactoring and changing the calculation to be more correct.
2025-10-24[OFFLOAD] Remove weak from __kmpc_* calls and gather them in one header ↵Alex Duran5-38/+18
(#164613) Follow-up from #162652 --------- Co-authored-by: Michael Klemm <michael.klemm@amd.com>
2025-10-22[OpenMP] Adds omp_target_is_accessible routine (#138294)Nicole Aschenbrenner9-0/+132
Adds omp_target_is_accessible routine. Refactors common code from omp_target_is_present to work for both routines. --------- Co-authored-by: Shilei Tian <i@tianshilei.me>
2025-10-22[NFC][Offload][OMPT] Improve readability of liboffload OMPT tests (#163181)Kaloyan Ignatov12-266/+327
- ompt_target_data_op_t, ompt_scope_endpoint_t and ompt_target_t are now printed as strings instead of just numbers to ease debugging - some missing clang-format clauses have been added
2025-10-21[NFC][OpenMP] Update a test that was failing on aarch64. (#164456)Abhinav Gaba1-5/+9
The failure was reported here: https://github.com/llvm/llvm-project/pull/164039#issuecomment-3425429556 The test was checking for the "bad" behavior so as to keep track of it, but there seem to be some issues with the pointer arithmetic specific to aarch64. The update for now is to not check for the "bad" behavior fully. We may need to debug further if similar issues are encountered eventually once the codegen has been fixed.
2025-10-21[Offload] Use `amd_signal_async_handler` for host function calls (#154131)Ross Brunton1-18/+32
2025-10-20[NFC][OpenMP] Add small class-member use_device_ptr/addr unit tests. (#164039)Abhinav Gaba8-0/+300
Two of the tests are currently asserting, and two are emitting unexpected results. The asserting tests will be fixed using the ATTACH-style codegen from #153683. The other two involve `use_device_addr` on byrefs, and need more follow-up codegen changes, that have been noted in a FIXME comment.
2025-10-17[OFFLOAD] Interop fixes for Windows (#162652)Alex Duran2-8/+7
On Windows, for a reason I don't fully understand boolean bits get extra padding (even when asking for packed structures) in the structures that messes the offsets between the compiler and the runtime. Also, "weak" works differently on Windows than Linux (i.e., the "local" routine has preference) which causes it to crash as we don't really have an alternate implementation of __kmpc_omp_wait_deps. Given this, it doesn't make sense to mark it as "weak" for Linux either.
2025-10-16[Offload] XFAIL pgo tests until resolved (#163722)Jan Patrick Lehr4-0/+4
While people look into it, xfail the tests.
2025-10-15[OpenMP] Disable a few more tests to get the bot green (#163614)Joseph Huber2-1/+4
2025-10-15[OpenMP] Add test to print interop identifiers (#161434)Jan Patrick Lehr1-0/+83
The test covers some of the identifier symbols in the interop runtime. This test, for now, is to guard against complete breakage, which was the result of the other `interop.c` test not being enabled on AMD and thus, not caught by our buildbots.
2025-10-14Revert "[Offload] Lazily initialize platforms in the Offloading API" (#163272)Joseph Huber1-62/+33
Summary: This causes issues with CUDA's teardown order when the init is separated from the total init scope.
2025-10-14[Offload] Lazily initialize platforms in the Offloading API (#163272)Joseph Huber2-38/+87
Summary: The Offloading library wraps around the underlying plugins. The problem is that we currently initialize all plugins we find, even if they are not needed for the program. This is very expensive for trivial uses, as fully heterogenous usage is quite rare. In practice this means that you will always pay a 200 ms penalty for having CUDA installed. This patch changes the behavior to provide accessors into the plugins and devices that allows them to be initialized lazily. We use a once_flag, this should properly take a fast-path check while still blocking on concurrent use. Making full use of this will require a way to filter platforms more specifically. I'm thinking of what this would look like as an API. I'm thinking that we either have an extra iterate function that takes a callback on the platform, or we just provide a helper to find all the devices that can run a given image. Maybe both? Fixes: https://github.com/llvm/llvm-project/issues/159636
2025-10-12[Offload] Silence warning via maybe unused (NFC) (#163076)Jan Patrick Lehr1-1/+1
2025-10-09[Flang][OpenMP] Defer descriptor mapping for assumed dummy argument types ↵agozillon1-0/+101
(#154349) This PR adds deferral of descriptor maps until they are necessary for assumed dummy argument types. The intent is to avoid a problem where a user can inadvertently map a temporary local descriptor to device without their knowledge and proceed to never unmap it. This temporary local descriptor remains lodged in OpenMP device memory and the next time another variable or descriptor residing in the same stack address is mapped we incur a runtime OpenMP map error as we try to remap the same address. This fix was discussed with the OpenMP committee and applies to OpenMP 5.2 and below, future versions of OpenMP can avoid this issue via the attach semantics added to the specification.
2025-10-09[OFFLOAD] Remove unused init_device_info plugin interface (#162650)Alex Duran5-64/+2
This was used for the old interop code. It's dead code after #143491
2025-10-06[Offload] Fix isValidBinary segfault on host platformJoseph Huber1-2/+3
Summary: Need to verify this actually has a device. We really need to rework this to point to a real impolementation, or streamline it to handle this automatically.
2025-10-06[Offload] Remove check on kernel argument sizes (#162121)Joseph Huber4-5/+6
Summary: This check is unnecessarily restrictive and currently incorrectly fires for any size less than eight bytes. Just remove it, we do sanity checks elsewhere and at some point need to trust the ABI.
2025-10-02[OFFLOAD] Restore interop functionality (#161429)Alex Duran4-5/+116
This implements two pieces to restore the interop functionality (that I broke) when the 6.0 interfaces were added: * A set of wrappers that support the old interfaces on top of the new ones * The same level of interop support for the CUDA amd AMD plugins
2025-10-02[Flang][OpenMP] Implicitly map nested allocatable components in derived ↵Akash Banerjee1-0/+43
types (#160766) This PR adds support for nested derived types and their mappers to the MapInfoFinalization pass. - Generalize MapInfoFinalization to add child maps for arbitrarily nested allocatables when a derived object is mapped via declare mapper. - Traverse HLFIR designates rooted at the target block arg and build full coordinate_of chains; append members with correct membersIndex. This fixes #156461.
2025-09-29[OpenMP] Mark problematic tests as XFAIL / UNSUPPORTED (#161267)Joseph Huber17-42/+58
Summary: Several of these tests have been failing for literal years. Ideally we make efforts to fix this, but keeping these broken has had serious consequences on our testing infrastructure where failures are the norm so almost all test failures are disregarded. I made a tracking issue for the ones that have been disabled. https://github.com/llvm/llvm-project/issues/161265
2025-09-29[Offload] Fix incorrect size used in llvm-offload-device-info toolJoseph Huber1-1/+1
Summary: This was not using the size previously queried and would fail when the implementation actually verified it.
2025-09-29[OpenMP][Offload] Support `PRIVATE | ATTACH` maps for ↵Abhinav Gaba1-64/+307
corresponding-pointer-initialization. (#160760) `PRIVATE | ATTACH` maps can be used to represent firstprivate pointers that should be initialized by doing doing the pointee's device address, if its lookup succeeds, or retain the original host pointee's address otherwise. With this, for a test like the following: ```f90 integer, pointer :: p(:) !$omp target map(p(1)) ... print*, p(1) !$omp end target ``` The codegen can look like: ```llvm ; maps for p: ; &p(1), &p(1), sizeof(p(1)), TO|FROM //(1) ; &ref_ptr(p), &p(1), sizeof(ref_ptr(p)), ATTACH //(2) ; &ref_ptr(p), &p(1), sizeof(ref_ptr(p)), PRIVATE|ATTACH|PARAM //(3) call... @__omp_outlined...(ptr %ref_ptr_of_p) ``` * `(1)` maps the pointee `p(1)`. * `(2)` attaches it to the (previously) mapped `ref_ptr(p)`, if present. It can be controlled via OpenMP 6.1's `attach(auto/always/never)` map-type modifiers. * `(3)` privatizes and initializes the local `ref_ptr(p)`, which gets passed in as the kernel argument `%ref_ptr_of_p`. Can be skipped if p is not referenced directly within the region. While similar mapping can be used for C/C++, it's more important/useful for Fortran as we can avoid creating another argument for passing the descriptor, and use that to initialize the private copy in the body of the kernel.
2025-09-29[OpenMP] Fix 'libc' configuration when building OpenMPJoseph Huber1-188/+0
Summary: Forgot to port this option's old handling from offload. It's not way easier since they're built in the same CMake project. Also delete the leftover directory that's not used anymore, don't know how that was still there.
2025-09-29[OpenMP][Flang] Fix no-loop test (#161162)Dominik Adamski1-0/+1
Fortran no-loop test is supported only for GPU.
2025-09-29[Offload][NFC] use unique ptrs for platforms (#160888)Piotr Balcer1-44/+45
Currently, devices store a raw pointer to back to their owning Platform. Platforms are stored directly inside of a vector. Modifying this vector risks invalidating all the platform pointers stored in devices. This patch allocates platforms individually, and changes devices to store a reference to its platform instead of a pointer. This is safe, because platforms are guaranteed to outlive the devices they contain.
2025-09-26[Offload] Use Error for allocating/deallocating in plugins (#160811)Kevin Sala Penades6-126/+158
Co-authored-by: Joseph Huber <huberjn@outlook.com>
2025-09-26[Flang][OpenMP] Enable no-loop kernels (#155818)Dominik Adamski1-0/+96
Enable the generation of no-loop kernels for Fortran OpenMP code. target teams distribute parallel do pragmas can be promoted to no-loop kernels if the user adds the -fopenmp-assume-teams-oversubscription and -fopenmp-assume-threads-oversubscription flags. If the OpenMP kernel contains reduction or num_teams clauses, it is not promoted to no-loop mode. The global OpenMP device RTL oversubscription flags no longer force no-loop code generation for Fortran.
2025-09-25Revert "[Flang][OpenMP] Implicitly map nested allocatable components in ↵Akash Banerjee1-43/+0
derived types" (#160759) Reverts llvm/llvm-project#160116