path: root/offload/plugins-nextgen/amdgpu/src
Age | Commit message | Author | Files | Lines
2025-11-13 | [Offload] Add device info for shared memory (#167817) | Kevin Sala Penades | 1 file | -0/+13
2025-11-06 | [Offload] Remove handling for device memory pool (#163629) | Joseph Huber | 1 file | -14/+0
Summary: This was a lot of code that was only used for upstream LLVM builds of AMDGPU offloading. We have a generic and fast `malloc` in `libc` now, so just use that. This simplifies the code; it can be added back if we start providing alternate forms, but I don't think there's a single use case that would justify it yet.
2025-11-04 | [Offload] Add device UID (#164391) | Robert Imschweiler | 1 file | -0/+14
Introduced in OpenMP 6.0, the device UID shall be a unique identifier of a device on a given system. (Not necessarily a UUID.) Since it is not guaranteed that the (U)UIDs defined by the device vendor libraries, such as HSA, do not overlap with those of other vendors, the device UIDs in offload are always combined with the offload plugin name. In case the vendor library does not specify any device UID for a given device, we fall back to the offload-internal device ID. The device UID can be retrieved using the `llvm-offload-device-info` tool.
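A minimal sketch of that scheme with hypothetical helper names (the real plugin code differs):

    #include <optional>
    #include <string>

    // The UID is always prefixed with the plugin name so vendor-library UIDs
    // cannot collide across plugins; without a vendor UID we fall back to the
    // offload-internal device number.
    std::string buildDeviceUID(const std::string &PluginName,
                               const std::optional<std::string> &VendorUID,
                               int DeviceNum) {
      if (VendorUID)
        return PluginName + "-" + *VendorUID;
      return PluginName + "-" + std::to_string(DeviceNum);
    }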
2025-10-22 | [OpenMP] Adds omp_target_is_accessible routine (#138294) | Nicole Aschenbrenner | 1 file | -0/+24
Adds the omp_target_is_accessible routine and refactors common code from omp_target_is_present to work for both routines. Co-authored-by: Shilei Tian <i@tianshilei.me>
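A hedged usage sketch of the new routine, following the signature in the OpenMP specification:

    #include <cstdio>
    #include <cstdlib>
    #include <omp.h>

    int main() {
      int Dev = omp_get_default_device();
      size_t Size = 1024;
      void *Buf = std::malloc(Size);
      // Nonzero means the storage [Buf, Buf + Size) is accessible from device Dev.
      if (omp_target_is_accessible(Buf, Size, Dev))
        std::printf("buffer is accessible from device %d\n", Dev);
      else
        std::printf("device %d needs an explicit mapping for this buffer\n", Dev);
      std::free(Buf);
    }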
2025-10-21 | [Offload] Use `amd_signal_async_handler` for host function calls (#154131) | Ross Brunton | 1 file | -18/+32
2025-10-09 | [OFFLOAD] Remove unused init_device_info plugin interface (#162650) | Alex Duran | 1 file | -11/+1
This was used for the old interop code. It's dead code after #143491.
2025-10-06 | [Offload] Remove check on kernel argument sizes (#162121) | Joseph Huber | 1 file | -5/+0
Summary: This check is unnecessarily restrictive and currently fires incorrectly for any size less than eight bytes. Just remove it; we do sanity checks elsewhere and at some point need to trust the ABI.
2025-10-02 | [OFFLOAD] Restore interop functionality (#161429) | Alex Duran | 1 file | -0/+31
This implements two pieces to restore the interop functionality (that I broke) when the 6.0 interfaces were added:
* A set of wrappers that support the old interfaces on top of the new ones
* The same level of interop support for the CUDA and AMD plugins
2025-09-26 | [Offload] Use Error for allocating/deallocating in plugins (#160811) | Kevin Sala Penades | 1 file | -43/+34
Co-authored-by: Joseph Huber <huberjn@outlook.com>
2025-09-20 | [Offload] Remove non-blocking allocation type (#159851) | Joseph Huber | 1 file | -2/+0
Summary: This was originally added as a hack to work around CUDA's limitation on allocation. The `libc` implementation isn't even used for CUDA anymore, so this code is never hit. Even in that case, this code never truly worked. A true solution would be to use CUDA's virtual memory API instead to allocate 2MiB slabs independently from the normal memory management done in the stream.
2025-09-19 | [OpenMP][NFC] Clean up a bunch of warnings and clang-tidy messages (#159831) | Joseph Huber | 1 file | -22/+25
Summary: I made the GPU flags accept more of the default LLVM warnings, which triggered some new cases. Clean those up and fix some other ones while I'm at it.
2025-09-16 | [Offload] Copy loaded images into managed storage (#158748) | Joseph Huber | 1 file | -14/+14
Summary: Currently we have this `__tgt_device_image` indirection which just takes a reference to some pointers. This was all fine and good when the only usage was a section of GPU code that came from an ELF constant section. However, we have expanded beyond that and now need to worry about managing lifetimes; we have code that references the image even after it was loaded internally. This patch changes the implementation to instead copy the memory buffer and manage it locally. This PR reworks the JIT and other image handling to directly manage its own memory, so we no longer need to duplicate this behavior externally at the Offload API level. We also actually free these if the user unloads them. Upside: less likely to crash and burn. Downside: more latency when loading an image.
2025-08-29 | [Offload] Add `OL_DEVICE_INFO_MAX_WORK_SIZE[_PER_DIMENSION]` (#155823) | Ross Brunton | 1 file | -2/+3
This is the total number of work items that the device supports (the equivalent work group properties are for only a single work group).
2025-08-28 | [Offload] Add PRODUCT_NAME device info (#155632) | Ross Brunton | 1 file | -1/+1
On my system, this will be "Radeon RX 7900 GRE" rather than "gfx1100". For Nvidia, the product name and device name are identical.
2025-08-26 | [Offload] Full AMD support for olMemFill (#154958) | Ross Brunton | 1 file | -25/+88
2025-08-22 | [Offload] Implement olMemFill (#154102) | Callum Fare | 1 file | -0/+25
Implement olMemFill to support filling device memory with arbitrary length patterns. AMDGPU support will be added in a follow-up PR.
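For illustration only (not liboffload's implementation), an arbitrary-length pattern fill repeats the pattern until the destination is full, with a truncated copy at the tail:

    #include <algorithm>
    #include <cstddef>
    #include <cstring>

    void patternFill(void *Dst, const void *Pattern, size_t PatternSize,
                     size_t FillSize) {
      if (PatternSize == 0)
        return;
      char *Out = static_cast<char *>(Dst);
      for (size_t Offset = 0; Offset < FillSize; Offset += PatternSize) {
        size_t Chunk = std::min(PatternSize, FillSize - Offset);
        std::memcpy(Out + Offset, Pattern, Chunk);
      }
    }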
2025-08-22 | [Offload] `OL_EVENT_INFO_IS_COMPLETE` (#153194) | Ross Brunton | 1 file | -0/+22
A simple info query for events that returns whether the event is complete or not.
2025-08-19 | [Offload] Add olCalculateOptimalOccupancy (#142950) | Ross Brunton | 1 file | -0/+10
This is equivalent to `cuOccupancyMaxPotentialBlockSize`. It is currently only implemented on CUDA; AMDGPU and Host return unsupported. Co-authored-by: Callum Fare <callum@codeplay.com>
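For reference, a hedged sketch of the CUDA driver call named above as the equivalent; `Func` is assumed to be a kernel handle obtained through the driver API:

    #include <cstdio>
    #include <cuda.h>

    void printOptimalLaunchConfig(CUfunction Func) {
      int MinGridSize = 0, BlockSize = 0;
      // No per-block dynamic shared memory and no upper limit on the block size.
      CUresult Res = cuOccupancyMaxPotentialBlockSize(
          &MinGridSize, &BlockSize, Func,
          /*blockSizeToDynamicSMemSize=*/nullptr,
          /*dynamicSMemSize=*/0, /*blockSizeLimit=*/0);
      if (Res == CUDA_SUCCESS)
        std::printf("suggested block size: %d (min grid size %d)\n", BlockSize,
                    MinGridSize);
    }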
2025-08-19 | [Offload] Define additional device info properties (#152533) | Rafal Bielski | 1 file | -3/+22
Add the following properties in Offload device info:
* VENDOR_ID
* NUM_COMPUTE_UNITS
* [SINGLE|DOUBLE|HALF]_FP_CONFIG
* NATIVE_VECTOR_WIDTH_[CHAR|SHORT|INT|LONG|FLOAT|DOUBLE|HALF]
* MAX_CLOCK_FREQUENCY
* MEMORY_CLOCK_RATE
* ADDRESS_BITS
* MAX_MEM_ALLOC_SIZE
* GLOBAL_MEM_SIZE
Add a bitfield option to enumerators, allowing the values to be bit-shifted instead of incremented (illustrated below). Generate the per-type enums using `foreach` to reduce code duplication. Use macros in unit test definitions to reduce code duplication.
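A minimal C++ illustration of the bitfield option, with invented enumerator names (the real definitions are generated from the offload tablegen files):

    #include <cstdint>

    // Incremented enumerators: one value at a time (0, 1, 2, ...).
    enum class example_vendor : uint32_t { amd = 0, nvidia = 1, intel = 2 };

    // Bit-shifted enumerators: values can be OR'd together, which suits
    // capability-style queries such as the FP_CONFIG properties.
    enum class example_fp_config : uint32_t {
      denorm        = 1u << 0,
      inf_nan       = 1u << 1,
      round_nearest = 1u << 2,
      fma           = 1u << 3,
    };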
2025-08-15 | [Offload] Introduce dataFence plugin interface. (#153793) | Abhinav Gaba | 1 file | -0/+7
The purpose of this fence is to ensure that any `dataSubmit`s inserted into a queue before a `dataFence` finish before any `dataSubmit`s inserted after it begin. This is a no-op for most queues, since they are in-order and by design any operations inserted into them occur in order. But the interface is supposed to be functional for out-of-order queues. The addition of the interface means that any operations that rely on such ordering (like ATTACH map-type support in #149036) can invoke it without worrying about whether the underlying queue is in-order or out-of-order. Once a plugin supports out-of-order queues, it can implement this function without requiring any change at the libomptarget level. Co-authored-by: Alex Duran <alejandro.duran@intel.com>
2025-08-15 | [Offload] `olLaunchHostFunction` (#152482) | Ross Brunton | 1 file | -0/+48
Add an `olLaunchHostFunction` method that allows enqueueing host work to the stream.
2025-08-08 | [Offload] Make olLaunchKernel test thread safe (#149497) | Ross Brunton | 1 file | -13/+12
This sprinkles a few mutexes around the plugin interface so that the olLaunchKernel CTS test now passes when run on multiple threads. Part of this also involved changing the interface for device synchronise so that it can optionally not free the underlying queue (which introduced a race condition in liboffload).
2025-08-07 | [Offload] Don't create events for empty queues (#152304) | Ross Brunton | 1 file | -0/+11
Add a device function to check if a device queue is empty. If liboffload tries to create an event for an empty queue, we create an "empty" event that is already complete. This allows `olCreateEvent`, `olSyncEvent` and `olWaitEvent` to run quickly for empty queues.
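A minimal sketch of that fast path, with stand-in types rather than the real liboffload ones:

    #include <memory>

    // Purely illustrative stand-ins; the real liboffload types differ.
    struct Event { bool Complete; };
    struct Queue {
      int PendingOps = 0;
      bool empty() const { return PendingOps == 0; }
    };

    // An empty queue yields an event that is already complete, so later
    // sync/wait calls on it return immediately without touching the device.
    std::unique_ptr<Event> createEventFor(const Queue &Q) {
      if (Q.empty())
        return std::make_unique<Event>(Event{/*Complete=*/true});
      // Otherwise a real event would be recorded on the device queue (not shown).
      return std::make_unique<Event>(Event{/*Complete=*/false});
    }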
2025-08-06 | [AMDGPU][Offload] Enable memory manager use for up to ~3GB allocation size in omp_target_alloc (#151882) | hidekisaito | 1 file | -0/+34
Enables AMD data center class GPUs to use memory manager pooling for allocations of up to 3GB by default, up from the "1 << 13" (8 KiB) threshold that all plugin-nextgen devices use.
2025-08-04 | [Offload] Rework `MAX_WORK_GROUP_SIZE` (#151926) | Ross Brunton | 1 file | -2/+3
`MAX_WORK_GROUP_SIZE` now represents the maximum total number of work groups the device can allocate, rather than the maximum per dimension. `MAX_WORK_GROUP_SIZE_PER_DIMENSION` has been added, which has the old behaviour.
2025-07-21 | [Offload] Verify SyncCycle for events in AMDGPU (#149524) | Ross Brunton | 1 file | -0/+5
This check ensures that events after a synchronise (and thus after the queue is reset) are always considered complete. A test has been added as well.
2025-07-18 | [Offload] Allow "tagging" device info entries with offload keys (#147317) | Ross Brunton | 1 file | -4/+7
When generating the device info tree, nodes can be marked with an offload Device Info value. The nodes can also look up children based on this value.
2025-07-18 | [Offload] Implement event sync in amdgpu (#149300) | Ross Brunton | 1 file | -2/+50
2025-07-10 | [Offload] Allow querying the size of globals (#147698) | Ross Brunton | 1 file | -2/+3
The `GlobalTy` helper has been extended to make both the Size and Ptr optional. Now `getGlobalMetadataFromDevice`/`Image` can write the size of the global to the struct, instead of just verifying it.
2025-07-04 | [Offload][amdgpu] Map `INVALID_CODE_OBJECT` to `INVALID_BINARY` (#147070) | Ross Brunton | 1 file | -0/+3
2025-06-26 | [Offload] Add default for HSA agent type to silence warning (#145943) | Joseph Huber | 1 file | -0/+3
Summary: There's a new one called the AIE (AI Engine). We could handle this, but since we don't use it currently I'm just making it future-proof. Adding the AIE check would require checking the HSA version which isn't worthwhile just yet.
2025-06-25 | [Offload] Add an `unloadBinary` interface to PluginInterface (#143873) | Ross Brunton | 1 file | -13/+7
This allows removal of a specific Image from a Device, rather than requiring all image data to outlive the device they were created for. This is required for `ol_program_handle_t`s, which now specify the lifetime of the buffer used to create the program.
2025-06-13 | [Offload] Replace device info queue with a tree (#144050) | Ross Brunton | 1 file | -22/+23
Previously, device info was returned as a queue with each element having a "Level" field indicating its nesting level. This replaces this queue with a more traditional tree-like structure. This should not result in a change to the output of `llvm-offload-device-info`.
2025-06-02 | [Offload][AMDGPU] Correctly handle variable implicit argument sizes (#142199) | Joseph Huber | 1 file | -19/+35
Summary: The size of the implicit argument struct can vary depending on optimizations; it is not always the size listed for the full struct. Additionally, the implicit arguments are always aligned on a pointer boundary. This patch updates the handling to use the correctly aligned offset and only initialize the members if they are contained in the reported size. Additionally, we modify the `alloc` and `free` routines to allow `alloc(0)` and `free(nullptr)`, as these are mandated by the C standard and allow us to easily handle cases where the user calls a kernel with no arguments.
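A minimal sketch of the alignment rule described above (generic helper, not the plugin's code):

    #include <cstdint>

    // Round Offset up to the next multiple of Align (Align must be a power of two).
    constexpr uint64_t alignTo(uint64_t Offset, uint64_t Align) {
      return (Offset + Align - 1) & ~(Align - 1);
    }

    // Explicit kernel arguments ending at byte 20 put the implicit arguments at
    // byte 24 on a target with 8-byte pointers.
    static_assert(alignTo(20, 8) == 24, "implicit args start on a pointer boundary");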
2025-06-02 | [Offload] Make AMDGPU plugin handle empty allocation properly (#142383) | Joseph Huber | 1 file | -4/+1
Summary: `malloc(0)` and `free(nullptr)` are both defined by the standard, but we currently trigger errors and assertions on them. Fix that so this works with empty arguments.
2025-05-20 | [Offload] Use new error code handling mechanism and lower-case messages (#139275) | Ross Brunton | 1 file | -70/+105
This removes the old ErrorCode-less error method and requires every user to provide a concrete error code. All calls have been updated. In addition, for consistency with error messages elsewhere in LLVM, all messages have been made to start lower case.
2025-03-26 | [Offload] Guard HSA implicit arguments if they aren't created (#133073) | Joseph Huber | 1 file | -21/+18
Summary: We conditionally allocate the implicit arguments, so they may be null. The flang compiler seems to hit this case, even though it shouldn't when it's supposed to conform to the HSA code object. For now, guard this to fix the regression and cover a future case where someone rolls a fully custom implementation. Fixes: https://github.com/llvm/llvm-project/issues/132982
2025-03-24 | [Offload] Remove handling for COV4 binaries from offload/ (#131033) | Joseph Huber | 1 file | -16/+12
Summary: We moved from COV4 to COV5 a long time ago, and dropping the COV4 handling unblocks simplifying some front-end code, so we should be able to move up with this.
2025-02-19 | [AMDGPU] Replace gfx940 and gfx941 with gfx942 in offload and libclc (#125826) | Fabian Ritter | 1 file | -6/+0
gfx940 and gfx941 are no longer supported. This is one of a series of PRs to remove them from the code base. For SWDEV-512631 and SWDEV-512633
2025-01-31 | [Offload][NFC] Fix typos discovered by codespell (#125119) | Christian Clauss | 1 file | -24/+24
https://github.com/codespell-project/codespell % `codespell --ignore-words-list=archtype,hsa,identty,inout,iself,nd,te,ths,vertexes --write-changes`
2025-01-24 | [Offload] Move RPC server handling to a dedicated thread (#112988) | Joseph Huber | 1 file | -27/+32
Summary: Handling the RPC server requires running through the list of jobs that the device has requested to be done. Currently this is handled by the thread that waits for the kernel to finish. However, that is not sound on NVIDIA architectures and only works for async launches in the OpenMP model that uses helper threads. At the same time, we don't want this thread doing work unnecessarily. For this reason we track the execution of kernels and put the thread to sleep via a condition variable (usually backed by some kind of futex or other intelligent sleeping mechanism) so that it is idle while no kernels are running.
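A minimal sketch of that sleeping scheme using a standard condition variable; the names are invented and this is not the plugin's actual code:

    #include <condition_variable>
    #include <mutex>

    struct RPCServerState {
      std::mutex M;
      std::condition_variable CV;
      unsigned RunningKernels = 0;
      bool Shutdown = false;

      void kernelLaunched() {
        { std::lock_guard<std::mutex> G(M); ++RunningKernels; }
        CV.notify_one();  // wake the server thread
      }
      void kernelFinished() {
        std::lock_guard<std::mutex> G(M);
        --RunningKernels;
      }
      // Called by the dedicated server thread before each pass over the RPC jobs;
      // sleeps while no kernels are running, returns false on shutdown.
      bool waitForWork() {
        std::unique_lock<std::mutex> L(M);
        CV.wait(L, [&] { return RunningKernels > 0 || Shutdown; });
        return !Shutdown;
      }
    };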
2024-12-09 | [Offload][AMDGPU] accept generic target (#118919) | hidekisaito | 1 file | -21/+22
Enables device code built for a generic ISA, e.g. "--offload-arch=gfx11-generic", to run on a device capable of the gfx11-generic ISA. An executable may contain one ELF with a specific target ISA and another ELF with a compatible generic ISA. Under that circumstance, this code should report both ELFs as compatible, leaving the rest to the PluginManager to handle. Suggestions on how best to address that are welcome.
2024-12-06 | [Offload][OMPX] Add the runtime support for multi-dim grid and block (#118042) | Shilei Tian | 1 file | -31/+38
2024-12-04 | Fix to account for multiple ISA enumeration (#118676) | hidekisaito | 1 file | -1/+1
2024-12-02 | [OpenMP] Unconditionally provide an RPC client interface for OpenMP (#117933) | Joseph Huber | 1 file | -3/+1
Summary: This patch adds an RPC interface that lives directly in the OpenMP device runtime. This allows OpenMP to implement custom opcodes. Currently this only provides the host call interface, which is the raw version of reverse offloading. Previously this lived in `libc/` as an extension, which is not the correct place. The interface here uses a weak symbol for the RPC client with the same name the `libc` interface uses. This means it will defer to the libc one if both are present, so we don't need to set up multiple instances. The presence of this symbol is what controls whether or not we set up the RPC server. Because this is an external symbol it normally won't be optimized out, so there's a special pass in OpenMPOpt that deletes this symbol if it is unused during linking. That means at `O0` the RPC server will always be present now, but it will be removed trivially if it's not used at `O1` and higher.
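A generic illustration of the weak-symbol deferral described here; the symbol name is invented, not the one the runtime actually uses:

    // Fallback definition shipped with one runtime: the linker keeps it only if
    // no strong definition with the same name is present.
    extern "C" [[gnu::weak]] int example_rpc_client_marker = 0;

    // A hypothetical libc translation unit would provide the strong definition,
    // which wins when both are linked, so only one instance is ever set up:
    //   extern "C" int example_rpc_client_marker = 1;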
2024-11-22 | [AMDGPU] Remove uses of deprecated HSA executable functions (#117241) | Joseph Huber | 1 file | -14/+16
Summary: These functions were deprecated in ROCR 1.3, which was released quite some time ago. The main functionality lost was modifying and inspecting the code object independently of the executable, but we do all of that custom through our ELF API. This should be within the versions of the other functions we use.
2024-10-29 | [OpenMP] Add support for custom callback in AMDGPUStream (#112785) | Joseph Huber | 1 file | -25/+44
Summary: We have the ability to schedule callbacks after certain events complete. Currently we can register an arbitrary callback in CUDA, but can't in AMDGPU. I am planning on using this support to move the RPC handling to a separate thread, then using these callbacks to suspend / resume it when no kernels are running. This is a preliminary patch to keep this noise out of that one.
2024-09-05 | [Offload][NFC] Reorganize `utils::` and make Device/Host/Shared clearer (#100280) | Johannes Doerfert | 1 file | -42/+46
We had three `utils::` namespaces, all with different "meaning" (host, device, hsa_utils). We should, when we can, keep "include/Shared" accessible from host and device, so RefCountTy has been moved to a separate header. `hsa_utils` was introduced to make `utils::` less overloaded, and common functionality was de-duplicated, e.g., `utils::advance` and `utils::advanceVoidPtr` -> `utils::advancePtr`. Type punning now checks the size of the result to make sure it matches the source type. No functional change was intended.
2024-08-21 | [Offload] Improve error reporting on memory faults (#104254) | Johannes Doerfert | 1 file | -1/+11
Since we can already track allocations, we can diagnose memory faults to some degree. If the fault happens in a prior allocation (use after free) or "close but outside" one, we can provide that information to the user. Note that the fault address might be page aligned, and not all accesses trigger a fault, especially for allocations that are backed by a MemoryManager. Still, if people disable the MemoryManager or the allocation is big enough, we can sometimes provide valuable feedback.
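A hedged sketch of the kind of lookup such a diagnosis implies, with invented types (the real tracking lives inside the plugin):

    #include <cstdint>
    #include <iterator>
    #include <map>
    #include <string>

    struct AllocationInfo {
      uint64_t Base, Size;
      bool Freed;
    };

    // Classify a fault address against allocations keyed by base address.
    std::string diagnoseFault(const std::map<uint64_t, AllocationInfo> &Allocs,
                              uint64_t Fault, uint64_t Slack = 256) {
      auto It = Allocs.upper_bound(Fault);
      if (It == Allocs.begin())
        return "fault does not match any tracked allocation";
      const AllocationInfo &A = std::prev(It)->second;
      if (Fault < A.Base + A.Size)
        return A.Freed ? "use after free of a tracked allocation"
                       : "fault inside a live tracked allocation";
      if (Fault < A.Base + A.Size + Slack)
        return "fault just past the end of a tracked allocation";
      return "fault does not match any tracked allocation";
    }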
2024-08-20 | [llvm][offload] Move AMDGPU offload utilities to LLVM (#102487) | Fabian Mora | 1 file | -6/+6
This patch moves utilities from `offload/plugins-nextgen/amdgpu/utils/UtilitiesRTL.h` to `llvm/Frontend/Offloading/Utility.h` to be reused by other projects. Concretely the following changes were made:
- Rename `KernelMetaDataTy` to `AMDGPUKernelMetaData`.
- Remove unused fields `KernelObject`, `KernelSegmentSize`, `ExplicitArgumentCount` and `ImplicitArgumentCount` from `AMDGPUKernelMetaData`.
- Return the produced error if `ELFObj.sections()` failed instead of using `cantFail`.
- Added `AGPRCount` field to `AMDGPUKernelMetaData`.
- Added a default invalid value to all the fields in `AMDGPUKernelMetaData`.