path: root/offload/liboffload/src
Age | Commit message | Author | Files | Lines
6 days | [OFFLOAD] Extend olMemRegister API to handle cases when a memory block may have been mapped outside of liboffload (#172226) | fineg74 | 1 | -4/+7
This PR extends the liboffload olMemRegister API to handle cases where a memory block may have been mapped before olMemRegister is called, supporting some use cases in libomptarget.
6 days | [Offload] Add argument to 'olInit' for global configuration options (#181872) | Joseph Huber | 1 | -7/+25
Summary: This PR adds a pointer argument to the initialization routine to be used for global options. Right now this is used to allow the user to constrain which backends they wish to use. If a null argument is passed, the same behavior as before is observed. This is expected to be extensible by forcing the user to encode the size of the struct, so old executables will encode which fields they have access to. We use a macro helper to get this struct rather than a runtime call so that the current size is baked into the executable rather than looked up by the runtime; otherwise it would just return the size that the (potentially newer) runtime would see.
10 days | [OFFLOAD] Add support for host offloading device (#177307) | fineg74 | 1 | -100/+17
The purpose of this PR is to add support for the host as an offloading device in liboffload. Both OpenMP and SYCL support offloading to the host as part of their normal workflow, and therefore require this capability from the liboffload library.
2026-01-30 | [Offload] Add a function to register an RPC Server callback (#178774) | Joseph Huber | 1 | -0/+6
Summary: We provide an RPC server to manage calls initiated by the device to run on the host. This is very useful for the built-in handling we have; however, there are cases where we want to extend this functionality. Cases like Fortran or MPI would benefit, but we cannot put references to these in the core offloading runtime. This way, we can provide a library interface that registers custom handlers for whatever code people want.
2026-01-28 | [Offload][AMDGPU] Fix olQueryQueue uninitialized output parameter (#178464) | puneeth_aditya_5656 | 1 | -0/+3
## Summary
- Fix uninitialized output parameter in `olQueryQueue_impl` when `Queue->AsyncInfo->Queue` is null
- Set `IsQueueWorkCompleted` to `true` when no underlying queue exists (no pending work)
- Resolves test failure on AMDGPU for `olQueryQueueTest.SuccessEmptyAsyncQueueCheckResult`

Fixes #178462.

## Test plan
- [x] Fixed `OffloadAPI/queue.unittests/olQueryQueueTest/SuccessEmptyAsyncQueueCheckResult/AMDGPU_AMD_Radeon_RX_7700_XT_0` test
- [ ] CI tests pass

Co-authored-by: Claude Sonnet 4.5 <noreply@anthropic.com>
Co-authored-by: Joseph Huber <huberjn@outlook.com>
2026-01-20 | [OFFLOAD] Add a check before calling dataExchange (#176853) | fineg74 | 1 | -1/+19
Per the documentation, a call to the dataExchange API (which moves a memory block between different devices) is permitted only if an isDataExchangable() call returned true. While almost all platforms support memory transfers between their own devices, a transfer attempted between devices belonging to different platforms present on the same machine can lead to unexpected results. This PR adds a check for whether dataExchange can be called and, if it cannot, falls back to a workaround that stages the memory transfer through the host.
2026-01-20 | [OFFLOAD] Add asynchronous queue query API for libomptarget migration (#172231) | fineg74 | 1 | -0/+9
This PR adds the liboffload asynchronous queue query API that is needed for libomptarget to use liboffload.
2026-01-12 | [OFFLOAD] Add memory data locking API for libomptarget migration (#173138) | fineg74 | 1 | -0/+15
This PR adds the liboffload memory data locking API that is needed for libomptarget to use liboffload.
2025-12-18 | [OFFLOAD] Recognize level_zero backend in liboffload (#172818) | Alex Duran | 1 | -0/+2
The code to recognize the level_zero plugin as a liboffload backend was split from #158900. This PR adds the support back.

Co-authored-by: Alexey Sachkov <alexey.sachkov@intel.com>
Co-authored-by: Nick Sarnie <nick.sarnie@intel.com>
Co-authored-by: Joseph Huber <huberjn@outlook.com>
2025-11-13 | [Offload] Add device info for shared memory (#167817) | Kevin Sala Penades | 1 | -0/+8
2025-11-04 | [Offload] Add device UID (#164391) | Robert Imschweiler | 1 | -2/+5
Introduced in OpenMP 6.0, the device UID shall be a unique identifier of a device on a given system (not necessarily a UUID). Since it is not guaranteed that the (U)UIDs defined by device vendor libraries, such as HSA, do not overlap with those of other vendors, the device UIDs in offload are always combined with the offload plugin name. In case the vendor library does not specify any device UID for a given device, we fall back to the offload-internal device ID. The device UID can be retrieved using the `llvm-offload-device-info` tool.
2025-10-14 | Revert "[Offload] Lazily initialize platforms in the Offloading API" (#163272) | Joseph Huber | 1 | -62/+33
Summary: This causes issues with CUDA's teardown order when the init is separated from the total init scope.
2025-10-14 | [Offload] Lazily initialize platforms in the Offloading API (#163272) | Joseph Huber | 1 | -37/+85
Summary: The Offloading library wraps around the underlying plugins. The problem is that we currently initialize all plugins we find, even if they are not needed for the program. This is very expensive for trivial uses, as fully heterogeneous usage is quite rare. In practice this means that you will always pay a 200 ms penalty for having CUDA installed. This patch changes the behavior to provide accessors into the plugins and devices that allow them to be initialized lazily. We use a once_flag; this should take a fast-path check while still blocking on concurrent use. Making full use of this will require a way to filter platforms more specifically. I'm thinking about what this would look like as an API: either an extra iterate function that takes a callback on the platform, or a helper to find all the devices that can run a given image. Maybe both? Fixes: https://github.com/llvm/llvm-project/issues/159636
2025-10-06 | [Offload] Fix isValidBinary segfault on host platform | Joseph Huber | 1 | -2/+3
Summary: We need to verify that this actually has a device. We really need to rework this to point to a real implementation, or streamline it to handle this automatically.
2025-09-29 | [Offload][NFC] use unique ptrs for platforms (#160888) | Piotr Balcer | 1 | -44/+45
Currently, devices store a raw pointer back to their owning Platform, and Platforms are stored directly inside a vector. Modifying this vector risks invalidating all the platform pointers stored in devices. This patch allocates platforms individually and changes devices to store a reference to their platform instead of a pointer. This is safe because platforms are guaranteed to outlive the devices they contain.
2025-09-24 | [Offload] Add olGetMemInfo with platform-less API (#159581) | Ross Brunton | 1 | -0/+54
2025-09-23 | [Offload] Don't add the unsupported host plugin to the list (#159642) | Joseph Huber | 1 | -6/+4
Summary: The host plugin is basically OpenMP-specific and doesn't work very well. Previously we were skipping over it in the list instead of just not adding it at all.
2025-09-23 | [Offload] Re-allocate overlapping memory (#159567) | Ross Brunton | 1 | -10/+60
If olMemAlloc happens to return memory that was already allocated elsewhere (possibly by another device on another platform), it is now thrown away and a new allocation is made. A new `AllocBases` vector is now available, which is an ordered list of allocation start addresses.
2025-09-19 | [Offload] Implement 'olIsValidBinary' in offload and clean up (#159658) | Joseph Huber | 1 | -1/+8
Summary: This exposes the 'isDeviceCompatible' routine for checking whether a binary *can* be loaded. This is useful if people don't want to consume errors everywhere when figuring out which image to put on which device. I don't know if this is a good name; I was thinking of something like `olIsCompatible`. Let me know what you think. Long term I'd like to be able to do something similar to what OpenMP does, where we conditionally initialize devices only if we need them. That support will be needed if we want this to be more generic.
2025-09-16 | [Offload] Copy loaded images into managed storage (#158748) | Joseph Huber | 1 | -26/+9
Summary: Currently we have this `__tgt_device_image` indirection which just takes a reference to some pointers. This was all fine and good when the only usage was from a section of GPU code that came from an ELF constant section. However, we have expanded beyond that and now need to worry about managing lifetimes; we have code that references the image even after it was loaded internally. This patch changes the implementation to instead copy the memory buffer and manage it locally. This PR reworks the JIT and other image handling to directly manage its own memory, so we no longer need to duplicate this behavior externally at the Offload API level. We also actually free these if the user unloads them. Upside: less likely to crash and burn. Downside: more latency when loading an image.
2025-08-29 | [Offload] Add `OL_DEVICE_INFO_MAX_WORK_SIZE[_PER_DIMENSION]` (#155823) | Ross Brunton | 1 | -0/+9
This is the total number of work items that the device supports (the equivalent work group properties are for only a single work group).
2025-08-29 | [Offload] Improve `olDestroyQueue` logic (#153041) | Ross Brunton | 1 | -16/+126
Previously, `olDestroyQueue` would not actually destroy the queue, instead leaving it for the device to clean up when the device itself was destroyed. Now the queue is either released immediately if it is complete, or put into a list of "pending" queues if it is not. Whenever we create a new queue, we check this list to see if any have since completed; if so, we release their resources and reuse them instead of pulling from the pool. This prevents long-running programs that create and drop many queues without syncing them from leaking memory all over the place.
2025-08-28 | [Offload] Add PRODUCT_NAME device info (#155632) | Ross Brunton | 1 | -0/+3
On my system, this will be "Radeon RX 7900 GRE" rather than "gfx1100". For Nvidia, the product name and device name are identical.
2025-08-22 | [Offload] Implement olMemFill (#154102) | Callum Fare | 1 | -0/+6
Implement olMemFill to support filling device memory with patterns of arbitrary length. AMDGPU support will be added in a follow-up PR.
2025-08-22 | [Offload] `OL_EVENT_INFO_IS_COMPLETE` (#153194) | Ross Brunton | 1 | -2/+14
A simple info query for events that returns whether the event is complete or not.
2025-08-21 | [Offload] Fix `OL_DEVICE_INFO_MAX_MEM_ALLOC_SIZE` on AMD (#154521) | Ross Brunton | 1 | -10/+9
This wasn't handled by the normal info API, so it needs special handling.
2025-08-20 | [Offload] Guard olMemAlloc/Free with a mutex (#153786) | Ross Brunton | 1 | -10/+18
Both of these functions update an `AllocInfoMap` structure in the context, but they did not take any locks, causing random failures in threaded code. Now they use a mutex.
2025-08-19 | [Offload] Add olCalculateOptimalOccupancy (#142950) | Ross Brunton | 1 | -0/+18
This is equivalent to `cuOccupancyMaxPotentialBlockSize`. It is currently only implemented on CUDA; AMDGPU and Host return unsupported.

Co-authored-by: Callum Fare <callum@codeplay.com>
2025-08-19 | [Offload] Define additional device info properties (#152533) | Rafal Bielski | 1 | -4/+89
Add the following properties in Offload device info:
* VENDOR_ID
* NUM_COMPUTE_UNITS
* [SINGLE|DOUBLE|HALF]_FP_CONFIG
* NATIVE_VECTOR_WIDTH_[CHAR|SHORT|INT|LONG|FLOAT|DOUBLE|HALF]
* MAX_CLOCK_FREQUENCY
* MEMORY_CLOCK_RATE
* ADDRESS_BITS
* MAX_MEM_ALLOC_SIZE
* GLOBAL_MEM_SIZE

Add a bitfield option to enumerators, allowing the values to be bit-shifted instead of incremented. Generate the per-type enums using `foreach` to reduce code duplication. Use macros in unit test definitions to reduce code duplication.
2025-08-15 | [Offload] `olLaunchHostFunction` (#152482) | Ross Brunton | 1 | -0/+7
Add an `olLaunchHostFunction` method that allows enqueueing host work to the stream.
2025-08-13 | [Offload] Store globals in the program's global list rather than the kernel list (#153441) | Ross Brunton | 1 | -1/+1
2025-08-08 | [Offload] Make olLaunchKernel test thread safe (#149497) | Ross Brunton | 1 | -10/+7
This sprinkles a few mutexes around the plugin interface so that the olLaunchKernel CTS test now passes when run on multiple threads. Part of this also involved changing the interface for device synchronise so that it can optionally not free the underlying queue (which had introduced a race condition in liboffload).
2025-08-08 | [Offload] OL_QUEUE_INFO_EMPTY (#152473) | Ross Brunton | 1 | -0/+6
Add a queue query that (if possible) reports whether the queue is empty.
2025-08-07 | [Offload] Don't create events for empty queues (#152304) | Ross Brunton | 1 | -4/+20
Add a device function to check if a device queue is empty. If liboffload tries to create an event for an empty queue, we create an "empty" event that is already complete. This allows `olCreateEvent`, `olSyncEvent` and `olWaitEvent` to run quickly for empty queues.
2025-08-04 | [Offload] Rework `MAX_WORK_GROUP_SIZE` (#151926) | Ross Brunton | 1 | -0/+14
`MAX_WORK_GROUP_SIZE` now represents the maximum total number of work groups the device can allocate, rather than the maximum per dimension. `MAX_WORK_GROUP_SIZE_PER_DIMENSION` has been added, which has the old behaviour.
2025-07-25 | [Offload] Refactor device information queries to use new tagging (#147318) | Ross Brunton | 2 | -76/+53
Instead of using strings to look up device information (which is brittle and slow), use the new tags that the plugins specify when building the nodes.
2025-07-24 | [Offload] Replace "EventOut" parameters with `olCreateEvent` (#150217) | Ross Brunton | 1 | -22/+10
Rather than having every "enqueue"-type function take an output pointer specifically for an output event, just provide an `olCreateEvent` entrypoint which pushes an event to the queue. For example, replace:

```cpp
olMemcpy(Queue, ..., EventOut);
```

with

```cpp
olMemcpy(Queue, ...);
olCreateEvent(Queue, EventOut);
```
2025-07-23 | [Offload] Add olWaitEvents (#150036) | Ross Brunton | 1 | -0/+22
This function causes a queue to wait until all the provided events have completed before running any future scheduled work.
2025-07-23 | [Offload] Rename olWaitEvent/Queue to olSyncEvent/Queue (#150023) | Ross Brunton | 1 | -2/+2
This more closely matches the nomenclature used by CUDA, AMDGPU and the plugin interface.
2025-07-16 | [Offload] Cache symbols in program (#148209) | Ross Brunton | 1 | -23/+34
When creating a new symbol, check whether it already exists. If it does, return that pointer rather than building a new symbol structure.
2025-07-14 | [Offload] Check plugins aren't already deinitialized when tearing down (#148642) | Callum Fare | 1 | -1/+1
This is a hotfix for #148615; it fixes the issue for me locally. I think a broader issue is that in the test environment we're calling olShutDown from a global destructor in the test binaries. We should do something more controlled: either call olInit/olShutDown in every test, or move those to a GTest global environment. I didn't do that originally because it looked like it needed changes to LLVM's GTest wrapper.
2025-07-11 | [Offload] Add global variable address/size queries (#147972) | Ross Brunton | 1 | -0/+19
Add two new symbol info types for getting the bounds of a global variable, as well as a number of tests for reading from and writing to it.
2025-07-11 | [Offload] Add `olGetSymbolInfo[Size]` (#147962) | Ross Brunton | 1 | -0/+28
This mirrors the similar functions for other handles. The only implemented info at the moment is the symbol's kind.
2025-07-11 | [Offload] Replace `GetKernel` with `GetSymbol` with global support (#148221) | Ross Brunton | 1 | -19/+41
`olGetKernel` has been replaced by `olGetSymbol`, which accepts a `Kind` parameter. As well as loading information about kernels, it can now also load information about global variables.
2025-07-10 | [Offload] Change `ol_kernel_handle_t` -> `ol_symbol_handle_t` (#147943) | Ross Brunton | 1 | -4/+18
In the future, we want `ol_symbol_handle_t` to represent both kernels and global variables. The first step in this process is a rename and promotion to a "typed handle".
2025-07-09 | [Offload] Implement olGetQueueInfo, olGetEventInfo (#142947) | Callum Fare | 1 | -0/+55
Add info queries for queues and events. `olGetQueueInfo` only supports getting the associated device; we were already tracking this, so we can implement it for free. We will likely add other queries in the future (whether the queue is empty, what flags it was created with, etc.). `olGetEventInfo` only supports getting the associated queue, which is another thing we were already storing in the handle. We'll be able to add other queries in the future (the event type, status, etc.).
2025-07-02 | [Offload] Add `MAX_WORK_GROUP_SIZE` device info query (#143718) | Ross Brunton | 1 | -0/+40
This adds a new device info query for the maximum workgroup/block size for each dimension.
2025-06-30 | [Offload] Refactor device/platform info queries (#146345) | Ross Brunton | 2 | -50/+90
This makes several small changes to how the platform and device info queries are handled:
* ReturnHelper has been replaced with InfoWriter, which is more explicit in how it is invoked.
* InfoWriter consumes `llvm::Expected` rather than values directly, and will exit early if it returns an error.
* As a result of the above, `GetInfoString` now correctly returns errors rather than empty strings.
* The host device now has its own dedicated "getInfo" function rather than being checked in multiple places.
2025-06-30 | [Offload] Implement `olShutDown` (#144055) | Ross Brunton | 1 | -21/+48
`olShutDown` was not properly calling deinit on the platforms, resulting in random segfaults on AMD devices. As part of this, `olInit` and `olShutDown` now alloc and free the offload context rather than it being static. This allows `olShutDown` to be called within a destructor of a static object (like the tests do) without having to worry about destructor ordering.
2025-06-27 | [Offload] Store device info tree in device handle (#145913) | Ross Brunton | 1 | -20/+21
Rather than creating a new device info tree for each call to `olGetDeviceInfo`, we instead do it on device initialisation. As well as improving performance, this fixes a few lifetime issues with returned strings. This does unfortunately mean that device information is immutable, but hopefully that shouldn't be a problem for any queries we want to implement. This also meant allowing offload initialization to fail, which it can now do.