path: root/offload/liboffload/API
Age | Commit message | Author | Files | Lines
2025-11-13 | [Offload] Add device info for shared memory (#167817) | Kevin Sala Penades | 1 | -0/+1
2025-11-04 | [Offload] Add device UID (#164391) | Robert Imschweiler | 1 | -0/+1
Introduced in OpenMP 6.0, the device UID shall be a unique identifier of a device on a given system. (Not necessarily a UUID.) Since it is not guaranteed that the (U)UIDs defined by the device vendor libraries, such as HSA, do not overlap with those of other vendors, the device UIDs in offload are always combined with the offload plugin name. In case the vendor library does not specify any device UID for a given device, we fall back to the offload-internal device ID. The device UID can be retrieved using the `llvm-offload-device-info` tool.
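The scoping rule described above can be illustrated with a minimal sketch. The helper name `makeDeviceUid`, the `-` separator, and the example plugin names are assumptions made for illustration; they are not taken from the Offload sources.

```cpp
#include <cassert>
#include <optional>
#include <string>

// Hypothetical helper, not the liboffload implementation: build a device UID
// that cannot collide across vendors by prefixing it with the plugin name.
std::string makeDeviceUid(const std::string &PluginName,
                          const std::optional<std::string> &VendorUid,
                          int InternalDeviceId) {
  assert(!PluginName.empty() && "a plugin name is needed to scope the UID");
  // Fall back to the offload-internal device ID when the vendor library
  // does not report a UID for this device.
  const std::string Local =
      VendorUid ? *VendorUid : std::to_string(InternalDeviceId);
  // The "-" separator is an assumption of this sketch.
  return PluginName + "-" + Local;
}
```

With this shape, two vendors that both report the same raw identifier still produce distinct UIDs, because the plugin name differs.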
2025-10-14 | [Offload] Lazily initialize platforms in the Offloading API (#163272) | Joseph Huber | 1 | -1/+2
Summary: The Offloading library wraps around the underlying plugins. The problem is that we currently initialize all plugins we find, even if they are not needed for the program. This is very expensive for trivial uses, as fully heterogeneous usage is quite rare. In practice this means that you will always pay a 200 ms penalty for having CUDA installed. This patch changes the behavior to provide accessors into the plugins and devices that allow them to be initialized lazily. We use a once_flag; this should properly take a fast-path check while still blocking on concurrent use. Making full use of this will require a way to filter platforms more specifically. I'm still thinking about what this would look like as an API: either we have an extra iterate function that takes a callback on the platform, or we just provide a helper to find all the devices that can run a given image. Maybe both? Fixes: https://github.com/llvm/llvm-project/issues/159636
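A minimal sketch of the lazy-initialization pattern the summary describes, using `std::once_flag` with `std::call_once`. The `LazyPlugin` class and its members are hypothetical stand-ins, not the actual plugin types in the Offload library.

```cpp
#include <cassert>
#include <mutex>
#include <string>
#include <utility>

// Hypothetical stand-in for a plugin whose initialization is expensive.
class LazyPlugin {
public:
  explicit LazyPlugin(std::string Name) : Name(std::move(Name)) {}

  // Accessor that initializes the plugin on first use only. std::call_once
  // blocks concurrent callers until the first initialization has finished
  // and is cheap on every later call.
  const std::string &get() {
    std::call_once(InitFlag, [this] {
      ++InitCount; // Stands in for the costly vendor-library start-up.
    });
    assert(InitCount == 1 && "initialization must have run exactly once");
    return Name;
  }

  int initCount() const { return InitCount; }

private:
  std::string Name;
  std::once_flag InitFlag;
  int InitCount = 0;
};
```

A program that never calls `get()` on a given plugin never pays its start-up cost, which is the behavior the patch is after.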
2025-09-24 | [Offload] Add olGetMemInfo with platform-less API (#159581) | Ross Brunton | 1 | -0/+50
2025-09-23 | [Offload] Re-allocate overlapping memory (#159567) | Ross Brunton | 1 | -0/+3
If olMemAlloc happens to allocate memory that was already allocated elsewhere (possibly by another device on another platform), it is now thrown away and a new allocation generated. A new `AllocBases` vector is now available, which is an ordered list of allocation start addresses.
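One way to detect such collisions with an ordered list is sketched below. The commit only states that `AllocBases` holds ordered start addresses; storing a size next to each base, and the names `AllocRange`, `overlapsExisting`, and `recordAllocation`, are assumptions of this sketch.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <iterator>
#include <vector>

// Hypothetical record; the source only says that start addresses are kept.
struct AllocRange {
  std::uintptr_t Base;
  std::size_t Size;
};

// Kept sorted by Base, mirroring an ordered list of allocation start addresses.
using AllocList = std::vector<AllocRange>;

// Finds the first recorded range whose base is not below Base.
static AllocList::const_iterator firstAtOrAfter(const AllocList &List,
                                                std::uintptr_t Base) {
  return std::lower_bound(
      List.begin(), List.end(), Base,
      [](const AllocRange &R, std::uintptr_t B) { return R.Base < B; });
}

// Returns true if [Base, Base + Size) intersects a range already in List.
// Because recorded ranges never overlap, only the two neighbors matter.
bool overlapsExisting(const AllocList &List, std::uintptr_t Base,
                      std::size_t Size) {
  auto Next = firstAtOrAfter(List, Base);
  if (Next != List.end() && Next->Base < Base + Size)
    return true;
  if (Next != List.begin()) {
    auto Prev = std::prev(Next);
    if (Prev->Base + Prev->Size > Base)
      return true;
  }
  return false;
}

// Records the range and keeps List sorted. Returns false and records nothing
// when the range overlaps, so the caller can discard it and allocate again.
bool recordAllocation(AllocList &List, std::uintptr_t Base, std::size_t Size) {
  assert(Size > 0 && "empty allocations are not tracked in this sketch");
  if (overlapsExisting(List, Base, Size))
    return false;
  List.insert(firstAtOrAfter(List, Base), AllocRange{Base, Size});
  return true;
}
```

Keeping the list sorted makes each check a binary search rather than a scan over every live allocation.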
2025-09-19 | [Offload] Implement 'olIsValidBinary' in offload and clean up (#159658) | Joseph Huber | 1 | -0/+12
Summary: This exposes the 'isDeviceCompatible' routine for checking if a binary *can* be loaded. This is useful if people don't want to consume errors everywhere when figuring out which image to put on which device. I don't know if this is a good name; I was thinking of something like `olIsCompatible`. Let me know what you think. Long term I'd like to be able to do something similar to what OpenMP does, where we only initialize devices if we need them. That support will be needed if we want this to be more generic.
2025-08-29 | [Offload] Add `OL_DEVICE_INFO_MAX_WORK_SIZE[_PER_DIMENSION]` (#155823) | Ross Brunton | 1 | -0/+2
This is the total number of work items that the device supports (the equivalent work group properties are for only a single work group).
2025-08-28 | [Offload] Add PRODUCT_NAME device info (#155632) | Ross Brunton | 1 | -0/+1
On my system, this will be "Radeon RX 7900 GRE" rather than "gfx1100". For Nvidia, the product name and device name are identical.
2025-08-22 | [Offload] Fix definition of olMemFill (#154947) | Callum Fare | 1 | -2/+1
Fix regression introduced by #154102 - the way offload-tblgen handles names has changed
2025-08-22 | [Offload] Implement olMemFill (#154102) | Callum Fare | 1 | -0/+20
Implement olMemFill to support filling device memory with arbitrary length patterns. AMDGPU support will be added in a follow-up PR.
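To make "arbitrary length patterns" concrete, here is a host-side sketch of the fill semantics. It does not call `olMemFill`; the function name `fillPattern` and the requirement that the fill size be a multiple of the pattern size are assumptions of this sketch.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>

// Host-side illustration only: repeat a pattern of PatternSize bytes across
// Size bytes of Dst. Requiring Size to be a multiple of PatternSize is an
// assumption of this sketch.
void fillPattern(void *Dst, const void *Pattern, std::size_t PatternSize,
                 std::size_t Size) {
  assert(PatternSize > 0 && Size % PatternSize == 0);
  auto *Out = static_cast<unsigned char *>(Dst);
  // Copy the pattern back to back until the destination is full.
  for (std::size_t Offset = 0; Offset < Size; Offset += PatternSize)
    std::memcpy(Out + Offset, Pattern, PatternSize);
}
```

Unlike a `memset`-style fill, the pattern here can be any number of bytes wide, for example a 12-byte struct.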
2025-08-22 | [Offload] `OL_EVENT_INFO_IS_COMPLETE` (#153194) | Ross Brunton | 1 | -1/+2
A simple info query for events that returns whether the event is complete or not.
2025-08-22 | [Offload][NFC] Use tablegen names rather than `name` parameter for API (#154736) | Ross Brunton | 10 | -123/+61
2025-08-21 | [Offload][NFC] Use a sensible order for APIGen (#154518) | Ross Brunton | 1 | -1/+1
The order in which entries in the tablegen API files are iterated is not the order in which they appear in the file. To avoid any issues with the order changing in future, we now generate all definitions of a certain class before any class that can use them. This is NFC; the definitions don't actually change, just the order in which they appear in the OffloadAPI.h header.
2025-08-19 | [Offload] Add olCalculateOptimalOccupancy (#142950) | Ross Brunton | 1 | -1/+19
This is equivalent to `cuOccupancyMaxPotentialBlockSize`. It is currently only implemented on CUDA; AMDGPU and Host return unsupported.
---------
Co-authored-by: Callum Fare <callum@codeplay.com>
2025-08-19 | [Offload] Define additional device info properties (#152533) | Rafal Bielski | 2 | -1/+37
Add the following properties in Offload device info:
* VENDOR_ID
* NUM_COMPUTE_UNITS
* [SINGLE|DOUBLE|HALF]_FP_CONFIG
* NATIVE_VECTOR_WIDTH_[CHAR|SHORT|INT|LONG|FLOAT|DOUBLE|HALF]
* MAX_CLOCK_FREQUENCY
* MEMORY_CLOCK_RATE
* ADDRESS_BITS
* MAX_MEM_ALLOC_SIZE
* GLOBAL_MEM_SIZE

Add a bitfield option to enumerators, allowing the values to be bit-shifted instead of incremented. Generate the per-type enums using `foreach` to reduce code duplication. Use macros in unit test definitions to reduce code duplication.
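The difference between incremented and bit-shifted enumerators can be sketched as follows. The enumerator names below are hypothetical and chosen only for illustration; they are not the values generated for the `*_FP_CONFIG` properties.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical flag names for illustration. Each enumerator is 1 << N rather
// than N, so several capabilities can be combined in a single value.
enum FpCapability : std::uint32_t {
  FP_CAP_DENORM = 1u << 0,
  FP_CAP_INF_NAN = 1u << 1,
  FP_CAP_ROUND_TO_NEAREST = 1u << 2,
  FP_CAP_FMA = 1u << 3,
};

// Tests whether a combined configuration value contains a given capability.
constexpr bool hasCapability(std::uint32_t Config, FpCapability Cap) {
  return (Config & Cap) != 0;
}

static_assert(hasCapability(FP_CAP_DENORM | FP_CAP_FMA, FP_CAP_FMA),
              "bit-shifted enumerators can be OR-ed together");
```

With incremented values (0, 1, 2, 3) the OR of two enumerators would be ambiguous, which is why a bitfield option is needed for properties that report a set of flags.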
2025-08-15 | [Offload] `olLaunchHostFunction` (#152482) | Ross Brunton | 2 | -1/+35
Add an `olLaunchHostFunction` method that allows enqueueing host work to the stream.
2025-08-08 | [Offload] OL_QUEUE_INFO_EMPTY (#152473) | Ross Brunton | 1 | -1/+2
Add a queue query that (if possible) reports whether the queue is empty
2025-08-06 | [NFC][Offload] Clarify `olDestroyQueue` (#152132) | Ross Brunton | 1 | -1/+3
This has no code changes.
2025-08-04 | [Offload] Rework `MAX_WORK_GROUP_SIZE` (#151926) | Ross Brunton | 1 | -1/+2
`MAX_WORK_GROUP_SIZE` now represents the maximum total number of work groups the device can allocate, rather than the maximum per dimension. `MAX_WORK_GROUP_SIZE_PER_DIMENSION` has been added, which has the old behaviour.
2025-07-24 | [Offload] Replace "EventOut" parameters with `olCreateEvent` (#150217) | Ross Brunton | 3 | -6/+14
Rather than having every "enqueue"-type function have an output pointer specifically for an output event, just provide an `olCreateEvent` entrypoint which pushes an event to the queue. For example, replace:
```cpp
olMemcpy(Queue, ..., EventOut);
```
with
```cpp
olMemcpy(Queue, ...);
olCreateEvent(Queue, EventOut);
```
2025-07-23 | [Offload] Add olWaitEvents (#150036) | Ross Brunton | 1 | -0/+17
This function causes a queue to wait until all the provided events have completed before running any future scheduled work.
2025-07-23 | [Offload] Rename olWaitEvent/Queue to olSyncEvent/Queue (#150023) | Ross Brunton | 2 | -4/+4
This more closely matches the nomenclature used by CUDA, AMDGPU and the plugin interface.
2025-07-11 | [Offload] Add global variable address/size queries (#147972) | Ross Brunton | 1 | -1/+3
Add two new symbol info types for getting the bounds of a global variable. Also add a number of tests for reading from and writing to it.
2025-07-11 | [Offload] Add `olGetSymbolInfo[Size]` (#147962) | Ross Brunton | 1 | -0/+54
This mirrors the similar functions for other handles. The only implemented info at the moment is the symbol's kind.
2025-07-11 | [Offload] Replace `GetKernel` with `GetSymbol` with global support (#148221) | Ross Brunton | 2 | -16/+17
`olGetKernel` has been replaced by `olGetSymbol` which accepts a `Kind` parameter. As well as loading information about kernels, it can now also load information about global variables.
2025-07-10 | [Offload] Change `ol_kernel_handle_t` -> `ol_symbol_handle_t` (#147943) | Ross Brunton | 4 | -8/+30
In the future, we want `ol_symbol_handle_t` to represent both kernels and global variables. The first step in this process is a rename and promotion to a "typed handle".
2025-07-10 | [Offload] Add Offload API Sphinx documentation (#147323) | Kenneth Benzie (Benie) | 1 | -15/+0
* Add spec generation to offload-tblgen tool
  * This patch adds generation of Sphinx-compatible reStructuredText utilizing the C domain to document the Offload API directly from the spec definition `.td` files.
* Add Sphinx HTML documentation target
  * Introduces the `docs-offload-html` target when CMake is configured with `LLVM_ENABLE_SPHINX=ON` and `SPHINX_OUTPUT_HTML=ON`. Utilizes `offload-tblgen -gen-spen` to generate Offload API specification docs.
2025-07-09 | [Offload] Implement olGetQueueInfo, olGetEventInfo (#142947) | Callum Fare | 2 | -0/+96
Add info queries for queues and events. `olGetQueueInfo` only supports getting the associated device. We were already tracking this so we can implement this for free. We will likely add other queries to it in the future (whether the queue is empty, what flags it was created with, etc) `olGetEventInfo` only supports getting the associated queue. This is another thing we were already storing in the handle. We'll be able to add other queries in future (the event type, status, etc)
2025-07-09 | [Offload] Generate OffloadInfo.inc (#147316) | Ross Brunton | 1 | -1/+1
This is a generated file which contains a macro for all Device Info keys. This is visible to the plugin interface so that it can use the definitions in a future patch.
2025-07-02 | [Offload] Add missing license header to Common.td (#146737) | Callum Fare | 1 | -0/+12
All other tablegen files in this directory have the license header, but `Common.td` is missing it
2025-07-02 | [Offload] Add `MAX_WORK_GROUP_SIZE` device info query (#143718) | Ross Brunton | 1 | -1/+2
This adds a new device info query for the maximum workgroup/block size for each dimension.
2025-07-02 | [Offload] Improve liboffload documentation (#142403) | Callum Fare | 1 | -36/+69
- Update the main README to reflect the current project status
- Rework the main API generation documentation. General fixes/tidying, but also spell out explicitly how to make API changes at the top of the document since this is what most people will care about.
---------
Co-authored-by: Martin Grant <martingrant@outlook.com>
2025-06-30 | [Offload] Implement `olShutDown` (#144055) | Ross Brunton | 1 | -1/+1
`olShutDown` was not properly calling deinit on the platforms, resulting in random segfaults on AMD devices. As part of this, `olInit` and `olShutDown` now alloc and free the offload context rather than it being static. This allows `olShutDown` to be called within a destructor of a static object (like the tests do) without having to worry about destructor ordering.
2025-06-25 | [Offload] Add an `unloadBinary` interface to PluginInterface (#143873) | Ross Brunton | 1 | -1/+3
This allows removal of a specific Image from a Device, rather than requiring all image data to outlive the device they were created for. This is required for `ol_program_handle_t`s, which now specify the lifetime of the buffer used to create the program.
2025-06-24 | [Offload] Properly report errors when jit compiling (#145498) | Ross Brunton | 1 | -0/+1
Previously, if a binary failed to load due to failures when jit compiling, the function would return success with nullptr. Now it returns a new plugin error, `COMPILE_FAILURE`.
2025-06-20 | [Offload] Check for initialization (#144370) | Ross Brunton | 1 | -0/+1
All entry points (except olInit) now check that offload has been initialized. If not, a new `OL_ERRC_UNINITIALIZED` error is returned.
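The guard described above can be sketched as follows. All names here (`Status`, `sketchInit`, `sketchShutDown`, `sketchGetDeviceCount`) are hypothetical stand-ins and not the generated Offload entry points.

```cpp
#include <atomic>
#include <cassert>

// Hypothetical error codes standing in for the real enumeration.
enum class Status { Success, Uninitialized };

// Tracks whether the library has been initialized.
static std::atomic<bool> Initialized{false};

Status sketchInit() {
  Initialized.store(true);
  return Status::Success;
}

Status sketchShutDown() {
  if (!Initialized.load())
    return Status::Uninitialized;
  Initialized.store(false);
  return Status::Success;
}

// Every entry point other than the init function starts with the same guard.
Status sketchGetDeviceCount(int *CountOut) {
  if (!Initialized.load())
    return Status::Uninitialized;
  assert(CountOut && "output pointer must not be null");
  *CountOut = 1; // Placeholder value for the sketch.
  return Status::Success;
}
```

Returning a dedicated error code lets callers distinguish "library not initialized" from a genuine failure of the requested operation.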
2025-06-12 | [Offload] Add `ol_dimensions_t` and convert ranges from size_t -> uint32_t (#143901) | Ross Brunton | 2 | -6/+12
This is a three element x, y, z size_t vector that can be used any place where a 3D vector is required. This ensures that all vectors across liboffload are the same and don't require any resizing/reordering dances.
2025-06-06 | [Offload] Make olMemcpy src parameter const (#143161) | Callum Fare | 1 | -1/+1
2025-06-06 | [Offload] Allow setting null arguments in olLaunchKernel (#141958) | Ross Brunton | 1 | -2/+4
2025-06-04 | [Offload] Explicitly create directories that contain tablegen output (#142817) | Callum Fare | 1 | -0/+1
This isn't required when building with Ninja, but with the Makefile generator these directories don't get implicitly created.
2025-06-04 | [Offload] Fix missing dependencies in Offload API generation (#142776) | Callum Fare | 1 | -0/+2
Thanks to @RossBrunton for spotting this. We attempt to clang-format the generated Offload header files, but if clang-format isn't available we just copy the generated files instead. That fallback path was missing the correct dependencies. Fixes #142756
2025-06-03 | [Offload] Don't check in generated files (#141982) | Callum Fare | 1 | -24/+41
Previously we decided to check in files that we generate with tablegen. The justification at the time was that it helped reviewers unfamiliar with `offload-tblgen` see the actual changes to the headers in PRs. After trying it for a while, it's ended up causing some headaches and is also not how tablegen is used elsewhere in LLVM. This changes our use of tablegen to be more conventional. Where possible, files are still clang-formatted, but this is no longer a hard requirement. Because `OffloadErrcodes.inc` is shared with libomptarget it now gets generated in a more appropriate place.
2025-05-28 | [Offload] Add specifier for the host type (#141635) | Joseph Huber | 1 | -0/+1
Summary: We use this special type to indicate a host value. This will be refined later, but for now it's used as a stand-in device for transfers and queues. It needs a special kind because it is not a device target like the other ones, so we need to differentiate it from the CPU and GPU types. Fixes: https://github.com/llvm/llvm-project/issues/141436
2025-05-20 | [Offload] Use new error code handling mechanism and lower-case messages (#139275) | Ross Brunton | 1 | -13/+20
[Offload] Use new error code handling mechanism
This removes the old ErrorCode-less error method and requires every user to provide a concrete error code. All calls have been updated. In addition, for consistency with error messages elsewhere in LLVM, all messages have been made to start lower case.
2025-05-19 | [Offload] Add Error Codes to PluginInterface (#138258) | Ross Brunton | 2 | -14/+21
A new ErrorCode enumeration is present in PluginInterface which can be used when returning an llvm::Error from offload and PluginInterface functions. This enum must be kept in sync with liboffload's ol_errc_t enum, so both are automatically generated from liboffload's enum definition. Some error codes have also been shuffled around to allow for future work. Note that this patch only adds the machinery; actual error codes will be added in a future patch. ~~Depends on #137339 , please ignore first commit of this MR.~~ This has been merged.
2025-04-22 | [Offload] Implement the remaining initial Offload API (#122106) | Callum Fare | 11 | -67/+290
Implement the complete initial version of the Offload API, to the extent that is usable for simple offloading programs. Tested with a basic SYCL program. As far as possible, these are simple wrappers over existing functionality in the plugins.
* Allocating and freeing memory (host, device, shared).
* Creating a program
* Creating a queue (wrapper over asynchronous stream resource)
* Enqueuing memcpy operations
* Enqueuing kernel executions
* Waiting on (optional) output events from the enqueue operations
* Waiting on a queue to finish

Objects created with the API have reference counting semantics to handle their lifetime. They are created with an initial reference count of 1, which can be incremented and decremented with retain and release functions. They are freed when their reference count reaches 0. Platform and device objects are not reference counted, as they are expected to persist as long as the library is in use, and it's not meaningful for users to create or destroy them.

Tests have been added to `offload.unittests`, including device code for testing program and kernel related functionality. The API should still be considered unstable and it's very likely we will need to change the existing entry points.
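The retain/release lifetime rules described in this commit can be sketched in a few lines. The `RefCounted` type and the `create`, `retain`, and `release` functions are hypothetical and deliberately not thread-safe; the real handles are opaque and their entry points differ.

```cpp
#include <cassert>

// Hypothetical object type; real API handles are opaque.
struct RefCounted {
  int RefCount = 1; // Objects start with one reference held by the creator.
};

// Creates an object with an initial reference count of 1.
RefCounted *create() { return new RefCounted(); }

// Adds one reference.
void retain(RefCounted *Obj) { ++Obj->RefCount; }

// Drops one reference and frees the object when none remain. Returns true if
// the object was destroyed by this call.
bool release(RefCounted *Obj) {
  assert(Obj->RefCount > 0 && "release called on a dead object");
  if (--Obj->RefCount > 0)
    return false;
  delete Obj;
  return true;
}
```

In this model a caller that hands an object to another owner calls `retain` first, and each owner calls `release` exactly once when it is done.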
2025-01-31 | [Offload][NFC] Fix typos discovered by codespell (#125119) | Christian Clauss | 2 | -2/+2
https://github.com/codespell-project/codespell % `codespell --ignore-words-list=archtype,hsa,identty,inout,iself,nd,te,ths,vertexes --write-changes`
2024-12-05 | Reland #118503: [Offload] Introduce offload-tblgen and initial new API implementation (#118614) | Callum Fare | 7 | -0/+761
Reland #118503. Added a fix for builds with `-DBUILD_SHARED_LIBS=ON` (see last commit). Otherwise the changes are identical.

---

### New API

Previous discussions at the LLVM/Offload meeting have brought up the need for a new API for exposing the functionality of the plugins. This change introduces a very small subset of a new API, which is primarily for testing the offload tooling and demonstrating how a new API can fit into the existing code base without being too disruptive. Exact designs for these entry points and future additions can be worked out over time.

The new API does however introduce the bare minimum functionality to implement device discovery for Unified Runtime and SYCL. This means that the `urinfo` and `sycl-ls` tools can be used on top of Offload. A (rough) implementation of a Unified Runtime adapter (aka plugin) for Offload is available [here](https://github.com/callumfare/unified-runtime/tree/offload_adapter). Our intention is to maintain this and use it to implement and test Offload API changes with SYCL.

### Demoing the new API

```sh
# From the runtime build directory
$ ninja LibomptUnitTests
$ OFFLOAD_TRACE=1 ./offload/unittests/OffloadAPI/offload.unittests
```

### Open questions and future work

* Only some of the available device info is exposed, and not all the possible device queries needed for SYCL are implemented by the plugins. A sensible next step would be to refactor and extend the existing device info queries in the plugins. The existing info queries are all strings, but the new API introduces the ability to return any arbitrary type.
* It may be sensible at some point for the plugins to implement the new API directly, and the higher level code on top of it could be made generic, but this is more of a long-term possibility.
2024-12-03 | Revert "Reland of #108413: [Offload] Introduce offload-tblgen and initial new API implementation" (#118541) | Jan Patrick Lehr | 7 | -761/+0
Reverts llvm/llvm-project#118503
Broke bot https://lab.llvm.org/staging/#/builders/131/builds/9701/steps/5/logs/stdio
2024-12-03 | Reland of #108413: [Offload] Introduce offload-tblgen and initial new API implementation (#118503) | Callum Fare | 7 | -0/+761
This is another attempt to reland the changes from #108413. The previous two attempts introduced regressions and were reverted. This PR has been more thoroughly tested with various configurations so shouldn't cause any problems this time. If anyone is aware of any likely remaining problems then please let me know. The changes are identical other than the fixes contained in the last 5 commits.

---

### New API

Previous discussions at the LLVM/Offload meeting have brought up the need for a new API for exposing the functionality of the plugins. This change introduces a very small subset of a new API, which is primarily for testing the offload tooling and demonstrating how a new API can fit into the existing code base without being too disruptive. Exact designs for these entry points and future additions can be worked out over time.

The new API does however introduce the bare minimum functionality to implement device discovery for Unified Runtime and SYCL. This means that the `urinfo` and `sycl-ls` tools can be used on top of Offload. A (rough) implementation of a Unified Runtime adapter (aka plugin) for Offload is available [here](https://github.com/callumfare/unified-runtime/tree/offload_adapter). Our intention is to maintain this and use it to implement and test Offload API changes with SYCL.

### Demoing the new API

```sh
# From the runtime build directory
$ ninja LibomptUnitTests
$ OFFLOAD_TRACE=1 ./offload/unittests/OffloadAPI/offload.unittests
```

### Open questions and future work

* Only some of the available device info is exposed, and not all the possible device queries needed for SYCL are implemented by the plugins. A sensible next step would be to refactor and extend the existing device info queries in the plugins. The existing info queries are all strings, but the new API introduces the ability to return any arbitrary type.
* It may be sensible at some point for the plugins to implement the new API directly, and the higher level code on top of it could be made generic, but this is more of a long-term possibility.