path: root/offload/plugins-nextgen/common/include
Age | Commit message | Author | Files | Lines
5 days | [Offload] Always check/consume Error (#182008) | Jan Patrick Lehr | 1 | -1/+3
This fixes an issue introduced in https://github.com/llvm/llvm-project/pull/172226 where an llvm::Error is not checked in the "good" code path.
6 days | [OFFLOAD] Extend olMemRegister API to handle cases when a memory block may have been mapped outside of liboffload. (#172226) | fineg74 | 1 | -63/+38
This PR extends the liboffload olMemRegister API to handle the case where a memory block may have been mapped before olMemRegister is called, to support some use cases in libomptarget.
2026-02-06 | [Offload] Make the RPC callbacks private to each running server (#178901) | Joseph Huber | 1 | -2/+11
Summary: The static object mixes callbacks from different plugins because ever since we moved to the object library target these are actually shared. Just make it a member of the base class and make it a pointer set just to do some basic deduplication.
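The change described above can be pictured with a minimal sketch, assuming a hypothetical callback signature and class name (neither appears in the log; the real types live in the plugin headers). Each server instance owns its own set of callback pointers, and the set deduplicates by address:

```cpp
#include <cstddef>
#include <set>

// Hypothetical callback signature; the real RPC callback type is not shown
// in the commit log.
using RPCCallback = int (*)(int Opcode);

// Stand-in for a per-plugin server object. The set is a data member rather
// than a static, so callbacks registered on one server never leak into another.
class ServerSketch {
public:
  // Returns true if the callback was newly added, false if it was a duplicate.
  bool registerCallback(RPCCallback CB) { return Callbacks.insert(CB).second; }

  std::size_t numCallbacks() const { return Callbacks.size(); }

private:
  std::set<RPCCallback> Callbacks; // pointer set: deduplicates by address
};

// Two illustrative handlers with distinct bodies.
inline int handleA(int Opcode) { return Opcode + 1; }
inline int handleB(int Opcode) { return Opcode + 2; }
```

Registering `handleA` twice on one `ServerSketch` leaves a single entry, and a second `ServerSketch` starts out empty.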
2026-01-30 | [Offload] Add a function to register an RPC Server callback (#178774) | Joseph Huber | 1 | -0/+5
Summary: We provide an RPC server to manage calls initiated by the device to run on the host. This is very useful for the built-in handling we have, however there are cases where we would want to extend this functionality. Cases like Fortran or MPI would be useful, but we cannot put references to these in the core offloading runtime. This way, we can provide this as a library interface that registers custom handlers for whatever code people want.
2026-01-20 | [OFFLOAD] Add asynchronous queue query API for libomptarget migration (#172231) | fineg74 | 1 | -2/+4
Add liboffload asynchronous queue query API for libomptarget migration. This PR adds the liboffload asynchronous queue query API that is needed to make libomptarget use liboffload.
2026-01-12 | [Offload] Update debug message printig in the plugins (#175205) | Hansang Bae | 2 | -30/+31
* Prepare a set of debug types in llvm::offload::debug to be used in plugin code
* Update debug messages in the plugins
2025-12-18 | [OFFLOAD] Add plugin with support for Intel oneAPI Level Zero (#158900) | Alex Duran | 2 | -0/+31
Add a new nextgen plugin that supports GPU devices through the Intel oneAPI Level Zero library. The plugin is not enabled by default and needs to be added to LIBOMPTARGET_PLUGINS_TO_BUILD explicitly.
---------
Co-authored-by: Alexey Sachkov <alexey.sachkov@intel.com>
Co-authored-by: Nick Sarnie <nick.sarnie@intel.com>
Co-authored-by: Joseph Huber <huberjn@outlook.com>
2025-12-17 | [Offload] Debug message update part 3 (#171684) | Hansang Bae | 2 | -27/+33
Update debug messages based on the new method from #170425. Updated the following files:
- plugins-nextgen/common/include/MemoryManager.h
- plugins-nextgen/common/include/PluginInterface.h
- plugins-nextgen/common/src/GlobalHandler.cpp
- plugins-nextgen/common/src/PluginInterface.cpp
- plugins-nextgen/host/dynamic_ffi/ffi.cpp
2025-11-13 | [Offload] Add device info for shared memory (#167817) | Kevin Sala Penades | 1 | -0/+7
2025-11-06 | [OpenMP] Fix tests relying on the heap size variable | Joseph Huber | 1 | -0/+1
Summary: I made that an unimplemented error, but forgot that it was used for this environment variable.
2025-11-06 | [Offload] Remove handling for device memory pool (#163629) | Joseph Huber | 1 | -14/+9
Summary: This was a lot of code that was only used for upstream LLVM builds of AMDGPU offloading. We have a generic and fast `malloc` in `libc` now so just use that. Simplifies code, can be added back if we start providing alternate forms but I don't think there's a single use-case that would justify it yet.
2025-11-04 | [Offload] Add device UID (#164391) | Robert Imschweiler | 1 | -1/+18
Introduced in OpenMP 6.0, the device UID shall be a unique identifier of a device on a given system. (Not necessarily a UUID.) Since it is not guaranteed that the (U)UIDs defined by the device vendor libraries, such as HSA, do not overlap with those of other vendors, the device UIDs in offload are always combined with the offload plugin name. In case the vendor library does not specify any device UID for a given device, we fall back to the offload-internal device ID. The device UID can be retrieved using the `llvm-offload-device-info` tool.
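As a rough illustration of the combination rule described above, the following sketch builds a UID from the plugin name plus either the vendor-supplied UID or, if none exists, the internal device ID. The function name, the `-` separator, and the parameter types are assumptions made for this example; the log does not show the actual format.

```cpp
#include <cstdint>
#include <optional>
#include <string>

// Hypothetical helper: name, separator, and signature are illustrative only.
// Prefixing with the plugin name keeps UIDs from different vendor libraries
// from colliding, as the commit message describes.
inline std::string makeDeviceUID(const std::string &PluginName,
                                 const std::optional<std::string> &VendorUID,
                                 std::int32_t InternalDeviceId) {
  // Fall back to the offload-internal device ID when the vendor library
  // does not report a UID for this device.
  const std::string Suffix =
      VendorUID ? *VendorUID : std::to_string(InternalDeviceId);
  return PluginName + "-" + Suffix;
}
```

With these assumptions, a device whose vendor library reports no UID and whose internal ID is 3 under a plugin named `cuda` would get `cuda-3`.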
2025-10-22 | [OpenMP] Adds omp_target_is_accessible routine (#138294) | Nicole Aschenbrenner | 1 | -0/+11
Adds omp_target_is_accessible routine. Refactors common code from omp_target_is_present to work for both routines.
---------
Co-authored-by: Shilei Tian <i@tianshilei.me>
2025-10-09 | [OFFLOAD] Remove unused init_device_info plugin interface (#162650) | Alex Duran | 1 | -9/+1
This was used for the old interop code. It's dead code after #143491
2025-09-26 | [Offload] Use Error for allocating/deallocating in plugins (#160811) | Kevin Sala Penades | 1 | -21/+45
Co-authored-by: Joseph Huber <huberjn@outlook.com>
2025-09-23 | [Offload][NFC] Avoid temporary string copies in InfoTreeNode (#159372) | Alexey Sachkov | 1 | -3/+4
2025-09-20 | [Offload] Remove non-blocking allocation type (#159851) | Joseph Huber | 1 | -1/+0
Summary: This was originally added in as a hack to work around CUDA's limitation on allocation. The `libc` implementation now isn't even used for CUDA so this code is never hit. Even in this case, this code never truly worked. A true solution would be to use CUDA's virtual memory API instead to allocate 2MiB slabs independently from the normal memory management done in the stream.
2025-09-19 | [Offload] Implement 'olIsValidBinary' in offload and clean up (#159658) | Joseph Huber | 1 | -2/+2
Summary: This exposes the 'isDeviceCompatible' routine for checking if a binary *can* be loaded. This is useful if people don't want to consume errors everywhere when figuring out which image to put to what device. I don't know if this is a good name, I was thinking like `olIsCompatible` or whatever. Let me know what you think. Long term I'd like to be able to do something similar to what OpenMP does where we can conditionally only initialize devices if we need them. That's going to be support needed if we want this to be more generic.
2025-09-16 | [Offload] Copy loaded images into managed storage (#158748) | Joseph Huber | 2 | -52/+15
Summary: Currently we have this `__tgt_device_image` indirection which just takes a reference to some pointers. This was all fine and good when the only usage of this was from a section of GPU code that came from an ELF constant section. However, we have expanded beyond that and now need to worry about managing lifetimes. We have code that references the image even after it was loaded internally. This patch changes the implementation to instead copy the memory buffer and manage it locally. This PR reworks the JIT and other image handling to directly manage its own memory. We now don't need to duplicate this behavior externally at the Offload API level. Also we actually free these if the user unloads them. Upside, less likely to crash and burn. Downside, more latency when loading an image.
2025-09-08 | [OpenMP] Move `__omp_rtl_data_environment' handling to OpenMP (#157182) | Joseph Huber | 1 | -11/+2
Summary: This operation is done every time we load a binary, this behavior should be moved into OpenMP since it concerns an OpenMP specific data struct. This is a little messy, because ideally we should only be using public APIs, but more can be extracted later.
2025-09-01 | [OpenMP][Offload] Mark `SPMD_NO_LOOP` as a valid exec mode (#155990) | Ross Brunton | 1 | -0/+1
This was added in #154105, but was not added to the plugin interface's list of valid modes.
2025-08-28 | [OpenMP][Offload] Add SPMD-No-Loop mode to OpenMP offload runtime (#154105) | Dominik Adamski | 1 | -1/+8
Kernels which are marked as SPMD-No-Loop should be launched with a sufficient number of teams and threads to cover the loop iteration space. No-Loop mode is described in this RFC: https://discourse.llvm.org/t/rfc-no-loop-mode-for-openmp-gpu-kernels/87517/
2025-08-22 | [Offload] Implement olMemFill (#154102) | Callum Fare | 1 | -0/+7
Implement olMemFill to support filling device memory with arbitrary length patterns. AMDGPU support will be added in a follow-up PR.
2025-08-22 | [Offload] `OL_EVENT_INFO_IS_COMPLETE` (#153194) | Ross Brunton | 1 | -0/+5
A simple info query for events that returns whether the event is complete or not.
2025-08-19 | [Offload] Add olCalculateOptimalOccupancy (#142950) | Ross Brunton | 1 | -0/+3
This is equivalent to `cuOccupancyMaxPotentialBlockSize`. It is currently only implemented on Cuda; AMDGPU and Host return unsupported.
---------
Co-authored-by: Callum Fare <callum@codeplay.com>
2025-08-15 | [Offload] Introduce dataFence plugin interface. (#153793) | Abhinav Gaba | 1 | -0/+8
The purpose of this fence is to ensure that any `dataSubmit`s inserted into a queue before a `dataFence` finish before any `dataSubmit`s inserted after it begin. This is a no-op for most queues, since they are in-order, and by design any operations inserted into them occur in order. But the interface is supposed to be functional for out-of-order queues. The addition of the interface means that any operations that rely on such ordering (like ATTACH map-type support in #149036) can invoke it, without worrying about whether the underlying queue is in-order or out-of-order. Once a plugin supports out-of-order queues, the plugin can implement this function, without requiring any change at the libomptarget level.
---------
Co-authored-by: Alex Duran <alejandro.duran@intel.com>
2025-08-15 | [Offload] `olLaunchHostFunction` (#152482) | Ross Brunton | 1 | -0/+6
Add an `olLaunchHostFunction` method that allows enqueueing host work to the stream.
2025-08-08 | [Offload] Make olLaunchKernel test thread safe (#149497) | Ross Brunton | 1 | -3/+22
This sprinkles a few mutexes around the plugin interface so that the olLaunchKernel CTS test now passes when run on multiple threads. Part of this also involved changing the interface for device synchronise so that it can optionally not free the underlying queue (which introduced a race condition in liboffload).
2025-08-07 | [Offload] Don't create events for empty queues (#152304) | Ross Brunton | 1 | -0/+8
Add a device function to check if a device queue is empty. If liboffload tries to create an event for an empty queue, we create an "empty" event that is already complete. This allows `olCreateEvent`, `olSyncEvent` and `olWaitEvent` to run quickly for empty queues.
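The fast path described above might look roughly like the following sketch. All type and function names here are hypothetical, and treating an event as complete whenever its queue has drained is a simplification made for the example, not a description of the real event semantics:

```cpp
#include <deque>

// Hypothetical stand-in for a device queue holding pending work items.
struct QueueSketch {
  std::deque<int> Pending;
  bool empty() const { return Pending.empty(); }
};

// Hypothetical event. A null Queue marks an "empty" event that was created
// for an idle queue and is therefore already complete.
struct EventSketch {
  const QueueSketch *Queue;
  bool isComplete() const { return Queue == nullptr || Queue->empty(); }
};

// Only attach the event to the queue when there is work to wait for;
// otherwise hand back an already-complete empty event.
inline EventSketch createEvent(const QueueSketch &Q) {
  if (Q.empty())
    return EventSketch{nullptr};
  return EventSketch{&Q};
}
```

Waiting on an event created this way for an idle queue returns immediately, since no backend object ever needs to be polled.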
2025-08-06 | [AMDGPU][Offload] Enable memory manager use for up to ~3GB allocation size in omp_target_alloc (#151882) | hidekisaito | 1 | -0/+3
Enables AMD data center class GPUs to use memory manager memory pooling up to 3GB allocation by default, up from the "1 << 13" threshold that all plugin-nextgen devices use.
2025-08-06 | [OFFLOAD][OPENMP] 6.0 compatible interop interface (#143491) | Alex Duran | 1 | -33/+90
The following patch introduces a new interop interface implementation with the following characteristics:
* It supports the new 6.0 prefer_type specification.
* It supports both explicit objects (from interop constructs) and implicit objects (from variant calls).
* It implements a per-thread reuse mechanism for implicit objects to reduce overheads.
* It provides a plugin interface that allows selecting the supported interop types, and managing all the backend related interop operations (init, sync, ...).
* It enables cooperation with the OpenMP runtime to allow progress on OpenMP synchronizations.
* It cleans up some vendor/fr_id mismatches from the current query routines.
* It supports extension to define interop callbacks for library cleanup.
2025-07-25 | [Offload] Erase entries from JIT cache when program is destroyed (#148847) | Ross Brunton | 1 | -3/+9
When `unloadBinary` is called, any entries in the JITEngine's cache for that binary will be cleared. This fixes a nasty issue with liboffload program handles. If two handles happen to have had the same address (after one was free'd, for example), the cache would be hit and return the wrong program.
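The stale-entry problem described above can be reproduced with a small sketch, assuming a cache keyed on the handle's address. The class and method names are hypothetical and the "compilation" is a placeholder string; only the keying scheme and the erase-on-unload step are the point.

```cpp
#include <string>
#include <unordered_map>

// Hypothetical cache keyed on a program handle's address.
struct JITCacheSketch {
  std::unordered_map<const void *, std::string> Compiled;

  // Returns the cached result for Handle, compiling Source on a miss.
  const std::string &getOrCompile(const void *Handle,
                                  const std::string &Source) {
    auto It = Compiled.find(Handle);
    if (It == Compiled.end())
      It = Compiled.emplace(Handle, "compiled:" + Source).first;
    return It->second;
  }

  // Drop the entry so that a later handle reusing the same address
  // does not hit a stale result.
  void unload(const void *Handle) { Compiled.erase(Handle); }
};
```

If a handle is freed and a new one lands at the same address, `getOrCompile` would keep returning the old program unless `unload` was called in between.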
2025-07-18 | [Offload] Allow "tagging" device info entries with offload keys (#147317) | Ross Brunton | 1 | -3/+25
When generating the device info tree, nodes can be marked with an offload Device Info value. The nodes can also look up children based on this value.
2025-07-10 | [Offload] Allow querying the size of globals (#147698) | Ross Brunton | 1 | -6/+13
The `GlobalTy` helper has been extended to make both the Size and Ptr be optional. Now `getGlobalMetadataFromDevice`/`Image` is able to write the size of the global to the struct, instead of just verifying it.
2025-07-08 | [Offload] Provide proper memory management for Images on host device (#146066) | Ross Brunton | 1 | -0/+2
The `unloadBinaryImpl` method on the host plugin is now implemented properly (rather than just being a stub). When an image is unloaded, it is deallocated and the library associated with it is closed.
2025-07-02 | [Offload] Store kernel name in GenericKernelTy (#142799) | Ross Brunton | 1 | -2/+2
GenericKernelTy has a pointer to the name that was used to create it. However, the name passed in as an argument may not outlive the kernel. Instead, GenericKernelTy now contains a std::string, and copies the name into there.
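A minimal sketch of the ownership change, using a hypothetical `KernelSketch` class rather than the real `GenericKernelTy`: storing a `std::string` copies the characters at construction time, so the kernel's name stays valid even after the caller's buffer is gone.

```cpp
#include <string>

// Hypothetical stand-in for a kernel object that owns its name.
class KernelSketch {
public:
  // Copies the characters into the member string; the caller's buffer
  // may be destroyed afterwards.
  explicit KernelSketch(const char *Name) : Name(Name) {}

  const char *getName() const { return Name.c_str(); }

private:
  std::string Name; // owned copy, not a borrowed pointer
};

// The argument points into a local string that dies when this function
// returns. With a stored `const char *` the kernel would dangle; with the
// owned copy it does not.
inline KernelSketch makeKernelFromTemporary() {
  std::string Temp = "my_kernel";
  return KernelSketch(Temp.c_str());
}
```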
2025-06-25 | [Offload] Add an `unloadBinary` interface to PluginInterface (#143873) | Ross Brunton | 1 | -4/+8
This allows removal of a specific Image from a Device, rather than requiring all image data to outlive the device they were created for. This is required for `ol_program_handle_t`s, which now specify the lifetime of the buffer used to create the program.
2025-06-20 | [Offload] Add type information to device info nodes (#144535) | Ross Brunton | 1 | -12/+29
Rather than being "stringly typed", store values as a std::variant that can hold various types. This means that liboffload doesn't have to do any string parsing for integer/bool device info keys.
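To make the idea concrete, here is a small sketch of a typed info entry. The set of alternatives, the struct, and the helper are all assumptions for illustration; the log does not list which types the real variant holds.

```cpp
#include <cstdint>
#include <string>
#include <variant>

// Hypothetical value type: the alternatives are chosen for illustration.
using InfoValue =
    std::variant<std::monostate, std::uint64_t, bool, std::string>;

// Hypothetical info entry pairing a key with a typed value.
struct InfoEntrySketch {
  std::string Key;
  InfoValue Value;
};

// Reads an integer entry directly from the variant, with no string parsing.
// Returns Default when the entry holds some other type.
inline std::uint64_t asInteger(const InfoEntrySketch &E,
                               std::uint64_t Default) {
  if (const auto *V = std::get_if<std::uint64_t>(&E.Value))
    return *V;
  return Default;
}
```

A consumer asking for an integer key gets the stored value as-is, and a mismatch in type is detectable instead of surfacing as a parse failure.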
2025-06-13 | [Offload] Replace device info queue with a tree (#144050) | Ross Brunton | 1 | -58/+81
Previously, device info was returned as a queue with each element having a "Level" field indicating its nesting level. This replaces this queue with a more traditional tree-like structure. This should not result in a change to the output of `llvm-offload-device-info`.
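The relationship between the two representations can be sketched as follows. The struct names and the conversion function are hypothetical; the example only shows how a flat list with a per-entry nesting level maps onto an explicit tree, under the assumption that entries arrive in print order.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Old shape (illustrative): a flat list where each entry carries its depth.
struct FlatEntry {
  std::string Key;
  std::size_t Level;
};

// New shape (illustrative): nesting is explicit in the data structure.
struct TreeNode {
  std::string Key;
  std::vector<TreeNode> Children;
};

// Rebuilds the nesting from the flat list. Each entry becomes a child of
// the most recently added node one level above it.
inline TreeNode buildTree(const std::vector<FlatEntry> &Flat) {
  TreeNode Root{"root", {}};
  for (const FlatEntry &E : Flat) {
    TreeNode *Parent = &Root;
    for (std::size_t L = 0; L < E.Level; ++L) {
      if (Parent->Children.empty())
        break; // malformed input: level skipped; attach as deep as possible
      Parent = &Parent->Children.back();
    }
    Parent->Children.push_back(TreeNode{E.Key, {}});
  }
  return Root;
}
```

Printing the tree depth-first yields the same ordering as the flat list, which is consistent with the note that the tool's output should not change.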
2025-06-10 | [PGO][Offload] Fix offload coverage mapping (#143490) | Ethan Luis McDonough | 1 | -3/+1
This pull request fixes coverage mapping on GPU targets.
- It adds an address space cast to the coverage mapping generation pass.
- It reads the profiled function names from the ELF directly. Reading it from public globals was causing issues in cases where multiple device-code object files are linked together.
2025-06-03 | [Offload] Don't check in generated files (#141982) | Callum Fare | 2 | -1/+88
Previously we decided to check in files that we generate with tablegen. The justification at the time was that it helped reviewers unfamiliar with `offload-tblgen` see the actual changes to the headers in PRs. After trying it for a while, it's ended up causing some headaches and is also not how tablegen is used elsewhere in LLVM. This changes our use of tablegen to be more conventional. Where possible, files are still clang-formatted, but this is no longer a hard requirement. Because `OffloadErrcodes.inc` is shared with libomptarget it now gets generated in a more appropriate place.
2025-05-20 | [Offload] Use new error code handling mechanism and lower-case messages (#139275) | Ross Brunton | 1 | -10/+4
[Offload] Use new error code handling mechanism
This removes the old ErrorCode-less error method and requires every user to provide a concrete error code. All calls have been updated. In addition, for consistency with error messages elsewhere in LLVM, all messages have been made to start lower case.
2025-05-19 | [Offload] Add Error Codes to PluginInterface (#138258) | Ross Brunton | 1 | -2/+18
A new ErrorCode enumeration is present in PluginInterface which can be used when returning an llvm::Error from offload and PluginInterface functions. This enum must be kept in sync with liboffload's ol_errc_t enum, so both are automatically generated from liboffload's enum definition. Some error codes have also been shuffled around to allow for future work. Note that this patch only adds the machinery; actual error codes will be added in a future patch. ~~Depends on #137339, please ignore first commit of this MR.~~ This has been merged.
2025-05-13 | [Offload] Remove unused field IsBareKernel. (#139815) | Dhruva Chakrabarti | 1 | -3/+0
2025-04-23 | [Offload] Fix handling of 'bare' mode when environment missing (#136794) | Joseph Huber | 1 | -0/+6
Summary: We treated the missing kernel environment as a unique mode, but it was kind of this random bool that was doing the same thing and it explicitly expects the kernel environment to be zero. It broke after the previous change since it used to default to SPMD and didn't handle zero in any of the other cases despite being used. This fixes that and queries for it without needing to consume an error.
2025-03-19 | [PGO][Offload] Allow PGO flags to be used on GPU targets (#94268) | Ethan Luis McDonough | 1 | -2/+3
This pull request is the third part of an ongoing effort to extend PGO instrumentation to GPU device code and depends on https://github.com/llvm/llvm-project/pull/93365. This PR makes the following changes:
- Allows PGO flags to be supplied to GPU targets
- Pulls version global from device
- Modifies `__llvm_write_custom_profile` and `lprofWriteDataImpl` to allow the PGO version to be overridden
2025-02-11 | [PGO][Offload] Profile profraw generation for GPU instrumentation #76587 (#93365) | Ethan Luis McDonough | 1 | -2/+10
This pull request is the second part of an ongoing effort to extend PGO instrumentation to GPU device code and depends on #76587. This PR makes the following changes:
- Introduces `__llvm_write_custom_profile` to PGO compiler-rt library. This is an external function that can be used to write profiles with custom data to target-specific files.
- Adds `__llvm_write_custom_profile` as weak symbol to libomptarget so that it can write the collected data to a profraw file.
- Adds `PGODump` debug flag and only displays dump when the aforementioned flag is set
2025-02-11 | [Offload] Properly guard modifications to the RPC device array (#126790) | Joseph Huber | 1 | -3/+9
Summary: If the user deallocates an RPC device this can sometimes fail if the RPC server is still running. This will happen if the modification happens while the server is still checking it. This patch adds a mutex to guard modifications to it.
2025-02-06 | [Offload] Make only a single thread handle the RPC server thread (#126067) | Joseph Huber | 1 | -1/+1
Summary: This patch just changes the interface to make starting the thread multiple times permissible since it will only be done the first time. Note that this does not refcount it or anything, so it's up to the user to make sure that they don't shut down the thread before everyone is done using it. That is the case today because the shutDown portion is run by a single thread in the destructor phase. Another question is if we should make this thread truly global state, because currently it will be private to each plugin instance, so if you have an AMD and NVIDIA image there will be two, similarly if you have those inside of a shared library.
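One way to get the "start many times, launch once" behaviour is a flag guarded by a mutex, sketched below. This is an assumption about the general technique, not the code from the patch; the class name is hypothetical and a counter stands in for spawning the real thread. As in the commit message, there is no reference counting, so shutdown remains the caller's responsibility.

```cpp
#include <mutex>

// Hypothetical wrapper around a server thread that tolerates repeated starts.
class ServerThreadSketch {
public:
  // Returns true only for the call that actually performs the start;
  // every later call is a harmless no-op.
  bool startThread() {
    std::lock_guard<std::mutex> Lock(Mutex);
    if (Running)
      return false;
    Running = true;
    ++Launches; // placeholder for spawning the real thread
    return true;
  }

  int launches() const { return Launches; }

private:
  std::mutex Mutex;
  bool Running = false;
  int Launches = 0;
};
```

Because the flag lives in the object, each instance starts its own thread, which mirrors the per-plugin-instance behaviour the message raises as an open question.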
2025-02-02 | [offload] `gnu::format` with variadic template functions is Clang-only (#124406) | Michał Górny | 1 | -6/+12
Use `gnu::format` attribute only when compiling with Clang, as using it against variadic template functions is a Clang extension and is not supported by GCC. See https://gcc.gnu.org/bugzilla/show_bug.cgi?id=77958 Fixes #119069