path: root/offload/plugins-nextgen
Age | Commit message | Author | Files | Lines
2026-01-12 | [NFC][Offload] Rename a function (#175673) | Hansang Bae | 1 | -6/+6
Renamed a function as suggested in #175664.
2026-01-12 | [Offload] Fix level_zero plugin build (#175664) | Hansang Bae | 1 | -2/+0
Build has been broken when OMPTARGET_DEBUG is undefined.
2026-01-12 | [Offload] Update debug message printing in the plugins (#175205) | Hansang Bae | 15 | -247/+270
* Prepare a set of debug types in llvm::offload::debug to be used in plugin code
* Update debug messages in the plugins
2026-01-12 | [OFFLOAD] Add memory data locking API for libomptarget migration (#173138) | fineg74 | 1 | -2/+3
Add liboffload memory data locking API for libomptarget migration. This PR adds the liboffload memory data locking API that is needed to make libomptarget use liboffload.
2026-01-12 | [OFFLOAD][OpenMP] Remove old style REPORT support (#175607) | Alex Duran | 2 | -2/+2
Fix the few remaining usages and remove the support for the old REPORT macro.
2026-01-09 | [OpenMP] Remove testing LTO variant on CPU targets (#175187) | Joseph Huber | 1 | -7/+7
Summary: This is only really meaningful for the NVPTX target. Not all build environments support host LTO and these tests are redundant, so just clean this up and make the tests run faster.
2026-01-08 | [OFFLOAD] Make L0 provide more information about device to be consistent with other plugins (#172946) | fineg74 | 2 | -1/+26
Update the device information provided by the Level Zero plugin to be more consistent with other plugins.
2025-12-18 | [OFFLOAD][L0] Expose native ELF to upper layers (#172819) | Alex Duran | 5 | -94/+91
This PR refactors how the device image is built so we can expose the native ELF of the device to DeviceImageTy, which solves several issues regarding symbol lookup (as DeviceImageTy expects an ELF). It also simplifies the module linking code, taking into account the latest changes in the driver (which adds `-library-compilation` when necessary).
---------
Co-authored-by: Alexey Sachkov <alexey.sachkov@intel.com>
Co-authored-by: Nick Sarnie <nick.sarnie@intel.com>
Co-authored-by: Joseph Huber <huberjn@outlook.com>
2025-12-18 | [OFFLOAD][L0] Improve symbol device lookup (#172820) | Alex Duran | 3 | -12/+11
When looking up the device address of a symbol that is not found as a global symbol on the device, we also need to check whether it is a function symbol.
---------
Co-authored-by: Alexey Sachkov <alexey.sachkov@intel.com>
Co-authored-by: Nick Sarnie <nick.sarnie@intel.com>
Co-authored-by: Joseph Huber <huberjn@outlook.com>
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
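The lookup order described above can be sketched as follows. This is a minimal illustration, not the L0 plugin's code: `DeviceSymbolTables` and `lookupSymbolAddress` are hypothetical names, and plain `std::map` tables stand in for the driver queries a real plugin would make.

```cpp
#include <cstdint>
#include <map>
#include <optional>
#include <string>

// Hypothetical per-device symbol tables; a real plugin would query the
// device driver instead of consulting std::map containers.
struct DeviceSymbolTables {
  std::map<std::string, std::uintptr_t> Globals;
  std::map<std::string, std::uintptr_t> Functions;
};

// Look the name up as a global symbol first; if that fails, fall back to
// the function symbols. Returns std::nullopt if neither table has it.
inline std::optional<std::uintptr_t>
lookupSymbolAddress(const DeviceSymbolTables &Tables, const std::string &Name) {
  if (auto It = Tables.Globals.find(Name); It != Tables.Globals.end())
    return It->second;
  if (auto It = Tables.Functions.find(Name); It != Tables.Functions.end())
    return It->second;
  return std::nullopt;
}
```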
2025-12-18 | [OFFLOAD][L0] Fix usages of getDebugLevel in L0 plugin (#172815) | Alex Duran | 2 | -50/+60
Support for getDebugLevel was removed as part of the new debug macros (#165416). This PR updates such usages to use the new ODBG_* macros.
---------
Co-authored-by: Alexey Sachkov <alexey.sachkov@intel.com>
Co-authored-by: Nick Sarnie <nick.sarnie@intel.com>
Co-authored-by: Joseph Huber <huberjn@outlook.com>
2025-12-18 | [OFFLOAD] Add plugin with support for Intel oneAPI Level Zero (#158900) | Alex Duran | 23 | -0/+5840
Add a new nextgen plugin that supports GPU devices through the Intel oneAPI Level Zero library. The plugin is not enabled by default and needs to be added to LIBOMPTARGET_PLUGINS_TO_BUILD explicitly.
---------
Co-authored-by: Alexey Sachkov <alexey.sachkov@intel.com>
Co-authored-by: Nick Sarnie <nick.sarnie@intel.com>
Co-authored-by: Joseph Huber <huberjn@outlook.com>
2025-12-17 | [Offload] Debug message update part 3 (#171684) | Hansang Bae | 5 | -127/+140
Update debug messages based on the new method from #170425. Updated the following files:
- plugins-nextgen/common/include/MemoryManager.h
- plugins-nextgen/common/include/PluginInterface.h
- plugins-nextgen/common/src/GlobalHandler.cpp
- plugins-nextgen/common/src/PluginInterface.cpp
- plugins-nextgen/host/dynamic_ffi/ffi.cpp
2025-12-14 | [offload] Fix CUDA args size by subtracting tail padding (#172249) | Kevin Sala Penades | 3 | -2/+33
This commit makes the cuLaunchKernel call pass the total arguments size without tail padding.
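To make the tail-padding issue concrete, the sketch below compares `sizeof` of an argument struct with the offset just past its last member. `ExampleArgs` and `argsSizeWithoutTailPadding` are hypothetical; the plugin's actual size computation is not shown here and may differ.

```cpp
#include <cstddef>
#include <cstdint>

// Hypothetical kernel argument layout: a pointer followed by a 4-byte int.
// The struct is padded at the end to a multiple of its alignment, so
// sizeof(ExampleArgs) can exceed the bytes actually occupied by members.
struct ExampleArgs {
  void *Ptr;
  std::int32_t N;
};

// Size of the arguments without tail padding: the end of the last member.
constexpr std::size_t argsSizeWithoutTailPadding() {
  return offsetof(ExampleArgs, N) + sizeof(ExampleArgs::N);
}
```

On a typical 64-bit ABI, `sizeof(ExampleArgs)` is 16 while the helper returns 12; the 4 trailing bytes exist only to round the struct up to its 8-byte alignment.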
2025-11-20 | [OFFLOAD] Add support for more fine grained debug messages control (#165416) | Alex Duran | 1 | -0/+2
This PR introduces new debug macros that allow finer control over which debug messages are output, and introduces a C++ stream style for debug messages. Changing existing messages (except a few that I changed for testing) will come in subsequent PRs. I also think that we should make debug enabling OpenMP agnostic but, for now, I prioritized maintaining the current libomptarget behavior, and we might need more changes further down the line as we decouple libomptarget.
2025-11-19 | [Offload] Make the RPC thread sleep briefly when idle (#168596) | Joseph Huber | 1 | -3/+18
Summary: We start this thread if the RPC client symbol is detected in the loaded binary. We should make this thread sleep if there's no work, to avoid it running at high priority until the (scarcely used) RPC call is actually required. So, right now, after 25 microseconds without work we assume the server is inactive and begin sleeping. This resets once we do find work. AMD supports a more intelligent way to do this: HSA signals can wake a sleeping thread from the kernel, and signals can be sent from the GPU side. This would be nice to have, and I'm planning on working on it in the future to make this infrastructure more usable with existing AMD workloads.
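The idle heuristic described above (spin while work arrives, start sleeping after 25 microseconds without work, reset on new work) can be sketched as a polling loop. The callbacks and the 100-microsecond sleep interval are assumptions for illustration; this is not the plugin's RPC server loop.

```cpp
#include <chrono>
#include <functional>
#include <thread>

// Idle-aware polling loop. `poll` returns true if it handled some work;
// `shouldStop` ends the loop. Returns the number of polls that found work.
inline unsigned runPollingLoop(
    const std::function<bool()> &poll, const std::function<bool()> &shouldStop,
    std::chrono::microseconds IdleThreshold = std::chrono::microseconds(25),
    std::chrono::microseconds SleepTime = std::chrono::microseconds(100)) {
  using Clock = std::chrono::steady_clock;
  auto LastWork = Clock::now();
  unsigned Handled = 0;
  while (!shouldStop()) {
    if (poll()) {
      ++Handled;
      LastWork = Clock::now(); // Work found: reset the idle timer.
      continue;
    }
    // No work for longer than the threshold: assume the server is idle.
    if (Clock::now() - LastWork > IdleThreshold)
      std::this_thread::sleep_for(SleepTime);
  }
  return Handled;
}
```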
2025-11-13 | [Offload] Add device info for shared memory (#167817) | Kevin Sala Penades | 3 | -4/+28
2025-11-06 | [OpenMP] Fix tests relying on the heap size variable | Joseph Huber | 3 | -7/+11
Summary: I made that an unimplemented error, but forgot that it was used for this environment variable.
2025-11-06 | [Offload] Remove handling for device memory pool (#163629) | Joseph Huber | 5 | -127/+9
Summary: This was a lot of code that was only used for upstream LLVM builds of AMDGPU offloading. We now have a generic and fast `malloc` in `libc`, so just use that. This simplifies the code; it can be added back if we start providing alternate forms, but I don't think there's a single use case that would justify it yet.
2025-11-04 | [Offload] Add device UID (#164391) | Robert Imschweiler | 7 | -4/+63
Introduced in OpenMP 6.0, the device UID shall be a unique identifier of a device on a given system (not necessarily a UUID). Since it is not guaranteed that the (U)UIDs defined by the device vendor libraries, such as HSA, do not overlap with those of other vendors, the device UIDs in offload are always combined with the offload plugin name. If the vendor library does not specify any device UID for a given device, we fall back to the offload-internal device ID. The device UID can be retrieved using the `llvm-offload-device-info` tool.
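A minimal sketch of the UID scheme described above, assuming a `<plugin>-<uid>` string format. The separator, `makeDeviceUID`, and its parameters are hypothetical; the actual format used by offload may differ.

```cpp
#include <cstdint>
#include <optional>
#include <string>

// Prefix the vendor-provided UID with the plugin name so that UIDs from
// different vendor libraries cannot collide; fall back to the
// offload-internal device ID when the vendor library provides no UID.
inline std::string makeDeviceUID(const std::string &PluginName,
                                 const std::optional<std::string> &VendorUID,
                                 std::int32_t InternalDeviceId) {
  std::string Suffix = (VendorUID && !VendorUID->empty())
                           ? *VendorUID
                           : std::to_string(InternalDeviceId);
  return PluginName + "-" + Suffix;
}
```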
2025-10-22 | [OpenMP] Adds omp_target_is_accessible routine (#138294) | Nicole Aschenbrenner | 3 | -0/+55
Adds omp_target_is_accessible routine. Refactors common code from omp_target_is_present to work for both routines.
---------
Co-authored-by: Shilei Tian <i@tianshilei.me>
2025-10-21 | [Offload] Use `amd_signal_async_handler` for host function calls (#154131) | Ross Brunton | 1 | -18/+32
2025-10-09 | [OFFLOAD] Remove unused init_device_info plugin interface (#162650) | Alex Duran | 5 | -64/+2
This was used for the old interop code. It's dead code after #143491
2025-10-06 | [Offload] Remove check on kernel argument sizes (#162121) | Joseph Huber | 1 | -5/+0
Summary: This check is unnecessarily restrictive and currently fires incorrectly for any size less than eight bytes. Just remove it; we do sanity checks elsewhere and at some point need to trust the ABI.
2025-10-02 | [OFFLOAD] Restore interop functionality (#161429) | Alex Duran | 2 | -0/+75
This implements two pieces to restore the interop functionality (that I broke) when the 6.0 interfaces were added:
* A set of wrappers that support the old interfaces on top of the new ones
* The same level of interop support for the CUDA and AMD plugins
2025-09-26 | [Offload] Use Error for allocating/deallocating in plugins (#160811) | Kevin Sala Penades | 6 | -126/+158
Co-authored-by: Joseph Huber <huberjn@outlook.com>
2025-09-24 | [Offload] Print Image location rather than casting it (#160309) | Ross Brunton | 1 | -2/+4
This squishes a warning where the runtime tries to bind a StringRef to a `%p`.
2025-09-23 | [Offload][NFC] Avoid temporary string copies in InfoTreeNode (#159372) | Alexey Sachkov | 1 | -3/+4
2025-09-22 | [Remarks] Restructure bitstream remarks to be fully standalone (#156715) | Tobias Stadler | 1 | -3/+4
Currently there are two serialization modes for bitstream Remarks: standalone and separate. The separate mode splits remark metadata (e.g. the string table) from actual remark data. The metadata is written into the object file by the AsmPrinter, while the remark data is stored in a separate remarks file. This means we can't use bitstream remarks with tools like opt that don't generate an object file. Also, it is confusing to post-process bitstream remarks files, because only the standalone files can be read by llvm-remarkutil. We always need to use dsymutil to convert the separate files to standalone files, which only works for MachO. It is not possible for clang/opt to directly emit bitstream remark files in standalone mode, because the string table can only be serialized after all remarks were emitted. Therefore, this change completely removes the separate serialization mode. Instead, the remark string table is now always written to the end of the remarks file. This requires us to tell the serializer when to finalize remark serialization. This automatically happens when the serializer goes out of scope. However, often the remark file goes out of scope before the serializer is destroyed. To diagnose this, I have added an assert to alert users that they need to explicitly call finalizeLLVMOptimizationRemarks. This change paves the way for further improvements to the remark infrastructure, including more tooling (e.g. #159784), size optimizations for bitstream remarks, and more. Pull Request: https://github.com/llvm/llvm-project/pull/156715
2025-09-20 | [Offload] Remove non-blocking allocation type (#159851) | Joseph Huber | 6 | -28/+3
Summary: This was originally added as a hack to work around CUDA's limitation on allocation. The `libc` implementation isn't even used for CUDA now, so this code is never hit. Even in this case, this code never truly worked. A true solution would be to use CUDA's virtual memory API instead to allocate 2MiB slabs independently from the normal memory management done in the stream.
2025-09-19 | [OpenMP][NFC] Clean up a bunch of warnings and clang-tidy messages (#159831) | Joseph Huber | 2 | -26/+29
Summary: I made the GPU flags accept more of the default LLVM warnings, which triggered some new cases. Clean those up and fix some other ones while I'm at it.
2025-09-19 | [Offload] Implement 'olIsValidBinary' in offload and clean up (#159658) | Joseph Huber | 2 | -18/+11
Summary: This exposes the 'isDeviceCompatible' routine for checking whether a binary *can* be loaded. This is useful if people don't want to consume errors everywhere when figuring out which image to load on which device. I don't know if this is a good name; I was thinking of something like `olIsCompatible`. Let me know what you think. Long term I'd like to be able to do something similar to what OpenMP does, where we conditionally initialize devices only if we need them. That support is going to be needed if we want this to be more generic.
2025-09-17 | [LLVM] Fix offload and update CUDA ABI for all SM values (#159354) | Joseph Huber | 1 | -1/+1
Summary: Turns out the new CUDA ABI now applies retroactively to all the other SMs if you upgrade to CUDA 13.0. This patch changes the scheme, keeping all the SM flags consistent but using an offset. Fixes: https://github.com/llvm/llvm-project/issues/159088
2025-09-16 | [offload] Fix build with debug libomptarget (#159144) | Nick Sarnie | 1 | -1/+1
Currently we get this error:
```
offload/plugins-nextgen/common/src/PluginInterface.cpp:859:63: error: member reference type 'StringRef' is not a pointer; did you mean to use '.'?
```
We pass the full image binary now, so we can't really print anything useful here. This seems to have been introduced in https://github.com/llvm/llvm-project/pull/158748.
---------
Signed-off-by: Sarnie, Nick <nick.sarnie@intel.com>
Co-authored-by: Joseph Huber <huberjn@outlook.com>
2025-09-16 | [Offload] Copy loaded images into managed storage (#158748) | Joseph Huber | 7 | -172/+80
Summary: Currently we have this `__tgt_device_image` indirection which just takes a reference to some pointers. This was all fine and good when the only usage of this was from a section of GPU code that came from an ELF constant section. However, we have expanded beyond that and now need to worry about managing lifetimes. We have code that references the image even after it was loaded internally. This patch changes the implementation to instead copy the memory buffer and manage it locally. This PR reworks the JIT and other image handling to directly manage its own memory. We now don't need to duplicate this behavior externally at the Offload API level. Also, we actually free these if the user unloads them. Upside: less likely to crash and burn. Downside: more latency when loading an image.
2025-09-08 | [OpenMP] Move `__omp_rtl_data_environment` handling to OpenMP (#157182) | Joseph Huber | 3 | -103/+3
Summary: This operation is done every time we load a binary; this behavior should be moved into OpenMP since it concerns an OpenMP-specific data struct. This is a little messy, because ideally we should only be using public APIs, but more can be extracted later.
2025-09-01 | [OpenMP][Offload] Mark `SPMD_NO_LOOP` as a valid exec mode (#155990) | Ross Brunton | 1 | -0/+1
This was added in #154105, but was not added to the plugin interface's list of valid modes.
2025-08-29 | [Offload] Add `OL_DEVICE_INFO_MAX_WORK_SIZE[_PER_DIMENSION]` (#155823) | Ross Brunton | 2 | -3/+10
This is the total number of work items that the device supports (the equivalent work group properties are for only a single work group).
2025-08-29 | [Offload] Improve `olDestroyQueue` logic (#153041) | Ross Brunton | 1 | -6/+9
Previously, `olDestroyQueue` would not actually destroy the queue, instead leaving it for the device to clean up when the device was destroyed. Now, the queue is either released immediately if it is complete or put into a list of "pending" queues if it is not. Whenever we create a new queue, we check this list to see if any are now completed. If there are any, we release their resources and use them instead of pulling from the pool. This prevents long-running programs that create and drop many queues without syncing them from leaking memory all over the place.
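The destroy/create policy described above can be sketched as follows. `Queue`, `QueueManager`, and the `IsComplete` callback are hypothetical stand-ins for the plugin's queue objects and the driver's completion query.

```cpp
#include <cstddef>
#include <functional>
#include <memory>
#include <vector>

// Hypothetical queue handle; `IsComplete` stands in for a driver query of
// whether all work submitted to the queue has finished.
struct Queue {
  std::function<bool()> IsComplete = [] { return true; };
};

class QueueManager {
public:
  // Recycle the queue now if it is idle; otherwise park it as pending
  // instead of leaving it for device teardown.
  void destroyQueue(std::unique_ptr<Queue> Q) {
    if (Q->IsComplete())
      Pool.push_back(std::move(Q));
    else
      Pending.push_back(std::move(Q));
  }

  // Prefer a pending queue that has since completed, then the pool, and
  // only then allocate a fresh queue.
  std::unique_ptr<Queue> createQueue() {
    for (auto It = Pending.begin(); It != Pending.end(); ++It) {
      if ((*It)->IsComplete()) {
        std::unique_ptr<Queue> Q = std::move(*It);
        Pending.erase(It);
        return Q;
      }
    }
    if (!Pool.empty()) {
      std::unique_ptr<Queue> Q = std::move(Pool.back());
      Pool.pop_back();
      return Q;
    }
    return std::make_unique<Queue>();
  }

  std::size_t numPending() const { return Pending.size(); }
  std::size_t numPooled() const { return Pool.size(); }

private:
  std::vector<std::unique_ptr<Queue>> Pool;
  std::vector<std::unique_ptr<Queue>> Pending;
};
```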
2025-08-28 | [Offload] Add PRODUCT_NAME device info (#155632) | Ross Brunton | 2 | -2/+4
On my system, this will be "Radeon RX 7900 GRE" rather than "gfx1100". For Nvidia, the product name and device name are identical.
2025-08-28 | [OpenMP][Offload] Add SPMD-No-Loop mode to OpenMP offload runtime (#154105) | Dominik Adamski | 2 | -1/+12
Kernels that are marked as SPMD-No-Loop should be launched with a sufficient number of teams and threads to cover the loop iteration space. No-Loop mode is described in the RFC: https://discourse.llvm.org/t/rfc-no-loop-mode-for-openmp-gpu-kernels/87517/
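Covering the iteration space means choosing launch dimensions so that teams times threads per team is at least the loop trip count. A minimal sketch of that arithmetic, assuming one iteration per thread and ignoring device limits on the team count; `teamsToCoverIterations` is a hypothetical helper.

```cpp
#include <cstdint>

// Number of teams needed so that NumTeams * ThreadsPerTeam >= TripCount.
// Requires ThreadsPerTeam > 0; this is a ceiling division.
constexpr std::uint64_t teamsToCoverIterations(std::uint64_t TripCount,
                                               std::uint64_t ThreadsPerTeam) {
  return (TripCount + ThreadsPerTeam - 1) / ThreadsPerTeam;
}
```

For example, 1000 iterations with 256 threads per team need 4 teams (1024 threads), while 1025 iterations need 5.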
2025-08-27 | [NFC][offload] Fix error message for cuFuncSetAttribute (#155655) | Kevin Sala Penades | 1 | -1/+1
2025-08-26 | [Offload] Full AMD support for olMemFill (#154958) | Ross Brunton | 1 | -25/+88
2025-08-22 | [Offload] Implement olMemFill (#154102) | Callum Fare | 7 | -0/+132
Implement olMemFill to support filling device memory with arbitrary length patterns. AMDGPU support will be added in a follow-up PR.
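The semantics of an arbitrary-length pattern fill can be captured by a host-side reference loop. `fillWithPattern` is hypothetical and is not the `olMemFill` signature; it assumes, and asserts, that the size is a multiple of the pattern length.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>

// Fill `Size` bytes at `Dst` with back-to-back copies of a `PatternSize`-byte
// pattern. Precondition: PatternSize > 0 and Size % PatternSize == 0.
inline void fillWithPattern(void *Dst, std::size_t Size, const void *Pattern,
                            std::size_t PatternSize) {
  assert(PatternSize > 0 && Size % PatternSize == 0);
  auto *Out = static_cast<unsigned char *>(Dst);
  for (std::size_t Offset = 0; Offset < Size; Offset += PatternSize)
    std::memcpy(Out + Offset, Pattern, PatternSize);
}
```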
2025-08-22 | [Offload] `OL_EVENT_INFO_IS_COMPLETE` (#153194) | Ross Brunton | 7 | -0/+63
A simple info query for events that returns whether the event is complete or not.
2025-08-19 | [Offload] Add olCalculateOptimalOccupancy (#142950) | Ross Brunton | 6 | -0/+39
This is equivalent to `cuOccupancyMaxPotentialBlockSize`. It is currently only implemented on CUDA; AMDGPU and Host return unsupported.
---------
Co-authored-by: Callum Fare <callum@codeplay.com>
2025-08-19 | [Offload] Define additional device info properties (#152533) | Rafal Bielski | 4 | -7/+42
Add the following properties in Offload device info:
* VENDOR_ID
* NUM_COMPUTE_UNITS
* [SINGLE|DOUBLE|HALF]_FP_CONFIG
* NATIVE_VECTOR_WIDTH_[CHAR|SHORT|INT|LONG|FLOAT|DOUBLE|HALF]
* MAX_CLOCK_FREQUENCY
* MEMORY_CLOCK_RATE
* ADDRESS_BITS
* MAX_MEM_ALLOC_SIZE
* GLOBAL_MEM_SIZE

Add a bitfield option to enumerators, allowing the values to be bit-shifted instead of incremented. Generate the per-type enums using `foreach` to reduce code duplication. Use macros in unit test definitions to reduce code duplication.
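The difference between incremented and bit-shifted enumerator values can be shown with two small enums. The names are hypothetical and do not reproduce the generated Offload enums; the point is only that bit-shifted values can be OR-ed into a single bitfield, which a `*_FP_CONFIG` style capability query needs.

```cpp
#include <cstdint>

// Incremented enumerators: suitable for mutually exclusive values.
enum class ExampleDeviceType : std::uint32_t { GPU = 0, CPU = 1, Accelerator = 2 };

// Bit-shifted enumerators: each value occupies its own bit, so several
// capabilities can be combined in one bitfield.
enum ExampleFPConfig : std::uint32_t {
  FPDenorm = 1u << 0,
  FPInfNan = 1u << 1,
  FPRoundToNearest = 1u << 2,
  FPFma = 1u << 3,
};

// True if the bit for `Flag` is set in `Config`.
constexpr bool hasFPCapability(std::uint32_t Config, ExampleFPConfig Flag) {
  return (Config & Flag) != 0;
}
```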
2025-08-15 | [Offload] Introduce dataFence plugin interface. (#153793) | Abhinav Gaba | 5 | -0/+41
The purpose of this fence is to ensure that any `dataSubmit`s inserted into a queue before a `dataFence` finish before any `dataSubmit`s inserted after it begin. This is a no-op for most queues, since they are in-order, and by design any operations inserted into them occur in order. But the interface is supposed to be functional for out-of-order queues. The addition of the interface means that any operations that rely on such ordering (like ATTACH map-type support in #149036) can invoke it without worrying about whether the underlying queue is in-order or out-of-order. Once a plugin supports out-of-order queues, the plugin can implement this function without requiring any change at the libomptarget level.
---------
Co-authored-by: Alex Duran <alejandro.duran@intel.com>
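The fence semantics described above can be modeled with a toy out-of-order queue in which each operation runs asynchronously. `ToyOutOfOrderQueue` is hypothetical, and its `fence()` blocks the host thread for simplicity, whereas a plugin would more likely enqueue a device-side barrier.

```cpp
#include <functional>
#include <future>
#include <vector>

// Toy out-of-order queue: each submitted operation runs on its own thread,
// so operations may complete in any order.
class ToyOutOfOrderQueue {
public:
  void submit(std::function<void()> Op) {
    InFlight.push_back(std::async(std::launch::async, std::move(Op)));
  }

  // Wait for everything submitted so far; operations submitted after the
  // fence therefore cannot start before those submitted before it.
  void fence() {
    for (auto &F : InFlight)
      F.wait();
    InFlight.clear();
  }

  ~ToyOutOfOrderQueue() { fence(); }

private:
  std::vector<std::future<void>> InFlight;
};
```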
2025-08-15 | [Offload] `olLaunchHostFunction` (#152482) | Ross Brunton | 5 | -0/+82
Add an `olLaunchHostFunction` method that allows enqueueing host work to the stream.
2025-08-13 | [Offload] Implement hasPendingWork on CUDA (#152728) | Callum Fare | 1 | -2/+12
Following on from #152304, implement the new query in the CUDA plugin.
2025-08-10 | [Offload] Fix return error with a condition (#152876) | Kevin Sala Penades | 1 | -3/+4
Adds a conditional to the error return so that it only returns if there was an error.