riscv-gnu-toolchain/llvm.git

Age	Commit message	Author	Files	Lines
9 days	[Offload] Add a function to register an RPC Server callback (#178774)	Joseph Huber	1	-0/+30
	Summary: We provide an RPC server to manage calls initiated by the device to run on the host. This is very useful for the built-in handling we have; however, there are cases where we would want to extend this functionality. Cases like Fortran or MPI would be useful, but we cannot put references to these in the core offloading runtime. This way, we can provide a library interface that registers custom handlers for whatever code people want.
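The commit's actual registration function is not shown in this log. As a rough illustration of the idea only, here is a hypothetical C++ sketch of a host-side registry that maps device-initiated RPC opcodes to user-supplied callbacks; `RPCHandler`, `RPCHandlerRegistry`, and the integer status convention are all invented for this example.

```cpp
#include <cassert>
#include <cstdint>
#include <functional>
#include <unordered_map>
#include <utility>

// Hypothetical callback type: receives the payload sent by the device and
// returns a status code. This is NOT the liboffload signature.
using RPCHandler = std::function<int(uint64_t Payload)>;

// Hypothetical registry mapping RPC opcodes to user-provided handlers, so
// extensions (e.g. Fortran or MPI support) can live outside the core runtime.
class RPCHandlerRegistry {
public:
  // Returns false if a handler for this opcode is already registered.
  bool registerHandler(uint32_t Opcode, RPCHandler Handler) {
    return Handlers.emplace(Opcode, std::move(Handler)).second;
  }

  // Dispatch a device request; returns -1 if no handler is registered.
  int dispatch(uint32_t Opcode, uint64_t Payload) const {
    auto It = Handlers.find(Opcode);
    if (It == Handlers.end())
      return -1;
    return It->second(Payload);
  }

private:
  std::unordered_map<uint32_t, RPCHandler> Handlers;
};
```

Refusing duplicate registrations is a design choice of this sketch; a real interface might instead replace or chain handlers.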
2026-01-20	[OFFLOAD] Add asynchronous queue query API for libomptarget migration (#172231)	fineg74	2	-1/+26
	This PR adds the liboffload asynchronous queue query API needed for libomptarget to use liboffload.
2026-01-12	[OFFLOAD] Add memory data locking API for libomptarget migration (#173138)	fineg74	2	-1/+86
	This PR adds the liboffload memory data locking API needed for libomptarget to use liboffload.
2025-12-21	[offload] Fix unittests when multiple devices are available (#173209)	Kevin Sala Penades	1	-6/+6
	This commit appends a device number to the device name used as the unittest parameter name. The number is between 0 and the number of available non-host devices. This allows multiple devices from the same vendor to be tested.
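A minimal sketch of the naming scheme, assuming a `<name>_<index>` format. The separator and the helper `makeParamNames` are hypothetical; the log does not show the real format.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical helper: build unique test-parameter names by appending each
// device's index (among the non-host devices) to its possibly repeated name.
std::vector<std::string> makeParamNames(const std::vector<std::string> &DeviceNames) {
  std::vector<std::string> Names;
  for (std::size_t I = 0; I < DeviceNames.size(); ++I)
    Names.push_back(DeviceNames[I] + "_" + std::to_string(I));
  return Names;
}
```

Two identical GPUs then produce distinct names instead of colliding.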
2025-12-21	[offload] Fix kernel launch unittest (#173203)	Kevin Sala Penades	1	-2/+2
	This commit fixes the error introduced in #172249.
2025-12-14	[offload] Fix CUDA args size by subtracting tail padding (#172249)	Kevin Sala Penades	3	-0/+19
	This commit makes the cuLaunchKernel call pass the total argument size without tail padding.
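To illustrate what "tail padding" means here, a host-side C++ sketch with a hypothetical argument struct: the size without tail padding is the end of the last member, which can be smaller than `sizeof`. This is not the plugin's code, and the layout is an assumption.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical kernel argument layout. The trailing char is followed by
// tail padding so that sizeof(Args) is a multiple of alignof(Args).
struct Args {
  uint64_t A;
  char B;
};

// Size of the arguments without tail padding: the offset just past the end
// of the last member.
std::size_t argsSizeWithoutTailPadding() {
  return offsetof(Args, B) + sizeof(Args::B);
}
```

On a typical 64-bit ABI, `sizeof(Args)` is 16 while the padded-free size is 9.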
2025-11-13	[Offload] Add device info for shared memory (#167817)	Kevin Sala Penades	2	-0/+7

2025-11-04	[Offload] Add device UID (#164391)	Robert Imschweiler	2	-0/+21
	Introduced in OpenMP 6.0, the device UID shall be a unique identifier of a device on a given system (not necessarily a UUID). Because the (U)UIDs defined by device vendor libraries such as HSA are not guaranteed to be distinct from those of other vendors, device UIDs in offload are always combined with the offload plugin name. If the vendor library does not specify a UID for a given device, we fall back to the offload-internal device ID. The device UID can be retrieved using the `llvm-offload-device-info` tool.
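A hypothetical sketch of the combination and fallback described above. The `<plugin>-<uid>` format and the function `makeDeviceUID` are assumptions, not the format liboffload actually emits.

```cpp
#include <cassert>
#include <optional>
#include <string>

// Hypothetical: prefix the vendor UID with the plugin name so UIDs from
// different vendors cannot collide; fall back to the internal device ID
// when the vendor library provides no UID.
std::string makeDeviceUID(const std::string &PluginName,
                          const std::optional<std::string> &VendorUID,
                          int InternalDeviceID) {
  std::string Suffix =
      VendorUID ? *VendorUID : std::to_string(InternalDeviceID);
  return PluginName + "-" + Suffix;
}
```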
2025-10-06	[Offload] Remove check on kernel argument sizes (#162121)	Joseph Huber	3	-0/+6
	Summary: This check is unnecessarily restrictive and currently fires incorrectly for any size less than eight bytes. Just remove it; we do sanity checks elsewhere, and at some point we need to trust the ABI.
2025-09-24	[Offload] Add olGetMemInfo with platform-less API (#159581)	Ross Brunton	3	-1/+196

2025-09-23	[Offload] Re-allocate overlapping memory (#159567)	Ross Brunton	1	-0/+20
	If olMemAlloc happens to return memory that overlaps an existing allocation (possibly one made by another device on another platform), that memory is now discarded and a new allocation is made. A new `AllocBases` vector, an ordered list of allocation start addresses, is now available.
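The overlap check that an ordered list of start addresses makes cheap can be sketched as follows. This is an illustrative C++ sketch, not the liboffload implementation: `AllocationTable` is hypothetical, it stores (base, size) pairs rather than bases alone, and it assumes the recorded allocations never overlap each other.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <iterator>
#include <utility>
#include <vector>

// Hypothetical table of allocations kept sorted by start address, so an
// overlap check is a binary search plus a look at the two neighbours.
class AllocationTable {
public:
  // Returns true if [Base, Base + Size) overlaps any recorded allocation.
  bool overlaps(uintptr_t Base, std::size_t Size) const {
    // First allocation starting at or after Base.
    auto It = std::lower_bound(Allocs.begin(), Allocs.end(),
                               std::make_pair(Base, std::size_t{0}));
    if (It != Allocs.end() && It->first < Base + Size)
      return true;
    // Closest allocation starting before Base: does it extend past Base?
    if (It != Allocs.begin()) {
      auto Prev = std::prev(It);
      if (Prev->first + Prev->second > Base)
        return true;
    }
    return false;
  }

  // Record an allocation, keeping the vector sorted by start address.
  void insert(uintptr_t Base, std::size_t Size) {
    auto Entry = std::make_pair(Base, Size);
    Allocs.insert(std::upper_bound(Allocs.begin(), Allocs.end(), Entry), Entry);
  }

private:
  std::vector<std::pair<uintptr_t, std::size_t>> Allocs; // sorted by base
};
```

An allocator following the commit's description would call `overlaps` on each fresh allocation and, on a hit, discard it and allocate again before calling `insert`.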
2025-09-19	[Offload] Implement 'olIsValidBinary' in offload and clean up (#159658)	Joseph Huber	2	-0/+50
	Summary: This exposes the 'isDeviceCompatible' routine for checking whether a binary can be loaded. This is useful if people don't want to consume errors everywhere when figuring out which image to load on which device. I don't know if this is a good name; I was thinking of something like `olIsCompatible`. Let me know what you think. Long term, I'd like to be able to do something similar to what OpenMP does, where we conditionally initialize devices only if we need them. That support will be needed if we want this to be more generic.
2025-09-16	[Offload] Make `ASSERT_ERROR` output more readable (#157653)	Ross Brunton	1	-2/+6

2025-09-09	[Offload] Skip most liboffload tests if no devices (#157417)	Ross Brunton	2	-1/+8
	If there are no devices available for testing on liboffload, the test will no longer throw an error when it fails to instantiate. The tests will be skipped, with a warning printed to stderr.
2025-08-29	[Offload] Add `OL_DEVICE_INFO_MAX_WORK_SIZE[_PER_DIMENSION]` (#155823)	Ross Brunton	2	-0/+23
	This is the total number of work items that the device supports (the equivalent work group properties apply only to a single work group).
2025-08-29	[Offload] Improve `olDestroyQueue` logic (#153041)	Ross Brunton	1	-0/+9
	Previously, `olDestroyQueue` would not actually destroy the queue, instead leaving it for the device to clean up when it was destroyed. Now, the queue is either released immediately if it is complete or put into a list of "pending" queues if it is not. Whenever we create a new queue, we check this list to see if any are now completed. If there are any we release their resources and use them instead of pulling from the pool. This prevents long running programs that create and drop many queues without syncing them from leaking memory all over the place.
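A simplified model of the pending-queue scheme described above. `Queue`, `QueuePool`, and the `Complete` flag are hypothetical stand-ins; a real plugin would query the device for completion rather than read a flag.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <utility>
#include <vector>

// Hypothetical queue; `Complete` stands in for "all submitted work has
// finished", which a real plugin would query from the device.
struct Queue {
  bool Complete = true;
};

class QueuePool {
public:
  // Destroying a queue: recycle it now if idle, otherwise park it.
  void destroy(std::unique_ptr<Queue> Q) {
    if (Q->Complete)
      Free.push_back(std::move(Q));
    else
      Pending.push_back(std::move(Q));
  }

  // Creating a queue: first reclaim pending queues that have since
  // completed, then reuse one if possible instead of making a new one.
  std::unique_ptr<Queue> create() {
    for (auto It = Pending.begin(); It != Pending.end();) {
      if ((*It)->Complete) {
        Free.push_back(std::move(*It));
        It = Pending.erase(It);
      } else {
        ++It;
      }
    }
    if (!Free.empty()) {
      auto Q = std::move(Free.back());
      Free.pop_back();
      return Q;
    }
    return std::make_unique<Queue>();
  }

  std::size_t numPending() const { return Pending.size(); }

private:
  std::vector<std::unique_ptr<Queue>> Pending;
  std::vector<std::unique_ptr<Queue>> Free;
};
```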
2025-08-28	[Offload] Add PRODUCT_NAME device info (#155632)	Ross Brunton	3	-2/+26
	On my system, this will be "Radeon RX 7900 GRE" rather than "gfx1100". For Nvidia, the product name and device name are identical.
2025-08-26	[Offload] Full AMD support for olMemFill (#154958)	Ross Brunton	2	-29/+122

2025-08-22	[Offload] Implement olMemFill (#154102)	Callum Fare	2	-0/+135
	Implement olMemFill to support filling device memory with arbitrary length patterns. AMDGPU support will be added in a follow-up PR.
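A host-side reference for the semantics of filling with an arbitrary-length pattern, for intuition only (`memFillReference` is hypothetical). How a size that is not a multiple of the pattern length is handled is an assumption here: the last copy is simply truncated.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstring>

// Hypothetical reference: repeat Pattern until Size bytes of Dst are written.
void memFillReference(void *Dst, std::size_t Size, const void *Pattern,
                      std::size_t PatternSize) {
  assert(PatternSize > 0 && "empty pattern would never advance");
  auto *Out = static_cast<unsigned char *>(Dst);
  for (std::size_t Offset = 0; Offset < Size; Offset += PatternSize) {
    // Copy a whole pattern, or only what fits at the end of the buffer
    // (truncation is an assumption of this sketch).
    std::size_t N = std::min(PatternSize, Size - Offset);
    std::memcpy(Out + Offset, Pattern, N);
  }
}
```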
2025-08-22	[Offload] `OL_EVENT_INFO_IS_COMPLETE` (#153194)	Ross Brunton	2	-1/+16
	A simple info query for events that returns whether the event is complete or not.
2025-08-19	[Offload] Add olCalculateOptimalOccupancy (#142950)	Ross Brunton	3	-0/+60
	This is equivalent to `cuOccupancyMaxPotentialBlockSize`. It is currently only implemented on CUDA; AMDGPU and Host return unsupported. Co-authored-by: Callum Fare <callum@codeplay.com>
2025-08-19	[Offload] Define additional device info properties (#152533)	Rafal Bielski	2	-40/+125
	Add the following properties in Offload device info: VENDOR_ID, NUM_COMPUTE_UNITS, [SINGLE|DOUBLE|HALF]_FP_CONFIG, NATIVE_VECTOR_WIDTH_[CHAR|SHORT|INT|LONG|FLOAT|DOUBLE|HALF], MAX_CLOCK_FREQUENCY, MEMORY_CLOCK_RATE, ADDRESS_BITS, MAX_MEM_ALLOC_SIZE, GLOBAL_MEM_SIZE. Add a bitfield option to enumerators, allowing the values to be bit-shifted instead of incremented. Generate the per-type enums using `foreach` to reduce code duplication. Use macros in unit test definitions to reduce code duplication.
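The difference between incremented and bit-shifted enumerators, shown in plain C++. The flag names are hypothetical and are not the generated liboffload enums.

```cpp
#include <cassert>
#include <cstdint>

// Incremented enumerators: 0, 1, 2, ... Only one value applies at a time.
enum class Incremented : uint32_t { A, B, C };

// Bitfield enumerators: 1 << 0, 1 << 1, ... so several capabilities can be
// OR-ed into one value (hypothetical floating-point config flags).
enum class FPConfig : uint32_t {
  Denorm = 1u << 0,
  InfNan = 1u << 1,
  RoundToNearest = 1u << 2,
};

// Combine two flags into a raw bitmask.
constexpr uint32_t operator|(FPConfig L, FPConfig R) {
  return static_cast<uint32_t>(L) | static_cast<uint32_t>(R);
}

// Test whether a bitmask contains a given flag.
constexpr bool hasFlag(uint32_t Bits, FPConfig F) {
  return (Bits & static_cast<uint32_t>(F)) != 0;
}
```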
2025-08-15	[Offload] `olLaunchHostFunction` (#152482)	Ross Brunton	2	-1/+109
	Add an `olLaunchHostFunction` method that allows enqueueing host work to the stream.
2025-08-08	[Offload] Make olLaunchKernel test thread safe (#149497)	Ross Brunton	2	-0/+41
	This sprinkles a few mutexes around the plugin interface so that the olLaunchKernel CTS test now passes when run on multiple threads. Part of this also involved changing the interface for device synchronise so that it can optionally not free the underlying queue (freeing it had introduced a race condition in liboffload).
2025-08-08	[Offload] OL_QUEUE_INFO_EMPTY (#152473)	Ross Brunton	2	-0/+12
	Add a queue query that (if possible) reports whether the queue is empty.
2025-08-04	[Offload] Rework `MAX_WORK_GROUP_SIZE` (#151926)	Ross Brunton	2	-1/+16
	`MAX_WORK_GROUP_SIZE` now represents the maximum total number of work items in a single work group, rather than the maximum per dimension. `MAX_WORK_GROUP_SIZE_PER_DIMENSION` has been added, which has the old behaviour.
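A sketch of how a per-dimension limit and a total limit interact when validating a work-group size. `Dims` and `isValidGroupSize` are hypothetical, and the limit values in the usage are illustrative only.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical 3D size with x, y, z components.
struct Dims {
  uint32_t X, Y, Z;
};

// A work-group size is valid only if every dimension is within its own
// per-dimension limit AND the product (total work items) is within the
// overall limit.
bool isValidGroupSize(Dims Group, Dims MaxPerDim, uint64_t MaxTotal) {
  if (Group.X > MaxPerDim.X || Group.Y > MaxPerDim.Y || Group.Z > MaxPerDim.Z)
    return false;
  // Widen before multiplying so large sizes cannot overflow 32 bits.
  uint64_t Total = uint64_t{Group.X} * Group.Y * Group.Z;
  return Total <= MaxTotal;
}
```

For example, with per-dimension limits of 1024 x 1024 x 64 and a total limit of 1024, a 1024 x 2 x 1 group passes every per-dimension check but still exceeds the total.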
2025-08-04	[Offload][UnitTests] Build device code as C++ (#151714)	Leandro Lacerda	12	-19/+28
	This commit refactors the `add_offload_test_device_code` CMake function to compile device code using the C++ compiler (`CMAKE_CXX_COMPILER`) instead of the C compiler. This change enables the use of C++ features, such as templates, within device-side test kernels. This will allow for more advanced and reusable kernel wrappers, reducing boilerplate code in the conformance test suite. As part of this change, all `.c` files for device code in `unittests/` have been renamed to `.cpp`, and kernel definitions are now wrapped in `extern "C"` to ensure C linkage and prevent name mangling. This change affects the `OffloadAPI` and `Conformance` test suites.
2025-07-24	[Offload] Fix olWaitEvents tests after change to events API (#150465)	Callum Fare	1	-3/+6
	Fix the olWaitEvents tests after #150217 broke them
2025-07-24	[Offload] Replace "EventOut" parameters with `olCreateEvent` (#150217)	Ross Brunton	7	-82/+76
	Rather than having every "enqueue"-type function take an output pointer specifically for an output event, just provide an `olCreateEvent` entrypoint which pushes an event to the queue. For example, replace `olMemcpy(Queue, ..., EventOut);` with `olMemcpy(Queue, ...); olCreateEvent(Queue, EventOut);`.
2025-07-23	[Offload] Add olWaitEvents (#150036)	Ross Brunton	4	-1/+163
	This function causes a queue to wait until all the provided events have completed before running any future scheduled work.
2025-07-23	[Offload] Rename olWaitEvent/Queue to olSyncEvent/Queue (#150023)	Ross Brunton	7	-36/+36
	This more closely matches the nomenclature used by CUDA, AMDGPU and the plugin interface.
2025-07-21	[Offload] Verify SyncCycle for events in AMDGPU (#149524)	Ross Brunton	1	-0/+17
	This check ensures that events after a synchronise (and thus after the queue is reset) are always considered complete. A test has been added as well.
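A simplified, hypothetical model of the sync-cycle check: each event records the queue's cycle counter when it is created, and any event from an earlier cycle is treated as complete. `Queue`, `Event`, and the `PendingWork` flag (standing in for a real completion signal) are invented for illustration.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical queue that counts how many times it has been fully
// synchronised (and therefore reset).
struct Queue {
  uint64_t SyncCycle = 0;
  bool PendingWork = false;

  void synchronize() {
    PendingWork = false;
    ++SyncCycle; // the queue is reset; earlier events are now stale
  }
};

// Hypothetical event remembering the cycle in which it was recorded.
struct Event {
  const Queue *Q;
  uint64_t CreatedInCycle;

  // An event from an earlier cycle is complete by definition, because the
  // queue was synchronised after it was recorded. Otherwise, fall back to
  // the (simplified) pending-work flag.
  bool isComplete() const {
    return Q->SyncCycle != CreatedInCycle || !Q->PendingWork;
  }
};
```

Without the cycle comparison, the last assertion in a scenario like "record event, synchronise, submit new work, query old event" would wrongly report the old event as incomplete.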
2025-07-18	[Offload] Implement event sync in amdgpu (#149300)	Ross Brunton	2	-6/+0

2025-07-16	[Offload] Cache symbols in program (#148209)	Ross Brunton	1	-0/+18
	When creating a new symbol, first check whether it already exists. If it does, return that pointer rather than building a new symbol structure.
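A minimal sketch of the caching behaviour, assuming symbols are keyed by name alone (the real cache may also distinguish kernels from globals). `Program` and `Symbol` here are hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <memory>
#include <string>

// Hypothetical symbol record.
struct Symbol {
  std::string Name;
};

class Program {
public:
  // Return the existing symbol if one with this name was already created;
  // otherwise build, cache, and return a new one.
  Symbol *getSymbol(const std::string &Name) {
    auto &Slot = Symbols[Name];
    if (!Slot)
      Slot = std::make_unique<Symbol>(Symbol{Name});
    return Slot.get();
  }

  std::size_t size() const { return Symbols.size(); }

private:
  std::map<std::string, std::unique_ptr<Symbol>> Symbols;
};
```

Owning the symbols through `unique_ptr` keeps the returned pointers stable while the map grows.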
2025-07-14	[Offload] Skip event tests on AMDGPU (#148632)	Kenneth Benzie (Benie)	2	-0/+17
	Add `OffloadDeviceTest::getPlatformBackend()` and use it to skip event tests which currently fail on AMDGPU due to `OL_ERRC_UNIMPLEMENTED: synchronize event not implemented`.
2025-07-11	[Offload] Add global variable address/size queries (#147972)	Ross Brunton	3	-0/+147
	Add two new symbol info types for getting the bounds of a global variable, as well as a number of tests for reading from and writing to it.
2025-07-11	[Offload] Add `olGetSymbolInfo[Size]` (#147962)	Ross Brunton	4	-3/+133
	This mirrors the similar functions for other handles. The only implemented info at the moment is the symbol's kind.
2025-07-11	[Offload] Replace `GetKernel` with `GetSymbol` with global support (#148221)	Ross Brunton	6	-42/+111
	`olGetKernel` has been replaced by `olGetSymbol` which accepts a `Kind` parameter. As well as loading information about kernels, it can now also load information about global variables.
2025-07-10	[Offload] Change `ol_kernel_handle_t` -> `ol_symbol_handle_t` (#147943)	Ross Brunton	3	-6/+6
	In the future, we want `ol_symbol_handle_t` to represent both kernels and global variables. The first step in this process is a rename and promotion to a "typed handle".
2025-07-09	[Offload] Implement olGetQueueInfo, olGetEventInfo (#142947)	Callum Fare	6	-2/+215
	Add info queries for queues and events. `olGetQueueInfo` only supports getting the associated device. We were already tracking this, so we can implement it for free. We will likely add other queries to it in the future (whether the queue is empty, what flags it was created with, etc.). `olGetEventInfo` only supports getting the associated queue. This is another thing we were already storing in the handle. We'll be able to add other queries in the future (the event type, status, etc.).
2025-07-09	[Offload] Tests for global memory and constructors (#147537)	Ross Brunton	5	-6/+147
	Adds two "launch kernel" tests for liboffload: one testing that global memory works and persists between different kernels, and one verifying that `[[gnu::constructor]]` works correctly. Since we now have tests that contain multiple kernels in the same binary, the test framework has been updated a bit.
2025-07-07	[Offload] Add liboffload unit tests for shared/local memory (#147040)	Ross Brunton	5	-17/+140

2025-07-02	[Offload] Add `MAX_WORK_GROUP_SIZE` device info query (#143718)	Ross Brunton	2	-0/+17
	This adds a new device info query for the maximum workgroup/block size for each dimension.
2025-06-30	[Offload] Implement `olShutDown` (#144055)	Ross Brunton	1	-0/+12
	`olShutDown` was not properly calling deinit on the platforms, resulting in random segfaults on AMD devices. As part of this, `olInit` and `olShutDown` now alloc and free the offload context rather than it being static. This allows `olShutDown` to be called within a destructor of a static object (like the tests do) without having to worry about destructor ordering.
2025-06-23	[Offload] Fix type mismatch warning in test (#143700)	Ross Brunton	1	-2/+2

2025-06-20	[Offload] Rework compiling device code for unit test suites (#144776)	Joseph Huber	1	-67/+2
	Summary: I'll probably want to use this as a more generic utility in the future. This patch reworks it to make it a top level function. I also tried to decouple this from the OpenMP utilities to make that easier in the future. Instead, I just use `-march=native` functionality which is the same thing. Needed a small hack to skip the linker stage for checking if that works. This should still create the same output as far as I'm aware.
2025-06-20	[Offload] Check for initialization (#144370)	Ross Brunton	3	-0/+28
	All entry points (except olInit) now check that offload has been initialized. If not, a new `OL_ERRC_UNINITIALIZED` error is returned.
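A hypothetical sketch of the guard pattern: every entry point other than initialization checks a flag and returns an "uninitialized" error. The function names and the `Err` enum are invented for illustration; only the behaviour comes from the commit.

```cpp
#include <cassert>

// Hypothetical error codes standing in for success and an
// "uninitialized" error.
enum class Err { Success, Uninitialized };

static bool Initialized = false;

Err sketchInit() {
  Initialized = true;
  return Err::Success;
}

Err sketchShutDown() {
  if (!Initialized)
    return Err::Uninitialized;
  Initialized = false;
  return Err::Success;
}

// Every other entry point begins with the same guard.
Err sketchEntryPoint() {
  if (!Initialized)
    return Err::Uninitialized;
  return Err::Success;
}
```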
2025-06-12	[Offload] Add `ol_dimensions_t` and convert ranges from size_t -> uint32_t (#143901)	Ross Brunton	1	-9/+4
	This is a three-element x, y, z `uint32_t` vector that can be used anywhere a 3D vector is required. This ensures that all vectors across liboffload are the same and don't require any resizing/reordering dances.
2025-06-06	[Offload] Allow setting null arguments in olLaunchKernel (#141958)	Ross Brunton	3	-4/+31

2025-06-02	[Offload] Split offload unittests into multiple files (#142418)	Ross Brunton	1	-24/+30
	Rather than a single `offload.unittests` file, this will produce `device.unittests`, `event.unittests`, etc. This should reduce time spent building tests and make it easier to manually run a subset of the tests. Note that `check-offload-unit` will still run all the tests.