riscv-gnu-toolchain/llvm.git

Age	Commit message	Author	Files	Lines
2025-06-30	[libomptarget] Add a test for OMP_TARGET_OFFLOAD=disabled (#146385)	Ye Luo	1	-0/+22
	closes https://github.com/llvm/llvm-project/issues/144786
2025-06-12	[Offload][PGO] Fix new GPU PGO tests (#143645)	Ethan Luis McDonough	2	-6/+6
	`pgo_atomic_teams.c` and `pgo_atomic_threads.c` are currently set to run on NVPTX even though the changes for that target have not been upstreamed yet. This patch also replaces instances of `llvm-profdata` with `%profdata` in those tests.
2025-05-20	[OpenMP][GPU][FIX] Enable generic barriers in single threaded contexts (#140786)	Johannes Doerfert	2	-0/+44
	The generic GPU barrier implementation checked whether it was the main thread in generic mode to identify single-threaded regions. This doesn't work because, inside a non-active (= sequential) parallel region, that thread becomes the main thread of a team and is not the main thread in generic mode. At least, that is how the APIs are implemented today. To identify single-threaded regions, we now check the team size explicitly. This exposed three other issues: the first is, for now, expected and not a bug; the second is a bug and has a FIXME in the single_threaded_for_barrier_hang_1.c file; and the third is also benign, as described at the end. The non-bug issue comes up if we ever initialize a thread state: afterwards, we will never run any region in parallel. This is a little conservative, but I guess thread states are really bad for performance anyway. The bug comes up if we optimize single_threaded_for_barrier_hang_1 and execute it in Generic-SPMD mode. For some reason we lose all the updates to b. This looks very much like a compiler bug, but could also be another logic issue in the runtime. It needs to be investigated. Issue number 3 comes up if we have nested parallels inside of a target region. The clang SPMD-check logic gets confused: it determines SPMD (which is fine) but picks an unreasonable thread count. This is all benign, I think, just weird:
```
#pragma omp target teams
#pragma omp parallel num_threads(64)
#pragma omp parallel num_threads(10)
{}
```
	This was launched with 10 threads, not 64.
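	A minimal C sketch of the old and new single-thread checks, assuming a simplified thread model. `struct thread_view` and both helpers are hypothetical and are not the DeviceRTL's actual code:

```c
#include <stdbool.h>

/* Hypothetical, simplified view of what a GPU thread knows about its context. */
struct thread_view {
    bool is_generic_main_thread; /* main thread of the target region in generic mode */
    int team_size;               /* threads in the innermost (possibly sequential) team */
};

/* Old check: only the generic-mode main thread counted as single-threaded. */
bool single_threaded_old(struct thread_view t) {
    return t.is_generic_main_thread;
}

/* New check, per the commit message: look at the team size explicitly. */
bool single_threaded_new(struct thread_view t) {
    return t.team_size == 1;
}
```

	For a thread inside a non-active nested parallel region (`{ .is_generic_main_thread = false, .team_size = 1 }`), the old check reports "not single-threaded", so a barrier would presumably wait for threads that never arrive; the team-size check reports single-threaded, so the barrier can return immediately.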
2025-05-14	[PGO][Offload] Update PGO GPU tests (#132262)	Ethan Luis McDonough	4	-17/+200

2025-05-12	[Flang][OpenMP] Initial defaultmap implementation (#135226)	agozillon	2	-0/+200
	This aims to implement most of the initial arguments for defaultmap aside from firstprivate and none, and some of the more recent OpenMP 6 additions which will come in subsequent updates (with the OpenMP 6 variants needing parsing/semantic support first).
2025-05-09	[Flang][OpenMP] Generate correct present checks for implicit maps of optional allocatables (#138210)	agozillon	1	-0/+57
	Currently, we do not generate the appropriate checks for whether an optional allocatable argument is present before accessing relevant components of it. In particular, when creating bounds we must generate a presence check, and we must make sure we do not generate/keep a load outside the presence check, by utilising the raw address rather than the regular address of the info data structure. Similarly, optional allocatables must be treated like non-allocatable arguments: we generate an intermediate allocation that serves as a location in memory we can access later in the lowering without causing segfaults when we perform "mapping" on it, even if the end result is an empty allocatable (basically, we shouldn't explode if someone tries to map a non-present optional, similar to C++ when mapping null data).
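	A rough C model of the two rules above, under the assumption that an absent optional argument arrives as a null pointer. `struct alloc_desc`, `mappable_desc` and `bytes_to_map` are hypothetical names, not flang's descriptor layout or API:

```c
#include <stddef.h>

/* Hypothetical stand-in for a Fortran allocatable descriptor. */
struct alloc_desc {
    double *base;   /* NULL when unallocated */
    size_t extent;  /* number of elements */
};

/* Return a descriptor address that is always safe to hand to the mapping code.
   `optional_arg` is NULL when the optional argument is not present and is only
   dereferenced inside the presence check. `intermediate` plays the role of the
   intermediate allocation. */
const struct alloc_desc *mappable_desc(const struct alloc_desc *optional_arg,
                                       struct alloc_desc *intermediate) {
    intermediate->base = NULL;      /* default: an empty allocatable */
    intermediate->extent = 0;
    if (optional_arg != NULL)       /* presence check guards the load */
        *intermediate = *optional_arg;
    return intermediate;
}

/* Bytes to map: zero for an empty (or absent) allocatable. */
size_t bytes_to_map(const struct alloc_desc *d) {
    return d->base != NULL ? d->extent * sizeof *d->base : 0;
}
```

	Mapping code can then always operate on the returned address: an absent optional yields an empty descriptor and a size of 0 rather than a segfault.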
2025-04-23	[Offload] Fix handling of 'bare' mode when environment missing (#136794)	Joseph Huber	2	-2/+2
	Summary: We treated the missing kernel environment as a unique mode, but it was really just an ad-hoc bool that did the same thing, and it explicitly expects the kernel environment to be zero. It broke after the previous change because it used to default to SPMD, and none of the other cases handled zero even though it was used. This fixes that and queries for it without needing to consume an error.
2025-04-14	[Flang][OpenMP][MLIR] Check for presence of Box type before emitting store in MapInfoFinalization pass (#135477)	agozillon	1	-0/+36
	Currently we don't check for the presence of descriptor/BoxTypes before emitting stores, which lower to memcpys. The issue with this is that users can have optional arguments that they don't provide, making the argument effectively null. Such an argument can still be mapped, and this currently causes issues: for certain edge cases we emit a memcpy for function arguments to store to a local variable, and when we perform this memcpy on a null input, we segfault at runtime. The fix is simply to create a branch around the store that checks whether the data we're copying from is actually present. If it is, we proceed with the store; if it isn't, we skip it.
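	The shape of the fix, sketched in C under the assumption that an absent optional box argument is a null pointer. `struct box` and `store_box_if_present` are hypothetical stand-ins for the descriptor and the guarded store the pass emits:

```c
#include <stdbool.h>
#include <string.h>

/* Hypothetical box/descriptor type; flang's real descriptor layout differs. */
struct box {
    void *base_addr;
    long lower_bound;
    long extent;
};

/* Store an optional box argument into a local temporary. The branch around
   memcpy mirrors the fix: an absent optional (NULL) skips the store instead
   of copying from a null address. Returns whether the store happened. */
bool store_box_if_present(struct box *local, const struct box *optional_arg) {
    if (optional_arg == NULL)
        return false;
    memcpy(local, optional_arg, sizeof *local);
    return true;
}
```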
2025-04-09	[PGO][Offload] Use %profdata in PGO tests (#135015)	Joel E. Denny	2	-11/+11
	So that the wrong llvm-profdata is not picked up from PATH.
2025-04-07	[MLIR][OpenMP] Add codegen for teams reductions (#133310)	Jan Leyonberg	2	-0/+54
	This patch adds the lowering of teams reductions from the omp dialect to LLVM-IR. Some minor cleanup was done in clang to remove an unused parameter.
2025-03-28	[PGO][Offload] Disable PGO on NVPTX (#133522)	Ethan Luis McDonough	2	-2/+2

2025-03-28	[offload] Remove bad assert in StaticLoopChunker::Distribute (#132705)	macurtis-amd	1	-0/+26
	When building with asserts enabled, this can actually cause strange miscompilations because an incorrect llvm.assume is generated at the point of the assertion.
2025-03-19	[PGO][Offload] Allow PGO flags to be used on GPU targets (#94268)	Ethan Luis McDonough	3	-66/+186
	This pull request is the third part of an ongoing effort to extend PGO instrumentation to GPU device code and depends on https://github.com/llvm/llvm-project/pull/93365. This PR makes the following changes:
	- Allows PGO flags to be supplied to GPU targets
	- Pulls the version global from the device
	- Modifies `__llvm_write_custom_profile` and `lprofWriteDataImpl` to allow the PGO version to be overridden
2025-03-12	[flang][OpenMP] Map ByRef if size/alignment exceed that of a pointer (#130832)	Krzysztof Parzyszek	1	-0/+20
	Improve the check for whether a type can be passed by copy. Currently, passing by copy is done via the OMP_MAP_LITERAL mapping, which can only transfer as much data as can be contained in a pointer representation.
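	A C sketch of the by-copy test implied by the commit title, assuming the literal slot is exactly `sizeof(void *)` wide with pointer alignment; `can_pass_by_copy` is a hypothetical helper, not the flang function:

```c
#include <stdalign.h>
#include <stdbool.h>
#include <stddef.h>

/* A value can travel as an OMP_MAP_LITERAL only if it fits in a pointer-sized
   slot: check both its size and its alignment against void *. */
bool can_pass_by_copy(size_t size, size_t align) {
    return size <= sizeof(void *) && align <= alignof(void *);
}
```

	Anything failing either comparison is mapped ByRef instead, e.g. a struct of four doubles (32 bytes) on any common host.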
2025-03-10	[flang][OpenMP] Implement HAS_DEVICE_ADDR clause (#128568)	Krzysztof Parzyszek	3	-0/+132
	The HAS_DEVICE_ADDR clause indicates that the listed object(s) exist at an address that is a valid device address. Specifically, `has_device_addr(x)` means that (in C/C++ terms) `&x` is a device address. When entering a target region, `x` does not need to be allocated on the device, or have its contents copied over (in the absence of additional mapping clauses). Passing its address verbatim to the region for use is sufficient, and is the intended goal of the clause. Some Fortran objects use descriptors in their in-memory representation. If `x` had a descriptor, both the descriptor and the contents of `x` would be located in the device memory. However, the descriptors are managed by the compiler, and can be regenerated at various points as needed. The address of the effective descriptor may change, hence it's not safe to pass the address of the descriptor to the target region. Instead, the descriptor itself is always copied, but for objects like `x`, no further mapping takes place (as this keeps the storage pointer in the descriptor unchanged). --------- Co-authored-by: Sergio Afonso <safonsof@amd.com>
2025-03-07	[Flang][OpenMP][MLIR] Implement close, present and ompx_hold modifiers for Flang maps (#129586)	agozillon	5	-1/+192
	This PR adds an initial implementation of the map modifiers close, present and ompx_hold. This primarily required adding the appropriate map type flags to the map type bits; in the case of ompx_hold, it also required adding the map type to the OpenMP dialect. Close has a bit of a problem when utilised with the ALWAYS map type on descriptors, so we will likely have to make sure close and always are not applied to the descriptor simultaneously in the future, when we apply always to the descriptors to facilitate movement of descriptor information to the device for consistency. However, we may find an alternative to this with further investigation. For the moment, it is a TODO/Note to keep track of it.
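	A C sketch of how the modifiers become map-type bits. The flag values are assumed to mirror libomptarget's `tgt_map_type` enumeration in `omptarget.h` (check the header before relying on them), and `struct map_modifiers`, `map_type_bits` and `close_always_conflict` are hypothetical:

```c
#include <stdbool.h>
#include <stdint.h>

/* Assumed map-type bit values (modeled on libomptarget's tgt_map_type). */
enum {
    MAP_TO        = 0x001,
    MAP_FROM      = 0x002,
    MAP_ALWAYS    = 0x004,
    MAP_CLOSE     = 0x400,
    MAP_PRESENT   = 0x1000,
    MAP_OMPX_HOLD = 0x2000,
};

/* Hypothetical modifier set parsed from a map clause. */
struct map_modifiers {
    bool always, close, present, ompx_hold;
};

/* Supporting a modifier amounts to OR-ing its flag into the map-type bits. */
uint64_t map_type_bits(uint64_t base_type, struct map_modifiers m) {
    uint64_t bits = base_type;
    if (m.always)    bits |= MAP_ALWAYS;
    if (m.close)     bits |= MAP_CLOSE;
    if (m.present)   bits |= MAP_PRESENT;
    if (m.ompx_hold) bits |= MAP_OMPX_HOLD;
    return bits;
}

/* The combination flagged above as problematic when applied to descriptors. */
bool close_always_conflict(uint64_t bits) {
    return (bits & MAP_CLOSE) && (bits & MAP_ALWAYS);
}
```

	For example, `map(close, present, tofrom: x)` yields `0x1403` with these values.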
2025-02-18	[MLIR][OpenMP] Add LLVM translation support for OpenMP UserDefinedMappers (#124746)	Akash Banerjee	1	-0/+53
	This patch adds OpenMPToLLVMIRTranslation support for the OpenMP Declare Mapper directive. Since both MLIR and Clang now support custom mappers, I've changed the respective function params to no longer be optional as well. Depends on #121005
2025-02-14	Reapply "[LinkerWrapper] Clean up options after proper forwarding" (#126495)	Joseph Huber	1	-3/+3
	Summary: The test failed because it no longer passed Rpass by default without LTO. I think that's desirable as it matches the standard behavior. This reverts commit 6fd99de31864a5ef84ae8613b3a9034e05293461.
2025-02-12	[PGO][Offload] Fix pgo1.c (#126864)	Ethan Luis McDonough	1	-43/+30
	pgo1.c had outdated test checks
2025-02-11	[PGO][Offload] Profile profraw generation for GPU instrumentation #76587 (#93365)	Ethan Luis McDonough	1	-5/+10
	This pull request is the second part of an ongoing effort to extend PGO instrumentation to GPU device code and depends on #76587. This PR makes the following changes:
	- Introduces `__llvm_write_custom_profile` to the PGO compiler-rt library. This is an external function that can be used to write profiles with custom data to target-specific files.
	- Adds `__llvm_write_custom_profile` as a weak symbol to libomptarget so that it can write the collected data to a profraw file.
	- Adds a `PGODump` debug flag and only displays the dump when that flag is set.
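	The weak-symbol arrangement can be sketched in C as below. `custom_profile_writer` and its signature are hypothetical stand-ins for `__llvm_write_custom_profile`, whose real signature is not given here, and the sketch assumes an ELF toolchain such as GCC or Clang on Linux, where an undefined weak symbol resolves to a null address:

```c
#include <stddef.h>

/* Hypothetical writer that some other library may or may not provide.
   Declaring it weak lets this file link even when nothing defines it. */
extern int custom_profile_writer(const char *path, const void *data, size_t size)
    __attribute__((weak));

/* Write profile data if the writer is linked in; -1 means it is absent. */
int try_write_profile(const char *path, const void *data, size_t size) {
    if (custom_profile_writer == NULL)
        return -1;
    return custom_profile_writer(path, data, size);
}
```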
2025-01-31	[Offload][NFC] Fix typos discovered by codespell (#125119)	Christian Clauss	9	-10/+10
	https://github.com/codespell-project/codespell % `codespell --ignore-words-list=archtype,hsa,identty,inout,iself,nd,te,ths,vertexes --write-changes`
2025-01-30	[Flang][MLIR][OpenMP] Fix Target Data if (present(...)) causing LLVM-IR branching error (#123771)	agozillon	1	-0/+29
	Currently, if we generate code for the target data map below, which uses an optional mapping:
	    !$omp target data if(present(a)) map(alloc:a)
	    do i = 1, 10
	      a(i) = i
	    end do
	    !$omp end target data
	we yield an LLVM-IR error because the branch for the else path is not generated. This occurs because we enter the NoDupPriv path of the callback function when generating the else branch; however, the emitBranch function needs the builder to be set to a block in order to generate and link in a follow-up branch. The NoDupPriv path currently doesn't do this. While it's not supposed to generate anything (as far as I am aware), we still need to at least restore the builder's insertion point so that it emits the appropriate follow-up branch. This avoids the missing-terminator LLVM-IR verification error by correctly generating the follow-up branch.
2025-01-29	[MLIR][OpenMP] Emit nullary check for mapped pointer members and appropriate size select based on results (#124604)	agozillon	1	-0/+24
	This PR aims to fix a mapping error when trying to map nullary elements of a record type (the primary example at the moment is allocatable/pointer types in Fortran). These should be legal to map, just not to write to within the target region without pointing to anything. A common Fortran OpenMP idiom/example where this is useful can be found in the added Fortran offload example. The runtime error arises when we try to map the pointer member using a prescribed constant size that we receive from the lowered type, resulting in mapping of data that does not exist when nothing has been allocated. The fix in this case is to emit a runtime check to see whether the data has been allocated: if it hasn't, we select a size of 0; if it has, we emit the usual type size.
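	The size selection described above, sketched in C; `struct record` and `member_map_size` are hypothetical, and the real fix emits the equivalent check as IR rather than calling a helper:

```c
#include <stddef.h>

/* Hypothetical record with a pointer member, like a Fortran derived type with
   an allocatable or pointer component. */
struct record {
    double *member;  /* NULL when unallocated / disassociated */
    size_t count;    /* elements the lowered type says the member spans */
};

/* Runtime size selection for mapping the member: 0 when nothing is allocated,
   otherwise the usual size, so no non-existent data is transferred. */
size_t member_map_size(const struct record *r) {
    return r->member != NULL ? r->count * sizeof *r->member : 0;
}
```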
2025-01-28	[Offload] Rework offloading entry type to be more generic (#124018)	Joseph Huber	1	-8/+14
	Summary: The previous offloading entry type did not fit the current use-cases very well. This widens it and adds a version to prevent further annoyances. It also includes the kind to better sort out who is using it. The first 64 bits are reserved as zero so the OpenMP runtime can detect the old format for binary compatibility.
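	A C sketch of how a leading zero field lets old and new entries be told apart. The struct layout below is hypothetical, loosely following the description (a zeroed reserved word, then a version and a kind), and it assumes old-format entries began with a non-null symbol address:

```c
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical new-style entry; not LLVM's actual struct definition. */
struct new_entry {
    uint64_t reserved;  /* always zero in the new format */
    uint16_t version;
    uint16_t kind;
    uint32_t flags;
    void *address;
    const char *symbol_name;
    uint64_t size;
};

/* Assumed old layout started with a (non-null) address, so a zero first
   64-bit word identifies the new layout. */
bool is_new_format(const void *entry) {
    uint64_t first;
    memcpy(&first, entry, sizeof first);
    return first == 0;
}
```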
2025-01-14	[Offload][PGO] Fix dump of array in ProfData (#122039)	Jinsong Ji	1	-6/+6
	Exposed by -Warray-bounds:
	    In file included from ../../../../../../../llvm/offload/plugins-nextgen/common/src/GlobalHandler.cpp:252:
	    ../../../../../../../llvm/llvm/include/llvm/ProfileData/InstrProfData.inc:109:1: error: array index 4 is past the end of the array (that has type 'const std::remove_const<const uint16_t>::type[4]' (aka 'const unsigned short[4]')) [-Werror,-Warray-bounds]
	      109 | INSTR_PROF_DATA(const uint16_t, Int16ArrayTy, NumValueSites[IPVK_Last+1], \
	          | ^ ~~~~~~~~~~~
	    ../../../../../../../llvm/offload/plugins-nextgen/common/src/GlobalHandler.cpp:250:15: note: expanded from macro 'INSTR_PROF_DATA'
	      250 | outs() << ProfData.Name << " "; \
	          | ^ ~~~~
	    ../../../../../../../llvm/llvm/include/llvm/ProfileData/InstrProfData.inc:109:1: note: array 'NumValueSites' declared here
	      109 | INSTR_PROF_DATA(const uint16_t, Int16ArrayTy, NumValueSites[IPVK_Last+1], \
	          | ^
	    ../../../../../../../llvm/offload/plugins-nextgen/common/include/GlobalHandler.h:62:3: note: expanded from macro 'INSTR_PROF_DATA'
	       62 | std::remove_const<Type>::type Name;
	Avoid accessing out-of-bounds data, but skip printing array data for now, as there is no simple way to do this without hardcoding the NumValueSites field. --------- Co-authored-by: Ethan Luis McDonough <ethanluismcdonough@gmail.com>
2025-01-03	[OpenMP][MLIR] Fix threadprivate lowering when compiling for target when target operations are in use (#119310)	agozillon	1	-0/+37
	Currently the compiler will ICE during the device lowering pass on programs like the following:
```
program main
  implicit none
  type i1_t
    integer :: val(1000)
  end type i1_t
  integer :: i
  type(i1_t), pointer :: newi1
  type(i1_t), pointer :: tab=>null()
  integer, dimension(:), pointer :: tabval
  !$omp THREADPRIVATE(tab)
  allocate(newi1)
  tab=>newi1
  tab%val(:)=1
  tabval=>tab%val
  !$omp target teams distribute parallel do
  do i = 1, 1000
    tabval(i) = i
  end do
  !$omp end target teams distribute parallel do
end program main
```
	This is because THREADPRIVATE returns a result, and this result can be used by other LLVM dialect (or other dialect) operations. However, we currently skip the lowering of threadprivate, so we never generate and bind an LLVM-IR result to the threadprivate operation's result. When we later lower dependent LLVM dialect operations, the required LLVM-IR result is missing, we try to access and use it, and the compiler ICEs. The fix in this PR is to allow compilation of threadprivate for the device as well as the host, and to treat the device compilation as a no-op that binds the LLVM-IR result of threadprivate without alteration. This allows the rest of the compilation to proceed, and we eventually discard the host segment in any case. The other possible solution I can think of is to do something similar to Flang's passes that run prior to CodeGen to the LLVM dialect, where they erase/no-op certain unrequired operations or transform them into lower-level series of operations; we would then erase/no-op threadprivate on the device, as we'd never have these in target regions. The main issue I can see with this is that we currently do not specialise this stage based on whether we're compiling for the device or the host, so it would set a precedent and add another point where the separation between target and host compilation has to be understood.
I am also not sure we'd necessarily want to enforce this at the dialect level in case someone else wishes to add a different lowering or translation flow. Another possible issue is that a target operation we have/utilise could depend on the result of threadprivate, meaning we'd not be allowed to entirely erase/no-op it, although I am not aware of any situations where this is currently an issue.
2025-01-03	[Flang][OpenMP] Fix allocating arrays with size intrinsic (#119226)	agozillon	1	-0/+39
	Attempt to prevent the following example from causing an assert or ICE:
```
subroutine test(a)
  use iso_fortran_env, only: real64
  implicit none
  integer :: i
  real(kind=real64), dimension(:) :: a
  real(kind=real64), dimension(size(a, 1)) :: b
  !$omp target map(tofrom: b)
  do i = 1, 10
    b(i) = i
  end do
  !$omp end target
end subroutine
```
	Here we utilise a Fortran intrinsic (size) to calculate the size of allocatable arrays and then map them to the device.
2024-12-18	Re-apply (#117867): [flang][OpenMP] Implicitly map allocatable record fields (#120374)	Kareem Ergawy	2	-0/+135
	This re-applies #117867 with a small fix that hopefully prevents build bot failures. The fix is avoiding `dyn_cast` for the result of `getOperation()`. Instead we can assign the result to `mlir::ModuleOp` directly since the type of the operation is known statically (`OpT` in `OperationPass`).
2024-12-18	Revert "[flang][OpenMP] Implicitly map allocatable record fields (#117867)" (#120360)	Kareem Ergawy	2	-135/+0
2024-12-18	[flang][OpenMP] Implicitly map allocatable record fields (#117867)	Kareem Ergawy	2	-0/+135
	This is a starting PR to implicitly map allocatable record fields. This PR contains the following changes:
	1. Re-purposes some of the utils used in `Lower/OpenMP.cpp` so that these utils work on the `mlir::Value` level rather than the `semantics::Symbol` level. This takes one step towards enabling MLIR passes to more easily do some lowering themselves (e.g. creating `omp.map.bounds` ops for implicitly captured data like this PR does).
	2. Adds support for implicitly capturing and mapping allocatable fields in record types.
	There is still quite some distance to cover to have full support for this. I added a number of todos to guide further development. Co-authored-by: Andrew Gozillon <andrew.gozillon@amd.com>
2024-12-06	[Offload][OMPX] Add the runtime support for multi-dim grid and block (#118042)	Shilei Tian	5	-11/+67

2024-11-25	[OpenMP] Remove use of '__AMDGCN_WAVEFRONT_SIZE' (#113156)	Joseph Huber	2	-12/+31
	Summary: This is going to be deprecated in https://github.com/llvm/llvm-project/pull/112849. This patch ports it to use the builtin instead. This isn't a compile-time constant, so it could slightly negatively affect codegen. There really should be an IR pass to turn it into a constant if the function has known attributes. Using the builtin is correct when, as here, we only need to know the size. Obviously, guarding w32/w64 code with this check would be broken.
2024-11-16	[OpenMP] Allocatable explicit member mapping fortran offloading tests (#113555)	agozillon	82	-193/+1433
	This PR is one in a series of 3 that aim to add support for explicit member mapping of allocatable components in derived types within OpenMP+Fortran for Flang. This PR provides all of the runtime tests that are currently upstreamable; unfortunately, some of the other tests would require linking the Fortran runtime for offload, which we currently do not do. Regardless, this is plenty to ensure that the mapping works in most cases.
2024-11-14	[Flang][OpenMP] Update MapInfoFinalization to use BlockArgs Interface and modify use_device_ptr/addr to be order independent (#113919)	agozillon	1	-0/+37
	This patch primarily updates the MapInfoFinalization pass to utilise the BlockArgument interface. It also moves arguments newly added by the MapInfoFinalization pass to the end of the BlockArg/relevant MapInfo lists, instead of placing them one prior to the owning descriptor type. During this work it was noted that the use_device_ptr/addr handling of target data was a little too order-dependent, so I've attempted to make it less so, as we cannot depend on the argument ordering being the same as Fortran's for any future frontends.
2024-09-06	Fix typo in test.	Akash Banerjee	1	-1/+1

2024-09-05	[OpenMP][Flang] Fix dynamic-extent array mapping (#107247)	Akash Banerjee	1	-0/+33
	This patch fixes the mapping and lowering of arrays with dynamic extents and adds a new test for the same. The fix discards the incomplete dynamic extent information and replaces it with just the base type. When lowering to LLVM later, the bounds information is used instead.
2024-09-03	[Offload] Change x86_64-pc-linux to x86_64-unknown-linux (#107023)	Jan Patrick Lehr	17	-34/+34
	It appears that the RUNTIMES build prefers the x86_64-unknown-linux-gnu triple notation for the host. This fixes runtime / test breakages when compiler-rt is used as the CLANG_DEFAULT_RTLIB.
2024-08-23	[Flang][OpenMP] Align map clause generation and fix issue with non-shared allocations for assumed shape/size descriptor types (#97855)	agozillon	1	-0/+75
	This PR aims to unify the map argument generation behavior across both the implicit capture (captured in a target region) and the explicit capture (process map). Currently, the varPtr field of the MapInfo for the same variable will differ depending on how it's captured; this PR tries to align that across the generations of MapInfoOp in the OpenMP lowering. I have opted to utilise the rawInput (input memref to a HLFIR DeclareInfoOp) as opposed to the addr field, which includes more information. The side effect of this is that we have to deal with BoxTypes less often, which will result in simpler maps in these cases. The negative side effect is that we don't have access to the bounds information through the resulting value; however, I believe the bounds information we require is still appropriately stored in the map bounds, and this seems to be the case from testing so far. The other fix is for cases where we end up with a BoxType argument into a function (certain assumed shape and size cases do this) that has no fir.ref wrapping it. As we need the Box to be a reference type to actually utilise the operation to access the base address stored inside and create the correct mappings, we currently generate an intermediate allocation in these cases, store into it, and utilise this as the map argument instead of the original. However, as we were not sharing the same intermediate allocation across all of the maps for a variable, this resulted in errors in certain cases when detaching/attaching the data, e.g. via enter and exit. This PR adjusts this for these cases. Currently we only track intermediate allocations for the current function scope, as opposed to the module, primarily because the only case I am aware of where this is required is when we pass certain types of arguments to functions (so I opted to minimize the overhead of the pass for now).
It could likely be extended to module scope if we find other cases where it's applicable and causing issues.
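	A C sketch of sharing one intermediate allocation per variable within a function scope. `struct alloc_cache` and its helpers are hypothetical; the actual pass keys on MLIR values rather than names, and a fixed-size table keeps the sketch short:

```c
#include <stddef.h>
#include <string.h>

#define MAX_VARS 16

/* Hypothetical per-function cache: one intermediate allocation per variable,
   so every map of that variable (e.g. on enter and exit) reuses it. */
struct alloc_cache {
    const char *names[MAX_VARS];
    int slots[MAX_VARS];   /* stand-in for the intermediate allocations */
    size_t count;
};

/* Reset when lowering moves to a new function (function-scope tracking). */
void cache_reset(struct alloc_cache *c) { c->count = 0; }

/* Return the existing allocation for `name`, or create one; NULL if full. */
int *cache_get_or_create(struct alloc_cache *c, const char *name) {
    for (size_t i = 0; i < c->count; ++i)
        if (strcmp(c->names[i], name) == 0)
            return &c->slots[i];
    if (c->count == MAX_VARS)
        return NULL;
    c->names[c->count] = name;
    c->slots[c->count] = 0;
    return &c->slots[c->count++];
}
```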
2024-08-22	[PGO][OpenMP] Instrumentation for GPU devices (Revision of #76587) (#102691)	Ethan Luis McDonough	1	-0/+74
	This pull request is a revised version of #76587. It fixes some build issues that were present in the previous version of this change.
	> This pull request is the first part of an ongoing effort to extend PGO instrumentation to GPU device code. This PR makes the following changes:
	>
	> - Adds blank registration functions to device RTL
	> - Gives PGO globals protected visibility when targeting a supported GPU
	> - Handles any addrspace casts for PGO calls
	> - Implements PGO global extraction in GPU plugins (currently only dumps info)
	>
	> These changes can be tested by supplying `-fprofile-instrument=clang` while targeting a GPU.
2024-08-13	[Offload] Ensure to load images when the device is used (#103002)	Johannes Doerfert	2	-0/+57
	When we use the device, e.g., with an API that interacts with it, we need to ensure the image is loaded and the constructors are executed. Two tests are included to verify we 1) load images and run constructors when needed, and 2) we do so lazily only if the device is actually used. --------- Co-authored-by: Joseph Huber <huberjn@outlook.com>
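	The lazy-loading behaviour can be sketched in C as follows; `struct device`, `ensure_image_loaded` and `device_api_call` are hypothetical and not libomptarget's types:

```c
#include <stdbool.h>

/* Hypothetical device record. */
struct device {
    bool image_loaded;
    int constructors_run;  /* how many times global constructors executed */
};

/* Called from every API entry point that uses the device: loads the image and
   runs constructors exactly once, and only when the device is actually used. */
void ensure_image_loaded(struct device *d) {
    if (d->image_loaded)
        return;
    /* ... load the device image here ... */
    d->constructors_run += 1;  /* run global constructors */
    d->image_loaded = true;
}

/* Example API entry point that interacts with the device. */
void device_api_call(struct device *d) {
    ensure_image_loaded(d);
    /* ... interact with the device ... */
}
```

	This sketch is single-threaded; a real runtime would need a lock or a once-flag around the load.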
2024-08-13	[LinkerWrapper] Always pass `-flto` if the linker supports it (#102972)	Joseph Huber	2	-6/+6
	Summary: Now that we use the linker to do LTO / device linking, we need to inform the `clang` invocation to use `-flto` so it forwards arguments like `-On` correctly.
2024-08-12	[Offload] Add the right paths to the CUDA lit tests (#102997)	Johannes Doerfert	4	-9/+15

2024-08-12	[Offload][CUDA] Allow CUDA kernels to use LLVM/Offload (#94549)	Johannes Doerfert	5	-0/+136
	Through the new `-foffload-via-llvm` flag, CUDA kernels can now be lowered to the LLVM/Offload API. On the Clang side, this is simply done by using the OpenMP offload toolchain and emitting calls to `llvm` functions to orchestrate the kernel launch rather than `cuda` functions. These `llvm` functions are implemented on top of the existing LLVM/Offload API. As we are about to redefine the Offload API, this will help us in the design process as a second offload language. We do not support any CUDA APIs yet; however, we could: https://www.osti.gov/servlets/purl/1892137 For proper host execution we need to resurrect/rebase https://tianshilei.me/wp-content/uploads/2021/12/llpp-2021.pdf (which was designed for debugging).
```
❯❯❯ cat test.cu
extern "C" {
void *llvm_omp_target_alloc_shared(size_t Size, int DeviceNum);
void llvm_omp_target_free_shared(void *DevicePtr, int DeviceNum);
}

__global__ void square(int *A) { *A = 42; }

int main(int argc, char **argv) {
  int DevNo = 0;
  int *Ptr = reinterpret_cast<int *>(llvm_omp_target_alloc_shared(4, DevNo));
  *Ptr = 7;
  printf("Ptr %p, *Ptr %i\n", Ptr, *Ptr);
  square<<<1, 1>>>(Ptr);
  printf("Ptr %p, *Ptr %i\n", Ptr, *Ptr);
  llvm_omp_target_free_shared(Ptr, DevNo);
}

❯❯❯ clang++ test.cu -O3 -o test123 -foffload-via-llvm --offload-arch=native

❯❯❯ llvm-objdump --offloading test123

test123: file format elf64-x86-64

OFFLOADING IMAGE [0]:
kind elf
arch gfx90a
triple amdgcn-amd-amdhsa
producer openmp

❯❯❯ LIBOMPTARGET_INFO=16 ./test123
Ptr 0x155448ac8000, *Ptr 7
Ptr 0x155448ac8000, *Ptr 42
```
2024-07-29	[OpenMP] Re-enable test after correctly forwarding `mllvm`	Joseph Huber	1	-1/+0

2024-07-23	[OpenMP] Ensure the actual kernel is annotated with launch bounds (#99927)	Johannes Doerfert	1	-0/+3
	In debug mode there is a wrapper (the kernel) around the function in which we generate the kernel code. We worked around this before to get the correct kernel name, but now we really distinguish both to attach the launch bounds to the kernel, not the inner function.
2024-07-23	[Offload] Re-enable tests that are now passing	Joseph Huber	1	-1/+0
	Summary: Some recent patches made these stop failing so the XFAIL now makes the bots go red. Fixes https://github.com/llvm/llvm-project/issues/98903
2024-07-22	[OMPIRBuilder] - Handle dependencies in `createTarget` (#93977)	Pranav Bhandarkar	1	-0/+69
	This patch handles dependencies specified by the `depend` clause on an OpenMP target construct. It does this much the same way clang does, by materializing an OpenMP `task` that is tagged with the dependencies. The following functions are relevant to this patch:
	1) `createTarget` - This function itself is largely unchanged, except that it now accepts a vector of `DependData` objects that it simply forwards to `emitTargetCall`.
	2) `emitTargetCall` - This function now checks whether an outer target task needs to be materialized (i.e., if the `target` construct has `nowait` or a `depend` clause). If so, it calls `emitTargetTask` to do all the heavy lifting of creating and dispatching the task.
	3) `emitTargetTask` - The bulk of the change is here. See the large comment at the beginning of this function explaining what it does.
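	The decision made in `emitTargetCall` (step 2) reduces to a small predicate, sketched here in C with hypothetical names; the real code builds IR through OMPIRBuilder rather than returning an enum:

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical dispatch strategies for a target construct. */
enum target_dispatch { DISPATCH_DIRECT, DISPATCH_VIA_TASK };

/* An outer target task is needed when the construct has `nowait` or any
   `depend` items; otherwise the kernel launch is emitted directly. */
enum target_dispatch choose_dispatch(bool has_nowait, size_t num_depend_items) {
    return (has_nowait || num_depend_items > 0) ? DISPATCH_VIA_TASK : DISPATCH_DIRECT;
}
```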
2024-07-15	[Offload] XFAIL four tests while working on fix (#98899)	Jan Patrick Lehr	2	-0/+2
	omp_dynamic_shared_memory_mixed_amdgpu.c omp_dynamic_shared_memory_amdgpu.c amdgcn-amd-amdhsa::bug51982.c amdgcn-amd-amdhsa::bug51781.c
2024-07-10	[Offload][test]Fix typo of requires (#98327)	Jinsong Ji	1	-1/+1
	Typos in 8823448807f3b1a1362d1417e062d763734e02f5.
2024-07-01	[OpenMP][offload] Fix dynamic schedule tracking (#97065)	Gheorghe-Teodor Bercea	3	-0/+201
	This patch fixes the dynamic schedule tracking.