riscv-gnu-toolchain/llvm.git - Unnamed repository; edit this file 'description' to name the repository.

Age	Commit message (Collapse)	Author	Files	Lines
2025-08-29	[OMPIRBuilder] Avoid crash in BasicBlock::splice. (#154987)	Abid Qadeer	1	-0/+24
	Calling `BasicBlock::splice` in `spliceBB` when both `Old` and `New` are empty is a `nop` currently but it can cause a crash once debug records are used instead of debug intrinsics. This PR makes the call conditional on at least one of `Old` or `New` being non-empty. Consider the following mlir: ``` omp.target map_entries() { llvm.intr.dbg.declare ... llvm.intr.dbg.declare ... omp.teams ... ... } ``` Current code would translate llvm.intr Ops to llvm intrinsics. Old is the BasicBlock where they were get inserted and it will have 2 llvm debug intrinsics by the time the implementation of `omp.teams` starts. This implementation creates many BasicBlocks by calling `splitBB`. The `New` is the just created BasicBlock which is empty. In the new scheme (using debug records), there will be no instruction in the `Old` BB after llvm.intr Ops get translated but just 2 trailing debug records. So both `Old` and `New` are empty. When control reaches `BasicBlock::splice`, it calls `spliceDebugInfoEmptyBlock`. This function expects that in this case (`Src` is empty but has trailing debug records), the `ToIt` is valid and it can call `adoptDbgRecords` on it. This assumption is not true in this case as `New` is empty and `ToIt` is pointing to end(). The fix is to only call `BasicBlock::splice` when at least of `Old` or `New` is not empty.
2025-08-18	[NFC][HLSL] Remove confusing enum aliases / duplicates (#153909)	Damyan Pepper	1	-11/+12
	Remove: * DescriptorType enum - this almost exactly shadowed the ResourceClass enum * ClauseType aliased ResourceClass Although these were introduced to make the HLSL root signature handling code a bit cleaner, they were ultimately causing confusion as they appeared to be unique enums that needed to be converted between each other. Closes #153890
2025-08-11	[DirectX] Fix resource binding analysis incorrectly removing duplicates ↵	Helena Kotas	1	-0/+18
	(#152253) The resource binding analysis was incorrectly reducing the size of the `Bindings` vector by one element after sorting and de-duplication. This led to an inaccurate setting of the `HasOverlappingBinding` flag in the `DXILResourceBindingInfo` analysis, as the truncated vector no longer reflected the true binding state. This update corrects the shrink logic and introduces an `assert` in the `DXILPostOptimizationValidation` pass. The assertion will trigger if `HasOverlappingBinding` is set but no corresponding error is detected, helping catch future inconsistencies. The bug surfaced when the `srv_metadata.hlsl` and `uav_metadata.hlsl` tests were updated to include unbounded resource arrays as part of https://github.com/llvm/llvm-project/issues/145422. These updated test files are included in this PR, as they would cause the new assertion to fire if the original issue remained unresolved. Depends on #152250
2025-08-07	[OpenMP] [IR Builder] Changes to Support Scan Operation (#136035)	Anchu Rajendran S	1	-0/+139
	Scan reductions are supported in OpenMP with the help of scan directive. Reduction clause of the for loop/simd directive can take an `inscan` modifier along with the body of the directive specifying a `scan` directive. This PR implements the lowering logic for scan reductions in workshare loops of OpenMP. The body of the for loop is split into two loops (Input phase loop and Scan Phase loop) and a scan reduction loop is added in the middle. The Input phase loop populates a temporary buffer with initial values that are to be reduced. The buffer is used by the reduction loop to perform scan reduction. Scan phase loop copies the values of the buffer to the reduction variable before executing the scan phase. Below is a high level view of the code generated. ``` <declare pointer to buffer> ptr omp parallel { size num_iters = <num_iters> // temp buffer allocation omp masked { buff = malloc(num_itersscanvarstype) ptr = buff } barrier; // input phase loop for (i: 0..<num_iters>) { <input phase>; buffer = ptr; buffer[i] = red; } // scan reduction omp masked { for (int k = 0; k != ceil(log2(num_iters)); ++k) { i=pow(2,k) for (size cnt = last_iter; cnt >= i; --cnt) { buffer = ptr; buffer[cnt] op= buffer[cnt-i]; } } } barrier; // scan phase loop for (0..<num_iters>) { buffer = ptr; red = buffer[i] ; <scan phase>; } // temp buffer deletion omp masked { free(ptr) } barrier; } ``` The temporary buffer needs to be shared between all threads performing reduction since it is read/written in Input and Scan workshare Loops. This is achieved by declaring a pointer to the buffer in the shared region and dynamically allocating the buffer by the master thread. This is the reason why allocation, deallocation and scan reduction are performed within `masked`. The code is verified to produce correct results for Fortran programs with the code changes in the PR https://github.com/llvm/llvm-project/pull/133149
2025-08-05	[HLSL][Sema] Use hlsl::BindingInfoBuilder instead of RangeInfo. NFC (#150634)	Justin Bogner	2	-178/+0
	Clean up some duplicated logic. We had two ways to do the same thing here, and BindingInfoBuilder is more flexible.
2025-08-02	Add missing link component for Frontend offloading test (#151796)	Arvind Sudarsanam	1	-0/+1
	This change fixes one of the failures in https://github.com/llvm/llvm-project/pull/147321 /usr/bin/ld: unittests/Frontend/CMakeFiles/LLVMFrontendTests.dir/PropertySetRegistryTest.cpp.o: undefined reference to symbol '_ZN4llvm10offloading21writePropertiesToJSONERKSt3mapINSt7__cxx1112basic_stringIcSt11char_traitsIcESaIcEEES1_IS7_St7variantIJjNS_11SmallVectorIhLj0EEEEESt4lessIS7_ESaISt4pairIKS7_SB_EEESD_SaISE_ISF_SI_EEERNS_11raw_ostreamE' Need to add a missing LLVM link component in CMakeLists.txt. Thanks Signed-off-by: Arvind Sudarsanam <arvind.sudarsanam@intel.com>
2025-08-01	[SYCL] Add property set types and JSON representation (#147321)	Justin Cai	2	-0/+77
	This PR adds the `PropertySet` type, along with a pair of functions used to serialize and deserialize into a JSON representation. A property set is a key-value map, with values being one of 2 types - uint32 or byte array. A property set registry is a collection of property sets, indexed by a "category" name. In SYCL offloading, property sets will be used to communicate metadata about device images needed by the SYCL runtime. For example, there is a property set which has a byte array containing the numeric ID, offset, and size of each SYCL2020 spec constant. Another example is a property set describing the optional kernel features used in the module: does it use fp64? fp16? atomic64? This metadata will be computed by `clang-sycl-linker` and the JSON representation will be inserted in the string table of each output `OffloadBinary`. This JSON will be consumed the SYCL offload wrapper and will be lowered to the binary form SYCL runtime expects. For example, consider this SYCL program that calls a kernel that uses fp64: ```c++ #include <sycl/sycl.hpp> using namespace sycl; class MyKernel; int main() { queue q; auto p = malloc_shared<double>(1, q); p = .1; q.single_task<MyKernel>([=]{ p = 2; }).wait(); std::cout << p << "\n"; free(p, q); } ``` The device code for this program would have the kernel marked with `!sycl_used_aspects`: ``` define spir_kernel void @_ZTS8MyKernel([...]) !sycl_used_aspects !n { [...] } !n = {i32 6} ``` `clang-sycl-linker` would recognize this metadata and then would output the following JSON in the `OffloadBinary`'s key-value map: ``` { "SYCL/device requirements": { // aspects contains a list of sycl::aspect values used // by the module; in this case just the value 6 encoded // as a 4-byte little-endian integer "aspects": "BjAwMA==" } } ``` The SYCL offload wrapper would lower those property sets to something like this: ```c++ struct _sycl_device_binary_property_set_struct { char CategoryName; _sycl_device_binary_property PropertiesBegin; _sycl_device_binary_property PropertiesEnd; }; struct _sycl_device_binary_property_struct { char PropertyName; void ValAddr; uint64_t ValSize; }; // _sycl_device_binary_property_struct device_requirements[] = { /* PropertyName / "aspects", / ValAddr / [pointer to the bytes 0x06 0x00 0x00 0x00], / ValSize / 4, }; _sycl_device_binary_property_set_struct properties[] = { / CategoryName / "SYCL/device requirements", / PropertiesBegin / device_requirements, / PropertiesEnd */ std::end(device_requirments), } ``` --------- Co-authored-by: Arvind Sudarsanam <arvind.sudarsanam@intel.com>
2025-07-31	Suppress -Wuninitialized-const-pointer warning (#151583)	Justin Bogner	1	-2/+4
	Recent clang (as of #148337) introduced a warning on passing unitialized pointers to functions that take const pointers. This is entirely spurious on this code, but this works around it to keep the bots happy. Build failure: https://lab.llvm.org/buildbot/#/builders/168/builds/14779
2025-07-31	[HLSL][DirectX] Extract HLSLBinding out of DXILResource. NFC (#150633)	Justin Bogner	2	-0/+274
	We extract the binding logic out of the DXILResource analysis passes into the FrontendHLSL library. This will allow us to use this logic for resource and root signature bindings in both the DirectX backend and the HLSL frontend.
2025-07-22	[flang][OpenMP] Sema checks, lowering with new format of MAP modifiers (#149137)	Krzysztof Parzyszek	1	-8/+8
	OpenMP 6.0 has changed the modifiers on the MAP clause. Previous patch has introduced parsing support for them. This patch introduces processing of the new forms in semantic checks and in lowering. This only applies to existing modifiers, which were updated in the 6.0 spec. Any of the newly introduced modifiers (SELF and REF) are ignored.
2025-07-17	[utils][TableGen] Make some non-bitmask enums iterable (#148647)	Krzysztof Parzyszek	1	-15/+6
	Additionally, add sentinel values <Enum>::First_ and <Enum>::Last_ to each one of those enums. This will allow using `enum_seq_inclusive` to generate the list of enum-typed values of any generated scoped (non-bitmask) enum.
2025-07-16	[DebugInfo] Remove getPrevNonDebugInstruction (#148859)	Jeremy Morse	1	-4/+4
	With the advent of intrinsic-less debug-info, we no longer need to scatter calls to getPrevNonDebugInstruction around the codebase. Remove most of them -- there are one or two that have the "SkipPseudoOp" flag turned on, however they don't seem to be in positions where skipping anything would be reasonable.
2025-07-15	[DebugInfo][RemoveDIs] Suppress getNextNonDebugInfoInstruction (#144383)	Jeremy Morse	1	-9/+9
	There are no longer debug-info instructions, thus we don't need this skipping. Horray!
2025-07-15	[mlir][OpenMP] Allow composite SIMD REDUCTION and IF (#147568)	Tom Eccles	1	-11/+22
	Reduction support: https://github.com/llvm/llvm-project/pull/146671 If Support is fixed in this PR The problem for the IF clause in composite constructs was that wsloop and simd both operate on the same CanonicalLoopInfo structure: with the SIMD processed first, followed by the wsloop. Previously the IF clause generated code like ``` if (cond) { while (...) { simd_loop_body; } } else { while (...) { nonsimd_loop_body; } } ``` The problem with this is that this invalidates the CanonicalLoopInfo structure to be processed by the wsloop later. To avoid this, in this patch I preserve the original loop, moving the IF clause inside of the loop: ``` while (...) { if (cond) { simd_loop_body; } else { non_simd_loop_body; } } ``` On simple examples I tried LLVM was able to hoist the if condition outside of the loop at -O3. The disadvantage of this is that we cannot add the llvm.loop.vectorize.enable attribute on either the SIMD or non-SIMD loops because they both share a loop back edge. There's no way of solving this without keeping the old design of having two different loops: which cannot be represented using only one CanonicalLoopInfo structure. I don't think the presence or absence of this attribute makes much difference. In my testing it is the llvm.loop.parallel_access metadata which makes the difference to vectorization. LLVM will vectorize if legal whether or not this attribute is there in the TRUE branch. In the FALSE branch this means the loop might be vectorized even when the condition is false: but I think this is still standards compliant: OpenMP 6.0 says that when the if clause is false that should be treated like the SIMDLEN clause is one. The SIMDLEN clause is defined as a "hint". For the same reason, SIMDLEN and SAFELEN clauses are silently ignored when SIMD IF is used. I think it is better to implement SIMD IF and ignore SIMDLEN and SAFELEN and some vectorization encouragement metadata when combined with IF than to ignore IF because IF could have correctness consequences whereas the rest are optimiztion hints. For example, the user might use the IF clause to disable SIMD programatically when it is known not safe to vectorize the loop. In this case it is not at all safe to add the parallel access or SAFELEN metadata.
2025-07-04	[HLSL][RootSignature] Update `setDefaultFlags` to account for Root Signature ↵	Finn Plummer	1	-2/+70
	Version (#145828) This pr updates `setDefaultFlags` in `HLSLRootSignature.h` to account for which version it should initialize the default flag values for. - Updates `setDefaultFlags` with a `Version` argument and initializes them to be compliant as described [here](https://github.com/llvm/wg-hlsl/pull/297). - Updates `RootSignatureParser` to retain the `Version` and pass this into `setDefaultFlags` - Updates all uses of `setDefaultFlags` in test-cases - Adds some new unit testing to ensure behaviour is as expected and that the Parser correctly passes down the version Resolves https://github.com/llvm/llvm-project/issues/145820.
2025-07-04	[NFC][HLSL][RootSignature] Split up `HLSLRootSignatureUtils` (#146124)	Finn Plummer	2	-2/+2
	This pr breaks-up `HLSLRootSignatureUtils` into separate orthogonal and meaningful libraries. This prevents it ending up as a dumping grounds of many different parts. - Creates a library `RootSignatureMetadata` to contain helper functions for interacting the root signatures in their metadata representation - Create a library `RootSignatureValidations` to contain helper functions that will validate various values of root signatures - Move the serialization of root signature elements to `HLSLRootSignature` Resolves: https://github.com/llvm/llvm-project/issues/145946
2025-07-03	[NFC][HLSL][DirectX] Let `HLSLRootSignature` reuse the `dxbc` defined enums ↵	Finn Plummer	1	-19/+43
	(#145986) This pr removes the redundancy of having the same enums defined in both the front-end and back-end of handling root signatures. Since there are many more uses of the enum in the front-end of the code, we will adhere to the naming conventions used in the front-end, to minimize the diff. The macros in `DXContainerConstants.def` are also touched-up to be consistent and to have each macro name follow its respective definition in d3d12.h and searchable by name [here](https://learn.microsoft.com/en-us/windows/win32/api/d3d12/). Additionally, the many `getEnumNames` are moved to `DXContainer` from `HLSLRootSignatureUtils` as they we will want them to be exposed publicly anyways. Changes for each enum follow the pattern of a commit that will make the enum definition in `DXContainer` adhere to above listed naming conventions, followed by a commit to actually use that enum in the front-end. Resolves https://github.com/llvm/llvm-project/issues/145815
2025-07-03	[Frontend][OpenMP] Implement directive name parser (#146776)	Krzysztof Parzyszek	2	-0/+172
	Implement a state machine that consumes tokens (words delimited by white space), and returns the corresponding directive id, or fails if the tokens did not form a valid name.
2025-06-25	[OpenMP] Add directive spellings introduced in OpenMP 6.0 (#141772)	Krzysztof Parzyszek	2	-0/+97
	For background information see https://discourse.llvm.org/t/rfc-alternative-spellings-of-openmp-directives/85507
2025-06-23	[HLSL][RootSignature] Plug-in serialization and add full sample testcase ↵	Finn Plummer	1	-3/+3
	(#144769) This pr extends `dumpRootElements` to invoke the print methods of all `RootElement`s now that they are all implemented. Extends the `RootSignatures-AST.hlsl` testcase to have a root element of each type being parsed, constructed to the in-memory representation mode and then being dumped as part of the AST dump. - Update `HLSLRootSignatureUtils.cpp` to extend `dumpRootElements` - Extend `AST/HLSL/RootSigantures-AST.hlsl` testcase - Defines the helper `operator<<` for `RootElement` - Small correction to the output of `numDescriptors` to be `unbounded` in special case Resolves https://github.com/llvm/llvm-project/issues/124595.
2025-06-17	[HLSL][RootSignature] Implement serialization of remaining Root Elements ↵	Finn Plummer	1	-1/+121
	(#143198) Implements serialization of the remaining `RootElement`s, namely `RootDescriptor`s and `StaticSampler`s. - Adds unit testing for the serialization methods Resolves https://github.com/llvm/llvm-project/issues/138191 Resolves https://github.com/llvm/llvm-project/issues/138193
2025-06-17	[HLSL][RootSignature] Implement `ResourceRange` as an `IntervalMap` (#140957)	Finn Plummer	2	-0/+178
	A resource range consists of a closed interval, `[a;b]`, denoting which shader registers it is bound to. For instance: - `CBV(b1)` corresponds to the resource range of `[1;1]` - `CBV(b0, numDescriptors = 3)` likewise to `[0;2]` We want to provide an error diagnostic when there is an overlap in the required registers (an overlap in the resource ranges). The goal of this pr is to implement a structure to model a set of resource ranges and provide an api to detect any overlap over a set of resource ranges. `ResourceRange` models this by implementing an `IntervalMap` to denote a mapping from an interval of registers back to a resource range. It allows for a new `ResourceRange` to be added to the mapping and it will report if and what the first overlap is. For the context of how this will be used in validation of a `RootSignatureDecl` please see the proceeding pull request here: https://github.com/llvm/llvm-project/pull/140962. - Implements `ResourceRange` as an `IntervalMap` - Adds unit testing of the various `insert` scenarios Note: it was also considered to implement this as an `IntervalTree`, this would allow reporting of a diagnostic for each overlap that is encountered, as opposed to just the first. However, error generation of just reporting the first error is already rather verbose, and adding the additional diagnostics only made this worse. Part 1 of https://github.com/llvm/llvm-project/issues/129942
2025-06-16	Reland "[HLSL][RootSignature] Implement serialization of RootConstants and ↵	Finn Plummer	1	-0/+69
	RootFlags" (#143019) This relands #141130. The initial commit uncovered that we are missing the correct linking of FrontendHLSL into clang/lib/Parse and clang/lib/unittests/Parse. This change addreses this by linking them accordingly. It was also checked and ensured that the LexHLSLRootSignature libraries do not depend on FrontendHLSL and so we are not required to link there. Resolves: #138190 and #138192
2025-06-03	[NFC][RootSignature] Move RootSignature util functions (#142491)	Finn Plummer	1	-1/+1
	`HLSLRootSignature.h` was originally created to hold the struct definitions of an `llvm::hlsl::rootsig::RootElement` and some helper functions for it. However, there many users of the structs that don't require any of the helper methods. This requires us to link the `FrontendHLSL` library, where we otherwise wouldn't need to. For instance: - This [revert](https://github.com/llvm/llvm-project/pull/142005) was required as it requires linking to the unrequired `FrontendHLSL` library - As part of the change required here: https://github.com/llvm/llvm-project/issues/126557. We will want to add an `HLSLRootSignatureVersion` enum. Ideally this could live with the root signature struct defs, but we don't want to link the helper objects into `clang/Basic/TargetOptions.h` This change allows the struct definitions to be kept in a single header file and to then have the `FrontendHLSL` library only be linked when required.
2025-05-29	Revert "[HLSL][RootSignature] Implement serialization of `RootConstants` and ↵	Finn Plummer	1	-69/+0
	`RootFlags`" (#142005) The commit caused build failures, [here](https://lab.llvm.org/buildbot/#/builders/10/builds/6308), due to a missing linked llvm library (HLSLFrontend) into `clang/unittests/Parse/CMakeLists.txt`. While it seems like the fix is straightforwardly to just add this library, I will revert now to build and verify locally it correctly fixes it. Reverts llvm/llvm-project#141130
2025-05-29	[HLSL][RootSignature] Implement serialization of `RootConstants` and ↵	Finn Plummer	1	-0/+69
	`RootFlags` (#141130) - Implements serialization of the currently completely defined `RootElement`s, namely `RootConstants` and `RootFlags` - Adds unit testing for the serialization methods Resolves: https://github.com/llvm/llvm-project/issues/138190 and https://github.com/llvm/llvm-project/issues/138192
2025-05-22	[NFC][HLSL][RootSignature] Use `operator<<` overload instead of dump method ↵	Finn Plummer	1	-5/+5
	(#141127) - we will need to provide a way to dump `RootFlags` for serialization and by using operator overloads we can maintain a consistent interface This is an NFC to allow for https://github.com/llvm/llvm-project/issues/138192 to be more straightforwardly implemented.
2025-05-19	[AMDGPU] Set AS8 address width to 48 bits	Alexander Richardson	1	-2/+4
	Of the 128-bits of buffer descriptor only 48 bits are address bits, so following the discussion on https://discourse.llvm.org/t/clarifiying-the-semantics-of-ptrtoint/83987/54, the logic conclusion is to set the index width to 48 bits instead of the current value of 128. Most of the test changes are mechanical datalayout updates, but there is one actual change: the ptrmask test now uses .i48 instead of .i128 and I had to update SelectionDAGBuilder to correctly extend the mask. Reviewed By: krzysz00 Pull Request: https://github.com/llvm/llvm-project/pull/139419
2025-05-13	[OpenMP][NFC] Use pass by const ref for Dependencies (#139592)	Shafik Yaghmour	1	-7/+7
	Static analysis flagged the passing of Dependencies to emitTargetCall as a place we could use std::move to avoid copying. A closer look indicated we could instead turn the parameter into a const & and not have a default value since it was only used in two lines in a test and changing those two locations was easy.
2025-05-09	[HLSL][RootSignature] Implement serialized dump of Descriptor Tables (#138326)	Finn Plummer	2	-0/+113
	- defines the `dump` method for in-memory descriptor table data structs in `Frontend/HLSLRootSignature` - creates unit test infrastructure to support unit tests of the dump methods Resolves https://github.com/llvm/llvm-project/issues/138189
2025-05-09	[LLVM][OpenMP] Add "version" parameter to getOpenMPDirectiveName (#139114)	Krzysztof Parzyszek	2	-6/+6
	Some OpenMP directives have different spellings in different versions of the OpenMP spec. To use the proper spelling for a given spec version pass "version" as a parameter to getOpenMPDirectiveName. This parameter won't be used at the moment, and will have a default value to allow callers not to pass it, for gradual adoption in various components. RFC: https://discourse.llvm.org/t/rfc-alternative-spellings-of-openmp-directives/85507
2025-05-04	[llvm] Remove unused local variables (NFC) (#138454)	Kazu Hirata	1	-2/+0

2025-04-29	[mlir][OpenMP] Add __atomic_store to AtomicInfo (#121055)	NimishMishra	1	-2/+9
	This PR adds functionality for `__atomic_store` libcall in AtomicInfo. This allows for supporting complex types in `atomic write`. Fixes https://github.com/llvm/llvm-project/issues/113479 Fixes https://github.com/llvm/llvm-project/issues/115652
2025-04-18	unittests: Avoid using getNumUses (#136352)	Matt Arsenault	1	-30/+30

2025-04-09	[mlir][llvm][OpenMP] Hoist __atomic_load alloca (#132888)	NimishMishra	1	-3/+8
	Current implementation of `__atomic_compare_exchange` uses an alloca for `__atomic_load`, leading to issues like https://github.com/llvm/llvm-project/issues/120724. This PR hoists this alloca to `AllocaIP`. Fixes: https://github.com/llvm/llvm-project/issues/120724
2025-04-07	[MLIR][OpenMP] Add codegen for teams reductions (#133310)	Jan Leyonberg	1	-0/+1
	This patch adds the lowering of teams reductions from the omp dialect to LLVM-IR. Some minor cleanup was done in clang to remove an unused parameter.
2025-02-25	[OpenMPIRBuilder] Split calculation of canonical loop trip count, NFC (#127820)	Sergio Afonso	1	-13/+3
	This patch splits off the calculation of canonical loop trip counts from the creation of canonical loops. This makes it possible to reuse this logic to, for instance, populate the `__tgt_target_kernel` runtime call for SPMD kernels. This feature is used to simplify one of the existing OpenMPIRBuilder tests.
2025-02-18	[MLIR][OpenMP] Add LLVM translation support for OpenMP UserDefinedMappers ↵	Akash Banerjee	1	-18/+48
	(#124746) This patch adds OpenMPToLLVMIRTranslation support for the OpenMP Declare Mapper directive. Since both MLIR and Clang now support custom mappers, I've changed the respective function params to no longer be optional as well. Depends on #121005
2025-02-05	[flang][OMPIRbuilder] Set debug loc on terminator created by splitBB. (#125897)	Abid Qadeer	1	-0/+16
	Fixes #125088. When splitBB is called with createBranch=true, it creates a branch instruction in the old block. But no debug loc is set on that branch instruction. If that is used as InsertPoint in the restoreIP, it has the potential to set the current debug location to null and subsequent instruction will come out without a debug location. This caused the verification check to fail as shown in the bug report. This PR changes splitBB and spliceBB function to also take a debugLoc parameter which can be used to set the debug location of the branch instruction.
2025-02-05	[MLIR][OpenMP] Use correct DebugLoc in target construct callbacks. (#125856)	Abid Qadeer	1	-0/+18
	This is same as PR #125106 which somehow is stuck in a "Processing Update" loop for many hours now. I am going to close that one and push this one instead. While working on https://github.com/llvm/llvm-project/issues/125088, I noticed a problem with the TargetBodyGenCallbackTy and TargetGenArgAccessorsCallbackTy. The OMPIRBuilder and MLIR side Both maintain their own IRBuilder and when control goes from one to other, we have to take care to not use a stale debug location. The code currently rely on restoreIP to set the insertion point and the debug location. But if the passes InsertPointTy has an empty block, then the debug location will not be updated (see SetInsertPoint). This can cause invalid debug location to be attached to instruction and the verifier will complain. Similarly when we exit the callback, the debug location of the Builder is not set to what it was before the callback. This again can cause verification failures. This PR resets the debug location at the start and also uses an InsertPointGuard to restore the debug location at exit. Both of these problems would have been caught by the unit tests but they were not setting the debug location of the builder before calling the createTarget so the problem was hidden. I have updated the tests accordingly.
2025-02-05	[OpenMP]Initial parsing/sema support for target_device selector set (#118471)	Ritanya-B-Bharadwaj	1	-11/+15
	This patch adds initial support for target_device selector set - Section 9.2 (Spec 6.0)
2025-01-27	[NFC][DebugInfo] Make some block-start-position methods return iterators ↵	Jeremy Morse	1	-8/+16
	(#124287) As part of the "RemoveDIs" work to eliminate debug intrinsics, we're replacing methods that use Instruction's as positions with iterators. A number of these (such as getFirstNonPHIOrDbg) are sufficiently infrequently used that we can just replace the pointer-returning version with an iterator-returning version, hopefully without much/any disruption. Thus this patch has getFirstNonPHIOrDbg and getFirstNonPHIOrDbgOrLifetime return an iterator, and updates all call-sites. There are no concerns about the iterators returned being converted to Instruction's and losing the debug-info bit: because the methods skip debug intrinsics, the iterator head bit is always false anyway.
2025-01-24	[OpenMP] Replace nvvm.annotation usage with kernel calling conventions (#122320)	Alex MacLean	1	-20/+0
	Specifying a kernel with the `ptx_kernel` or `amdgpu_kernel` calling convention is a more idiomatic and compile-time performant than using the `nvvm.annoation !"kernel"` metadata. Transition OMPIRBuilder to use calling conventions for PTX kernels and no longer emit `nvvm.annoation`. Update OpenMPOpt to work with kernels specified via calling convention as well as metadata. Update OpenMP tests to use the calling conventions.
2025-01-24	[NFC][DebugInfo] Use iterator-flavour getFirstNonPHI at many call-sites ↵	Jeremy Morse	1	-8/+8
	(#123737) As part of the "RemoveDIs" project, BasicBlock::iterator now carries a debug-info bit that's needed when getFirstNonPHI and similar feed into instruction insertion positions. Call-sites where that's necessary were updated a year ago; but to ensure some type safety however, we'd like to have all calls to getFirstNonPHI use the iterator-returning version. This patch changes a bunch of call-sites calling getFirstNonPHI to use getFirstNonPHIIt, which returns an iterator. All these call sites are where it's obviously safe to fetch the iterator then dereference it. A follow-up patch will contain less-obviously-safe changes. We'll eventually deprecate and remove the instruction-pointer getFirstNonPHI, but not before adding concise documentation of what considerations are needed (very few). --------- Co-authored-by: Stephen Tozer <Melamoto@gmail.com>
2025-01-22	[IR][unittests] Replace of PointerType::getUnqual(Type) with opaque version ↵	Mats Jun Larsen	1	-2/+1
	(NFC) (#123901) Follow up to https://github.com/llvm/llvm-project/issues/123569
2025-01-15	[OMPIRBuilder][MLIR] Add support for target 'if' clause (#122478)	Sergio Afonso	1	-19/+22
	This patch implements support for handling the 'if' clause of OpenMP 'target' constructs in the OMPIRBuilder and updates MLIR to LLVM IR translation of the `omp.target` MLIR operation to make use of this new feature.
2025-01-14	[OMPIRBuilder] Propagate attributes to outlined target regions (#117875)	Sergio Afonso	1	-0/+25
	This patch copies the target-cpu and target-features attributes of functions containing target regions into the corresponding outlined function holding the target region. This mirrors what is currently being done for all other outlined functions through the `CodeExtractor` in `OpenMPIRBuilder::finalize()`.
2025-01-14	[OMPIRBuilder] Support runtime number of teams and threads, and SPMD mode ↵	Sergio Afonso	1	-19/+262
	(#116051) This patch introduces a `TargetKernelRuntimeAttrs` structure to hold host-evaluated `num_teams`, `thread_limit`, `num_threads` and trip count values passed to the runtime kernel offloading call. Additionally, kernel type information is used to influence target device code generation and the `IsSPMD` flag is replaced by `ExecFlags`, which provides more granularity.
2025-01-14	[OMPIRBuilder] Introduce struct to hold default kernel teams/threads (#116050)	Sergio Afonso	1	-7/+15
	This patch introduces the `OpenMPIRBuilder::TargetKernelDefaultAttrs` structure used to simplify passing default and constant values for number of teams and threads, and possibly other target kernel-related information in the future. This is used to forward values passed to `createTarget` to `createTargetInit`, which previously used a default unrelated set of values.
2025-01-09	[OpenMP][OMPIRBuilder] Handle non-failing calls properly (#115863)	Sergio Afonso	1	-387/+443
	The preprocessor definition used to enable asserts and the one that `llvm::Error` and `llvm::Expected` use to ensure all created instances are checked are not the same. By making these checks inside of an `assert` in cases where errors are not expected, certain build configurations would trigger runtime failures (e.g. `-DLLVM_ENABLE_ASSERTIONS=OFF -DLLVM_UNREACHABLE_OPTIMIZE=ON`). The `llvm::cantFail()` function, which was intended for this use case, is used by this patch in place of `assert` to prevent these runtime failures. In tests, new preprocessor definitions based on `ASSERT_THAT_EXPECTED` and `EXPECT_THAT_EXPECTED` are used instead, to avoid silent failures in release builds.