riscv-gnu-toolchain/llvm.git - Unnamed repository; edit this file 'description' to name the repository.

Age	Commit message (Collapse)	Author	Files	Lines
12 days	[IR] NFC: Remove 'experimental' from partial.reduce.add intrinsic (#158637)	Sander de Smalen	1	-1/+4
	The partial reduction intrinsics are no longer experimental, because they've been used in production for a while and are unlikely to change.
2025-09-10	[x86][AVX-VNNI] Fix VPDPBUSD Argument Types (#155194)	BaiXilin	1	-13/+89
	Fixed intrinsic VPDPBUSD[,S]_128/256/512's argument types to match with the ISA. Fixes part of #97271
2025-09-05	[LLD][COFF] Add more `--time-trace` tags for ThinLTO linking (#156471)	Alexandre Ganea	1	-0/+2
	In order to better see what's going on during ThinLTO linking, this PR adds more profile tags when using `--time-trace` on a `lld-link.exe` invocation. After PR, linking `clang.exe`: <img width="3839" height="2026" alt="Capture d’écran 2025-09-02 082021" src="https://github.com/user-attachments/assets/bf0c85ba-2f85-4bbf-a5c1-800039b56910" /> Linking a custom (Unreal Engine game) binary gives a completly different picture, probably because of using Unity files, and the sheer amount of input files (here, providing over 60 GB of .OBJs/.LIBs). <img width="1940" height="1008" alt="Capture d’écran 2025-09-02 102048" src="https://github.com/user-attachments/assets/60b28630-7995-45ce-9e8c-13f3cb5312e0" />
2025-08-27	[NVPTX] Auto-upgrade nvvm.grid_constant to param attribute (#155489)	Alex MacLean	1	-0/+10
	Upgrade the !"grid_constant" !nvvm.annotation to a "nvvm.grid_constant" attribute. This attribute is much simpler for front-ends to apply and faster and simpler to query.
2025-08-18	[llvm] Replace SmallSet with SmallPtrSet (NFC) (#154068)	Kazu Hirata	1	-1/+1
	This patch replaces SmallSet<T , N> with SmallPtrSet<T , N>. Note that SmallSet.h "redirects" SmallSet to SmallPtrSet for pointer element types: template <typename PointeeType, unsigned N> class SmallSet<PointeeType, N> : public SmallPtrSet<PointeeType, N> {}; We only have 140 instances that rely on this "redirection", with the vast majority of them under llvm/. Since relying on the redirection doesn't improve readability, this patch replaces SmallSet with SmallPtrSet for pointer element types.
2025-08-08	[AutoUpgrade] Fix use after free	Nikita Popov	1	-5/+5
	Determine the intrinsic ID before the name is freed during renaming.
2025-08-08	[IR] Remove size argument from lifetime intrinsics (#150248)	Nikita Popov	1	-13/+13
	Now that #149310 has restricted lifetime intrinsics to only work on allocas, we can also drop the explicit size argument. Instead, the size is implied by the alloca. This removes the ability to only mark a prefix of an alloca alive/dead. We never used that capability, so we should remove the need to handle that possibility everywhere (though many key places, including stack coloring, did not actually respect this).
2025-07-24	[LLVM][NVPTX] Upstream tanh intrinsic for libdevice (#149596)	Meredith Julian	1	-0/+7
	Currently __nv_fast_tanhf() in libdevice maps to an nvvm intrinsic that has not been upstreamed, which is causing issues when using the NVPTX backend from upstream. Instead of upstreaming the intrinsic, we can instead use the existing Intrinsic::tanh with the afn flag. This change adds NVPTX backend support for ISD::TANH, adds auto-upgrade for the old tanh_approx intrinsic to @llvm.tanh.f32 with afn flag so that libdevice works properly upstream, and adds a basic codegen test and a case to the auto-upgrade test.
2025-07-21	[IR] Only allow lifetime.start/end on allocas (#149310)	Nikita Popov	1	-2/+42
	lifetime.start and lifetime.end are primarily intended for use on allocas, to enable stack coloring and other liveness optimizations. This is necessary because all (static) allocas are hoisted into the entry block, so lifetime markers are the only way to convey the actual lifetimes. However, lifetime.start and lifetime.end are currently allowed to be used on non-alloca pointers. We don't actually do this in practice, but just the mere fact that this is possible breaks the core purpose of the lifetime markers, which is stack coloring of allocas. Stack coloring can only work correctly if all lifetime markers for an alloca are analyzable. * If a lifetime marker may operate on multiple allocas via a select/phi, we don't know which lifetime actually starts/ends and handle it incorrectly (https://github.com/llvm/llvm-project/issues/104776). * Stack coloring operates on the assumption that all lifetime markers are visible, and not, for example, hidden behind a function call or escaped pointer. It's not possible to change this, as part of the purpose of lifetime markers is that they work even in the presence of escaped pointers, where simple use analysis is insufficient. I don't think there is any way to have coherent semantics for lifetime markers on allocas, while also permitting them on arbitrary pointer values. This PR restricts lifetimes to operate on allocas only. As a followup, I will also drop the size argument, which is superfluous if we always operate on an alloca. (This change also renders various code handling lifetime markers on non-alloca dead. I plan to clean up that kind of code after dropping the size argument as well.) In practice, I've only found a few places that currently produce lifetimes on non-allocas: * CoroEarly replaces the promise alloca with the result of an intrinsic, which will later be replaced back with an alloca. I think this is the only place where there is some legitimate loss of functionality, but I don't think this is particularly important (I don't think we'd expect the promise in a coroutine to admit useful lifetime optimization.) * SafeStack moves unsafe allocas onto a separate frame. We can safely drop lifetimes here, as SafeStack performs its own stack coloring. * Similar for AddressSanitizer, it also moves allocas into separate memory. * LSR sometimes replaces the lifetime argument with a GEP chain of the alloca (where the offsets ultimately cancel out). This is just unnecessary. (Fixed separately in https://github.com/llvm/llvm-project/pull/149492.) * InferAddrSpaces sometimes makes lifetimes operate on an addrspacecast of an alloca. I don't think this is necessary.
2025-07-04	[ARM] Add neon vector support for roundeven	David Green	1	-0/+1
	As per #142559, this marks froundeven as legal for Neon and upgrades the existing arm.neon.vrintn intrinsics.
2025-07-03	[ARM] Add neon vector support for rint	David Green	1	-1/+2
	As per #142559, this marks frint as legal for Neon and upgrades the existing arm.neon.vrintx intrinsics.
2025-07-03	[ARM] Add neon vector support for trunc	David Green	1	-0/+1
	As per #142559, this marks ftrunc as legal for Neon and upgrades the existing arm.neon.vrintz intrinsics.
2025-07-01	[ARM] Add neon vector support for ceil	David Green	1	-1/+2
	As per #142559, this marks fceil as legal for Neon and upgrades the existing arm.neon.vrintp intrinsics.
2025-06-30	[ARM] Add neon vector support for round	David Green	1	-0/+1
	As per #142559, this marks fround as legal for Neon and upgrades the existing arm.neon.vrinta intrinsics.
2025-06-29	[ARM] Add neon vector support for floor (#142559)	David Green	1	-0/+1
	This marks ffloor as legal providing that armv8 and neon is present (or fullfp16 for the fp16 instructions). The existing arm_neon_vrintm intrinsics are auto-upgraded to llvm.floor. If this is OK I will update the other vrint intrinsics.
2025-06-23	[AutoUpgrade] Remove unnecessary name check (NFCI)	Nikita Popov	1	-5/+1
	If only the name is incorrect (due to added overload), but the signature is correct, we should go through the generic remangling upgrade.
2025-06-12	[NVPTX] Add cta_group support to TMA G2S intrinsics (#143178)	Durgadoss R	1	-21/+83
	This patch extends the TMA G2S intrinsics with the support for cta_group::1/2 available from Blackwell onwards. The existing intrinsics are auto-upgraded with a default value of '0' for the `cta_group` flag operand. * lit tests are added for all combinations of the newer variants. * Negative tests are added to validate the error-handling when the value of the cta_group flag falls out-of-range. * The generated PTX is verified with a 12.8 ptxas executable. Signed-off-by: Durgadoss R <durgadossr@nvidia.com>
2025-06-11	Reapply 76197ea6f91f after removing an assertion	Jeremy Morse	1	-20/+5
	Specifically this is the assertion in BasicBlock.cpp. Now that we're not examining or setting that flag consistently (because it'll be deleted in about an hour) there's no need to keep this assertion. Original commit title: [DebugInfo][RemoveDIs] Remove some debug intrinsic-only codepaths (#143451)
2025-06-11	Revert "[DebugInfo][RemoveDIs] Remove some debug intrinsic-only codepaths ↵	Jeremy Morse	1	-5/+20
	(#143451)" This reverts commit c71a2e688828ab3ede4fb54168a674ff68396f61. /me squints -- this is hitting an assertion I thought had been deleted, will revert and investigate for a bit.
2025-06-11	[DebugInfo][RemoveDIs] Remove some debug intrinsic-only codepaths (#143451)	Jeremy Morse	1	-20/+5
	These are opportunistic deletions as more places that make use of the IsNewDbgInfoFormat flag are removed. It should (TM)(R) all be dead code now that `IsNewDbgInfoFormat` should be true everywhere. FastISel: we don't need to do debug-aware instruction counting any more, because there are no debug instructions, Autoupgrade: you can no-longer avoid autoupgrading of intrinsics to records DIBuilder: Delete the code for creating debug intrinsics (!) LoopUtils: No need to handle debug instructions, they don't exist
2025-06-11	[DebugInfo][RemoveDIs] Use autoupgrader to convert old debug-info (#143452)	Jeremy Morse	1	-25/+52
	By chance, two things have prevented the autoupgrade path being exercised much so far: * LLParser setting the debug-info mode to "old" on seeing intrinsics, * The test in AutoUpgrade.cpp wanting to upgrade into a "new" debug-info block. In practice, this appears to mean this code path hasn't seen the various invalid inputs that can come its way. This commit does a number of things: * Tolerates the various illegal inputs that can be written with debug-intrinsics, and that must be tolerated until the Verifier runs, * Printing illegal/null DbgRecord fields must succeed, * Verifier errors need to localise the function/block where the error is, * Tests that now see debug records will print debug-record errors, Plus a few new tests for other intrinsic-to-debug-record failures modes I found. There are also two edge cases: * Some of the unit tests switch back and forth between intrinsic and record modes at will; I've deleted coverage and some assertions to tolerate this as intrinsic support is now Gone (TM), * In sroa-extract-bits.ll, the order of debug records flips. This is because the autoupgrader upgrades in the opposite order to the basic block conversion routines... which doesn't change the record order, but _does_ change the use list order in Metadata! This should (TM) have no consequence to the correctness of LLVM, but will change the order of various records and the order of DWARF record output too. I tried to reduce this patch to a smaller collection of changes, but they're all intertwined, sorry.
2025-06-08	[ARM] Handle roundeven for MVE. (#142557)	David Green	1	-0/+6
	Now that #141786 handles scalar and neon types, this adds MVE definitions and legalization for llvm.roundeven intrinsics. The existing llvm.arm.mve.vrintn are auto-upgraded to llvm.roundeven like other vrint instructions, so should continue to work.
2025-05-22	Reland "[NVPTX] Unify and extend barrier{.cta} intrinsic support" (#141143)	Alex MacLean	1	-6/+29
	Note: This relands #140615 adding a ".count" suffix to the non-".all" variants. Our current intrinsic support for barrier intrinsics is confusing and incomplete, with multiple intrinsics mapping to the same instruction and intrinsic names not clearly conveying intrinsic semantics. Further, we lack support for some variants. This change unifies the IR representation to a single consistently named set of intrinsics. - llvm.nvvm.barrier.cta.sync.aligned.all(i32) - llvm.nvvm.barrier.cta.sync.aligned.count(i32, i32) - llvm.nvvm.barrier.cta.arrive.aligned.count(i32, i32) - llvm.nvvm.barrier.cta.sync.all(i32) - llvm.nvvm.barrier.cta.sync.count(i32, i32) - llvm.nvvm.barrier.cta.arrive.count(i32, i32) The following Auto-Upgrade rules are used to maintain compatibility with IR using the legacy intrinsics: * llvm.nvvm.barrier0 --> llvm.nvvm.barrier.cta.sync.aligned.all(0) * llvm.nvvm.barrier.n --> llvm.nvvm.barrier.cta.sync.aligned.all(x) * llvm.nvvm.bar.sync --> llvm.nvvm.barrier.cta.sync.aligned.all(x) * llvm.nvvm.barrier --> llvm.nvvm.barrier.cta.sync.aligned.count(x, y) * llvm.nvvm.barrier.sync --> llvm.nvvm.barrier.cta.sync.all(x) * llvm.nvvm.barrier.sync.cnt --> llvm.nvvm.barrier.cta.sync.count(x, y)
2025-05-22	Revert "[NVPTX] Unify and extend barrier{.cta} intrinsic support (#140615)"	Alex Maclean	1	-28/+6
	This reverts commit 735209c0688b10a66c24750422b35d8c2ad01bb5.
2025-05-21	[NVPTX] Unify and extend barrier{.cta} intrinsic support (#140615)	Alex MacLean	1	-6/+28
	Our current intrinsic support for barrier intrinsics is confusing and incomplete, with multiple intrinsics mapping to the same instruction and intrinsic names not clearly conveying intrinsic semantics. Further, we lack support for some variants. This change unifies the IR representation to a single consistently named set of intrinsics. - llvm.nvvm.barrier.cta.sync.aligned.all(i32) - llvm.nvvm.barrier.cta.sync.aligned(i32, i32) - llvm.nvvm.barrier.cta.arrive.aligned(i32, i32) - llvm.nvvm.barrier.cta.sync.all(i32) - llvm.nvvm.barrier.cta.sync(i32, i32) - llvm.nvvm.barrier.cta.arrive(i32, i32) The following Auto-Upgrade rules are used to maintain compatibility with IR using the legacy intrinsics: * llvm.nvvm.barrier0 --> llvm.nvvm.barrier.cta.sync.aligned.all(0) * llvm.nvvm.barrier.n --> llvm.nvvm.barrier.cta.sync.aligned.all(x) * llvm.nvvm.bar.sync --> llvm.nvvm.barrier.cta.sync.aligned.all(x) * llvm.nvvm.barrier --> llvm.nvvm.barrier.cta.sync.aligned(x, y) * llvm.nvvm.barrier.sync --> llvm.nvvm.barrier.cta.sync.all(x) * llvm.nvvm.barrier.sync.cnt --> llvm.nvvm.barrier.cta.sync(x, y)
2025-05-19	[AMDGPU] Set AS8 address width to 48 bits	Alexander Richardson	1	-1/+4
	Of the 128-bits of buffer descriptor only 48 bits are address bits, so following the discussion on https://discourse.llvm.org/t/clarifiying-the-semantics-of-ptrtoint/83987/54, the logic conclusion is to set the index width to 48 bits instead of the current value of 128. Most of the test changes are mechanical datalayout updates, but there is one actual change: the ptrmask test now uses .i48 instead of .i128 and I had to update SelectionDAGBuilder to correctly extend the mask. Reviewed By: krzysz00 Pull Request: https://github.com/llvm/llvm-project/pull/139419
2025-05-15	[NFC][AutoUpgrade] Use `ConstantPointerNull::get` instead of ↵	Shilei Tian	1	-9/+9
	`Constant::getNullValue` for known pointer types (#139984) This is a preparation change for upcoming PRs that will update the semantics of `ConstantPointerNull`, making it to represent an actual `nullptr` rather than a zero-valued pointer.
2025-05-14	Fix-forward build error from #132489	Thurston Dang	1	-1/+1
	Replace deprecated use of getDeclaration that was added in #132489 llvm/lib/IR/AutoUpgrade.cpp:1480:26: error: 'getDeclaration' is deprecated: Use getOrInsertDeclaration instead [-Werror,-Wdeprecated-declarations] 1480 \| NewFn = Intrinsic::getDeclaration( \| ^~~~~~~~~~~~~~ \| getOrInsertDeclaration
2025-05-14	[clang][IR] Overload @llvm.thread.pointer to support non-AS0 targets (#132489)	Jessica Clarke	1	-2/+10
	Thread-local globals live, by default, in the default globals address space, which may not be 0, so we need to overload @llvm.thread.pointer to support other address spaces, and use the default globals address space in Clang.
2025-05-08	[NVPTX] Fixup AutoUpgrade of llvm.nvvm.atomic.load.{inc,dec}.32 (#138907)	Alex MacLean	1	-8/+13
	The previous implementation failed to account for the fact that these intrinsics have an overloaded pointer type. This version handles the pointer type and adds tests for llvm.nvvm.atomic.load.add.{f32,f64}.
2025-05-02	[IRBuilder] Add versions of createInsertVector/createExtractVector that take ↵	Craig Topper	1	-4/+2
	a uint64_t index. (#138324) Most callers want a constant index. Instead of making every caller create a ConstantInt, we can do it in IRBuilder. This is similar to createInsertElement/createExtractElement.
2025-04-25	[NVPTX] Remove 'param' variants of nvvm.ptr.* intrinics (#137065)	Alex MacLean	1	-14/+12
	After #136008 these intrinsics are no longer inserted by the compiler and can be upgraded to addrspacecasts.
2025-04-22	[NVPTX] Add support for Shared Cluster Memory address space [2/2] (#136768)	modiking	1	-0/+83
	Adds support for new Shared Cluster Memory Address Space (SHARED_CLUSTER, addrspace 7). See https://docs.nvidia.com/cuda/cuda-c-programming-guide/index.html#distributed-shared-memory for details. Follow-up to https://github.com/llvm/llvm-project/pull/135444 1. Update existing codegen/intrinsics in LLVM and MLIR that now use this address space 2. Auto-upgrade previous intrinsics that used SMEM (addrspace 3) but were really taking in a shared cluster pointer to the new address space
2025-04-17	[NVPTX] Add auto-upgrade rules for fabs.{f,d,ftz.f} (#136150)	Alex MacLean	1	-0/+7
	These auto-upgrade rules are required after these intrinsics were removed in #135644
2025-04-17	[NVPTX] Cleanup and document nvvm.fabs intrinsics, adding f16 support (#135644)	Alex MacLean	1	-7/+9
	This change unifies the NVVM intrinsics for floating point absolute value into two new overloaded intrinsics "llvm.nvvm.fabs." and "llvm.nvvm.fabs.ftz.". Documentation has been added specifying the semantics of these intrinsics to clarify how they differ from "llvm.fabs.*". In addition, support for these new intrinsics is extended to cover the f16 variants.
2025-04-17	[Intrinsics] Add support for range attributes (#135642)	Nikita Popov	1	-2/+7
	Add support for specifying range attributes in Intrinsics.td. Use this to specify the ucmp/scmp range [-1,2). This case is trickier than existing intrinsic attributes, because we need to create the attribute with the correct bitwidth. As such, the attribute construction now needs to be aware of the function type. We also need to be careful to no longer assign attributes on intrinsics with invalid signatures, as we'd make invalid assumptions about the number of arguments etc otherwise. Fixes https://github.com/llvm/llvm-project/issues/130179.
2025-04-08	[NVPTX] Auto-Upgrade llvm.nvvm.atomic.load.{inc,dec}.32 (#134111)	Alex MacLean	1	-0/+9
	These intrinsics can be upgrade to an atomicrmw instruction.
2025-03-31	[IRBuilder] Add new overload for CreateIntrinsic (#131942)	Rahul Joshi	1	-26/+26
	Add a new `CreateIntrinsic` overload with no `Types`, useful for creating calls to non-overloaded intrinsics that don't need additional mangling.
2025-03-24	[NVPTX] Auto-Upgrade !"align" metadata on return values to stackalign (#131726)	Alex MacLean	1	-3/+0
	This commit follows up 0191307b by auto-upgrading !"align" metadata on return values to stackalign. This allows us to remove all logic to check the metadata from NVPTXUtilities.
2025-03-20	[NVPTX] Auto-Upgrade llvm.nvvm.swap.lo.hi.b64 to llvm.fshl (#132098)	Alex MacLean	1	-1/+7
	After 3c8c2914e067e132af951f70d2b3577fe049e19a the lowering of 64-bit funnel shifts has been improved to the point where this intrinsic is no longer needed.
2025-03-14	[NFC][llvm/IR] comparison of unsigned expression in ‘>= 0’ is always ↵	AdityaK	1	-1/+1
	true (#130843)
2025-03-14	Add IRBuilder::CreateFMA (#131112)	Frederik Harwath	1	-2/+2
	This commit adds a function for creating fma intrinsic calls to the IRBuilder. If the "IsFPConstrained" flag of the builder is set, the function creates a call to "experimental.constrained.fma" instead of "llvm.fma" . To support the creation of the constrained intrinsic, a function "CreateConstrainedFPIntrinsic" is introduced.
2025-02-26	[NVPTX] Convert vector function nvvm.annotations to attributes (#127736)	Alex MacLean	1	-0/+52
	Replace some more nvvm.annotations with function attributes, auto-upgrading the annotations as needed. These new attributes will be more idiomatic and compile-time efficient than the annotations. - !"maxntid[xyz]" -> "nvvm.maxntid" - !"reqntid[xyz]" -> "nvvm.reqntid" - !"cluster_dim_[xyz]" -> "nvvm.cluster_dim"
2025-02-12	[NVPTX] Convert scalar function nvvm.annotations to attributes (#125908)	Alex MacLean	1	-0/+16
	Replace some more nvvm.annotations with function attributes, auto-upgrading the annotations as needed. These new attributes will be more idiomatic and compile-time efficient than the annotations. - !"maxclusterrank" / !"cluster_max_blocks" -> "nvvm.maxclusterrank" - !"minctasm" -> "nvvm.minctasm" - !"maxnreg" -> "nvvm.maxnreg"
2025-01-29	[NVPTX] Auto-Upgrade some nvvm.annotations to attributes (#119261)	Alex MacLean	1	-0/+67
	Add a new AutoUpgrade function to convert some legacy nvvm.annotations metadata to function level attributes. These attributes are quicker to look-up so improve compile time and are more idiomatic than using metadata which should not include required information that changes the meaning of the program. Currently supported annotations are: - !"kernel" -> ptx_kernel calling convention - !"align" -> alignstack parameter attributes (return not yet supported)
2025-01-21	[AArch64] Improve bcvtn2 and remove aarch64_neon_bfcvt intrinsics (#120363)	David Green	1	-25/+61
	This started out as trying to combine bf16 fpround to BFCVT2 instructions, but ended up removing the aarch64.neon.nfcvt intrinsics in favour of generating fpround instructions directly. This simplifies the patterns and can lead to other optimizations. The BFCVT2 instruction is adjusted to makes sure the types are valid, and a bfcvt2 is now generated in more place. The old intrinsics are auto-upgraded to fptrunc instructions too.
2025-01-20	[AutoUpgrade] Avoid unnecessary pointer bitcasts (NFCI)	Nikita Popov	1	-52/+13
	Not needed with opaque pointers.
2024-12-10	[WebAssembly] Add `-i128:128` to the `datalayout` string. (#119204)	Dan Gohman	1	-1/+2
	Clang [defaults to aligning `__int128_t` to 16 bytes], while LLVM `datalayout` strings [default to aligning `i128` to 8 bytes]. Wasm is currently using the defaults for both, so it's inconsistent. Fix this by adding `-i128:128` to Wasm's `datalayout` string so that it aligns `i128` to 16 bytes too. This is similar to [llvm/llvm-project@dbad963](https://github.com/llvm/llvm-project/commit/dbad963a69fd7b16c6838f81b61167fbf00a413c) for SPARC. This fixes rust-lang/rust#133991; see that issue for further discussion. [defaults to aligning `__int128_t` to 16 bytes]: https://github.com/llvm/llvm-project/blob/f8b4182f076f8fe55f9d5f617b5a25008a77b22f/clang/lib/Basic/TargetInfo.cpp#L77 [default to aligning `i128` to 8 bytes]: https://llvm.org/docs/LangRef.html#langref-datalayout
2024-12-09	[PowerPC] Update data layout aligment of i128 to 16 (#118004)	Lei Huang	1	-1/+1
	Fix 64-bit PowerPC part of https://github.com/llvm/llvm-project/issues/102783.
2024-11-14	[IR] Vector extract last active element intrinsic (#113587)	Graham Hunter	1	-0/+3
	As discussed in #112738, it may be better to have an intrinsic to represent vector element extracts based on mask bits. This intrinsic is for the case of extracting the last active element, if any, or a default value if the mask is all-false. The target-agnostic SelectionDAG lowering is similar to the IR in #106560.