|
|
|
(#159765)
Fixes #159571
|
|
This ensures each scalarized member has an accurate cost, matching the
cost it would have had if it had not been considered for an interleave
group.
|
|
For UDiv/SDiv with invariant divisors, the created selects will be
hoisted out. Don't compute their cost for each iteration, to match the
more accurate VPlan-based cost modeling.
Fixes https://github.com/llvm/llvm-project/issues/159402.
|
|
|
|
Loads of addresses are scalarized and have their costs computed w/o
scalarization overhead. Consistently apply this logic also to
non-uniform loads that are already scalarized, to ensure their costs are
consistent with other scalarized loads that are used as addresses.
|
|
We were hitting an assert discovered in https://github.com/llvm/llvm-project/pull/157768#issuecomment-3315359832
|
|
|
|
There are a couple of places during function cloning where we may create
new callsite clone nodes. One of those places was correctly propagating
the assignment to which function clone it should call, and one was not.
Refactor this handling into a helper and use in both places so the newly
created callsite clones actually call the assigned callee function
clones.
|
|
Fixes #148052.
The last PR did not account for the scenario where more than one
instruction uses the `catchpad` label.
In that case, uses that had already been chosen for iteration by the
early-increment iterator were deleted. This issue was not visible in a
normal release build on x86, but was fortunately caught later by the
address sanitizer build on the buildbot.
Here is the diff from the last version of this PR: #158435
```diff
diff --git a/llvm/lib/Transforms/Utils/BasicBlockUtils.cpp b/llvm/lib/Transforms/Utils/BasicBlockUtils.cpp
index 91e245e5e8f5..1dd8cb4ee584 100644
--- a/llvm/lib/Transforms/Utils/BasicBlockUtils.cpp
+++ b/llvm/lib/Transforms/Utils/BasicBlockUtils.cpp
@@ -106,7 +106,8 @@ void llvm::detachDeadBlocks(ArrayRef<BasicBlock *> BBs,
// first block, the we would have possible cleanupret and catchret
// instructions with poison arguments, which wouldn't be valid.
if (isa<FuncletPadInst>(I)) {
- for (User *User : make_early_inc_range(I.users())) {
+ SmallPtrSet<BasicBlock *, 4> UniqueEHRetBlocksToDelete;
+ for (User *User : I.users()) {
Instruction *ReturnInstr = dyn_cast<Instruction>(User);
// If we have a cleanupret or catchret block, replace it with just an
// unreachable. The other alternative, that may use a catchpad is a
@@ -114,33 +115,12 @@ void llvm::detachDeadBlocks(ArrayRef<BasicBlock *> BBs,
if (isa<CatchReturnInst>(ReturnInstr) ||
isa<CleanupReturnInst>(ReturnInstr)) {
BasicBlock *ReturnInstrBB = ReturnInstr->getParent();
- // This catchret or catchpad basic block is detached now. Let the
- // successors know it.
- // This basic block also may have some predecessors too. For
- // example the following LLVM-IR is valid:
- //
- // [cleanuppad_block]
- // |
- // [regular_block]
- // |
- // [cleanupret_block]
- //
- // The IR after the cleanup will look like this:
- //
- // [cleanuppad_block]
- // |
- // [regular_block]
- // |
- // [unreachable]
- //
- // So regular_block will lead to an unreachable block, which is also
- // valid. There is no need to replace regular_block with unreachable
- // in this context now.
- // On the other hand, the cleanupret/catchret block's successors
- // need to know about the deletion of their predecessors.
- emptyAndDetachBlock(ReturnInstrBB, Updates, KeepOneInputPHIs);
+ UniqueEHRetBlocksToDelete.insert(ReturnInstrBB);
}
}
+ for (BasicBlock *EHRetBB :
+ make_early_inc_range(UniqueEHRetBlocksToDelete))
+ emptyAndDetachBlock(EHRetBB, Updates, KeepOneInputPHIs);
}
}
```
|
|
The split in this code path was left over from when we had to support
the old PM and the new PM at the same time. Now that the legacy pass has
been dropped, this simplifies the code a little bit and swaps pointers
for references in a couple places.
Reviewers: aeubanks, efriedma-quic, wlei-llvm
Reviewed By: aeubanks
Pull Request: https://github.com/llvm/llvm-project/pull/159858
|
|
This patch introduces a new optimization in SROA that handles the
pattern where multiple non-overlapping vector `store`s completely fill
an `alloca`.
The current approach to handle this pattern introduces many `.vecexpand`
and `.vecblend` instructions, which can dramatically slow down
compilation when dealing with large `alloca`s built from many small
vector `store`s. For example, consider an `alloca` of type `<128 x
float>` filled by 64 `store`s of `<2 x float>` each. The current
implementation requires:
- 64 `shufflevector`s (`.vecexpand`)
- 64 `select`s (`.vecblend`)
- All operations use masks of size 128
- These operations form a long dependency chain
This kind of IR is both difficult to optimize and slow to compile,
particularly impacting the `InstCombine` pass.
This patch introduces a tree-structured merge approach that
significantly reduces the number of operations and improves compilation
performance.
Key features:
- Detects when vector `store`s completely fill an `alloca` without gaps
- Ensures no loads occur in the middle of the store sequence
- Uses a tree-based approach with `shufflevector`s to merge stored
values
- Reduces the number of intermediate operations compared to linear
merging
- Eliminates the long dependency chains that hurt optimization
Example transformation:
```
// Before: (stores do not have to be in order)
%alloca = alloca <8 x float>
store <2 x float> %val0, ptr %alloca ; offset 0-1
store <2 x float> %val2, ptr %alloca+16 ; offset 4-5
store <2 x float> %val1, ptr %alloca+8 ; offset 2-3
store <2 x float> %val3, ptr %alloca+24 ; offset 6-7
%result = load <8 x float>, ptr %alloca
// After (tree-structured merge):
%shuffle0 = shufflevector %val0, %val1, <4 x i32> <i32 0, i32 1, i32 2, i32 3>
%shuffle1 = shufflevector %val2, %val3, <4 x i32> <i32 0, i32 1, i32 2, i32 3>
%result = shufflevector %shuffle0, %shuffle1, <8 x i32> <i32 0, i32 1, i32 2, i32 3, i32 4, i32 5, i32 6, i32 7>
```
Benefits:
- Logarithmic depth (O(log n)) instead of linear dependency chains
- Fewer total operations for large vectors
- Better optimization opportunities for subsequent passes
- Significant compilation time improvements for large vector patterns
For some large cases, the compile time can be reduced from about 60s to
less than 3s.
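The tree-structured merge above can be sketched with plain Python lists standing in for LLVM vector values; the chunk layout and names are illustrative assumptions, not the patch's actual code:

```python
# Sketch of the tree-structured merge: each "store" contributes one
# chunk; chunks are merged pairwise, level by level, instead of being
# blended one by one into a full-width accumulator.

def tree_merge(chunks):
    """Merge equally sized chunks pairwise until one value remains.

    Assumes a power-of-two number of chunks. Depth is O(log n) in the
    number of chunks, versus O(n) for the linear expand/blend chain.
    """
    level = list(chunks)
    depth = 0
    while len(level) > 1:
        # Each concatenation models a shufflevector of two half-width
        # values into one wider value.
        level = [level[i] + level[i + 1] for i in range(0, len(level), 2)]
        depth += 1
    return level[0], depth

# 64 stores of <2 x float> filling a <128 x float> alloca:
chunks = [[2 * i, 2 * i + 1] for i in range(64)]
merged, depth = tree_merge(chunks)
assert merged == list(range(128))
assert depth == 6  # log2(64) levels instead of a 64-deep chain
```

The same pairwise structure is what lets subsequent passes see short, independent shuffle chains rather than one long dependency chain.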
---------
Co-authored-by: chengjunp <chengjunp@nividia.com>
|
|
Pass operand info to getMemoryOpCost in getMemInstScalarizationCost.
This matches the behavior in VPReplicateRecipe::computeCost.
|
|
The `NDEBUG` macro is tested for defined-ness everywhere else. The
instance here triggers a warning when compiling with `-Wundef`.
|
|
when preserving inbounds (#159515)
If we know that the initial GEP was inbounds, and we change it to a sequence of
GEPs from the same base pointer where every offset is non-negative, then the
new GEPs are inbounds. So far, the implementation only checked if the extracted
offsets are non-negative. In cases where non-extracted offsets can be negative,
this would cause the inbounds flag to be wrongly preserved.
Fixes an issue in #130617 found by nikic.
|
|
ConstantExpr addrspacecast (#159695)
This PR extends isSafeToCastConstAddrSpace to treat
ConstantAggregateZero like ConstantPointerNull.
Tests show that an extra addrspacecast instruction is removed and that
the icmp pointer vector operand's address space is now inferred.
This change is motivated by inspecting the test in commit f7629f5945f6.
|
|
embed in MemIntrinsicInfo" (#159700)
Reverts llvm/llvm-project#157863
|
|
MemIntrinsicInfo (#157863)
Previously, asan treated target intrinsics as black boxes, so it could
not instrument accurate checks. This patch makes
SmallVector<InterestingMemoryOperand> a member of MemIntrinsicInfo so
that TTI lets targets describe their intrinsics to asan.
Note:
1. This patch moves InterestingMemoryOperand from Transforms to Analysis.
2. It extends MemIntrinsicInfo by adding a
SmallVector<InterestingMemoryOperand> member.
3. It does not support RVV indexed/segment load/store.
|
|
Previously, an undef pointer operand was only supported for select
instructions, where undef in the generic AS behaves like `take the
other side`.
This PR extends the support to other instructions, e.g. phi
instructions. Joining and inferring of a constant pointer operand are
deferred until all other operands' AS states have been considered.
---------
Co-authored-by: Matt Arsenault <arsenm2@gmail.com>
|
|
Always add pointers proved to be uniform via legal/SCEV to worklist.
This extends the existing logic to handle a few more pointers known to
be uniform.
|
|
operands were copyable
If all operands of the non-schedulable nodes were previously only
copyable, the dependencies of the original schedule data for such
copyable operands need to be cleared and recalculated to correctly
handle the number of dependencies.
Fixes #159406
|
|
A live-in constant can never be of vector type.
|
|
After https://github.com/llvm/llvm-project/pull/153643, there may be a
BranchOnCond with constant condition in the entry block.
Simplify those in removeBranchOnConst. This removes a number of
redundant conditional branches from entry blocks.
In some cases, it may also make the original scalar loop unreachable,
because we know it will never execute. In that case, we need to remove
the loop from LoopInfo, because all unreachable blocks may dominate each
other, making LoopInfo invalid. In those cases, we can also completely
remove the loop, for which I'll share a follow-up patch.
Depends on https://github.com/llvm/llvm-project/pull/153643.
PR: https://github.com/llvm/llvm-project/pull/154510
|
|
`__NoopCoro_ResumeDestroy` currently has private linkage, which causes
[issues for
Arm64EC](https://github.com/llvm/llvm-project/issues/158341). The
Arm64EC lowering is trying to mangle and add thunks for
`__NoopCoro_ResumeDestroy`, since it sees that its address is taken
(and, therefore, might be called from x64 code via a function pointer).
MSVC's linker requires that the function be placed in COMDAT (`LNK1361:
non COMDAT symbol '.L#__NoopCoro_ResumeDestroy' in hybrid binary`) which
trips an assert in the verifier (`comdat global value has private
linkage`) and the subsequent linking step fails since the private symbol
isn't in the symbol table.
Since there is no reason to use private linkage for
`__NoopCoro_ResumeDestroy` and other coro related functions have also
been [switched to internal linkage to improve
debugging](https://github.com/llvm/llvm-project/pull/151224), this
change switches to using internal linkage.
Fixes #158341
|
|
Splitting out just the recipe finding code from #148626 into a utility
function (along with the extra pattern matchers). Hopefully this makes
reviewing a bit easier.
Added a gtest, since this isn't actually used anywhere yet.
|
|
This adds a new pass for dropping assumes that are unlikely to be useful
for further optimization.
It works by discarding any assumes whose affected values are one-use
(which implies that they are only used by the assume, i.e. ephemeral).
This pass currently runs at the start of the module optimization
pipeline, that is post-inline and post-link. Before that point, it is
more likely for previously "useless" assumes to become useful again,
e.g. because an additional user of the value is introduced after
inlining + CSE.
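The one-use heuristic can be modeled with a toy use-list; the `Value` class below is a hypothetical stand-in for LLVM's def-use chains, not its API:

```python
# Toy model: an assume is considered useless when each of its affected
# values has the assume as its only user, i.e. the values are ephemeral.

class Value:
    def __init__(self, name):
        self.name = name
        self.users = []

def assume_is_droppable(assume, affected):
    # One-use check: every affected value is used only by the assume.
    return all(v.users == [assume] for v in affected)

# A compare feeds only the assume: ephemeral, so droppable.
x = Value("x")
cmp_ = Value("cmp")
x.users.append(cmp_)
assume = object()
cmp_.users.append(assume)
assert assume_is_droppable(assume, [cmp_]) is True

# If another user of the compare appears (e.g. after inlining + CSE),
# the assume is useful again and must be kept.
cmp_.users.append(Value("other"))
assert assume_is_droppable(assume, [cmp_]) is False
```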
|
|
(#159516)
No loop pass seems to use it now after LoopPredication stopped using it
in https://reviews.llvm.org/D111668.
|
|
|
|
Follow up on 7fb3a91 ([PatternMatch] Introduce match functor) to
introduce the VPlanPatternMatch version of the match functor to shorten
some idioms.
Co-authored-by: Luke Lau <luke@igalia.com>
|
|
Function analyses in LoopStandardAnalysisResults are marked as preserved
by the loop pass adaptor, because LoopAnalysisManagerFunctionProxy
manually invalidates most of them.
However the proxy doesn't invalidate BFI, since it is only preserved on
a "lossy" basis: see https://reviews.llvm.org/D86156 and
https://reviews.llvm.org/D110438.
So any changes to the CFG will result in BFI giving incorrect results,
which is fine for loop passes which deal with the lossiness.
But the loop pass adaptor still marks it as preserved, which causes the
lossy result to leak out into function passes.
This causes incorrect results when viewed from e.g. LoopVectorizer,
where an innermost loop header may be reported to have a smaller
frequency than its successors.
Fix this by dropping the call to preserve, and add a test with the -O1
pipeline which shows the effects whenever the CFG is changed and
UseBlockFrequencyInfo is set.
I've also dropped it for BranchProbabilityAnalysis too, but I couldn't
test for it since UseBranchProbabilityInfo always seems to be false?
This may be dead code.
|
|
(#158498)
Closes https://github.com/llvm/llvm-project/issues/158326.
Closes https://github.com/llvm/llvm-project/issues/59555.
Proof for `(X & -Pow2) == C -> (X - C) < Pow2`:
https://alive2.llvm.org/ce/z/HMgkuu
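The linked alive2 proof can also be checked by brute force. This sketch fixes an 8-bit width and restricts C to multiples of Pow2; both are assumptions of the sketch (for unaligned C the equality is trivially false):

```python
# Brute-force check of `(X & -Pow2) == C  ->  (X - C) u< Pow2` over all
# 8-bit values, mirroring the alive2 proof linked above.

BITS = 8
MOD = 1 << BITS

def lhs(x, pow2, c):
    return (x & (-pow2 % MOD)) == c   # X & -Pow2 == C

def rhs(x, pow2, c):
    return (x - c) % MOD < pow2       # (X - C) u< Pow2, unsigned wrap

for k in range(BITS):
    pow2 = 1 << k
    for c in range(0, MOD, pow2):     # aligned constants only
        for x in range(MOD):
            assert lhs(x, pow2, c) == rhs(x, pow2, c)
```

The unsigned-wrapping subtraction is what turns the bit-mask equality into a single range check.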
|
|
A common idiom is the usage of the PatternMatch match function within a
functional algorithm like all_of. Introduce a match functor to shorten
this idiom.
Co-authored-by: Luke Lau <luke@igalia.com>
|
|
|
|
If we know x in R1, the range check `x in R2` can be relaxed into `x in
Union(R2, Inverse(R1))`. The latter one may be more efficient if we can
represent it with one icmp.
Fixes regressions introduced by
https://github.com/llvm/llvm-project/pull/156497.
Proof for `(X & -Pow2) == C -> (X - C) < Pow2`:
https://alive2.llvm.org/ce/z/HMgkuu
Compile-time impact:
https://llvm-compile-time-tracker.com/compare.php?from=ead4f3e271fdf6918aef2ede3a7134811147d276&to=bee3d902dd505cf9b11499ba4f230e4e8ae96b92&stat=instructions%3Au
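The relaxation can be modeled with small integer sets; the concrete ranges below are illustrative assumptions, not taken from the patch:

```python
# Model: when x is known to lie in R1, checking `x in R2` is equivalent
# to checking `x in Union(R2, Inverse(R1))`, which may be representable
# as a single contiguous (possibly wrapping) range, i.e. one icmp, even
# when R2 alone is not the cheapest form.

DOMAIN = set(range(16))

def inverse(r):
    return DOMAIN - r

R1 = set(range(2, 10))        # known fact: x in [2, 10)
R2 = set(range(0, 6))         # wanted check: x in [0, 6)
relaxed = R2 | inverse(R1)    # [0, 6) u [10, 16)

# The two checks agree on every value x can actually take:
for x in R1:
    assert (x in R2) == (x in relaxed)
```

Here `relaxed` is the wrapping range [10, 6), checkable with one unsigned compare, while the original pair of checks needed two.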
|
|
Consider the interleave count (IC) when deciding if epilogue
vectorization is profitable for scalable vectors, same as for
fixed-width vectors.
|
|
The partial reduction intrinsics are no longer experimental, because
they've been used in production for a while and are unlikely to change.
|
|
(#159292)
…ad blocks" (#158435)"
This reverts commit 41cef78227eb909181cb9360099b2d92de8d649f.
|
|
(#158435)
When removing EH Pad blocks, the value defined by them becomes poison. These poison values are then used by `catchret` and `cleanupret`, which is invalid. This commit replaces those unreachable `catchret` and `cleanupret` instructions with `unreachable`.
|
|
of llvm.coro.end #153404"" (#159236)
Reverts llvm/llvm-project#155339 because of a CI failure
|
|
llvm.coro.end #153404" (#155339)
As mentioned in #151067, the current design of llvm.coro.end mixes two
functionalities: querying where we are and lowering to some code. This
patch separates these functionalities into independent intrinsics by
introducing a new intrinsic, llvm.coro.is_in_ramp.
|
|
update if existing prefix is not equivalent to the new one. Returns whether prefix changed." (#159161)
This is a reland of https://github.com/llvm/llvm-project/pull/158460
Test failures are gone once I undo the changes in codegenprepare.
|
|
update if existing prefix is not equivalent to the new one. Returns whether prefix changed." (#159159)
Reverts llvm/llvm-project#158460 due to buildbot failures
|
|
existing prefix is not equivalent to the new one. Returns whether prefix changed. (#158460)
Before this change, `setSectionPrefix` overwrites existing section
prefix with new one unconditionally.
After this change, `setSectionPrefix` checks for equivalences, updates
conditionally and returns whether an update happens.
Update the existing callers to make use of the return value. [PR
155337](https://github.com/llvm/llvm-project/pull/155337/files#diff-cc0c67ac89807f4453f0cfea9164944a4650cd6873a468a0f907e7158818eae9)
is a motivating use case where the 'update' semantics are needed.
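The new contract can be sketched as a conditional setter; the `Function` stand-in below is a toy, not LLVM's `Function` API:

```python
# Sketch of the described setSectionPrefix contract: update only when
# the prefix actually differs, and report whether an update happened.

class Function:
    def __init__(self):
        self._section_prefix = None

    def set_section_prefix(self, prefix):
        """Set the prefix only if it differs; return True on change."""
        if self._section_prefix == prefix:
            return False
        self._section_prefix = prefix
        return True

f = Function()
assert f.set_section_prefix("hot") is True       # first assignment
assert f.set_section_prefix("hot") is False      # equivalent: no-op
assert f.set_section_prefix("unlikely") is True  # different: change
```

Callers can use the returned flag to decide whether downstream state (e.g. cached layout decisions) needs refreshing.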
|
|
To avoid recalculating the stride of a strided load twice, save it in
a map.
|
|
The way that loop strength reduction works is that the target has to
decide upfront whether it wants its addressing to be preindex,
postindex, or neither. This choice affects:
* Which potential solutions we generate
* Whether we consider a pre/post index load/store as costing an AddRec
or not.
None of these choices are a good fit for either AArch64 or ARM, where
both preindex and postindex addressing are typically free:
* If we pick None then we count pre/post index addressing as costing one
AddRec more than is correct, so we don't pick them when we should.
* If we pick PreIndexed or PostIndexed then we get the correct cost for
that addressing type, but still get it wrong for the other and also
exclude potential solutions using offset addressing that could have less
cost.
This patch adds an "all" addressing mode that causes all potential
solutions to be generated and counts both pre- and postindex as having
an AddRecCost of zero. Unfortunately this reveals problems elsewhere in
how we calculate the cost of things, which need to be fixed before we
can make use of it.
|
|
This patch adds a class that uses SSA construction, with debug values as
definitions, to determine whether and which debug values for a
particular variable are live at each point in an IR function. This will
be used by the IR reader of llvm-debuginfo-analyzer to compute variable
ranges and coverage, although it may be applicable to other debug info
IR analyses.
|
|
The motivation for this patch is to close the gap between the
VPlan-based CSE and the legacy CSE, to make it easier to remove the
legacy CSE. Before this patch, stubbing out the legacy CSE leads to 22
test failures, and after this patch, there are only 12 failures, and all
of them seem to have a single root cause:
VPlanTransforms::createInterleaveGroups() and
VPInterleaveGroup::execute(). The improvements from this patch are of
course welcome.
While developing the patch, a miscompile was found when GEP
source-element-types differ, and this has been fixed.
Co-authored-by: Florian Hahn <flo@fhahn.com>
Co-authored-by: Luke Lau <luke@igalia.com>
|
|
This has been handy locally for debugging cloning issues. The NodeIds
are assigned sequentially on creation and included in the dumps and the
dot graphs.
No measurable memory increase was found for a large thin link.
I only changed one test
(Transforms/MemProfContextDisambiguation/basic.ll)
to actually check the emitted NodeIds; most ignore them.
|
|
Account for predicated UDiv,SDiv,URem,SRem in
VPReplicateRecipe::computeCost: compute costs of extra phis and apply
getPredBlockCostDivisor.
Fixes https://github.com/llvm/llvm-project/issues/158660
|
|
operators (#158375)
Tracking issue: #147390
|