path: root/llvm/lib/Transforms
Age | Commit message | Author | Files | Lines
8 days[VPlan] Add WidenGEP::getSourceElementType (NFC) (#159029)Ramkumar Ramachandra3-17/+21
8 days[Coroutines] Take byval param alignment into account when spilling to frame ↵Hans Wennborg1-4/+8
(#159765) Fixes #159571
8 days[LV] Set correct costs for interleave group members.Florian Hahn1-3/+12
This ensures each scalarized member has an accurate cost, matching the cost it would have if it had not been considered for an interleave group.
8 days[LV] Skip select cost for invariant divisors in legacy cost model.Florian Hahn1-8/+10
For UDiv/SDiv with invariant divisors, the created selects will be hoisted out. Don't compute their cost for each iteration, to match the more accurate VPlan-based cost modeling. Fixes https://github.com/llvm/llvm-project/issues/159402.
8 days[VPlanPatternMatch] Introduce m_ConstantInt (#159558)Ramkumar Ramachandra2-5/+33
9 days[LV] Also handle non-uniform scalarized loads when processing AddrDefs.Florian Hahn1-2/+5
Loads of addresses are scalarized and have their costs computed without scalarization overhead. Consistently apply this logic to non-uniform loads that are already scalarized as well, to ensure their costs are consistent with other scalarized loads that are used as addresses.
9 days[InstCombine][nfc] Fix assert failure with function entry count equal to zeroAlan Zhao1-12/+13
We were hitting an assert discovered in https://github.com/llvm/llvm-project/pull/157768#issuecomment-3315359832
9 days[IR] Fix a few implicit conversions from TypeSize to uint64_t. NFC (#159894)Craig Topper1-2/+2
9 days[MemProf] Propagate function call assignments to newly cloned nodes (#159907)Teresa Johnson1-12/+22
There are a couple of places during function cloning where we may create new callsite clone nodes. One of those places was correctly propagating the assignment to which function clone it should call, and one was not. Refactor this handling into a helper and use in both places so the newly created callsite clones actually call the assigned callee function clones.
10 daysReland [BasicBlockUtils] Handle funclets when detaching EH pad blocks (#159379)Gábor Spaits1-28/+69
Fixes #148052. The last PR did not account for the scenario where more than one instruction used the `catchpad` label. In that case, uses were deleted that had already been "chosen to be iterated over" by the early-increment iterator. This issue was not visible in a normal release build on x86, but luckily the address sanitizer build later caught it on the buildbot. Here is the diff from the last version of this PR, #158435:
```diff
diff --git a/llvm/lib/Transforms/Utils/BasicBlockUtils.cpp b/llvm/lib/Transforms/Utils/BasicBlockUtils.cpp
index 91e245e5e8f5..1dd8cb4ee584 100644
--- a/llvm/lib/Transforms/Utils/BasicBlockUtils.cpp
+++ b/llvm/lib/Transforms/Utils/BasicBlockUtils.cpp
@@ -106,7 +106,8 @@ void llvm::detachDeadBlocks(ArrayRef<BasicBlock *> BBs,
       // first block, the we would have possible cleanupret and catchret
       // instructions with poison arguments, which wouldn't be valid.
       if (isa<FuncletPadInst>(I)) {
-        for (User *User : make_early_inc_range(I.users())) {
+        SmallPtrSet<BasicBlock *, 4> UniqueEHRetBlocksToDelete;
+        for (User *User : I.users()) {
           Instruction *ReturnInstr = dyn_cast<Instruction>(User);
           // If we have a cleanupret or catchret block, replace it with just an
           // unreachable. The other alternative, that may use a catchpad is a
@@ -114,33 +115,12 @@ void llvm::detachDeadBlocks(ArrayRef<BasicBlock *> BBs,
           if (isa<CatchReturnInst>(ReturnInstr) ||
               isa<CleanupReturnInst>(ReturnInstr)) {
             BasicBlock *ReturnInstrBB = ReturnInstr->getParent();
-            // This catchret or catchpad basic block is detached now. Let the
-            // successors know it.
-            // This basic block also may have some predecessors too. For
-            // example the following LLVM-IR is valid:
-            //
-            // [cleanuppad_block]
-            //        |
-            // [regular_block]
-            //        |
-            // [cleanupret_block]
-            //
-            // The IR after the cleanup will look like this:
-            //
-            // [cleanuppad_block]
-            //        |
-            // [regular_block]
-            //        |
-            // [unreachable]
-            //
-            // So regular_block will lead to an unreachable block, which is also
-            // valid. There is no need to replace regular_block with unreachable
-            // in this context now.
-            // On the other hand, the cleanupret/catchret block's successors
-            // need to know about the deletion of their predecessors.
-            emptyAndDetachBlock(ReturnInstrBB, Updates, KeepOneInputPHIs);
+            UniqueEHRetBlocksToDelete.insert(ReturnInstrBB);
           }
         }
+        for (BasicBlock *EHRetBB :
+             make_early_inc_range(UniqueEHRetBlocksToDelete))
+          emptyAndDetachBlock(EHRetBB, Updates, KeepOneInputPHIs);
       }
     }
```
10 days[SampleProfile] Always use FAM to get OREAiden Grossman1-14/+9
The split in this code path was left over from when we had to support the old PM and the new PM at the same time. Now that the legacy pass has been dropped, this simplifies the code a little bit and swaps pointers for references in a couple places. Reviewers: aeubanks, efriedma-quic, wlei-llvm Reviewed By: aeubanks Pull Request: https://github.com/llvm/llvm-project/pull/159858
10 days[SROA] Use tree-structure merge to remove alloca (#152793)Chengjun1-7/+306
This patch introduces a new optimization in SROA that handles the pattern where multiple non-overlapping vector `store`s completely fill an `alloca`.

The current approach to handle this pattern introduces many `.vecexpand` and `.vecblend` instructions, which can dramatically slow down compilation when dealing with large `alloca`s built from many small vector `store`s. For example, consider an `alloca` of type `<128 x float>` filled by 64 `store`s of `<2 x float>` each. The current implementation requires:
- 64 `shufflevector`s (`.vecexpand`)
- 64 `select`s (`.vecblend`)
- All operations use masks of size 128
- These operations form a long dependency chain

This kind of IR is both difficult to optimize and slow to compile, particularly impacting the `InstCombine` pass. This patch introduces a tree-structured merge approach that significantly reduces the number of operations and improves compilation performance.

Key features:
- Detects when vector `store`s completely fill an `alloca` without gaps
- Ensures no loads occur in the middle of the store sequence
- Uses a tree-based approach with `shufflevector`s to merge stored values
- Reduces the number of intermediate operations compared to linear merging
- Eliminates the long dependency chains that hurt optimization

Example transformation:
```
// Before: (stores do not have to be in order)
%alloca = alloca <8 x float>
store <2 x float> %val0, ptr %alloca     ; offset 0-1
store <2 x float> %val2, ptr %alloca+16  ; offset 4-5
store <2 x float> %val1, ptr %alloca+8   ; offset 2-3
store <2 x float> %val3, ptr %alloca+24  ; offset 6-7
%result = load <8 x float>, ptr %alloca

// After (tree-structured merge):
%shuffle0 = shufflevector %val0, %val1, <4 x i32> <i32 0, i32 1, i32 2, i32 3>
%shuffle1 = shufflevector %val2, %val3, <4 x i32> <i32 0, i32 1, i32 2, i32 3>
%result = shufflevector %shuffle0, %shuffle1, <8 x i32> <i32 0, i32 1, i32 2, i32 3, i32 4, i32 5, i32 6, i32 7>
```

Benefits:
- Logarithmic depth (O(log n)) instead of linear dependency chains
- Fewer total operations for large vectors
- Better optimization opportunities for subsequent passes
- Significant compilation time improvements for large vector patterns

For some large cases, the compile time can be reduced from about 60s to less than 3s.

---------

Co-authored-by: chengjunp <chengjunp@nividia.com>
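The pairwise merge above can be sketched in a few lines of Python, with plain lists standing in for vectors (a hypothetical model with invented names, not the SROA implementation):

```python
def tree_merge(stores):
    """stores: list of (byte_offset, values) pairs that fill the alloca
    without gaps; values are Python lists standing in for small vectors."""
    # Sort by offset, then merge adjacent pairs level by level, mirroring how
    # the pass builds shufflevectors: O(log n) depth instead of a linear chain.
    level = [vals for _, vals in sorted(stores)]
    while len(level) > 1:
        merged = [level[i] + level[i + 1] for i in range(0, len(level) - 1, 2)]
        if len(level) % 2:  # an odd element carries over to the next level
            merged.append(level[-1])
        level = merged
    return level[0]

# The <8 x float> example from the message: four <2 x float> stores, out of order.
stores = [(0, [0.0, 1.0]), (16, [4.0, 5.0]), (8, [2.0, 3.0]), (24, [6.0, 7.0])]
print(tree_merge(stores))  # [0.0, 1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0]
```

Each concatenation at a level corresponds to one `shufflevector`, so n stores need n-1 shuffles arranged in a log-depth tree rather than a chain of expand/blend pairs.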
10 days[LV] Pass operand info to getMemoryOpCost in getMemInstScalarizationCost.Florian Hahn1-3/+4
Pass operand info to getMemoryOpCost in getMemInstScalarizationCost. This matches the behavior in VPReplicateRecipe::computeCost.
11 daysFix NDEBUG Wundef warning; NFC (#159539)Sven van Haastregt1-1/+1
The `NDEBUG` macro is tested for defined-ness everywhere else. The instance here triggers a warning when compiling with `-Wundef`.
11 days[SeparateConstOffsetFromGEP] Check if non-extracted indices may be negative ↵Fabian Ritter1-6/+7
when preserving inbounds (#159515) If we know that the initial GEP was inbounds, and we change it to a sequence of GEPs from the same base pointer where every offset is non-negative, then the new GEPs are inbounds. So far, the implementation only checked if the extracted offsets are non-negative. In cases where non-extracted offsets can be negative, this would cause the inbounds flag to be wrongly preserved. Fixes an issue in #130617 found by nikic.
11 days[InferAddressSpaces] Mark ConstantAggregateZero as safe to cast to a ↵Wenju He1-1/+1
ConstantExpr addrspacecast (#159695) This PR extends isSafeToCastConstAddrSpace to treat ConstantAggregateZero like ConstantPointerNull. Tests show that an extra addrspacecast instruction is removed and that an icmp pointer vector operand's address space is now inferred. This change is motivated by inspecting the test in commit f7629f5945f6.
11 daysRevert "[TTI][ASan][RISCV] Move InterestingMemoryOperand to Analysis and ↵Florian Mayer1-18/+6
embed in MemIntrinsicInfo" (#159700) Reverts llvm/llvm-project#157863
11 days[TTI][ASan][RISCV] Move InterestingMemoryOperand to Analysis and embed in ↵Hank Chang1-6/+18
MemIntrinsicInfo (#157863) Previously, asan treated target intrinsics as black boxes, so it could not instrument accurate checks. This patch makes SmallVector<InterestingMemoryOperand> a member of MemIntrinsicInfo so that, via TTI, targets can describe their intrinsics to asan. Notes: 1. This patch moves InterestingMemoryOperand from Transforms to Analysis. 2. It extends MemIntrinsicInfo by adding a SmallVector<InterestingMemoryOperand> member. 3. It does not support RVV indexed/segment load/store.
11 days[InferAddressSpaces] Extend undef pointer operand support to phi inst (#159548)Wenju He1-61/+40
Previously, an undef pointer operand was only supported for the select inst, where undef in the generic AS behaves like `take the other side`. This PR extends the support to other instructions, e.g. the phi inst. Joining and inferring of constant pointer operands is deferred until all other operands' AS states have been considered. --------- Co-authored-by: Matt Arsenault <arsenm2@gmail.com>
11 days[LV] Always add uniform pointers to uniforms list.Florian Hahn1-0/+5
Always add pointers proved to be uniform via legal/SCEV to worklist. This extends the existing logic to handle a few more pointers known to be uniform.
11 days[SLP]Clear the operands deps of non-schedulable nodes, if previously all ↵Alexey Bataev1-4/+37
operands were copyable If all operands of a non-schedulable node were previously only copyable, we need to clear the dependencies of the original schedule data for such copyable operands and recalculate them to correctly track the number of dependencies. Fixes #159406
11 days[VPlan] Strip dead code in cst live-in match (NFC) (#159589)Ramkumar Ramachandra1-4/+1
A live-in constant can never be of vector type.
11 days[VPlan] Simplify Plan's entry in removeBranchOnConst. (#154510)Florian Hahn5-59/+95
After https://github.com/llvm/llvm-project/pull/153643, there may be a BranchOnCond with a constant condition in the entry block. Simplify those in removeBranchOnConst. This removes a number of redundant conditional branches from entry blocks. In some cases, it may also make the original scalar loop unreachable, because we know it will never execute. In that case, we need to remove the loop from LoopInfo, because all unreachable blocks may dominate each other, making LoopInfo invalid. In those cases, we can also completely remove the loop, for which I'll share a follow-up patch. Depends on https://github.com/llvm/llvm-project/pull/153643. PR: https://github.com/llvm/llvm-project/pull/154510
11 daysUse internal linkage for __NoopCoro_ResumeDestroy (#159407)Daniel Paoliello1-1/+1
`__NoopCoro_ResumeDestroy` currently has private linkage, which causes [issues for Arm64EC](https://github.com/llvm/llvm-project/issues/158341). The Arm64EC lowering is trying to mangle and add thunks for `__NoopCoro_ResumeDestroy`, since it sees that its address is taken (and, therefore, might be called from x64 code via a function pointer). MSVC's linker requires that the function be placed in COMDAT (`LNK1361: non COMDAT symbol '.L#__NoopCoro_ResumeDestroy' in hybrid binary`), which trips an assert in the verifier (`comdat global value has private linkage`), and the subsequent linking step fails since the private symbol isn't in the symbol table. Since there is no reason to use private linkage for `__NoopCoro_ResumeDestroy`, and other coro-related functions have also been [switched to internal linkage to improve debugging](https://github.com/llvm/llvm-project/pull/151224), this change switches to using internal linkage. Fixes #158341
11 days[LV] Provide utility routine to find uncounted exit recipes (#152530)Graham Hunter4-0/+140
Splitting out just the recipe finding code from #148626 into a utility function (along with the extra pattern matchers). Hopefully this makes reviewing a bit easier. Added a gtest, since this isn't actually used anywhere yet.
11 days[DropUnnecessaryAssumes] Add pass for dropping assumes (#159403)Nikita Popov2-0/+63
This adds a new pass for dropping assumes that are unlikely to be useful for further optimization. It works by discarding any assumes whose affected values are one-use (which implies that they are only used by the assume, i.e. ephemeral). This pass currently runs at the start of the module optimization pipeline, that is post-inline and post-link. Before that point, it is more likely for previously "useless" assumes to become useful again, e.g. because an additional user of the value is introduced after inlining + CSE.
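The one-use heuristic above can be modeled in a few lines (a hypothetical sketch with invented names, not the pass's API): a value used only by the assume is ephemeral, so the assume carries no information for any other user.

```python
def is_droppable(affected_use_counts):
    """affected_use_counts: total use count of each value affected by an
    assume, including the use by the assume itself. The assume is droppable
    when every affected value is one-use, i.e. only the assume uses it."""
    return all(count == 1 for count in affected_use_counts)

# A %cmp that feeds only the assume -> ephemeral, the assume can be dropped.
assert is_droppable([1])
# A %x with other users -> the assumed fact may still help them, keep it.
assert not is_droppable([3])
```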
11 days[NewPM] Remove BranchProbabilityInfo from FunctionToLoopPassAdaptor. NFCI ↵Luke Lau1-6/+0
(#159516) No loop pass seems to use it now, after LoopPredication stopped using it in https://reviews.llvm.org/D111668
11 days[VPlan] Allow zero-operand m_VPInstruction (NFC) (#159550)Ramkumar Ramachandra4-13/+8
12 days[VPlanPatternMatch] Introduce match functor (NFC) (#159521)Ramkumar Ramachandra4-14/+27
Follow up on 7fb3a91 ([PatternMatch] Introduce match functor) to introduce the VPlanPatternMatch version of the match functor to shorten some idioms. Co-authored-by: Luke Lau <luke@igalia.com>
12 days[NewPM] Don't preserve BlockFrequencyInfo in FunctionToLoopPassAdaptor (#157888)Luke Lau1-4/+0
Function analyses in LoopStandardAnalysisResults are marked as preserved by the loop pass adaptor, because LoopAnalysisManagerFunctionProxy manually invalidates most of them. However, the proxy doesn't invalidate BFI, since it is only preserved on a "lossy" basis: see https://reviews.llvm.org/D86156 and https://reviews.llvm.org/D110438. So any changes to the CFG will result in BFI giving incorrect results, which is fine for loop passes, which deal with the lossiness. But the loop pass adaptor still marks it as preserved, which causes the lossy result to leak out into function passes. This causes incorrect results when viewed from e.g. LoopVectorizer, where an innermost loop header may be reported to have a smaller frequency than its successors. This patch fixes that by dropping the call to preserve, and adds a test with the -O1 pipeline which shows the effects whenever the CFG is changed and UseBlockFrequencyInfo is set. I've also dropped it for BranchProbabilityAnalysis, but I couldn't test for it since UseBranchProbabilityInfo always seems to be false; this may be dead code.
12 days[InstCombine] Generalize `foldAndOrOfICmpsUsingRanges` to handle more cases. ↵Yingwei Zheng1-59/+40
(#158498) Closes https://github.com/llvm/llvm-project/issues/158326. Closes https://github.com/llvm/llvm-project/issues/59555. Proof for `(X & -Pow2) == C -> (X - C) < Pow2`: https://alive2.llvm.org/ce/z/HMgkuu
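The cited fold can be checked exhaustively at a small bit width (a sketch; 8 bits chosen for speed, and C restricted to multiples of Pow2, since otherwise the masked equality can never hold):

```python
# Model (X & -Pow2) == C  ->  (X - C) u< Pow2, with arithmetic modulo 2**BITS.
BITS = 8
MASK = (1 << BITS) - 1

def before(x, pow2, c):
    return (x & (-pow2 & MASK)) == c

def after(x, pow2, c):
    return ((x - c) & MASK) < pow2  # unsigned compare after wrapping sub

for pow2 in (1, 2, 4, 8, 16):
    for c in range(0, 1 << BITS, pow2):  # C must be a multiple of Pow2
        for x in range(1 << BITS):
            assert before(x, pow2, c) == after(x, pow2, c)
print("fold verified for all 8-bit values")
```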
12 days[PatternMatch] Introduce match functor (NFC) (#159386)Ramkumar Ramachandra7-21/+14
A common idiom is the usage of the PatternMatch match function within a functional algorithm like all_of. Introduce a match functor to shorten this idiom. Co-authored-by: Luke Lau <luke@igalia.com>
12 days[SLP][NFC] Refactor a long `if` into an early `return` (#156410)Piotr Fusik1-119/+118
12 days[SCCP] Relax two-instruction range checks (#158495)Yingwei Zheng1-0/+54
If we know x in R1, the range check `x in R2` can be relaxed into `x in Union(R2, Inverse(R1))`. The latter one may be more efficient if we can represent it with one icmp. Fixes regressions introduced by https://github.com/llvm/llvm-project/pull/156497. Proof for `(X & -Pow2) == C -> (X - C) < Pow2`: https://alive2.llvm.org/ce/z/HMgkuu Compile-time impact: https://llvm-compile-time-tracker.com/compare.php?from=ead4f3e271fdf6918aef2ede3a7134811147d276&to=bee3d902dd505cf9b11499ba4f230e4e8ae96b92&stat=instructions%3Au
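A concrete instance of the relaxation (values invented; small unsigned ranges): with x known to be in R1 = [0, 10), the check x in R2 = [5, 20) may be rewritten as x in Union(R2, Inverse(R1)) = [5, 256), which is the single compare `x uge 5`:

```python
R1 = range(0, 10)   # known range of x
R2 = range(5, 20)   # range check to relax

# Union(R2, Inverse(R1)) collapses to [5, 256): one unsigned compare.
for x in R1:        # the two checks only need to agree on values x can take
    assert (x in R2) == (x >= 5)
print("relaxed single-icmp check agrees with the original on all of R1")
```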
12 days[LV]: Ensure fairness when selecting epilogue VF. (#155547)Hassnaa Hamdi1-7/+1
Consider the IC when deciding if the epilogue is profitable for scalable vectors, the same as for fixed-width vectors.
12 days[IR] NFC: Remove 'experimental' from partial.reduce.add intrinsic (#158637)Sander de Smalen1-3/+3
The partial reduction intrinsics are no longer experimental, because they've been used in production for a while and are unlikely to change.
13 daysRevert "Reland "[BasicBlockUtils] Handle funclets when detaching EH p… ↵Gábor Spaits1-85/+28
(#159292) …ad blocks" (#158435)" This reverts commit 41cef78227eb909181cb9360099b2d92de8d649f.
13 daysReland "[BasicBlockUtils] Handle funclets when detaching EH pad blocks" ↵Gábor Spaits1-28/+85
(#158435) When removing EH Pad blocks, the value defined by them becomes poison. These poison values are then used by `catchret` and `cleanupret`, which is invalid. This commit replaces those unreachable `catchret` and `cleanupret` instructions with `unreachable`.
13 daysRevert "Reapply "[Coroutines] Add llvm.coro.is_in_ramp and drop return value ↵Weibo He4-36/+19
of llvm.coro.end #153404"" (#159236) Reverts llvm/llvm-project#155339 because of a CI failure
13 daysReapply "[Coroutines] Add llvm.coro.is_in_ramp and drop return value of ↵Weibo He4-19/+36
llvm.coro.end #153404" (#155339) As mentioned in #151067, the current design of llvm.coro.end mixes two functionalities: querying where we are and lowering to some code. This patch separates these functionalities into independent intrinsics by introducing a new intrinsic, llvm.coro.is_in_ramp.
13 daysRe-apply "[NFCI][Globals] In GlobalObjects::setSectionPrefix, do conditional ↵Mingming Liu1-3/+2
update if existing prefix is not equivalent to the new one. Returns whether prefix changed." (#159161) This is a reland of https://github.com/llvm/llvm-project/pull/158460 Test failures are gone once I undo the changes in codegenprepare.
13 daysRevert "[NFCI][Globals] In GlobalObjects::setSectionPrefix, do conditional ↵Mingming Liu1-2/+3
update if existing prefix is not equivalent to the new one. Returns whether prefix changed." (#159159) Reverts llvm/llvm-project#158460 due to buildbot failures
13 days[NFCI][Globals] In GlobalObjects::setSectionPrefix, do conditional update if ↵Mingming Liu1-3/+2
existing prefix is not equivalent to the new one. Returns whether prefix changed. (#158460) Before this change, `setSectionPrefix` overwrote the existing section prefix with the new one unconditionally. After this change, `setSectionPrefix` checks for equivalence, updates conditionally, and returns whether an update happened. Update the existing callers to make use of the return value. [PR 155337](https://github.com/llvm/llvm-project/pull/155337/files#diff-cc0c67ac89807f4453f0cfea9164944a4650cd6873a468a0f907e7158818eae9) is a motivating use case where the 'update' semantics are needed.
13 days[SLPVectorizer][NFC] Save stride in a map. (#157706)Mikhail Gudim1-68/+103
To avoid recalculating the stride of a strided load twice, save it in a map.
13 days[LSR] Add an addressing mode that considers all addressing modes (#158110)John Brawn1-16/+14
The way that loop strength reduction works is that the target has to decide upfront whether it wants its addressing to be preindex, postindex, or neither. This choice affects:
* Which potential solutions we generate
* Whether we consider a pre/post index load/store as costing an AddRec or not

None of these choices are a good fit for either AArch64 or ARM, where both preindex and postindex addressing are typically free:
* If we pick None, then we count pre/post index addressing as costing one AddRec more than is correct, so we don't pick them when we should.
* If we pick PreIndexed or PostIndexed, then we get the correct cost for that addressing type, but still get it wrong for the other, and also exclude potential solutions using offset addressing that could have less cost.

This patch adds an "all" addressing mode that causes all potential solutions to be generated and counts both pre and postindex as having an AddRecCost of zero. Unfortunately, this reveals problems elsewhere in how we calculate the cost of things, which need to be fixed before we can make use of it.
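The accounting problem described above can be sketched as a toy cost table (all names and numbers invented, not LSR's actual cost model): under a single upfront choice, the addressing kind the target did not pick is overcharged by one AddRec, while an "all" policy treats both folded forms as free.

```python
def addrec_cost(access_kind, policy):
    # access_kind: "pre", "post", or "offset"; policy: "none", "pre",
    # "post", or "all" -- the target's upfront addressing-mode choice.
    if access_kind == "offset":
        return 0
    if policy in ("all", access_kind):
        return 0   # pointer update folds into the load/store for free
    return 1       # otherwise charged as a separate AddRec

# "none" overcharges both folded forms; "pre" still overcharges post-index.
assert addrec_cost("post", "none") == 1 and addrec_cost("pre", "none") == 1
assert addrec_cost("post", "pre") == 1
# "all" models targets like AArch64/ARM, where both forms are typically free.
assert addrec_cost("pre", "all") == 0 and addrec_cost("post", "all") == 0
```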
13 daysAdd DebugSSAUpdater class to track debug value liveness (#135349)Stephen Tozer2-0/+391
This patch adds a class that uses SSA construction, with debug values as definitions, to determine whether and which debug values for a particular variable are live at each point in an IR function. This will be used by the IR reader of llvm-debuginfo-analyzer to compute variable ranges and coverage, although it may be applicable to other debug info IR analyses.
13 days[VPlan] Extend CSE to eliminate GEPs (#156699)Ramkumar Ramachandra2-5/+27
The motivation for this patch is to close the gap between the VPlan-based CSE and the legacy CSE, to make it easier to remove the legacy CSE. Before this patch, stubbing out the legacy CSE leads to 22 test failures, and after this patch, there are only 12 failures, and all of them seem to have a single root cause: VPlanTransforms::createInterleaveGroups() and VPInterleaveGroup::execute(). The improvements from this patch are of course welcome. While developing the patch, a miscompile was found when GEP source-element-types differ, and this has been fixed. Co-authored-by: Florian Hahn <flo@fhahn.com> Co-authored-by: Luke Lau <luke@igalia.com>
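The miscompile mentioned in the entry can be illustrated with a tiny value-numbering sketch (a hypothetical model with invented names, not the VPlan code): a CSE key for GEPs must include the source element type, because two GEPs with identical operands but different element types compute different addresses.

```python
def cse_key(opcode, operands, source_elem_ty=None):
    """Simplified CSE key. For GEPs the source element type scales the
    index, so e.g. `gep i32, %p, 1` (offset 4 bytes) must stay distinct
    from `gep i64, %p, 1` (offset 8 bytes)."""
    return (opcode, tuple(operands), source_elem_ty)

def cse(seen, inst):
    """Return the earlier equivalent instruction, or record this one."""
    return seen.setdefault(cse_key(*inst), inst)

seen = {}
g1 = ("getelementptr", ["%p", "1"], "i32")
g2 = ("getelementptr", ["%p", "1"], "i64")
assert cse(seen, g1) is not cse(seen, g2)  # differing element type: no CSE
assert cse(seen, ("getelementptr", ["%p", "1"], "i32")) is g1  # same type: CSE'd
```

Dropping `source_elem_ty` from the key would merge `g1` and `g2`, which is exactly the kind of miscompile the patch fixes.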
14 days[MemProf] Add NodeId field to ContextNode for debugging (#158736)Teresa Johnson1-5/+20
This has been handy locally for debugging cloning issues. The NodeIds are assigned sequentially on creation and included in the dumps and the dot graphs. No measurable memory increase was found for a large thin link. I only changed one test (Transforms/MemProfContextDisambiguation/basic.ll) to actually check the emitted NodeIds; most ignore them.
2025-09-15[VPlan] Handle predicated UDiv in VPReplicateRecipe::computeCost.Florian Hahn1-3/+16
Account for predicated UDiv,SDiv,URem,SRem in VPReplicateRecipe::computeCost: compute costs of extra phis and apply getPredBlockCostDivisor. Fixes https://github.com/llvm/llvm-project/issues/158660
2025-09-15[InstCombine] Preserve profile data with select instructions and binary ↵Alan Zhao1-6/+11
operators (#158375) Tracking issue: #147390