aboutsummaryrefslogtreecommitdiff
path: root/llvm/lib/Transforms/Utils/LoopUnroll.cpp
AgeCommit message (Collapse)AuthorFilesLines
2025-09-09[AArch64] Enable RT and partial unrolling with reductions for Apple CPUs. ↵Florian Hahn1-2/+4
(#149699) Update unrolling preferences for Apple Silicon CPUs to enable partial unrolling and runtime unrolling for small loops with reductions. This builds on top of unroller changes to introduce parallel reduction phis, if possible: https://github.com/llvm/llvm-project/pull/149470. PR: https://github.com/llvm/llvm-project/pull/149699
2025-09-04[LoopUnroll] Introduce parallel reduction phis when unrolling. (#149470)Florian Hahn1-0/+136
When partially or runtime unrolling loops with reductions, currently the reductions are performed in-order in the loop, negating most benefits from unrolling such loops. This patch extends unrolling code-gen to keep a parallel reduction phi per unrolled iteration and combining the final result after the loop. For out-of-order CPUs, this allows executing mutliple reduction chains in parallel. For now, the initial transformation is restricted to cases where we unroll a small number of iterations (hard-coded to 4, but should maybe be capped by TTI depending on the execution units), to avoid introducing an excessive amount of parallel phis. It also requires single block loops for now, where the unrolled iterations are known to not exit the loop (either due to runtime unrolling or partial unrolling). This ensures that the unrolled loop will have a single basic block, with a single exit block where we can place the final reduction value computation. The initial implementation also only supports parallelizing loops with a single reduction and only integer reductions. Those restrictions are just to keep the initial implementation simpler, and can easily be lifted as follow-ups. With corresponding TTI to the AArch64 unrolling preferences which I will also share soon, this triggers in ~300 loops across a wide range of workloads, including LLVM itself, ffmgep, av1aom, sqlite, blender, brotli, zstd and more. PR: https://github.com/llvm/llvm-project/pull/149470
2025-05-24[Transforms] Remove unused includes (NFC) (#141357)Kazu Hirata1-2/+0
These are identified by misc-include-cleaner. I've filtered out those that break builds. Also, I'm staying away from llvm-config.h, config.h, and Compiler.h, which likely cause platform- or compiler-specific build failures.
2025-05-09[KeyInstr][LoopUnroll] Remap atoms while unrolling (#133489)Orlando Cazalet-Hyams1-1/+10
RFC: https://discourse.llvm.org/t/rfc-improving-is-stmt-placement-for-better-interactive-debugging/82668
2025-04-26[llvm] Use llvm::replace (NFC) (#137481)Kazu Hirata1-1/+1
2025-03-23[Transforms] Use *Set::insert_range (NFC) (#132652)Kazu Hirata1-2/+1
We can use *Set::insert_range to collapse: for (auto Elem : Range) Set.insert(E); down to: Set.insert_range(Range); In some cases, we can further fold that into the set declaration.
2025-01-27[LoopUnroll] Add RuntimeUnrollMultiExit to loop unroll options (NFC) (#124462)Florian Hahn1-4/+5
Add an extra knob to RuntimeUnrollMultiExit to let backends control whether to allow multi-exit unrolling on a per-loop basis. This gives backends more fine-grained control on deciding if multi-exit unrolling is profitable for a given loop and uarch. Similar to 4226e0a0c75. PR: https://github.com/llvm/llvm-project/pull/124462
2024-12-02[TTI] Add SCEVExpansionBudget to loop unrolling options. (#118316)Florian Hahn1-4/+5
Add an extra know to UnrollingPreferences to let backends control the maximum budget for SCEV expansions. This gives backends more fine-grained control on the cost of the runtime checks for runtime unrolling. PR: https://github.com/llvm/llvm-project/pull/118316
2024-11-08[DebugInfo][LoopUnroll] Preserve DebugLocs on optimized cond branches (#114225)Stephen Tozer1-1/+2
This patch fixes a simple error where as part of loop unrolling we optimize conditional loop-exiting branches into unconditional branches when we know that they will or won't exit the loop, but does not propagate the source location of the original branch to the new one. Found using https://github.com/llvm/llvm-project/pull/107279.
2024-11-05[Utils] Simplify code with DenseMap::operator[] (NFC) (#114932)Kazu Hirata1-1/+1
2024-11-04[Utils] Remove unused includes (NFC) (#114748)Kazu Hirata1-3/+0
Identified with misc-include-cleaner.
2024-09-23[Loops] Use forgetLcssaPhiWithNewPredecessor() in more placesNikita Popov1-1/+1
Use the more aggressive invalidation method in a number of places that add incoming values to lcssa phi nodes. It is likely that it's possible to construct cases with incorrect SCEV preservation similar to https://github.com/llvm/llvm-project/issues/109333 for these.
2024-07-21[Transforms] Use range-based for loops (NFC) (#99607)Kazu Hirata1-2/+2
2024-06-27[IR] Add getDataLayout() helpers to BasicBlock and Instruction (#96902)Nikita Popov1-1/+1
This is a helper to avoid writing `getModule()->getDataLayout()`. I regularly try to use this method only to remember it doesn't exist... `getModule()->getDataLayout()` is also a common (the most common?) reason why code has to include the Module.h header.
2024-06-06[LoopUnroll] Consider convergence control tokens when unrolling (#91715)Sameer Sahasrabuddhe1-14/+33
- There is no restriction on a loop with controlled convergent operations when the relevant tokens are defined and used within the loop. - When a token defined outside a loop is used inside (also called a loop convergence heart), unrolling is allowed only in the absence of remainder or runtime checks. - When a token defined inside a loop is used outside, such a loop is said to be "extended". This loop can only be unrolled by also duplicating the extended part lying outside the loop. Such unrolling is disabled for now. - Clean up loop hearts: When unrolling a loop with a heart, duplicating the heart will introduce multiple static uses of a convergence control token in a cycle that does not contain its definition. This violates the static rules for tokens, and needs to be cleaned up into a single occurrence of the intrinsic. - Spell out the initializer for UnrollLoopOptions to improve readability. Original implementation [D85605] by Nicolai Haehnle <nicolai.haehnle@amd.com>.
2024-05-13[LoopUnroll] Remove redundant debug instructions after blocks have been ↵chenlin1-0/+4
merged (#91246) Remove redundant debug instructions after blocks have been merged into the predecessor, It can reduce some compile time in some cases. This change only fixes the situation of loop unrolling, and other situations are not considered. "RemoveRedundantDbgInstrs" seems to be very time-consuming. Thus, we just add here after the "Dest" has been merged into the "Fold", this may be a more targeted solution!!! fixes: https://github.com/llvm/llvm-project/issues/89073
2024-05-04[Transforms] Use StringRef::operator== instead of StringRef::equals (NFC) ↵Kazu Hirata1-1/+1
(#91072) I'm planning to remove StringRef::equals in favor of StringRef::operator==. - StringRef::operator==/!= outnumber StringRef::equals by a factor of 31 under llvm/ in terms of their usage. - The elimination of StringRef::equals brings StringRef closer to std::string_view, which has operator== but not equals. - S == "foo" is more readable than S.equals("foo"), especially for !Long.Expression.equals("str") vs Long.Expression != "str".
2024-05-02[LoopUnroll] Add CSE to remove redundant loads after unrolling. (#83860)Florian Hahn1-8/+147
This patch adds loadCSE support to simplifyLoopAfterUnroll. It is based on EarlyCSE's implementation using ScopeHashTable and is using SCEV for accessed pointers to check to find redundant loads after unrolling. This applies to the late unroll pass only, for full unrolling those redundant loads will be cleaned up by the regular pipeline. The current approach constructs MSSA on-demand per-loop, but there is still small but notable compile-time impact: stage1-O3 +0.04% stage1-ReleaseThinLTO +0.06% stage1-ReleaseLTO-g +0.05% stage1-O0-g +0.02% stage2-O3 +0.09% stage2-O0-g +0.04% stage2-clang +0.02% https://llvm-compile-time-tracker.com/compare.php?from=c089fa5a729e217d0c0d4647656386dac1a1b135&to=ec7c0f27cb5c12b600d9adfc8543d131765ec7be&stat=instructions:u This benefits some workloads with runtime-unrolling disabled, where users use pragmas to force unrolling, as well as with runtime unrolling enabled. On SPEC/MultiSource, this removes a number of loads after unrolling on AArch64 with runtime unrolling enabled. ``` External/S...te/526.blender_r/526.blender_r 96 MultiSourc...rks/mediabench/gsm/toast/toast 39 SingleSource/Benchmarks/Misc/ffbench 4 External/SPEC/CINT2006/403.gcc/403.gcc 18 MultiSourc.../Applications/JM/ldecod/ldecod 4 MultiSourc.../mediabench/jpeg/jpeg-6a/cjpeg 6 MultiSourc...OE-ProxyApps-C/miniGMG/miniGMG 9 MultiSourc...e/Applications/ClamAV/clamscan 4 MultiSourc.../MallocBench/espresso/espresso 3 MultiSourc...dence-flt/LinearDependence-flt 2 MultiSourc...ch/office-ispell/office-ispell 4 MultiSourc...ch/consumer-jpeg/consumer-jpeg 6 MultiSourc...ench/security-sha/security-sha 11 MultiSourc...chmarks/McCat/04-bisect/bisect 3 SingleSour...tTests/2020-01-06-coverage-009 12 MultiSourc...ench/telecomm-gsm/telecomm-gsm 39 MultiSourc...lds-flt/CrossingThresholds-flt 24 MultiSourc...dence-dbl/LinearDependence-dbl 2 External/S...C/CINT2006/445.gobmk/445.gobmk 6 MultiSourc...enchmarks/mafft/pairlocalalign 53 External/S...31.deepsjeng_r/531.deepsjeng_r 3 External/S...rate/510.parest_r/510.parest_r 58 External/S...NT2006/464.h264ref/464.h264ref 29 External/S...NT2017rate/502.gcc_r/502.gcc_r 45 External/S...C/CINT2006/456.hmmer/456.hmmer 6 External/S...te/538.imagick_r/538.imagick_r 18 External/S.../CFP2006/447.dealII/447.dealII 4 MultiSourc...OE-ProxyApps-C++/miniFE/miniFE 12 External/S...2017rate/525.x264_r/525.x264_r 36 MultiSourc...Benchmarks/7zip/7zip-benchmark 33 MultiSourc...hmarks/ASC_Sequoia/AMGmk/AMGmk 2 MultiSourc...chmarks/VersaBench/8b10b/8b10b 1 MultiSourc.../Applications/JM/lencod/lencod 116 MultiSourc...lds-dbl/CrossingThresholds-dbl 24 MultiSource/Benchmarks/McCat/05-eks/eks 15 ``` PR: https://github.com/llvm/llvm-project/pull/83860
2024-03-04[RemoveDIs] Reapply 3fda50d3915, insert instructions using iteratorsJeremy Morse1-1/+1
I'd reverted this in 6c7805d5d1 after a bad stage. Original commit messsage follows: [NFC][RemoveDIs] Bulk update utilities to insert with iterators As part of the RemoveDIs project we need LLVM to insert instructions using iterators wherever possible, so that the iterators can carry a bit of debug-info. This commit implements some of that by updating the contents of llvm/lib/Transforms/Utils to always use iterator-versions of instruction constructors. There are two general flavours of update: * Almost all call-sites just call getIterator on an instruction * Several make use of an existing iterator (scenarios where the code is actually significant for debug-info) The underlying logic is that any call to getFirstInsertionPt or similar APIs that identify the start of a block need to have that iterator passed directly to the insertion function, without being converted to a bare Instruction pointer along the way. I've also switched DemotePHIToStack to take an optional iterator: it needs to take an iterator, and having a no-insert-location behaviour appears to be important. The constructors for ICmpInst and FCmpInst have been updated too. They're the only instructions that take block _references_ rather than pointers for certain calls, and a future patch is going to make use of default-null block insertion locations. All of this should be NFC.
2024-02-29Revert "[NFC][RemoveDIs] Bulk update utilities to insert with iterators"Jeremy Morse1-1/+1
This reverts commit 3fda50d3915b2163a54a37b602be7783a89dd808. Apparently I've missed a hunk while staging this; will back out for now. Picked up here: https://lab.llvm.org/buildbot/#/builders/139/builds/60429/steps/6/logs/stdio
2024-02-29[NFC][RemoveDIs] Bulk update utilities to insert with iteratorsJeremy Morse1-1/+1
As part of the RemoveDIs project we need LLVM to insert instructions using iterators wherever possible, so that the iterators can carry a bit of debug-info. This commit implements some of that by updating the contents of llvm/lib/Transforms/Utils to always use iterator-versions of instruction constructors. There are two general flavours of update: * Almost all call-sites just call getIterator on an instruction * Several make use of an existing iterator (scenarios where the code is actually significant for debug-info) The underlying logic is that any call to getFirstInsertionPt or similar APIs that identify the start of a block need to have that iterator passed directly to the insertion function, without being converted to a bare Instruction pointer along the way. I've also switched DemotePHIToStack to take an optional iterator: it needs to take an iterator, and having a no-insert-location behaviour appears to be important. The constructors for ICmpInst and FCmpInst have been updated too. They're the only instructions that take block _references_ rather than pointers for certain calls, and a future patch is going to make use of default-null block insertion locations. All of this should be NFC.
2023-10-24[ADT] Rename llvm::erase_value to llvm::erase (NFC) (#70156)Kazu Hirata1-1/+1
C++20 comes with std::erase to erase a value from std::vector. This patch renames llvm::erase_value to llvm::erase for consistency with C++20. We could make llvm::erase more similar to std::erase by having it return the number of elements removed, but I'm not doing that for now because nobody seems to care about that in our code base. Since there are only 50 occurrences of erase_value in our code base, this patch replaces all of them with llvm::erase and deprecates llvm::erase_value.
2023-10-22[llvm] Stop including llvm/ADT/iterator_range.h (NFC)Kazu Hirata1-1/+0
Identified with misc-include-cleaner.
2023-07-05[LoopUnroll] Fold add chains during unrollingNikita Popov1-0/+27
Loop unrolling tends to produce chains of `%x1 = add %x0, 1; %x2 = add %x1, 1; ...` with one add per unrolled iteration. This patch simplifies these adds to `%xN = add %x0, N` directly during unrolling, rather than waiting for InstCombine to do so. The motivation for this is that having a single add (rather than an add chain) on the induction variable makes it a simple recurrence, which we specially recognize in a number of places. This allows InstCombine to directly perform folds with that knowledge, instead of first folding the add chains, and then doing other folds in another InstCombine iteration. Due to the reduced number of InstCombine iterations, this also results in a small compile-time improvement. Differential Revision: https://reviews.llvm.org/D153540
2023-06-05Revert "[LCSSA] Remove unused ScalarEvolution argument (NFC)"Nikita Popov1-1/+1
This reverts commit 5362a0d859d8e96b3f7c0437b7866e17a818a4f7. In preparation for reverting a dependent revision.
2023-05-10[PseudoProbe] Clean up dwarf discriminator and avoid duplicating factor.Hongtao Yu1-1/+1
A pseudo probe is created with dwarf line information shared with its nearest instruction. If the instruction comes with a dwarf discriminator, it will be shared with the probe as well. This can confuse the later FS-AFDO discriminator assignment pass. To fix this, I'm cleaning up the discriminator fields for probes when they are inserted. I also notice another possibility to change the discriminator field of pseudo probes in the pipeline before the FS discriminator assignment pass. That is the loop unroller, which assigns duplication factor to instruction being vectorized. I'm disabling that for pseudo probe intrinsics specifically, also for callsites with probes. Reviewed By: wenlei Differential Revision: https://reviews.llvm.org/D148569
2023-05-02[LCSSA] Remove unused ScalarEvolution argument (NFC)Nikita Popov1-1/+1
After D149435, LCSSA formation no longer needs access to ScalarEvolution, so remove the argument from the utilities.
2023-02-14[loop unroll] Fix `branch-weights` for unrolled loop.Mircea Trofin1-1/+12
The branch weights of the unrolled loop need to be reduced by the unroll factor. Differential Revision: https://reviews.llvm.org/D143948
2023-01-20Recommit "[LoopUnroll] Directly update DT instead of DTU."Florian Hahn1-10/+43
This reverts commit c5ea42bcf48c8f3d3e35a6bff620b06d2a499108. Recommit the patch with a fix for loops where the exiting terminator is not a branch instruction. In that case, ExitInfos may be empty. In addition to checking if there's a single exiting block also check if there's a single ExitInfo. A test case has been added in f92b35392ed8e4631.
2023-01-19Revert "[LoopUnroll] Directly update DT instead of DTU."Arthur Eubanks1-43/+10
This reverts commit d0907ce7ed9f159562ca3f4cfd8d87e89e93febe. Causes `opt -passes=loop-unroll-full` to crash on ``` define void @foo() { bb: br label %bb1 bb1: ; preds = %bb1, %bb1, %bb switch i1 true, label %bb1 [ i1 true, label %bb2 i1 false, label %bb1 ] bb2: ; preds = %bb1 ret void } ```
2023-01-19[LoopUnroll] Directly update DT instead of DTU.Florian Hahn1-10/+43
The scope of DT updates are very limited when unrolling loops: the DT should only need updating for * new blocks added * exiting blocks we simplified branches This can be done manually without too much extra work. MergeBlockIntoPredecessor also needs to be updated to support direct DT updates. This fixes excessive time spent in DTU for same cases. In an internal example, time spent in LoopUnroll with this patch goes from ~200s to 2s. It also is slightly positive for CTMark: * NewPM-O3: -0.13% * NewPM-ReleaseThinLTO: -0.11% * NewPM-ReleaseLTO-g: -0.13% Notable improvements are mafft (~ -0.50%) and lencod (~ -0.30%), with no workload regressed. https://llvm-compile-time-tracker.com/compare.php?from=78a9ee7834331fb4360457cc565fa36f5452f7e0&to=687e08d011b0dc6d3edd223612761e44225c7537&stat=instructions:u Reviewed By: kuhar Differential Revision: https://reviews.llvm.org/D141487
2023-01-16[LoopUnroll] Don't update DT for changeToUnreachable.Florian Hahn1-2/+7
There is no need to update the DT here, because there must be a unique latch. Hence if the latch is not exiting it must directly branch back to the original loop header and does not dominate any nodes. Skipping a DT update here simplifies D141487. Reviewed By: nikic Differential Revision: https://reviews.llvm.org/D141810
2022-12-21[NFC] Rename Function::isDebugInfoForProfiling to shouldEmit[...]Mircea Trofin1-1/+2
The function name was misleading - the expectation set both by the name and by other members of Function (like isDeclaration or isIntrinsic) would be that the function somehow would "be" "debug info for profiling". But that's not the case - the property indicates (as the comment over the declaration also explains) whether debug info should be emitted (for profiling).
2022-12-15[NFC] Rename Function::insertBasicBlockAt() to Function::insert().Vasileios Porpodas1-1/+1
I think this is a better name because it is what STL uses. Differential Revision: https://reviews.llvm.org/D140068
2022-12-14Don't include Optional.hKazu Hirata1-1/+0
These files no longer use llvm::Optional.
2022-12-12[IR][NFC] Adds Function::insertBasicBlockAt() to replace things like ↵Vasileios Porpodas1-1/+1
F->getBasicBlockList().insert() This is part of a series of patches that aim at making Function::getBasicBlockList() private. Differential Revision: https://reviews.llvm.org/D139906
2022-12-12Transforms/Utils: llvm::Optional => std::optionalFangrui Song1-2/+2
2022-12-02[Transforms] Use std::nullopt instead of None (NFC)Kazu Hirata1-4/+4
This patch mechanically replaces None with std::nullopt where the compiler would warn if None were deprecated. The intent is to reduce the amount of manual work required in migrating from Optional to std::optional. This is part of an effort to migrate from llvm::Optional to std::optional: https://discourse.llvm.org/t/deprecating-llvm-optional-x-hasvalue-getvalue-getvalueor/63716
2022-11-23[NFC] Replaced BB->getInstList().{erase(),pop_front(),pop_back()} with ↵Vasileios Porpodas1-2/+2
eraseFromParent(). Differential Revision: https://reviews.llvm.org/D138617
2022-10-18[LoopUnroll] Forget exit values when making changes.Florian Hahn1-0/+1
When unrolling, the exit values in LCSSA phis will get updated. Invalidate cached SCEV values for those phis in case SCEV looked through a exit phi. Fixes #58340.
2022-09-27[LoopUnroll] Forget block and loop dispositions during unrolling.Florian Hahn1-1/+3
After unrolling a loop, the block and loop dispositions need to be cleared. As we don't know which SCEVs in the loop/blocks may be impacted, completely clear the cache. This should also fix some cases where deleted loops remained in the LoopDispositions cache. This fixes a verification failure surfaced by D134531. I am planning on reviewing/updating the existing uses of forgetLoopDispositions to check if they should be replaced by forgetBlockAndLoopDispositions. Reviewed By: nikic Differential Revision: https://reviews.llvm.org/D134612
2022-08-27Use std::gcd (NFC)Kazu Hirata1-1/+2
This patch replaces calls to GreatestCommonDivisor64 with std::gcd where both arguments are known to be of unsigned types no larger than 64 bits in size.
2022-06-20[llvm] Don't use Optional::getValue (NFC)Kazu Hirata1-1/+1
2022-06-09[NFC] format InstructionSimplify & lowerCaseFunctionNamesSimon Moll1-1/+1
Clang-format InstructionSimplify and convert all "FunctionName"s to "functionName". This patch does touch a lot of files but gets done with the cleanup of InstructionSimplify in one commit. This is the alternative to the less invasive clang-format only patch: D126783 Reviewed By: spatel, rengolin Differential Revision: https://reviews.llvm.org/D126889
2022-01-06[unroll] Strengthen verification of analysis updates under expensive assertsPhilip Reames1-0/+18
I am suspecting a bug around updates of loop info for unreachable exits, but don't have a test case. Running this locally on make check didn't reveal anything, we'll see if the expensive checks bots find it.
2022-01-03Revert "[unroll] Prune all but first copy of invariant exit"Philip Reames1-5/+0
This reverts commit 9bd22595bad36cd19f5e7ae18ccd9f41cba29dc5. Seeing some bot failures which look plausibly connected. Revert while investigating/waiting for bots to stablize. e.g. https://lab.llvm.org/buildbot#builders/36/builds/15933
2022-01-03[unroll] Prune all but first copy of invariant exitPhilip Reames1-0/+5
If we have an exit which is controlled by a loop invariant condition and which dominates the latch, we know only the copy in the first unrolled iteration can be taken. All other copies are dead. The change itself is pretty straight forward, but let me add two points of context: * I'd have expected other transform passes to catch this after unrolling, but I'm seeing multiple examples where we get to the end of O2/O3 without simplifying. * I'd like to do a stronger change which did CSE during unroll and accounted for invariant expressions (as defined by SCEV instead of trivial ones from LoopInfo), but that doesn't fit cleanly into the current code structure. Differential Revision: https://reviews.llvm.org/D116496
2021-11-12[unroll] Keep unrolled iterations with initial iterationPhilip Reames1-1/+5
The unrolling code was previously inserting new cloned blocks at the end of the function. The result of this with typical loop structures is that the new iterations are placed far from the initial iteration. With unrolling, the general assumption is that the a) the loop is reasonable hot, and b) the first Count-1 copies of the loop are rarely (if ever) loop exiting. As such, placing Count-1 copies out of line is a fairly poor code placement choice. We'd much rather fall through into the hot (non-exiting) path. For code with branch profiles, later layout would fix this, but this may have a positive impact on non-PGO compiled code. However, the real motivation for this change isn't performance. Its readability and human understanding. Having to jump around long distances in an IR file to trace an unrolled loop structure is error prone and tedious.
2021-09-13[Utils] Use make_early_inc_range (NFC)Kazu Hirata1-7/+6
2021-07-26[Local] Do not introduce a new `llvm.trap` before `unreachable`Johannes Doerfert1-2/+1
This is the second attempt to remove the `llvm.trap` insertion after https://reviews.llvm.org/rGe14e7bc4b889dfaffb7180d176a03311df2d4ae6 reverted the first one. It is not clear what the exact issue was back then and it might already be gone by now, it has been >5 years after all. Replaces D106299. Differential Revision: https://reviews.llvm.org/D106308