path: root/llvm/lib/Transforms/Scalar/LoopStrengthReduce.cpp
Age  Commit message  Author  Files  Lines
2025-09-16[LSR] Add an addressing mode that considers all addressing modes (#158110)John Brawn1-16/+14
The way that loop strength reduction works is that the target has to decide upfront whether it wants its addressing to be preindex, postindex, or neither. This choice affects:
* Which potential solutions we generate
* Whether we consider a pre/post index load/store as costing an AddRec or not.

None of these choices are a good fit for either AArch64 or ARM, where both preindex and postindex addressing are typically free:
* If we pick None then we count pre/post index addressing as costing one AddRec more than is correct, so we don't pick them when we should.
* If we pick PreIndexed or PostIndexed then we get the correct cost for that addressing type, but still get it wrong for the other and also exclude potential solutions using offset addressing that could have less cost.

This patch adds an "all" addressing mode that causes all potential solutions to be generated and counts both pre and postindex as having AddRecCost of zero. Unfortunately this reveals problems elsewhere in how we calculate the cost of things that need to be fixed before we can make use of it.
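The cost rule described above can be pictured with a small standalone sketch. The names `AddressingKind`, `AccessKind`, and `addRecCost` are hypothetical stand-ins invented for illustration, not the real LSR or TTI types, and the assumption that plain offset addressing is charged one AddRec is a simplification.

```cpp
// Hypothetical stand-ins for illustration only; not the real LLVM types.
enum class AddressingKind { None, PreIndexed, PostIndexed, All };
enum class AccessKind { Offset, PreIndex, PostIndex };

// Returns how many AddRecs an access is charged for under the target's
// preferred addressing kind. Assumption: offset addressing always pays 1.
inline unsigned addRecCost(AddressingKind Preferred, AccessKind Access) {
  switch (Access) {
  case AccessKind::Offset:
    return 1;
  case AccessKind::PreIndex:
    // Free when the target prefers preindex, or when "all" is selected.
    return (Preferred == AddressingKind::PreIndexed ||
            Preferred == AddressingKind::All)
               ? 0
               : 1;
  case AccessKind::PostIndex:
    // Free when the target prefers postindex, or when "all" is selected.
    return (Preferred == AddressingKind::PostIndexed ||
            Preferred == AddressingKind::All)
               ? 0
               : 1;
  }
  return 1;
}
```

With `AddressingKind::All`, both indexed forms come out free, whereas `PreIndexed` still overcharges postindex accesses and `None` overcharges both.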
2025-07-24[Scalar] Remove an unnecessary cast (NFC) (#150474)Kazu Hirata1-1/+1
getOperand() already returns Value *.
2025-07-18[LSR] Do not consider uses in lifetime intrinsics (#149492)Nikita Popov1-0/+5
We should ignore uses of pointers in lifetime intrinsics, as these are not actually materialized in the final code, so don't affect register pressure or anything else LSR needs to model. Handling these only results in peculiar rewrites where additional intermediate GEPs are introduced.
2025-07-18[DebugInfo] Suppress lots of users of DbgValueInst (#149476)Jeremy Morse1-105/+74
This is another prune of dead code -- we never generate debug intrinsics nowadays, therefore there's no need for these codepaths to run. Co-authored-by: Nikita Popov <github@npopov.com>
2025-07-15[DebugInfo][RemoveDIs] Suppress getNextNonDebugInfoInstruction (#144383)Jeremy Morse1-1/+1
There are no longer debug-info instructions, thus we don't need this skipping. Hooray!
2025-07-14[LSR] Account for hardware loop instructions (#147958)John Brawn1-19/+50
A hardware loop instruction combines a subtract, compare with zero, and branch. We currently account for the compare and branch being combined into one in Cost::RateFormula, as part of more general handling for compare-branch-zero, but don't account for the subtract, leading to suboptimal decisions in some cases. Fix this in Cost::RateRegister by noticing when we have such a subtract and discounting the AddRecCost in such a case.
2025-07-14[DebugInfo][LoopStrengthReduce] Salvage the debug value of the dead cmp instruction (#147241)Shan Huang1-1/+3
Fix #147238
2025-07-03[LSR] Strip dead code (NFC) (#146109)Ramkumar Ramachandra1-9/+0
Nested AddRec is already rejected by the handling in pushSCEV().
2025-06-28[LSR] Clean up code using SCEVPatternMatch (NFC) (#145556)Ramkumar Ramachandra1-48/+37
2025-06-17[DebugInfo][RemoveDIs] Remove a swathe of debug-intrinsic code (#144389)Jeremy Morse1-5/+1
Seeing how we can't generate any debug intrinsics any more: delete a variety of codepaths where they're handled. For the most part these are plain deletions, in others I've tweaked comments to remain coherent, or added a type to (what was) type-generic-lambdas. This isn't all the DbgInfoIntrinsic call sites but it's most of the simple scenarios. Co-authored-by: Nikita Popov <github@npopov.com>
2025-06-16[LSR] Make canHoistIVInc allow non-integer types (#143707)John Brawn1-3/+2
canHoistIVInc was made to only allow integer types to avoid a crash in isIndexedLoadLegal/isIndexedStoreLegal due to them failing an assertion in getValueType (or rather in MVT::getVT which gets called from that) when passed a struct type. Adjusting these functions to pass AllowUnknown=true to getValueType means we don't get an assertion failure (MVT::Other is returned which TLI->isIndexedLoadLegal should then return false for), meaning we can remove this check for integer type.
2025-06-08[llvm] Use *Map::try_emplace (NFC) (#143321)Kazu Hirata1-1/+1
- try_emplace(Key) is shorter than insert(std::make_pair(Key, 0)).
- try_emplace performs value initialization without value parameters.
- We overwrite values on successful insertion anyway.
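A minimal sketch of the pattern, using `std::unordered_map` as a stand-in for the LLVM map types (an assumption made so the example is self-contained); `internName` is a hypothetical helper, not code from the patch.

```cpp
#include <string>
#include <unordered_map>

// Assigns a 1-based id to each distinct name. std::unordered_map stands in
// for an LLVM *Map here; internName is a hypothetical helper.
inline unsigned internName(std::unordered_map<std::string, unsigned> &Ids,
                           const std::string &Name) {
  // try_emplace(Key) value-initializes the mapped value (0 for unsigned),
  // so there is no need to spell insert(std::make_pair(Name, 0)).
  auto [It, Inserted] = Ids.try_emplace(Name);
  if (Inserted)
    It->second = static_cast<unsigned>(Ids.size()); // overwritten anyway
  return It->second;
}
```

Because the value is overwritten on successful insertion, the placeholder passed to `insert` in the older spelling carried no information.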
2025-05-26[llvm] Value-initialize values with *Map::try_emplace (NFC) (#141522)Kazu Hirata1-1/+1
try_emplace value-initializes values, so we do not need to pass nullptr to try_emplace when the value types are raw pointers or std::unique_ptr<T>.
2025-05-25[SCEV] Add dedicated AffineAddRec matcher + loop matchers (NFC). (#141141)Florian Hahn1-3/+4
Add dedicated m_scev_AffineAddRec matcher with complementing m_Loop() and m_SpecificLoop matchers. PR: https://github.com/llvm/llvm-project/pull/141141
2025-05-24[Transforms] Remove unused includes (NFC) (#141357)Kazu Hirata1-1/+0
These are identified by misc-include-cleaner. I've filtered out those that break builds. Also, I'm staying away from llvm-config.h, config.h, and Compiler.h, which likely cause platform- or compiler-specific build failures.
2025-05-21[llvm] Use *Map::try_emplace (NFC) (#140843)Kazu Hirata1-3/+2
try_emplace can default-construct values, so we do not need to do so on our own. Plus, try_emplace(Key) is much shorter than insert(std::make_pair(Key, Value())).
2025-05-19[SCEVPatternMatch] Introduce m_scev_AffineAddRec (#140377)Ramkumar Ramachandra1-66/+54
Introduce m_scev_AffineAddRec to match affine AddRecs, a class_match for SCEVConstant, and demonstrate their utility in LSR and SCEV. While at it, rename m_Specific to m_scev_Specific for clarity.
2025-05-17[NFC] Add a specialization of DenseMapInfo for SmallVector (#140380)Jon Chesterfield1-28/+2
Equivalent to the three existing uses I found, which were all pointers. Implementing the general pattern so that SmallVector<int> etc. will work as well. Added to the SmallVector.h header as opposed to DenseMapInfo.h, following the StringRef.h and SmallBitVector.h prior art. Noticed while writing an unrelated patch which currently wants a map from small vectors to other things; it is cleaner to generalise than to add another specialisation to said patch.
2025-05-08[LSR] Replace casts with an equivalent std::as_const (NFC) (#138980)Sergei Barannikov1-4/+2
The casts / `std::as_const` are used here to select `const` overload of `begin()`/`end()` so that the type of the returned iterator matches the type of `J`, which is `const_iterator`.
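The overload selection can be seen in a small standalone sketch. It uses `std::vector` rather than the container in LSR (an assumption for self-containment), and `containsBefore` is a hypothetical helper.

```cpp
#include <type_traits>
#include <utility>
#include <vector>

using IntVec = std::vector<int>;

// std::as_const makes begin() resolve to the const overload, so the result
// has the same type as a const_iterator such as J below.
static_assert(
    std::is_same_v<decltype(std::as_const(std::declval<IntVec &>()).begin()),
                   IntVec::const_iterator>);

// Hypothetical helper: scans [begin, J) of a non-const container for X.
inline bool containsBefore(IntVec &V, IntVec::const_iterator J, int X) {
  for (auto I = std::as_const(V).begin(); I != J; ++I)
    if (*I == X)
      return true;
  return false;
}
```

The same effect could be had with a `static_cast<const IntVec &>(V)`, which is the kind of cast the commit replaces; `std::as_const` states the intent without repeating the type.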
2025-04-23[CostModel] Remove optional from InstructionCost::getValue() (#135596)David Green1-1/+1
InstructionCost is already an optional value, containing an Invalid state that can be checked with isValid(). There is little point in returning another optional from getValue(). Most uses do not make use of it being a std::optional, dereferencing the value directly (either isValid has been checked previously or the Cost is assumed to be valid). The one case that does, in AMDGPU, used value_or, which has been replaced by an isValid() check.
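As a rough illustration of the idea, the sketch below models a cost type that carries its own invalid state. `CostSketch` and `valueOr` are hypothetical names written for this example; they are not the real `InstructionCost` class, and the assertion inside `getValue()` is an assumption about how such a type would guard misuse.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical miniature of a cost type with a built-in invalid state.
class CostSketch {
  int64_t Value = 0;
  bool Valid = true;

public:
  explicit CostSketch(int64_t V) : Value(V) {}

  static CostSketch getInvalid() {
    CostSketch C(0);
    C.Valid = false;
    return C;
  }

  bool isValid() const { return Valid; }

  // Returns the raw value directly; callers check isValid() first instead
  // of unwrapping a second optional layer.
  int64_t getValue() const {
    assert(Valid && "getValue() called on an invalid cost");
    return Value;
  }
};

// What a former value_or(Default) call site looks like with isValid().
inline int64_t valueOr(const CostSketch &C, int64_t Default) {
  return C.isValid() ? C.getValue() : Default;
}
```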
2025-04-20[llvm] Call hash_combine_range with ranges (NFC) (#136511)Kazu Hirata1-1/+1
2025-03-19[Transforms] Use *Set::insert_range (NFC) (#132056)Kazu Hirata1-5/+4
DenseSet, SmallPtrSet, SmallSet, SetVector, and StringSet recently gained C++23-style insert_range. This patch replaces:
  Dest.insert(Src.begin(), Src.end());
with:
  Dest.insert_range(Src);
This patch does not touch custom begin like succ_begin for now.
2025-01-27[NFC][DebugInfo] Switch more call-sites to using iterator-insertion (#124283)Jeremy Morse1-2/+2
To finalise the "RemoveDIs" work removing debug intrinsics, we're updating call sites that insert instructions to use iterators instead. This set of changes are those where it's not immediately obvious that just calling getIterator to fetch an iterator is correct, and one or two places where more than one line needs to change. Overall the same rule holds though: iterators generated for the start of a block such as getFirstNonPHIIt need to be passed into insert/move methods without being unwrapped/rewrapped, everything else can use getIterator.
2025-01-27[NFC][DebugInfo] Use iterators for instruction insertion in more places (#124291)Jeremy Morse1-1/+1
As part of the "RemoveDIs" work to eliminate debug intrinsics, we're replacing methods that use Instruction*'s as positions with iterators. This patch changes some more complex call-sites, those crossing file boundaries and where I've had to perform some minor rewrites.
2025-01-24[NFC][DebugInfo] Use iterator moveBefore at many call-sites (#123583)Jeremy Morse1-1/+1
As part of the "RemoveDIs" project, BasicBlock::iterator now carries a debug-info bit that's needed when getFirstNonPHI and similar feed into instruction insertion positions. Call-sites where that's necessary were updated a year ago; however, to ensure some type safety, we'd like to have all calls to moveBefore use iterators. This patch adds a (guaranteed dereferenceable) iterator-taking moveBefore, and changes a bunch of call-sites where it's obviously safe to change to use it by just calling getIterator() on an instruction pointer. A follow-up patch will contain less-obviously-safe changes. We'll eventually deprecate and remove the instruction-pointer insertBefore, but not before adding concise documentation of what considerations are needed (very few).
2024-11-05[LSR][NFC] Use range-based `for` (#113889)Piotr Fusik1-7/+7
2024-11-02[Scalar] Remove unused includes (NFC) (#114645)Kazu Hirata1-1/+0
Identified with misc-include-cleaner.
2024-10-17[llvm][LSR] Fix where invariant on ScaledReg & Scale is violated (#112576)Youngsuk Kim1-2/+6
Comments attached to the `ScaledReg` field of `struct Formula` explain that `ScaledReg` must be non-null when `Scale` is non-zero. This fixes up a code path where this invariant is violated. Also, add an assert to ensure this invariant holds true. Without this patch, the compiler aborts with the attached test case. Fixes #76504
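The invariant can be stated compactly in a standalone sketch. `FormulaSketch` is a hypothetical two-field miniature of the idea, not the real `struct Formula`, and `dropScaledReg` shows just one assumed way a code path could keep the fields consistent; the actual fix in the patch may differ.

```cpp
#include <cstdint>

// Hypothetical miniature: only the two fields the invariant talks about.
struct FormulaSketch {
  const void *ScaledReg = nullptr; // stand-in for the scaled register
  int64_t Scale = 0;

  // Invariant from the commit message: a non-zero Scale requires a
  // non-null ScaledReg.
  bool invariantHolds() const { return Scale == 0 || ScaledReg != nullptr; }

  // One assumed way to stay consistent: clearing the register also
  // clears the scale.
  void dropScaledReg() {
    ScaledReg = nullptr;
    Scale = 0;
  }
};
```

An assertion on `invariantHolds()` at the points where a formula is consumed turns a silent inconsistency into an immediate, debuggable failure.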
2024-10-03[DebugInfo][LSR] Fix assertion failure salvaging IV with offset > 64 bits wide (#110979)Orlando Cazalet-Hyams1-0/+2
Fixes #110494
2024-10-03Fix LLVM_ENABLE_ABI_BREAKING_CHECKS macro check: use #if instead of #ifdef (#110938)Mehdi Amini1-2/+2
This macro is always defined: either 0 or 1. The correct pattern is to use #if. Re-apply #110185 with more fixes for debug build with the ABI breaking checks disabled.
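The difference between the two checks shows up as soon as the macro is defined to 0. The sketch below uses a hypothetical macro name, `SKETCH_ENABLE_CHECKS`, rather than the real LLVM configuration macro.

```cpp
// Hypothetical flag that, like the one in the commit, is always defined
// as either 0 or 1.
#define SKETCH_ENABLE_CHECKS 0

// #ifdef only asks "is it defined?", so a value of 0 still takes this branch.
inline bool checksEnabledViaIfdef() {
#ifdef SKETCH_ENABLE_CHECKS
  return true;
#else
  return false;
#endif
}

// #if evaluates the value, so 0 correctly disables the branch.
inline bool checksEnabledViaIf() {
#if SKETCH_ENABLE_CHECKS
  return true;
#else
  return false;
#endif
}
```

With the flag set to 0, the `#ifdef` version still reports the checks as enabled, which is the bug class the commit addresses.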
2024-09-09Reland "[LSR] Do not create duplicated PHI nodes while preserving LCSSA form" (#107380)Sergey Kachkov1-15/+16
Motivating example: https://godbolt.org/z/eb97zrxhx Here we have 2 induction variables in the loop: one corresponds to the variable i (add rdx, 4), the other to res (add rax, 2). The second induction variable can be removed by the rewriteLoopExitValues() method (the final value of res at loop exit is unroll_iter * -2); however, this doesn't happen because we have duplicated LCSSA phi nodes at loop exit:
```
; Preheader:
for.body.preheader.new:                           ; preds = %for.body.preheader
  %unroll_iter = and i64 %N, -4
  br label %for.body

; Loop:
for.body:                                         ; preds = %for.body, %for.body.preheader.new
  %lsr.iv = phi i64 [ %lsr.iv.next, %for.body ], [ 0, %for.body.preheader.new ]
  %i.07 = phi i64 [ 0, %for.body.preheader.new ], [ %inc.3, %for.body ]
  %inc.3 = add nuw i64 %i.07, 4
  %lsr.iv.next = add nsw i64 %lsr.iv, -2
  %niter.ncmp.3.not = icmp eq i64 %unroll_iter, %inc.3
  br i1 %niter.ncmp.3.not, label %for.end.loopexit.unr-lcssa.loopexit, label %for.body, !llvm.loop !7

; Exit blocks
for.end.loopexit.unr-lcssa.loopexit:              ; preds = %for.body
  %inc.3.lcssa = phi i64 [ %inc.3, %for.body ]
  %lsr.iv.next.lcssa11 = phi i64 [ %lsr.iv.next, %for.body ]
  %lsr.iv.next.lcssa = phi i64 [ %lsr.iv.next, %for.body ]
  br label %for.end.loopexit.unr-lcssa
```
rewriteLoopExitValues requires the %lsr.iv.next value to have only 2 uses: one in the LCSSA phi node, the other in the induction phi node. Here we have 3 uses of this value because of the duplicated LCSSA nodes, so the transform doesn't apply, which leads to an extra add operation inside the loop. The proposed solution is to accumulate inserted instructions that will require an LCSSA form update into a SetVector and then call formLCSSAForInstructions for this SetVector once, so the same instructions are not processed twice. The reland fixes the issue with the preserve-lcssa.ll test: it fails when the x86_64-unknown-linux-gnu target is unavailable in opt. The changes are moved into a separate duplicated-phis.ll test with an explicit x86 target requirement to fix bots which are not building this target.
2024-09-06Revert "[LSR] Do not create duplicated PHI nodes while preserving LCSSA form" (#107666)dyung1-16/+15
Reverts llvm/llvm-project#107380. The change is causing the test preserve-lcssa.ll to fail on at least 2 build bots:
- https://lab.llvm.org/buildbot/#/builders/190/builds/5231
- https://lab.llvm.org/buildbot/#/builders/161/builds/1855
2024-09-06[LSR] Do not create duplicated PHI nodes while preserving LCSSA form (#107380)Sergey Kachkov1-15/+16
Motivating example: https://godbolt.org/z/eb97zrxhx Here we have 2 induction variables in the loop: one corresponds to the variable i (add rdx, 4), the other to res (add rax, 2). The second induction variable can be removed by the rewriteLoopExitValues() method (the final value of res at loop exit is unroll_iter * -2); however, this doesn't happen because we have duplicated LCSSA phi nodes at loop exit:
```
; Preheader:
for.body.preheader.new:                           ; preds = %for.body.preheader
  %unroll_iter = and i64 %N, -4
  br label %for.body

; Loop:
for.body:                                         ; preds = %for.body, %for.body.preheader.new
  %lsr.iv = phi i64 [ %lsr.iv.next, %for.body ], [ 0, %for.body.preheader.new ]
  %i.07 = phi i64 [ 0, %for.body.preheader.new ], [ %inc.3, %for.body ]
  %inc.3 = add nuw i64 %i.07, 4
  %lsr.iv.next = add nsw i64 %lsr.iv, -2
  %niter.ncmp.3.not = icmp eq i64 %unroll_iter, %inc.3
  br i1 %niter.ncmp.3.not, label %for.end.loopexit.unr-lcssa.loopexit, label %for.body, !llvm.loop !7

; Exit blocks
for.end.loopexit.unr-lcssa.loopexit:              ; preds = %for.body
  %inc.3.lcssa = phi i64 [ %inc.3, %for.body ]
  %lsr.iv.next.lcssa11 = phi i64 [ %lsr.iv.next, %for.body ]
  %lsr.iv.next.lcssa = phi i64 [ %lsr.iv.next, %for.body ]
  br label %for.end.loopexit.unr-lcssa
```
rewriteLoopExitValues requires the %lsr.iv.next value to have only 2 uses: one in the LCSSA phi node, the other in the induction phi node. Here we have 3 uses of this value because of the duplicated LCSSA nodes, so the transform doesn't apply, which leads to an extra add operation inside the loop. The proposed solution is to accumulate inserted instructions that will require an LCSSA form update into a SetVector and then call formLCSSAForInstructions for this SetVector once, so the same instructions are not processed twice.
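The "collect first, process once" idea can be sketched without any LLVM dependencies. `OrderedSet` below is a hypothetical minimal stand-in for `llvm::SetVector`, integers stand in for instructions, and `processOnce` is an invented driver; none of this is the code from the patch.

```cpp
#include <unordered_set>
#include <vector>

// Minimal insertion-ordered set, standing in for llvm::SetVector.
class OrderedSet {
  std::vector<int> Order;
  std::unordered_set<int> Seen;

public:
  // Returns true only the first time a value is inserted.
  bool insert(int V) {
    if (!Seen.insert(V).second)
      return false;
    Order.push_back(V);
    return true;
  }
  const std::vector<int> &items() const { return Order; }
};

// Hypothetical driver: accumulate every "inserted instruction" first, then
// hand the de-duplicated list to a single processing step.
inline std::vector<int> processOnce(const std::vector<int> &Inserted) {
  OrderedSet Pending;
  for (int I : Inserted)
    Pending.insert(I);
  return Pending.items(); // each element appears exactly once, in order
}
```

Processing each element once is what prevents the second, duplicate LCSSA phi from being created for the same value.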
2024-08-28[LSR] Use computeConstantDifference()Nikita Popov1-3/+3
This API is faster than getMinusSCEV() and a SCEVConstant cast.
2024-08-17[LSR] Split the -lsr-term-fold transformation into its own pass (#104234)Philip Reames1-262/+0
This transformation doesn't actually use any of the internal state of LSR and recomputes all information from SCEV. Splitting it out makes it easier to test. Note that long term I would like to write a version of this transform which *is* integrated with LSR's solver, but if that happens, we'll just delete the extra pass. Integration wise, I switched from using TTI to using a pass configuration variable. This seems slightly more idiomatic, and means we don't run the extra logic on any target other than RISCV.
2024-07-24[LSR] Fix matching vscale immediates (#100080)Benjamin Maxwell1-2/+4
Somewhat confusingly a `SCEVMulExpr` is a `SCEVNAryExpr`, so can have > 2 operands. Previously, the vscale immediate matching did not check the number of operands of the `SCEVMulExpr`, so would ignore any operands after the first two. This led to incorrect codegen (and results) for ArmSME in IREE (https://github.com/iree-org/iree), which sometimes addresses things that are a `vscale * vscale` multiple away. The test added with this change shows an example reduced from IREE. The second write should be offset from the first `16 * vscale * vscale` (* 4 bytes), however, previously LSR dropped the second vscale and instead offset the write by `#4, mul vl`, which is an offset of `16 * vscale` (* 4 bytes).
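The operand-count check can be illustrated with a toy expression model. `Operand`, `MulExpr`, and `matchVScaleImmediate` are hypothetical names for this sketch and do not mirror the real SCEV classes; the only point carried over from the commit is that an n-ary multiply must have exactly two operands before it is treated as `C * vscale`.

```cpp
#include <cstdint>
#include <optional>
#include <vector>

// Toy operand: a constant, the vscale symbol, or anything else.
struct Operand {
  enum Kind { Constant, VScale, Other } K;
  int64_t Value;
};

// An n-ary multiply is just a list of operands in this sketch.
using MulExpr = std::vector<Operand>;

// Matches exactly `C * vscale` and returns C. Without the size check, a
// product such as `16 * vscale * vscale` would wrongly match as 16.
inline std::optional<int64_t> matchVScaleImmediate(const MulExpr &Mul) {
  if (Mul.size() != 2)
    return std::nullopt;
  if (Mul[0].K == Operand::Constant && Mul[1].K == Operand::VScale)
    return Mul[0].Value;
  return std::nullopt;
}
```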
2024-07-15[DebugInfo][LoopStrengthReduce] Fix missing debug location updates (#97519)Shan Huang1-0/+2
Fix #97510. Note that, for the new phi instruction `NewPH`, which replaces the old phi `PH` and the cast `ShadowUse`, I choose to propagate the debug location of `PH` to it, because the cast is eliminated according to the optimization semantics.
2024-07-14[Transforms] Use range-based for loops (NFC) (#98725)Kazu Hirata1-2/+2
2024-07-01[LSR] Recognize vscale-relative immediates (#88124)Graham Hunter1-187/+428
Extends LoopStrengthReduce to recognize immediates multiplied by vscale, and query the current target for whether they are legal offsets for memory operations or adds.
2024-06-27[IR] Add getDataLayout() helpers to BasicBlock and Instruction (#96902)Nikita Popov1-6/+6
This is a helper to avoid writing `getModule()->getDataLayout()`. I regularly try to use this method only to remember it doesn't exist... `getModule()->getDataLayout()` is also a common (the most common?) reason why code has to include the Module.h header.
2024-06-10[LSR][AArch64] Optimize chain generation based on legal addressing modes (#94453)David Green1-14/+58
LSR will generate chains of related instructions with a known increment between them. With SVE, as in the test case, this can include increments like 'vscale * 16 + 8'. The idea of this patch is that if we have a '+8' increment already calculated in the chain, we can generate a (legal) '+ vscale*16' addressing mode from it, allowing us to use the '[x16, #1, mul vl]' addressing mode instructions. In order to do this we keep track of the known 'bases' when generating chains in GenerateIVChain, checking for each whether the accumulated increment expression from the base neatly folds into a legal addressing mode. If they do not, we fall back to the existing LeftOverExpr, whether it is legal or not. This is mostly orthogonal to #88124, dealing with the generation of chains as opposed to the rest of LSR. The existing vscale addressing mode work has greatly helped compared to the last time I looked at this, allowing us to check that the addressing modes are indeed legal.
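The base-tracking idea can be sketched as a small search over known bases. Everything here is hypothetical: `Increment` models an offset as `Fixed + Scalable * vscale`, and `isLegalScalableOffset` encodes an assumed legality rule (pure vscale multiples of 16 within a small range) chosen only to make the example concrete; the real check is a target query.

```cpp
#include <cstddef>
#include <cstdint>
#include <optional>
#include <vector>

// An increment of the form Fixed + Scalable * vscale.
struct Increment {
  int64_t Fixed;
  int64_t Scalable;
};

// Assumed legality rule for this sketch: no fixed part, and the scalable
// part is a multiple of 16 in the range [-8 * 16, 7 * 16].
inline bool isLegalScalableOffset(const Increment &Off) {
  return Off.Fixed == 0 && Off.Scalable % 16 == 0 &&
         Off.Scalable / 16 >= -8 && Off.Scalable / 16 <= 7;
}

// Returns the index of the first known base from which Target is reachable
// with a legal offset, or nullopt if the caller must fall back.
inline std::optional<std::size_t>
findFoldableBase(const std::vector<Increment> &Bases, const Increment &Target) {
  for (std::size_t I = 0; I < Bases.size(); ++I) {
    Increment Diff{Target.Fixed - Bases[I].Fixed,
                   Target.Scalable - Bases[I].Scalable};
    if (isLegalScalableOffset(Diff))
      return I;
  }
  return std::nullopt;
}
```

For a target increment of `vscale * 16 + 8` and known bases `0` and `+8`, the search picks the `+8` base, leaving a pure `vscale * 16` offset; when no base fits, the caller would fall back to the leftover expression.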
2024-06-05[LSR] Provide TTI hook to enable dropping solutions deemed to be unprofitable (#89924)Alex Bradbury1-3/+15
<https://reviews.llvm.org/D126043> introduced a flag to drop solutions if deemed unprofitable. As noted there, introducing a TTI hook enables backends to individually opt into this behaviour. This will be used by #89927.
2024-05-14[LSR] Tweak debug output to always print initial costPhilip Reames1-2/+2
2024-05-10[TTI] Support scalable offsets in getScalingFactorCost (#88113)Graham Hunter1-2/+4
Part of the work to support vscale-relative immediates in LSR.
2024-03-19[RemoveDIs][NFC] Rename DPValue -> DbgVariableRecord (#85216)Stephen Tozer1-10/+10
This is the major rename patch that prior patches have built towards. The DPValue class is being renamed to DbgVariableRecord, which reflects the updated terminology for the "final" implementation of the RemoveDI feature. This is a pure string substitution + clang-format patch. The only manual component of this patch was determining where to perform these string substitutions: `DPValue` and `DPV` are almost exclusively used for DbgRecords, *except* for:
- llvm/lib/target, where 'DP' is used to mean double-precision, and so appears as part of .td files and in variable names. NB: There is a single existing use of `DPValue` here that refers to debug info, which I've manually updated.
- llvm/tools/gold, where 'LDPV' is used as a prefix for symbol visibility enums.

Outside of these places, I've applied several basic string substitutions, with the intent that they only affect DbgRecord-related identifiers; I've checked them as I went through to verify this, with reasonable confidence that there are no unintended changes that slipped through the cracks. The substitutions applied are all case-sensitive, and are applied in the order shown:
```
DPValue -> DbgVariableRecord
DPVal -> DbgVarRec
DPV -> DVR
```
Following the previous rename patches, it should be the case that there are no instances of any of these strings that are meant to refer to the general case of DbgRecords, or anything other than the DPValue class. The idea behind this patch is therefore that pure string substitution is correct in all cases as long as these assumptions hold.
2024-03-14[RemoveDIs][NFC] Move DPValue::filter -> filterDbgVars (#85208)Stephen Tozer1-1/+1
This patch changes DPValue::filter to be a non-member method filterDbgVars. There are two reasons for this: firstly, the name of DPValue is about to change to DbgVariableRecord, which will result in every `for` loop that uses DPValue::filter to require a line break. This is a small thing, but it makes the rename patch more difficult to review, and is just generally more awkward for what is a fairly common loop. Secondly, the intent is to later break up the DPValue class into subclasses, at which point it would be better to have a non-member function that allows template arguments for the cases we want to filter with greater specificity.
2024-03-12[LSR] Clear SCEVExpander before deleting phi nodesNikita Popov1-0/+2
Fixes https://github.com/llvm/llvm-project/issues/84709.
2024-03-12[RemoveDIs][NFC] Rename common interface functions for DPValues->DbgRecords (#84793)Stephen Tozer1-1/+1
As part of the effort to rename the DbgRecord classes, this patch renames the widely-used functions that operate on DbgRecords but refer to DbgValues or DPValues in their names to refer to DbgRecords instead; all such functions are defined in one of `BasicBlock.h`, `Instruction.h`, and `DebugProgramInstruction.h`. This patch explicitly does not change the names of any comments or variables, except for where they use the exact name of one of the renamed functions. The reason for this is reviewability; this patch can be trivially examined to determine that the only changes are direct string substitutions and any results from clang-format responding to the changed line lengths. Future patches will cover renaming variables and comments, and then renaming the classes themselves.
2024-03-05[NFC][RemoveDIs] Insert instruction using iterators in Transforms/Jeremy Morse1-19/+18
As part of the RemoveDIs project we need LLVM to insert instructions using iterators wherever possible, so that the iterators can carry a bit of debug-info. This commit implements some of that by updating the contents of llvm/lib/Transforms/Utils to always use iterator-versions of instruction constructors. There are two general flavours of update:
* Almost all call-sites just call getIterator on an instruction
* Several make use of an existing iterator (scenarios where the code is actually significant for debug-info)

The underlying logic is that any call to getFirstInsertionPt or similar APIs that identify the start of a block needs to have that iterator passed directly to the insertion function, without being converted to a bare Instruction pointer along the way. Noteworthy changes:
* FindInsertedValue now takes an optional iterator rather than an instruction pointer, as we need to always insert with iterators,
* I've added a few iterator-taking versions of some value-tracking and DomTree methods -- they just unwrap the iterator. These are purely convenience methods to avoid extra syntax in some passes.
* A few calls to getNextNode become std::next instead (to keep in the theme of using iterators for positions),
* SeparateConstOffsetFromGEP has its insertion-position field changed. Noteworthy because it's not a purely localised spelling change.

All this should be NFC.
2024-03-04[LSR][term-fold] Ensure the simple recurrence is from the current loop (#83085)Patrick O'Neill1-0/+4
If the phi node found by matchSimpleRecurrence is not from the current loop, then isAlmostDeadIV panics. With this patch we bail out early. Signed-off-by: Patrick O'Neill <patrick@rivosinc.com>