path: root/llvm/test/Transforms/LoopStrengthReduce
Age | Commit message | Author | Files | Lines
2026-02-17[NFC][AMDGPU] Use `zeroinitializer` instead of `null` for `ptr addrspace(2/3/5)` in AMDGPU tests (#181710)Shilei Tian2-3/+3
2026-02-07[LSR] Support SCEVPtrToAddr in SCEVDbgValueBuilder.Florian Hahn1-0/+61
Allow SCEVPtrToAddr as cast in assertion in SCEVDbgValueBuilder. SCEVPtrToAddr is handled similarly to SCEVPtrToInt. Fixes a crash with debug info after bd40d1de9c9ee, which started to generate ptrtoaddr instead of ptrtoint expressions.
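A minimal sketch of the two cast forms involved (function names invented for illustration); `ptrtoaddr` produces only the address component of a pointer, while `ptrtoint` converts the whole pointer value, and SCEVDbgValueBuilder now accepts both when salvaging debug values:

```llvm
define i64 @addr_cast(ptr %p) {
  ; newer form, now accepted by SCEVDbgValueBuilder
  %a = ptrtoaddr ptr %p to i64
  ret i64 %a
}

define i64 @int_cast(ptr %p) {
  ; older form, already handled like other SCEV casts
  %i = ptrtoint ptr %p to i64
  ret i64 %i
}
```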
2025-12-19[LSR] Add another test case for implicit truncation (NFC)Nikita Popov1-0/+31
From https://github.com/llvm/llvm-project/pull/171456#issuecomment-3675488354.
2025-12-19[LSR] Add test for implicit truncation on icmp immediate (NFC)Nikita Popov1-0/+48
Test case for: https://github.com/llvm/llvm-project/pull/171456#issuecomment-3670755137
2025-12-03[LSR] Make OptimizeLoopTermCond able to handle some non-cmp conditions (#165590)John Brawn1-0/+205
Currently OptimizeLoopTermCond can only convert a cmp instruction to using a postincrement induction variable, which means it can't handle predicated loops where the termination condition comes from get_active_lane_mask. Relax this restriction so that we can handle any kind of instruction, though only if it's the instruction immediately before the branch (except for possibly an extractelement).
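A rough sketch (register and label names invented) of the kind of predicated loop this enables: the terminating branch is fed not by a cmp but by an extractelement of the mask from get_active_lane_mask, sitting immediately before the branch:

```llvm
declare <4 x i1> @llvm.get.active.lane.mask.v4i1.i64(i64, i64)

loop:
  %index = phi i64 [ 0, %entry ], [ %index.next, %loop ]
  ; ... masked loads/stores predicated on the current lane mask ...
  %index.next = add i64 %index, 4
  %mask.next = call <4 x i1> @llvm.get.active.lane.mask.v4i1.i64(i64 %index.next, i64 %n)
  %cond = extractelement <4 x i1> %mask.next, i64 0
  br i1 %cond, label %loop, label %exit
```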
2025-12-02[LSR] Insert the transformed IV increment in the user block (#169515)John Brawn1-3/+1
Currently we try to hoist the transformed IV increment instruction to the header block to help with generation of postincrement instructions, but this only works if the user instruction is also in the header. We should instead be trying to insert it in the same block as the user.
2025-11-10X86: Enable terminal rule (#165957)Matt Arsenault1-10/+11
2025-11-10[AArch64][SVE] Avoid movprfx by reusing register for _UNDEF pseudos. (#166926)Sander de Smalen1-3/+2
For predicated SVE instructions where we know that the inactive lanes are undef, it is better to pick a destination register that is not unique. This avoids introducing a movprfx to copy a unique register to the destination operand, which would be needed to comply with the tied operand constraints. For example: ``` %src1 = COPY $z1 %src2 = COPY $z2 %dst = SDIV_ZPZZ_S_UNDEF %p, %src1, %src2 ``` Here it is beneficial to pick $z1 or $z2 as the destination register, because if it had chosen a unique register (e.g. $z0) then the pseudo expand pass would need to insert a MOVPRFX to expand the operation into: ``` $z0 = SDIV_ZPZZ_S_UNDEF $p0, $z1, $z2 -> $z0 = MOVPRFX $z1 $z0 = SDIV_ZPmZ_S $p0, $z0, $z2 ``` By picking $z1 directly, we'd get: ``` $z1 = SDIV_ZPmZ_S $p0, $z1, $z2 ```
2025-10-30[LSR] Don't count conditional loads/store as enabling pre/post-index (#159573)John Brawn1-4/+140
When a load/store is conditionally executed in a loop it isn't a candidate for pre/post-index addressing, as the increment of the address would only happen on those loop iterations where the load/store is executed. Detect this and only discount the AddRec cost when the load/store is unconditional.
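A minimal sketch (names invented) of the pattern being excluded: the store executes only when %c is true, so folding the pointer increment into a post-indexed store would incorrectly advance the address on iterations where the store is skipped:

```llvm
loop:
  %p = phi ptr [ %base, %entry ], [ %p.next, %latch ]
  %i = phi i64 [ 0, %entry ], [ %i.next, %latch ]
  %v = load i32, ptr %p, align 4
  %c = icmp sgt i32 %v, 0
  br i1 %c, label %if.then, label %latch

if.then:
  ; conditionally executed: not a pre/post-index candidate
  store i32 0, ptr %p, align 4
  br label %latch

latch:
  %p.next = getelementptr inbounds i32, ptr %p, i64 1
  %i.next = add nuw i64 %i, 1
  %done = icmp eq i64 %i.next, %n
  br i1 %done, label %exit, label %loop
```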
2025-10-23[test][Transforms] Remove unsafe-fp-math uses part 2 (NFC) (#164786)paperchalice4-6/+6
Post cleanup for #164534.
2025-09-17Reapply "[SCEV] Fold (C1 * A /u C2) -> A /u (C2 /u C1), if C2 > C1." (#158328)Florian Hahn1-2/+1
This reverts commit fd58f235f8c5bd40d98acfd8e7fb11d41de301c7. The recommit contains an extra check to make sure that D is a multiple of C2, if C2 > C1. This fixes the issue causing the revert fd58f235f8c. Tests have been added in 6a726e9a4d3d0. Original message: If C2 >u C1 and C1 >u 1, fold to A /u (C2 /u C1). Depends on https://github.com/llvm/llvm-project/pull/157555. Alive2 Proof: https://alive2.llvm.org/ce/z/BWvQYN PR: https://github.com/llvm/llvm-project/pull/157656
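A small numeric illustration of the fold and of why the recommit adds the multiple-of-C2 check (constants chosen for illustration: C1 = 2, C2 = 8):

```llvm
; Before: (2 * (%A /u 8))
%d = udiv i64 %A, 8
%m = mul i64 %d, 2
; After the fold: %A /u (8 /u 2)
%f = udiv i64 %A, 4
; If %A = 24 (a multiple of 8): 2 * (24 /u 8) = 6 and 24 /u 4 = 6, equal.
; If %A = 12 (not a multiple of 8): 2 * (12 /u 8) = 2 but 12 /u 4 = 3,
; which is why the dividend must be a multiple of C2 when C2 > C1.
```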
2025-09-17[SCEV] Add more tests for MUL/UDIV folds.Florian Hahn1-0/+38
Add additional test coverage for https://github.com/llvm/llvm-project/pull/157656 covering cases where the dividend is not a multiple of the divisor, causing the revert of 70012fda631.
2025-09-16[LSR] Add an addressing mode that considers all addressing modes (#158110)John Brawn1-0/+178
The way that loop strength reduction works is that the target has to decide upfront whether it wants its addressing to be preindex, postindex, or neither. This choice affects: * Which potential solutions we generate * Whether we consider a pre/post index load/store as costing an AddRec or not. None of these choices are a good fit for either AArch64 or ARM, where both preindex and postindex addressing are typically free: * If we pick None then we count pre/post index addressing as costing one addrec more than is correct so we don't pick them when we should. * If we pick PreIndexed or PostIndexed then we get the correct cost for that addressing type, but still get it wrong for the other and also exclude potential solutions using offset addressing that could have less cost. This patch adds an "all" addressing mode that causes all potential solutions to be generated and counts both pre and postindex as having AddRecCost of zero. Unfortunately this reveals problems elsewhere in how we calculate the cost of things that need to be fixed before we can make use of it.
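An illustrative sketch (invented names, loop exits elided) of the two solution shapes whose costs the new mode weighs against each other; shape (b) is the one that maps onto pre/post-indexed loads on AArch64/ARM:

```llvm
; (a) offset addressing: an integer IV, address computed as base + IV
loop.a:
  %iv = phi i64 [ 0, %entry ], [ %iv.next, %loop.a ]
  %addr = getelementptr inbounds i32, ptr %base, i64 %iv
  %v = load i32, ptr %addr, align 4
  %iv.next = add nuw i64 %iv, 1
  br label %loop.a

; (b) the pointer itself is the IV and is bumped each iteration,
;     matching a post-indexed load
loop.b:
  %p = phi ptr [ %base, %entry ], [ %p.next, %loop.b ]
  %v.b = load i32, ptr %p, align 4
  %p.next = getelementptr inbounds i32, ptr %p, i64 1
  br label %loop.b
```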
2025-09-12Revert "[SCEV] Fold (C1 * A /u C2) -> A /u (C2 /u C1), if C2 > C1." (#158328)Reid Kleckner1-1/+2
Reverts llvm/llvm-project#157656 There are multiple reports that this is causing miscompiles in the MSan test suite after bootstrapping and that this is causing miscompiles in rustc. Let's revert for now, and work to capture a reproducer next week.
2025-09-11[SCEV] Fold (C1 * A /u C2) -> A /u (C2 /u C1), if C2 > C1. (#157656)Florian Hahn1-2/+1
If C2 >u C1 and C1 >u 1, fold to A /u (C2 /u C1). Depends on https://github.com/llvm/llvm-project/pull/157555. Alive2 Proof: https://alive2.llvm.org/ce/z/BWvQYN PR: https://github.com/llvm/llvm-project/pull/157656
2025-09-05[SCEV] Fold (C * A /u C) -> A, if A is a multiple of C and C a pow-of-2. (#156730)Florian Hahn1-3/+5
Alive2 Proof: https://alive2.llvm.org/ce/z/JoHJE9 PR: https://github.com/llvm/llvm-project/pull/156730
2025-09-03[LSR] Move test from Analysis/ScalarEvolution to Transforms, regen.Florian Hahn1-0/+119
Move transform test to llvm/test/Transforms/LoopStrengthReduce/X86, clean up the names a bit and regenerate check lines.
2025-08-08[IR] Remove size argument from lifetime intrinsics (#150248)Nikita Popov2-32/+32
Now that #149310 has restricted lifetime intrinsics to only work on allocas, we can also drop the explicit size argument. Instead, the size is implied by the alloca. This removes the ability to only mark a prefix of an alloca alive/dead. We never used that capability, so we should remove the need to handle that possibility everywhere (though many key places, including stack coloring, did not actually respect this).
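The shape of the change, sketched on a hypothetical alloca:

```llvm
%buf = alloca [16 x i8]
; Before #150248 the intrinsics carried an explicit size:
;   call void @llvm.lifetime.start.p0(i64 16, ptr %buf)
; Now the size is implied by the alloca itself:
call void @llvm.lifetime.start.p0(ptr %buf)
call void @llvm.lifetime.end.p0(ptr %buf)
```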
2025-08-01[LLVM][DAGCombiner] fold (shl (X * vscale(C0)), C1) -> (X * vscale(C0 << C1)). (#150651)Paul Walker1-4/+4
2025-07-18[LSR] Do not consider uses in lifetime intrinsics (#149492)Nikita Popov1-0/+59
We should ignore uses of pointers in lifetime intrinsics, as these are not actually materialized in the final code, so don't affect register pressure or anything else LSR needs to model. Handling these only results in peculiar rewrites where additional intermediate GEPs are introduced.
2025-07-14[DebugInfo][LoopStrengthReduce] Salvage the debug value of the dead cmp instruction (#147241)Shan Huang1-1/+12
Fix #147238
2025-07-11Revert "[LSR] Regenerate test checks (NFC)"Nikita Popov1-209/+112
This reverts commit a60405c2d51ffe428dd38b8b1020c19189968f76. Causes buildbot failures.
2025-07-11[LSR] Regenerate test checks (NFC)Nikita Popov1-112/+209
2025-07-11[AArch64LoadStoreOpt] BaseReg update is searched also in CF successor (#145583)Sergey Shcherbinin1-2/+1
Look for reg update instruction (to merge w/ mem instruction into pre/post-increment form) not only inside a single MBB but also along a CF path going downward w/o side enters such that BaseReg is alive along it but not at its exits. Regression test is updated accordingly.
2025-07-03[PHIElimination] Revert #131837 #146320 #146337 (#146850)Guy David2-19/+25
Reverting because mis-compiles: - https://github.com/llvm/llvm-project/pull/131837 - https://github.com/llvm/llvm-project/pull/146320 - https://github.com/llvm/llvm-project/pull/146337
2025-06-29[PHIElimination] Reuse existing COPY in predecessor basic block (#131837)Guy David2-25/+19
The insertion point of COPY isn't always optimal and could eventually lead to a worse block layout, see the regression test in the first commit. This change affects many architectures but the total number of instructions in the test cases seems to be slightly lower.
2025-06-16[LSR] Make canHoistIVInc allow non-integer types (#143707)John Brawn1-0/+189
canHoistIVInc was made to only allow integer types to avoid a crash in isIndexedLoadLegal/isIndexedStoreLegal due to them failing an assertion in getValueType (or rather in MVT::getVT which gets called from that) when passed a struct type. Adjusting these functions to pass AllowUnknown=true to getValueType means we don't get an assertion failure (MVT::Other is returned which TLI->isIndexedLoadLegal should then return false for), meaning we can remove this check for integer type.
2025-06-05MachineScheduler: Improve instruction clustering (#137784)Ruiling, Song1-16/+12
The existing way of managing clustered nodes was done through adding weak edges between the neighbouring cluster nodes, which is a sort of ordered queue. And this will be later recorded as `NextClusterPred` or `NextClusterSucc` in `ScheduleDAGMI`. But actually the instruction may be picked not in the exact order of the queue. For example, we have a queue of cluster nodes A B C. But during scheduling, node B might be picked first, then it will be very likely that we only cluster B and C for Top-Down scheduling (leaving A alone). Another issue is: ``` if (!ReorderWhileClustering && SUa->NodeNum > SUb->NodeNum) std::swap(SUa, SUb); if (!DAG->addEdge(SUb, SDep(SUa, SDep::Cluster))) ``` may break the cluster queue. For example, we want to cluster nodes (order as in `MemOpRecords`): 1 3 2. 1(SUa) will be pred of 3(SUb) normally. But when it comes to (3, 2), as 3(SUa) > 2(SUb), we would reorder the two nodes, which makes 2 be pred of 3. This makes both 1 and 2 become preds of 3, but there is no edge between 1 and 2. Thus we get a broken cluster chain. To fix both issues, we introduce an unordered set in the change. This could help improve clustering in some hard cases. One key reason the change causes so many test check changes is: As the cluster candidates are not ordered now, the candidates might be picked in different order from before. The most affected targets are: AMDGPU, AArch64, RISCV. For RISCV, it seems to me most are just minor instruction reorders, with no obvious regression. For AArch64, there was some combining of ldr into ldp being affected, with two cases being regressed and two being improved. The deeper reason is that the machine scheduler cannot cluster them well both before and after the change, and the load combine algorithm later is also not smart enough. For AMDGPU, some cases have more v_dual instructions used while some are regressed. It seems less critical. Seems like test `v_vselect_v32bf16` gets more buffer_load being claused.
2025-04-30[AMDGPU] Remove explicit datalayout from tests where not neededAlexander Richardson5-12/+0
Since e39f6c1844fab59c638d8059a6cf139adb42279a opt will infer the correct datalayout when given a triple. Avoid explicitly specifying it in tests that depend on the AMDGPU target being present to avoid the string becoming out of sync with the TargetInfo value. Only tests with REQUIRES: amdgpu-registered-target or a local lit.cfg were updated to ensure that tests for non-target-specific passes that happen to use the AMDGPU layout still pass when building with a limited set of targets. Reviewed By: shiltian, arsenm Pull Request: https://github.com/llvm/llvm-project/pull/137921
2025-04-17[AArch64][SVE] Fold ADD+CNTB to INCB/DECB (#118280)Ricardo Jesus1-32/+33
Currently, given: ```cpp uint64_t incb(uint64_t x) { return x+svcntb(); } ``` LLVM generates: ```gas incb: addvl x0, x0, #1 ret ``` Which is equivalent to: ```gas incb: incb x0 ret ``` However, on microarchitectures like the Neoverse V2 and Neoverse V3, the second form (with INCB) can have significantly better latency and throughput (according to their SWOG). On the Neoverse V2, for example, ADDVL has a latency and throughput of 2, whereas some forms of INCB have a latency of 1 and a throughput of 4. The same applies to DECB. This patch adds patterns to prefer the cheaper INCB/DECB forms over ADDVL where applicable.
2025-03-29MIPS: Set EnableLoopTermFold (#133454)YunQiang Su2-29/+58
Setting `EnableLoopTermFold` enables `loop-term-fold` pass.
2025-03-28Revert "MIPS: Set EnableLoopTermFold (#133378)"Weaver1-56/+29
This reverts commit 71f43a7c42a37d18be98b0885d62ef76e658f242. Caused build bot failures: https://lab.llvm.org/buildbot/#/builders/190/builds/17267 https://lab.llvm.org/buildbot/#/builders/144/builds/21445 https://lab.llvm.org/buildbot/#/builders/46/builds/14343 Please consider fixing the test failure in long-array-initialize.ll before recommitting.
2025-03-28MIPS: Set EnableLoopTermFold (#133378)YunQiang Su1-29/+56
Setting `EnableLoopTermFold` enables `loop-term-fold` pass.
2025-03-26MIPS: Implements MipsTTIImpl::isLSRCostLess using Insns as first (#133068)YunQiang Su1-0/+72
So that LoopStrengthReduce can work for MIPS. The code is copied from RISC-V. --------- Co-authored-by: qethu <190734095+qethu@users.noreply.github.com>
2025-03-14[RemoveDIs] Remove "try-debuginfo-iterators..." test flags (#130298)Jeremy Morse15-15/+0
These date back to when the non-intrinsic format of variable locations was still being tested and was behind a compile-time flag, so not all builds / bots would correctly run them. The solution at the time, to get at least some test coverage, was to have tests opt-in to non-intrinsic debug-info if it was built into LLVM. Nowadays, non-intrinsic format is the default and has been on for more than a year, there's no need for this flag to exist. (I've downgraded the flag from "try" to explicitly requesting non-intrinsic format in some places, so that we can deal with tests that are explicitly about non-intrinsic format in their own commit).
2025-02-26[AArch64][SVE] Lower unpredicated loads/stores as LDR/STR. (#127837)Ricardo Jesus1-38/+29
Currently, given: ```cpp svuint8_t foo(uint8_t *x) { return svld1(svptrue_b8(), x); } ``` We generate: ```gas foo: ptrue p0.b ld1b { z0.b }, p0/z, [x0] ret ``` However, on little-endian and with unaligned memory accesses allowed, we could instead be using LDR as follows: ```gas foo: ldr z0, [x0] ret ``` The second form avoids the predicate dependency. Likewise for other types and stores.
2025-02-01[MachineScheduler] Fix physreg dependencies of ExitSU (#123541)Sergei Barannikov1-2/+2
Providing the correct operand index allows addPhysRegDataDeps to compute the correct latency. Pull Request: https://github.com/llvm/llvm-project/pull/123541
2025-01-29[IR] Convert from nocapture to captures(none) (#123181)Nikita Popov5-7/+7
This PR removes the old `nocapture` attribute, replacing it with the new `captures` attribute introduced in #116990. This change is intended to be essentially NFC, replacing existing uses of `nocapture` with `captures(none)` without adding any new analysis capabilities. Making use of non-`none` values is left for a followup. Some notes: * `nocapture` will be upgraded to `captures(none)` by the bitcode reader. * `nocapture` will also be upgraded by the textual IR reader. This is to make it easier to use old IR files and somewhat reduce the test churn in this PR. * Helper APIs like `doesNotCapture()` will check for `captures(none)`. * MLIR import will convert `captures(none)` into an `llvm.nocapture` attribute. The representation in the LLVM IR dialect should be updated separately.
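The textual upgrade, sketched on a hypothetical declaration:

```llvm
; Before: declare void @use(ptr nocapture %p)
; After (what the IR readers now upgrade it to):
declare void @use(ptr captures(none) %p)
```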
2025-01-07[NVPTX] Switch front-ends and tests to ptx_kernel cc (#120806)Alex MacLean1-3/+1
the `ptx_kernel` calling convention is a more idiomatic and standard way of specifying a NVPTX kernel than using the metadata which is not supposed to change the meaning of the program. Further, checking the calling convention is significantly faster than traversing the metadata, improving compile time. This change updates the clang and mlir frontends as well as the NVPTXCtorDtorLowering pass to emit kernels using the calling convention. In addition, this updates all NVPTX unit tests to use the calling convention as well.
2024-12-29Remove -print-lsr-output in favor of --stop-after=loop-reduceFangrui Song1-1/+1
Pull Request: https://github.com/llvm/llvm-project/pull/121305
2024-11-21[llvm] Remove `br i1 undef` from some regression tests [NFC] (#117112)Lee Wei29-158/+158
This PR removes tests with `br i1 undef` under `llvm/test/Transforms/Loop*, Lower*`.
2024-11-11[SCEVExpander] Don't try to reuse SCEVUnknown values (#115141)Nikita Popov2-5/+3
The expansion of a SCEVUnknown is trivial (it's just the wrapped value). If we try to reuse an existing value it might be a more complex expression that simplifies to the SCEVUnknown. This is inspired by https://github.com/llvm/llvm-project/issues/114879, because SCEVExpander replacing a constant with a phi node is just silly. (I don't consider this a fix for that issue though.)
2024-11-07[SCEV] Disallow simplifying phi(undef, X) to X (#115109)Yingwei Zheng1-8/+8
See the following case: ``` @GlobIntONE = global i32 0, align 4 define ptr @src() { entry: br label %for.body.peel.begin for.body.peel.begin: ; preds = %entry br label %for.body.peel for.body.peel: ; preds = %for.body.peel.begin br i1 true, label %cleanup.peel, label %cleanup.loopexit.peel cleanup.loopexit.peel: ; preds = %for.body.peel br label %cleanup.peel cleanup.peel: ; preds = %cleanup.loopexit.peel, %for.body.peel %retval.2.peel = phi ptr [ undef, %for.body.peel ], [ @GlobIntONE, %cleanup.loopexit.peel ] br i1 true, label %for.body.peel.next, label %cleanup7 for.body.peel.next: ; preds = %cleanup.peel br label %for.body.peel.next1 for.body.peel.next1: ; preds = %for.body.peel.next br label %entry.peel.newph entry.peel.newph: ; preds = %for.body.peel.next1 br label %for.body for.body: ; preds = %cleanup, %entry.peel.newph %retval.0 = phi ptr [ %retval.2.peel, %entry.peel.newph ], [ %retval.2, %cleanup ] br i1 false, label %cleanup, label %cleanup.loopexit cleanup.loopexit: ; preds = %for.body br label %cleanup cleanup: ; preds = %cleanup.loopexit, %for.body %retval.2 = phi ptr [ %retval.0, %for.body ], [ @GlobIntONE, %cleanup.loopexit ] br i1 false, label %for.body, label %cleanup7.loopexit cleanup7.loopexit: ; preds = %cleanup %retval.2.lcssa.ph = phi ptr [ %retval.2, %cleanup ] br label %cleanup7 cleanup7: ; preds = %cleanup7.loopexit, %cleanup.peel %retval.2.lcssa = phi ptr [ %retval.2.peel, %cleanup.peel ], [ %retval.2.lcssa.ph, %cleanup7.loopexit ] ret ptr %retval.2.lcssa } define ptr @tgt() { entry: br label %for.body.peel.begin for.body.peel.begin: ; preds = %entry br label %for.body.peel for.body.peel: ; preds = %for.body.peel.begin br i1 true, label %cleanup.peel, label %cleanup.loopexit.peel cleanup.loopexit.peel: ; preds = %for.body.peel br label %cleanup.peel cleanup.peel: ; preds = %cleanup.loopexit.peel, %for.body.peel %retval.2.peel = phi ptr [ undef, %for.body.peel ], [ @GlobIntONE, %cleanup.loopexit.peel ] br i1 true, label 
%for.body.peel.next, label %cleanup7 for.body.peel.next: ; preds = %cleanup.peel br label %for.body.peel.next1 for.body.peel.next1: ; preds = %for.body.peel.next br label %entry.peel.newph entry.peel.newph: ; preds = %for.body.peel.next1 br label %for.body for.body: ; preds = %cleanup, %entry.peel.newph br i1 false, label %cleanup, label %cleanup.loopexit cleanup.loopexit: ; preds = %for.body br label %cleanup cleanup: ; preds = %cleanup.loopexit, %for.body br i1 false, label %for.body, label %cleanup7.loopexit cleanup7.loopexit: ; preds = %cleanup %retval.2.lcssa.ph = phi ptr [ %retval.2.peel, %cleanup ] br label %cleanup7 cleanup7: ; preds = %cleanup7.loopexit, %cleanup.peel %retval.2.lcssa = phi ptr [ %retval.2.peel, %cleanup.peel ], [ %retval.2.lcssa.ph, %cleanup7.loopexit ] ret ptr %retval.2.lcssa } ``` 1. `simplifyInstruction(%retval.2.peel)` returns `@GlobIntONE`. Thus, `ScalarEvolution::createNodeForPHI` returns SCEV expr `@GlobIntONE` for `%retval.2.peel`. 2. `SimplifyIndvar::replaceIVUserWithLoopInvariant` tries to replace the use of `%retval.2.peel` in `%retval.2.lcssa.ph` with `@GlobIntONE`. 3. `simplifyLoopAfterUnroll -> simplifyLoopIVs -> SCEVExpander::expand` reuses `%retval.2.peel = phi ptr [ undef, %for.body.peel ], [ @GlobIntONE, %cleanup.loopexit.peel ]` to generate code for `@GlobIntONE`. It is incorrect. This patch disallows simplifying `phi(undef, X)` to `X` by setting `CanUseUndef` to false. Closes https://github.com/llvm/llvm-project/issues/114879.
2024-10-17[llvm][LSR] Fix where invariant on ScaledReg & Scale is violated (#112576)Youngsuk Kim1-0/+30
Comments attached to the `ScaledReg` field of `struct Formula` explains that, `ScaledReg` must be non-null when `Scale` is non-zero. This fixes up a code path where this invariant is violated. Also, add an assert to ensure this invariant holds true. Without this patch, compiler aborts with the attached test case. Fixes #76504
2024-10-03[DebugInfo][LSR] Fix assertion failure salvaging IV with offset > 64 bits wide (#110979)Orlando Cazalet-Hyams1-0/+40
Fixes #110494
2024-10-02[X86] Don't request 0x90 nop filling in p2align directives (#110134)Jeremy Morse4-16/+16
As of rev ea222be0d, LLVM's assembler will actually try to honour the "fill value" part of p2align directives. X86 printed these as 0x90, which isn't actually what it wanted: we want multi-byte nops for .text padding. Compiling via a textual assembly file produces single-byte nop padding since ea222be0d but the built-in assembler will produce multi-byte nops. This divergent behaviour is undesirable. To fix: don't set the byte padding field for x86, which allows the assembler to pick multi-byte nops. Test that we get the same multi-byte padding when compiled via textual assembly or directly to object file. Added same-align-bytes-with-llasm-llobj.ll to that effect, updated numerous other tests to not contain check-lines for the explicit padding.
2024-10-02[SCEVExpander] Preserve gep nuw during expansion (#102133)Nikita Popov3-3/+3
When expanding SCEV adds to geps, transfer the nuw flag to the resulting gep. (Note that this doesn't apply to IV increment GEPs, which go through a different code path.)
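A sketch (invented names) of the expanded form: when SCEV knows the unsigned addition cannot wrap, the generated GEP now carries nuw:

```llvm
; expansion of a non-wrapping SCEV add may now emit:
%addr = getelementptr nuw i8, ptr %base, i64 %offset
```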
2024-10-01[SCEVExpander] Clear flags when reusing GEP (#109293)Nikita Popov2-5/+5
As pointed out in the review of #102133, SCEVExpander currently incorrectly reuses GEP instructions that have poison-generating flags set. Fix this by clearing the flags on the reused instruction.
2024-09-19[LSR] Regenerate test checks (NFC)Nikita Popov2-17/+59
2024-09-09Reland "[LSR] Do not create duplicated PHI nodes while preserving LCSSA form" (#107380)Sergey Kachkov6-33/+35
Motivating example: https://godbolt.org/z/eb97zrxhx Here we have 2 induction variables in the loop: one corresponding to the i variable (add rdx, 4), the other to res (add rax, 2). The second induction variable can be removed by the rewriteLoopExitValues() method (the final value of res at loop exit is unroll_iter * -2); however, this doesn't happen because we have duplicated LCSSA phi nodes at loop exit: ``` ; Preheader: for.body.preheader.new: ; preds = %for.body.preheader %unroll_iter = and i64 %N, -4 br label %for.body ; Loop: for.body: ; preds = %for.body, %for.body.preheader.new %lsr.iv = phi i64 [ %lsr.iv.next, %for.body ], [ 0, %for.body.preheader.new ] %i.07 = phi i64 [ 0, %for.body.preheader.new ], [ %inc.3, %for.body ] %inc.3 = add nuw i64 %i.07, 4 %lsr.iv.next = add nsw i64 %lsr.iv, -2 %niter.ncmp.3.not = icmp eq i64 %unroll_iter, %inc.3 br i1 %niter.ncmp.3.not, label %for.end.loopexit.unr-lcssa.loopexit, label %for.body, !llvm.loop !7 ; Exit blocks for.end.loopexit.unr-lcssa.loopexit: ; preds = %for.body %inc.3.lcssa = phi i64 [ %inc.3, %for.body ] %lsr.iv.next.lcssa11 = phi i64 [ %lsr.iv.next, %for.body ] %lsr.iv.next.lcssa = phi i64 [ %lsr.iv.next, %for.body ] br label %for.end.loopexit.unr-lcssa ``` rewriteLoopExitValues requires the %lsr.iv.next value to have only 2 uses: one in the LCSSA phi node, the other in the induction phi node. Here we have 3 uses of this value because of the duplicated lcssa nodes, so the transform doesn't apply and leads to an extra add operation inside the loop. The proposed solution is to accumulate inserted instructions that will require LCSSA form update into a SetVector and then call formLCSSAForInstructions for this SetVector once, so the same instructions aren't processed twice. The reland fixes the issue with the preserve-lcssa.ll test: it fails in the situation when the x86_64-unknown-linux-gnu target is unavailable in opt. The changes are moved into a separate duplicated-phis.ll test with an explicit x86 target requirement to fix bots which are not building this target.