path: root/llvm/test/Transforms/LoopVectorize
Age | Commit message | Author | Files | Lines
2025-12-09 | Revert "[LV] Mark checks as never succeeding for high cost cutoff." | Aiden Grossman | 1 | -14/+25
This reverts commit 8a115b6934a90441d77ea54af73e7aaaa1394b38. This broke premerge:
https://lab.llvm.org/staging/#/builders/192/builds/13326
/home/gha/llvm-project/clang/test/Frontend/optimization-remark-options.c:10:11: remark: loop not vectorized: cannot prove it is safe to reorder floating-point operations; allow reordering by specifying '#pragma clang loop vectorize(enable)' before the loop or by providing the compiler option '-ffast-math'
2025-12-09 | [LV] Mark checks as never succeeding for high cost cutoff. | Florian Hahn | 1 | -25/+14
When GeneratedRTChecks::create bails out due to exceeding the cost threshold, no runtime checks are generated and we must not proceed assuming checks have been generated. Mark the checks as never succeeding, to make sure we don't try to vectorize assuming the runtime checks hold. This fixes a case where we previously incorrectly vectorized assuming runtime checks had been generated when forcing vectorization via metadata. Fixes the mis-compile mentioned in https://github.com/llvm/llvm-project/pull/166247#issuecomment-3631471588
2025-12-09 | [LV] Add test with threshold=0 and metadata forcing vectorization. | Florian Hahn | 1 | -0/+109
Test case for the mis-compile mentioned in https://github.com/llvm/llvm-project/pull/166247#issuecomment-3631471588. The issue is that we don't generate a runtime check even though one is required to vectorize.
2025-12-09 | [VPlan] Use SCEV to prove non-aliasing for stores at different offsets. (#170347) | Florian Hahn | 1 | -30/+18
Extend the logic added in https://github.com/llvm/llvm-project/pull/168771 to also allow sinking stores past stores in the same noalias set, by checking whether we can prove no-alias via the distance between the accesses, checked via SCEV. PR: https://github.com/llvm/llvm-project/pull/170347
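A minimal source-level sketch of the kind of pattern this enables (function and array names are illustrative, not taken from the changed tests): the two predicated stores target addresses at a known constant distance, so SCEV can prove they never alias even though they are in the same noalias set.

```cpp
// Hedged sketch: a[2*i] and a[2*i + 1] differ by a constant 4-byte distance,
// so the second store can be sunk past the first without a runtime check.
void store_offsets(int *a, const bool *c, int n) {
  for (int i = 0; i < n; ++i) {
    if (c[i])
      a[2 * i] = 0;     // predicated store under mask M
    else
      a[2 * i + 1] = 1; // predicated store under mask !M, offset +4 bytes
  }
}
```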
2025-12-09 | [VPlan] Remove ExtractLastLane for plans with scalar VFs. (#171145) | Florian Hahn | 1 | -2/+1
ExtractLastLane is a no-op for scalar VFs. Update simplifyRecipe to remove them. This also requires adjusting the code in VPlanUnroll.cpp to split off handling of ExtractLastLane/ExtractPenultimateElement for scalar VFs, which now needs to match ExtractLastPart. PR: https://github.com/llvm/llvm-project/pull/171145
2025-12-09 | [LV] Return getPredBlockCostDivisor in uint64_t | Luke Lau | 1 | -0/+49
When the probability of a block is extremely low, HeaderFreq / BBFreq may be larger than 32 bits. Previously this got truncated to uint32_t which could cause division by zero exceptions on x86. Widen the return type to uint64_t which should fit the entire range of BlockFrequency values. It's also worth noting that a frequency can never be zero according to BlockFrequency.h, so we shouldn't need to worry about divide by zero in getPredBlockCostDivisor itself.
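A minimal sketch of the truncation hazard described above, with assumed illustrative signatures; the real computation lives in the vectorizer's cost model.

```cpp
#include <cstdint>

// If HeaderFreq / BBFreq exceeds UINT32_MAX, the narrowed result wraps and
// can become 0, which later triggers a division-by-zero when a block cost is
// divided by this value.
uint32_t narrowDivisor(uint64_t HeaderFreq, uint64_t BBFreq) {
  return static_cast<uint32_t>(HeaderFreq / BBFreq); // may truncate to 0
}

// Returning uint64_t preserves the full range of BlockFrequency ratios.
uint64_t wideDivisor(uint64_t HeaderFreq, uint64_t BBFreq) {
  return HeaderFreq / BBFreq;
}
```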
2025-12-08 | [LV] Simplify IR for gather-cost.ll, auto-generate checks. (NFC) | Florian Hahn | 3 | -212/+243
Simplify tests and auto-generate checks in preparation for further updates.
2025-12-08 | [VPlan] Use nuw when computing {VF,VScale}xUF (#170710) | Ramkumar Ramachandra | 9 | -31/+31
These quantities should never unsigned-wrap. This matches the behavior if only VFxUF is used (and not VF): when computing both VF and VFxUF, nuw should hold for each step separately.
2025-12-08 | [VPlan] Use BlockFrequencyInfo in getPredBlockCostDivisor (#158690) | Luke Lau | 5 | -4/+335
In 531.deepsjeng_r from SPEC CPU 2017 there's a loop that we unprofitably loop vectorize on RISC-V. The loop looks something like:
```c
for (int i = 0; i < n; i++) {
  if (x0[i] == a)
    if (x1[i] == b)
      if (x2[i] == c)
        // do stuff...
}
```
Because it's so deeply nested, the actual inner level of the loop rarely gets executed. However we still deem it profitable to vectorize, which due to the if-conversion means we now always execute the body. This stems from the fact that `getPredBlockCostDivisor` currently assumes, as a heuristic, that blocks have a 50% chance of being executed. We can fix this by using BlockFrequencyInfo, which gives a more accurate estimate of the innermost block being executed 12.5% of the time. We can then calculate the probability as `HeaderFrequency / BlockFrequency`. Fixing the cost here gives a 7% speedup for 531.deepsjeng_r on RISC-V. Whilst there's a lot of changes in the in-tree tests, this doesn't affect llvm-test-suite or SPEC CPU 2017 that much:
- On armv9-a -flto -O3 there's 0.0%/0.2% more geomean loops vectorized on llvm-test-suite/SPEC CPU 2017.
- On x86-64 -flto -O3 **with PGO** there's 0.9%/0% less geomean loops vectorized on llvm-test-suite/SPEC CPU 2017.
Overall geomean compile time impact is 0.03% on stage1-ReleaseLTO: https://llvm-compile-time-tracker.com/compare.php?from=9eee396c58d2e24beb93c460141170def328776d&to=32fbff48f965d03b51549fdf9bbc4ca06473b623&stat=instructions%3Au
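A hedged sketch of the computation described above (the helper name and the clamp to 1 are assumptions, not the exact LLVM implementation): the divisor is the ratio of the loop header's frequency to the predicated block's frequency.

```cpp
#include "llvm/Analysis/BlockFrequencyInfo.h"
#include <algorithm>
#include <cstdint>
using namespace llvm;

// For the deeply nested example above, the innermost block runs roughly
// 1/8 as often as the header, giving a divisor of 8 instead of a blanket
// 50%-probability assumption.
uint64_t predBlockCostDivisorSketch(const BlockFrequencyInfo &BFI,
                                    const BasicBlock *Header,
                                    const BasicBlock *PredBB) {
  uint64_t HeaderFreq = BFI.getBlockFreq(Header).getFrequency();
  uint64_t BBFreq = BFI.getBlockFreq(PredBB).getFrequency();
  // BlockFrequency values are never zero, so the division is well-defined.
  return std::max<uint64_t>(1, HeaderFreq / BBFreq);
}
```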
2025-12-07 | [VPlan] Replace ExtractLast(Elem|LanePerPart) with ExtractLast(Lane/Part) (#164124) | Florian Hahn | 4 | -14/+28
Replace ExtractLastElement and ExtractLastLanePerPart with the more generic and specific ExtractLastLane and ExtractLastPart, which model distinct parts of extracting across parts and lanes. ExtractLastElement == ExtractLastLane(ExtractLastPart) and ExtractLastLanePerPart == ExtractLastLane, the latter clarifying the name of the opcode. A new m_ExtractLastElement matcher is provided for convenience. The patch should be NFC modulo printing changes. PR: https://github.com/llvm/llvm-project/pull/164124
2025-12-06 | [VPlan] Remove stray space before ops when printing vector-ptr (NFC) | Florian Hahn | 1 | -22/+22
2025-12-05 | [VPlan] Use strict whitespace checks for VPlan printing test. | Florian Hahn | 1 | -37/+37
Use --strict-whitespace for vplan-printing.ll to catch stray whitespaces. The test updates show a few places where we currently emit those.
2025-12-04 | [VPlan] Don't try to hoist multi-defs for first-order recurrences. | Florian Hahn | 1 | -1/+49
Currently the hoisting implementation expects single-defs. Bail out on multi-defs (VPInterleaveRecipe), to fix an assertion. Fixes https://github.com/llvm/llvm-project/issues/170666
2025-12-04 | [VPlan] Implement printing VPIRMetadata. (#168385) | Florian Hahn | 1 | -9/+21
Implement printing for VPIRMetadata, using a generic dyn_cast to VPIRMetadata. Depends on https://github.com/llvm/llvm-project/pull/166245 PR: https://github.com/llvm/llvm-project/pull/168385
2025-12-03 | [SCEV] Handle non-constant start values in AddRec UDiv canonicalization. (#170474) | Florian Hahn | 1 | -0/+33
Follow-up to https://github.com/llvm/llvm-project/pull/169576 to enable UDiv canonicalization if the start of the AddRec is not constant. The fold is not restricted to constant start values, as long as we are able to compute a constant remainder. The fold is only applied if the subtraction of the remainder can be folded into the start expression, but that is just to avoid creating more complex AddRecs. For reference, the proof from #169576 is https://alive2.llvm.org/ce/z/iu2tav PR: https://github.com/llvm/llvm-project/pull/170474
2025-12-03 | [LV] Add more tests for finding the first-iv of argmin. | Florian Hahn | 2 | -37/+122
Adds more test coverage for https://github.com/llvm/llvm-project/pull/170223.
2025-12-03 | [VPlan] Use predicate in VPInstruction::computeCost for selects. (#170278) | Florian Hahn | 1 | -0/+23
In some cases, the lowering of a select depends on the predicate. If the condition of a select is a compare instruction, thread the predicate through to the TTI hook. PR: https://github.com/llvm/llvm-project/pull/170278
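A hedged sketch of the idea (the helper is illustrative, not the actual VPInstruction::computeCost code): when the select's condition is a compare, pass its predicate to the TTI cost query.

```cpp
#include "llvm/Analysis/TargetTransformInfo.h"
#include "llvm/IR/Instructions.h"
using namespace llvm;

// Thread the compare predicate of the select's condition through to TTI, so
// targets whose select lowering depends on the predicate can return a more
// accurate cost than with an unknown predicate.
InstructionCost selectCostSketch(const TargetTransformInfo &TTI,
                                 SelectInst *Sel,
                                 TargetTransformInfo::TargetCostKind CostKind) {
  CmpInst::Predicate Pred = CmpInst::BAD_ICMP_PREDICATE;
  if (auto *Cmp = dyn_cast<CmpInst>(Sel->getCondition()))
    Pred = Cmp->getPredicate();
  return TTI.getCmpSelInstrCost(Instruction::Select, Sel->getType(),
                                Sel->getCondition()->getType(), Pred,
                                CostKind);
}
```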
2025-12-03 | [ValueTracking] Support scalable vector splats in computeKnownBits (#170345) | Yingwei Zheng | 1 | -3/+3
Similar to https://github.com/llvm/llvm-project/pull/170325, this patch adds support for scalable vector splats in computeKnownBits.
2025-12-02 | [LV] Use forced cost once for whole interleave group in legacy cost model (#168270) | Florian Hahn | 1 | -1/+159
The VPlan-based cost model assigns the forced cost once for a whole VPInterleaveRecipe. Update the legacy cost model to match this behavior. This fixes a cost-model divergence, and assigns the cost in a way that matches the generated code more accurately. PR: https://github.com/llvm/llvm-project/pull/168270
2025-12-02 | [LV] Add predicated store sinking tests requiring further noalias checks | Florian Hahn | 1 | -71/+148
Add additional tests where extra no-alias checks are needed, for future extensions of https://github.com/llvm/llvm-project/pull/168771.
2025-12-02 | [SCEV] Allow udiv canonicalization of potentially-wrapping AddRecs (#169576) | Florian Hahn | 1 | -1/+27
Extend the {X,+,N}/C => {(X - X%N),+,N}/C canonicalization to handle AddRecs that may wrap, when X < N <= C and both N and C are powers of 2. The alignment and power-of-2 properties ensure the division results remain equivalent for all offsets in [(X - X%N), X). Alive2 Proof: https://alive2.llvm.org/ce/z/iu2tav Fixes https://github.com/llvm/llvm-project/issues/168709 PR: https://github.com/llvm/llvm-project/pull/169576
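A small self-contained check of the fold with assumed concrete values (X = 3, N = 4, C = 8, so X < N <= C and N, C are powers of 2); it only illustrates why the division results agree and is not the SCEV implementation.

```cpp
#include <cassert>
#include <cstdint>

int main() {
  const uint64_t X = 3, N = 4, C = 8;
  for (uint64_t I = 0; I < 1000; ++I) {
    uint64_t Orig = (X + I * N) / C;            // {X,+,N}/C
    uint64_t Canon = ((X - X % N) + I * N) / C; // {(X - X%N),+,N}/C
    // Because C is a multiple of N, X and X - X%N fall into the same
    // C-sized chunk, so every step of the two AddRecs divides identically.
    assert(Orig == Canon);
  }
  return 0;
}
```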
2025-12-02 | [LV] Emit better debug and opt-report messages when vectorization is disallowed in the LoopVectorizer (#158513) | Tibor Győri | 1 | -0/+94
While looking into fixing #158499, I found some other cases where the messages emitted could be improved. This PR improves both the messages printed to the debug output and the missed-optimization messages in cases where:
- loop vectorization is explicitly disabled
- loop vectorization is implicitly disabled by disabling all loop transformations
- loop vectorization is set to happen only where explicitly enabled
A branch that should currently be unreachable is also added. If the related logic ever breaks (e.g. due to changes to getForce() or the ForceKind enum), this should alert devs and users. New test cases are also added to verify that the correct messages (and only those) are emitted.
---------
Co-authored-by: GYT <tiborgyri@gmail.com>
Co-authored-by: Florian Hahn <flo@fhahn.com>
2025-12-02 | [VPlan] Sink predicated stores with complementary masks. (#168771) | Florian Hahn | 1 | -201/+204
Extend the logic to hoist predicated loads (https://github.com/llvm/llvm-project/pull/168373) to sink predicated stores with complementary masks in a similar fashion. The patch refactors some of the existing legality checks to be shared between hoisting and sinking, and adds a new sinking transform on top. With respect to the legality checks, for sinking stores the code also checks for stores that may alias, not only loads. PR: https://github.com/llvm/llvm-project/pull/168771
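A source-level illustration of the sinking (hypothetical functions, not tied to the changed test file): two stores to the same address under complementary conditions together execute on every iteration, so they can be replaced by one unconditional store of a select.

```cpp
// Before: two predicated stores to a[i] with complementary masks.
void beforeSink(int *a, const int *x, const int *y, const bool *c, int n) {
  for (int i = 0; i < n; ++i) {
    if (c[i])
      a[i] = x[i]; // store under mask M
    else
      a[i] = y[i]; // store under mask !M
  }
}

// After: the stores are merged and sunk as a single unconditional store.
void afterSink(int *a, const int *x, const int *y, const bool *c, int n) {
  for (int i = 0; i < n; ++i)
    a[i] = c[i] ? x[i] : y[i];
}
```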
2025-12-01 | [LV] Add more tests for argmin finding the first index. | Florian Hahn | 2 | -0/+262
Add more test coverage for supporting argmin/argmax with strict predicates, in preparation for a follow-up to 99addbf73db596403a17.
2025-12-01 | [VPlan] Use wide IV if scalar lanes > 0 are used with scalable vectors. (#169796) | Florian Hahn | 1 | -99/+58
For scalable vectors, VPScalarIVStepsRecipe cannot create all scalar step values. At the moment, it creates a vector, in addition to the first lane. The only supported case for this is when only the last lane is used. A recipe should not set both scalar and vector values. Instead, we can simply use a vector induction. It would also be possible to preserve the current vector code-gen by creating VPInstructions based on the first lane of VPScalarIVStepsRecipe, but using a vector induction seems simpler. PR: https://github.com/llvm/llvm-project/pull/169796
2025-12-01 | [LV] Don't create WidePtrAdd recipes for scalar VFs (#169344) | David Sherwood | 1 | -46/+93
While attempting to remove the use of undef from more loop vectoriser tests I discovered a bug where this assert was firing:
```
llvm::Constant* llvm::Constant::getSplatValue(bool) const: Assertion `this->getType()->isVectorTy() && "Only valid for vectors!"' failed.
...
#8 0x0000aaaab9e2fba4 llvm::Constant::getSplatValue
#9 0x0000aaaab9dfb844 llvm::ConstantFoldBinaryInstruction
```
This seems to be happening because we are incorrectly generating WidePtrAdd recipes for scalar VFs. The PR fixes this by checking in legalizeAndOptimizeInductions whether a plan has only a scalar VF. This PR also removes the use of undef from the test `both` in Transforms/LoopVectorize/iv_outside_user.ll, which is what started triggering the assert. Fixes #169334
2025-12-01 | [LV] Regenerate some check lines. NFC | Luke Lau | 1 | -12/+16
The scalar loop doesn't exist anymore after 8907b6d39371d439461cdd3475d5590f87821377
2025-11-30 | [LV] Add additional tests for argmin with find-first wrapping IV ranges. | Florian Hahn | 1 | -0/+89
Add test cases for upcoming argmin vectorization changes that have wrapping IV ranges.
2025-11-29 | [VPlan] Skip cost verification for loops with EVL gather/scatter. | Florian Hahn | 1 | -0/+116
The VPlan-based cost model uses vp_gather/vp_scatter for gather/scatter costs, which differs from the legacy cost model and cannot be matched there. Don't verify that the costs match for plans containing gathers/scatters with EVL. Fixes https://github.com/llvm/llvm-project/issues/169948.
2025-11-29 | [VPlan] Turn IVOp assertion into early exit. | Florian Hahn | 4 | -0/+170
Turn the assertion added in 99addbf73 [0] into an early exit. There are cases where the operand may not be a VPWidenIntOrFpInductionRecipe, e.g. if the IV increment is selected, as in the test cases. [0] https://github.com/llvm/llvm-project/pull/141431
2025-11-29 | [LV] Extend test coverage for inductions depending on complex SCEVs. | Florian Hahn | 1 | -109/+458
Re-generate check lines, add a test with a complex SCEV as induction start value, and add stores to existing loops to make them non-trivial.
2025-11-28 | [LV] Vectorize selecting last IV of min/max element. (#141431) | Florian Hahn | 6 | -115/+995
Add support for vectorizing loops that select the index of the minimum or maximum element. The patch implements vectorizing those patterns by combining Min/Max and FindFirstIV reductions. It extends matching Min/Max reductions to allow in-loop users that are FindLastIV reductions. It records a flag indicating that the Min/Max reduction is used by another reduction. The extra user is then checked as part of the new `handleMultiUseReductions` VPlan transformation. It processes any reduction that has other reduction users. The reduction using the min/max reduction currently must be a FindLastIV reduction, which needs adjusting to compute the correct result:
1. We need to find the last IV for which the condition based on the min/max reduction is true,
2. compare the partial min/max reduction result to its final value, and
3. select the lanes of the partial FindLastIV reductions which correspond to the lanes matching the min/max reduction result.
Depends on https://github.com/llvm/llvm-project/pull/140451 PR: https://github.com/llvm/llvm-project/pull/141431
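A scalar sketch of the kind of loop this targets (illustrative names): a min reduction combined with a reduction that records the index of the last occurrence of the minimum.

```cpp
// `Min` is a classic min reduction; `MinIdx` is the FindLastIV-style index
// reduction that the transform adjusts so the vectorized loop selects the
// lanes matching the final min before extracting the index.
int argminSketch(const float *A, int N) {
  float Min = A[0];
  int MinIdx = 0;
  for (int I = 1; I < N; ++I) {
    if (A[I] <= Min) {
      Min = A[I];
      MinIdx = I; // last index at which the minimum was seen
    }
  }
  return MinIdx;
}
```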
2025-11-28 | [LV] Add additional argmin/argmax tests for #141431. | Florian Hahn | 9 | -173/+568
Apply suggestions for tests from https://github.com/llvm/llvm-project/pull/141431 and add additional missing coverage.
2025-11-28 | [VPlan] Skip uses-scalars restriction if one of ops needs broadcast. (#168246) | Florian Hahn | 15 | -87/+388
Update the logic in narrowToSingleScalar to allow narrowing even if not all users use scalars, if at least one of the operands already needs broadcasting. In that case, there won't be any additional broadcasts introduced. This should allow removing the special handling for stores, which can introduce additional broadcasts currently. Fixes https://github.com/llvm/llvm-project/issues/169668. PR: https://github.com/llvm/llvm-project/pull/168246
2025-11-27 | [VPlan] Handle scalar VPWidenPointerInd in convertToConcreteRecipes. (#169338) | Florian Hahn | 1 | -0/+100
In some cases, VPWidenPointerInductions become only used by scalars after legalizeAndOptimizeInductions has already run, for example due to some VPlan optimizations. Move the code to scalarize VPWidenPointerInductions to a helper and use it if needed. This fixes a crash after #148274 in the added test case. Fixes https://github.com/llvm/llvm-project/issues/169780
2025-11-27 | [LV] Test more combinations of scalar stores using last lane of IV. | Florian Hahn | 1 | -12/+515
Extends test coverage to include different start and step values, as well as interleaving.
2025-11-27 | [VPlan] Optimize LastActiveLane to EVL - 1 (#169766) | Luke Lau | 6 | -39/+7
With EVL tail folding, the LastActiveLane can be computed with EVL - 1. This removes the need for a header mask and vfirst.m for loops with live outs on RISC-V:
```
 # %bb.5:  # %for.cond.cleanup7
-  vsetvli zero, zero, e32, m2, ta, ma
-  vmv.v.x v8, s1
-  vmsleu.vv v10, v8, v22
-  vfirst.m a0, v10
-  srli a1, a0, 63
-  czero.nez a0, a0, a1
-  czero.eqz a1, s8, a1
-  or a0, a0, a1
-  addi a0, a0, -1
-  vsetvli zero, zero, e64, m4, ta, ma
-  vslidedown.vx v8, v12, a0
+  addi s1, s1, -1
+  vslidedown.vx v8, v12, s1
```
2025-11-26 | Reapply "[LV] Use ExtractLane(LastActiveLane, V) live outs when tail-folding. (#149042)" | Florian Hahn | 13 | -581/+1512
This reverts commit a6edeedbfa308876d6f2b1648729d52970bb07e6. The following fixes have landed, addressing issues causing the original revert:
* https://github.com/llvm/llvm-project/pull/169298
* https://github.com/llvm/llvm-project/pull/167897
* https://github.com/llvm/llvm-project/pull/168949
Original message: Building on top of https://github.com/llvm/llvm-project/pull/148817, introduce a new abstract LastActiveLane opcode that gets lowered to Not(Mask) → FirstActiveLane(NotMask) → Sub(result, 1). When folding the tail, update all extracts for uses outside the loop to extract the value of the last active lane. See also https://github.com/llvm/llvm-project/issues/148603 PR: https://github.com/llvm/llvm-project/pull/149042
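A hedged scalar model of the lowering mentioned above (LastActiveLane = FirstActiveLane(Not(Mask)) - 1), assuming the tail-folding mask is all-true followed by all-false; this is only an illustration of the formula, not VPlan code.

```cpp
// For a mask like [1,1,1,0,...,0], the first inactive lane minus one is the
// last active lane; if no lane is inactive, the result is VF - 1.
int lastActiveLane(const bool *Mask, int VF) {
  int FirstInactive = VF; // FirstActiveLane of the negated mask
  for (int L = 0; L < VF; ++L) {
    if (!Mask[L]) {
      FirstInactive = L;
      break;
    }
  }
  return FirstInactive - 1; // Sub(result, 1)
}
```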
2025-11-26Revert "Reapply "[LV] Use ExtractLane(LastActiveLane, V) live outs when ↵Florian Hahn13-1512/+581
tail-folding. (#149042)"" This reverts commit 72e51d389f66d9cc6b55fd74b56fbbd087672a43. Missed some test updates.
2025-11-26 | Reapply "[LV] Use ExtractLane(LastActiveLane, V) live outs when tail-folding. (#149042)" | Florian Hahn | 13 | -581/+1512
This reverts commit a6edeedbfa308876d6f2b1648729d52970bb07e6. The following fixes have landed, addressing issues causing the original revert:
* https://github.com/llvm/llvm-project/pull/169298
* https://github.com/llvm/llvm-project/pull/167897
* https://github.com/llvm/llvm-project/pull/168949
Original message: Building on top of https://github.com/llvm/llvm-project/pull/148817, introduce a new abstract LastActiveLane opcode that gets lowered to Not(Mask) → FirstActiveLane(NotMask) → Sub(result, 1). When folding the tail, update all extracts for uses outside the loop to extract the value of the last active lane. See also https://github.com/llvm/llvm-project/issues/148603 PR: https://github.com/llvm/llvm-project/pull/149042
2025-11-26 | [AArch64] Enable maximising scalable vector bandwidth (#166748) | Sam Tebbs | 14 | -859/+1214
This PR enables maximising scalable vector bandwidth for all AArch64 cores other than the V1 and N2. Those two have shown small regressions that we'll investigate and fix before enabling it for them.
2025-11-26 | [LV] Use VPReductionRecipe for partial reductions (#147513) | Sam Tebbs | 1 | -6/+6
Partial reductions can easily be represented by the VPReductionRecipe class by setting their scale factor to something greater than 1. This PR merges the two together and gives VPReductionRecipe a VFScaleFactor so that it can choose to generate the partial reduction intrinsic at execute time. Stacked PRs:
1. https://github.com/llvm/llvm-project/pull/147026
2. https://github.com/llvm/llvm-project/pull/147255
3. https://github.com/llvm/llvm-project/pull/156976
4. https://github.com/llvm/llvm-project/pull/160154
5. https://github.com/llvm/llvm-project/pull/147302
6. https://github.com/llvm/llvm-project/pull/162503
7. -> https://github.com/llvm/llvm-project/pull/147513
Replaces https://github.com/llvm/llvm-project/pull/146073.
2025-11-26 | [VPlan] Hoist predicated loads with complementary masks. (#168373) | Florian Hahn | 2 | -374/+114
This patch adds a new VPlan transformation to hoist predicated loads if we can prove they execute unconditionally, i.e. there are 2 predicated loads from the same address with complementary masks. Then we are guaranteed to execute one of them on each iteration, allowing us to remove the mask. The transform groups masked replicating loads by their address SCEV, then checks if there are 2 loads with complementary masks. If that is the case, we check if there are any writes that may alias the load address in the blocks between the first and last load with the same address. The transform operates after linearizing the CFG, but before introducing replicate regions, which means this is just checking a chain of consecutive blocks. Currently this only uses noalias metadata to check for no-alias (using the helpers added in https://github.com/llvm/llvm-project/pull/166247). We then create an unpredicated VPReplicateRecipe at the position of the first load and replace all users of the grouped loads with it. Small Alive2 proof for hoisting with complementary masks: https://alive2.llvm.org/ce/z/kUx742 PR: https://github.com/llvm/llvm-project/pull/168373
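A source-level sketch of the hoist (illustrative functions, in the spirit of the Alive2 example linked above): two loads of the same address under complementary masks mean the address is loaded on every iteration, so one unconditional load can replace both.

```cpp
// Before: both branches load p[i]; exactly one executes each iteration.
int sumBefore(const int *p, const bool *c, int n) {
  int S = 0;
  for (int i = 0; i < n; ++i) {
    if (c[i])
      S += p[i] * 2; // masked load of p[i] under M
    else
      S += p[i];     // masked load of p[i] under !M
  }
  return S;
}

// After: the load is hoisted and executed unconditionally.
int sumAfter(const int *p, const bool *c, int n) {
  int S = 0;
  for (int i = 0; i < n; ++i) {
    int V = p[i]; // single unpredicated load
    S += c[i] ? V * 2 : V;
  }
  return S;
}
```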
2025-11-26 | [VPlan] Use DL index type consistently for GEPs (#169396) | Ramkumar Ramachandra | 127 | -1387/+1451
In preparation to strip VPUnrollPartAccessor and unroll recipes directly, strip unnecessary complication in getGEPIndexTy, as the unroll part will no longer be available in follow-ups (see #168886 for instance). The patch also helps by doing a mass test update up-front. Narrowing the GEP index type conditionally does not yield any benefit, and the change is non-functional in terms of emitted assembly. While at it, avoid hard-coding address-space 0, and use the pointer operand's address space to get the GEP index type.
2025-11-26 | [LV][NFC] Remove remaining uses of undef in tests (#169357) | David Sherwood | 11 | -163/+158
Split off from PR #163525, this standalone patch replaces almost all the remaining cases where undef is used as value in loop vectoriser tests. This will reduce the likelihood of contributors hitting the `undef deprecator` warning in github. NOTE: The remaining use of undef in iv_outside_user.ll will be fixed in a separate PR. I've removed the test stride_undef from version-mem-access.ll, since there is already a stride_poison test.
2025-11-25 | [PGO] Add REQUIRES to test (#169531) | Joel E. Denny | 1 | -0/+1
The test was added by b8ef25aa643761233dc5b74d9fb7c38a2064d9c7. It failed on at least the following bots, but the failure did not reproduce on my test machines or in pre-commit CI:
- https://lab.llvm.org/buildbot/#/builders/190/builds/31643
- https://lab.llvm.org/buildbot/#/builders/65/builds/25949
- https://lab.llvm.org/buildbot/#/builders/154/builds/24417
d69e70149636efa0293310303878fbf9a5f31433 did not fix the failure. Hopefully this will.
2025-11-25 | [PGO] Add missing target datalayout in test (#169520) | Joel E. Denny | 1 | -0/+1
The test was added by b8ef25aa643761233dc5b74d9fb7c38a2064d9c7. It failed on at least the following bot, but the failure did not reproduce on my test machines or in pre-commit CI:
- https://lab.llvm.org/buildbot/#/builders/190/builds/31638
This fix hopefully addresses at least the warnings there.
2025-11-25 | [PGO] Fix zeroed estimated trip count (#167792) | Joel E. Denny | 1 | -0/+34
Before PR #152775, `llvm::getLoopEstimatedTripCount` never returned 0. If `llvm::setLoopEstimatedTripCount` were called with 0, it would zero branch weights, causing `llvm::getLoopEstimatedTripCount` to return `std::nullopt`. PR #152775 changed that behavior: if `llvm::setLoopEstimatedTripCount` is called with 0, it sets `llvm.loop.estimated_trip_count` to 0, causing `llvm::getLoopEstimatedTripCount` to return 0. However, it kept documentation saying `llvm::getLoopEstimatedTripCount` returns a positive count. Some passes continue to assume `llvm::getLoopEstimatedTripCount` never returns 0 and crash if it does, as reported in issue #164254. To restore the behavior they expect, this patch changes `llvm::getLoopEstimatedTripCount` to return `std::nullopt` when `llvm.loop.estimated_trip_count` is 0.
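A hedged sketch of the restored contract (illustrative helper, not the actual LoopUtils code): an `llvm.loop.estimated_trip_count` of 0 is reported as "unknown" so callers that assume a positive count are not broken.

```cpp
#include <optional>

// Map the raw metadata value to the query result: 0 becomes std::nullopt,
// matching the pre-#152775 behavior that the estimated trip count, when
// present, is always positive.
std::optional<unsigned> estimatedTripCountSketch(std::optional<unsigned> MD) {
  if (MD && *MD == 0)
    return std::nullopt;
  return MD;
}
```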
2025-11-25 | [VPlan] Include flags in VectorPointerRecipe::printRecipe (#169466) | Ramkumar Ramachandra | 17 | -86/+86
The change is non-functional with respect to emitted IR.
2025-11-25 | [VPlan] Simplify x + 0 -> x (#169394) | Ramkumar Ramachandra | 7 | -92/+53