path: root/llvm/test/Transforms/LoopVectorize
Age | Commit message | Author | Files | Lines
2026-02-12[UTC][VPlan] Use `-vplan-print-after` for VPlan-dump-based tests (#178736)Andrei Elovikov8-98/+98
Switch tests from using `-debug[-only=LoopVectorize]` to `-vplan-print-after`, as that provides better control over which step in the pipeline we check the VPlan at (I'm using `optimize$` for now to preserve the previous state). Then, update `-vplan-print-after*` to print which function the loop belongs to. That lets us simplify VPlan UTC support, as the output of the updated tests contains only the VPlan dump; no special filtering/extraction is necessary anymore.
2026-02-13[InstructionSimplify] Extend simplifyICmpWithZero to handle equivalent zero RHS (#179055)Kunqiu Chen1-8/+8
Add a new helper function `matchEquivZeroRHS()` that recognizes comparisons with constants that are equivalent to comparisons with zero, and transforms the predicate accordingly. This handles the following transformations:
- icmp sgt X, -1 --> icmp sge X, 0
- icmp sle X, -1 --> icmp slt X, 0
- icmp [us]ge X, 1 --> icmp [us]gt X, 0
- icmp [us]lt X, 1 --> icmp [us]le X, 0
This enables more optimization opportunities in `simplifyICmpWithZero`, such as folding icmp sgt X, -1 when X is known to be non-negative.
---
- IR Impact: https://github.com/dtcxzyw/llvm-opt-benchmark/pull/3414
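The four predicate rewrites in this commit message can be sanity-checked by brute force. The following is a minimal sketch, not code from the patch: it models `X` as an 8-bit value and the function name `equivZeroRewritesHold` is made up for illustration.

```cpp
#include <cstdint>

// Returns true if every rewrite from the commit message gives the same
// comparison result before and after, for all 8-bit values of X.
// Hypothetical helper for illustration; not part of LLVM.
bool equivZeroRewritesHold() {
  for (int V = -128; V <= 127; ++V) {
    std::int8_t S = static_cast<std::int8_t>(V);         // signed view
    std::uint8_t U = static_cast<std::uint8_t>(V + 128); // covers 0..255
    if ((S > -1) != (S >= 0)) return false;  // icmp sgt X, -1 --> icmp sge X, 0
    if ((S <= -1) != (S < 0)) return false;  // icmp sle X, -1 --> icmp slt X, 0
    if ((S >= 1) != (S > 0)) return false;   // icmp sge X, 1  --> icmp sgt X, 0
    if ((U >= 1) != (U > 0)) return false;   // icmp uge X, 1  --> icmp ugt X, 0
    if ((S < 1) != (S <= 0)) return false;   // icmp slt X, 1  --> icmp sle X, 0
    if ((U < 1) != (U <= 0)) return false;   // icmp ult X, 1  --> icmp ule X, 0
  }
  return true;
}
```

An exhaustive 8-bit check is only evidence, not a proof for all bit widths, but the rewrites depend only on integers being discrete, so the same argument carries over.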
2026-02-12[VPlan] Explicitly reassociate header mask in logical and (#180898)Luke Lau5-23/+26
We reassociate ((x && y) && z) -> (x && (y && z)) if x has more than one use, in order to allow simplifying the header mask further. However this is somewhat unreliable as there are times when it doesn't have more than one use, e.g. see the case we run into in https://github.com/llvm/llvm-project/pull/173265/changes#r2769759907. This moves it into a separate transformation that always reassociates the header mask regardless of the number of uses, which prevents some fragile test changes in #173265. We need to run it before both calls to simplifyRecipes in optimize. I considered putting it in simplifyRecipes itself, but simplifyRecipes is also called after unrolling and when the loop region is dissolved, which causes vputils::findHeaderMask to assert. There isn't really any benefit to reassociating masks that aren't the header mask, so the existing simplification was removed.
2026-02-12[LV] Add LoopVectorize/VPlan subdirectory for VPlan printing tests. (#180611)Florian Hahn38-45/+12
Add a new VPlan subdirectory as a common place for tests checking VPlan printing. It contains a lit.local.cfg that only runs the tests when assertions are enabled. This removes the need to add explicit REQUIRES: asserts to VPlan tests. PR: https://github.com/llvm/llvm-project/pull/180611
2026-02-12[VPlan] Introduce m_c_Logical(And|Or) (#180048)Ramkumar Ramachandra1-0/+83
2026-02-11[LV] Don't scalarize loads that need predication in legacy CM.Florian Hahn1-0/+83
The legacy cost model tries to scalarize loads that are used as pointers. Skip if the load would need predicating when scalarized, because that would incur very high costs, see useEmulatedMaskMemRefHack. Fixes https://github.com/llvm/llvm-project/issues/180780.
2026-02-11[LAA] Use SCEVPtrToAddr in tryToCreateDiffChecks. (#178861)Florian Hahn40-383/+383
The checks created by LAA only compute a pointer difference and do not need to capture provenance. Use SCEVPtrToAddr instead of SCEVPtrToInt for computations. To avoid regressions while parts of SCEV are migrated to use PtrToAddr, this adds logic to rewrite all PtrToInt to PtrToAddr if possible in the created expressions. Similarly, if the original IR already has a PtrToInt, SCEVExpander tries to re-use it if possible when expanding PtrToAddr. Depends on https://github.com/llvm/llvm-project/pull/178727. Fixes https://github.com/llvm/llvm-project/issues/156978. PR: https://github.com/llvm/llvm-project/pull/178861
2026-02-10[VPlan] Reject partial reductions with invalid costs in getScaledReds. (#180438)Florian Hahn1-0/+95
Check if costs for partial reductions are valid up-front in getScaledReductions instead of when transforming each link in the chain in transformToPartialReduction. This ensures that we either transform all entries in the chain together, or none via the existing invalidation logic. This fixes a crash when a link in the chain would have invalid cost, as in the added test cases. Fixes https://github.com/llvm/llvm-project/issues/180340. PR: https://github.com/llvm/llvm-project/pull/180438
2026-02-10[VPlan] Use UTC to auto-generate more VPlan checks.Florian Hahn3-284/+403
Update more VPlan tests to use auto-generated check lines via new UTC support.
2026-02-10[VPlan] Fix convertToPhisToBlends folding non poison blend to poison (#180686)Luke Lau1-0/+85
This fixes a miscompile in #180005 where we didn't check that the first incoming value isn't poison. We should use the first non-poison incoming value if it exists, or just poison if all the incoming values are poison.
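The folding rule described here (skip poison, use the first non-poison incoming value, fall back to poison only if everything is poison) can be sketched as follows. This is a simplified model, not the VPlan code: values are plain integers, `std::nullopt` stands in for poison, and `FoldResult` and `foldBlend` are hypothetical names.

```cpp
#include <optional>
#include <vector>

using Value = std::optional<int>; // std::nullopt models a poison value

struct FoldResult {
  bool Folded;  // false: non-poison incoming values differ, keep the blend
  Value Result; // meaningful only if Folded; std::nullopt means poison
};

// Fold a blend to a single value when all non-poison incoming values agree.
FoldResult foldBlend(const std::vector<Value> &Incoming) {
  Value First; // first non-poison incoming value seen so far
  for (const Value &V : Incoming) {
    if (!V)
      continue; // ignore poison incoming values
    if (!First)
      First = V; // remember the first non-poison value
    else if (*First != *V)
      return {false, std::nullopt}; // two distinct values: cannot fold
  }
  // Either the common non-poison value, or poison if every input was poison.
  return {true, First};
}
```

In this model, `{poison, 7}` folds to `7` rather than to poison, which is the case the miscompile fix is about.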
2026-02-10[VPlan] Add `-vplan-print-after=` option (#178700)Andrei Elovikov1-0/+29
UpdateTestChecks support is updated in the subsequent PR, https://github.com/llvm/llvm-project/pull/178736.
2026-02-10[LoopVectorizer] Generate test checks (NFC)Nikita Popov1-4/+94
2026-02-10[LV] Handle partial sub-reductions with sub in middle block. (#178919)Sander de Smalen3-24/+102
Sub-reductions can be implemented in two ways:
(1) negate the operand in the vector loop (the default way).
(2) subtract the reduced value from the init value in the middle block.
Note that both ways keep the reduction itself as an 'add' reduction, which is necessary because only llvm.vector.partial.reduce.add exists. The ISD nodes for partial reductions don't support folding the sub/negation into its operands because the following is not a valid transformation:
```
sub(0, mul(ext(a), ext(b))) -> mul(ext(a), ext(sub(0, b)))
```
It can therefore be better to choose option (2) such that the partial reduction is always positive (starting at '0') and to do a final subtract in the middle block. For AArch64 there are no dot-product instructions that can do a `partial.reduce.sub(acc, mul(ext(a), ext(b)))` operation. I'm not sure if such instructions exist for other targets. (If so then we may want to make this decision a target option.) This PR also increases the AArch64 cost of a partial sub-reduction when this exists in an 'add-sub' reduction chain. Fixes https://github.com/llvm/llvm-project/issues/178703
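A scalar sketch may help show why both options compute the same result and why the rejected rewrite does not. It assumes `ext` is a sign extension from 8 to 32 bits and that converting 128 to `int8_t` wraps to -128 (guaranteed since C++20, near-universal before); all function names are hypothetical.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Option (1): negate each product inside the loop, accumulate with add.
std::int32_t subReduceNegateInLoop(std::int32_t Init,
                                   const std::vector<std::int8_t> &A,
                                   const std::vector<std::int8_t> &B) {
  std::int32_t Acc = Init;
  for (std::size_t I = 0; I < A.size(); ++I)
    Acc += -(std::int32_t(A[I]) * std::int32_t(B[I]));
  return Acc;
}

// Option (2): accumulate the products from 0, subtract once after the loop
// (the scalar analogue of doing the sub in the middle block).
std::int32_t subReduceSubAfterLoop(std::int32_t Init,
                                   const std::vector<std::int8_t> &A,
                                   const std::vector<std::int8_t> &B) {
  std::int32_t Partial = 0;
  for (std::size_t I = 0; I < A.size(); ++I)
    Partial += std::int32_t(A[I]) * std::int32_t(B[I]);
  return Init - Partial;
}

// The rewrite the commit message rejects: negate b in the narrow type
// *before* extending, i.e. mul(ext(a), ext(sub(0, b))).
std::int32_t mulWithNarrowNegate(std::int8_t A, std::int8_t B) {
  // 8-bit wrapping negation, done via unsigned arithmetic.
  std::uint8_t NegBits =
      static_cast<std::uint8_t>(0u - static_cast<std::uint8_t>(B));
  std::int8_t NegB = static_cast<std::int8_t>(NegBits);
  return std::int32_t(A) * std::int32_t(NegB);
}
```

With `a = 1` and `b = -128`, the narrow negation wraps back to -128, so the rewritten form yields -128 while `sub(0, mul(ext(a), ext(b)))` yields 128.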
2026-02-10Reland "[LV] Support conditional scalar assignments of masked operations" (#180708)Benjamin Maxwell3-9/+1244
This patch extends the support added in #158088 to loops where the assignment is non-speculatable (e.g. a conditional load or divide). For example, the following loop can now be vectorized:
```
int simple_csa_int_load(
    int* a, int* b, int default_val, int N, int threshold)
{
  int result = default_val;
  for (int i = 0; i < N; ++i)
    if (a[i] > threshold)
      result = b[i];
  return result;
}
```
It does this by extending the recurrence matching from only looking for selects, to include phis where all operands are the header phi, except for one which can be an arbitrary value outside the recurrence.
---
Reverts llvm/llvm-project#180275 (original PR: #178862). Additional type legalization for `ISD::VECTOR_FIND_LAST_ACTIVE` was added in #180290, which should resolve the backend crashes on x86.
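To illustrate the idea behind vectorizing such a loop, here is a scalar emulation that processes the input in fixed-size chunks. It is only a sketch under assumptions: the chunk width `VF`, the function name, and the chunking strategy are made up for illustration and do not describe the code LLVM actually generates. The point is that `b` is only read at the last lane whose condition holds, which is why the load does not need to be speculated.

```cpp
constexpr int VF = 4; // hypothetical vectorization factor

int simple_csa_int_load_chunked(const int *a, const int *b, int default_val,
                                int N, int threshold) {
  int result = default_val;
  int i = 0;
  // "Vector" body: handle VF elements per iteration.
  for (; i + VF <= N; i += VF) {
    // Compute the per-lane condition and remember the last active lane.
    int last_active = -1;
    for (int lane = 0; lane < VF; ++lane)
      if (a[i + lane] > threshold)
        last_active = lane;
    // Only touch b[] where the condition held (models a masked load).
    if (last_active >= 0)
      result = b[i + last_active];
  }
  // Scalar remainder loop for the leftover iterations.
  for (; i < N; ++i)
    if (a[i] > threshold)
      result = b[i];
  return result;
}
```

The "last active lane" search is roughly the role the commit message attributes to `ISD::VECTOR_FIND_LAST_ACTIVE`; that mapping is an interpretation, not something the message spells out.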
2026-02-10InstCombine: Use SimplifyDemandedFPClass on fmul (#177490)Matt Arsenault1-3/+3
Start trying to use SimplifyDemandedFPClass on instructions, starting with fmul. This subsumes the old transform on multiply of 0. The main change is the introduction of nnan/ninf. I do not think anywhere was systematically trying to introduce fast math flags before, though a few odd transforms would set them. Previously we only called SimplifyDemandedFPClass on function returns with nofpclass annotations. Start following the pattern of SimplifyDemandedBits, where this will be called from relevant root instructions. I was wondering if this should go into InstCombineAggressive, but that apparently does not make use of InstCombineInternal's worklist.
2026-02-10[VPlan] Simplify true && x -> x (#179426)Mel Chen2-9/+6
2026-02-09[LV] Add FindLast tests where IV-based expression could be sunk. (NFC)Florian Hahn1-0/+934
Add a set of FindLast tests where the selected expression is based on an IV and could be sunk.
2026-02-09[VPlan] Auto-generate CHECKs in some VPlan printing tests.Florian Hahn4-150/+251
Use new UTC support to re-generate check lines.
2026-02-09[LV] Add additional tests for reductions with intermediate stores. (NFC)Florian Hahn2-3/+160
Adds missing test coverage for reductions with intermediate stores, including partial reductions with intermediate stores, as well as chained min/max reductions with intermediate stores.
2026-02-09[VPlan] Simplify single-entry VPWidenPHIRecipe.Florian Hahn7-25/+16
Include VPWidenPHIRecipe in phi simplification if there's a single incoming value.
2026-02-09Reland "[LoopVectorize] Support vectorization of overflow intrinsics" (#180526)Vishruth Thimmaiah3-17/+560
Enables support for marking overflow intrinsics `uadd`, `sadd`, `usub`, `ssub`, `umul` and `smul` as trivially vectorizable. Fixes #174617 --- This patch is a reland of #174835. Reverts #179819
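As an illustration of the kind of source loop this concerns, the sketch below uses the GCC/Clang builtin `__builtin_add_overflow`, which Clang typically lowers to an `llvm.uadd.with.overflow` call for unsigned operands of matching type. The function name is hypothetical, and whether a given loop is actually vectorized still depends on the target and the cost model.

```cpp
#include <cstddef>
#include <cstdint>

// Stores the wrapped sums A[i] + B[i] into Out[i] and returns how many
// additions overflowed. Relies on a compiler-specific builtin (GCC/Clang).
std::size_t addCountOverflows(const std::uint32_t *A, const std::uint32_t *B,
                              std::uint32_t *Out, std::size_t N) {
  std::size_t Overflows = 0;
  for (std::size_t I = 0; I < N; ++I)
    if (__builtin_add_overflow(A[I], B[I], &Out[I]))
      ++Overflows;
  return Overflows;
}
```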
2026-02-10[IVDesc] Check loop-preheader for loop-legality when pass-remarks enabled (#166310)hanbeom1-0/+24
When `-pass-remarks=loop-vectorize` is specified, the subsequent logic is executed to display detailed debug messages even if the loop has no preheader. Therefore, an assert occurs when the `getLoopPreheader()` function is called. This commit resolves that issue. Fixed: #165377
2026-02-09Revert "[VPlan] Add missing REQUIRES: asserts to VPlan output test"Luke Lau1-1/+0
This reverts commit 2805c8aaa61a94ef22ac76c8dac56f7dfe970651. This added the REQUIRES line to the wrong test, 041ce9f added it to the correct one.
2026-02-09[LV][NFC] Add "REQUIRES: assert" to new test file (#180522)David Sherwood1-0/+1
Fixes a minor test regression introduced by https://github.com/llvm/llvm-project/pull/180226 in file llvm/test/Transforms/LoopVectorize/phi-with-fastflags-vplan.ll
2026-02-09[LV] Fix issue in VPFirstOrderRecurrencePHIRecipe::usesFirstLaneOnly (#179977)David Sherwood1-0/+52
In some cases we decide to vectorise loops with first-order recurrences using VF=1, IC>1. We then attempt to unroll a vplan in replicateByVF, however when trying to erase the list of values from the parent we trigger the following assert:
```
virtual llvm::VPRecipeValue::~VPRecipeValue(): Assertion `Users.empty() && "trying to delete a VPRecipeValue with remaining users"' failed.
```
The problem seems to stem from this code:
```
DefR->replaceUsesWithIf(LaneDefs[0], [DefR](VPUser &U, unsigned) {
  return U.usesFirstLaneOnly(DefR);
});
```
since usesFirstLaneOnly returns false and we fail to replace uses of DefR with LaneDefs[0]. Upon inspection the only VPUser objects that return false are VPInstruction::FirstOrderRecurrenceSplice and VPFirstOrderRecurrencePHIRecipe. Since the values are all scalar it's simply not possible for us to be using anything other than the first lane. I've fixed this by bailing out of replicateByVF early for plans with only a scalar VF. Fixes https://github.com/llvm/llvm-project/issues/179671
2026-02-09[VPlan] Skip applying InstsToScalarize with forced instr costs. (#168269)Florian Hahn1-0/+75
ForceTargetInstructionCost in the legacy cost model overrides any costs from InstsToScalarize. Match the behavior in the VPlan-based cost model. This fixes a crash with -force-target-instr-cost for the added test case. PR: https://github.com/llvm/llvm-project/pull/168269
2026-02-09[VPlan] Add missing REQUIRES: asserts to VPlan output testLuke Lau1-0/+1
Should fix https://lab.llvm.org/buildbot/#/builders/11/builds/33293
2026-02-09[VPlan] Propagate FastMathFlags from phis to blends (#180226)Luke Lau2-0/+120
If a phi has fast math flags, we can propagate them to the widened select. To do this, this patch makes VPPhi and VPBlendRecipe subclasses of VPRecipeWithIRFlags, and propagates the flags through PlainCFGBuilder and VPPredicator. Alive2 proofs for some of the FMFs (it looks like it can't reason about the full "fast" set yet):
nnan: https://alive2.llvm.org/ce/z/f0bRd4
nsz: https://alive2.llvm.org/ce/z/u9P96T
The actual motivation for this is to eventually be able to move the special casing for tail folding in LoopVectorizationPlanner::addReductionResultComputation into the CFG in #176143, which requires passing through FMFs.
2026-02-08[VPlan] Use PredBB's terminator as insert point for VPIRPhi extracts.Florian Hahn1-0/+57
Use PredBB's terminator as insert point in VPIRPhi::execute to make sure the extracts are placed after any possibly sunk instructions. Fixes https://github.com/llvm/llvm-project/issues/180363.
2026-02-08[VPlan] Pass underlying instr to getMemoryOpCost in ::computeCost.Florian Hahn1-5/+45
Pass underlying instruction to getMemoryOpCost in VPReplicateRecipe::computeCost if UsedByLoadStoreAddress is true. Some targets use the underlying instruction to improve costs, and this is needed to match the legacy cost model. Fixes https://github.com/llvm/llvm-project/issues/177780. Fixes https://github.com/llvm/llvm-project/issues/177772.
2026-02-08[VPlan] Fall back to legacy cost model if PtrSCEV is nullptr.Florian Hahn1-0/+54
There are some cases when PtrSCEV can be nullptr. Fall back to legacy cost model, to not call isLoopInvariant with nullptr. Fixes a crash after 0c4f8094939d2.
2026-02-06Revert "[LV] Support conditional scalar assignments of masked operations" (#180275)Kewen Meng3-1244/+9
Reverts llvm/llvm-project#178862. Revert to unblock bot: https://lab.llvm.org/buildbot/#/builders/206/builds/13225
2026-02-06Reapply "[SCEVExp] Use SCEVPtrToAddr in tryToReuseLCSSAPhi if possible. (#180257)"Florian Hahn1-3/+3
This reverts commit cb905605b2e95f88296afe136b21a7d2476cb058. Recommit the patch with a small change to check the destination type matches the address type, to avoid a crash on mismatch. Original message: This patch updates tryToReuseLCSSAPhi to use SCEVPtrToAddr, unless using SCEVPtrToInt allows re-use, because the IR already contains a re-usable phi using PtrToInt. This is a first step towards migrating to SCEVPtrToAddr and avoids regressions in follow-up changes. PR: https://github.com/llvm/llvm-project/pull/178727
2026-02-06Revert "[SCEVExp] Use SCEVPtrToAddr in tryToReuseLCSSAPhi if possible." (#180257)Florian Hahn1-3/+3
Reverts llvm/llvm-project#178727, which triggers asserts on some build bots.
2026-02-06[SCEVExp] Use SCEVPtrToAddr in tryToReuseLCSSAPhi if possible. (#178727)Florian Hahn1-3/+3
This patch updates tryToReuseLCSSAPhi to use SCEVPtrToAddr, unless using SCEVPtrToInt allows re-use, because the IR already contains a re-usable phi using PtrToInt. This is a first step towards migrating to SCEVPtrToAddr and avoids regressions in follow-up changes. PR: https://github.com/llvm/llvm-project/pull/178727
2026-02-06[VPlan] Simplify x & AllOnes -> x (#180049)Ramkumar Ramachandra3-54/+40
2026-02-06[VPlan] Add ExitingIVValue VPInstruction. (#175651)Florian Hahn1-17/+6
Add a new VPInstruction opcode to compute the exiting value of an induction variable after vectorization. This replaces the pattern of extracting the last lane from the last part of the induction backedge value when applicable. This allows us to always use the pre-computed IV end value. It will also allow unifying end value creation for both induction resume and exit values. PR: https://github.com/llvm/llvm-project/pull/175651
2026-02-06[LV] Support conditional scalar assignments of masked operations (#178862)Benjamin Maxwell3-9/+1244
This patch extends the support added in #158088 to loops where the assignment is non-speculatable (e.g. a conditional load or divide). For example, the following loop can now be vectorized: ``` int simple_csa_int_load( int* a, int* b, int default_val, int N, int threshold) { int result = default_val; for (int i = 0; i < N; ++i) if (a[i] > threshold) result = b[i]; return result; } ``` It does this by extending the recurrence matching from only looking for selects, to include phis where all operands are the header phi, except for one which can be an arbitrary value outside the recurrence.
2026-02-06[VPlan] Ignore poison incoming values when creating blend (#180005)Luke Lau3-34/+66
We have an optimization in VPPredicator when creating blends where if all the incoming values are the same, we just return that value. This extends it to handle cases like "phi [%x, %x, poison, %x]" by ignoring poison values. This is split off from #176143 to prevent regressions when maintaining SSA by adding PHIs with a poison incoming value.
2026-02-05[LV] Regen a VPlan-printing test with UTC (#179948)Ramkumar Ramachandra1-664/+978
Post 49288b65 ([UTC] Add initial VPlan support, #178534), we can generate VPlan-printing tests with UTC. Do it for one test, with the caveat that two Final VPlan prints are no longer checked.
2026-02-05[VPlan] Auto-generate some VPlan check lines.Florian Hahn4-672/+949
Use new UTC support to auto-generate some check lines to make them easier to update in the future.
2026-02-05[AArch64] Add FeatureUseFixedOverScalableIfEqualCost to Neoverse-V3 and Neoverse-V3ae (#179903)David Green3-209/+7
This was missing from neoverse-v3 and neoverse-v3ae, but should be present like neoverse-v2.
2026-02-05Revert "[LoopVectorize] Support vectorization of overflow intrinsics" (#179819)Alexander Kornienko3-560/+17
Reverts llvm/llvm-project#174835, which causes clang crashes. See https://github.com/llvm/llvm-project/pull/174835#issuecomment-3844233831 and https://github.com/llvm/llvm-project/issues/179671 for details.
2026-02-05[LV] Optimize FindLast recurrences to FindIV (NFCI). (#177870)Florian Hahn1-1/+2
This patch restructures Find(First|Last)IV handling. Instead of differentiating between FindLast, FindFirstIV and FindLastIV up-front, this patch simplifies the logic in IVDescriptor to just identify the FindLast pattern up-front. It then adds a new VPlan transformation to optimize FindLast reductions to FindIV reductions if there is a suitable sentinel value. It also merges the Find(Last|First)IV recurrence kinds into a single FindIV kind. This is simpler and more accurate, given that selecting the first/last induction of the final IV reduction is directly controlled by the corresponding recurrence kind of the ComputeReductionResult. The new structure also allows further optimizations, like vectorizing FindLastIV with another boolean reduction that tracks if the condition in the loop was ever true, if there is no suitable sentinel value. PR: https://github.com/llvm/llvm-project/pull/177870
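The role of the sentinel can be sketched in scalar code. The example below assumes an increasing induction variable that never takes the sentinel value, so "last index where the condition held" becomes a plain max-reduction; the function names and the choice of sentinel are illustrative assumptions, not taken from the patch.

```cpp
#include <algorithm>
#include <cstddef>
#include <cstdint>
#include <limits>
#include <vector>

// FindLast form: remember the last index whose element exceeds Threshold,
// or Start if no element does.
std::int64_t findLastScalar(const std::vector<int> &A, int Threshold,
                            std::int64_t Start) {
  std::int64_t R = Start;
  for (std::size_t I = 0; I < A.size(); ++I)
    if (A[I] > Threshold)
      R = static_cast<std::int64_t>(I);
  return R;
}

// FindIV-style form: max-reduce over "IV or sentinel", then map the
// sentinel back to Start after the loop.
std::int64_t findLastViaMax(const std::vector<int> &A, int Threshold,
                            std::int64_t Start) {
  constexpr std::int64_t Sentinel = std::numeric_limits<std::int64_t>::min();
  std::int64_t Max = Sentinel;
  for (std::size_t I = 0; I < A.size(); ++I) {
    std::int64_t Candidate =
        A[I] > Threshold ? static_cast<std::int64_t>(I) : Sentinel;
    Max = std::max(Max, Candidate); // order-independent, so easy to vectorize
  }
  return Max == Sentinel ? Start : Max;
}
```

If no value is guaranteed to be unused by the IV, this trick does not apply, which matches the message's remark about tracking a separate boolean reduction in that case.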
2026-02-05[VPlan] Create edge mask for single-destination switch (#179107)nora1-4/+160
When converting phis to blends, the `VPPredicator` expects to have edge masks to the phi node if the phi node has different incoming blocks. This was not the case if the predecessor of the phi was a switch where a conditional destination was the same as the default destination. This was because when creating edge masks in `createSwitchEdgeMasks`, edge masks are set in a loop through the *non-default* destinations. But when there are no non-default destinations (but at least one condition, otherwise an earlier condition would trigger and just forward the source mask), this loop is never executed, so the masks are never set. To resolve this, we explicitly forward the source mask for these cases as well, which is correct because it is an unconditional branch, just a very convoluted one. fixes #179074
2026-02-04[LV] Use DomTree DFS numbers to sort early exit blocks.Florian Hahn1-0/+31
properlyDominates does not provide a strict weak ordering. Use DFS in numbers instead, to avoid ordering violations.
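A small model shows why dominance is unsuitable as a sort comparator. In the sketch below, a dominator tree is represented by a parent array (an assumption made for illustration; the helper names are hypothetical and the tree is assumed to be non-empty with exactly one root). Node 1 dominates node 2, but node 3 is unrelated to both, so "incomparable" is not transitive, which a strict weak ordering requires. DFS-in (preorder) numbers give a total order instead.

```cpp
#include <algorithm>
#include <vector>

// Parent[N] is the immediate dominator of N; the root has parent -1.
bool properlyDominates(const std::vector<int> &Parent, int X, int Y) {
  for (int N = Parent[Y]; N != -1; N = Parent[N])
    if (N == X)
      return true;
  return false;
}

// Assign preorder ("DFS in") numbers to every node of the tree.
std::vector<int> computeDFSIn(const std::vector<int> &Parent) {
  std::vector<std::vector<int>> Children(Parent.size());
  int Root = -1;
  for (int N = 0; N < static_cast<int>(Parent.size()); ++N) {
    if (Parent[N] == -1)
      Root = N;
    else
      Children[Parent[N]].push_back(N);
  }
  std::vector<int> In(Parent.size(), -1);
  int Counter = 0;
  std::vector<int> Stack{Root};
  while (!Stack.empty()) {
    int N = Stack.back();
    Stack.pop_back();
    In[N] = Counter++;
    // Push children in reverse so the first child is visited first.
    for (auto It = Children[N].rbegin(); It != Children[N].rend(); ++It)
      Stack.push_back(*It);
  }
  return In;
}

// Sorting by DFS-in number is a valid strict weak (in fact total) order.
void sortByDFSIn(std::vector<int> &Nodes, const std::vector<int> &In) {
  std::sort(Nodes.begin(), Nodes.end(),
            [&In](int L, int R) { return In[L] < In[R]; });
}
```

A node's DFS-in number is always smaller than those of the nodes it dominates, so the resulting order still places dominating blocks first.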
2026-02-04[UTC] Add initial VPlan support. (#178534)Florian Hahn1-651/+779
Add support for extracting a VPlan from LV debug output and generalizing matching for unnamed VPValues. Once we have support for -vplan-print-after=xxxx we can strip the logic to extract a VPlan manually. We cannot use regex, as we need to match from the opening bracket to the correct closing bracket. PR: https://github.com/llvm/llvm-project/pull/178534
2026-02-03[VPlan] Refine exit select check in transformToPartialReduction.Florian Hahn1-0/+54
Make sure we find the actual select for the exit users and only use it for the final link in the chain. This fixes a miscompile after 90b3712d8a20efa2cbaadc177da576e485dce038.
2026-02-03[VPlan] Sink recipes from the vector loop region in licm. (#168031)Mel Chen49-432/+439
When a recipe can be safely sunk and all of its users are outside the vector loop region in the same dedicated exit block, the recipe does not need to be executed on every iteration. This patch extends the VPlan-based LICM (Loop Invariant Code Motion) to also sink such recipes from the vector loop region into the exit block. This reduces redundant computation and improves cost model accuracy.
TODO: Support nested loop sinking
TODO: Support sinking `VPReplicateRecipe` (requires `replicateByVF` fixes)
TODO: Support recipes with multiple defined values (e.g., interleaved loads)
TODO: Clone recipes without users to all exit blocks
TODO: Support PHI node users by checking incoming value blocks
TODO: Support sinking when users are in multiple blocks
TODO: Clone recipes when users are on multiple exit paths
---------
Co-authored-by: Luke Lau <luke@igalia.com>
Co-authored-by: Luke Lau <luke_lau@icloud.com>
2026-02-01[VPlan] Fold (x | !x) -> true. (#177887)Florian Hahn3-7/+42
PR: https://github.com/llvm/llvm-project/pull/177887