path: root/llvm/lib/Analysis/ScalarEvolution.cpp
2024-08-13  [SCEV] Fix -Wrange-loop-construct (NFC)  (Jie Fu; 1 file, -1/+1)
/llvm-project/llvm/lib/Analysis/ScalarEvolution.cpp:12009:21: error: loop variable '[S, Mul]' creates a copy from type 'const value_type' (aka 'const llvm::detail::DenseMapPair<const llvm::SCEV *, int>') [-Werror,-Wrange-loop-construct]
  for (const auto [S, Mul] : Multiplicity) {
                  ^
/llvm-project/llvm/lib/Analysis/ScalarEvolution.cpp:12009:10: note: use reference type 'const value_type &' (aka 'const llvm::detail::DenseMapPair<const llvm::SCEV *, int> &') to prevent copying
  for (const auto [S, Mul] : Multiplicity) {
       ^~~~~~~~~~~~~~~~~~~~~ &
2024-08-13  [SCEV] Handle more add/addrec mixes in computeConstantDifference() (#101999)  (Nikita Popov; 1 file, -48/+80)
computeConstantDifference() can currently look through addrecs with identical steps, and then through adds with identical operands (apart from constants). However, it fails to handle minor variations, such as two nested add recs, or an outer add with an inner addrec (rather than the other way around). This patch supports these cases by adding a loop over the simplifications, limited to a small number of iterations. The motivation is the same as in #101339, to make computeConstantDifference() powerful enough to replace existing uses of `dyn_cast<SCEVConstant>(getMinusSCEV())` with it. Though as the IR test diff shows, other callers may also benefit.
2024-08-12  [SCEV] Consolidate code for proving wrap flags of controlling finite IVs (#101404)  (Philip Reames; 1 file, -47/+27)
The canAssumeNoSelfWrap routine in howManyLessThans was doing two subtly inter-related things. First, it was proving no-self-wrap. This exactly duplicates the existing logic in the caller. Second, it was establishing the precondition for the nw->nsw/nuw inference. Specifically, we need to know that *this* exit must be taken for the inference to be sound. Otherwise, another (possibly abnormal) exit could be taken in the iteration where this IV would become poison. This change moves all of that logic into the caller, and caches the resulting nuw/nsw flags in the AddRec. This centralizes the logic in one place, and makes it clear that it all depends on controlling the sole exit. We do lose a couple cases with SCEV predication. Specifically, if SCEV predication was able to convert e.g. zext(addrec) into an addrec(zext) using predication, but didn't record the nuw fact on the new addrec, then the consuming code can no longer fix this up. I don't think this case particularly matters. --------- Co-authored-by: Nikita Popov <github@npopov.com>
2024-08-12  [SCEV] Fix incorrect extension in computeConstantDifference()  (Nikita Popov; 1 file, -3/+8)
The Mul factor was zero-extended here, resulting in incorrect results for integers larger than 64-bit. As we currently only multiply by 1 or -1, just split this into two cases -- there's no need for a full multiplication here. Fixes https://github.com/llvm/llvm-project/issues/102597.
2024-08-10  [Analysis] Use llvm::set_is_subset (NFC) (#102766)  (Kazu Hirata; 1 file, -3/+1)
2024-08-09  [llvm] Construct SmallVector with ArrayRef (NFC) (#102712)  (Kazu Hirata; 1 file, -2/+1)
Without this patch, the constructor arguments come from SmallVectorImpl, not ArrayRef. This patch switches them to ArrayRef so that we can construct SmallVector with a single argument. Note that LLVM Programmer’s Manual prefers ArrayRef to SmallVectorImpl for flexibility.
2024-08-02  [SCEV] Fix warning (NFC)  (Nikita Popov; 1 file, -1/+1)
Produces -Wrange-loop-construct on some buildbots.
2024-08-02  [SCEV] Handle more adds in computeConstantDifference() (#101339)  (Nikita Popov; 1 file, -26/+25)
Currently it only deals with the case where we're subtracting adds with at most one non-constant operand. This patch extends it to cancel out common operands for the subtraction of arbitrary add expressions. The background here is that I want to replace a getMinusSCEV() call in LAA with computeConstantDifference(): https://github.com/llvm/llvm-project/blob/93fecc2577ece0329f3bbe2719bbc5b4b9b30010/llvm/lib/Analysis/LoopAccessAnalysis.cpp#L1602-L1603 This particular call is very expensive in some cases (e.g. lencod with LTO) and computeConstantDifference() could achieve this much more cheaply, because it does not need to construct new SCEV expressions. However, the current computeConstantDifference() implementation is too weak for this and misses many basic cases. This is a step towards making it more powerful while still keeping it pretty fast.
2024-08-02  [SCEV] Unify and optimize constant folding (NFC) (#101473)  (Nikita Popov; 1 file, -100/+92)
Add a common constantFoldAndGroupOps() helper that takes care of constant folding and grouping transforms that are common to all nary ops. This moves the constant folding prior to grouping, which is more efficient, and excludes any constant from the sort. The constant folding has hooks for folding, identity constants and absorber constants. This gives a compile-time improvement for SCEV-heavy workloads like lencod.
2024-08-01  [SCEV] Prove no-self-wrap from negative power of two step (#101416)  (Philip Reames; 1 file, -5/+10)
We have existing code which reasons that a step which evenly divides the iteration space of a finite loop with a single exit implies no-self-wrap. The sign of the step doesn't affect this. --------- Co-authored-by: Nikita Popov <github@npopov.com>
2024-07-31  [SCEV] Use power of two facts involving vscale when inferring wrap flags (#101380)  (Philip Reames; 1 file, -59/+71)
SCEV has logic for inferring wrap flags on AddRecs which are known to control an exit based on whether the step is a power of two. This logic only considered constants, and thus did not trigger for steps such as (4 x vscale) which are common in scalably vectorized loops. The net effect is that we were very sensitive to the preservation of nsw/nuw flags on such IVs, and could not infer trip counts if they got lost for any reason. --------- Co-authored-by: Nikita Popov <github@npopov.com>
2024-07-30  [SCEV] Fix outdated comment (NFC)  (Nikita Popov; 1 file, -16/+1)
The EqCache parameter has been removed.
2024-07-30  Remove value cache in SCEV comparator. (#100721)  (Johannes Reifferscheid; 1 file, -17/+10)
The cache triggers almost never, and seems unlikely to help with performance. However, when it does, it is likely to cause the comparator to become inconsistent due to a bad interaction of the depth limit and cache hits. This leads to crashes in debug builds. See the new unit test for a reproducer.
2024-07-30  [SCEV] Avoid unnecessary computeConstantDifference() call (NFC)  (Nikita Popov; 1 file, -1/+3)
No need to do the second one if the first one already failed.
2024-07-09  [SCEV] forgetValue: support (with-overflow-inst op0, op1) (#98015)  (v01dXYZ; 1 file, -1/+1)
The use-def walk in forgetValue() was skipping instructions with non-SCEVable types. However, SCEV may look past with.overflow intrinsics returning aggregates. Fixes #97586.
2024-07-02  [SCEV] Split collecting and applying rewrite info from loop guards (NFC) (#97316)  (Florian Hahn; 1 file, -158/+178)
Introduce a new LoopGuards class to track info from loop guards and split off collecting rewrite info to LoopGuards::collect. This allows users of applyLoopGuards to collect rewrite info once in cases where the same loop guards are applied multiple times. This is used to collect rewrite info once in howFarToZero, which saves a bit of compile-time:
stage1-O3: -0.04%
stage1-ReleaseThinLTO: -0.02%
stage1-ReleaseLTO-g: -0.04%
stage2-O3: -0.02%
https://llvm-compile-time-tracker.com/compare.php?from=117b53ae38428ca66eaa886fb432e6f09db88fe4&to=4ffb7b2e1c99081ccebe6f236c48a0be2f64b6ff&stat=instructions:u
Notably this improves mafft by -0.9% with -O3, -0.11% with LTO and -0.12% with stage2-O3. PR: https://github.com/llvm/llvm-project/pull/97316
2024-06-28  [SCEV] Cache DataLayout in class (NFC)  (Nikita Popov; 1 file, -3/+3)
PR #96919 caused a minor compile-time regression, mostly because SCEV now goes through an extra out-of-line function to fetch the data layout, and does this a lot. Cache the DataLayout in SCEV to avoid these repeated calls.
2024-06-25  [SCEV] Support addrec in right hand side in howManyLessThans (#92560)  (vaibhav; 1 file, -162/+210)
Fixes #92554 (std::reverse will auto-vectorize now). When calculating the number of times an exit condition containing a comparison is executed, we mostly assume that the RHS of the comparison should be loop invariant, but it may be another add-recurrence. ~In that case, we can try the computation with `LHS = LHS - RHS` and `RHS = 0`.~ (It is not valid unless proven that it doesn't wrap) **Edit:** We can calculate the backedge count for a loop structure like:

```cpp
left = left_start;
right = right_start;
while (left < right) {
  // ...do something...
  left += s1;  // the stride of left is s1 (> 0)
  right -= s2; // the stride of right is -s2 (s2 > 0)
}
// left and right converge somewhere in the middle of their start values
```

We can calculate the backedge-count as ceil((End - left_start) /u (s1 - (-s2))), where End = max(left_start, right_start). **Alive2**: https://alive2.llvm.org/ce/z/ggxx58
2024-06-25  [Analysis] Use range-based for loops (NFC) (#96587)  (Kazu Hirata; 1 file, -16/+14)
2024-06-21  [llvm] format and terminate namespaces with closing comment (#94917)  (Mohammed Keyvanzadeh; 1 file, -1/+1)
Namespaces are terminated with a closing comment in the majority of the codebase so do the same here for consistency. Also format code within some namespaces to make clang-format happy.
2024-06-21  [SCEV] Avoid unnecessary call to getExitingBlock() in computeExitLimit() (#96188)  (Enna1; 1 file, -5/+4)
In `computeExitLimit()`, we use `getExitingBlock()` to check if loop has exactly one exiting block. Since `computeExitLimit()` is only used in `computeBackedgeTakenCount()`, and `getExitingBlocks()` is called to get all exiting blocks in `computeBackedgeTakenCount()`, we can simply check if loop has exactly one exiting block by checking if the number of exiting blocks equals 1 in `computeBackedgeTakenCount()` and pass it as an argument to `computeExitLimit()`. This change helps to improve the compile time for files containing large loops.
2024-06-20  [SCEV] Handle nusw/nuw gep flags for addrecs  (Nikita Popov; 1 file, -9/+11)
Set the nw flag if the gep has any nowrap flags. Transfer the nuw flag. Also set nuw for the nusw + nneg combination.
2024-06-20  [SCEV] Transfer gep nusw and nuw flags  (Nikita Popov; 1 file, -11/+14)
nusw implies nsw offset arithmetic, and nuw base+offset arithmetic if the offset is non-negative. nuw implies nuw offset and base+offset arithmetic. As usual, we can only transfer flags if poison implies UB.
2024-06-19  [SCEV] Use context sensitive reasoning in howFarToZero (#94525)  (Philip Reames; 1 file, -4/+14)
This change builds on 0a357ad which supported non-constant strides in howFarToZero, but used only context insensitive reasoning. This change does two things: 1) Directly use context sensitive queries to prove facts established before the loop. Note that we technically only need facts known at the latch, but using facts known on entry is a conservative approximation which will cover most everything. 2) For the non-zero check, we can usually prove non-zero from the finite assumption implied by mustprogress. This eliminates the need to do the context sensitive query in the common case.
2024-06-19  [SCEVExpander] Recognize urem idiom during expansion (#96005)  (Philip Reames; 1 file, -0/+3)
If we have a urem expression, emitting it as a urem is significantly better than letting the full expansion kick in. We have the risk of a udiv or mul which could have previously been shared, but losing that seems like a reasonable tradeoff for being able to round trip a urem w/o modification.
2024-06-05  [SCEV] Support non-constant step in howFarToZero (#94411)  (Philip Reames; 1 file, -11/+10)
VF * vscale is the canonical step for a scalably vectorized loop, and LFTR canonicalizes to NE loop tests, so having our trip count logic be unable to compute trip counts for such loops is unfortunate. The existing code needed minimal generalization to handle non-constant strides. The tricky cases to be sure we handle correctly are zero and -1 (due to the special case of abs(-1) being non-positive). This patch does the full generalization in terms of code structure, but in practice, this seems unlikely to benefit anything beyond the (C * vscale) case. I did some quick investigation, and it seems the context-free non-zero and sign checks are basically never disproved for arbitrary scales. I think we have alternate tactics available for these, but I'm going to return to that in a separate patch.
2024-06-03  [SCEV] Preserve flags in SCEVLoopGuardRewriter for add and mul. (#91472)  (Florian Hahn; 1 file, -5/+54)
SCEVLoopGuardRewriter only replaces operands with equivalent values, so we should be able to transfer the flags from the original expression. PR: https://github.com/llvm/llvm-project/pull/91472
2024-05-28  [SCEV] Add predicated version of getSymbolicMaxBackedgeTakenCount. (#93498)  (Florian Hahn; 1 file, -4/+44)
This patch adds a predicated version of getSymbolicMaxBackedgeTakenCount. The intended use for this is loop access analysis for loops with uncountable exits. When analyzing dependences and computing runtime checks, we need the smallest upper bound on the number of iterations. In terms of memory safety, it shouldn't matter if any uncomputable exits leave the loop, as long as we prove that there are no dependences given the minimum of the countable exits. The same should apply also for generating runtime checks. PR: https://github.com/llvm/llvm-project/pull/93498
2024-05-28  [SCEV] Compute symbolic max backedge taken count in BTI directly. (NFC)  (Florian Hahn; 1 file, -26/+22)
Move symbolic max backedge taken count computation to BackedgeTakenInfo, use existing ExitNotTaken info. In preparation for https://github.com/llvm/llvm-project/pull/93498.
2024-05-26  [SCEV] Don't add predicates already implied by UnionPredicate. (#93397)  (Florian Hahn; 1 file, -1/+3)
Update SCEVUnionPredicate::add to only add predicates from another union predicate if they aren't already implied by the union predicate we add them to. Note that there exists logic elsewhere to avoid adding predicates if they are already implied, but this logic misses cases when only some predicates of a union predicate are implied by the current set of predicates. PR: https://github.com/llvm/llvm-project/pull/93397
2024-05-23  [SCEV] Support ule/sle exit counts via widening (#92206)  (Nikita Popov; 1 file, -2/+19)
If we have an exit condition of the form IV <= Limit, we will first try to convert it into IV < Limit+1 or IV-1 < Limit based on range info (in icmp simplification). If that fails, we try to convert it to IV < Limit + 1 based on controlling exits in non-infinite loops. However, if all else fails, we can still determine the exit count by rewriting to ext(IV) < ext(Limit) + 1, where the zero/sign extension ensures that the addition does not overflow. Proof: https://alive2.llvm.org/ce/z/iR-iYd
2024-05-20  [SCEV] Don't use non-deterministic constant folding for trip counts (#90942)  (Nikita Popov; 1 file, -2/+4)
When calculating the exit count exhaustively, if any of the involved operations is non-deterministic, the exit count we compute at compile-time and the exit count at run-time may differ. Using these non-deterministic constant folding results is only correct if we actually replace all uses of the instruction with the value. SCEV (or its consumers) generally don't do this. Handle this by adding a new AllowNonDeterministic flag to the constant folding API, and disabling it in SCEV. If non-deterministic results are not allowed, do not fold FP lib calls in general, and FP operations returning NaNs in particular. This could be made more precise (some FP libcalls like fabs are fully deterministic), but I don't think that precise handling is worthwhile here. Fixes the interesting part of https://github.com/llvm/llvm-project/issues/89885.
2024-05-13  [SCEV] Swap order of arguments to MatchBinaryAddToConst (NFCI). (#91945)  (Florian Hahn; 1 file, -2/+2)
The argument order to MatchBinaryAddToConst doesn't match the comment and also is counter-intuitive (passing RHS before LHS, C2 before C1). This patch adjusts the order to be in line with the calls above, which should be equivalent, but more natural: https://alive2.llvm.org/ce/z/ZWGp-Z PR: https://github.com/llvm/llvm-project/pull/91945
2024-05-09  Replace uses of ConstantExpr::getCompare. (#91558)  (Eli Friedman; 1 file, -3/+1)
Use ICmpInst::compare() where possible, ConstantFoldCompareInstOperands in other places. This only changes places where either the fold is guaranteed to succeed, or the code doesn't use the resulting compare if we fail to fold.
2024-04-18  [IR] Drop poison-generating return attributes when necessary (#89138)  (Andreas Jonson; 1 file, -1/+1)
Rename has/dropPoisonGeneratingFlagsOrMetadata to has/dropPoisonGeneratingAnnotations and make it also handle nonnull, align and range return attributes on calls, similar to the existing handling for !nonnull, !align and !range metadata.
2024-04-16  [ValueTracking] Restore isKnownNonZero parameter order. (#88873)  (Harald van Dijk; 1 file, -1/+1)
Prior to #85863, the required parameters of llvm::isKnownNonZero were Value and DataLayout. After, they are Value, Depth, and SimplifyQuery, where SimplifyQuery is implicitly constructible from DataLayout. The change to move Depth before SimplifyQuery needed callers to be updated unnecessarily, and as commented in #85863, we actually want Depth to be after SimplifyQuery anyway so that it can be defaulted and the caller does not need to specify it.
2024-04-12  [ValueTracking] Convert `isKnownNonZero` to use SimplifyQuery (#85863)  (Yingwei Zheng; 1 file, -1/+1)
This patch converts `isKnownNonZero` to use SimplifyQuery. Then we can use the context information from `DomCondCache`. Fixes https://github.com/llvm/llvm-project/issues/85823. Alive2: https://alive2.llvm.org/ce/z/QUvHVj
2024-04-12  [SCEV] Add range attribute handling (#88449)  (Andreas Jonson; 1 file, -1/+8)
2024-04-10  [SCEV] Fix BinomialCoefficient Iteration to fit in W bits (#88010)  (annamthomas; 1 file, -4/+2)
BinomialCoefficient computes the value of W-bit IV at iteration It of a loop. When W is 1, we can call multiplicative inverse on 0 which triggers an assert since 1b76120. Since the arithmetic is supposed to wrap if It or K does not fit in W bits, do the truncation into W bits after we do the shift. Fixes #87798
2024-04-04  [APInt] Add a simpler overload of multiplicativeInverse (#87610)  (Jay Foad; 1 file, -8/+3)
The current APInt::multiplicativeInverse takes a modulus which can be any value, but all in-tree callers use a power of two. Moreover, most callers want to use two to the power of the width of an existing APInt, which is awkward because 2^N is not representable as an N-bit APInt. Add a new overload of multiplicativeInverse which implicitly uses 2^BitWidth as the modulus.
2024-03-10  Add llvm::min/max_element and use it in llvm/ and mlir/ directories. (#84678)  (Justin Lebar; 1 file, -4/+3)
For some reason this was missing from STLExtras.
2024-03-06  [SCEV] Match both (-1)b + a and a + (-1)b as a - b (#84247)  (Philip Reames; 1 file, -11/+21)
In our analysis of guarding conditions, we were converting a-b == 0 into the a == b alternate form, but we were only checking for one of the two forms for the sub. There's no requirement that the multiply only be on the LHS of the add.
2024-03-06  [SCEV] Extend type hint in analysis output to all backedge kinds  (Philip Reames; 1 file, -17/+33)
This extends the work from 7755c26 to all of the different backedge taken count kinds that we print for the scev analysis printer. As before, the goal is to cut down on confusion as i4 -1 is a very different (unsigned) value from i32 -1.
2024-03-06  [SCEV] Print predicate backedge count only if new information available  (Philip Reames; 1 file, -11/+12)
When printing the result of SCEV's analysis, we can avoid printing the predicated backedge taken count and the predicates if the predicates are empty and no new information is provided. This helps to reduce the verbosity of the output.
2024-03-06  [SCEV] Include type when printing constant max backedge taken count  (Philip Reames; 1 file, -1/+2)
When printing the result of the analysis, i8 -1 and i64 -1 are quite different in terms of analysis quality. In a recent conversation with a new contributor, we ran into exactly this confusion. Adding the type for constant scevs more globally seems worthwhile, but introduces a much larger test diff. I'm splitting this off first since it addresses the immediate need, and then going to do some further changes to clarify a few related bits of analysis result output.
2024-03-04  [LV] Handle scalable VFs in optimizeForVFAndUF (#82669)  (Philip Reames; 1 file, -0/+7)
Given a scalable VF of the form <NumElts * VScale>, this patch adds the ability to discharge a backedge test for a loop whose trip count is between (NumElts, MinVScale*NumElts). A couple of notes on this: * Annoyingly, I could not figure out how to write a test for this case. My attempt is checked in as test32_i8 in f67ef1a, but LV uses a fixed vector in that case, and ignored the force flags. * This depends on 9eb5f94f to avoid appearing like a regression. Since SCEV doesn't know any upper bound on vscale without the vscale_range attribute (it doesn't query TTI), the ranges overflow on the multiply. Arguably, this is fixing a bug in the current LV code since in theory vscale can be large enough to overflow for real, but no actual target is going to see that case.
2024-02-02  [SCEV] Move canReuseInstruction() helper into SCEV (NFC)  (Nikita Popov; 1 file, -0/+62)
To allow reusing it in IndVars.
2024-01-12  [SCEV] Special case sext in isKnownNonZero (#77834)  (Philip Reames; 1 file, -0/+4)
The existing logic in isKnownNonZero relies on unsigned ranges, which can be problematic when our range calculation is imprecise. Consider the following:

  %offset.nonzero = or i32 %offset, 1
    --> %offset.nonzero U: [1,0) S: [1,0)
  %offset.i64 = sext i32 %offset.nonzero to i64
    --> (sext i32 %offset.nonzero to i64) U: [-2147483648,2147483648) S: [-2147483648,2147483648)

Note that the unsigned range for the sext does contain zero in this case despite the fact that it can never actually be zero. Instead, we can push the query down one level - relying on the fact that the sext is an invertible operation and that the result can only be zero if the input is. We could likely generalize this reasoning for other invertible operations, but special casing sext seems worthwhile.
2023-12-22  [SCEV] Ensure shift amount is in range before calling getZExtValue()  (Simon Pilgrim; 1 file, -3/+4)
Fixes #76234
2023-12-12  [SCEV] Use loop guards when checking that RHS >= Start (#75039)  (Nikita Popov; 1 file, -1/+5)
Loop guards tend to provide better results when it comes to reasoning about ranges than isLoopEntryGuardedByCond(). See the test change for the motivating case. I have retained both the loop guard check and the implied cond based check for now, though the latter only seems to impact a single test and only via side effects (nowrap flag calculation) at that.