Age | Commit message (Collapse) | Author | Files | Lines |
|
We currently have getMinTrailingZeros(), from which we can get a SCEV's
multiple by computing 1 << MinTrailingZeroes. However, this only gets us
multiples that are a power of 2. This patch introduces a way to get max
constant multiples that are not just a power of 2. The logic is similar
to that of getMinTrailingZeros. getMinTrailingZerosImpl is replaced by
computing the max constant multiple, and counting the number of trailing
bits.
I have so far found this useful in two places:
1) Computing unsigned constant ranges. For example, if we have i8
{10,+,10}<nuw>, we know the max constant it can be is 250.
2) My original intent was to use this in getSmallConstantTripMultiples,
but it has no effect right now due to change from D110587. For
example, if we have backedge count `(6 * %N) - 1`, the trip count
becomes `1 + zext((6 * %N) - 1)`, and we cannot say that 6 is a
multiple of the SCEV. I plan to look further into this separately.
The implementation assumes the value is unsigned. It can probably be
extended to handle signed values as well.
If the code sees that a SCEV does not have <nuw>, it will fall back to
finding the max multiple that is a power of 2. Multiples that are a
power of 2 will still be a multiple even after the SCEV overflows. This
does not apply to other values. This is the 1st commit message:
---
This relands https://reviews.llvm.org/D141823. The verification fails
when expensive checks are turned on. This can occur when:
1. SCEV S's multiple is cached
2. SCEV S's no wrap flags are strengthened, and the multiple changes
3. SCEV verifier finds that S's cached and recomputed multiple are
different
We eliminate most cases by forgetting SCEVAddRecExpr's cached values
when the flags are modified, but there are still cases for other SCEV
types. We relax the check by making sure the cached multiple divides the
recomputed multiple, ensuring the cached multiple is correct,
conservative multiple.
Reviewed By: mkazantsev
Differential Revision: https://reviews.llvm.org/D149529
|
|
The GEP minus the base pointer (which is the pre-inc addrec) is
exactly the Accum value that was already calculated.
|
|
The highest address the object can start is ObjSize bytes before the
end (unsigned max value). If this value is not a multiple of the
alignment, the last possible start value is the next lowest multiple
of the alignment. Note: The computations cannot overflow,
because if they would there's no possible start address for the
object.
At the moment, this is limited to GlobalVariables, because I could not
find a API similar to getObjectSize to also get the alignment of the
object. With such an API, this can be generalized to general addresses.
Reviewed By: nikic
Differential Revision: https://reviews.llvm.org/D149483
|
|
No SCEVs are formed for instructions with non-scevable types, so no
other SCEV expressions can depend on them. Skip those instructions and
their users when invalidating SCEV expressions.
Depends on D144847.
Reviewed By: mkazantsev
Differential Revision: https://reviews.llvm.org/D144848
|
|
When invalidating a value, we walk all users of that value and
invalidate them as well. This can be very expensive for large use
graphs.
However, we only need to invalidate a user U of instruction I if
SCEV(U) can depend on SCEV(I). This is not the case if U is an
instruction that always produces a SCEVUnknown, such as a load.
If the load pointer operand is invalidated, there is no need to
invalidate the load result, which is completely unrelated from a
SCEV perspective.
Differential Revision: https://reviews.llvm.org/D149323
|
|
As far as I understand, the IsAvailableOnEntry() function basically
implements the same functionality as the properlyDominates() block
disposition. The primary difference (apart from a weaker
implementation) seems to be in this comment at the top:
// Checks if the SCEV S is available at BB. S is considered available at BB
// if S can be materialized at BB without introducing a fault.
However, I don't really understand why there would be such a
requirement. It's my understanding that SCEV explicitly does not
care about trapping udiv instructions itself, and it's the job of
SCEVExpander's isSafeToExpand() to make sure these don't get
expanded if they may trap.
Differential Revision: https://reviews.llvm.org/D149344
|
|
SCEV expressions no longer try to preserve LCSSA form. SCEV
construction will try to look through LCSSA phi nodes. As such,
we also no longer need to limit this special-case fold.
|
|
Sometimes a phi can both be trivial and match the
createNodeFromSelectLikePHI() fold. In that case it is generally
more profitable to look through the phi node.
|
|
We no longer try to preserve LCSSA form in SCEV representation:
Nowadays, we look through LCSSA PHI nodes directly during SCEV
construction. As such, this separate special case in
getSCEVAtScope() is no longer needed.
|
|
We should be checking the current BO here, not the nested one. If
the current BO has nowrap flags (and is UB on poison), then we'll
fetch both operand SCEVs of that BO. We'll check the nested BO
on the next iteration of the do/while loop.
|
|
These are not necessarily the same (e.g. or can become add) and
this is what we're switching over in the first place.
|
|
We can create expressions either for constant operand or i1
and/or. The implementation was inverting the latter check.
|
|
This is a follow on to D147117 and D147355. In both cases, we were adding special cases to compute zext(BTC+1) instead of zext(BTC)+1 when the BTC+1 computation was known not to overflow.
Differential Revision: https://reviews.llvm.org/D148661
|
|
This lifts two TODOs from this function, allowing us to prove
no-overflow whether it happens through max int (up) or through
min int (down) for both and and sub.
Differential Revision: https://reviews.llvm.org/D148618
Reviewed By: dmakogon
|
|
This reverts commit 027a4c8b96c7f97df8e98b1dac069b956810ab94.
|
|
We currently have getMinTrailingZeros(), from which we can get a SCEV's
multiple by computing 1 << MinTrailingZeroes. However, this only gets us
multiples that are a power of 2. This patch introduces a way to get max
constant multiples that are not just a power of 2. The logic is similar
to that of getMinTrailingZeros. getMinTrailingZeros is replaced by
computing the max constant multiple, and counting the number of trailing
bits.
This is applied in two places:
1) Computing unsigned constant ranges. For example, if we have i8
{10,+,10}<nuw>, we know the max constant it can be is 250.
2) Computing trip multiples as shown in SCEV output. This is useful if
for example, we are unrolling a loop by a factor of 5, and we know
the trip multiple is 5, then we don't need a loop epilog.
If the code sees that a SCEV does not have <nuw>, it will fall back to
finding the max multiple that is a power of 2. Multiples that are a
power of 2 will still be a multiple even after the SCEV overflows.
Differential Revision: https://reviews.llvm.org/D141823
|
|
The justification in isAddRecNeverPoison() no longer applies, as
it dates back to a time where LLVM had an unconditional forward
progress guarantee. However, we also no longer need it, because we
can exploit branch on poison UB instead.
For a single exit loop (without abnormal exits) we know that all
instructions dominating the exit will be executed, so if any of
them trigger UB on poison that means that addrec is not poison.
This is slightly stronger than the previous code, because a) we
don't need the exit to also be the latch and b) we don't need the
value to be used in the exit branch in particular, any UB-producing
instruction is fine.
I don't expect much practical impact from this change, this is
mainly to clarify the reasoning behind this logic.
Differential Revision: https://reviews.llvm.org/D148633
|
|
This preserves NSW flag for AddRecs multiplied by -1 if we can prove
via constant ranges that the AddRec cannot be signed minimum.
An explanation:
Let M be signed minimum. If AddRec's range contains M, then M * (-1) will
stay M and (M + 1) * (-1) will be signed maximum, so we get a signed overflow.
In all other cases if an AddRec didn't signed overflow,
then AddRec * (-1) wouldn't too.
Differential Revision: https://reviews.llvm.org/D148084
|
|
* combine zext and sext into the one switch case
* combine vscale and udiv into one switch case
* renames according to LLVM style
|
|
SCEV determines that loops with trip count >=2^32 have a trip multiple
of 1 to guard against huge multiples. This patch stregthens this to
instead find the greatest power of 2 divisor that is less than the
threshold.
Differential Revision: https://reviews.llvm.org/D147868
|
|
|
|
This patch improves on https://reviews.llvm.org/D110587. To summarize
the patch, given backedge-taken count BC, trip count TC is `BC + 1`.
However, we don't know if BC we might overflow. So the patch modifies TC
computation to `1 + zext(BC)`.
This patch only adds the zext if necessary by looking at the constant
range. If we can determine that BC cannot be the max value for its
bitwidth, then we know adding 1 will not overflow, and the zext is not
needed. We apply loop guards before computing TC to get more data.
The primary motivation is to support my work on more precise trip
multiples in https://reviews.llvm.org/D141823. For example:
```
void test(unsigned n)
__builtin_assume(n % 6 == 0);
for (unsigned i = 0; i < n; ++i)
foo();
```
Prior to this patch, we had `TC = 1 + zext(-1 + 6 * ((6 umax %n) /u
6))<nuw>`. SCEV range computation is able to determine that the BC
cannot be the max value, so the zext is not needed. The result is `TC
-> (6 * ((6 umax %n) /u 6))<nuw>`. From here, we would be able to
determine that %n is a multiple of 6.
There was one change in LoopCacheAnalysis/LoopInterchange required.
Before this patch, if a loop has BC = false, it would compute `TC -> 1 +
zext(false) -> 1`, which was fine. After this patch, it computes `TC -> 1
+ false = true`. CacheAnalysis would then sign extend the `true`, which
was not the intended the behavior. I modified CacheAnalysis such that
it would only zero extend trip counts.
This patch is not NFC, but also does not change any SCEV outputs. I
would like to get this patch out first to make work with trip multiples
easier.
Differential Revision: https://reviews.llvm.org/D147117
|
|
Apply loop guards to AddRec's start in range computation for
non-self-wrapping AddRecs.
According to CT measurements, this has a wide negative compile time impact,
so we hold it in expensive range sharpening mode where it's not so critical.
However, we need to find a way to share benefits of this mode with default mode.
Patch by Aleksandr Popov!
Differential Revision: https://reviews.llvm.org/D147557
Reviewed By: mkazantsev
|
|
Follow LLVM coding standards and make clangd emit less warnings.
|
|
Without this, pointer IVs in loops with small constant trip counts couldn't be proven no-self-wrap. This came up in a new LSR transform, but may also benefit other SCEV consumers as well.
Differential Revision: https://reviews.llvm.org/D146596
|
|
applyLoopGuards doesn't always preserve information when there are multiple assumes.
This patch tries to deal with multiple assumes regarding a SCEV's divisibility and min/max values, and rewrite it into a SCEV that still preserves all of the information.
For example, let the trip count of the loop be TC. Consider the 3 following assumes:
1. __builtin_assume(TC % 8 == 0);
2. __builtin_assume(TC > 0);
3. __builtin_assume(TC < 100);
Before this patch, depending on the assume processing order applyLoopGuards could create the following SCEV:
max(min((8 * (TC / 8)) , 99), 1)
Looking at this SCEV, it doesn't preserve the divisibility by 8 information.
After this patch, depending on the assume processing order applyLoopGuards could create the following SCEV:
max(min((8 * (TC / 8)) , 96), 8)
By aligning up 1 to 8, and aligning down 99 to 96, the new SCEV still preserves all of the original assumes.
Differential Revision: https://reviews.llvm.org/D144947
|
|
Now that SCEV has a dedicated vscale node type, we should also map
vscale intrinsics to it. To make sure this does not regress ranges
(which were KnownBits based previously), add support for vscale to
getRangeRef() as well.
Differential Revision: https://reviews.llvm.org/D146226
|
|
When debugging and using debug-pass-manager (e.g. in regression tests)
we prefer a consistent order in which analysis passes are executed.
But when for example doing
return MyClass(AM.getResult<LoopAnalysis>(F),
AM.getResult<DominatorTreeAnalysis>(F));
then the order in which LoopAnalysis and DominatorTreeAnalysis isn't
guaranteed, and might for example depend on which compiler that is
used when building LLVM.
I've not scanned the full source tree, but this fixes some occurances
of the above pattern found in lib/Analysis.
This problem was discussed briefly in review for D146206.
|
|
Modifying AddRecs when constructing other expressions can lead to
surprising changes. It also seems like it is not really beneficial i
most cases.
At the moment, there's a single regression, but we still might be able
to improve the flags at AddRec construction.
Might help with the issue discussed in D143409.
Reviewed By: mkazantsev
Differential Revision: https://reviews.llvm.org/D144051
|
|
As suggested in https://reviews.llvm.org/D145510#4192162.
|
|
There are a number of issues with the current code for converting
ule -> ult (etc) predicates for comparisons controlling finite loops:
* It sets nowrap flags, which may only hold for that particular
comparison, not globally. (PR60944)
* It doesn't check that the RHS is invariant. (I'm not sure this
can cause practical issues independently of the previous point.)
* It runs before simplifications that may be more profitable. (PR54191)
This patch moves the handling for this into computeExitLimitFromICmp(),
because it is somewhat tightly coupled with assumptions in that code,
and addresses the aforementioned issues.
Fixes https://github.com/llvm/llvm-project/issues/60944.
Fixes https://github.com/llvm/llvm-project/issues/54191.
Differential Revision: https://reviews.llvm.org/D145510
|
|
|
|
This replaces several rewriting rules in ScalarEvolution::applyLoopGuards
that are applied to min/max expressions with the equivalent ones but
applied to its arguments.
So previously given we had a loop guard min(a, b) >= c,
the min expression would get rewritten as max(c, min(a, b)).
With such approach, we were unable to apply the rewrite if min operands
were zext for example (min(zext(a), zext(b))), however it's equivalent
to the expression zext(min(a, b)) for which we could apply the rewrite.
Now we'd rewrite the min operands also with these expressions:
a -> max(c, a) and
b -> max(c, b).
and this would allow us to apply the loop guard in this and similar cases:
min(zext(a), zext(b)) would get rewritten as min(zext(max(c, a)), zext(max(c, b)))
instead of just being skipped.
The list of added rules (omitting predicates signedness for simplicity):
1. Guard: min(a, b) >= c
Old rule: min(a, b) -> max(c, min(a, b))
New rules: a -> max(a, c) and b -> max(b, c)
2. Guard: min(a, b) > c
Old rule: min(a, b) -> max(c + 1, min(a, b))
New rules: a -> max(a, c + 1) and b -> max(b, c + 1)
3. Guard: max(a, b) <= c
Old rule: max(a, b) -> min(c, max(a, b))
New rules: a -> min(a, c) and b -> min(b, c)
4. Guard: max(a, b) < c
Old rule: max(a, b) -> min(c - 1, max(a, b))
New rules: a -> min(a, c - 1) and b -> min(b, c - 1)
The old rewrites still hold.
Differential Revision: https://reviews.llvm.org/D145230
|
|
|
|
At the moment, proveNoWrapViaConstantRanges is only used when creating
SCEV[Zero,Sign]ExtendExprs. We can get significant improvements by
strengthening flags after creating the AddRec.
I'll also share a follow-up patch that removes the code to strengthen
flags when creating SCEV[Zero,Sign]ExtendExprs. Modifying AddRecs while
creating those can lead to surprising changes.
Compile-time looks neutral:
https://llvm-compile-time-tracker.com/compare.php?from=94676cf8a13c511a9acfc24ed53c98964a87bde3&to=aced434e8b103109104882776824c4136c90030d&stat=instructions:u
Reviewed By: mkazantsev, nikic
Differential Revision: https://reviews.llvm.org/D144050
|
|
I regularly try and fail to use this while debugging.
|
|
guard (NFC)
|
|
|
|
|
|
Just the scevUnconditionallyPropagatesPoisonFromOperands() check
is sufficient. Also rename the flag to be more in line with the
more general predicate.
|
|
This factors out two utilities used with RewriteMap in applyLoopGuards:
- AddRewrite, which puts a rewrite rule in the map and if needed registers
the rewrite in the list of rewritten expressions,
- GetMaybeRewritten, which checks whether an expression has already been
rewritten, and if so, returns the rewrite. Otherwise, returns the given
expression.
This may be needed when adding new rewrite rules as not to copy-paste this
code.
|
|
Relanding after fixing Polly related build error.
This reverts commit 7b26dcae9eaf8cdcba7fef032fd83d060dffd4b4.
|
|
This reverts commit 7912f5cc92f65ad0d3c705f3683a0b69dbedcc57.
|
|
This is part of an effort to remove ConstantExpr based
representations of `vscale` so that its LangRef definiton can
be relaxed to accommodate a less strict definition of constant.
Differential Revision: https://reviews.llvm.org/D144891
|
|
This allows for easier updating of common code in follow-on patches.
Reviewed By: nikic
Differential Revision: https://reviews.llvm.org/D144847
|
|
Represent scalable type sizes using C * vscale, where vscale is
the vscale constant expression. This exposes a bit more information
to SCEV, because the vscale multiplier is explicitly modeled in SCEV
(rather than part of the sizeof expression).
This is mainly intended as an alternative to D143642.
Differential Revision: https://reviews.llvm.org/D144624
|
|
Replace dyn_cast<> with isa<> as we don't actually need the variable
|
|
Passes in general shouldn't replace an alias with the aliasee (see
https://reviews.llvm.org/D66606). This can lead to situations where a
linkonce_odr symbol (which could be interposable if lowered to weak
linkage) can be replaced with a local aliasee which won't be
interposable. SVEC does this when the function is invoked by
FunctionPass Manager -> Loop Pass Manager -> Induction Variable Users in
the codegen pipeline. This was found in hwasan instrumented code where a
linonce_odr alias was replaced with its private aliasee.
This fixes the bug descriped at
https://github.com/llvm/llvm-project/issues/60668.
Differential Revision: https://reviews.llvm.org/D144035
|
|
This reverts commit 219ba2fb7b0ae89101f3c81a47fe4fc4aa80dea4.
|
|
applyLoopGuards doesn't always preserve information when there are multiple assumes.
This patch tries to deal with multiple assumes regarding a SCEV's divisibility and min/max values, and rewrite it into a SCEV that still preserves all of the information.
For example, let the trip count of the loop be TC. Consider the 3 following assumes:
1. __builtin_assume(TC % 8 == 0);
2. __builtin_assume(TC > 0);
3. __builtin_assume(TC < 100);
Before this patch, depending on the assume processing order applyLoopGuards could create the following SCEV:
max(min((8 * (TC / 8)) , 99), 1)
Looking at this SCEV, it doesn't preserve the divisibility by 8 information.
After this patch, depending on the assume processing order applyLoopGuards could create the following SCEV:
max(min((8 * (TC / 8)) , 96), 8)
By aligning up 1 to 8, and aligning down 99 to 96, the new SCEV still preserves all of the original assumes.
Differential Revision: https://reviews.llvm.org/D141850
|