Age | Commit message | Author | Files | Lines |
|
(#162617)
Simplify and generalize the code to get a common constant multiple for
expressions when collecting guards, replacing the manual implementation.
Split off from https://github.com/llvm/llvm-project/pull/160012.
PR: https://github.com/llvm/llvm-project/pull/162617
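A minimal sketch of the underlying idea, as it might look inside ScalarEvolution.cpp (the helper name and exact shape are assumptions for illustration, not the actual patch):
#include "llvm/ADT/APInt.h"
#include "llvm/ADT/ArrayRef.h"
#include "llvm/Analysis/ScalarEvolution.h"
using namespace llvm;
// Largest constant that divides every expression in Exprs, i.e. the GCD of
// the per-expression constant multiples. Assumes Exprs is non-empty and all
// expressions share the same bit width.
static APInt getCommonConstantMultiple(ScalarEvolution &SE,
                                       ArrayRef<const SCEV *> Exprs) {
  APInt Common = SE.getConstantMultiple(Exprs.front());
  for (const SCEV *E : Exprs.drop_front())
    Common = APIntOps::GreatestCommonDivisor(Common, SE.getConstantMultiple(E));
  return Common;
}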
|
|
(#160941)
When computing the backedge taken count, we know that the expression
must be valid just before we enter the loop. Using the terminator of the
loop predecessor as the context instruction for getConstantMultiple /
getMinTrailingZeros allows using information from things like alignment
assumptions.
When a context instruction is used, the result is not cached, as it is
only valid at the specific context instruction.
Compile-time looks neutral:
http://llvm-compile-time-tracker.com/compare.php?from=9be276ec75c087595ebb62fe11b35c1a90371a49&to=745980f5e1c8094ea1293cd145d0ef1390f03029&stat=instructions:u
No impact on llvm-opt-benchmark
(https://github.com/dtcxzyw/llvm-opt-benchmark/pull/2867), but leads to
additional unrolling in ~90 files across a C/C++-based corpus including
LLVM on AArch64 using libc++ (which emits alignment assumptions for
things like std::vector::begin).
PR: https://github.com/llvm/llvm-project/pull/160941
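To illustrate the kind of source that can benefit, here is a hypothetical C++ reduction (not taken from the corpus): the alignment assumptions make the byte distance between the pointers a multiple of 16, so the element count of the loop is a multiple of 4, which is the kind of fact the context instruction now makes visible when computing the backedge-taken count:
int sum(const int *Begin, const int *End) {
  // Clang lowers these to llvm.assume calls with align bundles, similar to
  // the assumptions libc++ emits for things like std::vector::begin().
  Begin = static_cast<const int *>(__builtin_assume_aligned(Begin, 16));
  End = static_cast<const int *>(__builtin_assume_aligned(End, 16));
  int S = 0;
  for (const int *P = Begin; P != End; ++P) // element count is a multiple of 4
    S += *P;
  return S;
}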
|
|
(#157836)
|
|
compares. (#141798)
My use case is simplifying the control flow generated by LoopVectorize
when vectorising loops whose tripcount is a function of the runtime
vector length. This can be problematic because:
* CSE is a pre-LoopVectorize transform and so it's common for an IR
function to include several calls to llvm.vscale(). (NOTE: Code
generation will typically remove the duplicates)
* Pre-LoopVectorize instcombines will rewrite some multiplies as shifts.
This leads to a mismatch between the VL-based maths of the scalar loop and
that created for the vector loop, which prevents some obvious
simplifications.
SCEV does not suffer these issues because it effectively does CSE during
construction and shifts are represented as multiplies.
|
|
This reverts commit fd58f235f8c5bd40d98acfd8e7fb11d41de301c7.
The recommit contains an extra check to make sure that D is a multiple of
C2, if C2 > C1. This fixes the issue causing the revert fd58f235f8c. Tests
have been added in 6a726e9a4d3d0.
Original message:
If C2 >u C1 and C1 >u 1, fold to A /u (C2 /u C1).
Depends on https://github.com/llvm/llvm-project/pull/157555.
Alive2 Proof: https://alive2.llvm.org/ce/z/BWvQYN
PR: https://github.com/llvm/llvm-project/pull/157656
|
|
When adding a new predicate to a union, we currently do bidirectional
implication checks against all the contained predicates. This means that the number
of implication checks is quadratic in the number of total predicates (if
they don't end up being eliminated).
Fix this by not checking for implication if the number of predicates
grows too large. The expectation is that if there is a large number of
predicates, we should be discarding them later anyway, as expanding them
would be too expensive.
Fixes https://github.com/llvm/llvm-project/issues/156114.
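A rough sketch of the shape of the fix (the threshold, helper name and data structure are hypothetical; the implies(N, SE) interface is assumed from the current SCEVPredicate API):
#include "llvm/ADT/STLExtras.h"
#include "llvm/ADT/SmallVector.h"
#include "llvm/Analysis/ScalarEvolution.h"
using namespace llvm;
constexpr unsigned MaxUnionImpliesChecks = 16; // hypothetical cut-off
static void addToUnion(SmallVectorImpl<const SCEVPredicate *> &Preds,
                       const SCEVPredicate *N, ScalarEvolution &SE) {
  // Only pay for the quadratic bidirectional implication checks while the
  // union is small; very large unions are expected to be discarded later.
  if (Preds.size() < MaxUnionImpliesChecks) {
    for (const SCEVPredicate *P : Preds)
      if (P->implies(N, SE))
        return; // N adds nothing new.
    erase_if(Preds, [&](const SCEVPredicate *P) { return N->implies(P, SE); });
  }
  Preds.push_back(N);
}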
|
|
Reverts llvm/llvm-project#157656
There are multiple reports that this is causing miscompiles in the MSan
test suite after bootstrapping and that this is causing miscompiles in
rustc. Let's revert for now, and work to capture a reproducer next week.
|
|
If we have a phi where one of its source blocks is an unreachable
block, we don't want to traverse back into the unreachable region. Doing
so allows e.g. finding a trivial self loop when walking back the
predecessor chain.
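A minimal sketch of the guard this implies (hypothetical helper for illustration; the actual patch may implement the check differently):
#include "llvm/ADT/SmallVector.h"
#include "llvm/IR/Dominators.h"
#include "llvm/IR/Instructions.h"
using namespace llvm;
// Collect only those incoming blocks of PN that are reachable from the entry
// block, so a predecessor walk never re-enters an unreachable region (where
// e.g. a trivial self-loop could otherwise be found).
static void collectReachableIncoming(const PHINode *PN, const DominatorTree &DT,
                                     SmallVectorImpl<BasicBlock *> &Blocks) {
  for (BasicBlock *BB : PN->blocks())
    if (DT.isReachableFromEntry(BB))
      Blocks.push_back(BB);
}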
|
|
If C2 >u C1 and C1 >u 1, fold to A /u (C2 /u C1).
Depends on https://github.com/llvm/llvm-project/pull/157555.
Alive2 Proof: https://alive2.llvm.org/ce/z/BWvQYN
PR: https://github.com/llvm/llvm-project/pull/157656
|
|
Treat negative constants C as -1 * abs(C) when folding multiplies and
udivs.
Alive2 Proof: https://alive2.llvm.org/ce/z/bdj9W2
PR: https://github.com/llvm/llvm-project/pull/157555
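A quick numeric sanity check of the identity being relied on (plain C++, not the SCEV code itself): in two's-complement/modular arithmetic a negative constant C is exactly -1 * abs(C), so the -1 can be factored out before folding:
#include <cassert>
#include <cstdint>
int main() {
  uint32_t C = static_cast<uint32_t>(-6);             // the "negative" constant
  uint32_t Factored = static_cast<uint32_t>(-1) * 6u; // -1 * abs(C)
  assert(C == Factored);
  return 0;
}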
|
|
(#157159)
Generalize fold added in 74ec38fad0a1289
(https://github.com/llvm/llvm-project/pull/156730) to support multiplying and
dividing by different constants, given that both are powers of 2 and C1 is a
multiple of C2 (checked via logBase2).
https://alive2.llvm.org/ce/z/eqJ2xj
PR: https://github.com/llvm/llvm-project/pull/157159
|
|
(#156730)
Alive2 Proof: https://alive2.llvm.org/ce/z/JoHJE9
PR: https://github.com/llvm/llvm-project/pull/156730
|
|
deref. (#155672)"
This reverts commit f0df1e3dd4ec064821f673ced7d83e5a2cf6afa1.
Recommit with extra check for SCEVCouldNotCompute. Test has been added in
b16930204b.
Original message:
Remove the fall-back to constant max BTC if the backedge-taken-count
cannot be computed.
The constant max backedge-taken count is computed considering loop
guards, so to avoid regressions we need to apply loop guards as needed.
Also remove the special handling for Mul in willNotOverflow, as this
should no longer be needed after 914374624f
(https://github.com/llvm/llvm-project/pull/155300).
PR: https://github.com/llvm/llvm-project/pull/155672
|
|
deref. (#155672)"
This reverts commit 08001cf340185877665ee381513bf22a0fca3533.
This triggers an assertion in some build configs, e.g.
https://lab.llvm.org/buildbot/#/builders/24/builds/12211
|
|
Remove the fall-back to constant max BTC if the backedge-taken-count
cannot be computed.
The constant max backedge-taken count is computed considering loop
guards, so to avoid regressions we need to apply loop guards as needed.
Also remove the special handling for Mul in willNotOverflow, as this
should no longer be needed after 914374624f
(https://github.com/llvm/llvm-project/pull/155300).
PR: https://github.com/llvm/llvm-project/pull/155672
|
|
Trip count expressions sometimes consist of adding 3 operands, i.e.
(Const + A + B). There may be guard info for A + B, and if so, apply it.
We could probably apply this more generally, but we need to be careful w.r.t.
compile time.
Alive2 Proof for changes in miniters.ll:
https://alive2.llvm.org/ce/z/HFfXOx
Fixes https://github.com/llvm/llvm-project/issues/155941.
PR: https://github.com/llvm/llvm-project/pull/156013
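A hypothetical source shape this helps with (not the actual test): the trip count is (3 + A + B) and the guard constrains the (A + B) sub-expression:
void init(int *P, unsigned A, unsigned B) {
  if (A + B >= 100)
    return;                                // guard info exists for A + B
  for (unsigned I = 0; I < A + B + 3; ++I) // trip count involves (3 + A + B)
    P[I] = 0;
}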
|
|
Add support for identifying multiplication overflow in SCEV.
This is needed in LoopAccessAnalysis and that limitation was worked
around by 484417a.
This allows early-exit vectorization to work as expected in the
vect.stats.ll test without needing the workaround.
|
|
Try to push a constant multiply operand into a ZExt containing an add, if
possible. In general we are trying to push down ops through ZExt if
possible. This is similar to
https://github.com/llvm/llvm-project/pull/151227 which did the same for
additions.
For now this is restricted to adds with a constant operand, which is
similar to some of the logic above.
This enables some additional simplifications.
Alive2 Proof: https://alive2.llvm.org/ce/z/97pbSL
PR: https://github.com/llvm/llvm-project/pull/155300
|
|
This patch adds a new VPlan-based addMinimumIterationCheck, which
replaces the ILV version for the non-epilogue case.
The VPlan-based version constructs a SCEV expression to compute the
minimum iteration count and uses it to check whether the check is known true or
false. Otherwise it creates a VPExpandSCEV recipe and emits a
compare-and-branch.
When using epilogue vectorization, we still need to create the minimum
trip-count-check during the legacy skeleton creation. The patch moves
the definitions out of ILV.
PR: https://github.com/llvm/llvm-project/pull/153643
|
|
We just replaced SmallSet<T *, N> with SmallPtrSet<T *, N>, bypassing
the redirection found in SmallSet.h. With that, we no longer need to
include SmallSet.h in many files.
|
|
This patch replaces SmallSet<T *, N> with SmallPtrSet<T *, N>. Note
that SmallSet.h "redirects" SmallSet to SmallPtrSet for pointer
element types:
template <typename PointeeType, unsigned N>
class SmallSet<PointeeType*, N> : public SmallPtrSet<PointeeType*, N>
{};
We only have 140 instances that rely on this "redirection", with the
vast majority of them under llvm/. Since relying on the redirection
doesn't improve readability, this patch replaces SmallSet with
SmallPtrSet for pointer element types.
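Illustrative before/after of the mechanical change (element type chosen arbitrarily for the example):
#include "llvm/ADT/SmallPtrSet.h"
#include "llvm/ADT/SmallSet.h"
llvm::SmallSet<const int *, 8> Before;   // silently redirects to SmallPtrSet
llvm::SmallPtrSet<const int *, 8> After; // spells out the type actually used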
|
|
Similarly to https://github.com/llvm/llvm-project/pull/131538, we can
also try and check if a predicate is known to wrap given the backedge
taken count.
For now, this just checks directly when we try to create predicated
AddRecs. This both helps to avoid spending compile-time on optimizations
where we know the predicate is false, and can also help to allow
additional vectorization (e.g. by deciding to scalarize memory accesses
when otherwise we would try to create a predicated AddRec with a
predicate that's always false).
The initial version is quite restricted, but can be extended in
follow-ups to cover more cases.
PR: https://github.com/llvm/llvm-project/pull/151134
|
|
forward progress (#150916)
For the attached test:
Before the loop-idiom pass, we have a store into the inner loop which is
considered simple and one that does not have any side effects on the
loop. Post loop-idiom pass, we get a memset into the outer loop that is
considered to introduce side effects on the loop. This changes the
backedge taken count before and after the pass and hence, the crash with
verify-scev.
To fix the issue, we consider non-volatile memory intrinsics as not having
side effects w.r.t. forward progress.
Fixes #149377
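A hypothetical reduction of the situation (not the attached test): loop-idiom turns the inner-loop stores into a memset placed in the outer loop, and that memset must not be treated as a side effect that blocks forward-progress reasoning:
void zero(char *P, unsigned N, unsigned M) {
  for (unsigned I = 0; I < N; ++I)
    for (unsigned J = 0; J < M; ++J)
      P[I * M + J] = 0; // becomes memset(P + I * M, 0, M) in the outer loop
}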
|
|
Follow-up to
https://github.com/llvm/llvm-project/pull/151227#pullrequestreview-3074670031
to check the inner expression is an Add before calling getTruncateExpr.
Adds a new matcher that just matches and captures SCEVAddExpr, to
support matching a SCEVAddExpr with an arbitrary number of operands.
|
|
Try to push the constant operand into a ZExt:
A + zext (-A + B) -> zext (B), if trunc (A) + -A + B does not
unsigned-wrap.
The actual code supports ZExts with an arbitrary number of arguments, hence
the getAddExpr in the return.
This helps SCEV reasoning in some cases, commonly when adding an offset
to a zero-extended SCEV that subtracts the same offset.
Note that this is restricted to cases where we can fold away an operand
of the inner Add. This is needed to avoid bad interactions with patterns
when forming ZExts, which try to push the ZExt to the add operands.
https://alive2.llvm.org/ce/z/q7d303
PR: https://github.com/llvm/llvm-project/pull/151227
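A quick numeric illustration of the simple two-operand case (i32 inner, i64 outer; plain C++, not SCEV itself): A + zext(-A + B) equals zext(B) as long as the narrow add trunc(A) + (-A + B) does not unsigned-wrap, i.e. B >= A below:
#include <cassert>
#include <cstdint>
int main() {
  uint64_t A = 5;
  uint32_t B = 42;                                // B >= A: no unsigned wrap
  uint32_t Inner = static_cast<uint32_t>(-A) + B; // (-A + B) in the narrow type
  assert(A + static_cast<uint64_t>(Inner) == static_cast<uint64_t>(B));
  return 0;
}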
|
|
Relax the NUW requirements for isKnownPredicateViaNoOverflow, if the
second operand (Y) is an ADD. The code only simplifies the condition if
C1 < C2, so if the second ADD is NUW, it doesn't matter whether the
first operand also has the NUW flag, as it cannot wrap if C1 < C2.
https://alive2.llvm.org/ce/z/b3dM7N
PR: https://github.com/llvm/llvm-project/pull/149795
|
|
Currently the comparator is inconsistent when we have two external globals
and one internal global due to
```
if (IsGVNameSemantic(LGV) && IsGVNameSemantic(RGV))
return LGV->getName().compare(RGV->getName());
```
The internal global compares equal to (not strictly less than) the
external globals, but the external globals are not equal.
Fixes #147862.
|
|
Update SimplifyICmpOperands to only try subtracting 1 from RHS first, if
RHS is an op we can fold the subtract directly into. Otherwise try
adding 1 to LHS first, as we can preserve NUW flags.
This improves results in a few cases, including the modified test case
from berkeley-abc and new code to be added in
https://github.com/llvm/llvm-project/pull/128061.
Note that there are more cases where the results can be improved by
better ordering here which I'll try to investigate as follow-up.
PR: https://github.com/llvm/llvm-project/pull/144404
|
|
Having a finite Depth (or recursion limit) for computeKnownBits is very
limiting, but is currently a load-bearing necessity, as all KnownBits
are recomputed on each call and there is no caching. As a prerequisite
for an effort to remove the recursion limit altogether, either using a
clever caching technique, or writing an easily-invalidable KnownBits
analysis, make the Depth argument in APIs in ValueTracking uniformly the
last argument with a default value. This would aid in removing the
argument when the time comes, as many callers that currently pass 0
explicitly are now updated to omit the argument altogether.
|
|
Also demonstrate its use in ScalarEvolution.
|
|
Add a dedicated m_scev_AffineAddRec matcher with
complementary m_Loop() and m_SpecificLoop matchers.
PR: https://github.com/llvm/llvm-project/pull/141141
|
|
These are identified by misc-include-cleaner. I've filtered out those
that break builds. Also, I'm staying away from llvm-config.h,
config.h, and Compiler.h, which likely cause platform- or
compiler-specific build failures.
|
|
try_emplace can default-construct values, so we do not need to do so
on our own. Plus, try_emplace(Key) is much simpler/shorter than
insert({Key, LongValueType()}).
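Illustrative before/after (key and value types chosen arbitrarily for the example):
#include "llvm/ADT/DenseMap.h"
#include "llvm/ADT/SmallVector.h"
void example(llvm::DenseMap<unsigned, llvm::SmallVector<int, 4>> &M) {
  M.insert({42, llvm::SmallVector<int, 4>()}); // before: spell out a default value
  M.try_emplace(42);                           // after: default-constructs in place
}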
|
|
This patch corrects the behavior of the Dependence Analysis for memory
accesses that do not start at the same offset or do not have similar
strides. When offsets or strides cannot be disambiguated at compile
time, DA collects a set of runtime assumptions under which the
dependence test becomes valid. The default remains the same as before
the patch: DA rejects the dependence test as undecidable instead of
collecting runtime assumptions.
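A hypothetical example of an access pair this affects: the dependence between the two accesses depends on the runtime value of Off, so DA can either reject the test as undecidable (the default, as before) or record a runtime assumption under which its result is valid:
void f(int *A, long N, long Off) {
  for (long I = 0; I < N; ++I)
    A[I + Off] = A[I] + 1; // offset between the accesses is unknown at compile time
}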
---------
Co-authored-by: Michael Kruse <github@meinersbur.de>
Co-authored-by: Ryotaro Kasuga <kasuga.ryotaro@fujitsu.com>
|
|
Introduce m_scev_AffineAddRec to match affine AddRecs, a class_match for
SCEVConstant, and demonstrate their utility in LSR and SCEV. While at
it, rename m_Specific to m_scev_Specific for clarity.
|
|
Prefer DenseMap::lookup over DenseMap::find.
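Illustrative before/after (types chosen arbitrarily); lookup() returns a value-initialized result for missing keys, avoiding the find()/end() dance:
#include "llvm/ADT/DenseMap.h"
const int *before(const llvm::DenseMap<unsigned, const int *> &M, unsigned K) {
  auto It = M.find(K);
  return It == M.end() ? nullptr : It->second;
}
const int *after(const llvm::DenseMap<unsigned, const int *> &M, unsigned K) {
  return M.lookup(K);
}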
|
|
Add `llvm::uninitialized_copy` that accepts a range instead of start/end
iterator for the source of the copy.
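A small usage sketch (assuming the helper lives in STLExtras.h next to the other range helpers; destination type chosen arbitrarily):
#include "llvm/ADT/STLExtras.h"
#include "llvm/ADT/SmallVector.h"
void copyInto(const llvm::SmallVector<int, 8> &Src, int *RawDst) {
  // Before: std::uninitialized_copy(Src.begin(), Src.end(), RawDst);
  llvm::uninitialized_copy(Src, RawDst); // source passed as a range
}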
|
|
SCEVWrapPredicate::implies (#137935)
|
|
- Do not call pass initialization from pass constructors.
- Instead, pass initialization should happen in the `initializeAnalysis`
function.
- https://github.com/llvm/llvm-project/issues/111767
|
|
SCEV converts "-2 *nsw (i32 V)" into "2148473647 *nsw (i32 V)". But we
cannot preserve the nsw flag when the constant multiplier is negative.
This patch changes lshr to ashr so that we can preserve both nsw and nuw
flags.
Alive2 proof: https://alive2.llvm.org/ce/z/LZVSEa
Closes https://github.com/llvm/llvm-project/issues/135531.
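A small numeric illustration of the lshr/ashr difference for a negative multiplier (plain C++; signed right shift is arithmetic on common targets and since C++20):
#include <cstdint>
#include <cstdio>
int main() {
  uint32_t C = static_cast<uint32_t>(-2);
  printf("lshr: %u\n", C >> 1);                       // 2147483647: the sign is lost
  printf("ashr: %d\n", static_cast<int32_t>(C) >> 1); // -1: stays a negative factor
  return 0;
}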
|
|
Address review comment
https://github.com/llvm/llvm-project/pull/133711#discussion_r2024519641
|
|
This patch relands https://github.com/llvm/llvm-project/pull/124270.
Closes https://github.com/llvm/llvm-project/issues/126409.
The root cause is that we incorrectly preserve the samesign flag after
truncating operands of an icmp:
https://alive2.llvm.org/ce/z/4NE9gS
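A quick numeric illustration of why samesign cannot simply be kept after truncation (values chosen for illustration): both values are non-negative as i64, but their i32 truncations disagree in sign:
#include <cassert>
#include <cstdint>
int main() {
  int64_t X = 1, Y = 0xFFFFFFFFLL;      // both >= 0 as i64
  int32_t Xt = static_cast<int32_t>(X); // 1
  int32_t Yt = static_cast<int32_t>(Y); // -1 on the usual two's-complement targets
  assert((X >= 0) == (Y >= 0));
  assert((Xt >= 0) != (Yt >= 0));       // sign agreement does not survive trunc
  return 0;
}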
---------
Co-authored-by: Ramkumar Ramachandra <ramkumar.ramachandra@codasip.com>
|
|
This was added in https://reviews.llvm.org/D26389 to help with extremely
deep SCEV expressions.
However, this is wrong since we may cache sub-SCEVs as equivalent when
CompareValueComplexity returned 0 only because it hit the max comparison
depth.
This also improves compile time in some compiles:
https://llvm-compile-time-tracker.com/compare.php?from=34fa037c4fd7f38faada5beedc63ad234e904247&to=e241ecf999f4dd42d4b951d4a5d4f8eabeafcff0&stat=instructions:u
Similar to #100721.
Fixes #130688.
|