Age | Commit message (Collapse) | Author | Files | Lines |
|
This is a follow-up PR for post-commit comments in #121104 .
Details:
- Rename `mergeTwoCounter` to `mergeTwoCounters` (add trailing `s`).
- Avoid duplicated hash lookup.
- Use `///` instead of `//`.
- Fix typo.
|
|
LoopPeel currently considers PHI nodes that become loop invariants
through peeling. However, in some cases, peeling transforms PHI nodes
into induction variables (IVs), potentially enabling further
optimizations such as loop vectorization. For example:
```c
// TSVC s292
int im = N-1;
for (int i=0; i<N; i++) {
a[i] = b[i] + b[im];
im = i;
}
```
In this case, peeling one iteration converts `im` into an IV, allowing
it to be handled by the loop vectorizer.
This patch adds a new feature to peel loops when to convert PHIs into
IVs. At the moment this feature is disabled by default.
Enabling it allows to vectorize the above example. I have measured on
neoverse-v2 and observed a speedup of more than 60% (options: `-O3
-ffast-math -mcpu=neoverse-v2 -mllvm -enable-peeling-for-iv`).
This PR is taken over from #94900
Related #81851
|
|
This isn't terribly useful at the moment because of the step=1
restriction but it should be functionally sound. This is mostly just
making sure the codepaths don't diverge as we make other changes.
|
|
Apply loop guards to BTC before checking if the last iteration should be
peeled off. This also adds an assert to make sure applying the guards
does not pessimize the results. I checked on a large test set and it did
not trigger there, but it adds an additional guard to catch potential
cases where loop-guards pessimize results.
Peels ~15% more loops.
PR: https://github.com/llvm/llvm-project/pull/142605
|
|
values (#142993)
Similar to
https://github.com/llvm/llvm-project/commit/7e14161f49b32387988cf9d937bbfaa27d0fbdd5,
the exiting value may be a non-local instruction or an argument.
Closes https://github.com/llvm/llvm-project/issues/142895.
|
|
(#140792)"
This reverts commit 580454526b936f7a576ddbc9bb932cf9be376ec4.
The recommitted version contains an extra check to not peel if the
latch exit is controlled by a pointer induction.
Original message:
Remove the restriction that the loop must be known to execute at least 2
iterations when peeling the last iteration. If we cannot prove at least
2 iterations are executed, a check and branch to skip the peeled loop is
inserted.
PR: https://github.com/llvm/llvm-project/pull/140792
|
|
(#140792)"
This reverts commit 24b97756decb7bf0e26dcf0e30a7a9aaf27f417c.
Also reverts ac9a466e39bf97ffeab127982aa7c405cb257551.
Building CMake triggers a crash with the patch, revert while I
investigate.
|
|
Make sure the new phis are inserted before any non-phi instructions.
This fixes a crash when dbg_value instructions are present in the
original exit block.
|
|
Remove the restriction that the loop must be known to execute at least 2
iterations when peeling the last iteration. If we cannot prove at least
2 iterations are executed, a check and branch to skip the peeled loop is
inserted.
PR: https://github.com/llvm/llvm-project/pull/140792
|
|
Follow-up to post-commit comment for
(https://github.com/llvm/llvm-project/pull/139551.
This should effectively be NFC, given the other existing restrictions.
|
|
Follow-up to post-commit comment for
(https://github.com/llvm/llvm-project/pull/139551.
This should effectively be NFC, given the other existing restrictions.
|
|
Update the check in canPeelLastIteration to make sure the exiting
condition has a single use. When peeling the last iteration, we adjust
the condition in the loop body to be true one iteration early, which
would be incorrect for other users.
Fixes https://github.com/llvm/llvm-project/issues/140444.
|
|
Account for constant values when updating exit values after peeling an
iteration from the end. This can happen if the inner loop gets unrolled
and simplified.
Fixes https://github.com/llvm/llvm-project/issues/140442.
|
|
(#139551)"
This reverts the revert commit bf92b127d2637948f53d11a187e865aa10e2e74c.
This adds missing initialization of PeelLast in gatherPeelingPreferences.
Original message:
Generalize countToEliminateCompares to also consider peeling off the
last iteration if it eliminates a compare.
At the moment, codegen for peeling off the last iteration is quite
restrictive and callers have to make sure that the exit condition can be
adjusted when peeling and that the loop executes at least 2 iterations.
Both will be relaxed in follow-ups.
PR: https://github.com/llvm/llvm-project/pull/139551
|
|
(#139551)"
This reverts commit bb10c3ba7f77d40a7fbfd4ac815015d3a4ae476a.
Also reverts 4f663cca15f2b53c2bc6a84d1b1f5bd81679356d:
Revert "[LoopPeel] Make sure PeelLast is always initialized."
Revert for now to bring msan bots back to green
https://lab.llvm.org/buildbot/#/builders/164/builds/9992
https://lab.llvm.org/buildbot/#/builders/94/builds/7158
|
|
Make sure PeelLast is initialized on all paths.
Should fix MSan bootstrap failures
https://lab.llvm.org/buildbot/#/builders/164/builds/9992
https://lab.llvm.org/buildbot/#/builders/94/builds/7158
Fixup after https://github.com/llvm/llvm-project/pull/139551.
|
|
Generalize countToEliminateCompares to also consider peeling off the
last iteration if it eliminates a compare.
At the moment, codegen for peeling off the last iteration is quite
restrictive and callers have to make sure that the exit condition can be
adjusted when peeling and that the loop executes at least 2 iterations.
Both will be relaxed in follow-ups.
PR: https://github.com/llvm/llvm-project/pull/139551
|
|
We can use *Set::insert_range to collapse:
for (auto Elem : Range)
Set.insert(E);
down to:
Set.insert_range(Range);
In some cases, we can further fold that into the set declaration.
|
|
getLoopEstimatedTripCount returns the trip count based on profiling
data, and its documentation says that it could return 0 when the trip
count is zero, but this is not the case: a valid trip count can never be
zero, and it returns 0 when the unsigned ExitCount is incremented by 1
and wraps. Some callers are careful about checking for this zero value
in an std::optional, but it makes for an API with footguns, as a
std::optional return value indicates that a non-nullopt value would be a
valid trip count. Fix this by explicitly returning std::nullopt when the
return value would wrap, and strip additional checks in callers. This
also fixes a minor bug in LoopVectorize.
|
|
With the introduction of CmpPredicate in 51a895a (IR: introduce struct
with CmpInst::Predicate and samesign), PatternMatch is one of the first
key pieces of infrastructure that must be updated to match a CmpInst
respecting samesign information. Implement this change to Cmp-matchers.
This is a preparatory step in migrating the codebase over to
CmpPredicate. Since we no functional changes are desired at this stage,
we have chosen not to migrate CmpPredicate::operator==(CmpPredicate)
calls to use CmpPredicate::getMatching(), as that would have visible
impact on tests that are not yet written: instead, we call
CmpPredicate::operator==(Predicate), preserving the old behavior, while
also inserting a few FIXME comments for follow-ups.
|
|
|
|
In the test case, the BECount of the second loop uses %load,
but we only have an LCSSA phi node for %add, so that is what
gets invalidated. Use the forgetLcssaPhiWithNewPredecessor()
API instead, which will invalidate the roots of the expression
instead.
Fixes https://github.com/llvm/llvm-project/issues/109333.
|
|
This is a helper to avoid writing `getModule()->getDataLayout()`. I
regularly try to use this method only to remember it doesn't exist...
`getModule()->getDataLayout()` is also a common (the most common?)
reason why code has to include the Module.h header.
|
|
(#95281)
…f weights" #95136
Reverts #95060, and relands #86609, with the unintended code generation
changes addressed.
This patch implements the changes to LLVM IR discussed in
https://discourse.llvm.org/t/rfc-update-branch-weights-metadata-to-allow-tracking-branch-weight-origins/75032
In this patch, we add an optional field to MD_prof meatdata nodes for
branch weights, which can be used to distinguish weights added from
llvm.expect* intrinsics from those added via other methods, e.g. from
profiles or inserted by the compiler.
One of the major motivations, is for use with MisExpect diagnostics,
which need to know if branch_weight metadata originates from an
llvm.expect intrinsic. Without that information, we end up checking
branch weights multiple times in the case if ThinLTO + SampleProfiling,
leading to some inaccuracy in how we report MisExpect related
diagnostics to users.
Since we change the format of MD_prof metadata in a fundamental way, we
need to update code handling branch weights in a number of places.
We also update the lang ref for branch weights to reflect the change.
|
|
weights" (#95060)
Reverts llvm/llvm-project#86609
This change causes compile-time regressions for stage2 builds
(https://llvm-compile-time-tracker.com/compare.php?from=3254f31a66263ea9647c9547f1531c3123444fcd&to=c5978f1eb5eeca8610b9dfce1fcbf1f473911cd8&stat=instructions:u).
It also introduced unintended changes to `.text` which should be
addressed before relanding.
|
|
This patch implements the changes to LLVM IR discussed in
https://discourse.llvm.org/t/rfc-update-branch-weights-metadata-to-allow-tracking-branch-weight-origins/75032
In this patch, we add an optional field to MD_prof metadata nodes for
branch weights, which can be used to distinguish weights added from
`llvm.expect*` intrinsics from those added via other methods, e.g.
from profiles or inserted by the compiler.
One of the major motivations, is for use with MisExpect diagnostics,
which need to know if branch_weight metadata originates from an
llvm.expect intrinsic. Without that information, we end up checking
branch weights multiple times in the case if ThinLTO + SampleProfiling,
leading to some inaccuracy in how we report MisExpect related
diagnostics to users.
Since we change the format of MD_prof metadata in a fundamental way, we
need to update code handling branch weights in a number of places.
We also update the lang ref for branch weights to reflect the change.
|
|
This patch adds processing of min/max intrinsics in LoopPeel in the
similar way as it was done for conditional statements: for
min/max(IterVal, BoundVal) we peel iterations where IterVal < BoundVal
for monotonically increasing IterVal; for monotonically decreasing
IterVal we peel iterations where IterVal > BoundVal (strict comparision
predicates are used to minimize number of peeled iterations).
|
|
For example, this allows us to peel this loop with a `and`:
```
for (int i = 0; i < N; ++i) {
if (i % 2 == 0 && i < 3) // can peel based on || as well
f1();
f2();
```
into:
```
for (int i = 0; i < 3; ++i) { // peel three iterations
if (i % 2 == 0)
f1();
f2();
}
for (int i = 3; i < N; ++i)
f2();
```
|
|
Add `setBranchWeights` convenience function to ProfDataUtils.h and use
it where appropriate.
|
|
In https://reviews.llvm.org/D64235 a new algorithm has been introduced
for updating the branch weights of latch blocks and their copies.
It increases the probability of going to the exit block for each next
peel iteration, calculating weights by (F - I * E, E), where:
- F is a weight of the edge from latch to header.
- E is a weight of the edge from latch to exit.
- I is a number of peeling iteration.
E.g: Let's say the latch branch weights are (100,300) and the estimated
trip count is 4. If we peel off all 4 iterations the weights of the
copied branches will be:
0: (100,300)
1: (100,200)
2: (100,100)
3: (100,1)
https://godbolt.org/z/93KnoEsT6
So we make the original loop almost unreachable from the 3rd peeled copy
according to the profile data. But that's only true if the profiling
data is accurate.
Underestimated trip count can lead to a performance issues with the
register allocator, which may decide to spill intervals inside the loop
assuming it's unreachable.
Since we don't know how accurate the profiling data is, it seems better
to set neutral 1/1 weights on the last peeled latch branch. After this
change, the weights in the example above will look like this:
0: (100,300)
1: (100,200)
2: (100,100)
3: (100,100)
Co-authored-by: Aleksandr Popov <apopov@azul.com>
|
|
Those fixes were taken from https://reviews.llvm.org/D137338
|
|
Block dispositions of values defined inside the loop may change
during peeling, so clear them. We already do this for other kinds
of unrolling.
Differential Revision: https://reviews.llvm.org/D153762
|
|
This also allows us to peel loops with a `select`:
```
for (int i = 0; i <= N; ++i);
f3(i == 0 ? a : b); // select instruction
```
into:
```
f3(a); // peel one iteration
for (int i = 1; i <= N; ++i)
f3(b);
```
Reviewed By: nikic
Differential Revision: https://reviews.llvm.org/D151052
|
|
The value map of last peeled iteration is computed within peelLoop API.
This patch exposes it for callers of peelLoop.
While this is not currently used by upstream passes, we have a usecase
downstream which benefits from this API update. Future users of peelLoop
can also use the ValueMap if needed.
Similar value maps are exposed by other loop utilities such as loop
cloning.
Differential Revision: https://reviews.llvm.org/D138228
|
|
These files no longer use llvm::Optional.
|
|
Function::splice()
This is part of a series of patches that aim at making Function::getBasicBlockList() private.
Differential Revision: https://reviews.llvm.org/D139984
|
|
|
|
Summary:
Expand the capabilities of the code for computing how many peels are
needed to make phis determined. A cast gets the peel count for the
value being casted while a binary op gets the maximum of the operands.
Respond to review comments: remove redundant asserts.
Author: Jamie Schmeiser <schmeise@ca.ibm.com>
Reviewed By:mkazantsev (Max Kazantsev),syzaara (Zaara Syeda)
Differential Revision: https://reviews.llvm.org/D138719
|
|
This patch mechanically replaces None with std::nullopt where the
compiler would warn if None were deprecated. The intent is to reduce
the amount of manual work required in migrating from Optional to
std::optional.
This is part of an effort to migrate from llvm::Optional to
std::optional:
https://discourse.llvm.org/t/deprecating-llvm-optional-x-hasvalue-getvalue-getvalueor/63716
|
|
This is part of an effort to migrate from llvm::Optional to
std::optional:
https://discourse.llvm.org/t/deprecating-llvm-optional-x-hasvalue-getvalue-getvalueor/63716
|
|
Summary:
Refactor loop peeling code by moving code for calculating phi invariance
into a separate class that does the calculation. Redescribe and rework
the algorithm in preparation for adding increased functionality. Add
test case that does not exhibit peeling that will be subsequently supported.
Author: Jamie Schmeiser <schmeise@ca.ibm.com>
Reviewed By: mkazantsev (Max Kazantsev)
Differential Revision: https://reviews.llvm.org/D138232
|
|
eraseFromParent().
Differential Revision: https://reviews.llvm.org/D138617
|
|
Add a flag to allow disabling the changes in
https://reviews.llvm.org/D134803.
Differential Revision: https://reviews.llvm.org/D136643
|
|
Loop peeling currently requires that a) the latch is exiting
b) a branch and c) other exits are unreachable/deopt. This patch
removes all of these limitations, and adds the necessary branch
weight updating support. It essentially works the same way as
before with latch -> exiting terminator and
loop trip count -> per exit trip count.
It's worth noting that there are still other limitations in
profitability heuristics: This patch enables peeling of loops to
make conditions invariant (which is pretty much always highly
profitable if possible), while peeling to make loads dereferenceable
still checks that non-latch exits are unreachable and PGO-based
peeling has even more conditions. Those checks could be relaxed
later if we consider those cases profitable.
The motivation for this change is that loops using iterator adaptors
in Rust often optimize very badly, and end up with a loop phi of the
form phi(true, false) in the final result. Peeling eliminates that
phi and conditions based on it, which enables a lot of follow-on
simplification.
Differential Revision: https://reviews.llvm.org/D134803
|
|
|
|
This does not try to pass it through from the end users.
|
|
|
|
|
|
In this patch we replace common code patterns with the use of utility
functions for dealing with profiling metadata. There should be no change
in functionality, as the existing checks should be preserved in all
cases.
Reviewed By: bogner, davidxl
Differential Revision: https://reviews.llvm.org/D128860
|
|
This reverts commit 300c9a78819b4608b96bb26f9320bea6b8a0c4d0.
We will reland once these issues are ironed out.
|