| Age | Commit message (Collapse) | Author | Files | Lines |
|
In https://reviews.llvm.org/D129599, non-trivial switching was disabled
for cold loops in the interest of code size. This added a dependency on
BlockFrequencyInfo with PGO, but in loop passes this is only available
on a lossy basis: see https://reviews.llvm.org/D86156
LICM moved away from BFI so as of today SimpleLoopUnswitch is the only
remaining loop pass that uses BFI, for the sole reason to prevent code
size increases in PGO builds. It doesn't use BFI if there's no profile
summary available.
After some investigation on llvm-test-suite it turns out that the lossy
BFI causes very significant deviations in block frequency, since when
new loops are deleted/created during the loop pass manager it can return
frequencies for different loops altogether.
This results in unswitchable loops being mistakenly skipped because they
are thought to be cold.
This patch removes the use of BFI from SimpleLoopUnswitch and thus the
last remaining use of BFI in a loop pass.
To recover the original intent of not unswitching cold code,
PGOForceFunctionAttrs can be used to annotate functions which can be
optimized for code size, since SimpleLoopUnswitch will respect OptSize:
https://reviews.llvm.org/D94559
This isn't 100% the same behaviour since the previous behaviour checked
for coldness at the loop level and this is now at the function level. We
could expand PGOForceFunctionAttrs to be more granular at the loop
level, https://github.com/llvm/llvm-project/issues/159595 tracks this
idea.
|
|
This PR removes the old `nocapture` attribute, replacing it with the new
`captures` attribute introduced in #116990. This change is
intended to be essentially NFC, replacing existing uses of `nocapture`
with `captures(none)` without adding any new analysis capabilities.
Making use of non-`none` values is left for a followup.
Some notes:
* `nocapture` will be upgraded to `captures(none)` by the bitcode
reader.
* `nocapture` will also be upgraded by the textual IR reader. This is to
make it easier to use old IR files and somewhat reduce the test churn in
this PR.
* Helper APIs like `doesNotCapture()` will check for `captures(none)`.
* MLIR import will convert `captures(none)` into an `llvm.nocapture`
attribute. The representation in the LLVM IR dialect should be updated
separately.
|
|
This PR removes tests with `br i1 undef` under
`llvm/tests/Transforms/Loop*, Lower*`.
|
|
When analyzing pass debug output it is helpful to have the function name
along with the loop name.
|
|
LSR uses SCEVExpander to generate induction formulas. The expander
internally tries to reuse existing IR expressions. To do that, it needs
to strip any poison generating flags (nsw, nuw, exact, nneg, etc..)
which may not be valid for the newly added users.
This is conservatively correct, but has the effect that LSR will strip
nneg flags on zext instructions involved in trip counts in loop
preheaders. To avoid this, this patch adjusts the expanded to reinfer
the flags on the CSE candidate if legal for all possible users.
This should fix the regression reported in
https://github.com/llvm/llvm-project/issues/71200.
This should arguably be done inside canReuseInstruction instead, but
doing it outside is more conservative compile time wise. Both
canReuseInstruction and isGuaranteedNotToBePoison walk operand lists, so
right now we are performing work which is roughly O(N^2) in the size of
the operand graph. We should fix that before making the per operand step
more expensive. My tenative plan is to land this, and then rework the
code to sink the logic into more core interfaces.
|
|
In the d6e7c162e1df3736d8e2b3610a831b7cfa5be99b was introduced util to
to extract widenable conditions from branch. That util was applied in
the llvm::isWidenableBranch to check if branch is widenable. So we
consider branch is widenable if it has widenable condition anywhere in
the condition tree. But that will be true when we finish GuardWidening
reworking from branch widening to widenable conditions widening.
For now we still need to check that widenable branch is in the form of:
`br(widenable_condition & (...))`,
because that form is assumed by LoopPredication and GuardWidening
algorithms.
Fixes: https://github.com/llvm/llvm-project/issues/66418
Co-authored-by: Aleksander Popov <apopov@azul.com>
|
|
Treat the case where all branch weights are zero as if there was no
profile.
Fixes #66382
|
|
|
|
Currently after widening br(WC && (c1 && c2)) we insert assume of
(c1 && c2) which is joined to WC by And operation.
But we are going to support more flexible form of widenable branches
where WC could be placed arbitrary in the expression tree, e.g:
br(c1 && (c2 && WC)).
In that case we won't have (c1 && c2) in the IR. So we need to add
explicit (c1 && c2) and then create an assumption of it.
Reviewed By: anna
Differential Revision: https://reviews.llvm.org/D157502
|
|
This reverts commit aa603c41caab63e246f4a4258c8b96e6ea06fdc9.
Revert due to LLVM Buildbot failure
|
|
Differential Revision: https://reviews.llvm.org/D157491
|
|
Reviewed By: anna
Differential Revision: https://reviews.llvm.org/D151082
|
|
Loop predication's predicateLoopExit pass does two incorrect things:
It sinks the widenable call into the loop, thereby converting an invariant condition to a variant one
It widens the widenable call at a branch thereby converting the branch into a loop-varying one.
The latter is problematic when the branch may have been loop-invariant
and prior optimizations (such as indvars) may have relied on this
fact, and updated the deopt state accordingly.
Now, when we widen this with a loop-varying condition, the deopt state
is no longer correct.
https://github.com/llvm/llvm-project/issues/61963 fixed.
Differential Revision: https://reviews.llvm.org/D147662
|
|
|
|
|
|
LoopPredication introduces the use of possibly posion value in branch (guard)
instruction, so to avoid introducing undefined behavior it should be frozen.
Reviewed By: mkazantsev
Differential Revision: https://reviews.llvm.org/D146685
|
|
|
|
LoopPredication may introduce undefined behavior.
|
|
Loop predication can insert assumes to preserve knowledge about some facts that
may otherwise be lost, because loop predication is a lossy transform. When a guard
is represented as branch by widenable condition, it should insert it in the guarded
block. However, if the guarded block has other predecessors than the guard block,
then the condition might not dominate it. Currently we generate invalid code here.
One possible fix here is to split critical edge and insert the assume there, but in
this case we should modify CFG, which Loop Predication is not currently doing, and we
want to keep it that way.
The fix is to handle this case by inserting a Phi which takes `Cond` as input from the
guard block and `true` from any other blocks. This is valid in terms of IR and does
not introduce any new knowledge if we came from another block.
Differential Revision: https://reviews.llvm.org/D144859
Reviewed By: nikic, skatkov
|
|
Details: https://github.com/llvm/llvm-project/issues/61022
|
|
|
|
|
|
|
|
LoopPredication might be able to turn such checks (which are
not necessarily are done on IV) into loop invariant checks.
|
|
Tests relying on this should explicitly use -passes='require<analysis>,foo'.
|
|
As LoopPredication performs non-equivalent transforms removing some
checks from loops, other passes may not be able to perform transforms
they'd be able to do if the checks were left in loops.
This patch makes LoopPredication insert assumes of the replaced
conditions either after a guard call or in the true block of
widenable condition branch.
Differential Revision: https://reviews.llvm.org/D135354
|
|
transform
|
|
Summary:
The code for generating a name for loops for various reporting scenarios
created a name by serializing the loop into a string. This may result in
a very large name for a loop containing many blocks. Use the getName()
function on the loop instead.
Author: Jamie Schmeiser <schmeise@ca.ibm.com>
Reviewed By: Whitney (Whitney Tsang), aeubanks (Arthur Eubanks)
Differential Revision: https://reviews.llvm.org/D133587
|
|
With profile data, non-trivial LoopUnswitch will only apply on non-cold loops, as unswitching cold loops may not gain much benefit but significantly increase the code size.
Reviewed By: aeubanks, asbirlea
Differential Revision: https://reviews.llvm.org/D129599
|
|
The basic problem we have is that we're trying to reuse an instruction which is mapped to some SCEV. Since we can have multiple such instructions (potentially with different flags), this is analogous to our need to drop flags when performing CSE. A trivial implementation would simply drop flags on any instruction we decided to reuse, and that would be correct.
This patch is almost that trivial patch except that we preserve flags on the reused instruction when existing users would imply UB on overflow already. Adding new users can, at most, refine this program to one which doesn't execute UB which is valid.
In practice, this fixes two conceptual problems with the previous code: 1) a binop could have been canonicalized into a form with different opcode or operands, or 2) the inbounds GEP case which was simply unhandled.
On the test changes, most are pretty straight forward. We loose some flags (in some cases, they'd have been dropped on the next CSE pass anyways). The one that took me the longest to understand was the ashr-expansion test. What's happening there is that we're considering reuse of the mul, previously we disallowed it entirely, now we allow it with no flags. The surrounding diffs are all effects of generating the same mul with a different operand order, and then doing simple DCE.
The loss of the inbounds is unfortunate, but even there, we can recover most of those once we actually treat branch-on-poison as immediate UB.
Differential Revision: https://reviews.llvm.org/D112734
|
|
Upon further investigation and discussion,
this is actually the opposite direction from what we should be taking,
and this direction wouldn't solve the motivational problem anyway.
Additionally, some more (polly) tests have escaped being updated.
So, let's just take a step back here.
This reverts commit f3190dedeef9da2109ea57e4cb372f295ff53b88.
This reverts commit 749581d21f2b3f53e4fca4eb8728c942d646893b.
This reverts commit f3df87d57e096143670e0fd396e81d43393a2dd2.
This reverts commit ab1dbcecd6f0969976fafd62af34730436ad5944.
|
|
https://alive2.llvm.org/ce/z/YzPhSb
Refs. https://reviews.llvm.org/D109368#3089809
|
|
Refs. https://reviews.llvm.org/D109368#3089809
|
|
|
|
Using BPI within loop predication is non-trivial because BPI is only
preserved lossily in loop pass manager (one fix exposed by lossy
preservation is up for review at D111448). However, since loop
predication is only used in downstream pipelines, it is hard to keep BPI
from breaking for incomplete state with upstream changes in BPI.
Also, correctly preserving BPI for all loop passes is a non-trivial
undertaking (D110438 does this lossily), while the benefit of using it
in loop predication isn't clear.
In this patch, we rely on profile metadata to get almost similar benefit as
BPI, without actually using the complete heuristics provided by BPI.
This avoids the compile time explosion we tried to fix with D110438 and
also avoids fragile bugs because BPI can be lossy in loop passes
(D111448).
Reviewed-By: asbirlea, apilipenko
Differential Revision: https://reviews.llvm.org/D111668
|
|
This is analogous to D86156 (which preserves "lossy" BFI in loop
passes). Lossy means that the analysis preserved may not be up to date
with regards to new blocks that are added in loop passes, but BPI will
not contain stale pointers to basic blocks that are deleted by the loop
passes.
This is achieved through BasicBlockCallbackVH in BPI, which calls
eraseBlock that updates the data structures in BPI whenever a basic
block is deleted.
This patch does not have any changes in the upstream pipeline, since
none of the loop passes in the pipeline use BPI currently.
However, since BPI wasn't previously preserved in loop passes, the loop
predication pass was invoking BPI *on the entire
function* every time it ran in an LPM. This caused massive compile time
in our downstream LPM invocation which contained loop predication.
See updated test with an invocation of a loop-pipeline containing loop
predication and -debug-pass turned ON.
Reviewed-By: asbirlea, modimo
Differential Revision: https://reviews.llvm.org/D110438
|
|
Function profile counts added to test cases. Regenerated test lines for
loop predication test.
|
|
Need a more general CHECK line for testcase in 5df9112 for correctly
handling lld-x86_64-win buildbot.
|
|
This relands commit 16a62d4f.
Relanded after fixing CHECK-LINES for opt pipeline output to be more
general (based on failures seen in buildbot).
|
|
This reverts commit 16a62d4f3dca189b0e0565c7ebcd83ddfcc67629.
Needs some update to check lines to fix bb failure.
|
|
Precommit testcase for D110438. Since we do not preserve BPI in loop
pass manager, we are forced to compute BPI everytime Loop predication is
invoked.
The patch referenced changes that behaviour by preserving lossy BPI for
loop passes.
|
|
This patch updates tests added in 5f2b7879f16ad5023f0684febeb0a20f7d53e4a8.
|
|
To make the IR easier to analyze, this pass makes some minor transformations.
After that, even if it doesn't decide to optimize anything, it can't report that
it changed nothing and preserved all the analyses.
Reviewed By: reames
Differential Revision: https://reviews.llvm.org/D109855
|
|
|
|
The attached testcase crashes without the patch (Not the same accesses
in the same order).
When we move instructions before another instruction, we also need to
update the memory accesses corresponding to it.
Reviewed-By: asbirlea
Differential Revision: https://reviews.llvm.org/D109197
|
|
Since LICM has now unconditionally moved to MemorySSA based form, all
passes that run in same LPM as LICM need to preserve MemorySSA (i.e. our
downstream pipeline).
Added loop-mssa to all tests and perform -verify-memoryssa within
LoopPredication itself.
Differential Revision: https://reviews.llvm.org/D108724
|
|
expressions
These intrinsics, not the icmp+select are the canonical form nowadays,
so we might as well directly emit them.
This should not cause any regressions, but if it does,
then then they would needed to be fixed regardless.
Note that this doesn't deal with `SCEVExpander::isHighCostExpansion()`,
but that is a pessimization, not a correctness issue.
Additionally, the non-intrinsic form has issues with undef,
see https://reviews.llvm.org/D88287#2587863
|
|
Follow-up to D82607
Revert an accidental change (empty.ll) of D82683
|
|
Blindly following unique-successors chain appeared to be a bad idea.
In a degenerate case when block jumps to itself that goes into endless loop.
Discovered this problem when playing with additional changes,
managed to reproduce it on existing LoopPredication code.
Fix by checking a "visited" set while iterating through unique successors.
Reviewed By: skatkov
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D72908
|
|
We may end up with a case where we have a widenable branch above the loop, but not all widenable branches within the loop have been removed. Since a widenable branch inhibit SCEVs ability to reason about exit counts (by design), we have a tradeoff between effectiveness of this optimization and allowing future widening of the branches within the loop. LoopPred is thought to be one of the most important optimizations for range check elimination, so let's pay the cost.
|