Age | Commit message (Collapse) | Author | Files | Lines |
|
When estimating the cost to avoid exponential unswitches of non-trivial
invariant conditions, also consider the parent loop basic blocks size,
ensuring this does not grow unexpectedly.
Fixes: https://github.com/llvm/llvm-project/issues/138509.
|
|
conditions"
This reverts commit e9de32fd159d30cfd6fcc861b57b7e99ec2742ab due to
multiple performance regressions observed across downstream Numba
benchmarks (https://github.com/llvm/llvm-project/issues/138509#issuecomment-3193855772).
While avoiding non-trivial unswitches on newly-cloned loops helps
mitigate the pathological case reported in https://github.com/llvm/llvm-project/issues/138509,
it may as well make the IR less friendly to vectorization / loop-
canonicalization (in the test reported, previously no select with
loop-carried dependence existed in the new specialized loops),
leading the abovementioned approach to be reconsidered.
|
|
Track newly-cloned loops coming from unswitching non-trivial invariant
conditions, so as to prevent conditions in such cloned blocks from
being unswitched again.
Fixes: https://github.com/llvm/llvm-project/issues/138509.
|
|
Following the work in PR #107279, this patch applies the annotative
DebugLocs, which indicate that a particular instruction is intentionally
missing a location for a given reason, to existing sites in the compiler
where their conditions apply. This is NFC in ordinary LLVM builds (each
function `DebugLoc::getFoo()` is inlined as `DebugLoc()`), but marks the
instruction in coverage-tracking builds so that it will be ignored by
Debugify, allowing only real errors to be reported. From a developer
standpoint, it also communicates the intentionality and reason for a
missing DebugLoc.
Some notes for reviewers:
- The difference between `I->dropLocation()` and
`I->setDebugLoc(DebugLoc::getDropped())` is that the former _may_ decide
to keep some debug info alive, while the latter will always be empty; in
this patch, I always used the latter (even if the former could
technically be correct), because the former could result in some
(barely) different output, and I'd prefer to keep this patch purely NFC.
- I've generally documented the uses of `DebugLoc::getUnknown()`, with
the exception of the vectorizers - in summary, they are a huge cause of
dropped source locations, and I don't have the time or the domain
knowledge currently to solve that, so I've plastered it all over them as
a form of "fixme".
|
|
RFC: https://discourse.llvm.org/t/rfc-improving-is-stmt-placement-for-better-interactive-debugging/82668
|
|
|
|
|
|
To finalise the "RemoveDIs" work removing debug intrinsics, we're
updating call sites that insert instructions to use iterators instead.
This set of changes are those where it's not immediately obvious that
just calling getIterator to fetch an iterator is correct, and one or two
places where more than one line needs to change.
Overall the same rule holds though: iterators generated for the start of
a block such as getFirstNonPHIIt need to be passed into insert/move
methods without being unwrapped/rewrapped, everything else can use
getIterator.
|
|
(#124287)
As part of the "RemoveDIs" work to eliminate debug intrinsics, we're
replacing methods that use Instruction*'s as positions with iterators. A
number of these (such as getFirstNonPHIOrDbg) are sufficiently
infrequently used that we can just replace the pointer-returning version
with an iterator-returning version, hopefully without much/any
disruption.
Thus this patch has getFirstNonPHIOrDbg and
getFirstNonPHIOrDbgOrLifetime return an iterator, and updates all
call-sites. There are no concerns about the iterators returned being
converted to Instruction*'s and losing the debug-info bit: because the
methods skip debug intrinsics, the iterator head bit is always false
anyway.
|
|
As part of the "RemoveDIs" project, BasicBlock::iterator now carries a
debug-info bit that's needed when getFirstNonPHI and similar feed into
instruction insertion positions. Call-sites where that's necessary were
updated a year ago; but to ensure some type safety however, we'd like to
have all calls to moveBefore use iterators.
This patch adds a (guaranteed dereferenceable) iterator-taking
moveBefore, and changes a bunch of call-sites where it's obviously safe
to change to use it by just calling getIterator() on an instruction
pointer. A follow-up patch will contain less-obviously-safe changes.
We'll eventually deprecate and remove the instruction-pointer
insertBefore, but not before adding concise documentation of what
considerations are needed (very few).
|
|
With the introduction of CmpPredicate in 51a895a (IR: introduce struct
with CmpInst::Predicate and samesign), PatternMatch is one of the first
key pieces of infrastructure that must be updated to match a CmpInst
respecting samesign information. Implement this change to Cmp-matchers.
This is a preparatory step in migrating the codebase over to
CmpPredicate. Since we no functional changes are desired at this stage,
we have chosen not to migrate CmpPredicate::operator==(CmpPredicate)
calls to use CmpPredicate::getMatching(), as that would have visible
impact on tests that are not yet written: instead, we call
CmpPredicate::operator==(Predicate), preserving the old behavior, while
also inserting a few FIXME comments for follow-ups.
|
|
Fixes https://github.com/llvm/llvm-project/issues/117537.
|
|
Add `Intrinsic::getDeclarationIfExists` to lookup an existing
declaration of an intrinsic in a `Module`.
|
|
Remove redundant condition from '!A || (A && B)' to '!A || B'
Fixes: #99799
|
|
terminators (#98789)
Fix #98787 .
|
|
Fix #97559 .
For the change at line 1253, I propagate the debug location of the
terminator (i.e., the insertion point) to the new phi. because `MergeBB`
is generated by splitting `ExitBB` several lines above, it only has the
terminator, which could provide a reasonable debug location.
For the change at line 2348, I switch the order of moving and cloning
`TI`. Because `NewTI` cloned from `TI` is inserted into the original
place where `TI` is, `NewTI` should preserve the origianl debug
location. At the same time, doing this allows us to propagate the debug
location to the new branch instruction replacing `NewTI` (the change at
line 2446).
|
|
This used to be necessary to fetch the DataLayout, but isn't anymore.
|
|
We need to remap any DbgRecord, not just DbgVariableRecords.
This is the followup to #91447.
Co-authored-by: PietroGhg <pietro.ghiglio@codeplay.com>
|
|
IndVars may be able to replace a loop dependent condition with a loop
invariant one, but loop-unswitch runs before IndVars, so the invariant
check remains in the loop.
For an example, consider a read-only loop with a bounds check:
https://godbolt.org/z/8cdj4qhbG
This patch uses a approach similar to the way extra cleanup passes are
run on demand after vectorization (added in acea6e9cfa4c4a0e8678c7).
It introduces a new ShouldRunExtraSimpleLoopUnswitch analysis marker,
which IndVars can use to indicate that extra unswitching is beneficial.
ExtraSimpleLoopUnswitchPassManager uses this analysis to determine
whether to run its passes on a loop.
Compile-time impact (geomean) ranges from +0.0% to 0.02%
https://llvm-compile-time-tracker.com/compare.php?from=138c0beb109ffe47f75a0fe8c4dc2cdabe8a6532&to=19e6e99eeb280d426907ea73a21b139ba7225627&stat=instructions%3Au
Compile-time impact (geomean) of unconditionally running
SimpleLoopUnswitch ranges from +0.05% - +0.16%
https://llvm-compile-time-tracker.com/compare.php?from=138c0beb109ffe47f75a0fe8c4dc2cdabe8a6532&to=2930dfd5accdce2e6f8d5146ae4d626add2065a2&stat=instructions:u
Unconditionally running SimpleLoopUnswitch seems to indicate that there
are multiple other scenarios where we fail to run unswitching when
opportunities remain.
Fixes https://github.com/llvm/llvm-project/issues/85551.
PR: https://github.com/llvm/llvm-project/pull/81271
|
|
This is the major rename patch that prior patches have built towards.
The DPValue class is being renamed to DbgVariableRecord, which reflects
the updated terminology for the "final" implementation of the RemoveDI
feature. This is a pure string substitution + clang-format patch. The
only manual component of this patch was determining where to perform
these string substitutions: `DPValue` and `DPV` are almost exclusively
used for DbgRecords, *except* for:
- llvm/lib/target, where 'DP' is used to mean double-precision, and so
appears as part of .td files and in variable names. NB: There is a
single existing use of `DPValue` here that refers to debug info, which
I've manually updated.
- llvm/tools/gold, where 'LDPV' is used as a prefix for symbol
visibility enums.
Outside of these places, I've applied several basic string
substitutions, with the intent that they only affect DbgRecord-related
identifiers; I've checked them as I went through to verify this, with
reasonable confidence that there are no unintended changes that slipped
through the cracks. The substitutions applied are all case-sensitive,
and are applied in the order shown:
```
DPValue -> DbgVariableRecord
DPVal -> DbgVarRec
DPV -> DVR
```
Following the previous rename patches, it should be the case that there
are no instances of any of these strings that are meant to refer to the
general case of DbgRecords, or anything other than the DPValue class.
The idea behind this patch is therefore that pure string substitution is
correct in all cases as long as these assumptions hold.
|
|
(#84793)
As part of the effort to rename the DbgRecord classes, this patch
renames the widely-used functions that operate on DbgRecords but refer
to DbgValues or DPValues in their names to refer to DbgRecords instead;
all such functions are defined in one of `BasicBlock.h`,
`Instruction.h`, and `DebugProgramInstruction.h`.
This patch explicitly does not change the names of any comments or
variables, except for where they use the exact name of one of the
renamed functions. The reason for this is reviewability; this patch can
be trivially examined to determine that the only changes are direct
string substitutions and any results from clang-format responding to the
changed line lengths. Future patches will cover renaming variables and
comments, and then renaming the classes themselves.
|
|
As part of the RemoveDIs project we need LLVM to insert instructions using
iterators wherever possible, so that the iterators can carry a bit of
debug-info. This commit implements some of that by updating the contents of
llvm/lib/Transforms/Utils to always use iterator-versions of instruction
constructors.
There are two general flavours of update:
* Almost all call-sites just call getIterator on an instruction
* Several make use of an existing iterator (scenarios where the code is
actually significant for debug-info)
The underlying logic is that any call to getFirstInsertionPt or similar
APIs that identify the start of a block need to have that iterator passed
directly to the insertion function, without being converted to a bare
Instruction pointer along the way.
Noteworthy changes:
* FindInsertedValue now takes an optional iterator rather than an
instruction pointer, as we need to always insert with iterators,
* I've added a few iterator-taking versions of some value-tracking and
DomTree methods -- they just unwrap the iterator. These are purely
convenience methods to avoid extra syntax in some passes.
* A few calls to getNextNode become std::next instead (to keep in the
theme of using iterators for positions),
* SeparateConstOffsetFromGEP has it's insertion-position field changed.
Noteworthy because it's not a purely localised spelling change.
All this should be NFC.
|
|
After the removal of the legacyPM version of simple loop unswitch, there
is no longer a need for the callback mechanism to handle PM specific
tasks. This patch removes the callbacks to help simplify the code now
that they're no longer needed.
|
|
This patch adds support for CloneBasicBlock duplicating the DPValues
attached to instructions, and adds facilities to remap them into their new
context. The plumbing to achieve this is fairly straightforwards and
mechanical.
I've also added illustrative uses to LoopUnrollRuntime, SimpleLoopUnswitch
and SimplifyCFG. The former only updates for the epilogue right now so I've
added CHECK lines just for the end of an unrolled loop (further updates
coming later). SimpleLoopUnswitch had no debug-info tests so I've added a
new one. The two modified parts of SimplifyCFG are covered by the two
modified SimplifyCFG tests.
These are scenarios where we have to do extra cloning for copying of
DPValues because they're no longer instructions, and remap them too.
|
|
This pass isn't used anywhere in upstream and thus doesn't have any test
coverage. For these reasons, remove it.
|
|
When unswitching via invariant condition injection, we currently
mark the condition in the old loop, so that it does not get
unswitched again. However, if there are multiple branches for
which conditions can be injected, then we can do that for both
the old and new loop. This means that the number of unswitches
increases exponentially.
Change the handling to be more similar to partial unswitching,
where we instead mark the whole loop, rather than a single
condition. This means that we will only generate a linear number
of loops.
TBH I think even that is still highly undesirable, and we should
probably be unswitching all candidates at the same time, so that
we end up with only two loops. But at least this mitigates the
worst case.
The test case is a reduced variant that generates 1700 lines of IR
without this patch and 290 with it.
Fixes https://github.com/llvm/llvm-project/issues/66868.
|
|
The in-loop successor is only on the left after a potential condition
inversion. As we re-use the old condition as-is, we should also
reuse the old successors as-is.
Fixes https://github.com/llvm/llvm-project/issues/63962.
|
|
As per the stack of patches this is attached to, allow users of
BasicBlock::splitBasicBlock to provide an iterator for a position, instead
of just an instruction pointer. This is to fit with my proposal for how to
get rid of debug intrinsics [0]. There are other call-sites that would need
to change, but this is sufficient for a stage2clang self host and some
other C++ projects to build identical binaries, in the context of the whole
remove-DIs project.
[0] https://discourse.llvm.org/t/rfc-instruction-api-changes-needed-to-eliminate-debug-intrinsics-from-ir/68939
Differential Revision: https://reviews.llvm.org/D152545
|
|
Continuing the patch series to get rid of debug intrinsics [0], instruction
insertion needs to be done with iterators rather than instruction pointers,
so that we can communicate information in the iterator class. This patch
adds an iterator-taking insertBefore method and converts various call sites
to take iterators. These are all sites where such debug-info needs to be
preserved so that a stage2 clang can be built identically; it's likely that
many more will need to be changed in the future.
At this stage, this is just changing the spelling of a few operations,
which will eventually become signifiant once the debug-info bearing
iterator is used.
[0] https://discourse.llvm.org/t/rfc-instruction-api-changes-needed-to-eliminate-debug-intrinsics-from-ir/68939
Differential Revision: https://reviews.llvm.org/D152537
|
|
https://reviews.llvm.org/D152033
|
|
If a select's condition is a AND/OR, we can unswitch invariant operands.
This patch uses existing logic from unswitching AND/OR's for branch
conditions.
This patch fixes the Cost computation for unswitching selects to have
the cost of the entire loop, since unswitching selects do not remove
branches. This is required for this patch because otherwise, there are
cases where unswitching selects of AND/OR is beating out unswitching of
branches.
This patch also prevents unswitching of logical AND/OR selects. This
should instead be done by unswitching of AND/OR branch conditions.
Reviewed By: nikic
Differential Revision: https://reviews.llvm.org/D151677
|
|
A follow-up for 64397d8. Only do verification if VerifyLoopInfo is
set.
|
|
SplitBlockAndInsertIfThen doesn't correctly update LoopInfo when called
with Unreachable=true, which is the case when we turn guards to branches
in SimpleLoopUnswitch.
This adds LoopInfo verification before returning from turnGuardIntoBranch.
|
|
This reverts commit 5362a0d859d8e96b3f7c0437b7866e17a818a4f7.
In preparation for reverting a dependent revision.
|
|
turnGuardIntoBranch() can use splitBlockAndInsertIfThen to update the
DominatorTree rather than implementing it itself.
|
|
Fixes https://github.com/llvm/llvm-project/issues/62715
If a select's condition is a trivial select:
```
%s = select %cond, i1 true, i1 false
```
Unswitch on %cond, rather than %s. This fixes crashes where there is a
disparity in finding candidates and and the transformation logic.
|
|
The old LoopUnswitch pass unswitched selects, but the changes were
never ported to the new SimpleLoopUnswitch.
We unswitch by turning:
``` S = select %cond, %a, %b ```
into:
``` head: br %cond, label %then, label %tail
then: br label %tail
tail: S = phi [ %a, %then ], [ %b, %head ] ```
Unswitch selects are always nontrivial, since the successors do not
exit the loop and the loop body always needs to be cloned.
Unswitch selects always need to freeze the conditional if the
conditional could be poison or undef. Selects don't propagate
poison/undef, and branches on poison/undef causes UB.
Reland 1 - Fix the insertion of freeze instructions. The original
implementation inserts a dead freeze instruction that is not used by the
unswitched branch.
Reland 2 - Include https://reviews.llvm.org/D149560 in the same patch,
which was originally reverted along with this patch. The patch prevents
unswitching of selects with a vector conditional. This could have been
caught in SimpleLoopUnswitch/crash.ll if it included tests for
nontrivial unswitching. This reland also adds a run for the test file
with nontrivial unswitching.
Reviewed By: nikic, kachkov98, vitalybuka
Differential Revision: https://reviews.llvm.org/D138526
|
|
This reverts commit 21f226fc4591db6e98faf380137a42067c909582. Crashes on
this test case:
define void @test2() nounwind {
entry:
br label %bb.nph
bb.nph: ; preds = %entry
%and.i13521 = and <4 x i1> undef, undef
br label %for.body
for.body: ; preds = %for.body, %bb.nph
%or.i = select <4 x i1> %and.i13521, <4 x i32> undef, <4 x i32> undef
br i1 false, label %for.body, label %for.end
for.end: ; preds = %for.body, %entry
ret void
}
|
|
The old LoopUnswitch pass unswitched selects, but the changes were never
ported to the new SimpleLoopUnswitch.
We unswitch by turning:
```
S = select %cond, %a, %b
```
into:
```
head:
br %cond, label %then, label %tail
then:
br label %tail
tail:
S = phi [ %a, %then ], [ %b, %head ]
```
Unswitch selects are always nontrivial, since the successors do not exit
the loop and the loop body always needs to be cloned.
Unswitch selects always need to freeze the conditional if the
conditional could be poison or undef. Selects don't propagate
poison/undef, and branches on poison/undef causes UB.
Reviewed By: nikic, kachkov98, vitalybuka
Differential Revision: https://reviews.llvm.org/D138526
|
|
After D149435, LCSSA formation no longer needs access to
ScalarEvolution, so remove the argument from the utilities.
|
|
Revert "Don't loop unswitch vector selects"
Breaks msan. Details in D138526.
This reverts commit bf089732775520624cb4983bfed6c341e1b4c405.
This reverts commit e479ed90b591c18873fda68c12946b9d08cbe02f.
|
|
Otherwise we could produce `br <2x i1>` which are of course not legal.
```
Branch condition is not 'i1' type!
br <2 x i1> %cond.fr1, label %entry.split.us, label %entry.split
%cond.fr1 = freeze <2 x i1> %cond
LLVM ERROR: Broken module found, compilation aborted!
PLEASE submit a bug report to https://github.com/llvm/llvm-project/issues/ and include the crash backtrace.
Stack dump:
0. Program arguments: /home/vchuravy/builds/llvm/bin/opt -passes=simple-loop-unswitch<nontrivial> -S
```
Fixes change introduced by https://reviews.llvm.org/D138526
Reviewed By: caojoshua
Differential Revision: https://reviews.llvm.org/D149560
|
|
The old LoopUnswitch pass unswitched selects, but the changes were never
ported to the new SimpleLoopUnswitch.
We unswitch by turning:
```
S = select %cond, %a, %b
```
into:
```
head:
br %cond, label %then, label %tail
then:
br label %tail
tail:
S = phi [ %a, %then ], [ %b, %head ]
```
Unswitch selects are always nontrivial, since the successors do not exit
the loop and the loop body always needs to be cloned.
Differential Revision: https://reviews.llvm.org/D138526
Co-authored-by: Sergey Kachkov <sergey.kachkov@syntacore.com>
|
|
|
|
As shown in https://github.com/llvm/llvm-project/issues/62058, canonicalication
may fail with pointer types (and basically this transform is not expected to
work with pointers).
|
|
This patch is making sure that we use getTopMostExitingLoop when
finding out which loops to forget, when dealing with
unswitchNontrivialInvariants and unswitchTrivialSwitch. It seems
to at least be needed for unswitchNontrivialInvariants as detected
by the included test case.
Note that unswitchTrivialBranch already used getTopMostExitingLoop.
This was done in commit 4a9cde5a791cd49b96993e6. The commit
message in that commit says "If the patch makes sense, I will also
update those places to a similar approach ...", referring to these
functions mentioned above. As far as I can tell that never happened,
but this is an attempt to finally fix that.
Fixes https://github.com/llvm/llvm-project/issues/61080
Differential Revision: https://reviews.llvm.org/D147058
|
|
When doing a trivial unswitch of a switch statement the code need
to "invalidate SCEVs for the outermost loop reached by any of the
exits", as indicated by code comments.
Depending on if we find such an outermost loop or not we can limit
the invalidation to some sub-loops or the full loop-nest. As shown
in the added test case there seem to have been some bugs in the code
that was finding the "outermost loop", so we could end up invalidating
too few loops.
Seems like commit 1bf8ae17f5e2714c8c87978 introduced the bug by
moving the code that invalidates the loops above some of the code
that computed 'OuterL'. This patch fixes that by also moving that
computation of 'OuterL' so that we compute 'OuterL' properly before
we use it for the SCEV invalidation.
Differential Revision: https://reviews.llvm.org/D146963
|
|
This fixes a compile time issue due to guarding loop unswitching based
on whether the enclosing function is cold. That approach is very
inefficient in the case of large cold functions that contain numerous
loops, since the loop pass calls isFunctionColdInCallGraph once per
loop, and that function walks all BBs in the function (twice for Sample
PGO) looking for any non-cold blocks.
Originally, this code only checked if the current Loop's header was cold
(D129599). However, that apparently caused a slowdown on a SPEC
benchmark, and the example given was that of a cold inner loop nested in
a non-cold outer loop (see comments in D129599). The fix was to check if
the whole function is cold, done in D133275.
This is overkill, and we can simply check if the header of any loop in
the current loop's loop nest is non-cold (looking at both outer and
inner loops). This patch drops the compile time for a large module by
40% with this approach.
I also updated PGO-nontrivial-unswitch2.ll since it only had one cold
loop in a non-cold function, so that it instead had IR based off the
example given in the comments relating to the SPEC degradation in
D129599. I confirmed that the new version of the test fails with the
original check done in D129599 of only the current loop's header
coldness.
Similarly updated test PGO-nontrivial-unswitch.ll to contain a cold loop
in a cold loop nest, and created PGO-nontrivial-unswitch3.ll to contain
a non-cold loop in a non-cold loop nest.
Differential Revision: https://reviews.llvm.org/D146383
|
|
This patch handles the following case: turn
```
if (x <u Invariant1) {
if (zext(x) <u Invariant2) {
...
}
}
```
into
```
if (x <u Invariant1) {
if (zext(Invariant1) <=u Invariant2) { // Unswitch here
// No check needed
} else {
if (zext(x) <u Invariant2) {
...
}
}
}
```
Differential Revision: https://reviews.llvm.org/D138015
Reviewed By: skatkov
|
|
|