| Age | Commit message (Collapse) | Author | Files | Lines |
|
Reduce the size of a DomTreeNodeBase from 80 to 56 bytes by not storing
the children in a SmallVector. Instead, store children as forward-linked
list. This also avoids extra allocations for nodes with many children.
Additionally, DomTreeNodeBase is now trivially destructible.
A lot of code depends on the order of nodes in the dominator tree, so
make sure that the order is the same when inserting nodes. (Not having
to do this would save 8 bytes per node.)
NewGVN uses the order of nodes in the dominator tree in a way that is
not entirely clear to me (https://reviews.llvm.org/D28129). I kept the
semantics as, but now this is the only external user of
addChild/removeChild, which actually should be private.
https://llvm-compile-time-tracker.com/compare.php?from=263802c56b4db3fc9b6ed9fd313499cb03ca44da&to=43e0c0c5b663b3a4067252fc0addbaccefd0014d&stat=instructions:u
|
|
PHINode::removeIncomingValueIf() to use the swapping strategy. (#174274)
Reland #171963, #172639 and #173444, they are reverted in
86b9f90b9574b3a7d15d28a91f6316459dcfa046 because of introducing
non-determinism in compiles.
The non-determinism has been fixed in
9b8addffa70cee5b2acc5454712d9cf78ce45710.
|
|
This causes non-determinism in compiles.
From nikic: "FYI the non-determinism is also visible on
llvm-opt-benchmark. Maybe repeatedly running test cases from
https://github.com/dtcxzyw/llvm-opt-benchmark/commit/299446d99f04024d5f569ce1f7e9338c9bcf55fe
could reproduce the issue..."
Also revert dependent 796fafeff92fe5d2d20594859e92607116e30a16 and
e135447bda617125688b71d33480d131d1076a72.
|
|
backwards (#173444)
See discussion in
https://github.com/llvm/llvm-project/pull/172639#issuecomment-3686893143
If someone did `PN->removeIncomingValueIf([](unsigned Idx) { return Idx
== 0; })` to remove the first incoming value, current implementation
will remove all incoming values.
There are purely index based predicate use cases in:
-
https://github.com/llvm/llvm-project/blob/8c5a0f74a12d69d2fb7d6ed59f91759f18273bcd/llvm/lib/Transforms/Utils/LoopSimplify.cpp#L429
-
https://github.com/llvm/llvm-project/blob/8c5a0f74a12d69d2fb7d6ed59f91759f18273bcd/llvm/lib/Transforms/Utils/LoopUtils.cpp#L562
This patch makes `PHINode::removeIncomingValueIf()` to loop incoming
values backwards, to ensure `PHINode::removeIncomingValueIf()` working
as expected.
|
|
`PHINode::removeIncomingValue()` (#172639)
As suggested in https://github.com/llvm/llvm-project/pull/171963, update
`PHINode::removeIncomingValueIf()` to use the swap strategy too.
|
|
value with the last incoming value. (#171963)
Current implementation uses `std::copy` to shift all incoming values
after the removed index. This patch optimizes
`PHINode::removeIncomingValue()` by replacing the linear shift of
incoming values with a swap-with-last strategy.
After this change, the relative order of incoming values after removal
is not preserved.
This improves compile-time for PHI nodes with many predecessors.
Depends:
https://github.com/llvm/llvm-project/pull/171955
https://github.com/llvm/llvm-project/pull/171956
https://github.com/llvm/llvm-project/pull/171960
https://github.com/llvm/llvm-project/pull/171962
|
|
contain convergence loop/entry intrinsics. (#170247)
Fixes: https://github.com/llvm/llvm-project/issues/165642. After this
fix, optimization passes for the example in the bug.
[LLVM
Spec](https://llvm.org/docs/ConvergentOperations.html#llvm-experimental-convergence-loop)
states that only a single loop / entry convergence token can be included
in a basic block.
This PR fixes the issue in `JumpThreading` pass so that when a basic
block and its predecessor both contain such convergence intrinsics, it
skips merging the two blocks.
|
|
Post cleanup for #164534.
|
|
Tests exercizing TBAA metadata (both purposefully and not), and
previously generated via UTC, have been regenerated and updated
to version 6.
|
|
In simplifyPartiallyRedundantLoad we may replace a load with a PHI of
available values in predecessor blocks. As part of this process, we may
need to cast those values, which we do by inserting a new cast at the
end of the predecessor. These cast instructions should take their debug
location from the load instruction, just as the PHI does; we make an
exception if the predecessor does not unconditionally branch to the
load's block, as in that case we are not guaranteed to reach the load
and must therefore drop its debug location.
Found using https://github.com/llvm/llvm-project/pull/107279.
|
|
proof: https://alive2.llvm.org/ce/z/8emkHY
|
|
Note: This relands #140615 adding a ".count" suffix to the non-".all"
variants.
Our current intrinsic support for barrier intrinsics is confusing and
incomplete, with multiple intrinsics mapping to the same instruction and
intrinsic names not clearly conveying intrinsic semantics. Further, we
lack support for some variants. This change unifies the IR
representation to a single consistently named set of intrinsics.
- llvm.nvvm.barrier.cta.sync.aligned.all(i32)
- llvm.nvvm.barrier.cta.sync.aligned.count(i32, i32)
- llvm.nvvm.barrier.cta.arrive.aligned.count(i32, i32)
- llvm.nvvm.barrier.cta.sync.all(i32)
- llvm.nvvm.barrier.cta.sync.count(i32, i32)
- llvm.nvvm.barrier.cta.arrive.count(i32, i32)
The following Auto-Upgrade rules are used to maintain compatibility with
IR using the legacy intrinsics:
* llvm.nvvm.barrier0 --> llvm.nvvm.barrier.cta.sync.aligned.all(0)
* llvm.nvvm.barrier.n --> llvm.nvvm.barrier.cta.sync.aligned.all(x)
* llvm.nvvm.bar.sync --> llvm.nvvm.barrier.cta.sync.aligned.all(x)
* llvm.nvvm.barrier --> llvm.nvvm.barrier.cta.sync.aligned.count(x, y)
* llvm.nvvm.barrier.sync --> llvm.nvvm.barrier.cta.sync.all(x)
* llvm.nvvm.barrier.sync.cnt --> llvm.nvvm.barrier.cta.sync.count(x, y)
|
|
This reverts commit 735209c0688b10a66c24750422b35d8c2ad01bb5.
|
|
|
|
Our current intrinsic support for barrier intrinsics is confusing and
incomplete, with multiple intrinsics mapping to the same instruction and
intrinsic names not clearly conveying intrinsic semantics. Further, we
lack support for some variants. This change unifies the IR
representation to a single consistently named set of intrinsics.
- llvm.nvvm.barrier.cta.sync.aligned.all(i32)
- llvm.nvvm.barrier.cta.sync.aligned(i32, i32)
- llvm.nvvm.barrier.cta.arrive.aligned(i32, i32)
- llvm.nvvm.barrier.cta.sync.all(i32)
- llvm.nvvm.barrier.cta.sync(i32, i32)
- llvm.nvvm.barrier.cta.arrive(i32, i32)
The following Auto-Upgrade rules are used to maintain compatibility with
IR using the legacy intrinsics:
* llvm.nvvm.barrier0 --> llvm.nvvm.barrier.cta.sync.aligned.all(0)
* llvm.nvvm.barrier.n --> llvm.nvvm.barrier.cta.sync.aligned.all(x)
* llvm.nvvm.bar.sync --> llvm.nvvm.barrier.cta.sync.aligned.all(x)
* llvm.nvvm.barrier --> llvm.nvvm.barrier.cta.sync.aligned(x, y)
* llvm.nvvm.barrier.sync --> llvm.nvvm.barrier.cta.sync.all(x)
* llvm.nvvm.barrier.sync.cnt --> llvm.nvvm.barrier.cta.sync(x, y)
|
|
(#134585)
In case the same src BB targets to the same dest BB in different
conditions/edges, such as switch-cases, we should use
prob[SrcBB->SuccIndx] instead of prob[SrcBB->DstBB] to get probability.
|
|
In unreachable code, constant PHI nodes may appear and be replaced by their
single value. As a result, instructions may become self-referencing. This
commit adds checks to avoid going into infinite recursion when handling
self-referencing compare instructions in `evaluateOnPredecessorEdge()`.
This LLVM defect was identified via the AMD Fuzzing project.
|
|
These date back to when the non-intrinsic format of variable locations
was still being tested and was behind a compile-time flag, so not all
builds / bots would correctly run them. The solution at the time, to get
at least some test coverage, was to have tests opt-in to non-intrinsic
debug-info if it was built into LLVM.
Nowadays, non-intrinsic format is the default and has been on for more
than a year, there's no need for this flag to exist.
(I've downgraded the flag from "try" to explicitly requesting
non-intrinsic format in some places, so that we can deal with tests that
are explicitly about non-intrinsic format in their own commit).
|
|
Although an unreachable BB is skipped by processBlock, its successor can
still be handled by processBlock, and maybeMergeBasicBlockIntoOnlyPred
may merge the two BBs and delete the unreachable BB. Then the garbage
pointer is left in Unreachable set. This patch avoids merging a BB into
unreachable predecessor.
|
|
We cannot infer more information from backedges in
`solveBlockValueNonLocal`. However, since DT is unavailable in LVI,
there is not a precise way to check whether a BB edge is a backedge.
This patch only skips self loops to unblock the range analysis.
The motivating case is extracted from
https://github.com/llvm/llvm-project/pull/127663.
Compile-time impact is high:
https://llvm-compile-time-tracker.com/compare.php?from=84ddda58c870681dd12ed765e9d59d5e00567f94&to=af032f1351358f2f5b5d9f4e87c5601c23b9bd37&stat=instructions:u
|
|
This cleanup action only needs to be performed once when the entire
optimization is converged. Doing it in every iteration has a very high
time-complexity, as it queries every dbg value in a dense map
Compare before and after for one internal source file with many basic
blocks


>90% reduction in this extreme case.
|
|
(#117716)
Preserve !alias.scope, !noalias and !mem.parallel_loop_access metadata
on the replacement instruction, if it does not move. In that case, the
program would be UB, if the aliasing property encoded in the metadata
does not hold. This makes use of the clarification re aliasing metadata
implying UB if the property does not hold: #116220
Same as #115868, but for !alias.scope, !noalias and
!mem.parallel_loop_access.
PR: https://github.com/llvm/llvm-project/pull/117716
|
|
Preserve tbaa metadata on the replacement instruction, if it does not
move. In that case, the program would be UB, if the aliasing property
encoded in the metadata does not hold.
This makes use of the clarification re tbaa metadata implying UB if the
property does not hold: https://github.com/llvm/llvm-project/pull/116220
Same as https://github.com/llvm/llvm-project/pull/115868, but for !tbaa
PR: https://github.com/llvm/llvm-project/pull/116682
|
|
This PR removes tests with br i1 undef under
`llvm/tests/Transforms/JumpThreading, LCSSA, LICM, LoopDeletion,
LoopIdiom`.
|
|
|
|
|
|
instructions (#96889)
Fix #96885 .
|
|
This patch makes the final major change of the RemoveDIs project, changing the
default IR output from debug intrinsics to debug records. This is expected to
break a large number of tests: every single one that tests for uses or
declarations of debug intrinsics and does not explicitly disable writing
records.
If this patch has broken your downstream tests (or upstream tests on a
configuration I wasn't able to run):
1. If you need to immediately unblock a build, pass
`--write-experimental-debuginfo=false` to LLVM's option processing for all
failing tests (remember to use `-mllvm` for clang/flang to forward arguments to
LLVM).
2. For most test failures, the changes are trivial and mechanical, enough that
they can be done by script; see the migration guide for a guide on how to do
this: https://llvm.org/docs/RemoveDIsDebugInfo.html#test-updates
3. If any tests fail for reasons other than FileCheck check lines that need
updating, such as assertion failures, that is most likely a real bug with this
patch and should be reported as such.
For more information, see the recent PSA:
https://discourse.llvm.org/t/psa-ir-output-changing-from-debug-intrinsics-to-debug-records/79578
|
|
Reapply after https://github.com/llvm/llvm-project/pull/93548,
which should address the lldb failure on macos.
-----
Do not create icmp/fcmp constant expressions in IRBuilder etc anymore,
i.e. treat them as "undesirable". This is in preparation for removing
them entirely.
Part of:
https://discourse.llvm.org/t/rfc-remove-most-constant-expressions/63179
|
|
Reverts llvm/llvm-project#92885 due to LLDB CI breakages.
|
|
Do not create icmp/fcmp constant expressions in IRBuilder etc anymore,
i.e. treat them as "undesirable". This is in preparation for removing
them entirely.
Part of:
https://discourse.llvm.org/t/rfc-remove-most-constant-expressions/63179
|
|
|
|
Use ICmpInst::compare() where possible, ConstantFoldCompareInstOperands
in other places. This only changes places where the either the fold is
guaranteed to succeed, or the code doesn't use the resulting compare if
we fail to fold.
|
|
(#88907)
Reverts llvm/llvm-project#86312
|
|
Fixes #76609
This patch does:
- relax the phis constraint in `CanRedirectPredsOfEmptyBBToSucc`
- guarantee the BB has multiple different predecessors to redirect, so
that we can handle the case without phis in BB. Without this change and
phi constraint, we may redirect the CommonPred.
The motivation is consistent with JumpThreading. We always want the
branch to jump more direct to the destination, without passing the
middle block. In this way, we can expose more other optimization
opportunities.
An obivous example proposed by @dtcxzyw is like:
```llvm
define i32 @test(...) {
entry:
br i1 %c, label %do.end, label %if.then
if.then: ; preds = %entry
%call2 = call i32 @dummy()
%tobool3.not = icmp eq i32 %call2, 0
br i1 %tobool3.not, label %do.end, label %return
do.end: ; preds = %entry, %if.then
br label %return
return: ; preds = %if.then, %do.end
%retval.0 = phi i32 [ 0, %do.end ], [ %call2, %if.then ]
ret i32 %retval.0
}
```
`entry` can directly jump to return, without passing `do.end`, and then
the if-else pattern can be simplified further:
```llvm
define i32 @test(...) {
entry:
br i1 %c, label %return, label %if.then
if.then: ; preds = %entry
%call2 = call i32 @dummy()
br label %return
return: ; preds = %if.then
%retval.0 = phi i32 [ 0, %entry ], [ %call2, %if.then ]
ret i32 %retval.0
}
```
|
|
|
|
JumpThreading may perform AA queries while the dominator tree is not up
to date, which may result in miscompilations.
Fix this by adding a new AAQI option to disable the use of the dominator
tree in BasicAA.
Fixes https://github.com/llvm/llvm-project/issues/79175.
|
|
|
|
|
|
(#73251)
Debugify is extremely useful as a testing and debugging tool, and a good
number of LLVM-IR transform tests use it. We need it to support "new"
non-instruction debug-info to get test coverage, but it's not important
enough to completely convert right now (and it'd be a large
undertaking). Thus: convert to/from dbg.value/DPValue mode on entry and
exit of the pass, which gives us the functionality without any further
work. The cost is compile-time, but again this is only happening during
tests.
Tested by: the large set of debugify tests enabled here. Note the
InstCombine test (cast-mul-select.ll) that hasn't been fully enabled:
this is because there's a debug-info sinking piece of code there that
hasn't been instrumented.
|
|
With intrinsics representing debug-info, we just clone all the
intrinsics when inlining a function and don't think about it any
further. With non-instruction debug-info however we need to be a bit
more careful and manually move the debug-info from one place to another.
For the most part, this means keeping a "cursor" during block cloning of
where we last copied debug-info from, and performing debug-info copying
whenever we successfully clone another instruction.
There are several utilities in LLVM for doing this, all of which now
need to manually call cloneDebugInfo. The testing story for this is not
well covered as we could rely on normal instruction-cloning mechanisms
to do all the hard stuff. Thus, I've added a few tests to explicitly
test dbg.value behaviours, ahead of them becoming not-instructions.
|
|
This patch makes jump-threading handle non-instruction debug-info stored
in DPValues in the same way that it updates dbg.values nowadays. This
involves re-targetting their operands as with dbg.values getting moved
from one block to another, and manually cloning them when duplicating
blocks. The SSAUpdater class also grows some functions for SSA-updating
DPValues in the same way as dbg.values.
All of this is largely covered by existing debug-info tests, except for
the cloning of DPValues attached to elidable instructions and branches,
where I've added a test to thread-debug-info.ll. Where previously we
could rely on dbg.values being copied and cloned as normal instructions
are, as we need to explicitly perform that operation now I've added some
explicit testing for it.
|
|
Remove support for zext and sext constant expressions. All places
creating them have been removed beforehand, so this just removes the
APIs and uses of these constant expressions in tests.
There is some additional cleanup that can be done on top of this, e.g.
we can remove the ZExtInst vs ZExtOperator footgun.
This is part of
https://discourse.llvm.org/t/rfc-remove-most-constant-expressions/63179.
|
|
When evaluating comparisons in predecessors, phi operands are translated
into the predecessor. If the translation is across a backedge, this
means that the two operands of the icmp will be from two different loop
iterations, resulting in incorrect simplification.
Fix this by not performing the phi translation for phis in loop headers.
Note: This is not a complete fix. If the
jump-threading-across-loop-headers option is enabled, the LoopHeaders
variable does not get populated. Additional changes will be needed to
fix that case.
Related to https://github.com/llvm/llvm-project/issues/70651.
|
|
|
|
and SuccBB (#68473)
Reland #67275 with #68953 resolved.
|
|
BlockFrequencyInfo calculates block frequencies as Scaled64 numbers but as a last step converts them to unsigned 64bit integers (`BlockFrequency`). This improves the factors picked for this conversion so that:
* Avoid big numbers close to UINT64_MAX to avoid users overflowing/saturating when adding multiply frequencies together or when multiplying with integers. This leaves the topmost 10 bits unused to allow for some room.
* Spread the difference between hottest/coldest block as much as possible to increase precision.
* If the hot/cold spread cannot be represented loose precision at the lower end, but keep the frequencies at the upper end for hot blocks differentiable.
|
|
and SuccBB (#67275)"
This reverts commit fc86d031fec5e47c6811efd3a871742ad244afdd.
This change breaks LLVM buildbot clang-aarch64-sve-vls-2stage
https://lab.llvm.org/buildbot/#/builders/176/builds/5474
I am going to revert this patch as the bot has been failing for more than a day without a fix.
|
|
SuccBB (#67275)
This patch extends function TryToSimplifyUncondBranchFromEmptyBlock to
handle the similar cases below.
```llvm
define i8 @src(i8 noundef %arg) {
start:
switch i8 %arg, label %unreachable [
i8 0, label %case012
i8 1, label %case1
i8 2, label %case2
i8 3, label %end
]
unreachable:
unreachable
case1:
br label %case012
case2:
br label %case012
case012:
%phi1 = phi i8 [ 3, %case2 ], [ 2, %case1 ], [ 1, %start ]
br label %end
end:
%phi2 = phi i8 [ %phi1, %case012 ], [ 4, %start ]
ret i8 %phi2
}
```
The phis here should be merged into one phi, so that we can better
optimize it:
```llvm
define i8 @tgt(i8 noundef %arg) {
start:
switch i8 %arg, label %unreachable [
i8 0, label %end
i8 1, label %case1
i8 2, label %case2
i8 3, label %case3
]
unreachable:
unreachable
case1:
br label %end
case2:
br label %end
case3:
br label %end
end:
%phi = phi i8 [ 4, %case3 ], [ 3, %case2 ], [ 2, %case1 ], [ 1, %start ]
ret i8 %phi
}
```
Proof:
[normal](https://alive2.llvm.org/ce/z/vAWi88)
[multiple stages](https://alive2.llvm.org/ce/z/DDBQqp)
[multiple stages 2](https://alive2.llvm.org/ce/z/nGkeqN)
[multiple phi combinations](https://alive2.llvm.org/ce/z/VQeEdp)
And lookup table optimization should convert it into add %arg 1.
This patch just match similar CFG structure and merge the phis in
different cases.
Maybe such transform can be applied to other situations besides switch,
but I'm not sure whether it's better than not merging. Therefore, I only
try it in switch,
Related issue:
#63876
[Migrated](https://reviews.llvm.org/D155940)
|
|
Propagate "branch_weights" metadata whe turning a select into a
conditional branch in tryToUnfoldSelectInCurrBB
|