| Age | Commit message (Collapse) | Author | Files | Lines |
|
PHINode::removeIncomingValueIf() to use the swapping strategy. (#174274)
Reland #171963, #172639 and #173444, they are reverted in
86b9f90b9574b3a7d15d28a91f6316459dcfa046 because of introducing
non-determinism in compiles.
The non-determinism has been fixed in
9b8addffa70cee5b2acc5454712d9cf78ce45710.
|
|
This causes non-determinism in compiles.
From nikic: "FYI the non-determinism is also visible on
llvm-opt-benchmark. Maybe repeatedly running test cases from
https://github.com/dtcxzyw/llvm-opt-benchmark/commit/299446d99f04024d5f569ce1f7e9338c9bcf55fe
could reproduce the issue..."
Also revert dependent 796fafeff92fe5d2d20594859e92607116e30a16 and
e135447bda617125688b71d33480d131d1076a72.
|
|
value with the last incoming value. (#171963)
Current implementation uses `std::copy` to shift all incoming values
after the removed index. This patch optimizes
`PHINode::removeIncomingValue()` by replacing the linear shift of
incoming values with a swap-with-last strategy.
After this change, the relative order of incoming values after removal
is not preserved.
This improves compile-time for PHI nodes with many predecessors.
Depends:
https://github.com/llvm/llvm-project/pull/171955
https://github.com/llvm/llvm-project/pull/171956
https://github.com/llvm/llvm-project/pull/171960
https://github.com/llvm/llvm-project/pull/171962
|
|
(#164476)
This patch addresses the profile of 2 branches:
- one that compares the 2 limits, for which we have no information (the C1, C2, see https://reviews.llvm.org/D136233)
- one that is conditioned on a condition for which we have a profile, so we reuse it
Issue #147390
|
|
The `llvm.experimental.guard` intrinsic is a `call`, so its metadata - if present - would be one value (as per `Verifier::visitProfMetadata`). That wouldn't be a correct `branch_weights` metadata. Likely, `GI->getMetadata(LLVMContext::MD_prof)` was always `nullptr`.
We can bias away from deopt instead.
Issue #147390
|
|
When running UTC on SLU/guards.ll (without LLVM changes), there are a number of changes in the UTC-generated checks. Submitting those first to simplify the diff of PR #164271, as most of the changes in the latter were actually these.
|
|
`buildPartialInvariantUnswitchConditionalBranch` (#164270)
A new branch is created on the same condition as a branch for which we have a profile. We can reuse that profile in this case.
Issue #147390
|
|
In the case of a partial unswitch, we take the invariant part of an expression consisting of either conjunctions or disjunctions, and hoist it out of the loop, conditioning a branch on it (==the invariant part). We can't correctly calculate the branch probability of this new branch, but can use the probability of the existing branch as a bound. That would preserve block frequencies better than allowing for the default, static (50-50) probability for that branch.
Issue #147390
|
|
In https://reviews.llvm.org/D129599, non-trivial switching was disabled
for cold loops in the interest of code size. This added a dependency on
BlockFrequencyInfo with PGO, but in loop passes this is only available
on a lossy basis: see https://reviews.llvm.org/D86156
LICM moved away from BFI so as of today SimpleLoopUnswitch is the only
remaining loop pass that uses BFI, for the sole reason to prevent code
size increases in PGO builds. It doesn't use BFI if there's no profile
summary available.
After some investigation on llvm-test-suite it turns out that the lossy
BFI causes very significant deviations in block frequency, since when
new loops are deleted/created during the loop pass manager it can return
frequencies for different loops altogether.
This results in unswitchable loops being mistakenly skipped because they
are thought to be cold.
This patch removes the use of BFI from SimpleLoopUnswitch and thus the
last remaining use of BFI in a loop pass.
To recover the original intent of not unswitching cold code,
PGOForceFunctionAttrs can be used to annotate functions which can be
optimized for code size, since SimpleLoopUnswitch will respect OptSize:
https://reviews.llvm.org/D94559
This isn't 100% the same behaviour since the previous behaviour checked
for coldness at the loop level and this is now at the function level. We
could expand PGOForceFunctionAttrs to be more granular at the loop
level, https://github.com/llvm/llvm-project/issues/159595 tracks this
idea.
|
|
To remove a confusing diff in https://github.com/llvm/llvm-project/pull/159522
|
|
When estimating the cost to avoid exponential unswitches of non-trivial
invariant conditions, also consider the parent loop basic blocks size,
ensuring this does not grow unexpectedly.
Fixes: https://github.com/llvm/llvm-project/issues/138509.
|
|
conditions"
This reverts commit e9de32fd159d30cfd6fcc861b57b7e99ec2742ab due to
multiple performance regressions observed across downstream Numba
benchmarks (https://github.com/llvm/llvm-project/issues/138509#issuecomment-3193855772).
While avoiding non-trivial unswitches on newly-cloned loops helps
mitigate the pathological case reported in https://github.com/llvm/llvm-project/issues/138509,
it may as well make the IR less friendly to vectorization / loop-
canonicalization (in the test reported, previously no select with
loop-carried dependence existed in the new specialized loops),
leading the abovementioned approach to be reconsidered.
|
|
Track newly-cloned loops coming from unswitching non-trivial invariant
conditions, so as to prevent conditions in such cloned blocks from
being unswitched again.
Fixes: https://github.com/llvm/llvm-project/issues/138509.
|
|
These date back to when the non-intrinsic format of variable locations
was still being tested and was behind a compile-time flag, so not all
builds / bots would correctly run them. The solution at the time, to get
at least some test coverage, was to have tests opt-in to non-intrinsic
debug-info if it was built into LLVM.
Nowadays, non-intrinsic format is the default and has been on for more
than a year, there's no need for this flag to exist.
(I've downgraded the flag from "try" to explicitly requesting
non-intrinsic format in some places, so that we can deal with tests that
are explicitly about non-intrinsic format in their own commit).
|
|
This PR removes the old `nocapture` attribute, replacing it with the new
`captures` attribute introduced in #116990. This change is
intended to be essentially NFC, replacing existing uses of `nocapture`
with `captures(none)` without adding any new analysis capabilities.
Making use of non-`none` values is left for a followup.
Some notes:
* `nocapture` will be upgraded to `captures(none)` by the bitcode
reader.
* `nocapture` will also be upgraded by the textual IR reader. This is to
make it easier to use old IR files and somewhat reduce the test churn in
this PR.
* Helper APIs like `doesNotCapture()` will check for `captures(none)`.
* MLIR import will convert `captures(none)` into an `llvm.nocapture`
attribute. The representation in the LLVM IR dialect should be updated
separately.
|
|
If the gep is nusw (usually via inbounds) and the offset is
non-negative, we can infer nuw.
Proof: https://alive2.llvm.org/ce/z/ihztLy
|
|
This PR removes tests with `br i1 undef` under
`llvm/tests/Transforms/ObjCARC, Reassociate, SCCP, SLPVectorizer...`.
After this PR, I'll continue to fix tests under `llvm/tests/CodeGen`,
which has more UB tests than `llvm/tests/Transforms`.
|
|
We had a lot of -verify-memoryssa tests that did not actually use
MemorySSA, because they were not using the loop-mssa adaptor.
|
|
Fixes https://github.com/llvm/llvm-project/issues/117537.
|
|
(#113933)
This patch re-queue users of phi when one of its incoming add
instructions is updated. If an add instruction is updated, the analysis
results of phis may be improved. Thus we may further fold some users of
this phi node.
See the following case:
```
define i8 @trunc_in_loop_exit_block() {
; CHECK-LABEL: @trunc_in_loop_exit_block(
; CHECK-NEXT: entry:
; CHECK-NEXT: br label [[LOOP:%.*]]
; CHECK: loop:
; CHECK-NEXT: [[IV:%.*]] = phi i32 [ 0, [[ENTRY:%.*]] ], [ [[IV_NEXT:%.*]], [[LOOP_LATCH:%.*]] ]
; CHECK-NEXT: [[PHI:%.*]] = phi i32 [ 1, [[ENTRY]] ], [ [[IV_NEXT]], [[LOOP_LATCH]] ]
; CHECK-NEXT: [[CMP:%.*]] = icmp samesign ult i32 [[IV]], 100
; CHECK-NEXT: br i1 [[CMP]], label [[LOOP_LATCH]], label [[EXIT:%.*]]
; CHECK: loop.latch:
; CHECK-NEXT: [[IV_NEXT]] = add nuw nsw i32 [[IV]], 1
; CHECK-NEXT: br label [[LOOP]]
; CHECK: exit:
; CHECK-NEXT: [[TRUNC:%.*]] = trunc i32 [[PHI]] to i8
; CHECK-NEXT: ret i8 [[TRUNC]]
;
entry:
br label %loop
loop:
%iv = phi i32 [ 0, %entry ], [ %iv.next, %loop.latch ]
%phi = phi i32 [ 1, %entry ], [ %iv.next, %loop.latch ]
%cmp = icmp ult i32 %iv, 100
br i1 %cmp, label %loop.latch, label %exit
loop.latch:
%iv.next = add i32 %iv, 1
br label %loop
exit:
%trunc = trunc i32 %phi to i8
ret i8 %trunc
}
```
`%iv u< 100` -> infer `nsw/nuw` for `%iv.next = add i32 %iv, 1`
-> `%iv` is non-negative -> infer `samesign` for `%cmp = icmp ult i32
%iv, 100`.
Without re-queuing users of phi nodes, we cannot improve `%cmp` in one
iteration.
Address review comment
https://github.com/llvm/llvm-project/pull/112642#discussion_r1804712271.
This patch also fixes some non-fixpoint issues in tests.
|
|
|
|
This patch introduces a function attribute
`instcombine-no-verify-fixpoint` to avoids disabling fix-point
verification for unrelated tests in the same file.
Address comment
https://github.com/llvm/llvm-project/pull/112642#discussion_r1804714387.
|
|
terminators (#98789)
Fix #98787 .
|
|
Fix #97559 .
For the change at line 1253, I propagate the debug location of the
terminator (i.e., the insertion point) to the new phi. because `MergeBB`
is generated by splitting `ExitBB` several lines above, it only has the
terminator, which could provide a reasonable debug location.
For the change at line 2348, I switch the order of moving and cloning
`TI`. Because `NewTI` cloned from `TI` is inserted into the original
place where `TI` is, `NewTI` should preserve the origianl debug
location. At the same time, doing this allows us to propagate the debug
location to the new branch instruction replacing `NewTI` (the change at
line 2446).
|
|
This patch makes the final major change of the RemoveDIs project, changing the
default IR output from debug intrinsics to debug records. This is expected to
break a large number of tests: every single one that tests for uses or
declarations of debug intrinsics and does not explicitly disable writing
records.
If this patch has broken your downstream tests (or upstream tests on a
configuration I wasn't able to run):
1. If you need to immediately unblock a build, pass
`--write-experimental-debuginfo=false` to LLVM's option processing for all
failing tests (remember to use `-mllvm` for clang/flang to forward arguments to
LLVM).
2. For most test failures, the changes are trivial and mechanical, enough that
they can be done by script; see the migration guide for a guide on how to do
this: https://llvm.org/docs/RemoveDIsDebugInfo.html#test-updates
3. If any tests fail for reasons other than FileCheck check lines that need
updating, such as assertion failures, that is most likely a real bug with this
patch and should be reported as such.
For more information, see the recent PSA:
https://discourse.llvm.org/t/psa-ir-output-changing-from-debug-intrinsics-to-debug-records/79578
|
|
When analyzing pass debug output it is helpful to have the function name
along with the loop name.
|
|
With the addition of #84628, truncs to i1 are being emitted as
conditions to branch instructions. This caused significant regressions
in cases which were previously improved by loop unswitch. Adding truncs
to i1 restore the previous performance seen.
|
|
This extends the work from 7755c26 to all of the different backend
taken count kinds that we print for the scev analysis printer. As
before, the goal is to cut down on confusion as i4 -1 is a very
different (unsigned) value from i32 -1.
|
|
This is a followup to #76819. After those changes, we can still run into
an assertion failure for a slight variation of the test case: When
fixing up MemoryPhis, we map the incoming access to the access of the
cloned instruction -- which may now no longer exist.
Fix this by reusing the getNewDefiningAccessForClone() helper, which
will look upwards for a new defining access in that case.
|
|
Sometimes, we create a MemoryAccess for an instruction, which is later
simplified (e.g. via devirtualization) such that the new instruction has
no memory effects anymore.
If we later clone the instruction (e.g. during unswitching), then MSSA
will not create a MemoryAccess for the new instruction, triggering an
assert.
Disable the assertion (by passing CreationMustSucceed=false) and adjust
getDefiningAccessForClone() to work correctly in that case.
This PR implements the alternative suggestion by alinas from
https://github.com/llvm/llvm-project/pull/76142.
|
|
This adds support for using dominating conditions in computeKnownBits()
when called from InstCombine. The implementation uses a
DomConditionCache, which stores which branches may provide information
that is relevant for a given value.
DomConditionCache is similar to AssumptionCache, but does not try to do
any kind of automatic tracking. Relevant branches have to be explicitly
registered and invalidated values explicitly removed. The necessary
tracking is done inside InstCombine.
The reason why this doesn't just do exactly the same thing as
AssumptionCache is that a lot more transforms touch branches and branch
conditions than assumptions. AssumptionCache is an immutable analysis
and mostly gets away with this because only a handful of places have to
register additional assumptions (mostly as a result of cloning). This is
very much not the case for branches.
This change regresses compile-time by about ~0.2%. It also improves
stage2-O0-g builds by about ~0.2%, which indicates that this change results
in additional optimizations inside clang itself.
Fixes https://github.com/llvm/llvm-project/issues/74242.
|
|
|
|
This patch adds support for CloneBasicBlock duplicating the DPValues
attached to instructions, and adds facilities to remap them into their new
context. The plumbing to achieve this is fairly straightforwards and
mechanical.
I've also added illustrative uses to LoopUnrollRuntime, SimpleLoopUnswitch
and SimplifyCFG. The former only updates for the epilogue right now so I've
added CHECK lines just for the end of an unrolled loop (further updates
coming later). SimpleLoopUnswitch had no debug-info tests so I've added a
new one. The two modified parts of SimplifyCFG are covered by the two
modified SimplifyCFG tests.
These are scenarios where we have to do extra cloning for copying of
DPValues because they're no longer instructions, and remap them too.
|
|
Builds on #67982 which recently introduced the nneg flag on a zext
instruction. InstCombine is one of our largest canonicalizers of zext
from non-negative sext instructions, so set the flag there.
|
|
To reduce the spurious test delta in an upcoming change.
|
|
When unswitching via invariant condition injection, we currently
mark the condition in the old loop, so that it does not get
unswitched again. However, if there are multiple branches for
which conditions can be injected, then we can do that for both
the old and new loop. This means that the number of unswitches
increases exponentially.
Change the handling to be more similar to partial unswitching,
where we instead mark the whole loop, rather than a single
condition. This means that we will only generate a linear number
of loops.
TBH I think even that is still highly undesirable, and we should
probably be unswitching all candidates at the same time, so that
we end up with only two loops. But at least this mitigates the
worst case.
The test case is a reduced variant that generates 1700 lines of IR
without this patch and 290 with it.
Fixes https://github.com/llvm/llvm-project/issues/66868.
|
|
The in-loop successor is only on the left after a potential condition
inversion. As we re-use the old condition as-is, we should also
reuse the old successors as-is.
Fixes https://github.com/llvm/llvm-project/issues/63962.
|
|
This was supposed to document the new PM limitation but
was deleted in fb4113ef0c8b2c5e5e2817e9ca14fb57a6d252be
Switch to generated checks since that's more reliable than XFAIL, and
just preserve the preferred results as comments.
|
|
https://reviews.llvm.org/D152033
|
|
SplitBlockAndInsertIfThen utility creates two new blocks,
they're called ThenBlock and Tail (true and false destinations of a conditional
branch correspondingly). The function has a bool parameter Unreachable,
and if it's set, then ThenBlock is terminated with an unreachable.
At the end of the function the new blocks are added to the loop of the split
block. However, in case ThenBlock is terminated with an unreachable,
it cannot belong to any loop.
Differential Revision: https://reviews.llvm.org/D152434
|
|
If a select's condition is a AND/OR, we can unswitch invariant operands.
This patch uses existing logic from unswitching AND/OR's for branch
conditions.
This patch fixes the Cost computation for unswitching selects to have
the cost of the entire loop, since unswitching selects do not remove
branches. This is required for this patch because otherwise, there are
cases where unswitching selects of AND/OR is beating out unswitching of
branches.
This patch also prevents unswitching of logical AND/OR selects. This
should instead be done by unswitching of AND/OR branch conditions.
Reviewed By: nikic
Differential Revision: https://reviews.llvm.org/D151677
|
|
|
|
A follow-up for 64397d8. Only do verification if VerifyLoopInfo is
set.
|
|
SplitBlockAndInsertIfThen doesn't correctly update LoopInfo when called
with Unreachable=true, which is the case when we turn guards to branches
in SimpleLoopUnswitch.
This adds LoopInfo verification before returning from turnGuardIntoBranch.
|
|
No tests failed when I removed the hasBranchDivergence check, so
add one.
|
|
This is a follow-up to b71edfaa4ec3c998aadb35255ce2f60bba2940b0
since I forgot the lit.local.cfg files in that one.
Reformatting is done with `black`.
If you end up having problems merging this commit because you
have made changes to a python file, the best way to handle that
is to run git checkout --ours <yourfile> and then reformat it
with black.
If you run into any problems, post to discourse about it and
we will try to help.
RFC Thread below:
https://discourse.llvm.org/t/rfc-document-and-standardize-python-code-style
Reviewed By: barannikov88, kwk
Differential Revision: https://reviews.llvm.org/D150762
|
|
Fixes https://github.com/llvm/llvm-project/issues/62715
If a select's condition is a trivial select:
```
%s = select %cond, i1 true, i1 false
```
Unswitch on %cond, rather than %s. This fixes crashes where there is a
disparity in finding candidates and and the transformation logic.
|
|
The old LoopUnswitch pass unswitched selects, but the changes were
never ported to the new SimpleLoopUnswitch.
We unswitch by turning:
``` S = select %cond, %a, %b ```
into:
``` head: br %cond, label %then, label %tail
then: br label %tail
tail: S = phi [ %a, %then ], [ %b, %head ] ```
Unswitch selects are always nontrivial, since the successors do not
exit the loop and the loop body always needs to be cloned.
Unswitch selects always need to freeze the conditional if the
conditional could be poison or undef. Selects don't propagate
poison/undef, and branches on poison/undef causes UB.
Reland 1 - Fix the insertion of freeze instructions. The original
implementation inserts a dead freeze instruction that is not used by the
unswitched branch.
Reland 2 - Include https://reviews.llvm.org/D149560 in the same patch,
which was originally reverted along with this patch. The patch prevents
unswitching of selects with a vector conditional. This could have been
caught in SimpleLoopUnswitch/crash.ll if it included tests for
nontrivial unswitching. This reland also adds a run for the test file
with nontrivial unswitching.
Reviewed By: nikic, kachkov98, vitalybuka
Differential Revision: https://reviews.llvm.org/D138526
|
|
This reverts commit 21f226fc4591db6e98faf380137a42067c909582. Crashes on
this test case:
define void @test2() nounwind {
entry:
br label %bb.nph
bb.nph: ; preds = %entry
%and.i13521 = and <4 x i1> undef, undef
br label %for.body
for.body: ; preds = %for.body, %bb.nph
%or.i = select <4 x i1> %and.i13521, <4 x i32> undef, <4 x i32> undef
br i1 false, label %for.body, label %for.end
for.end: ; preds = %for.body, %entry
ret void
}
|
|
The old LoopUnswitch pass unswitched selects, but the changes were never
ported to the new SimpleLoopUnswitch.
We unswitch by turning:
```
S = select %cond, %a, %b
```
into:
```
head:
br %cond, label %then, label %tail
then:
br label %tail
tail:
S = phi [ %a, %then ], [ %b, %head ]
```
Unswitch selects are always nontrivial, since the successors do not exit
the loop and the loop body always needs to be cloned.
Unswitch selects always need to freeze the conditional if the
conditional could be poison or undef. Selects don't propagate
poison/undef, and branches on poison/undef causes UB.
Reviewed By: nikic, kachkov98, vitalybuka
Differential Revision: https://reviews.llvm.org/D138526
|