|
|
|
(#159765)
Fixes #159571
|
|
This ensures each scalarized member has an accurate cost, matching the
cost it would have had if it had not been considered for an interleave
group.
|
|
For UDiv/SDiv with invariant divisors, the created selects will be
hoisted out. Don't compute their cost for each iteration, to match the
more accurate VPlan-based cost modeling.
Fixes https://github.com/llvm/llvm-project/issues/159402.
|
|
|
|
Loads of addresses are scalarized and have their costs computed w/o
scalarization overhead. Consistently apply this logic also to
non-uniform loads that are already scalarized, to ensure their costs are
consistent with other scalarized loads that are used as addresses.
|
|
We were hitting an assert discovered in https://github.com/llvm/llvm-project/pull/157768#issuecomment-3315359832
|
|
|
|
There are a couple of places during function cloning where we may create
new callsite clone nodes. One of those places was correctly propagating
the assignment to which function clone it should call, and one was not.
Refactor this handling into a helper and use in both places so the newly
created callsite clones actually call the assigned callee function
clones.
|
|
Fixes #148052.
The last PR did not account for the scenario where more than one
instruction uses the `catchpad` label.
In that case, uses that had already been chosen for iteration by the
early-increment iterator were deleted. This issue was not visible in a
normal release build on x86, but was fortunately caught later by the
address sanitizer build on the buildbot.
Here is the diff from the last version of this PR: #158435
```diff
diff --git a/llvm/lib/Transforms/Utils/BasicBlockUtils.cpp b/llvm/lib/Transforms/Utils/BasicBlockUtils.cpp
index 91e245e5e8f5..1dd8cb4ee584 100644
--- a/llvm/lib/Transforms/Utils/BasicBlockUtils.cpp
+++ b/llvm/lib/Transforms/Utils/BasicBlockUtils.cpp
@@ -106,7 +106,8 @@ void llvm::detachDeadBlocks(ArrayRef<BasicBlock *> BBs,
// first block, the we would have possible cleanupret and catchret
// instructions with poison arguments, which wouldn't be valid.
if (isa<FuncletPadInst>(I)) {
- for (User *User : make_early_inc_range(I.users())) {
+ SmallPtrSet<BasicBlock *, 4> UniqueEHRetBlocksToDelete;
+ for (User *User : I.users()) {
Instruction *ReturnInstr = dyn_cast<Instruction>(User);
// If we have a cleanupret or catchret block, replace it with just an
// unreachable. The other alternative, that may use a catchpad is a
@@ -114,33 +115,12 @@ void llvm::detachDeadBlocks(ArrayRef<BasicBlock *> BBs,
if (isa<CatchReturnInst>(ReturnInstr) ||
isa<CleanupReturnInst>(ReturnInstr)) {
BasicBlock *ReturnInstrBB = ReturnInstr->getParent();
- // This catchret or catchpad basic block is detached now. Let the
- // successors know it.
- // This basic block also may have some predecessors too. For
- // example the following LLVM-IR is valid:
- //
- // [cleanuppad_block]
- // |
- // [regular_block]
- // |
- // [cleanupret_block]
- //
- // The IR after the cleanup will look like this:
- //
- // [cleanuppad_block]
- // |
- // [regular_block]
- // |
- // [unreachable]
- //
- // So regular_block will lead to an unreachable block, which is also
- // valid. There is no need to replace regular_block with unreachable
- // in this context now.
- // On the other hand, the cleanupret/catchret block's successors
- // need to know about the deletion of their predecessors.
- emptyAndDetachBlock(ReturnInstrBB, Updates, KeepOneInputPHIs);
+ UniqueEHRetBlocksToDelete.insert(ReturnInstrBB);
}
}
+ for (BasicBlock *EHRetBB :
+ make_early_inc_range(UniqueEHRetBlocksToDelete))
+ emptyAndDetachBlock(EHRetBB, Updates, KeepOneInputPHIs);
}
}
```
|
|
The split in this code path was left over from when we had to support
the old PM and the new PM at the same time. Now that the legacy pass has
been dropped, this simplifies the code a little bit and swaps pointers
for references in a couple places.
Reviewers: aeubanks, efriedma-quic, wlei-llvm
Reviewed By: aeubanks
Pull Request: https://github.com/llvm/llvm-project/pull/159858
|
|
This patch introduces a new optimization in SROA that handles the
pattern where multiple non-overlapping vector `store`s completely fill
an `alloca`.
The current approach to handle this pattern introduces many `.vecexpand`
and `.vecblend` instructions, which can dramatically slow down
compilation when dealing with large `alloca`s built from many small
vector `store`s. For example, consider an `alloca` of type `<128 x
float>` filled by 64 `store`s of `<2 x float>` each. The current
implementation requires:
- 64 `shufflevector`s (`.vecexpand`)
- 64 `select`s (`.vecblend`)
- All operations use masks of size 128
- These operations form a long dependency chain
This kind of IR is both difficult to optimize and slow to compile,
particularly impacting the `InstCombine` pass.
This patch introduces a tree-structured merge approach that
significantly reduces the number of operations and improves compilation
performance.
Key features:
- Detects when vector `store`s completely fill an `alloca` without gaps
- Ensures no loads occur in the middle of the store sequence
- Uses a tree-based approach with `shufflevector`s to merge stored
values
- Reduces the number of intermediate operations compared to linear
merging
- Eliminates the long dependency chains that hurt optimization
Example transformation:
```
// Before: (stores do not have to be in order)
%alloca = alloca <8 x float>
store <2 x float> %val0, ptr %alloca ; offset 0-1
store <2 x float> %val2, ptr %alloca+16 ; offset 4-5
store <2 x float> %val1, ptr %alloca+8 ; offset 2-3
store <2 x float> %val3, ptr %alloca+24 ; offset 6-7
%result = load <8 x float>, ptr %alloca
// After (tree-structured merge):
%shuffle0 = shufflevector %val0, %val1, <4 x i32> <i32 0, i32 1, i32 2, i32 3>
%shuffle1 = shufflevector %val2, %val3, <4 x i32> <i32 0, i32 1, i32 2, i32 3>
%result = shufflevector %shuffle0, %shuffle1, <8 x i32> <i32 0, i32 1, i32 2, i32 3, i32 4, i32 5, i32 6, i32 7>
```
Benefits:
- Logarithmic depth (O(log n)) instead of linear dependency chains
- Fewer total operations for large vectors
- Better optimization opportunities for subsequent passes
- Significant compilation time improvements for large vector patterns
For some large cases, the compile time can be reduced from about 60s to
less than 3s.
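The tree-structured merge above can be sketched with plain Python lists standing in for LLVM vector values; the chunk layout and names are illustrative assumptions, not the patch's actual code:

```python
# Sketch of the tree-structured merge: each "store" contributes one
# chunk; chunks are merged pairwise, level by level, instead of being
# blended one by one into a full-width accumulator.

def tree_merge(chunks):
    """Merge equally sized chunks pairwise until one value remains.

    Assumes a power-of-two number of chunks. Depth is O(log n) in the
    number of chunks, versus O(n) for the linear expand/blend chain.
    """
    level = list(chunks)
    depth = 0
    while len(level) > 1:
        # Each concatenation models a shufflevector of two half-width
        # values into one wider value.
        level = [level[i] + level[i + 1] for i in range(0, len(level), 2)]
        depth += 1
    return level[0], depth

# 64 stores of <2 x float> filling a <128 x float> alloca:
chunks = [[2 * i, 2 * i + 1] for i in range(64)]
merged, depth = tree_merge(chunks)
assert merged == list(range(128))
assert depth == 6  # log2(64) levels instead of a 64-deep chain
```

The same pairwise structure is what lets subsequent passes see short, independent shuffle chains rather than one long dependency chain.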
---------
Co-authored-by: chengjunp <chengjunp@nividia.com>
|
|
Pass operand info to getMemoryOpCost in getMemInstScalarizationCost.
This matches the behavior in VPReplicateRecipe::computeCost.
|
|
The `NDEBUG` macro is tested for defined-ness everywhere else. The
instance here triggers a warning when compiling with `-Wundef`.
|
|
when preserving inbounds (#159515)
If we know that the initial GEP was inbounds, and we change it to a sequence of
GEPs from the same base pointer where every offset is non-negative, then the
new GEPs are inbounds. So far, the implementation only checked if the extracted
offsets are non-negative. In cases where non-extracted offsets can be negative,
this would cause the inbounds flag to be wrongly preserved.
Fixes an issue in #130617 found by nikic.
|
|
ConstantExpr addrspacecast (#159695)
This PR extends isSafeToCastConstAddrSpace to treat
ConstantAggregateZero like ConstantPointerNull.
Tests show that an extra addrspacecast instruction is removed and that
the icmp pointer vector operand's address space is now inferred.
This change is motivated by inspecting the test in commit f7629f5945f6.
|
|
embed in MemIntrinsicInfo" (#159700)
Reverts llvm/llvm-project#157863
|
|
MemIntrinsicInfo (#157863)
Previously, asan treated target intrinsics as black boxes, so it could
not instrument accurate checks. This patch makes
SmallVector<InterestingMemoryOperand> a member of MemIntrinsicInfo so
that TTI lets targets describe their intrinsics to asan.
Note:
1. This patch moves InterestingMemoryOperand from Transforms to Analysis.
2. It extends MemIntrinsicInfo by adding a
SmallVector<InterestingMemoryOperand> member.
3. It does not support RVV indexed/segment load/store.
|
|
Previously, an undef pointer operand was only supported for select
instructions, where undef in the generic AS behaves like `take the
other side`.
This PR extends the support to other instructions, e.g. phi
instructions. Joining and inferring of a constant pointer operand are
deferred until all other operands' AS states have been considered.
---------
Co-authored-by: Matt Arsenault <arsenm2@gmail.com>
|
|
Always add pointers proved to be uniform via legal/SCEV to worklist.
This extends the existing logic to handle a few more pointers known to
be uniform.
|
|
operands were copyable
If all operands of the non-schedulable nodes were previously only
copyable, the dependencies of the original schedule data for such
copyable operands need to be cleared and recalculated to correctly
handle the number of dependencies.
Fixes #159406
|
|
A live-in constant can never be of vector type.
|
|
After https://github.com/llvm/llvm-project/pull/153643, there may be a
BranchOnCond with constant condition in the entry block.
Simplify those in removeBranchOnConst. This removes a number of
redundant conditional branches from entry blocks.
In some cases, it may also make the original scalar loop unreachable,
because we know it will never execute. In that case, we need to remove
the loop from LoopInfo, because all unreachable blocks may dominate each
other, making LoopInfo invalid. In those cases, we can also completely
remove the loop, for which I'll share a follow-up patch.
Depends on https://github.com/llvm/llvm-project/pull/153643.
PR: https://github.com/llvm/llvm-project/pull/154510
|
|
`__NoopCoro_ResumeDestroy` currently has private linkage, which causes
[issues for
Arm64EC](https://github.com/llvm/llvm-project/issues/158341). The
Arm64EC lowering is trying to mangle and add thunks for
`__NoopCoro_ResumeDestroy`, since it sees that its address is taken
(and, therefore, might be called from x64 code via a function pointer).
MSVC's linker requires that the function be placed in COMDAT (`LNK1361:
non COMDAT symbol '.L#__NoopCoro_ResumeDestroy' in hybrid binary`) which
trips an assert in the verifier (`comdat global value has private
linkage`) and the subsequent linking step fails since the private symbol
isn't in the symbol table.
Since there is no reason to use private linkage for
`__NoopCoro_ResumeDestroy` and other coro related functions have also
been [switched to internal linkage to improve
debugging](https://github.com/llvm/llvm-project/pull/151224), this
change switches to using internal linkage.
Fixes #158341
|
|
Splitting out just the recipe finding code from #148626 into a utility
function (along with the extra pattern matchers). Hopefully this makes
reviewing a bit easier.
Added a gtest, since this isn't actually used anywhere yet.
|
|
This adds a new pass for dropping assumes that are unlikely to be useful
for further optimization.
It works by discarding any assumes whose affected values are one-use
(which implies that they are only used by the assume, i.e. ephemeral).
This pass currently runs at the start of the module optimization
pipeline, that is post-inline and post-link. Before that point, it is
more likely for previously "useless" assumes to become useful again,
e.g. because an additional user of the value is introduced after
inlining + CSE.
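The one-use heuristic can be modeled with a toy use-list; the `Value` class below is a hypothetical stand-in for LLVM's def-use chains, not its API:

```python
# Toy model: an assume is considered useless when each of its affected
# values has the assume as its only user, i.e. the values are ephemeral.

class Value:
    def __init__(self, name):
        self.name = name
        self.users = []

def assume_is_droppable(assume, affected):
    # One-use check: every affected value is used only by the assume.
    return all(v.users == [assume] for v in affected)

# A compare feeds only the assume: ephemeral, so droppable.
x = Value("x")
cmp_ = Value("cmp")
x.users.append(cmp_)
assume = object()
cmp_.users.append(assume)
assert assume_is_droppable(assume, [cmp_]) is True

# If another user of the compare appears (e.g. after inlining + CSE),
# the assume is useful again and must be kept.
cmp_.users.append(Value("other"))
assert assume_is_droppable(assume, [cmp_]) is False
```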
|
|
(#159516)
No loop pass seems to use it now after LoopPredication stopped using it
in https://reviews.llvm.org/D111668.
|
|
|
|
Follow up on 7fb3a91 ([PatternMatch] Introduce match functor) to
introduce the VPlanPatternMatch version of the match functor to shorten
some idioms.
Co-authored-by: Luke Lau <luke@igalia.com>
|
|
Function analyses in LoopStandardAnalysisResults are marked as preserved
by the loop pass adaptor, because LoopAnalysisManagerFunctionProxy
manually invalidates most of them.
However the proxy doesn't invalidate BFI, since it is only preserved on
a "lossy" basis: see https://reviews.llvm.org/D86156 and
https://reviews.llvm.org/D110438.
So any changes to the CFG will result in BFI giving incorrect results,
which is fine for loop passes which deal with the lossiness.
But the loop pass adaptor still marks it as preserved, which causes the
lossy result to leak out into function passes.
This causes incorrect results when viewed from e.g. LoopVectorizer,
where an innermost loop header may be reported to have a smaller
frequency than its successors.
Fix this by dropping the call to preserve, and add a test with the -O1
pipeline which shows the effects whenever the CFG is changed and
UseBlockFrequencyInfo is set.
I've also dropped it for BranchProbabilityAnalysis too, but I couldn't
test for it since UseBranchProbabilityInfo always seems to be false?
This may be dead code.
|
|
(#158498)
Closes https://github.com/llvm/llvm-project/issues/158326.
Closes https://github.com/llvm/llvm-project/issues/59555.
Proof for `(X & -Pow2) == C -> (X - C) < Pow2`:
https://alive2.llvm.org/ce/z/HMgkuu
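The linked alive2 proof can also be checked by brute force. This sketch fixes an 8-bit width and restricts C to multiples of Pow2; both are assumptions of the sketch (for unaligned C the equality is trivially false):

```python
# Brute-force check of `(X & -Pow2) == C  ->  (X - C) u< Pow2` over all
# 8-bit values, mirroring the alive2 proof linked above.

BITS = 8
MOD = 1 << BITS

def lhs(x, pow2, c):
    return (x & (-pow2 % MOD)) == c   # X & -Pow2 == C

def rhs(x, pow2, c):
    return (x - c) % MOD < pow2       # (X - C) u< Pow2, unsigned wrap

for k in range(BITS):
    pow2 = 1 << k
    for c in range(0, MOD, pow2):     # aligned constants only
        for x in range(MOD):
            assert lhs(x, pow2, c) == rhs(x, pow2, c)
```

The unsigned-wrapping subtraction is what turns the bit-mask equality into a single range check.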
|
|
A common idiom is the usage of the PatternMatch match function within a
functional algorithm like all_of. Introduce a match functor to shorten
this idiom.
Co-authored-by: Luke Lau <luke@igalia.com>
|
|
|
|
If we know x in R1, the range check `x in R2` can be relaxed into `x in
Union(R2, Inverse(R1))`. The latter one may be more efficient if we can
represent it with one icmp.
Fixes regressions introduced by
https://github.com/llvm/llvm-project/pull/156497.
Proof for `(X & -Pow2) == C -> (X - C) < Pow2`:
https://alive2.llvm.org/ce/z/HMgkuu
Compile-time impact:
https://llvm-compile-time-tracker.com/compare.php?from=ead4f3e271fdf6918aef2ede3a7134811147d276&to=bee3d902dd505cf9b11499ba4f230e4e8ae96b92&stat=instructions%3Au
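The relaxation can be modeled with small integer sets; the concrete ranges below are illustrative assumptions, not taken from the patch:

```python
# Model: when x is known to lie in R1, checking `x in R2` is equivalent
# to checking `x in Union(R2, Inverse(R1))`, which may be representable
# as a single contiguous (possibly wrapping) range, i.e. one icmp, even
# when R2 alone is not the cheapest form.

DOMAIN = set(range(16))

def inverse(r):
    return DOMAIN - r

R1 = set(range(2, 10))        # known fact: x in [2, 10)
R2 = set(range(0, 6))         # wanted check: x in [0, 6)
relaxed = R2 | inverse(R1)    # [0, 6) u [10, 16)

# The two checks agree on every value x can actually take:
for x in R1:
    assert (x in R2) == (x in relaxed)
```

Here `relaxed` is the wrapping range [10, 6), checkable with one unsigned compare, while the original pair of checks needed two.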
|
|
Consider the interleave count (IC) when deciding if epilogue
vectorization is profitable for scalable vectors, same as for
fixed-width vectors.
|
|
The partial reduction intrinsics are no longer experimental, because
they've been used in production for a while and are unlikely to change.
|
|
(#159292)
…ad blocks" (#158435)"
This reverts commit 41cef78227eb909181cb9360099b2d92de8d649f.
|
|
(#158435)
When removing EH Pad blocks, the value defined by them becomes poison. These poison values are then used by `catchret` and `cleanupret`, which is invalid. This commit replaces those unreachable `catchret` and `cleanupret` instructions with `unreachable`.
|
|
of llvm.coro.end #153404"" (#159236)
Reverts llvm/llvm-project#155339 because of a CI failure
|
|
llvm.coro.end #153404" (#155339)
As mentioned in #151067, the current design of llvm.coro.end mixes two
functionalities: querying where we are and lowering to some code. This
patch separates these functionalities into independent intrinsics by
introducing a new intrinsic, llvm.coro.is_in_ramp.
|
|
update if existing prefix is not equivalent to the new one. Returns whether prefix changed." (#159161)
This is a reland of https://github.com/llvm/llvm-project/pull/158460
Test failures are gone once I undo the changes in codegenprepare.
|
|
update if existing prefix is not equivalent to the new one. Returns whether prefix changed." (#159159)
Reverts llvm/llvm-project#158460 due to buildbot failures
|
|
existing prefix is not equivalent to the new one. Returns whether prefix changed. (#158460)
Before this change, `setSectionPrefix` overwrites existing section
prefix with new one unconditionally.
After this change, `setSectionPrefix` checks for equivalences, updates
conditionally and returns whether an update happens.
Update the existing callers to make use of the return value. [PR
155337](https://github.com/llvm/llvm-project/pull/155337/files#diff-cc0c67ac89807f4453f0cfea9164944a4650cd6873a468a0f907e7158818eae9)
is a motivating use case where the 'update' semantics are needed.
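The new contract can be sketched as a conditional setter; the `Function` stand-in below is a toy, not LLVM's `Function` API:

```python
# Sketch of the described setSectionPrefix contract: update only when
# the prefix actually differs, and report whether an update happened.

class Function:
    def __init__(self):
        self._section_prefix = None

    def set_section_prefix(self, prefix):
        """Set the prefix only if it differs; return True on change."""
        if self._section_prefix == prefix:
            return False
        self._section_prefix = prefix
        return True

f = Function()
assert f.set_section_prefix("hot") is True       # first assignment
assert f.set_section_prefix("hot") is False      # equivalent: no-op
assert f.set_section_prefix("unlikely") is True  # different: change
```

Callers can use the returned flag to decide whether downstream state (e.g. cached layout decisions) needs refreshing.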
|
|
To avoid recalculating the stride of a strided load twice, save it in
a map.
|
|
The way that loop strength reduction works is that the target has to
decide upfront whether it wants its addressing to be preindex,
postindex, or neither. This choice affects:
* Which potential solutions we generate
* Whether we consider a pre/post index load/store as costing an AddRec
or not.
None of these choices are a good fit for either AArch64 or ARM, where
both preindex and postindex addressing are typically free:
* If we pick None then we count pre/post index addressing as costing one
AddRec more than is correct, so we don't pick them when we should.
* If we pick PreIndexed or PostIndexed then we get the correct cost for
that addressing type, but still get it wrong for the other and also
exclude potential solutions using offset addressing that could have less
cost.
This patch adds an "all" addressing mode that causes all potential
solutions to be generated and counts both pre- and postindex as having
an AddRecCost of zero. Unfortunately this reveals problems elsewhere in
how we calculate the cost of things, which need to be fixed before we
can make use of it.
|
|
This patch adds a class that uses SSA construction, with debug values as
definitions, to determine whether and which debug values for a
particular variable are live at each point in an IR function. This will
be used by the IR reader of llvm-debuginfo-analyzer to compute variable
ranges and coverage, although it may be applicable to other debug info
IR analyses.
|
|
The motivation for this patch is to close the gap between the
VPlan-based CSE and the legacy CSE, to make it easier to remove the
legacy CSE. Before this patch, stubbing out the legacy CSE leads to 22
test failures, and after this patch, there are only 12 failures, and all
of them seem to have a single root cause:
VPlanTransforms::createInterleaveGroups() and
VPInterleaveGroup::execute(). The improvements from this patch are of
course welcome.
While developing the patch, a miscompile was found when GEP
source-element-types differ, and this has been fixed.
Co-authored-by: Florian Hahn <flo@fhahn.com>
Co-authored-by: Luke Lau <luke@igalia.com>
|
|
This has been handy locally for debugging cloning issues. The NodeIds
are assigned sequentially on creation and included in the dumps and the
dot graphs.
No measurable memory increase was found for a large thin link.
I only changed one test
(Transforms/MemProfContextDisambiguation/basic.ll)
to actually check the emitted NodeIds; most ignore them.
|
|
Account for predicated UDiv,SDiv,URem,SRem in
VPReplicateRecipe::computeCost: compute costs of extra phis and apply
getPredBlockCostDivisor.
Fixes https://github.com/llvm/llvm-project/issues/158660
|
|
operators (#158375)
Tracking issue: #147390
|