Age | Commit message (Collapse) | Author | Files | Lines |
|
`foldSwitchToSelect`
Make sure selects do exist prior to assigning weights to edges.
Fixes: https://github.com/llvm/llvm-project/issues/161137.
|
|
Propagate `!prof` from `switch` instructions.
Issue #147390
|
|
There are cases where the easiest way to regression-test a profile change is to add `!prof` metadata, with small numbers so as to simplify manual verification. Inserting that metadata by hand to ensure coverage can become tedious. This patch makes `prof-inject` do that for us, if opted in.
The weights are drawn from a list of primes, used as a circular buffer.
Issue #147390
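A minimal sketch of the circular-buffer idea, not the code from the patch: the class name `PrimeWeightSource`, its methods, and the particular primes are all hypothetical, since the message does not list the values actually used.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical helper: hands out small prime weights in round-robin order.
class PrimeWeightSource {
public:
  // Return the next weight and advance, wrapping at the end of the table.
  uint32_t next() {
    uint32_t W = Primes[Pos];
    Pos = (Pos + 1) % Primes.size();
    return W;
  }

  // Produce N consecutive weights, e.g. one per successor edge.
  std::vector<uint32_t> take(std::size_t N) {
    std::vector<uint32_t> Out;
    Out.reserve(N);
    for (std::size_t I = 0; I < N; ++I)
      Out.push_back(next());
    return Out;
  }

private:
  // Illustrative values only; the real list is not given in the message.
  std::array<uint32_t, 5> Primes{{2, 3, 5, 7, 11}};
  std::size_t Pos = 0;
};
```

Small primes keep the resulting ratios easy to recognize by eye when checking a test's expected output.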
|
|
This check is overly broad: the transformation made here can be done
safely for pointers with index!=repr width. This fixes the codegen
regression introduced by https://github.com/llvm/llvm-project/pull/105735
and should be beneficial for AMDGPU code-generation once the datalayout
there no longer uses the overly strict `ni:` specifier.
Reviewed By: arsenm
Pull Request: https://github.com/llvm/llvm-project/pull/159890
|
|
|
|
Fixes #148052.
The last PR did not account for the scenario where more than one instruction
used the `catchpad` label.
In that case it deleted uses that had already been "chosen to be
iterated over" by the early increment iterator. The issue was not
visible in a normal release build on x86, but the address sanitizer
build on the buildbot later caught it.
Here is the diff from the last version of this PR: #158435
```diff
diff --git a/llvm/lib/Transforms/Utils/BasicBlockUtils.cpp b/llvm/lib/Transforms/Utils/BasicBlockUtils.cpp
index 91e245e5e8f5..1dd8cb4ee584 100644
--- a/llvm/lib/Transforms/Utils/BasicBlockUtils.cpp
+++ b/llvm/lib/Transforms/Utils/BasicBlockUtils.cpp
@@ -106,7 +106,8 @@ void llvm::detachDeadBlocks(ArrayRef<BasicBlock *> BBs,
// first block, the we would have possible cleanupret and catchret
// instructions with poison arguments, which wouldn't be valid.
if (isa<FuncletPadInst>(I)) {
- for (User *User : make_early_inc_range(I.users())) {
+ SmallPtrSet<BasicBlock *, 4> UniqueEHRetBlocksToDelete;
+ for (User *User : I.users()) {
Instruction *ReturnInstr = dyn_cast<Instruction>(User);
// If we have a cleanupret or catchret block, replace it with just an
// unreachable. The other alternative, that may use a catchpad is a
@@ -114,33 +115,12 @@ void llvm::detachDeadBlocks(ArrayRef<BasicBlock *> BBs,
if (isa<CatchReturnInst>(ReturnInstr) ||
isa<CleanupReturnInst>(ReturnInstr)) {
BasicBlock *ReturnInstrBB = ReturnInstr->getParent();
- // This catchret or catchpad basic block is detached now. Let the
- // successors know it.
- // This basic block also may have some predecessors too. For
- // example the following LLVM-IR is valid:
- //
- // [cleanuppad_block]
- // |
- // [regular_block]
- // |
- // [cleanupret_block]
- //
- // The IR after the cleanup will look like this:
- //
- // [cleanuppad_block]
- // |
- // [regular_block]
- // |
- // [unreachable]
- //
- // So regular_block will lead to an unreachable block, which is also
- // valid. There is no need to replace regular_block with unreachable
- // in this context now.
- // On the other hand, the cleanupret/catchret block's successors
- // need to know about the deletion of their predecessors.
- emptyAndDetachBlock(ReturnInstrBB, Updates, KeepOneInputPHIs);
+ UniqueEHRetBlocksToDelete.insert(ReturnInstrBB);
}
}
+ for (BasicBlock *EHRetBB :
+ make_early_inc_range(UniqueEHRetBlocksToDelete))
+ emptyAndDetachBlock(EHRetBB, Updates, KeepOneInputPHIs);
}
}
```
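The pattern in the diff above, collecting the unique blocks first and only then mutating, can be sketched without LLVM types. This is an illustration under stated assumptions: `Use`, `BlockId`, and `detachBlocksOfUsers` are hypothetical stand-ins, and a plain `std::vector` plays the role of the use list.

```cpp
#include <algorithm>
#include <cassert>
#include <set>
#include <vector>

// Hypothetical stand-in for a user instruction living in some block.
struct Use {
  int BlockId;
};

// Detach every block that contains a user. Removing a block drops all users
// inside it, so deleting while walking the user list could visit entries
// that are already gone. Phase 1 therefore only reads; phase 2 only mutates.
std::vector<int> detachBlocksOfUsers(std::vector<Use> &Users) {
  std::set<int> UniqueBlocks;
  for (const Use &U : Users) // Phase 1: no mutation of Users.
    UniqueBlocks.insert(U.BlockId);

  std::vector<int> Detached;
  for (int Block : UniqueBlocks) { // Phase 2: each block handled once.
    Users.erase(std::remove_if(Users.begin(), Users.end(),
                               [Block](const Use &U) {
                                 return U.BlockId == Block;
                               }),
                Users.end());
    Detached.push_back(Block);
  }
  return Detached;
}
```

With two users in the same block, the block is still processed only once, which is the case the earlier version of the PR missed.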
|
|
After https://github.com/llvm/llvm-project/pull/153643, there may be a
BranchOnCond with constant condition in the entry block.
Simplify those in removeBranchOnConst. This removes a number of
redundant conditional branches from entry blocks.
In some cases, it may also make the original scalar loop unreachable,
because we know it will never execute. In that case, we need to remove
the loop from LoopInfo, because all unreachable blocks may dominate each
other, making LoopInfo invalid. In those cases, we can also completely
remove the loop, for which I'll share a follow-up patch.
Depends on https://github.com/llvm/llvm-project/pull/153643.
PR: https://github.com/llvm/llvm-project/pull/154510
|
|
If we know x in R1, the range check `x in R2` can be relaxed into `x in
Union(R2, Inverse(R1))`. The latter one may be more efficient if we can
represent it with one icmp.
Fixes regressions introduced by
https://github.com/llvm/llvm-project/pull/156497.
Proof for `(X & -Pow2) == C -> (X - C) < Pow2`:
https://alive2.llvm.org/ce/z/HMgkuu
Compile-time impact:
https://llvm-compile-time-tracker.com/compare.php?from=ead4f3e271fdf6918aef2ede3a7134811147d276&to=bee3d902dd505cf9b11499ba4f230e4e8ae96b92&stat=instructions%3Au
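As a sanity check independent of the Alive2 proof, the identity `(X & -Pow2) == C -> (X - C) < Pow2` can be brute-forced over 8-bit values. The sketch below assumes `C` is a multiple of `Pow2`; that precondition is an assumption of this example (for an unaligned `C` the masked form is always false, so the two forms differ). The function names are hypothetical.

```cpp
#include <cassert>
#include <cstdint>

// Masked-equality form: (X & -Pow2) == C, evaluated in 8 bits.
inline bool maskedForm(uint8_t X, uint8_t Pow2, uint8_t C) {
  uint8_t Mask = static_cast<uint8_t>(-Pow2); // Clears the low log2(Pow2) bits.
  return static_cast<uint8_t>(X & Mask) == C;
}

// Range-check form: (X - C) <u Pow2, with wrapping 8-bit subtraction.
inline bool rangeForm(uint8_t X, uint8_t Pow2, uint8_t C) {
  return static_cast<uint8_t>(X - C) < Pow2;
}

// Compare both forms for every 8-bit X, every power of two, and every C
// that is a multiple of that power of two. Returns the number of mismatches.
inline unsigned countMismatches() {
  unsigned Mismatches = 0;
  for (unsigned Shift = 0; Shift < 8; ++Shift) {
    unsigned Pow2 = 1u << Shift;
    for (unsigned C = 0; C < 256; C += Pow2)
      for (unsigned X = 0; X < 256; ++X)
        if (maskedForm(static_cast<uint8_t>(X), static_cast<uint8_t>(Pow2),
                       static_cast<uint8_t>(C)) !=
            rangeForm(static_cast<uint8_t>(X), static_cast<uint8_t>(Pow2),
                      static_cast<uint8_t>(C)))
          ++Mismatches;
  }
  return Mismatches;
}
```

The range form needs one subtraction and one unsigned compare, which is why a single `icmp` can stand in for the masked test.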
|
|
(#159292)
…ad blocks" (#158435)"
This reverts commit 41cef78227eb909181cb9360099b2d92de8d649f.
|
|
(#158435)
When removing EH Pad blocks, the value defined by them becomes poison. These poison values are then used by `catchret` and `cleanupret`, which is invalid. This commit replaces those unreachable `catchret` and `cleanupret` instructions with `unreachable`.
|
|
This patch adds a class that uses SSA construction, with debug values as
definitions, to determine whether and which debug values for a
particular variable are live at each point in an IR function. This will
be used by the IR reader of llvm-debuginfo-analyzer to compute variable
ranges and coverage, although it may be applicable to other debug info
IR analyses.
|
|
Co-authored-by: Nikita Popov <npopov@redhat.com>
|
|
(#158364)
Reverts llvm/llvm-project#157363
Causes crashes, see
https://github.com/llvm/llvm-project/pull/157363#issuecomment-3286783238
|
|
For cases where we can guarantee the application does not override
operator new.
|
|
(#155296)
Issue #152767
|
|
This patch fixes:
llvm/lib/Transforms/Utils/SimplifyCFG.cpp:338:6: error: unused
function 'isSelectInRoleOfConjunctionOrDisjunction'
[-Werror,-Wunused-function]
|
|
simplifications (#154426)
There’s a pattern where a branch is conditioned on a conjunction or disjunction that ends up being modeled as a `select` where the first operand is set to `true` or the second to `false`. If the branch has known branch weights, they can be copied to the `select`. This is worth doing in case later the `select` gets transformed to something else (i.e. if we know the profile, we should propagate it).
Issue #147390
|
|
Fixes #148052.
When removing EH Pad blocks, the value defined by them becomes poison. These poison values are then used by `catchret` and `cleanupret`, which is invalid. This commit replaces those unreachable `catchret` and `cleanupret` instructions with `unreachable`.
|
|
(#154841)
|
|
This patch implements the `llvm.loop.estimated_trip_count` metadata
discussed in [[RFC] Fix Loop Transformations to Preserve Block
Frequencies](https://discourse.llvm.org/t/rfc-fix-loop-transformations-to-preserve-block-frequencies/85785).
As the RFC explains, that metadata enables future patches, such as PR
#128785, to fix block frequency issues without losing estimated trip
counts.
|
|
b50ad945dd4faa288 added umul_with_overflow simplifications to
InstSimplifyFolder (used by SCEVExpander) and 9b1b93766dfa34ee9 added
dead instruction cleanup to SCEVExpander.
Remove special handling of umul by 1, handled automatically due to the
changes above.
|
|
In PromoteMem2Reg, we perform a DFS over the CFG and track, for each
alloca, its incoming value and its associated incoming DebugLoc, both of
which are taken from stores to that alloca; these values and DebugLocs
are propagated to PHI nodes when new blocks are reached. In the event
that for one incoming edge no store instruction has been seen, we
propagate an UndefValue and an empty DebugLoc to the PHI.
This is a perfectly valid occurrence, and assigning an empty DebugLoc to
the PHI is the correct course of action; therefore, we should pass an
annotated DebugLoc instead, so that in DebugLoc coverage tracking we
correctly do not expect a valid DebugLoc to be present; we generally
mark allocas as having CompilerGenerated locations, so I've chosen to
use the same annotation to represent the uninitialized value of that
alloca.
This change is NFC outside of DebugLoc coverage tracking builds.
|
|
In some cases, we can replace a switch with simpler instructions or a
lookup table.
For instance, if every case results in the same value, we can simply
replace the switch
with that single value.
However, lookup tables are not always supported.
Targets, function attributes and compiler options can deactivate lookup
table creation.
Currently, even simpler switch replacements like the single value
optimization do not
get applied, because we only enable these transformations if lookup
tables are enabled.
This PR enables the other kinds of replacements, even if lookup tables
are not supported.
First, it checks if the potential replacements are lookup tables.
If they are, then check if lookup tables are supported and whether to
continue.
If they are not, then we can apply the other transformations.
Originally, lookup table creation was delayed until late stages of the
compilation pipeline, because
it can result in difficult-to-analyze code and prevent other
optimizations.
As a side effect of this change, we can also enable the simpler
optimizations much earlier in the
compilation process.
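The decision order described above (classify the replacement first, consult lookup-table support only if a table would be built) can be sketched as follows. This is not the SimplifyCFG code: `ReplacementKind`, `classify`, and `shouldReplace` are hypothetical names, and the sketch assumes the case values are the contiguous indices 0..N-1 with small results.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical classification of what a switch could be replaced with.
enum class ReplacementKind { SingleValue, LinearMap, LookupTable };

// Classify the per-case results; Results[I] is the value produced for case I.
inline ReplacementKind classify(const std::vector<int64_t> &Results) {
  assert(!Results.empty() && "need at least one case");
  bool AllSame = true;
  for (int64_t R : Results)
    AllSame = AllSame && (R == Results[0]);
  if (AllSame)
    return ReplacementKind::SingleValue;

  // Linear map: Results[I] == Results[0] + I * Step for a constant Step.
  int64_t Step = Results[1] - Results[0];
  for (std::size_t I = 2; I < Results.size(); ++I)
    if (Results[I] - Results[I - 1] != Step)
      return ReplacementKind::LookupTable;
  return ReplacementKind::LinearMap;
}

// Only a real table depends on target support; the cheaper kinds always apply.
inline bool shouldReplace(ReplacementKind Kind, bool LookupTablesSupported) {
  if (Kind == ReplacementKind::LookupTable)
    return LookupTablesSupported;
  return true;
}
```

For example, results `{7, 7, 7}` classify as a single value and are replaced even when lookup tables are disabled, whereas `{1, 4, 9, 16}` needs a table and is left alone in that configuration.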
|
|
(#149699)
Update unrolling preferences for Apple Silicon CPUs to enable partial
unrolling and runtime unrolling for small loops with reductions.
This builds on top of unroller changes to introduce parallel reduction
phis, if possible: https://github.com/llvm/llvm-project/pull/149470.
PR: https://github.com/llvm/llvm-project/pull/149699
|
|
Follow up on 528b13d ([SCEVExp] Add helper to clean up dead instructions
after expansion.) to hoist the SCEVExpander::eraseDeadInstructions call
from LoopVectorize into the LoopUtils APIs add[Diff]RuntimeChecks, so
that other callers (LoopDistribute and LoopVersioning) can benefit from
the patch.
|
|
(#157308)"
This reverts commit eeb43806eb1b40e690aeeba496ee974172202df9.
Recommit with a fix for the MSan failure (
https://lab.llvm.org/buildbot/#/builders/169/builds/14799), by adding a
set to track deleted values. Using the InsertedInstructions set is not
sufficient, as it uses asserting value handles as keys, which may
dereference the value at construction.
Original message:
Add new helper to erase dead instructions inserted during SCEV expansion
but not being used due to InstSimplifyFolder simplifications.
Together with https://github.com/llvm/llvm-project/pull/157307 this also
allows removing some specialized folds, e.g.
https://github.com/llvm/llvm-project/blob/main/llvm/lib/Transforms/Utils/ScalarEvolutionExpander.cpp#L2205
PR: https://github.com/llvm/llvm-project/pull/157308
|
|
Proof: https://alive2.llvm.org/ce/z/cpXuCb
|
|
(#157308)"
This reverts commit 528b13df571c86a2c5b8305d7974f135d785e30f.
Triggers MSan errors in some configurations, e.g.
https://lab.llvm.org/buildbot/#/builders/169/builds/14799
|
|
Add new helper to erase dead instructions inserted during SCEV expansion
but not being used due to InstSimplifyFolder simplifications.
Together with https://github.com/llvm/llvm-project/pull/157307 this also
allows removing some specialized folds, e.g.
https://github.com/llvm/llvm-project/blob/main/llvm/lib/Transforms/Utils/ScalarEvolutionExpander.cpp#L2205
PR: https://github.com/llvm/llvm-project/pull/157308
|
|
Apply hints even if the attribute is the default "notcold" or
"ambiguous", to enable better tracking through the allocator.
Add an option to control the ambiguous allocation hint value.
|
|
In order to better see what's going on during ThinLTO linking, this PR
adds more profile tags when using `--time-trace` on a `lld-link.exe`
invocation.
After PR, linking `clang.exe`:
<img width="3839" height="2026" alt="Capture d’écran 2025-09-02 082021"
src="https://github.com/user-attachments/assets/bf0c85ba-2f85-4bbf-a5c1-800039b56910"
/>
Linking a custom (Unreal Engine game) binary gives a completely
different picture, probably because of the use of Unity files and the sheer
number of input files (here, providing over 60 GB of .OBJs/.LIBs).
<img width="1940" height="1008" alt="Capture d’écran 2025-09-02 102048"
src="https://github.com/user-attachments/assets/60b28630-7995-45ce-9e8c-13f3cb5312e0"
/>
|
|
ComputeEndCheck incorrectly returned false for unsigned predicates
starting at zero and a positive step.
The AddRec could still wrap if Step * trunc ExitCount wraps or trunc
ExitCount strips leading 1s.
Fixes https://github.com/llvm/llvm-project/issues/156849.
PR: https://github.com/llvm/llvm-project/pull/156910
|
|
MergedCounts is of type double.
|
|
(#155734)
The branch weights capture probability. The probability has everything to do with the (SSA) value the condition is predicated on, and nothing to do with the position in the CFG.
|
|
Some passes synthesize functions, e.g. WPD, so we may need to indicate “this synthesized function’s entry count cannot be estimated at compile time” - akin to `branch_weights`.
Issue #147390
|
|
When partially or runtime unrolling loops with reductions, currently the
reductions are performed in-order in the loop, negating most benefits
from unrolling such loops.
This patch extends unrolling code-gen to keep a parallel reduction phi
per unrolled iteration and combine the final result after the loop.
For out-of-order CPUs, this allows executing multiple reduction chains
in parallel.
For now, the initial transformation is restricted to cases where we
unroll a small number of iterations (hard-coded to 4, but should maybe
be capped by TTI depending on the execution units), to avoid introducing
an excessive amount of parallel phis.
It also requires single block loops for now, where the unrolled
iterations are known to not exit the loop (either due to runtime
unrolling or partial unrolling). This ensures that the unrolled loop
will have a single basic block, with a single exit block where we can
place the final reduction value computation.
The initial implementation also only supports parallelizing loops with a
single reduction and only integer reductions. Those restrictions are
just to keep the initial implementation simpler, and can easily be
lifted as follow-ups.
With corresponding TTI changes to the AArch64 unrolling preferences, which
I will also share soon, this triggers in ~300 loops across a wide range of
workloads, including LLVM itself, ffmpeg, av1aom, sqlite, blender,
brotli, zstd and more.
PR: https://github.com/llvm/llvm-project/pull/149470
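A source-level sketch of the effect, assuming an integer sum reduction and an unroll factor of 4 as mentioned above. The function names are hypothetical, and the real transformation operates on IR phis rather than C++ variables.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// In-order reduction: every add depends on the previous one, so unrolling
// alone does not shorten the dependency chain.
inline uint64_t sumInOrder(const std::vector<uint32_t> &V) {
  uint64_t Sum = 0;
  for (uint32_t X : V)
    Sum += X;
  return Sum;
}

// Unrolled by 4 with one accumulator per unrolled iteration. The four
// chains are independent and are combined once after the loop.
inline uint64_t sumParallelChains(const std::vector<uint32_t> &V) {
  uint64_t S0 = 0, S1 = 0, S2 = 0, S3 = 0;
  const std::size_t N = V.size();
  std::size_t I = 0;
  for (; I + 4 <= N; I += 4) {
    S0 += V[I];
    S1 += V[I + 1];
    S2 += V[I + 2];
    S3 += V[I + 3];
  }
  uint64_t Sum = (S0 + S1) + (S2 + S3); // Final combine after the loop.
  for (; I < N; ++I)                    // Remainder iterations.
    Sum += V[I];
  return Sum;
}
```

Because integer addition is associative, both functions return the same value; reassociating floating-point reductions would change rounding.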
|
|
getZExtValue() already returns uint64_t.
|
|
(#156476)
Explicit calls to ::operator new are marked nobuiltin and cannot be
elided or updated as they may call user defined versions. However,
existing calls to the hot/cold versions of new only need their hint
parameter value updated, which does not mutate the call.
|
|
(#155602)
This PR is the first part to solve the issue in #149937.
The end goal is enabling more switch optimizations on targets that do
not support lookup tables.
SimplifyCFG has the ability to replace switches with either a few simple
calculations, a single value, or a lookup table.
However, it only considers these options if the target supports lookup
tables, even if the final result is not a LUT, but a few simple
instructions like muls, adds and shifts.
To enable more targets to use these other kinds of optimization, this PR
restructures the code in `switchToLookup`.
Previously, code was generated even before choosing what kind of
replacement to do. However, we need to know if we actually want to
create a true LUT or not before generating anything. Then we can check
for target support only if any LUT would be created.
This PR moves the code so it first determines the replacement kind and
then generates the instructions.
A later PR will insert the target support check after determining the
kind of replacement. If the result is not a LUT, then even targets
without LUT support can replace the switch with something else.
|
|
This makes the RelLookupTableConverter independent of the type used in
the GEP. In particular, it removes the requirement to have a leading
zero index.
|
|
proof: https://alive2.llvm.org/ce/z/5PNCds
|
|
Correct a few typos: 'seperate' -> 'separate'.
|
|
Fixes #152348.
SimplifyCFG collapses raw buffer stores from an if/else load into a
select.
This change prevents the TargetExtType dx.Rawbuffer from being replaced,
thus preserving the if/else blocks.
A further change was needed to eliminate the phi node before we process
Intrinsic::dx_resource_getpointer in DXILResourceAccess.cpp
|
|
We cannot form phis/selects of token type, so this should be checked
inside canReplaceOperandWithVariable().
|
|
After #139914, `DIBuilder::finalize()` finalizes both declaration and
definition DISubprograms.
Therefore, there is no need to call `DIBuilder::finalizeSubprogram()`
right before `DIBuilder::finalize()`.
|
|
ExplicitRewriteDescriptor (#154319)
Do not check that Source is a valid regex in the case of a Target
(explicit) transformation. Source may contain special symbols that would
otherwise cause a spurious `invalid regex` error.
Note that Source and exactly one of [Target, Transform] must be
provided.
`Target (explicit transformation)`: In this kind of rule `Source` is
treated as a symbol name and is matched in its entirety. `Target` field
will denote the symbol name to transform to.
`Transform (pattern transformation)`: This rule treats `Source` as a
regex that should match the complete symbol name. `Transform` is a regex
specifying the name to transform to.
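The difference between the two rule kinds can be sketched with `std::regex`. This is an illustration, not the rewriter's implementation: `applyExplicit` and `applyPattern` are hypothetical names, and `std::regex` (ECMAScript syntax, `$1` backreferences) is swapped in for LLVM's own regex class.

```cpp
#include <cassert>
#include <regex>
#include <string>

// Explicit rule: Source is a plain symbol name compared in its entirety.
// No regex is ever built, so names like "foo(bar" are fine.
inline bool applyExplicit(const std::string &Symbol, const std::string &Source,
                          const std::string &Target, std::string &Out) {
  if (Symbol != Source)
    return false;
  Out = Target;
  return true;
}

// Pattern rule: Source is a regex that must match the complete symbol name;
// Transform is the replacement and may refer to capture groups.
inline bool applyPattern(const std::string &Symbol, const std::string &Source,
                         const std::string &Transform, std::string &Out) {
  std::regex Re(Source); // Throws std::regex_error if Source is malformed.
  if (!std::regex_match(Symbol, Re))
    return false;
  Out = std::regex_replace(Symbol, Re, Transform);
  return true;
}
```

Passing `"foo(bar"` as the Source of an explicit rule works here because it is only compared as a string; the same text would be rejected as a pattern, which is the kind of spurious error the commit avoids.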
|
|
`mergeConditionalStoreToAddress` (#155058)
This is about code readability. The operands in the disjunction forming the combined predicate in `mergeConditionalStoreToAddress` could sometimes be negated twice. This patch addresses that.
Two tests needed updating because they exposed the double negation and no longer do.
|
|
This fixes a crash trying to use SCEVCouldNotCompute, if getPtrToIntExpr
failed.
Fixes https://github.com/llvm/llvm-project/issues/155287
|
|
This is a follow-up PR for post-commit comments in #121104.
Details:
- Rename `mergeTwoCounter` to `mergeTwoCounters` (add trailing `s`).
- Avoid duplicated hash lookup.
- Use `///` instead of `//`.
- Fix typo.
|
|
`ICI->getOperand(0)` is non-null.
|