Age | Commit message | Author | Files | Lines |
|
`foldSwitchToSelect`
Make sure the selects actually exist before assigning weights to their edges.
Fixes: https://github.com/llvm/llvm-project/issues/161137.
|
|
Propagate `!prof` from `switch` instructions.
Issue #147390
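For illustration, a rough LLVM IR sketch (hypothetical values and weights, not taken from the patch) of a two-destination switch being folded to a select while keeping its profile:

  ; before: the switch carries the branch weights
  define i32 @before(i32 %x) {
  entry:
    switch i32 %x, label %default [
      i32 0, label %case0
    ], !prof !0
  case0:
    br label %merge
  default:
    br label %merge
  merge:
    %r = phi i32 [ 1, %case0 ], [ 2, %default ]
    ret i32 %r
  }

  ; after (sketch): the weights survive on the select
  define i32 @after(i32 %x) {
  entry:
    %cmp = icmp eq i32 %x, 0
    %r = select i1 %cmp, i32 1, i32 2, !prof !1
    ret i32 %r
  }

  !0 = !{!"branch_weights", i32 10, i32 90}  ; default, case 0
  !1 = !{!"branch_weights", i32 90, i32 10}  ; true arm (case 0), false arm (default)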
|
|
This check is overly broad; the transformation made here can be done
safely for pointers whose index width differs from their representation width. This fixes the codegen
regression introduced by https://github.com/llvm/llvm-project/pull/105735
and should be beneficial for AMDGPU code-generation once the datalayout
there no longer uses the overly strict `ni:` specifier.
Reviewed By: arsenm
Pull Request: https://github.com/llvm/llvm-project/pull/159890
|
|
Co-authored-by: Nikita Popov <npopov@redhat.com>
|
|
This patch fixes:
llvm/lib/Transforms/Utils/SimplifyCFG.cpp:338:6: error: unused
function 'isSelectInRoleOfConjunctionOrDisjunction'
[-Werror,-Wunused-function]
|
|
simplifications (#154426)
There's a pattern where a branch is conditioned on a conjunction or disjunction that ends up being modeled as a `select` whose true-value operand is the constant `true` (a logical or) or whose false-value operand is the constant `false` (a logical and). If the branch has known branch weights, they can be copied to the `select`. This is worth doing in case the `select` later gets transformed into something else (i.e. if we know the profile, we should propagate it).
Issue #147390
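A small IR sketch of the metadata placement (hypothetical weights, not from the patch); the point is that the `select` modelling the logical or now carries the same `!prof` as the branch it feeds:

  define void @f(i1 %a, i1 %b) {
  entry:
    ; logical or modelled as a select with a constant true arm; previously
    ; only the branch below carried !prof, now the weights are copied here too
    %or.cond = select i1 %a, i1 true, i1 %b, !prof !0
    br i1 %or.cond, label %then, label %else, !prof !0
  then:
    ret void
  else:
    ret void
  }

  !0 = !{!"branch_weights", i32 90, i32 10}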
|
|
(#154841)
|
|
In some cases, we can replace a switch with simpler instructions or a
lookup table.
For instance, if every case results in the same value, we can simply
replace the switch
with that single value.
However, lookup tables are not always supported.
Targets, function attributes and compiler options can deactivate lookup
table creation.
Currently, even simpler switch replacements like the single value
optimization do not
get applied, because we only enable these transformations if lookup
tables are enabled.
This PR enables the other kinds of replacements, even if lookup tables
are not supported.
First, it checks whether the potential replacement is a lookup table.
If it is, lookup-table support on the target determines whether to
continue.
If it is not, the other transformations can be applied regardless.
Originally, lookup table creation was delayed until late stages of the
compilation pipeline, because
it can result in difficult-to-analyze code and prevent other
optimizations.
As a side effect of this change, we can also enable the simpler
optimizations much earlier in the
compilation process.
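As an illustration of the single-value case (hypothetical functions and constants, not taken from the patch), which needs no table at all:

  ; before: every case yields the same value
  define i32 @before(i32 %x) {
  entry:
    switch i32 %x, label %default [
      i32 0, label %case
      i32 1, label %case
      i32 2, label %case
    ]
  case:
    br label %merge
  default:
    br label %merge
  merge:
    %r = phi i32 [ 42, %case ], [ 7, %default ]
    ret i32 %r
  }

  ; after (sketch of the kind of replacement): a range check and a select
  define i32 @after(i32 %x) {
  entry:
    %inrange = icmp ult i32 %x, 3
    %r = select i1 %inrange, i32 42, i32 7
    ret i32 %r
  }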
|
|
Proof: https://alive2.llvm.org/ce/z/cpXuCb
|
|
(#155734)
The branch weights capture probability. The probability has everything to do with the (SSA) value the condition is predicated on, and nothing to do with the position in the CFG.
|
|
(#155602)
This PR is the first step toward solving the issue in #149937.
The end goal is enabling more switch optimizations on targets that do
not support lookup tables.
SimplifyCFG has the ability to replace switches with either a few simple
calculations, a single value, or a lookup table.
However, it only considers these options if the target supports lookup
tables, even if the final result is not a LUT, but a few simple
instructions like muls, adds and shifts.
To enable more targets to use these other kinds of optimization, this PR
restructures the code in `switchToLookup`.
Previously, code was generated even before choosing what kind of
replacement to do. However, we need to know if we actually want to
create a true LUT or not before generating anything. Then we can check
for target support only if any LUT would be created.
This PR moves the code so it first determines the replacement kind and
then generates the instructions.
A later PR will insert the target support check after determining the
kind of replacement. If the result is not a LUT, then even targets
without LUT support can replace the switch with something else.
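As an example of a non-LUT replacement kind (hypothetical values, not from the patch): when the case results form a linear sequence, the switch becomes plain arithmetic.

  define i32 @linear_map(i32 %x) {
  entry:
    ; cases 0 -> 3, 1 -> 5, 2 -> 7 follow result = 3 + 2*x, so no table is
    ; needed (default handling omitted; assume %x is known to be in [0, 3))
    %scaled = mul i32 %x, 2
    %r = add i32 %scaled, 3
    ret i32 %r
  }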
|
|
proof: https://alive2.llvm.org/ce/z/5PNCds
|
|
We cannot form phis/selects of token type, so this should be checked
inside canReplaceOperandWithVariable().
|
|
`mergeConditionalStoreToAddress` (#155058)
This is about code readability. The operands in the disjunction forming the combined predicate in `mergeConditionalStoreToAddress` could sometimes be negated twice. This patch addresses that.
Two tests needed updating because they exposed the double negation and now they don't.
|
|
`ICI->getOperand(0)` is non-null.
|
|
condition. (#154007)
Proof: https://alive2.llvm.org/ce/z/TozSD6
|
|
Split out from https://github.com/llvm/llvm-project/pull/154007 as it
showed compile-time improvements.
NFC, as there need to be at least two icmps that are part of the chain.
|
|
Updates SimplifyCFG to avoid jump threading through loop headers if
-keep-loops is requested. Canonical loop form requires a loop header
that dominates all blocks in the loop. If we thread through a header, we
risk breaking its domination of the loop. This change sidesteps the issue
by conservatively refusing to thread through headers at all.
Fixes: https://github.com/llvm/llvm-project/issues/151144
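For reference, a minimal canonical loop (hypothetical example, not from the patch); the point is that `%header` is the single block dominating everything in the loop, which duplicating it into `%entry` and `%latch` could destroy:

  define void @count(i32 %n) {
  entry:
    br label %header
  header:                          ; dominates every block in the loop
    %i = phi i32 [ 0, %entry ], [ %i.next, %latch ]
    %cond = icmp slt i32 %i, %n
    br i1 %cond, label %latch, label %exit
  latch:
    %i.next = add i32 %i, 1
    br label %header
  exit:
    ret void
  }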
|
|
proof: https://alive2.llvm.org/ce/z/WVt4-F
|
|
getAssignmentMarkers was for debug intrinsics. getDVRAssignmentMarkers
is used for DbgRecords.
|
|
Extend jump-threading to allow local defs that are live outside of the
threaded block. Allow threading to destinations where the local defs are
not live.
---------
Signed-off-by: John Lu <John.Lu@amd.com>
|
|
This was always undesirable, and after #149310 it is illegal and will
result in a verifier error.
Fix this by moving SimplifyCFG's check for this into
canReplaceOperandWithVariable(), so it's shared with GVNSink.
|
|
Avoid repeatedly querying `getUniquePredecessor` for already-visited
switch successors so as not to incur quadratic runtime.
Fixes: https://github.com/llvm/llvm-project/issues/147239.
|
|
one case (#145233)
Fix #141753 .
This patch introduces a new check that tries to decide whether the
conjunction of all the values uniquely identifies the values accepted by
the switch.
|
|
(#146207)
Generate the GEP with the index type that InstCombine will cast it to, but use the knowledge that the index is unsigned.
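A hedged sketch of the idea (hypothetical table and types, not the patch's exact output): emit the index cast to the wider type up front, and make it a zext because the index is known to be unsigned, rather than leaving the cast for InstCombine.

  @table = private constant [4 x i32] [i32 1, i32 2, i32 3, i32 4]

  define i32 @lookup(i32 %idx) {
  entry:
    ; assumes %idx is already known to be in [0, 4)
    %idx.ext = zext i32 %idx to i64
    %gep = getelementptr inbounds [4 x i32], ptr @table, i64 0, i64 %idx.ext
    %val = load i32, ptr %gep
    ret i32 %val
  }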
|
|
We should be able to allow the `simplifySwitchOfPowersOfTwo` transform
to take place, as, on recent X86 targets, the weighted latency-size
appears to be 2. This favours computing trailing zeroes and indexing
into a smaller value table, over generating a jump table with an
indirect branch, which overall should be more efficient.
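Roughly, in LLVM IR (a hypothetical sketch, not the exact output of the transform):

  declare i32 @llvm.cttz.i32(i32, i1)

  define i32 @pow2_switch(i32 %x) {
  entry:
    ; %x is a power of two, so switching on its trailing-zero count turns the
    ; sparse cases 1/2/4/8 into the dense range 0..3, which can then be served
    ; by a small value table instead of a jump table
    %tz = call i32 @llvm.cttz.i32(i32 %x, i1 true)
    switch i32 %tz, label %default [
      i32 0, label %c1
      i32 1, label %c2
      i32 2, label %c4
      i32 3, label %c8
    ]
  c1:
    ret i32 10
  c2:
    ret i32 20
  c4:
    ret i32 30
  c8:
    ret i32 40
  default:
    ret i32 0
  }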
|
|
Seeing how we can't generate any debug intrinsics any more: delete a
variety of codepaths where they're handled. For the most part these are
plain deletions; in others I've tweaked comments to remain coherent, or
added a type to (what were) type-generic lambdas.
This isn't all the DbgInfoIntrinsic call sites but it's most of the
simple scenarios.
Co-authored-by: Nikita Popov <github@npopov.com>
|
|
Part of the coverage-tracking feature, following #107279.
In order for DebugLoc coverage testing to work, we firstly have to set
annotations for intentionally-empty DebugLocs, and secondly we have to
ensure that we do not drop these annotations as we propagate DebugLocs
throughout compilation. As the annotations exist as part of the DebugLoc
class, and not the underlying DILocation, they will not survive a
DebugLoc->DILocation->DebugLoc roundtrip. Therefore this patch modifies
a number of places in the compiler to propagate DebugLocs directly
rather than via the underlying DILocation. This has no effect on the
output of normal builds; it only ensures that during coverage builds, we
do not incorrectly drop annotations and thereby create false
positives.
The bulk of these changes are in replacing
DILocation::getMergedLocation(s) with a DebugLoc equivalent, and in
changing the IRBuilder to store a DebugLoc directly rather than storing
DILocations in its general Metadata array. We also use a new function,
`DebugLoc::orElse`, which selects the "best" DebugLoc out of a pair
(valid location > annotated > empty), preferring the current DebugLoc on
a tie - this encapsulates the existing behaviour at a few sites where we
_may_ assign a DebugLoc to an existing instruction, while extending the
logic to handle annotation DebugLocs at the same time.
|
|
This flag was used to let us incrementally introduce debug records
into LLVM; however, everything now uses records. It serves no
purpose now, so delete it.
|
|
Following the work in PR #107279, this patch applies the annotative
DebugLocs, which indicate that a particular instruction is intentionally
missing a location for a given reason, to existing sites in the compiler
where their conditions apply. This is NFC in ordinary LLVM builds (each
function `DebugLoc::getFoo()` is inlined as `DebugLoc()`), but marks the
instruction in coverage-tracking builds so that it will be ignored by
Debugify, allowing only real errors to be reported. From a developer
standpoint, it also communicates the intentionality and reason for a
missing DebugLoc.
Some notes for reviewers:
- The difference between `I->dropLocation()` and
`I->setDebugLoc(DebugLoc::getDropped())` is that the former _may_ decide
to keep some debug info alive, while the latter will always be empty; in
this patch, I always used the latter (even if the former could
technically be correct), because the former could result in some
(barely) different output, and I'd prefer to keep this patch purely NFC.
- I've generally documented the uses of `DebugLoc::getUnknown()`, with
the exception of the vectorizers - in summary, they are a huge cause of
dropped source locations, and I don't have the time or the domain
knowledge currently to solve that, so I've plastered it all over them as
a form of "fixme".
|
|
(#142526)
Closes https://github.com/llvm/llvm-project/issues/142522.
|
|
Having a finite Depth (or recursion limit) for computeKnownBits is very
limiting, but is currently a load-bearing necessity, as all KnownBits
are recomputed on each call and there is no caching. As a prerequisite
for an effort to remove the recursion limit altogether, either using a
clever caching technique, or writing an easily-invalidable KnownBits
analysis, make the Depth argument in APIs in ValueTracking uniformly the
last argument with a default value. This would aid in removing the
argument when the time comes, as many callers that currently pass 0
explicitly are now updated to omit the argument altogether.
|
|
The capture check here is to protect against concurrent accesses from
other threads. This requires the provenance to escape.
|
|
We've encountered an LLVM verification failure when building Swift with
the SimplifyCFG pass enabled. I found that
https://reviews.llvm.org/D158083 fixed this pass by preventing sinking
loads or stores of swifterror values, but it did not implement the same
protection for calls or invokes.
In `Verifier.cpp`
[here](https://github.com/ellishg/llvm-project/blob/c68535581135a1513c9c4c1c7672307d4b5e616e/llvm/lib/IR/Verifier.cpp#L4360-L4364)
and
[here](https://github.com/ellishg/llvm-project/blob/c68535581135a1513c9c4c1c7672307d4b5e616e/llvm/lib/IR/Verifier.cpp#L3661-L3662)
we can see that swifterror values must also be used directly by call
instructions.
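For context, a minimal sketch loosely following the LangRef swifterror example (hypothetical functions, not from the build that hit the failure):

  declare swiftcc void @may_fail(ptr swifterror)

  define swiftcc void @caller(i1 %c) {
  entry:
    %err = alloca swifterror ptr
    store ptr null, ptr %err
    br i1 %c, label %a, label %b
  a:
    ; the swifterror slot must be passed to the call directly, so such calls
    ; are now skipped when sinking, just as loads/stores of %err already were
    call swiftcc void @may_fail(ptr swifterror %err)
    br label %end
  b:
    call swiftcc void @may_fail(ptr swifterror %err)
    br label %end
  end:
    ret void
  }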
|
|
Given the same branch condition in `a` and `c`, SimplifyCFG converts:
    +> b -+
    |     v
--> a --> c --> e -->
    |     ^
    +> d -+
into:
    +--> bcd ---+
    |           v
--> a --> c --> e -->
Remap source atoms on instructions duplicated from `c` into `bcd`.
RFC:
https://discourse.llvm.org/t/rfc-improving-is-stmt-placement-for-better-interactive-debugging/82668
|
|
(#133482)
SimplifyCFG folds `d` into preds `b` and `c`.
         +---------------+
         |               |
    +--> b --+           |
    |        v           v
--> a        d --> e --> f -->
    |        ^           ^
    +--> c --+           |
         |               |
         +---------------+
Remap source atoms so that the duplicated instructions are analysed
independently to determine is_stmt positions.
The pull request contains a discussion covering various edge cases here:
https://github.com/llvm/llvm-project/pull/133482/files#r2039519348
The summary of the discussion is that we could avoid remapping when there's a
single pred, but we decided that it's still a trade-off and not worth the
additional complexity right now.
RFC:
https://discourse.llvm.org/t/rfc-improving-is-stmt-placement-for-better-interactive-debugging/82668
|
|
It is already required along certain code paths that the CondTy is
valid. Fix some of the uses to make sure it is passed.
|
|
proof: https://alive2.llvm.org/ce/z/v32Aof
|
|
This fixes a bug when a target only supports conditional faulting
loads; see the test case hoist_store_without_cstore.
Split `-simplifycfg-hoist-loads-stores-with-cond-faulting` into
`-simplifycfg-hoist-loads-with-cond-faulting` and
`-simplifycfg-hoist-stores-with-cond-faulting` to control conditional
faulting load and store respectively.
|