| Age | Commit message (Collapse) | Author | Files | Lines |
|
Switch tests from using `-debug[-only=LoopVectorize]` to
`-vplan-print-after` as that provides better control at what step in the
pipeline we want to check the VPlan (I'm using `optimize$` for now to
preserve previous state).
Then, update `-vplan-print-after*` to print what function the loop
belongs to. That enables us to simplify VPlan UTC support as the output
of the updated tests contains the VPlan dump only - no special
filtering/extraction is necessary anymore.
|
|
RHS (#179055)
Add a new helper function `matchEquivZeroRHS()` that recognizes
comparisons with constants that are equivalent to comparisons with zero,
and transforms the predicate accordingly.
This handles the following transformations:
- icmp sgt X, -1 --> icmp sge X, 0
- icmp sle X, -1 --> icmp slt X, 0
- icmp [us]ge X, 1 --> icmp [us]gt X, 0
- icmp [us]lt X, 1 --> icmp [us]le X, 0
This enables more optimization opportunities in `simplifyICmpWithZero`,
such as folding icmp sgt X, -1 when X is known to be non-negative.
---
- IR Impact: https://github.com/dtcxzyw/llvm-opt-benchmark/pull/3414
|
|
We reassociate ((x && y) && z) -> (x && (y && z)) if x has more than
use, in order to allow simplifying the header mask further. However this
is somewhat unreliable as there are times when it doesn't have more than
one use, e.g. see the case we run into in
https://github.com/llvm/llvm-project/pull/173265/changes#r2769759907.
This moves it into a separate transformation that always reassociates
the header mask regardless of the number of uses, which prevents some
fragile test changes in #173265.
We need to run it before both calls to simplifyRecipes in optimize. I
considered putting it in simplifyRecipes itself but simplifyRecipes is
also called after unrolling and when the loop region is dissolved which
causes vputils::findHeaderMask to assert.
There isn't really any benefit to reassociating masks that aren't the
header mask so the existing simplification was removed.
|
|
Add a new VPlan subdirectory as common place for tests checking VPlan
printing. It contains a lit.local.cfg that only runs the tests when
assertions are enabled.
This removes the need to add explicit REQUIRES: asserts to VPlan tests.
PR: https://github.com/llvm/llvm-project/pull/180611
|
|
|
|
The legacy cost model tries to scalarize loads that are used as
pointers. Skip if the load would need predicating when scalarized,
because that would incur very high costs, see useEmulatedMaskMemRefHack.
Fixes https://github.com/llvm/llvm-project/issues/180780.
|
|
The checks created by LAA only compute a pointer difference and do not
need to capture provenance. Use SCEVPtrToAddr instead of SCEVPtrToInt
for computations.
To avoid regressions while parts of SCEV are migrated to use PtrToAddr
this adds logic to rewrite all PtrToInt to PtrToAddr if possible in the
created expressions. This is needed to avoid regressions.
Similarly, if in the original IR we have a PtrToInt, SCEVExpander tries
to re-use it if possible when expanding PtrToAddr.
Depends on https://github.com/llvm/llvm-project/pull/178727.
Fixes https://github.com/llvm/llvm-project/issues/156978.
PR: https://github.com/llvm/llvm-project/pull/178861
|
|
Check if costs for partial reductions are valid up-front in
getScaledReductions instead when transforming each link in the chain in
transformToPartialReduction. This ensures that we either transform all
entries in the chain together, or none via the existing invalidation
logic.
This fixes a crash when a link in the chain would have invalid cost, as
in the added test cases.
Fixes https://github.com/llvm/llvm-project/issues/180340.
PR: https://github.com/llvm/llvm-project/pull/180438
|
|
Update more VPlan tests to use auto-generated check lines via new UTC
support.
|
|
This fixes a miscompile in #180005 where we didn't check that the first
incoming value isn't poison.
We should use the first non-poison incoming value if it exists, or just
poison if all the incoming values are poison.
|
|
UpdateTestChecks support is updated in subsequent
https://github.com/llvm/llvm-project/pull/178736.
|
|
|
|
Sub-reductions can be implemented in two ways:
(1) negate the operand in the vector loop (the default way).
(2) subtract the reduced value from the init value in the middle block.
Note that both ways keep the reduction itself as an 'add' reduction,
which is necessary because only llvm.vector.partial.reduce.add exists.
The ISD nodes for partial reductions don't support folding the
sub/negation into its operands because the following is not a valid
transformation:
```
sub(0, mul(ext(a), ext(b)))
-> mul(ext(a), ext(sub(0, b)))
```
It can therefore be better to choose option (2) such that the partial
reduction is always positive (starting at '0') and to do a final
subtract in the middle block.
For AArch64 there are no dot-product instructions that can
do a `partial.reduce.sub(acc, mul(ext(a), ext(b)))` operation.
I'm not sure if such instructions exist for other targets.
(If so then we may want to make this decision a target option)
This PR also increases the AArch64 cost of a partial sub-reduction
when this exists in an 'add-sub' reduction chain.
Fixes https://github.com/llvm/llvm-project/issues/178703
|
|
(#180708)
This patch extends the support added in #158088 to loops where the
assignment is non-speculatable (e.g. a conditional load or divide).
For example, the following loop can now be vectorized:
```
int simple_csa_int_load(
int* a, int* b, int default_val, int N, int threshold)
{
int result = default_val;
for (int i = 0; i < N; ++i)
if (a[i] > threshold)
result = b[i];
return result;
}
```
It does this by extending the recurrence matching from only looking for
selects, to include phis where all operands are the header phi, except
for one which can be an arbitrary value outside the recurrence.
---
Reverts llvm/llvm-project#180275 (original PR: #178862)
Additional type legalization for `ISD::VECTOR_FIND_LAST_ACTIVE` was
added in #180290, which should resolve the backend crashes on x86.
|
|
Start trying to use SimplifyDemandedFPClass on instructions, starting
with fmul. This subsumes the old transform on multiply of 0. The
main change is the introduction of nnan/ninf. I do not think anywhere
was systematically trying to introduce fast math flags before, though
a few odd transforms would set them.
Previously we only called SimplifyDemandedFPClass on function returns
with nofpclass annotations. Start following the pattern of
SimplifyDemandedBits, where this will be called from relevant root
instructions.
I was wondering if this should go into InstCombineAggressive, but that
apparently does not make use of InstCombineInternal's worklist.
|
|
|
|
Add set of FindLast tests where the selected expression is based on an
IV and could be sunk.
|
|
Use new UTC support to re-generate check lines.
|
|
Adds missing test coverage for reductions with intermediate stores,
including partial reductions with intermediate stores, as well as
chained min/max reductions with intermediate stores.
|
|
Include VPWidenPHIRecipe in phi simplification if there's a single
incoming value.
|
|
Enables support for marking overflow intrinsics `uadd`, `sadd`, `usub`,
`ssub`, `umul` and `smul` as trivially vectorizable.
Fixes #174617
---
This patch is a reland of #174835.
Reverts #179819
|
|
(#166310)
When `-pass-remarks=loop-vectorize` is specified, the subsequent logic
is executed to display detailed debug messages even if no PreHeader
exists in the loop.
Therefore, an assert occurs when the `getLoopPreHeader()` function is
called. This commit resolves that issue.
Fixed: #165377
|
|
This reverts commit 2805c8aaa61a94ef22ac76c8dac56f7dfe970651.
This added the REQUIRES line to the wrong test, 041ce9f added it to the
correct one.
|
|
Fixes a minor test regression introduced by
https://github.com/llvm/llvm-project/pull/180226 in file
llvm/test/Transforms/LoopVectorize/phi-with-fastflags-vplan.ll
|
|
In some cases we decide to vectorise loops with first-order recurrences
using VF=1, IC>1. We then attempt to unroll a vplan in replicateByVF,
however when trying to erase the list of values from the parent we
trigger the following assert:
```
virtual llvm::VPRecipeValue::~VPRecipeValue(): Assertion `Users.empty()
&& "trying to delete a VPRecipeValue with remaining users"' failed.
```
The problem seems to stem from this code:
```
DefR->replaceUsesWithIf(LaneDefs[0], [DefR](VPUser &U, unsigned) {
return U.usesFirstLaneOnly(DefR);
});
```
since usesFirstLaneOnly returns false and we fail to replace uses of
DefR with LaneDefs[0]. Upon inspection the only VPUser objects that
return false are VPInstruction::FirstOrderRecurrenceSplice and
VPFirstOrderRecurrencePHIRecipe. Since the values are all scalar it's
simply not possible for us to be using anything other than the first
lane. I've fixed this by bailing out of replicateByVF early for plans with
only a scalar VF.
Fixes https://github.com/llvm/llvm-project/issues/179671
|
|
ForceTargetInstructionCost in the legacy cost model overrides any costs
from InstsToScalarize. Match the behavior in the VPlan-based cost model.
This fixes a crash with -force-target-instr-cost for the added test
case.
PR: https://github.com/llvm/llvm-project/pull/168269
|
|
Should fix https://lab.llvm.org/buildbot/#/builders/11/builds/33293
|
|
If a phi has fast math flags, we can propagate it to the widened select.
To do this, this patch makes VPPhi and VPBlendRecipe subclasses of
VPRecipeWithIRFlags, and propagates it through PlainCFGBuilder and
VPPredicator.
Alive2 proofs for some of the FMFs (it looks like it can't reason about
the full "fast" set yet)
nnan: https://alive2.llvm.org/ce/z/f0bRd4
nsz: https://alive2.llvm.org/ce/z/u9P96T
The actual motivation for this to eventually be able to move the special
casing for tail folding in
LoopVectorizationPlanner::addReductionResultComputation into the CFG in
#176143, which requires passing through FMFs.
|
|
Use PredBB's terminator as insert point in VPIRPhi::execute to make sure
the extracts are placed after any possibly sunk instructions.
Fixes https://github.com/llvm/llvm-project/issues/180363.
|
|
Pass underlying instruction to getMemoryOpCost in
VPReplicateRecipe::computeCost if UsedByLoadStoreAddress is true.
Some targets use the underlying instruction to improve costs,
and this is needed to match the legacy cost model.
Fixes https://github.com/llvm/llvm-project/issues/177780.
Fixes https://github.com/llvm/llvm-project/issues/177772.
|
|
There are some cases when PtrSCEV can be nullptr. Fall back to legacy
cost model, to not call isLoopInvariant with nullptr.
Fixes a crash after 0c4f8094939d2.
|
|
(#180275)
Reverts llvm/llvm-project#178862
revert to unblock bot:
https://lab.llvm.org/buildbot/#/builders/206/builds/13225
|
|
(#180257)"
This reverts commit cb905605b2e95f88296afe136b21a7d2476cb058.
Recommit the patch with a small change to check the destination
type matches the address type, to avoid a crash on mismatch.
Original message:
This patch updates tryToReuseLCSSAPhi to use SCEVPtrToAddr, unless using
SCEVPtrToInt allows re-use, because the IR already contains a re-usable
phi using PtrToInt.
This is a first step towards migrating to SCEVPtrToAddr and avoids
regressions in follow-up changes.
PR: https://github.com/llvm/llvm-project/pull/178727
|
|
(#180257)
Reverts llvm/llvm-project#178727
triggers asserts in on some build bots
|
|
This patch updates tryToReuseLCSSAPhi to use SCEVPtrToAddr, unless using
SCEVPtrToInt allows re-use, because the IR already contains a re-usable
phi using PtrToInt.
This is a first step towards migrating to SCEVPtrToAddr and avoids
regressions in follow-up changes.
PR: https://github.com/llvm/llvm-project/pull/178727
|
|
|
|
Add a new VPInstruction opcode to compute the exiting value of an
induction variable after vectorization. This replaces the pattern of
extracting the last lane from the last part of the induction backedge
value when applicable.
This allows us to always use the pre-computed IV end value. It will also
allow unifying end value creation for both induction resume and exit
values.
PR: https://github.com/llvm/llvm-project/pull/175651
|
|
This patch extends the support added in #158088 to loops where the
assignment is non-speculatable (e.g. a conditional load or divide).
For example, the following loop can now be vectorized:
```
int simple_csa_int_load(
int* a, int* b, int default_val, int N, int threshold)
{
int result = default_val;
for (int i = 0; i < N; ++i)
if (a[i] > threshold)
result = b[i];
return result;
}
```
It does this by extending the recurrence matching from only looking for
selects, to include phis where all operands are the header phi, except
for one which can be an arbitrary value outside the recurrence.
|
|
We have an optimization in VPPredicator when creating blends where if
all the incoming values are the same, we just return that value.
This extends it to handle cases like "phi [%x, %x, poison, %x]" by
ignoring poison values.
This is split off from #176143 to prevent regressions when maintaining
SSA by adding PHIs with a poison incoming value.
|
|
Post 49288b65 ([UTC] Add initial VPlan support, #178534), we can
generate VPlan-printing tests with UTC. Do it for one test, with the
caveat that two Final VPlan prints are no longer checked.
|
|
Use new UTC support to auto-generate some check lines to make them
easier to update in the future.
|
|
Neoverse-V3ae (#179903)
This was missing from neoverse-v3 and neoverse-v3ae, but should be
present like neoverse-v2.
|
|
Reverts llvm/llvm-project#174835, which causes clang crashes.
See
https://github.com/llvm/llvm-project/pull/174835#issuecomment-3844233831
and https://github.com/llvm/llvm-project/issues/179671 for details.
|
|
This patch restructures Find(First|Last)IV handling. Instead of
differentiating between FindLast, FindFirstIV and FindLastIV up front,
this patch simplifies the logic in IVDescriptor to just identify the
FindLast pattern up-front.
It then adds a new VPlan transformation to optimize FindLast reductions
to FindIV reductions if there is a suitable sentinel value.
Find(Last|First)IV recurrence kinds to a single FindIV kind.
This is simpler and more accurate, given selecting the first/last
induction of the final IV reduction is directly controlled by the
corresponding recurrence kind of the ComputeReductionResult.
The new structure also allows further optimizations, like vectorizing
FindLastIV with another boolean reduction that tracks if the condition
in the loop was ever true, if there is no suitable sentinel value.
PR: https://github.com/llvm/llvm-project/pull/177870
|
|
When converting phis to blends, the `VPPredicator` expects to have edge
masks to the phi node if the phi node has different incoming blocks.
This was not the case if the predecessor of the phi was a switch where a
conditional destination was the same as the default destination.
This was because when creating edge masks in `createSwitchEdgeMasks`,
edge masks are set in a loop through the *non-default* destinations. But
when there are no non-default destinations (but at least one condition,
otherwise an earlier condition would trigger and just forward the source
mask), this loop is never executed, so the masks are never set.
To resolve this, we explicitly forward the source mask for these cases
as well, which is correct because it is an unconditional branch, just a
very convoluted one.
fixes #179074
|
|
properlyDominates does not provide a strict weak ordering. Use DFS in
numbers instead, to avoid ordering violations.
|
|
Add support for extracting a VPlan from LV debug output and generalizing
matching for unnamed VPValues.
Once we have support for -vplan-print-after=xxxx we can strip the logic
to extract a VPlan manually. We cannot use regex, as we need to match
from start opening bracket to the correct closing bracket.
PR: PR: https://github.com/llvm/llvm-project/pull/178534
|
|
Make sure we find the actual select for the exit users and only use it
for the final link in the chain. This fixes a miscompile after
90b3712d8a20efa2cbaadc177da576e485dce038.
|
|
When a recipe can be safely sunk and all of its users are outside the
vector loop region in the same dedicated exit block, the recipe does not
need to be executed on every iteration.
This patch extends the VPlan-based LICM (Loop Invariant Code Motion) to
also sink such recipes from the vector loop region into the exit block.
This reduces redundant computation and improves cost model accuracy.
TODO: Support nested loop sinking
TODO: Support sinking `VPReplicateRecipe` (requires `replicateByVF`
fixes)
TODO: Support recipes with multiple defined values (e.g., interleaved
loads)
TODO: Clone recipes without users to all exit blocks
TODO: Support PHI node users by checking incoming value blocks
TODO: Support sinking when users are in multiple blocks
TODO: Clone recipes when users are on multiple exit paths
Co-authored-by: Luke Lau <luke@igalia.com>
---------
Co-authored-by: Luke Lau <luke@igalia.com>
Co-authored-by: Luke Lau <luke_lau@icloud.com>
|
|
PR: https://github.com/llvm/llvm-project/pull/177887
|