Age | Commit message (Collapse) | Author | Files | Lines |
|
(#160053)
Update VPReplicateRecipe::computeCost to compute costs of more
replicating loads/stores.
There are 2 cases that require extra checks to match the legacy cost
model:
1. If the pointer is based on an induction, the legacy cost model passes
its SCEV to getAddressComputationCost. In those cases, still fall back
to the legacy cost. SCEV computations will be added as a follow-up.
2. If a load is used as part of an address of another load, the legacy
cost model skips the scalarization overhead. Those cases are currently
handled by a usedByLoadOrStore helper.
Note that getScalarizationOverhead also needs updating, because when the
legacy cost model computes the scalarization overhead, scalars have not
been collected yet, so we can't check for replicating recipes to skip
their cost, except other loads. This again can be further improved by
modeling inserts/extracts explicitly and consistently, and computing costs
for those operations directly where needed.
PR: https://github.com/llvm/llvm-project/pull/160053
|
|
The initial implementation used a very crude check where a value was
considered ephemeral if it has only one use. This is insufficient if
there are multiple assumes acting on the same value, or in more complex
cases like cyclic phis.
Generalize this to a more typical ephemeral value check, i.e. make sure
that all transitive users are in assumes, while stopping at
side-effecting instructions.
|
|
(#161020)
Fixes #160066
Whenever we have a vector with all the same elements, created with
`insertelement` and `shufflevector`, and we sum the vector, the sum is
equivalent to a multiplication.
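
The identity behind this fold can be checked with a small Python model of wrapping integer arithmetic. The helper names and the 32-bit width are assumptions made for illustration; the patch itself operates on LLVM IR.

```python
def splat(x, lanes):
    """Model of a vector whose lanes all hold the same scalar."""
    return [x] * lanes


def reduce_add(vec, bits=32):
    """Model of an integer add reduction with wraparound at `bits`."""
    return sum(vec) % (1 << bits)


def folded(x, lanes, bits=32):
    """The equivalent scalar form: multiply by the lane count."""
    return (x * lanes) % (1 << bits)
```

Both forms agree even when the sum wraps, since wrapping addition and multiplication are computed modulo the same power of two.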
|
|
Extend replaceSymbolicStrides to also replace SCEVUnknowns in
VPExpandSCEVExprs using the information from StridesMaps.
This results in simpler SCEV expansions in some cases.
|
|
The VPlan cost model is not used to compute costs of scalar VFs
currently, as conversion to replicate regions makes accurately computing
the original scalar cost difficult.
Remove leftover dead code.
|
|
Only VPlan pattern matching is used in the file, so move the using
statement to the top level.
|
|
In order to avoid conflating the legacy CSE with the VPlan-based one,
rename the legacy CSE and insert a FIXME to clarify the nature of the
legacy CSE.
|
|
CSE may replace multiple redundant broadcasts of EVL with a single
broadcast which may have more than 1 user. Adjust the verifier to allow
this.
Fixes a crash when building llvm-test-suite with EVL:
https://lab.llvm.org/buildbot/#/builders/210/builds/3303
|
|
This enables additional DCE/CSE opportunities and ensures that we don't
end up with multiple redundant users of a VPInstruction using EVL. It
fixes a verifier error in the added test_3_inductions test.
|
|
(#160628)
The rotate transformation from
https://github.com/llvm/llvm-project/blob/72c04bb882ad70230bce309c3013d9cc2c99e9a7/llvm/lib/CodeGen/SelectionDAG/DAGCombiner.cpp#L10312-L10337
has no middle-end equivalent in InstCombine. The following is a port of
that transformation to InstCombine.
---------
Co-authored-by: Yingwei Zheng <dtcxzyw@qq.com>
|
|
Propagate `!prof` from `switch` instructions.
Issue #147390
|
|
This patch is based on https://github.com/llvm/llvm-project/pull/159713
This patch extends AddressSanitizer to support indexed/segment
instructions in RVV. It enables proper instrumentation for these memory
operations.
A new member, `MaybeOffset`, is added to `InterestingMemoryOperand` to
describe the offset between the base pointer and the actual memory
reference address.
Co-authored-by: Yeting Kuo <yeting.kuo@sifive.com>
|
|
Additional CSE opportunities are exposed after converting to concrete
recipes/dissolving regions and materializing various expressions. Run
CSE later, to capitalize on some of the late opportunities.
PR: https://github.com/llvm/llvm-project/pull/160572
|
|
The strcmp/strncmp inliner creates new conditional branches but was
failing to add profile metadata. This caused the ProfileVerifierPass to
fail when profcheck is enabled.
This patch fixes the issue by explicitly adding unknown branch weights
to these branches.
Issue #147390
|
|
I ran into this crash when #158690 caused a loop with a struct call to
be vectorized.
If we have a replicate recipe in a branch-on-mask predicated region
that's used by a widened recipe in another block then it will be packed
together with the other lanes via a VPPredInstPHIRecipe.
If we're replicating a call with a struct return type then we currently
crash. The code that handles structs in packScalarIntoVectorizedValue
seemed to be untested at least on test/Transforms/LoopVectorize.
There are two places that need to be fixed. The poison value that the
scalar is packed into needs to use toVectorizedTy to correctly handle
structs (not to be confused with toVectorTy!)
The other is that VPPredInstPHIRecipe expects its operand to be an
InsertElementInstr when stringing together the different lanes. For
structs this will be an InsertValueInstr, and the value for the previous
lane will be at the back of a chain of InsertValueInstrs.
|
|
Uses the updated handleAVX512VectorGenericMaskedFP() from
https://github.com/llvm/llvm-project/pull/159966
|
|
Need to find the last insertelement instruction in the list for the
copyable arguments, otherwise a wrong def-use chain may be built.
Fixes #160671
|
|
Loop fusion pass will use the information provided by the recent
DA patch to fuse additional legal loops, including those with
forward loop-carried dependencies.
|
|
(#159861)
Because we may prune differing amounts of call context for different
allocation contexts during matching (we only keep enough call context to
distinguish cold from noncold paths), we can end up with different
numbers of callsite node clones for different callsites in the same
function. Any callsites that don't have node clones for all function
clones should have their copies in those other function clones updated
the same way as the version in the original function, which might be
calling a clone of the callsite.
|
|
Some LLVM passes need access to the filesystem to read configuration
files and similar. In some places, this is achieved by grabbing the VFS
from `PGOOptions`, but some passes don't have access to these and resort
to just calling `vfs::getRealFileSystem()`. This PR allows setting the
VFS directly on `PassBuilder` that's able to pass it down to all passes
that need it.
|
|
It seems like we have a bunch of align 1 assumptions in practice and
unless I am missing something they should not add any value.
See https://github.com/dtcxzyw/llvm-opt-benchmark/pull/2861/files
PR: https://github.com/llvm/llvm-project/pull/160695
|
|
Closes https://github.com/llvm/llvm-project/issues/160507.
Note: Replacing other users except for `ExtElt` is a bit strange to me.
I tried to only replace `ExtElt` with a new extractelement, but it
caused regressions on `widen_extract2/3`.
|
|
(#160640)
Reapplies #159686
This reverts commit 4f33d7b7a9f39d733b7572f9afbf178bca8da127.
The original landing of this patch had an issue where it would try to
hoist allocas into the entry block that were already in the entry block.
This would end up actually moving them lower in the block, potentially
after users, resulting in invalid IR.
This update fixes this by ensuring that we are only hoisting static
allocas that have been sunk into a split basic block. A regression test
has been added.
Integration tested using a three stage build of clang with IRPGO
enabled.
|
|
(#149049)
If a direction vector with all `*` elements, like `[* * *]`, is present,
it indicates that none of the loop pairs are legal to interchange. In
such cases, continuing the analysis is meaningless.
This patch introduces a check to detect such direction vectors and exits
early when one is found. This slightly reduces compile time.
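
The early exit amounts to a scan of the dependency matrix. A minimal Python sketch, assuming the matrix is represented as a list of direction vectors holding one-character strings (a hypothetical encoding chosen for illustration):

```python
def has_all_star_vector(dep_matrix):
    """True if some non-empty direction vector consists only of '*' entries."""
    return any(vec and all(d == "*" for d in vec) for vec in dep_matrix)
```

With this encoding, `[["<", "="], ["*", "*"]]` would trigger the early exit, while `[["<", "*"], ["=", ">"]]` would not.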
|
|
Logic copied from the select case.
Fixes #160302
|
|
Make sure that we set the correct wrap flags when creating new
VPWidenCastRecipes for truncs and preserve the flags from the recipe
directly when cloning, to make sure they are not dropped.
Fixes https://github.com/llvm/llvm-project/issues/160396
|
|
This extends the DropUnnecessaryAssumes pass to also handle operand
bundle assumes. For this purpose, export the affected value analysis for
operand bundles from AssumptionCache.
If the bundle only affects ephemeral values, drop it. If all bundles on
an assume are dropped, drop the whole assume.
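
The two rules above can be modeled in a few lines of Python. The function name, the bundle representation, and the `only_affects_ephemeral` predicate are hypothetical; returning `None` stands in for erasing the whole assume.

```python
def prune_bundles(bundles, only_affects_ephemeral):
    """Keep bundles that affect at least one non-ephemeral value.

    Returns the surviving bundles, or None when every bundle was dropped,
    which models dropping the assume itself.
    """
    kept = [b for b in bundles if not only_affects_ephemeral(b)]
    return kept if kept else None
```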
|
|
Move creation of the minimum iteration check for the epilogue vector
loop to VPlan. This is a first step towards breaking up and moving
skeleton creation for epilogue vectorization to VPlan.
It moves most logic out of EpilogueVectorizerEpilogueLoop: the minimum
iteration check is created directly in VPlan, connecting the check
blocks from the main vector loop is done as post-processing. Next steps
are to move connecting and updating the branches from the check blocks
to VPlan, as well as updating the incoming values for phis.
Test changes are improvements due to folding of live-ins.
PR: https://github.com/llvm/llvm-project/pull/157545
|
|
Initially this was needed to replace the fixed-step canonical IV with
the variable-step EVL IV, but this was eventually superseded by the loop
vectorizer doing this transform itself in #147222. The pass was then
removed from the RISC-V pipeline in #151483 and the loop vectorizer
stopped emitting the metadata used by the pass in #155760, so now
there are no users of it.
|
|
This generalizes handleAVX512VectorGenericMaskedFP() (introduced in
#158397), to potentially handle intrinsics that have A/WriteThru/Mask in
an operand order that is different to AVX512/AVX10 rcp and rsqrt. Any
operands other than A and WriteThru must be fully initialized.
For example, the generalized handler could be applied in follow-up work
to many of the AVX512 rndscale intrinsics:
```
<32 x half> @llvm.x86.avx512fp16.mask.rndscale.ph.512(<32 x half>, i32, <32 x half>, i32, i32)
<16 x float> @llvm.x86.avx512.mask.rndscale.ps.512(<16 x float>, i32, <16 x float>, i16, i32)
<8 x double> @llvm.x86.avx512.mask.rndscale.pd.512(<8 x double>, i32, <8 x double>, i8, i32)
; operands, in order: A, Imm, WriteThru, Mask, Rounding
<8 x float> @llvm.x86.avx512.mask.rndscale.ps.256(<8 x float>, i32, <8 x float>, i8)
<4 x float> @llvm.x86.avx512.mask.rndscale.ps.128(<4 x float>, i32, <4 x float>, i8)
<4 x double> @llvm.x86.avx512.mask.rndscale.pd.256(<4 x double>, i32, <4 x double>, i8)
<2 x double> @llvm.x86.avx512.mask.rndscale.pd.128(<2 x double>, i32, <2 x double>, i8)
; operands, in order: A, Imm, WriteThru, Mask
```
|
|
Set extend kinds together with ExtOpTypes. This will make it easier to
adjust the extend kind handling.
|
|
There are cases where the easiest way to regression-test a profile change is to add `!prof` metadata, with small numbers so as to simplify manual verification. Inserting that metadata by hand to ensure coverage can become tedious. This patch makes `prof-inject` do that for us when opted in.
The list of weights used is a bunch of primes, used as a circular buffer.
Issue #147390
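
A circular buffer of primes can be sketched as follows. This is a Python illustration only: the class name and the particular primes are assumptions, not the values used by `prof-inject`.

```python
class WeightSource:
    """Hands out weights from a fixed list, wrapping around at the end."""

    def __init__(self, primes):
        self._primes = list(primes)
        self._next = 0  # index of the next weight to hand out

    def take(self, count):
        """Return `count` weights, continuing where the last call stopped."""
        out = []
        for _ in range(count):
            out.append(self._primes[self._next])
            self._next = (self._next + 1) % len(self._primes)
        return out
```

Because the cursor persists across calls, consecutive branches receive different small weights, which keeps expected values easy to verify by hand.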
|
|
Selects can be folded into masked loads if the masks are identical.
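
Why identical masks make the fold valid can be seen in a lane-wise Python model. The helpers below are hypothetical stand-ins for the masked load and select operations, with lists playing the role of vectors.

```python
def masked_load(memory, mask, passthru):
    """Per lane: read memory where the mask is set, else take passthru."""
    return [m if bit else p for m, bit, p in zip(memory, mask, passthru)]


def select(mask, on_true, on_false):
    """Per lane: pick on_true where the mask is set, else on_false."""
    return [t if bit else f for bit, t, f in zip(mask, on_true, on_false)]
```

In this model, `select(mask, masked_load(memory, mask, anything), other)` yields the same lanes as `masked_load(memory, mask, other)`, because the inactive lanes of the load are always overwritten by the select.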
|
|
Assume operand bundles are emitted in a few more places now, including
in various places in libc++. Add a dedicated ID for them.
PR: https://github.com/llvm/llvm-project/pull/158078
|
|
(#158603)
Check if the scale-factor of the accumulator is the same as the requested
ScaleFactor in tryToCreatePartialReductions.
This prevents creating partial reductions if not all instructions in the
reduction chain form partial reductions, e.g. because we do not form a
partial reduction for the loop exit instruction.
Currently code-gen works fine, because the scale factor of
VPPartialReduction is not used during ::execute, but it means we compute
incorrect cost/register pressure, because the partial reduction won't
reduce to the specified scaling factor.
PR: https://github.com/llvm/llvm-project/pull/158603
|
|
ResultElem stores a weak handle of an assume, plus an index for
referring to a specific operand bundle. This makes sense for the results
of assumptionsFor(), which refers to specific operands of assumes.
However, assumptions() is a plain list of assumes. It does *not* contain
separate entries for each operand bundle. The operand bundle index is
always ExprResultIdx.
As such, we should be directly using WeakVH for this case, without the
additional wrapper.
|
|
Invariant stores of reductions are removed early in the VPlan
construction, and there is no reason to ignore them while costing.
|
|
llvm.coro.end (#155339)" (#159278)
As mentioned in #151067, the current design of llvm.coro.end mixes two functionalities: querying where we are and lowering to some code. This patch separates these functionalities into independent intrinsics by introducing a new intrinsic llvm.coro.is_in_ramp.
Update a test in inline/ML, Reapply #155339
|
|
In some cases, safe-divisor selects can be hoisted out of the vector
loop. Catching all cases in the legacy cost model isn't possible, in
particular checking if all conditions guarding a division are loop
invariant.
Instead, check in planContainsAdditionalSimplifications if there are any
hoisted safe-divisor selects. If so, don't compare to the more
inaccurate legacy cost model.
Fixes https://github.com/llvm/llvm-project/issues/160354.
Fixes https://github.com/llvm/llvm-project/issues/160356.
|
|
This is an overly broad check; the transformation made here can be done
safely for pointers with index!=repr width. This fixes the codegen
regression introduced by https://github.com/llvm/llvm-project/pull/105735
and should be beneficial for AMDGPU code-generation once the datalayout
there no longer uses the overly strict `ni:` specifier.
Reviewed By: arsenm
Pull Request: https://github.com/llvm/llvm-project/pull/159890
|
|
Move size checks inside `isStridedLoad`. In the future we plan to
possibly change the size and type of strided load there.
|
|
(#159686)"
This reverts commit a00450944d2a91aba302954556c1c23ae049dfc7.
Looks like this one is actually breaking the buildbots. Reverting the switch back
to IRPGO did not fix things.
|
|
embed in MemIntrinsicInfo #157863 (#159713)
[Previously reverted due to failures on asan-rvv-intrinsics.ll; the test
case is riscv only and it was triggered on other targets]
Reland [#157863](https://github.com/llvm/llvm-project/pull/157863), and
add `; REQUIRES: riscv-registered-target` in the test case to skip
configurations that don't register the riscv target.
Previously, asan considered target intrinsics black boxes, so it could
not instrument accurate checks. This patch makes
SmallVector<InterestingMemoryOperand> a member of MemIntrinsicInfo so
that TTI can let targets describe their intrinsic information to asan.
Note,
1. This patch moves InterestingMemoryOperand from Transforms to Analysis.
2. Extend MemIntrinsicInfo by adding a
SmallVector<InterestingMemoryOperand> member.
3. This patch does not support RVV indexed/segment load/store.
|
|
This patch removes the metadata emission for EVL‑vectorized loops,
since there is no current in-tree consumer:
1) after VPlan performs canonical IV replacement #147222 and
2) RISCV dropped EVLIndVarSimplifyPass #151483, which was the only user
of this metadata.
|
|
ControlHeightReduction will duplicate some blocks and insert phi nodes
in exit blocks of regions that it operates on for any live values. This
includes allocas. Having a lifetime annotation refer to a phi node was
made illegal in 92c55a315eab455d5fed2625fe0f61f88cb25499, which causes
the verifier to fail after CHR.
There are some cases where we might not need to drop lifetime
annotations (usually because we do not need the phi to begin with), but
drop all annotations for now to be conservative.
Fixes #159621.
|
|
Summary:
The changes made in https://github.com/llvm/llvm-project/pull/156057
allow the alignment value to be increased. We assert effectively
infinite alignment when the pointer argument is invalid / null. The
problem is that for whatever reason the masked load / store functions
use i32 for their alignment value which means this gets truncated to
zero.
Add a special check for this. Long term we probably want to just remove
this argument entirely.
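
The truncation can be reproduced with plain integer arithmetic. In this Python sketch the "effectively infinite" alignment is assumed to be 2^32; that constant and the helper name are assumptions for illustration, not taken from the patch.

```python
I32_MASK = (1 << 32) - 1

# Assumed stand-in for the "effectively infinite" alignment value.
ASSUMED_MAX_ALIGNMENT = 1 << 32


def trunc_to_i32(value):
    """Keep only the low 32 bits, as storing into an i32 would."""
    return value & I32_MASK
```

Under that assumption `trunc_to_i32(ASSUMED_MAX_ALIGNMENT)` is zero, while ordinary alignments such as 16 pass through unchanged, which is why the large value needs separate handling.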
|
|
|
|
|
|
(#159765)
Fixes #159571
|
|
This ensures each scalarized member has an accurate cost, matching the
cost it would have had if it had not been considered for an
interleave group.
|