path: root/llvm/lib/Transforms
Age | Commit message | Author | Files | Lines
9 hours[VPlan] Compute cost of more replicating loads/stores in ::computeCost. ↵Florian Hahn4-27/+130
(#160053) Update VPReplicateRecipe::computeCost to compute costs of more replicating loads/stores. There are 2 cases that require extra checks to match the legacy cost model:
1. If the pointer is based on an induction, the legacy cost model passes its SCEV to getAddressComputationCost. In those cases, still fall back to the legacy cost. SCEV computations will be added as a follow-up.
2. If a load is used as part of an address of another load, the legacy cost model skips the scalarization overhead. Those cases are currently handled by a usedByLoadOrStore helper.
Note that getScalarizationOverhead also needs updating, because when the legacy cost model computes the scalarization overhead, scalars have not been collected yet, so we can't check for replicating recipes to skip their cost, except other loads. This again can be further improved by modeling inserts/extracts explicitly and consistently, and computing costs for those operations directly where needed. PR: https://github.com/llvm/llvm-project/pull/160053
10 hours[DropUnnecessaryAssumes] Make the ephemeral value check more precise (#160700)Nikita Popov1-7/+43
The initial implementation used a very crude check where a value was considered ephemeral if it has only one use. This is insufficient if there are multiple assumes acting on the same value, or in more complex cases like cyclic phis. Generalize this to a more typical ephemeral value check, i.e. make sure that all transitive users are in assumes, while stopping at side-effecting instructions.
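The generalized check can be sketched on a toy use graph. This is a minimal Python model, not the pass's implementation: the `users` map, the `assumes` and `side_effecting` sets, and the function name are all hypothetical, and treating a side-effecting user as "not ephemeral" is one reading of "stopping at side-effecting instructions".

```python
def is_ephemeral(value, users, assumes, side_effecting):
    """Return True if every transitive user of `value` ends in an assume.

    users:          dict mapping a value name to the names of its direct users
    assumes:        set of names that are assume calls
    side_effecting: set of names with observable effects (stores, calls, ...)
    """
    worklist = list(users.get(value, ()))
    seen = set()
    while worklist:
        user = worklist.pop()
        if user in seen:
            continue  # already visited; this also terminates on cyclic phis
        seen.add(user)
        if user in assumes:
            continue  # this use chain ends in an assume, which is fine
        if user in side_effecting:
            return False  # the value feeds something observable
        worklist.extend(users.get(user, ()))
    return True
```

With this model, a value feeding two different assumes (which a "has only one use" check rejects) and a value inside a phi cycle are both classified as ephemeral, while a value that also reaches a store is not.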
19 hours[InstCombine] Transform `vector.reduce.add` and `splat` into multiplication ↵Gábor Spaits1-0/+12
(#161020) Fixes #160066 Whenever we have a vector with all the same elements, created with `insertelement` and `shufflevector`, and we sum the vector, the sum is equivalent to a multiplication.
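The arithmetic behind this fold can be checked with a small model. The sketch below is plain Python, not LLVM code: the function names are hypothetical, and wrapping integer arithmetic at `bits` bits is assumed to mirror the integer `add` and `mul` semantics.

```python
def reduce_add_of_splat(x, lanes, bits=32):
    """Sum `lanes` copies of x with wrap-around at `bits` bits."""
    mask = (1 << bits) - 1
    total = 0
    for _ in range(lanes):
        total = (total + x) & mask
    return total


def multiply_by_lane_count(x, lanes, bits=32):
    """The folded form: one multiplication by the number of lanes."""
    return (x * lanes) & ((1 << bits) - 1)
```

Both functions agree for every input because adding the same value `lanes` times modulo 2^bits is the same as multiplying by `lanes` modulo 2^bits.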
20 hours[VPlan] Rewrite VPExpandSCEVExprs in replaceSymbolicStrides.Florian Hahn1-0/+17
Extend replaceSymbolicStrides to also replace SCEVUnknowns in VPExpandSCEVExprs using the information from StridesMaps. This results in simpler SCEV expansions in some cases.
24 hours[VPlan] Remove dead code for scalar VFs in VPRegionBlock::cost (NFC).Florian Hahn1-12/+3
The VPlan cost model is not used to compute costs of scalar VFs currently, as conversion to replicate regions makes accurately computing the original scalar cost difficult. Remove left over, dead code.
31 hours[VPlan] Move using VPlanPatternMatch to top in VPlanUtils.cpp (NFC).Florian Hahn1-3/+1
Only VPlan pattern matching is used in the file, move the using statement to the top level.
32 hours[LV] Clarify nature of legacy CSE (NFC) (#160855)Ramkumar Ramachandra1-3/+4
In order to avoid conflating the legacy CSE with the VPlan-based one, rename the legacy CSE and insert a FIXME to clarify the nature of the legacy CSE.
44 hours[VPlan] Allow multiple users of (broadcast %evl).Florian Hahn1-1/+2
CSE may replace multiple redundant broadcasts of EVL with a single broadcast which may have more than 1 user. Adjust the verifier to allow this. Fixes a crash when building llvm-test-suite with EVL: https://lab.llvm.org/buildbot/#/builders/210/builds/3303
45 hours[VPlan] Mark VPInstruction::Broadcast as not reading/writing memory.Florian Hahn1-0/+1
This enables additional DCE/CSE opportunities and ensures that we don't end up with multiple redundant users of a VPInstruction using EVL. It fixes a verifier error in the added test_3_inductions test.
3 days[InstCombine] Rotate transformation port from SelectionDAG to InstCombine ↵Axel Sorenson1-0/+16
(#160628) The rotate transformation from https://github.com/llvm/llvm-project/blob/72c04bb882ad70230bce309c3013d9cc2c99e9a7/llvm/lib/CodeGen/SelectionDAG/DAGCombiner.cpp#L10312-L10337 has no middle-end equivalent in InstCombine. The following is a port of that transformation to InstCombine. --------- Co-authored-by: Yingwei Zheng <dtcxzyw@qq.com>
3 days[profcheck][SimplifyCFG] Propagate !prof from `switch` to `select` (#159645)Mircea Trofin1-11/+75
Propagate `!prof` from `switch` instructions. Issue #147390
3 days[ASan][RISCV] Teach AddressSanitizer to support indexed load/store. (#160443)Hank Chang1-0/+19
This patch is based on https://github.com/llvm/llvm-project/pull/159713 This patch extends AddressSanitizer to support indexed/segment instructions in RVV. It enables proper instrumentation for these memory operations. A new member, `MaybeOffset`, is added to `InterestingMemoryOperand` to describe the offset between the base pointer and the actual memory reference address. Co-authored-by: Yeting Kuo <yeting.kuo@sifive.com>
3 days[VPlan] Run CSE closer to VPlan::execute. (#160572)Florian Hahn1-1/+1
Additional CSE opportunities are exposed after converting to concrete recipes/dissolving regions and materializing various expressions. Run CSE later, to capitalize on some of the late opportunities. PR: https://github.com/llvm/llvm-project/pull/160572
4 days[profcheck] Add unknown branch weights for inlined strcmp/strncmp (#160455)Jin Huang1-4/+13
The strcmp/strncmp inliner creates new conditional branches but was failing to add profile metadata. This caused the ProfileVerifierPass to fail when profcheck is enabled. This patch fixes the issue by explicitly adding unknown branch weights to these branches. Issue #147390
4 days[VPlan] Fix packed replication of struct types (#160274)Luke Lau1-6/+17
I ran into this crash when #158690 caused a loop with a struct call to be vectorized. If we have a replicate recipe in a branch-on-mask predicated region that's used by a widened recipe in another block then it will be packed together with the other lanes via a VPPredInstPHIRecipe. If we're replicating a call with a struct return type then we currently crash. The code that handles structs in packScalarIntoVectorizedValue seemed to be untested at least on test/Transforms/LoopVectorize. There are two places that need to be fixed. The poison value that the scalar is packed into needs to use toVectorizedTy to correctly handle structs (not to be confused with toVectorTy!). The other is that VPPredInstPHIRecipe expects its operand to be an InsertElementInstr when stringing together the different lanes. For structs this will be an InsertValueInstr, and the value for the previous lane will be at the back of a chain of InsertValueInstrs.
4 days[msan] Handle AVX512/AVX10 vrndscale (#160624)Thurston Dang1-0/+56
Uses the updated handleAVX512VectorGenericMaskedFP() from https://github.com/llvm/llvm-project/pull/159966
4 days[SLP]Correctly set the insert point for insertelements with copyable argumentsAlexey Bataev1-2/+9
Need to find the last insertelement instruction in the list for the copyable arguments, otherwise a wrong def-use chain may be built. Fixes #160671
4 days[LoopFusion] Detecting legal dependencies for fusion using DA info (#146383)Alireza Torabian1-0/+42
Loop fusion pass will use the information provided by the recent DA patch to fuse additional legal loops, including those with forward loop-carried dependencies.
4 days[MemProf] Make sure call clones without callsite node clones get updated ↵Teresa Johnson1-0/+115
(#159861) Because we may prune differing amounts of call context for different allocation contexts during matching (we only keep enough call context to distinguish cold from noncold paths), we can end up with different numbers of callsite node clones for different callsites in the same function. Any callsites that don't have node clones for all function clones should have their copies in those other function clones updated the same way as the version in the original function, which might be calling a clone of the callsite.
4 days[llvm] Add `vfs::FileSystem` to `PassBuilder` (#160188)Jan Svoboda1-6/+6
Some LLVM passes need access to the filesystem to read configuration files and similar. In some places, this is achieved by grabbing the VFS from `PGOOptions`, but some passes don't have access to these and resort to just calling `vfs::getRealFileSystem()`. This PR allows setting the VFS directly on `PassBuilder` that's able to pass it down to all passes that need it.
4 days[InstCombine] Remove redundant align 1 assumptions. (#160695)Florian Hahn1-0/+4
It seems like we have a bunch of align 1 assumptions in practice and unless I am missing something they should not add any value. See https://github.com/dtcxzyw/llvm-opt-benchmark/pull/2861/files PR: https://github.com/llvm/llvm-project/pull/160695
4 days[InstCombine] Skip replaceExtractElements for ConstantData (#160575)Yingwei Zheng1-1/+5
Closes https://github.com/llvm/llvm-project/issues/160507. Note: Replacing other users except for `ExtElt` is a bit strange to me. I tried to only replace `ExtElt` with a new extractelement, but it caused regressions on `widen_extract2/3`.
4 daysReapply "[ControlHeightReduction] Drop lifetime annotations where necessary" ↵Aiden Grossman1-8/+37
(#160640) Reapplies #159686 This reverts commit 4f33d7b7a9f39d733b7572f9afbf178bca8da127. The original landing of this patch had an issue where it would try and hoist allocas into the entry block that were in the entry block. This would end up actually moving them lower in the block potentially after users, resulting in invalid IR. This update fixes this by ensuring that we are only hoisting static allocas that have been sunk into a split basic block. A regression test has been added. Integration tested using a three stage build of clang with IRPGO enabled.
4 days[LoopInterchange] Bail out when finding a dependency with all `*` elements ↵Ryotaro Kasuga1-0/+11
(#149049) If a direction vector with all `*` elements, like `[* * *]`, is present, it indicates that none of the loop pairs are legal to interchange. In such cases, continuing the analysis is meaningless. This patch introduces a check to detect such direction vectors and exits early when one is found. This slightly reduces compile time.
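The early exit amounts to a scan of the dependency matrix. The following Python sketch assumes direction vectors are represented as lists of one-character strings; that representation and the function name are illustrative only and are not taken from the pass.

```python
def has_all_star_vector(direction_vectors):
    """Return True if some direction vector consists solely of '*' entries."""
    return any(
        len(vector) > 0 and all(entry == "*" for entry in vector)
        for vector in direction_vectors
    )
```

A caller would run this once before the per-pair legality analysis and bail out when it returns True, since such a vector rules out every loop pair.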
4 daysInstCombine: Check GEP operand is available (#160438)Matt Arsenault1-2/+12
Logic copied from the select case. Fixes #160302
4 days[VPlan] Set correct flags when creating and cloning VPWidenCastRecipe.Florian Hahn3-10/+15
Make sure that we set the correct wrap flags when creating new VPWidenCastRecipes for truncs and preserve the flags from the recipe directly when cloning, to make sure they are not dropped. Fixes https://github.com/llvm/llvm-project/issues/160396
4 days[DropUnnecessaryAssumes] Add support for operand bundles (#160311)Nikita Popov1-12/+60
This extends the DropUnnecessaryAssumes pass to also handle operand bundle assumes. For this purpose, export the affected value analysis for operand bundles from AssumptionCache. If the bundle only affects ephemeral values, drop it. If all bundles on an assume are dropped, drop the whole assume.
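The two drop rules can be sketched as a small filter. This is a hedged Python model rather than the pass's code: the dictionary layout of a bundle, the `is_ephemeral` callback, and the choice to conservatively keep bundles with no affected values are all assumptions made for illustration.

```python
def prune_assume_bundles(bundles, is_ephemeral):
    """Drop bundles that only affect ephemeral values.

    bundles:      list of dicts with an "affected" list of value names
    is_ephemeral: callable telling whether a value is ephemeral
    Returns the remaining bundles, or None if the whole assume can be dropped.
    """
    kept = []
    for bundle in bundles:
        affected = bundle["affected"]
        # Assumption: a bundle with no affected values is kept, to be safe.
        if affected and all(is_ephemeral(v) for v in affected):
            continue  # only ephemeral values are affected: drop the bundle
        kept.append(bundle)
    return kept if kept else None
```

Returning `None` models the last rule in the message: once every bundle on an assume has been dropped, the assume itself goes away.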
4 days[VPlan] Create epilogue minimum iteration check in VPlan. (#157545)Florian Hahn3-144/+187
Move creation of the minimum iteration check for the epilogue vector loop to VPlan. This is a first step towards breaking up and moving skeleton creation for epilogue vectorization to VPlan. It moves most logic out of EpilogueVectorizerEpilogueLoop: the minimum iteration check is created directly in VPlan, connecting the check blocks from the main vector loop is done as post-processing. Next steps are to move connecting and updating the branches from the check blocks to VPlan, as well as updating the incoming values for phis. Test changes are improvements due to folding of live-ins. PR: https://github.com/llvm/llvm-project/pull/157545
5 days[LV] Remove EVLIndVarSimplify pass (#160454)Luke Lau2-301/+0
Initially this was needed to replace the fixed-step canonical IV with the variable-step EVL IV, but this was eventually superseded by the loop vectorizer doing this transform itself in #147222. The pass was then removed from the RISC-V pipeline in #151483 and the loop vectorizer stopped emitting the metadata used by the pass in #155760, so now there are no users of it.
5 days[msan][NFCI] Generalize handleAVX512VectorGenericMaskedFP() operands (#159966)Thurston Dang1-16/+38
This generalizes handleAVX512VectorGenericMaskedFP() (introduced in #158397), to potentially handle intrinsics that have A/WriteThru/Mask in an operand order that is different to AVX512/AVX10 rcp and rsqrt. Any operands other than A and WriteThru must be fully initialized. For example, the generalized handler could be applied in follow-up work to many of the AVX512 rndscale intrinsics:
```
<32 x half> @llvm.x86.avx512fp16.mask.rndscale.ph.512(<32 x half>, i32, <32 x half>, i32, i32)
<16 x float> @llvm.x86.avx512.mask.rndscale.ps.512(<16 x float>, i32, <16 x float>, i16, i32)
<8 x double> @llvm.x86.avx512.mask.rndscale.pd.512(<8 x double>, i32, <8 x double>, i8, i32)
                                                   A             Imm  WriteThru     Mask Rounding
<8 x float> @llvm.x86.avx512.mask.rndscale.ps.256(<8 x float>, i32, <8 x float>, i8)
<4 x float> @llvm.x86.avx512.mask.rndscale.ps.128(<4 x float>, i32, <4 x float>, i8)
<4 x double> @llvm.x86.avx512.mask.rndscale.pd.256(<4 x double>, i32, <4 x double>, i8)
<2 x double> @llvm.x86.avx512.mask.rndscale.pd.128(<2 x double>, i32, <2 x double>, i8)
                                                   A             Imm  WriteThru     Mask
```
5 days[LV] Set extend kinds together with ExtOpTypes (NFC).Florian Hahn1-10/+7
Set extend kinds together with ExtOpTypes. This will make it easier to adjust the extend kind handling.
5 days[profcheck] Option to inject distinct small weights (#159644)Mircea Trofin1-29/+47
There are cases where the easiest way to regression-test a profile change is to add `!prof` metadata, with small numbers so as to simplify manual verification. To ensure coverage, this (the inserting) may become tedious. This patch makes `prof-inject` do that for us, if so opted in. The list of weights used is a bunch of primes, used as a circular buffer. Issue #147390
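The "primes as a circular buffer" idea can be illustrated as follows. This is a sketch under stated assumptions: the particular primes, the function name, and the decision to keep advancing through the buffer from one branch to the next are hypothetical and not taken from the patch.

```python
from itertools import cycle, islice

# Hypothetical weight list; the primes the pass actually uses are not stated here.
PRIMES = (2, 3, 5, 7, 11, 13)


def inject_weights(successor_counts, primes=PRIMES):
    """Assign distinct small weights to each branch from a circular buffer.

    successor_counts: number of successors of each branch, in program order
    """
    source = cycle(primes)  # wraps around when the list is exhausted
    return [list(islice(source, count)) for count in successor_counts]
```

Small, pairwise-distinct weights make it easy to see by eye in a test which original branch a propagated `!prof` value came from.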
5 days[InstCombine] Fold selects into masked loads (#160522)Matthew Devereau1-0/+10
Selects can be folded into masked loads if the masks are identical.
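Why identical masks make the fold valid can be seen in a lane-by-lane model. The Python below is a toy, not the InstCombine code: vectors are lists, `memory` stands in for the loaded lanes, and the exact preconditions of the real fold (for example on the load's original passthru) are not stated in the message and are assumed away here.

```python
def masked_load(memory, mask, passthru):
    """Per lane: take the memory value if the mask bit is set, else passthru."""
    return [m if bit else p for m, bit, p in zip(memory, mask, passthru)]


def select(mask, if_true, if_false):
    """Per lane: pick from if_true where the mask bit is set, else if_false."""
    return [t if bit else f for bit, t, f in zip(mask, if_true, if_false)]


def before_fold(memory, mask, load_passthru, other):
    # select(mask, masked.load(ptr, mask, load_passthru), other)
    return select(mask, masked_load(memory, mask, load_passthru), other)


def after_fold(memory, mask, other):
    # masked.load(ptr, mask, other)
    return masked_load(memory, mask, other)
```

In enabled lanes both forms yield the loaded value, and in disabled lanes both yield `other`, so the load's original passthru never reaches the result.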
5 days[LLVMContext] Add OB_align assume bundle op ID. (#158078)Florian Hahn1-1/+1
Assume operand bundles are emitted in a few more places now, including used in various places in libc++. Add a dedicated ID for them. PR: https://github.com/llvm/llvm-project/pull/158078
5 days[LV] Don't create partial reductions if factor doesn't match accumulator ↵Florian Hahn4-17/+29
(#158603) Check if the scale-factor of the accumulator is the same as the requested ScaleFactor in tryToCreatePartialReductions. This prevents creating partial reductions if not all instructions in the reduction chain form partial reductions, e.g. because we do not form a partial reduction for the loop exit instruction. Currently code-gen works fine, because the scale factor of VPPartialReduction is not used during ::execute, but it means we compute incorrect cost/register pressure, because the partial reduction won't reduce to the specified scaling factor. PR: https://github.com/llvm/llvm-project/pull/158603
5 days[AssumptionCache] Don't use ResultElem for assumption list (NFC) (#160462)Nikita Popov1-2/+2
ResultElem stores a weak handle of an assume, plus an index for referring to a specific operand bundle. This makes sense for the results of assumptionsFor(), which refers to specific operands of assumes. However, assumptions() is a plain list of assumes. It does *not* contain separate entries for each operand bundle. The operand bundle index is always ExprResultIdx. As such, we should be directly using WeakVH for this case, without the additional wrapper.
5 days[LV] Don't ignore invariant stores when costing (#158682)Ramkumar Ramachandra1-14/+0
Invariant stores of reductions are removed early in the VPlan construction, and there is no reason to ignore them while costing.
6 daysReapply "[Coroutines] Add llvm.coro.is_in_ramp and drop return value of ↵Weibo He4-19/+36
llvm.coro.end (#155339)" (#159278) As mentioned in #151067, the current design of llvm.coro.end mixes two functionalities: querying where we are and lowering to some code. This patch separates these functionalities into independent intrinsics by introducing a new intrinsic llvm.coro.is_in_ramp. Update a test in inline/ML. Reapply #155339
6 days[LV] Check for hoisted safe-div selects in planContainsAdditionalSimp.Florian Hahn1-9/+28
In some cases, safe-divisor selects can be hoisted out of the vector loop. Catching all cases in the legacy cost model isn't possible, in particular checking if all conditions guarding a division are loop invariant. Instead, check in planContainsAdditionalSimplifications if there are any hoisted safe-divisor selects. If so, don't compare to the more inaccurate legacy cost model. Fixes https://github.com/llvm/llvm-project/issues/160354. Fixes https://github.com/llvm/llvm-project/issues/160356.
6 days[SimplifyCFG] Avoid using isNonIntegralPointerType()Alexander Richardson1-8/+19
This is an overly broad check, the transformation made here can be done safely for pointers with index!=repr width. This fixes the codegen regression introduced by https://github.com/llvm/llvm-project/pull/105735 and should be beneficial for AMDGPU code-generation once the datalayout there no longer uses the overly strict `ni:` specifier. Reviewed By: arsenm Pull Request: https://github.com/llvm/llvm-project/pull/159890
6 days[SLPVectorizer] Move size checks (NFC) (#159361)Mikhail Gudim1-15/+12
Move size checks inside `isStridedLoad`. In the future we plan to possibly change the size and type of strided load there.
7 daysRevert "[ControlHeightReduction] Drop lifetime annotations where necessary ↵Aiden Grossman1-37/+8
(#159686)" This reverts commit a00450944d2a91aba302954556c1c23ae049dfc7. Looks like this one is actually breaking the buildbots. Reverting the switch back to IRPGO did not fix things.
7 days[TTI][ASan][RISCV] reland Move InterestingMemoryOperand to Analysis and ↵Hank Chang1-6/+18
embed in MemIntrinsicInfo #157863 (#159713) [Previously reverted due to failures on asan-rvv-intrinsics.ll; the test case is RISC-V only and it was triggered by other targets] Reland [#157863](https://github.com/llvm/llvm-project/pull/157863), and add `; REQUIRES: riscv-registered-target` in the test case to skip configurations that don't register the RISC-V target. Previously ASan considered target intrinsics as black boxes, so it could not instrument accurate checks. This patch makes SmallVector<InterestingMemoryOperand> a member of MemIntrinsicInfo so that TTI can let targets describe their intrinsic information to ASan. Note: 1. This patch moves InterestingMemoryOperand from Transforms to Analysis. 2. It extends MemIntrinsicInfo by adding a SmallVector<InterestingMemoryOperand> member. 3. This patch does not support RVV indexed/segment load/store.
7 days[LV][EVL] Remove metadata on EVL vectorized loops (#155760)Shih-Po Hung1-20/+0
This patch removes the metadata emission for EVL‑vectorized loops, since there is no current in-tree consumer: 1) after VPlan performs canonical IV replacement #147222 and 2) RISCV dropped EVLIndVarSimplifyPass #151483, which was the only user of this metadata.
7 days[ControlHeightReduction] Drop lifetime annotations where necessary (#159686)Aiden Grossman1-8/+37
ControlHeightReduction will duplicate some blocks and insert phi nodes in exit blocks of regions that it operates on for any live values. This includes allocas. Having a lifetime annotation refer to a phi node was made illegal in 92c55a315eab455d5fed2625fe0f61f88cb25499, which causes the verifier to fail after CHR. There are some cases where we might not need to drop lifetime annotations (usually because we do not need the phi to begin with), but drop all annotations for now to be conservative. Fixes #159621.
7 days[InferAlignment] Fix updating alignment when larger than i32 (#160109)Joseph Huber1-1/+2
Summary: The changes made in https://github.com/llvm/llvm-project/pull/156057 allow the alignment value to be increased. We assert effectively infinite alignment when the pointer argument is invalid / null. The problem is that for whatever reason the masked load / store functions use i32 for their alignment value, which means this gets truncated to zero. Add a special check for this; long term we probably want to just remove this argument entirely.
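The truncation described here is easy to reproduce numerically. In the sketch below, `HUGE_ALIGN = 2**32` is an assumed stand-in for the "effectively infinite" alignment, and both helper names are hypothetical; the code only shows why such a value becomes zero in an i32, not how the patch guards against it.

```python
I32_BITS = 32

# Assumed stand-in for the "effectively infinite" alignment.
HUGE_ALIGN = 1 << 32


def truncate_to_i32(value):
    """Keep only the low 32 bits, as storing into an i32 operand would."""
    return value & ((1 << I32_BITS) - 1)


def fits_in_i32(alignment):
    """True if the alignment survives the round trip through an i32."""
    return truncate_to_i32(alignment) == alignment
```

Any power-of-two alignment at or above 2^32 has all of its low 32 bits clear, so a check of this kind is needed before writing the value into the i32 alignment argument.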
7 days[VPlan] Avoid branching around State.get (NFC) (#159042)Ramkumar Ramachandra1-9/+3
7 days[VPlan] Add WidenGEP::getSourceElementType (NFC) (#159029)Ramkumar Ramachandra3-17/+21
7 days[Coroutines] Take byval param alignment into account when spilling to frame ↵Hans Wennborg1-4/+8
(#159765) Fixes #159571
8 days[LV] Set correct costs for interleave group members.Florian Hahn1-3/+12
This ensures each scalarized member has an accurate cost, matching the cost it would have if it would not have been considered for an interleave group.