path: root/llvm/lib/Target/RISCV/RISCVTargetTransformInfo.cpp
Age  Commit message  Author  Files  Lines
10 hours[RISCV] Teach getIntImmCostInst about (X & -(1 << C1) & 0xffffffff) == C2 << ↵Craig Topper1-0/+39
C1 (#160163) We can rewrite this to (srai(w)/srli X, C1) == C2 so the AND immediate is free. This transform is done by performSETCCCombine in RISCVISelLowering.cpp. This fixes the opaque constant case mentioned in #157416.
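The equivalence behind this rewrite can be checked with a small sketch. The function names below are hypothetical and are not taken from the patch. The sketch assumes 0 < C1 < 32 and that C2 << C1 fits in 32 bits, and it models only a logical right shift of the low 32 bits rather than the exact srai(w)/srli selection done during lowering.

```cpp
#include <cassert>
#include <cstdint>

// Original form: clear the low C1 bits and the high 32 bits of X,
// then compare against the pre-shifted constant.
// Assumption: 0 < C1 < 32 and (C2 << C1) fits in 32 bits.
bool maskedCompare(uint64_t X, unsigned C1, uint64_t C2) {
  // -(1 << C1) as a 64-bit value is all ones shifted left by C1.
  uint64_t Mask = (~uint64_t{0} << C1) & 0xffffffffULL;
  return (X & Mask) == (C2 << C1);
}

// Rewritten form: shift the low 32 bits right by C1 and compare with C2.
// No AND immediate is needed any more.
bool shiftedCompare(uint64_t X, unsigned C1, uint64_t C2) {
  return (static_cast<uint32_t>(X) >> C1) == C2;
}
```

Under the stated assumptions both functions agree for every X, which is why the AND immediate can be costed as free.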
3 daysRevert "[TTI][RISCV] Add cost modelling for intrinsic vp.load.ff (#160470)"ShihPo Hung1-11/+0
This reverts commit aa08b1a9963f33ded658d3ee655429e1121b5212.
3 days[ASan][RISCV] Teach AddressSanitizer to support indexed load/store. (#160443)Hank Chang1-0/+38
This patch is based on https://github.com/llvm/llvm-project/pull/159713 This patch extends AddressSanitizer to support indexed/segment instructions in RVV. It enables proper instrumentation for these memory operations. A new member, `MaybeOffset`, is added to `InterestingMemoryOperand` to describe the offset between the base pointer and the actual memory reference address. Co-authored-by: Yeting Kuo <yeting.kuo@sifive.com>
4 days[TTI][RISCV] Add cost modelling for intrinsic vp.load.ff (#160470)Shih-Po Hung1-0/+11
Split out from #151300 to isolate TargetTransformInfo cost modelling for fault-only-first loads from VPlan implementation details. This change adds costing support for vp.load.ff independently of the VPlan work. For now, model a vp.load.ff as cost-equivalent to a vp.load.
7 days[TTI][ASan][RISCV] reland Move InterestingMemoryOperand to Analysis and ↵Hank Chang1-0/+77
embed in MemIntrinsicInfo #157863 (#159713) [Previously reverted due to failures on asan-rvv-intrinsics.ll; the test case is RISC-V only but was triggered in builds for other targets.] Reland [#157863](https://github.com/llvm/llvm-project/pull/157863), and add `; REQUIRES: riscv-registered-target` to the test case to skip configurations that don't register the RISC-V target. Previously ASan treated target intrinsics as black boxes, so it could not instrument accurate checks. This patch makes SmallVector<InterestingMemoryOperand> a member of MemIntrinsicInfo so that targets can describe their intrinsics to ASan through TTI. Notes: 1. This patch moves InterestingMemoryOperand from Transforms to Analysis. 2. It extends MemIntrinsicInfo by adding a SmallVector<InterestingMemoryOperand> member. 3. It does not support RVV indexed/segment load/store.
11 daysRevert "[TTI][ASan][RISCV] Move InterestingMemoryOperand to Analysis and ↵Florian Mayer1-77/+0
embed in MemIntrinsicInfo" (#159700) Reverts llvm/llvm-project#157863
11 days[TTI][ASan][RISCV] Move InterestingMemoryOperand to Analysis and embed in ↵Hank Chang1-0/+77
MemIntrinsicInfo (#157863) Previously ASan treated target intrinsics as black boxes, so it could not instrument accurate checks. This patch makes SmallVector<InterestingMemoryOperand> a member of MemIntrinsicInfo so that targets can describe their intrinsics to ASan through TTI. Notes: 1. This patch moves InterestingMemoryOperand from Transforms to Analysis. 2. It extends MemIntrinsicInfo by adding a SmallVector<InterestingMemoryOperand> member. 3. It does not support RVV indexed/segment load/store.
2025-09-12[RISCV] Use hasCPOPLike in isCtpopFast and getPopcntSupport (#158371)Craig Topper1-3/+1
2025-09-02[Reland] "[RISCV][TTI] Implement getAddressComputationCost() in RISCV TTI. ↵Elvis Wang1-0/+12
#149955" (#156386) This patch implements `getAddressComputationCost()` in RISCV TTI, which makes gather/scatter with address calculation more expensive than the stride cost. Note that the only user of `getAddressComputationCost()` with a vector type is `VPWidenMemoryRecipe::computeCost()`, so this patch changes some LV tests. I've checked the test changes in LV, and they seem to fall into two groups: * gather/scatter with a uniform vector pointer, which seems optimizable to masked.load. * accesses that can be optimized to strided load/store. ---- After #155739 landed, the assertion (cost mis-aligned) is fixed. I've tested llvm-test-suite with rva23u64 and rva23u64_zvl1024b locally and no assertion occurred.
2025-08-30[RISCV] Unaligned vec mem => prefer alt opc vecMikhail Gudim1-0/+4
Return `true` in `RISCVTTIImpl::preferAlternateOpcodeVectorization` if subtarget supports unaligned memory accesses.
2025-08-27Revert "[RISCV][TTI] Implement getAddressComputationCost() in RISCV TTI." ↵Elvis Wang1-12/+0
(#155535) Reverts llvm/llvm-project#149955
2025-08-27[RISCV][TTI] Implement getAddressComputationCost() in RISCV TTI. (#149955)Elvis Wang1-0/+12
This patch implements `getAddressComputationCost()` in RISCV TTI, which makes gather/scatter with address calculation more expensive than the stride cost. Note that the only user of `getAddressComputationCost()` with a vector type is `VPWidenMemoryRecipe::computeCost()`, so this patch changes some LV tests. I've checked the test changes in LV, and they seem to fall into two groups: * gather/scatter with a uniform vector pointer, which seems optimizable to masked.load. * accesses that can be optimized to strided load/store.
2025-08-19[LV][TTI] Calculate cost of extracting last index in a scalable vector (#144086)David Sherwood1-0/+18
There are a couple of places in the loop vectoriser where we want to calculate the cost of extracting the last lane in a vector. However, we wrongly assume that asking for the cost of extracting lane (VF.getKnownMinValue() - 1) is an accurate representation of the cost of extracting the last lane. For SVE at least, this is non-trivial as it requires the use of whilelo and lastb instructions. To solve this problem I have added a new getReverseVectorInstrCost interface where the index is used in reverse from the end of the vector. Suppose a vector has a given ElementCount EC, the extracted/inserted lane would be EC - 1 - Index. For scalable vectors this index is unknown at compile time. I've added an AArch64 hook that better represents the cost, and also a RISCV hook that maintains compatibility with the behaviour prior to this PR. I've also taken the liberty of adding support in vplan for calculating the cost of VPInstruction::ExtractLastElement.
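The reverse indexing described above can be illustrated with a minimal sketch. The function and parameter names are hypothetical. The runtime element count is modelled as the known minimum multiplied by vscale, which is an assumption about how a scalable ElementCount resolves at run time.

```cpp
#include <cassert>

// Lane addressed by a reverse index: EC - 1 - Index, where the runtime
// element count EC is KnownMin * VScale.
// Assumption: Index < KnownMin * VScale.
unsigned reverseLane(unsigned KnownMin, unsigned VScale, unsigned Index) {
  unsigned EC = KnownMin * VScale; // unknown at compile time if scalable
  return EC - 1 - Index;
}
```

With KnownMin = 4, the last lane (Index = 0) is lane 3 only when VScale is 1. For VScale = 2 it is lane 7, which is why KnownMin - 1 does not identify the last lane of a scalable vector.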
2025-08-18[RISCV] Remove ST->hasVInstructions() from getIntrinsicInstrCost for ↵Jim Lin1-1/+1
cttz/ctlz/ctpop. NFC. (#154064) That isn't necessary if we've checked ST->hasStdExtZvbb().
2025-08-12[RISCV] Cost casts with illegal types that can't be legalized (#153030)Luke Lau1-0/+1
If we have a floating point vector and no zve32f/zve64f/zve64d, we can end up with an invalid type-legalization cost from getTypeLegalizationCost. Previously this triggered an assertion that the type must have been legalized if the "legal" type is a vector, but in this case when it's not possible to legalize the original type is spat back out. This fixes it by just checking that the legalization cost is valid. We don't have much testing for zve64x, so we may have other places in the cost model with this issue. Fixes #153008
2025-08-05[RISCV][TTI] Enable masked interleave access (#151665)Mel Chen1-5/+5
Now that support for masked loads/stores of interleave groups has landed, we can enable the loop vectorizer to generate masked interleave access where applicable. This improves vectorization in several ways: * Internal predication support: This enables interleave group vectorization for loops with internal control flow predication, provided all members of the group share the same predicate. Gaps in interleave groups are still not efficiently handled by masking, so masking for gaps remains disabled for now. * Tail folding: This allows tail folding of loops with interleave groups by using masking. Without this, vectorized loops with interleaves would fall back to using separate gather/scatter accesses, which can be significantly less efficient. "[RISCV][TTI] Enable masked interleave access for scalable vector (#149981)" was reverted by 5294793bdcf6ca142f7a0df897638bd4e85ed1a7 due to triggering an assertion. The issue has been addressed in the patch "[LV] Fix gap mask requirement for interleaved access (#151105)". In addition, this patch also enables fixed-length masked interleave access (#150624), since support for fixed-length vectors has also landed in 992118cb4deab139ae384bb85f03225a9a21b008. --------- Co-authored-by: Philip Reames <preames@rivosinc.com>
2025-07-31[RISCV] Adjust unroll prefs for loops with vectors (#151525)Ramkumar Ramachandra1-8/+7
Adjust the unrolling preferences to unroll hand-vectorized code, as well as the scalar remainder of a vectorized loop. Inspired by a similar effort in AArch64: see #147420 and #151164.
2025-07-30[RISCV] Fix bug in [l](lrint|lround) vector-cost (#151298)Ramkumar Ramachandra1-2/+7
Follow up on a review of bd66fd0 ([CostModel/RISCV] Fix costs of vector [l](lrint|lround)) post-landing to fix a subtle problem with the cost of vector [l](lrint|lround). We should use source LMUL in the case of a narrowing op. Co-authored-by: Luke Lau <luke@igalia.com>
2025-07-29[RISCV] Fix build failure in getIntrinsicInstrCost (#151210)Ramkumar Ramachandra1-1/+1
bd66fd0 ([CostModel/RISCV] Fix costs of vector [l](lrint|lround)) introduced buildbot failures by using a temporary ArrayRef when a SmallVector should have been used. Fix this. Failure: https://lab.llvm.org/buildbot/#/builders/186/builds/11133
2025-07-29[CostModel/RISCV] Fix costs of vector [l](lrint|lround) (#146058)Ramkumar Ramachandra1-8/+37
Take the actual instruction cost into account, and don't fallthrough to code that doesn't apply to [l]lrint. Also strip invalid costs for [b]f16, as a companion to #146507, and unify it with [l]lround costs as a companion to #147713.
2025-07-25Revert "[RISCV][TTI] Enable masked interleave access for scalable vector ↵Alex Bradbury1-6/+4
(#149981)" This reverts commit ee3a7714b7a69ac9aae4b79f4c67adc38bc6876b. Causes an assertion for the zvl1024b RISC-V build configuration. See comment with reproducer at <https://github.com/llvm/llvm-project/pull/149981#issuecomment-3118482801>
2025-07-25[RISCV][TTI] Enable masked interleave access for scalable vector (#149981)Mel Chen1-4/+6
Now that support for masked loads/stores of interleave groups has landed, we can enable the loop vectorizer to generate masked interleave access where applicable. This improves vectorization in several ways: * Internal predication support: This enables interleave group vectorization for loops with internal control flow predication, provided all members of the group share the same predicate. Gaps in interleave groups are still not efficiently handled by masking, so masking for gaps remains disabled for now. * Tail folding: This allows tail folding of loops with interleave groups by using masking. Without this, vectorized loops with interleaves would fall back to using separate gather/scatter accesses, which can be significantly less efficient. * Scalable vector support: Currently, only scalable vector types are supported for masked interleave lowering. Fixed-length vector support will be enabled in the future. As interleave access is not yet supported with tail folding by EVL, that functionality is temporarily disabled. We are going to create another patch to support it. Co-authored-by: Philip Reames <preames@rivosinc.com> --------- Co-authored-by: Philip Reames <preames@rivosinc.com>
2025-07-23[RISCV][TTI] Implement vector costs for `llvm.fpto{u|s}i.sat()`. (#143655)Elvis Wang1-0/+28
This patch implements vector costs for `llvm.fptoui.sat()` in RISCV TTI.
2025-07-10[RISCV] Unify non-vp and vp rounding intrinsic costing (#147872)Luke Lau1-41/+0
Currently we have slightly different costing for the vp and non-vp version of the rounding intrinsics. We can delete this code and use the generic BasicTTIImpl code for the vp intrinsics which falls back to the non-vp versions. I'm not sure if the zvfh costing is correct, this should probably be fixed in a follow up patch. At the moment the non-vp cost is more important since it is what the loop vectorizer will use.
2025-07-10[TTI] Move vp.{select,merge} costing from RISCV to BasicTTIImpl. NFC (#147870)Luke Lau1-11/+0
Move the costing to the generic implementation in BasicTTIImpl since it just falls back to the non-vp costing. Also pass through the OperandValueInfo if using value based costing, but I don't believe this affects the result for any in-tree target currently.
2025-06-21[CostModel] Add a DstTy to getShuffleCost (#141634)David Green1-33/+47
A shuffle will take two input vectors and a mask, to produce a new vector of size <MaskElts x SrcEltTy>. Historically it has been assumed that the SrcTy and the DstTy are the same for getShuffleCost, with that being relaxed in recent years. If the Tp passed to getShuffleCost is the SrcTy, then the DstTy can be calculated from the Mask elts and the src elt size, but the Mask is not always provided and the Tp is not reliably always the SrcTy. This has led to situations notably in the SLP vectorizer but also in the generic cost routines where assumption about how vectors will be legalized are built into the generic cost routines - for example whether they will widen or promote, with the cost modelling assuming they will widen but the default lowering to promote for integer vectors. This patch attempts to start improving that - it originally tried to alter more of the cost model but that too quickly became too many changes at once, so this patch just plumbs in a DstTy to getShuffleCost so that DstTy and SrcTy can be reliably distinguished. The callers of getShuffleCost have been updated to try and include a DstTy that is more accurate. Otherwise it tries to be fairly non-functional, keeping the SrcTy used as the primary type used in shuffle cost routines, only using DstTy where it was in the past (for InsertSubVector for example). Some asserts have been added that help to check for consistent values when a Mask and a DstTy are provided to getShuffleCost. Some of them took a while to get right, and some non-mask calls might still be incorrect. Hopefully this will provide a useful base to build more shuffles that alter size.
2025-06-19[TTI] Plumb CostKind through getPartialReductionCost (#144953)Philip Reames1-5/+4
Purely for the sake of being idiomatic with other TTI costing routines, no direct motivation beyond that.
2025-06-18[TTI] Remove PPC hasActiveVectorLength impl, simplify interface (NFC). (#142310)Florian Hahn1-1/+1
PPCTTIImpl defines hasActiveVectorLength and also getVPMemoryOpCost, but they appear unused (i.e. no changes to tests). Remove them, as they complicate the interface for hasActiveVectorLength. This simplifies the only use in LV as now no placeholder values need to be passed. PR: https://github.com/llvm/llvm-project/pull/142310
2025-06-18[RISCV] Support non-power-of-2 types when expanding memcmpPengcheng Wang1-14/+7
We can convert non-power-of-2 types into extended value types and then they will be widened. Reviewers: lukel97 Reviewed By: lukel97 Pull Request: https://github.com/llvm/llvm-project/pull/114971
2025-06-17[RISCV] Consolidate both copies of getLMUL1VT [nfc] (#144568)Philip Reames1-10/+1
Put one copy on RISCVTargetLowering as a static function so that both locations can use it, and rename the method to getM1VT for slightly improved readability.
2025-06-16[RISCV] Use RISCV::RVVBitsPerBlock instead of 64 in getLMUL1VT. NFC (#144401)Craig Topper1-1/+1
2025-06-16[RISCV][TTI] Refine reverse shuffle costing for high LMUL (#144155)Philip Reames1-22/+62
This contains two closely related changes: 1) Explicitly recurse on the i1 case - "3" happens to be the right magic constant at m1, but is not otherwise correct, and we're better off deferring this to existing logic. 2) Match the lowering for high LMUL shuffles - we've switched to using a linear number of m1 vrgather instead of a single big vrgather. This results in substantially faster (but also larger) code for reverse shuffles larger than m1. Note that fixed vectors need a slide at the end, but scalable ones don't. This will have the effect of biasing the vectorizer towards larger (particularly scalable larger) vector factors. This increases VF for the s112 and s1112 loops from TSVC_2 (in all configurations). We could refine the high LMUL estimates a bit more, but I think getting the linear scaling right is probably close enough for the moment.
2025-06-13[RISCV] Support memcmp expansion for vectorsPengcheng Wang1-0/+17
This patch adds support for generating vector instructions for `memcmp`. This implementation is inspired by X86's. We convert integer comparisons (eq/ne only) into vector comparisons and do a vector reduction to get the result. The range of supported load sizes is (XLEN, VLEN * LMUL8] and non-power-of-2 types are not supported. Fixes #143294. Reviewers: lukel97, asb, preames, topperc, dtcxzyw Reviewed By: topperc, lukel97 Pull Request: https://github.com/llvm/llvm-project/pull/114517
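The supported load-size range stated above can be sketched as a predicate. This is an illustration only: the function name is hypothetical, and it assumes that sizes, XLEN and VLEN are all measured in bits, which the commit message does not state.

```cpp
#include <cassert>

// Returns true if a memcmp load of SizeInBits falls in (XLEN, VLEN * 8]
// and is a power of two, matching the restrictions described in this entry.
// Assumption: all quantities are in bits.
bool inVectorMemcmpRange(unsigned SizeInBits, unsigned XLen, unsigned VLen) {
  const unsigned MaxLMUL = 8; // LMUL8 is the largest register grouping
  bool IsPow2 = SizeInBits != 0 && (SizeInBits & (SizeInBits - 1)) == 0;
  return IsPow2 && SizeInBits > XLen && SizeInBits <= VLen * MaxLMUL;
}
```

For XLEN = 64 and VLEN = 128 this accepts 128 through 1024 bits and rejects 64 (which fits a scalar register) and 96 (not a power of two). The 2025-06-18 entry in this log relaxes the power-of-two restriction.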
2025-06-10[RISCV][TTI] Allow partial reduce with mismatched extends (#143608)Philip Reames1-2/+1
This depends on the recently added partial_reduce_sumla node for lowering, but at this point we have all the parts.
2025-05-30[RISCV][TTI] Discount slide cost if ri.vinsert/ri.vextract are available ↵Philip Reames1-1/+4
(#142036) If we have the ri.vinsert/vextract instructions from xrivosvisni, we can do an element insert or extract without needing a vslide or a vector temporary register. Adjust the TTI cost to reflect this.
2025-05-26[RISCV][TTI] Adjust costing in getPartialReductionCost for zvqdotq (#141430)Philip Reames1-2/+2
Two changes: 1) Handle fixed vector cases now that 77a3f8 has landed. 2) Fix a mistake in the original costing - the VF passed in is the input VF, not the output VF. Given that we should be costing the accumulator type with VF/4. Note that (2) does not cause any visible test differences as the vectorizer (outside of maximize-bandwidth mode) does not consider wide enough VF for the costing difference to matter.
2025-05-23[RISCV][TTI] Implement getPartialReductionCost for the vqdotq cases (#140974)Philip Reames1-0/+23
Doing so tells the loop vectorizer that the partial.reduce intrinsic is profitable to use over the plain extend/multiply/reduce.add sequence.
2025-05-01[CostModel] Make Op0 and Op1 const in getVectorInstrCost. NFC (#137631)David Green1-2/+3
This does not alter much at the moment, but allows const pointers to be passed as Op0 and Op1, simplifying later patches
2025-04-30[SLPVectorizer] Move X86 specific handling into X86TTIImpl. (#137830)Jonas Paulsson1-1/+2
`ad9909d "[SLP]Fix perfect diamond match with extractelements in scalars" ` changed SLPVectorizer getScalarizationOverhead() to call TTI.getVectorInstrCost() instead of TTI.getScalarizationOverhead() in some cases. This was due to X86 specific handlings in these (overridden) methods, and unfortunately the general preference of TTI.getScalarizationOverhead() was dropped. If VL is available it should always be preferred to use getScalarizationOverhead(), and this is indeed the case for SystemZ which has a special insertion instruction that can insert two GPR64s. Then ` 33af951 "[SLP]Synchronize cost of gather/buildvector nodes with codegen"` reworked SLPVectorizer getGatherCost() which together with ad9909d caused the SystemZ test vec-elt-insertion.ll to fail. This patch restores the SystemZ test and reverts the change in SLPVectorizer getScalarizationOverhead() so that TTI.getScalarizationOverhead() is always called again. The ForPoisonSrc argument is now passed on to the TTI method so that X86 can handle this as required. Fixes: #135346
2025-04-27[RISCV] Sink vp.splat operands of VP intrinsic. (#133245)MingYan1-7/+15
This patch introduces a `vp.splat` matching method for VP support by sinking the `vp.splat` operand of VP operations back into the same basic block as the VP operation, facilitating the generation of .vx instructions to reduce vector register pressure. --------- Co-authored-by: yanming <ming.yan@terapines.com>
2025-04-23[CostModel] Remove optional from InstructionCost::getValue() (#135596)David Green1-1/+1
InstructionCost is already an optional value, containing an Invalid state that can be checked with isValid(). There is little point in returning another optional from getValue(). Most uses do not make use of it being a std::optional, dereferencing the value directly (either isValid has been checked previously or the Cost is assumed to be valid). The one case that does in AMDGPU used value_or which has been replaced by a isValid() check.
2025-04-22Fix build error introduced by 1c722fcPhilip Reames1-2/+2
The change built before merge, but apparently a constness change landed since I posted this for review.
2025-04-22[RISCV][TTI] Use processShuffleMask for shuffle legalization estimate (#136191)Philip Reames1-41/+62
We had some code which tried to estimate legalization costs for illegally typed shuffles, but it only handled the case of a widening shuffle, and used a somewhat ad hoc heuristic. We can reuse the processShuffleMask utility (which we already use for individual vector register splitting when exact VLEN is known) to perform the same splitting given the legal vector type as the unit of split instead. This makes the costing both simpler and more robust. Note that this swings costs for illegal shuffles pretty wildly as we were previously sometimes hitting the ad hoc code, and sometimes falling through into generic scalarization costing. I don't know that any of the costs for the individual tests in tree are significant, but the test which triggered me finding this was reported to me by Alexey reduced from something triggering a bad choice in SLP for x264. So this has the potential to be somewhat high impact.
2025-04-22[TTI] Fix discrepancies in prototypes between interface and implementations ↵Sergei Barannikov1-1/+1
(NFCI) (#136655) These are not diagnosed because implementations hide the methods of the base class rather than overriding them. This works as long as a hiding function is callable with the same arguments as the same function from the base class. Pull Request: https://github.com/llvm/llvm-project/pull/136655
2025-04-22[TTI] Make all interface methods const (NFCI) (#136598)Sergei Barannikov1-15/+16
Making `TargetTransformInfo::Model::Impl` `const` makes sure all interface methods are `const`, in `BasicTTIImpl`, its bases, and in all derived classes. Pull Request: https://github.com/llvm/llvm-project/pull/136598
2025-04-21[TTI] Constify BasicTTIImplBase::thisT() (NFCI) (#136575)Sergei Barannikov1-27/+28
The main change is making `thisT` method `const`, the rest of the changes is fixing compilation errors (*). (*) There are two tricky methods, `getVectorInstrCost()` and `getIntImmCost()`. They have several overloads; some of these overloads are typically pulled in to derived classes using the `using` directive, and then hidden by methods in the derived class. The compiler does not complain if the hiding methods are not marked as `const`, which means that clients will use the methods from the base class. If after this change your target fails cost model tests, this must be the reason. To resolve the issue you need to make all hiding overloads `const`. See the second commit in this PR. Pull Request: https://github.com/llvm/llvm-project/pull/136575
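The hiding pitfall described in this entry can be reproduced in a few lines. The class and method names below are hypothetical stand-ins for the CRTP base and a target implementation, not the real LLVM classes. The sketch shows that once `thisT()` returns a pointer to const, a non-const method in the derived class no longer shadows the base overload pulled in by `using`.

```cpp
#include <cassert>

// Hypothetical stand-in for a CRTP base whose thisT() is const.
template <typename T> struct BaseImpl {
  const T *thisT() const { return static_cast<const T *>(this); }
  int getCost(int) const { return 1; }       // generic cost
  int getCost(int, int) const { return 10; } // another overload
  // Clients reach the target hook through a const pointer.
  int query() const { return thisT()->getCost(0); }
};

// Hiding method is NOT const: on a const object only the base overload
// introduced by the using-declaration is viable, so the generic cost wins.
struct StaleTarget : BaseImpl<StaleTarget> {
  using BaseImpl<StaleTarget>::getCost;
  int getCost(int) { return 2; }
};

// Hiding method is const: it hides the base overload as intended.
struct FixedTarget : BaseImpl<FixedTarget> {
  using BaseImpl<FixedTarget>::getCost;
  int getCost(int) const { return 2; }
};
```

Here `StaleTarget{}.query()` returns the base cost 1 without any compiler diagnostic, while `FixedTarget{}.query()` returns the target cost 2. That mirrors the advice above to mark all hiding overloads `const`.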
2025-04-21[RISCV] Handle scalarized reductions in getArithmeticReductionCostLuke Lau1-3/+2
This fixes a crash reported at https://github.com/llvm/llvm-project/pull/114250#issuecomment-2813686061 If the vector type isn't legal at all, e.g. bfloat with +zvfbfmin, then the legalized type will be scalarized. So use getScalarType() instead of getVectorElement() when checking for f16/bf16.
2025-03-29[RISCV][TTI] Adjust VLS shuffle costing to account for sub-mask reuse (#129793)Philip Reames1-0/+4
If we have a shuffle which can be split via VLA where two or more of the destinations have exactly the same elements, then we only need to account for them once in costing. The duplicate copies are (at worst) whole register moves. Note that this change only handles the single source case. Doing the multiple source case seemed a bit more complicated, and I didn't have a motivating test case.
2025-03-28[RISCV] Don't vectorize for loops with small trip count (#132176)Pengcheng Wang1-0/+10
Inspired by https://reviews.llvm.org/D130755. I don't know the logic behind the value 5, it is copied from AArch64. For some tests, I have to change the trip count so that we don't break what they are testing.
2025-03-19[TTI] Align optional FMFs in getExtendedReductionCost() to ↵Elvis Wang1-1/+1
getArithmeticReductionCost(). (#131968) The implementation of getExtendedReductionCost() often calls getArithmeticReductionCost() with FMFs. But we shouldn't call getArithmeticReductionCost() with FMFs for non-floating-point reductions, since that returns the wrong cost. This patch makes FMFs in getExtendedReductionCost() optional, aligning it with getArithmeticReductionCost(), so that TTI returns the correct cost for non-FP extended-reduction queries without FMFs. This patch is not quite NFC but it's hard to test from the CostModel side. Split from #113903.