path: root/llvm/lib/CodeGen/SplitKit.cpp
Age | Commit message | Author | Files | Lines
2025-04-21 | [llvm] Use llvm::SmallVector::pop_back_val (NFC) (#136533) | Kazu Hirata | 1 | -2/+1
2025-03-14 | [CodeGen] Remove parameter from LiveRangeEdit::canRematerializeAt [NFC] | Philip Reames | 1 | -4/+2
Only one caller cares about the true case of this parameter, so move the check to that single caller. Note that RegisterCoalescer seems like it should care, but it already duplicates the check several lines above.
2025-03-13 | [CodeGen] Use early return to simplify SplitEditor::defFromParent [NFC] | Philip Reames | 1 | -28/+26
2025-03-06 | SplitKit: Take register class directly from instruction definition (#129727) | Matt Arsenault | 1 | -11/+13
This fixes an expensive check failure after 8476a5d480304. The issue was essentially that getRegClassConstraintEffectForVReg was not doing anything useful, sometimes. If the register passed to it is not present in the instruction, it is a no-op and returns the original class. The Edit->getReg() register may not be the register as it appears in either the use or def instruction. It may be some split register, so take the register directly from the instruction being rematerialized. Also directly query the constraint from the def instruction, with a hardcoded operand index. This isn't ideal, but all the other rematerialize code makes the same assumption. So far I've been unable to reproduce this with a standalone MIR test. In the original case, stop-before=greedy and running the one pass is not working.
2025-03-04 | SplitKit: Fix rematerialization undoing subclass based split (#122110) | Matt Arsenault | 1 | -3/+42
This fixes an allocation failure in the new test. In cases where getLargestLegalSuperClass can inflate the register class, rematerialization could effectively undo a split which was done to inflate the register class, if the defining instruction can only write a subclass and the use can read the superclass. Some of the x86 test changes look like improvements, but some are likely regressions. I'm not entirely sure this is the correct place to fix this. It also seems more complicated than necessary, but the decision to change the register class is far removed from the point where the decision to split the virtual register is made. I'm also not sure if this should be considering the register classes of all the use indexes in getUseSlots, rather than just checking if this use index instruction reads the register.
2025-03-02 | [CodeGen] Use MCRegister and Register. NFC | Craig Topper | 1 | -2/+3
2025-01-14 | [CodeGen] Remove unused argument from getCoveringSubRegIndexes. NFC. (#122884) | Jay Foad | 1 | -1/+1
2024-11-12 | [CodeGen] Remove unused includes (NFC) (#115996) | Kazu Hirata | 1 | -1/+0
Identified with misc-include-cleaner.
2024-09-19 | [LLVM] Use {} instead of std::nullopt to initialize empty ArrayRef (#109133) | Jay Foad | 1 | -1/+1
It is almost always simpler to use {} instead of std::nullopt to initialize an empty ArrayRef. This patch changes all occurrences I could find in LLVM itself. In future the ArrayRef(std::nullopt_t) constructor could be deprecated or removed.
2024-06-14 | [llvm] Use llvm::unique (NFC) (#95628) | Kazu Hirata | 1 | -2/+1
2023-11-18 | [GreedyRA] Improve RA for nested loop induction variables (#72093) | David Green | 1 | -0/+12
Imagine a loop of the form:
```
preheader:
  %r = def
header:
  bcc latch, inner
inner1:
  ..
inner2:
  b latch
latch:
  %r = subs %r
  bcc header
```
It can be possible for code to spend a decent amount of time in the header<->latch loop, not going into the inner part of the loop as much. The greedy register allocator can prefer to spill _around_ %r though, adding spills around the subs in the loop, which can be very detrimental for performance. (The case I am looking at is actually a very deeply nested set of loops that repeat the header<->latch pattern at multiple different levels).
The greedy RA will apply a preference to spill to the IV, as it is live through the header block. This patch attempts to add a heuristic to prevent that in this case for variables that look like IVs, in a similar regard to the extra spill weight that gets added to variables that look like IVs, that are expensive to spill. That will mean spills are more likely to be pushed into the inner blocks, where they are less likely to be executed and not as expensive as spills around the IV.
This gives an 8% speedup in the exchange benchmark from spec2017 when compiled with flang-new, whilst importantly stabilising the scores to be less chaotic to other changes. Running ctmark showed no difference in the compile time. I've tried to run a range of benchmarking for performance, most of which were relatively flat not showing many large differences. One matrix multiply case improved 21.3% due to removing cascading chains of spills, and some other knock-on effects happen which usually cause small differences in the scores.
2023-11-16 | [AMDGPU] RA inserted scalar instructions can be at the BB top (#72140) | Christudasan Devadasan | 1 | -2/+4
We adjust the insertion point at the BB top for spills/copies during RA to ensure they are placed after the exec restore instructions required for the divergent control flow execution. This is, however, required only for the vector operations. The insertions for scalar registers can still go to the BB top.
2023-08-09 | Remove a reference to rdar://problem/10664933 | Jon Roelofs | 1 | -1/+0
The original commit, and the comments in the code already provide sufficient context. But for posterity, there's a tiny bit more that might be useful if someone is digging here in the future:
> This is related to <rdar://problem/10318439> Lower invokes into terminating
> machine instructions.
>
> The return value from a function call is live in to that function call's
> landing pad. The landing pad is shared with a later call, and the variable is
> undef on the first exceptional edge.
>
> Our computation of the last legal split point gets confused because the
> return value is live-out from the calling block, and live-in to the landing
> pad, but it is not live on the edge itself.
>
> Fixed in r147911 and r147912.
2023-07-31 | Reapply "[CodeGen]Allow targets to use target specific COPY instructions for live range splitting" | Matt Arsenault | 1 | -7/+10
This reverts commit a496c8be6e638ae58bb45f13113dbe3a4b7b23fd. The workaround in c26dfc81e254c78dc23579cf3d1336f77249e1f6 should work around the underlying problem with SUBREG_TO_REG.
2023-07-26 | Revert "[CodeGen]Allow targets to use target specific COPY instructions for live range splitting" | Vitaly Buka | 1 | -10/+7
And dependent commits. Details in D150388.
This reverts commit 825b7f0ca5f2211ec3c93139f98d1e24048c225c.
This reverts commit 7a98f084c4d121244ef7286bc6503b6a181d446e.
This reverts commit b4a62b1fa546312d882fa12dfdcd015177d66826.
This reverts commit b7836d856206ec39509d42529f958c920368166b.
No conflicts in the code, few tests had conflicts in autogenerated CHECKs:
llvm/test/CodeGen/Thumb2/mve-float32regloops.ll
llvm/test/CodeGen/AMDGPU/fix-frame-reg-in-custom-csr-spills.ll
Reviewed By: alexfh
Differential Revision: https://reviews.llvm.org/D156381
2023-07-07 | [CodeGen]Allow targets to use target specific COPY instructions for live range splitting | Yashwant Singh | 1 | -7/+10
Replacing D143754. Right now the LiveRangeSplitting during register allocation uses the TargetOpcode::COPY instruction for splitting. For the AMDGPU target that creates a problem as we have both vector and scalar copies. Vector copies perform a copy over a vector register but only on the lanes (threads) that are active. This is mostly sufficient; however, we do run into cases when we have to copy the entire vector register and not just active lane data. One major place where we need that is live range splitting. Allowing targets to use their own copy instructions (if defined) will provide a lot of flexibility and make it easier to lower these pseudo instructions to correct MIR.
- Introduce getTargetCopyOpcode() virtual function and use it to generate the copy in live range splitting.
- Replace necessary MI.isCopy() checks with TII.isCopyInstr() in the register allocator pipeline.
Reviewed By: arsenm, cdevadas, kparzysz
Differential Revision: https://reviews.llvm.org/D150388
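As a rough illustration of the hook described in this commit message, the sketch below shows a base class that falls back to a generic copy and a target that overrides it. Only the name getTargetCopyOpcode() comes from the message; the class names, the opcode values, and the IsVectorReg parameter are assumptions made for this example and are not LLVM's actual interface.

```cpp
#include <cassert>

// Hypothetical opcode numbers for illustration; not LLVM's real enum values.
enum Opcode : unsigned { GENERIC_COPY = 1, WHOLE_VECTOR_COPY = 100 };

// Toy stand-in for a target instruction-info class. By default a live range
// split is performed with the generic copy.
struct ToyInstrInfo {
  virtual ~ToyInstrInfo() = default;
  virtual unsigned getTargetCopyOpcode(bool IsVectorReg) const {
    (void)IsVectorReg; // The generic target ignores the register kind.
    return GENERIC_COPY;
  }
};

// Toy GPU-like target: vector registers need a copy that moves every lane,
// not only the currently active ones.
struct ToyGPUInstrInfo : ToyInstrInfo {
  unsigned getTargetCopyOpcode(bool IsVectorReg) const override {
    return IsVectorReg ? WHOLE_VECTOR_COPY : GENERIC_COPY;
  }
};

// What a splitter might call when it has to emit a copy for a split.
unsigned pickSplitCopyOpcode(const ToyInstrInfo &TII, bool IsVectorReg) {
  return TII.getTargetCopyOpcode(IsVectorReg);
}
```

The point of the virtual hook is that the splitting code stays target independent while each target decides which copy instruction is safe for a split.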
2023-02-07 | [CodeGen] Define and use MachineOperand::getOperandNo | Jay Foad | 1 | -1/+1
This is a helper function to very slightly simplify many calls to MachineInstr::getOperandNo. Differential Revision: https://reviews.llvm.org/D143250
2022-12-10 | Don't include None.h (NFC) | Kazu Hirata | 1 | -1/+0
I've converted all known uses of None to std::nullopt, so we no longer need to include None.h. This is part of an effort to migrate from llvm::Optional to std::optional: https://discourse.llvm.org/t/deprecating-llvm-optional-x-hasvalue-getvalue-getvalueor/63716
2022-12-07 | [NFC] Use Register instead of unsigned for variables that receive a Register object | Gregory Alfonso | 1 | -2/+2
Reviewed By: MaskRay Differential Revision: https://reviews.llvm.org/D139451
2022-12-02 | [CodeGen] Use std::nullopt instead of None (NFC) | Kazu Hirata | 1 | -1/+1
This patch mechanically replaces None with std::nullopt where the compiler would warn if None were deprecated. The intent is to reduce the amount of manual work required in migrating from Optional to std::optional. This is part of an effort to migrate from llvm::Optional to std::optional: https://discourse.llvm.org/t/deprecating-llvm-optional-x-hasvalue-getvalue-getvalueor/63716
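A minimal sketch of the pattern this migration targets, written against the standard library only. The helper firstNegative is hypothetical and exists just to show a function returning std::nullopt where older LLVM code would have returned None.

```cpp
#include <cassert>
#include <optional>
#include <vector>

// Hypothetical helper: returns the index of the first negative value, or
// std::nullopt when there is none (older code would have written `None`).
std::optional<unsigned> firstNegative(const std::vector<int> &Values) {
  for (unsigned I = 0; I < Values.size(); ++I)
    if (Values[I] < 0)
      return I;
  return std::nullopt;
}
```

Callers test the result with has_value() or by comparing against a concrete value, exactly as they would with any other std::optional.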
2022-07-18 | CodeGen: Remove AliasAnalysis from regalloc | Matt Arsenault | 1 | -9/+5
This was stored in LiveIntervals, but not actually used for anything related to LiveIntervals. It was only used in one check of whether a load instruction is rematerializable. I also don't think this was entirely correct, since it was implicitly assuming constant loads are also dereferenceable. Remove this and rely only on the invariant+dereferenceable flags in the memory operand. Set the flag based on the AA query upfront. This should have the same net benefit, but has the possible disadvantage of making this AA query nonlazy. Preserve the behavior of assuming pointsToConstantMemory implying dereferenceable for now, but maybe this should be changed.
2022-06-16 | Reland "[SplitKit] Handle early clobber + tied to def correctly" | Kito Cheng | 1 | -5/+26
This reverts commit 7207373e1eb0dd419b4e13a5e2d0ca146ef9544e. We found another RISC-V bug when landing D126048, and it has been fixed by D127642 now. Differential Revision: https://reviews.llvm.org/D126048
2022-06-08 | Revert "[SplitKit] Handle early clobber + tied to def correctly" | Kito Cheng | 1 | -26/+5
Reverted due to a failure with LLVM_ENABLE_EXPENSIVE_CHECKS. This reverts commit e14d04909df4e52e531f6c2e045c3cf9638dd817.
2022-06-08 | [SplitKit] Handle early clobber + tied to def correctly | Kito Cheng | 1 | -5/+26
The splitter tries to extend a live range into the `r` slot for a use operand. That works in most situations, but it does not work correctly when the operand is tied to a def and the def operand is early-clobber. An example of what goes wrong:
    0  %0 = ...
    16 early-clobber %0 = Op %0 (tied-def 0), ...
    32 ... = Op %0
Before extending: %0 = [0r, 0d) [16e, 32d)
In this case we want to extend from 0d to 16e, not 16r. If we use 16r here, nothing is extended because that point is already contained in [16e, 32d). This patch adds a check to detect this case and adjusts the extension point.
Detailed explanation of the test case: https://reviews.llvm.org/D126047
Reviewed By: MatzeB
Differential Revision: https://reviews.llvm.org/D126048
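The example above can be replayed with a small standalone model. This is only a sketch: slot indexes are flattened to plain integers, and the "already live just before the use point" test is a simplification assumed here, not a copy of the LLVM code. All names (slotIndex, useSlot, needsExtension) are hypothetical.

```cpp
#include <cassert>
#include <vector>

// The four slots of one instruction, in program order (B, e, r, d).
enum Slot : unsigned { Block = 0, EarlyClobber = 1, Reg = 2, Dead = 3 };

// Flatten (instruction number, slot) into one ordered integer.
unsigned slotIndex(unsigned Instr, Slot S) { return Instr * 4 + S; }

struct Segment {
  unsigned Start, End; // Half-open interval [Start, End).
};

bool liveAt(const std::vector<Segment> &Segs, unsigned Idx) {
  for (const Segment &S : Segs)
    if (S.Start <= Idx && Idx < S.End)
      return true;
  return false;
}

// Assumed simplification: extending to UsePoint is a no-op when the range
// is already live in the slot just before UsePoint.
bool needsExtension(const std::vector<Segment> &Segs, unsigned UsePoint) {
  return !liveAt(Segs, UsePoint - 1);
}

// A use tied to an early-clobber def is read at the `e` slot, not the `r` slot.
unsigned useSlot(unsigned Instr, bool TiedToEarlyClobberDef) {
  return slotIndex(Instr, TiedToEarlyClobberDef ? EarlyClobber : Reg);
}
```

With the segments [0r, 0d) and [16e, 32d), asking for 16r finds the range already live at 16e and extends nothing, while asking for 16e correctly reports that the gap between 0d and 16e still has to be filled.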
2022-02-05 | Simplify mask creation with llvm::seq. NFCI. | Benjamin Kramer | 1 | -3/+2
2022-02-03 | [nfc][regalloc] const LiveIntervals within the allocator | Mircea Trofin | 1 | -19/+34
Once built, LiveIntervals are immutable. This patch captures that. Differential Revision: https://reviews.llvm.org/D118918
2021-09-15 | SplitKit: Remove decade old live interval hack | Matt Arsenault | 1 | -21/+4
This was trying to fixup broken live intervals coming out of the coalescer. The verifier is more complete now and no tests seem to fail without this.
2021-08-13 | SplitKit: Don't further split subrange mask in buildCopy | Ruiling Song | 1 | -11/+12
We may use several COPY instructions to copy the needed sub-registers during a split. But the way we split the lanes across the COPYs may differ from the subranges of the old register. This would fail when we extend the subranges of the new register, because the LaneMasks do not match exactly between the subranges of the new register and the old register. Since we are bundling the COPYs, I think there is no need to further refine the subranges of the new register based on the set of LaneMasks of the inserted COPYs. I am not sure if there will be further breaking cases. But as the subranges of the new register are created based on the LaneMasks of the subranges of the old register, it is highly likely that we will always find an exact LaneMask match. We can think about how to make extendPHIKillRanges() work for the subrange mask mismatch case if we meet more such cases in the future. The test case was from D105065 by @arsenm. Differential Revision: https://reviews.llvm.org/D107829
2021-05-05 | [GreedyRA] Add support for invoke statepoint with tied-defs. | Serguei Katkov | 1 | -6/+55
A statepoint instruction uses tied-def registers to represent a live gc value which is both a use and a def on a call. At the same time, an invoke statepoint instruction is a last split point which can throw and jump to a landing pad. As a result we have an instruction which is a last split point with tied-def registers, and we need to teach Greedy RA to work with it.
The option -use-registers-for-gc-values-in-landing-pad controls whether statepoint lowering will generate tied-defs for invoke statepoint and is off by default now.
To resolve all issues the following changes have been made.
1) Last Split point for invoke statepoint should be statepoint itself. If the statepoint has a def, it is a relocated gc pointer and it should be available in the landing pad. So we cannot split the interval after the statepoint at the end of the basic block.
2) Do not split interval on tied-def. If the end of the interval for the overlap utility is a use which has a tied-def, we should not split the interval on this instruction, because in this case the use and def may get different registers, which breaks the tied-def property.
3) Take into account Last Split Point for enterIntvAtEnd. If the use after the Last Split Point is a def, it should be a tied-def, and we can take the def of the tied-use as ParentVNI, and thus the tied-use and tied-def will be live in the resulting interval.
4) Handle the case when def is after LIP in InlineSpiller. If the def of LI is after the last insertion point of the basic block, we cannot hoist in this BB. An example of such an instruction is an invoke statepoint where the def represents the relocated live gc pointer. The invoke is a last insertion point and its def is located after it. In this case there is no place to insert a spill and we bail out.
5) Fix removeBackCopies to account for empty copies. RegAssignMap cannot hold an empty interval, so do not set stop to the kill value if it produces an empty interval. This can happen if we remove a back-copy and right before it we have another back-copy. For example, for parent %0 we can get
    %1 = COPY %0
    %2 = COPY %0
While removing %2 we cannot set the kill for %1 because its interval would be empty.
6) Do not hoist copy to BB if its def is after LSP. If the parent def is a LastSplitPoint or later, we cannot hoist the copy to this basic block because the inserted copy (or re-materialization) would be located before the def.
All parts have been reviewed separately as follows:
https://reviews.llvm.org/D100747
https://reviews.llvm.org/D100748
https://reviews.llvm.org/D100750
https://reviews.llvm.org/D100927
https://reviews.llvm.org/D100945
https://reviews.llvm.org/D101028
Reviewers: reames, rnk, void, MatzeB, wmi, qcolombet
Reviewed By: reames, qcolombet
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D101150
2021-04-19 | [CSSPGO] Exclude pseudo probes from slot index | Hongtao Yu | 1 | -1/+1
Pseudo probes are currently given a slot index like other regular instructions. This affects register pressure and lifetime weight computation because the pseudo probe instructions enlarge lifetime lengths. As a consequence, a program could get different code generated with and without pseudo probes. I'm closing the gap by excluding pseudo probes from the slot index and from downstream register allocation related passes. Reviewed By: wmi Differential Revision: https://reviews.llvm.org/D100334
2021-02-21 | [CodeGen] Use range-based for loops (NFC) | Kazu Hirata | 1 | -4/+2
2021-02-20 | [CodeGen] Use range-based for loops (NFC) | Kazu Hirata | 1 | -6/+6
2021-02-19 | [NFC][Regalloc] Share the VirtRegAuxInfo object with LiveRangeEdit | Mircea Trofin | 1 | -10/+10
VirtRegAuxInfo is an extensibility point, so the register allocator's decision on which implementation to use should be communicated to the other users - namely, LiveRangeEdit. Differential Revision: https://reviews.llvm.org/D96898
2021-02-18 | [CodeGen] Use range-based for loops (NFC) | Kazu Hirata | 1 | -2/+2
2021-02-18 | [splitkit] Add a minor wrapper function for readability [NFC] | Philip Reames | 1 | -3/+3
2021-02-18 | [regalloc] Add a couple of dump routines for ease of debugging [NFC] | Philip Reames | 1 | -0/+13
2021-02-15 | CodeGen: Move function to get subregister indexes to cover a LaneMask | Matt Arsenault | 1 | -58/+6
Return the best covering index, and any additional indexes needed to complete the mask. This logically belongs in TargetRegisterInfo, although I ended up not needing it for why I originally split this out.
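To make "best covering index plus additional indexes" concrete, here is a toy greedy cover over plain integer lane masks. It is a sketch under assumptions: the SubRegIdx struct, the coverLaneMask function, and the "most remaining lanes first" rule are invented for illustration and are not claimed to match the LLVM implementation.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical description of one sub-register index and the lanes it covers.
struct SubRegIdx {
  unsigned Id;
  uint32_t Lanes;
};

static unsigned countLanes(uint32_t Mask) {
  unsigned N = 0;
  for (; Mask; Mask &= Mask - 1) // Clear the lowest set bit each iteration.
    ++N;
  return N;
}

// Greedily pick indexes that lie entirely inside Wanted, largest gain first.
// Returns false if some wanted lane cannot be covered by any index.
bool coverLaneMask(const std::vector<SubRegIdx> &Indexes, uint32_t Wanted,
                   std::vector<unsigned> &Chosen) {
  Chosen.clear();
  uint32_t Remaining = Wanted;
  while (Remaining) {
    const SubRegIdx *Best = nullptr;
    unsigned BestCount = 0;
    for (const SubRegIdx &I : Indexes) {
      if (I.Lanes & ~Wanted)
        continue; // Would touch lanes outside the requested mask.
      unsigned Count = countLanes(I.Lanes & Remaining);
      if (Count > BestCount) {
        Best = &I;
        BestCount = Count;
      }
    }
    if (!Best)
      return false;
    Chosen.push_back(Best->Id);
    Remaining &= ~Best->Lanes;
  }
  return true;
}
```

The first element of Chosen plays the role of the best covering index, and the rest are the extra indexes needed to complete the mask.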
2021-01-21 | [CodeGen] Use llvm::append_range (NFC) | Kazu Hirata | 1 | -3/+1
2020-11-30 | SplitKit: Use Register | Matt Arsenault | 1 | -7/+7
2020-09-30 | [SplitKit] Cope with no live subranges in defFromParent | Jay Foad | 1 | -3/+9
Following on from D87757 "[SplitKit] Only copy live lanes", it is possible to split a live range at a point when none of its subranges are live. This patch handles that case by inserting an implicit def of the superreg. Patch by Quentin Colombet! Differential Revision: https://reviews.llvm.org/D88397
2020-09-25 | [SplitKit] In addDeadDef tolerate parent range that defines more lanes | Jay Foad | 1 | -4/+12
Following on from D87757 "[SplitKit] Only copy live lanes", in SplitEditor::addDeadDef, when we're checking whether the parent live interval has a subrange defining the same lanes, tolerate the case where the parent subrange defines a superset of the lanes. This can happen when the child subrange comes from SplitEditor::buildCopy decomposing a partial copy into a sequence of subreg copies that cover the required lanes. Differential Revision: https://reviews.llvm.org/D88020
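The relaxed check described here amounts to replacing an exact lane mask comparison with a superset test. The sketch below shows that test on plain integer masks; SubRange and findCoveringParent are hypothetical names for this example only.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical, stripped-down subrange: only the lane mask matters here.
struct SubRange {
  uint64_t LaneMask;
};

// Find a parent subrange whose lanes include every lane of the child.
// An exact-match search would compare with `==` on the masks instead.
const SubRange *findCoveringParent(const std::vector<SubRange> &Parent,
                                   uint64_t ChildMask) {
  for (const SubRange &PS : Parent)
    if ((PS.LaneMask & ChildMask) == ChildMask)
      return &PS;
  return nullptr;
}
```

A child mask of 0x1 is accepted against a parent subrange with mask 0x3, which is the "parent defines more lanes" situation from the message.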
2020-09-17 | [SplitKit] Only copy live lanes | Jay Foad | 1 | -3/+6
When splitting a live interval with subranges, only insert copies for the lanes that are live at the point of the split. This avoids some unnecessary copies and fixes a problem where copying dead lanes was generating MIR that failed verification. The test case for this is test/CodeGen/AMDGPU/splitkit-copy-live-lanes.mir.
Without this fix, some earlier live range splitting would create %430:
    %430 [256r,848r:0)[848r,2584r:1) 0@256r 1@848r L0000000000000003 [848r,2584r:0) 0@848r L0000000000000030 [256r,2584r:0) 0@256r weight:1.480938e-03
    ...
    256B undef %430.sub2:vreg_128 = V_LSHRREV_B32_e32 16, %20.sub1:vreg_128, implicit $exec
    ...
    848B %430.sub0:vreg_128 = V_AND_B32_e32 %92:sreg_32, %20.sub1:vreg_128, implicit $exec
    ...
    2584B %431:vreg_128 = COPY %430:vreg_128
Then RAGreedy::tryLocalSplit would split %430 into %432 and %433 just before 848B giving:
    %432 [256r,844r:0) 0@256r L0000000000000030 [256r,844r:0) 0@256r weight:3.066802e-03
    %433 [844r,848r:0)[848r,2584r:1) 0@844r 1@848r L0000000000000030 [844r,2584r:0) 0@844r L0000000000000003 [844r,844d:0)[848r,2584r:1) 0@844r 1@848r weight:2.831776e-03
    ...
    256B undef %432.sub2:vreg_128 = V_LSHRREV_B32_e32 16, %20.sub1:vreg_128, implicit $exec
    ...
    844B undef %433.sub0:vreg_128 = COPY %432.sub0:vreg_128 {
      internal %433.sub2:vreg_128 = COPY %432.sub2:vreg_128
    848B } %433.sub0:vreg_128 = V_AND_B32_e32 %92:sreg_32, %20.sub1:vreg_128, implicit $exec
    ...
    2584B %431:vreg_128 = COPY %433:vreg_128
Note that the copy from %432 to %433 at 844B is a curious bundle-without-a-BUNDLE-instruction that SplitKit creates deliberately, and it includes a copy of .sub0 which is not live at this point, and that causes it to fail verification:
    *** Bad machine code: No live subrange at use ***
    - function: zextload_global_v64i16_to_v64i64
    - basic block: %bb.0 (0x7faed48) [0B;2848B)
    - instruction: 844B undef %433.sub0:vreg_128 = COPY %432.sub0:vreg_128
    - operand 1: %432.sub0:vreg_128
    - interval: %432 [256r,844r:0) 0@256r L0000000000000030 [256r,844r:0) 0@256r weight:3.066802e-03
    - at: 844B
Using real bundles with a BUNDLE instruction might also fix this problem, but the current fix is less invasive and also avoids some unnecessary copies.
https://bugs.llvm.org/show_bug.cgi?id=47492
Differential Revision: https://reviews.llvm.org/D87757
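The idea of copying only the lanes that are live at the split point can be sketched as a union over the subranges that are live there. The model below uses the two subranges of %430 from the dump above, with slot letters dropped and positions reduced to plain integers; the types and the function name liveLanesAt are assumptions for this example.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

struct Segment {
  unsigned Start, End; // Half-open interval [Start, End).
};

// Hypothetical subrange: a lane mask plus the segments where those lanes live.
struct SubRange {
  uint64_t LaneMask;
  std::vector<Segment> Segments;
};

static bool liveAt(const SubRange &S, unsigned Idx) {
  for (const Segment &Seg : S.Segments)
    if (Seg.Start <= Idx && Idx < Seg.End)
      return true;
  return false;
}

// Union of the lane masks of every subrange that is live at Idx. A split at
// Idx would then copy only these lanes.
uint64_t liveLanesAt(const std::vector<SubRange> &Subs, unsigned Idx) {
  uint64_t Mask = 0;
  for (const SubRange &S : Subs)
    if (liveAt(S, Idx))
      Mask |= S.LaneMask;
  return Mask;
}
```

At position 844 only the 0x30 lanes (.sub2 in the dump) are live, so a split there would copy .sub2 and skip .sub0, which is the copy that tripped the verifier.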
2020-09-16 | [NFC][Regalloc] accessors for 'reg' and 'weight' | Mircea Trofin | 1 | -7/+7
Also renamed the fields to follow style guidelines. Accessors help with readability - weight mutation, in particular, is easier to follow this way. Differential Revision: https://reviews.llvm.org/D87725
2020-08-13 | SplitKit.cpp - removes includes already included by SplitKit.h. NFC. | Simon Pilgrim | 1 | -13/+0
Don't duplicate includes already provided by the module header.
2020-07-01 | Change the INLINEASM_BR MachineInstr to be a non-terminating instruction. | James Y Knight | 1 | -13/+20
Before this instruction supported output values, it fit fairly naturally as a terminator. However, being a terminator while also supporting outputs causes some trouble, as the physreg->vreg COPY operations cannot be in the same block. Modeling it as a non-terminator allows it to be handled the same way as invoke is handled already. Most of the changes here were created by auditing all the existing users of MachineBasicBlock::isEHPad() and MachineBasicBlock::hasEHPadSuccessor(), and adding calls to isInlineAsmBrIndirectTarget or mayHaveInlineAsmBr, as appropriate. Reviewed By: nickdesaulniers, void Differential Revision: https://reviews.llvm.org/D79794
2020-06-25 | LiveIntervals.h - reduce AliasAnalysis.h include to forward declaration. NFC. | Simon Pilgrim | 1 | -0/+1
Fix implicit include dependencies in source files and replace legacy AliasAnalysis typedef with AAResults where necessary.
2020-04-10 | Split LiveRangeCalc in LiveRangeCalc/LiveIntervalCalc. NFC | Marcello Maggioni | 1 | -26/+26
Summary: Refactor LiveRangeCalc such that it is now split into two classes. The objective is to split all the "register specific" logic away from LiveRangeCalc. The two new classes created are:
- LiveRangeCalc - is meant as a generic class to compute and modify live ranges in a generic way. This class should deal only with SlotIndices and VNInfo objects.
- LiveIntervalCalc - is meant to be equivalent to the old LiveRangeCalc. It computes the liveness of the virtual registers tracked by a LiveInterval object.
With this refactoring LiveRangeCalc can be used to implement tracking of liveness of LiveRanges that represent other things than just registers.
Subscribers: MatzeB, qcolombet, mgorny, hiraditya, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D76584
2019-10-17 | Move LiveRangeCalc header to publicly available position. NFC | Marcello Maggioni | 1 | -1/+1
Differential Revision: https://reviews.llvm.org/D69078 llvm-svn: 375075
2019-08-15 | Apply llvm-prefer-register-over-unsigned from clang-tidy to LLVM | Daniel Sanders | 1 | -2/+2
Summary: This clang-tidy check is looking for unsigned integer variables whose initializer starts with an implicit cast from llvm::Register and changes the type of the variable to llvm::Register (dropping the llvm:: where possible).
Partial reverts in:
X86FrameLowering.cpp - Some functions return unsigned and arguably should be MCRegister
X86FixupLEAs.cpp - Some functions return unsigned and arguably should be MCRegister
HexagonBitSimplify.cpp - Function takes BitTracker::RegisterRef which appears to be unsigned&
MachineVerifier.cpp - Ambiguous operator==() given MCRegister and const Register
PPCFastISel.cpp - No Register::operator-=()
PeepholeOptimizer.cpp - TargetInstrInfo::optimizeLoadInstr() takes an unsigned&
MachineTraceMetrics.cpp - MachineTraceMetrics lacks a suitable constructor
Manual fixups in:
ARMFastISel.cpp - ARMEmitLoad() now takes a Register& instead of unsigned&
HexagonSplitDouble.cpp - Ternary operator was ambiguous between unsigned/Register
HexagonConstExtenders.cpp - Has a local class named Register, used llvm::Register instead of Register.
PPCFastISel.cpp - PPCEmitLoad() now takes a Register& instead of unsigned&
Depends on D65919
Reviewers: arsenm, bogner, craig.topper, RKSimon
Reviewed By: arsenm
Subscribers: RKSimon, craig.topper, lenary, aemerson, wuzish, jholewinski, MatzeB, qcolombet, dschuff, jyknight, dylanmckay, sdardis, nemanjai, jvesely, wdng, nhaehnle, sbc100, jgravelle-google, kristof.beyls, hiraditya, aheejin, kbarton, fedor.sergeev, javed.absar, asb, rbar, johnrusso, simoncook, apazos, sabuasal, niosHD, jrtc27, MaskRay, zzheng, edward-jones, atanasyan, rogfer01, MartinMosbeck, brucehoult, the_o, tpr, PkmX, jocewei, jsji, Petar.Avramovic, asbirlea, Jim, s.egerton, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D65962
llvm-svn: 369041
2019-03-26 | [LiveRange] Reset the VNIs when splitting subranges | Quentin Colombet | 1 | -4/+5
When splitting a subrange we end up with two different subranges covering two different, non-overlapping sets of lanes. As part of this splitting, the VNIs of the original live range need to be dispatched to the subranges according to which lanes they are actually defining. Prior to this patch we were assuming that all values were defining all lanes. This was wrong, as demonstrated by llvm.org/PR40835. Differential Revision: https://reviews.llvm.org/D59731 llvm-svn: 357032