path: root/llvm/lib/CodeGen/MachineScheduler.cpp
Age Commit message Author Files Lines
8 days[MachineScheduler] Turn SU->isScheduled check into an assert in pickNode() (#160145)Jonas Paulsson1-61/+59
It is unnecessary and confusing to have a do/while loop that checks SU->isScheduled as this should never be true. ScheduleDAGMI::updateQueues() is always called after pickNode() and it sets isScheduled on the SU. Turn this into an assertion instead.
2025-08-01[MachineScheduler] Make cluster check more efficient (#150884)Ruiling, Song1-26/+40
2025-07-22[MISched] Use SchedRegion in overrideSchedPolicy and overridePostRASchedPolicy (#149297)Harrison Hao1-20/+4
This patch updates `overrideSchedPolicy` and `overridePostRASchedPolicy` to take a `SchedRegion` parameter instead of just `NumRegionInstrs`. This provides access to both the instruction range and the parent `MachineBasicBlock`, which enables looking up function-level attributes. With this change, targets can select post-RA scheduling direction per function using a function attribute. For example:
```cpp
void overridePostRASchedPolicy(MachineSchedPolicy &Policy,
                               const SchedRegion &Region) const {
  const Function &F = Region.RegionBegin->getMF()->getFunction();
  Attribute Attr = F.getFnAttribute("amdgpu-post-ra-direction");
  ...
}
```
2025-06-29[CodeGen] Use std::tie to implement a comparison functor (NFC) (#146252)Kazu Hirata1-14/+14
std::tie clearly expresses the intent while slightly shortening the code.
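As a rough illustration of the pattern this commit message describes (this is not the code from the patch), a comparison functor built on `std::tie` might look like the sketch below. The `ToyRecord` type, its fields, and the `tieExample` helper are hypothetical names chosen for the example.

```cpp
#include <cassert>
#include <tuple>

// Hypothetical record type; the fields are illustrative only.
struct ToyRecord {
  int BaseId;
  long Offset;
  unsigned NodeNum;
};

// Orders records lexicographically by (BaseId, Offset, NodeNum).
struct ToyRecordLess {
  bool operator()(const ToyRecord &A, const ToyRecord &B) const {
    return std::tie(A.BaseId, A.Offset, A.NodeNum) <
           std::tie(B.BaseId, B.Offset, B.NodeNum);
  }
};

// Small usage check: same base, so the smaller offset sorts first.
inline void tieExample() {
  ToyRecordLess Less;
  assert(Less(ToyRecord{1, 0, 5}, ToyRecord{1, 8, 2}));
  assert(!Less(ToyRecord{1, 8, 2}, ToyRecord{1, 0, 5}));
}
```

Compared with a hand-written chain of `if (A.x != B.x) return A.x < B.x;` tests, the tuple comparison keeps the field order in one place.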
2025-06-05MachineScheduler: Improve instruction clustering (#137784)Ruiling, Song1-30/+52
Clustered nodes used to be managed by adding weak edges between neighbouring cluster nodes, which forms a sort of ordered queue. These edges are later recorded as `NextClusterPred` or `NextClusterSucc` in `ScheduleDAGMI`. However, instructions may not be picked in the exact order of the queue. For example, given a queue of cluster nodes A B C, node B might be picked first during scheduling; for top-down scheduling it is then very likely that only B and C are clustered, leaving A alone.
Another issue is that
```
if (!ReorderWhileClustering && SUa->NodeNum > SUb->NodeNum)
  std::swap(SUa, SUb);
if (!DAG->addEdge(SUb, SDep(SUa, SDep::Cluster)))
```
may break the cluster queue. For example, suppose we want to cluster nodes 1 3 2 (ordered as in `MemOpRecords`). Normally 1 (SUa) becomes a pred of 3 (SUb). But for the pair (3, 2), since 3 (SUa) > 2 (SUb), the two nodes are swapped, which makes 2 a pred of 3. Both 1 and 2 are now preds of 3, but there is no edge between 1 and 2, so the cluster chain is broken.
To fix both issues, this change introduces an unordered set. This could help improve clustering in some hard cases.
One key reason the change causes so many test check changes is that the cluster candidates are no longer ordered, so they might be picked in a different order than before. The most affected targets are AMDGPU, AArch64 and RISCV.
For RISCV, it seems to me most changes are just minor instruction reordering; I don't see an obvious regression.
For AArch64, some combining of ldr into ldp was affected, with two cases regressed and two improved. The deeper reason is that the machine scheduler cannot cluster them well either before or after the change, and the later load combine algorithm is also not smart enough.
For AMDGPU, some cases use more v_dual instructions while others regress, which seems less critical. It also seems that test `v_vselect_v32bf16` gets more buffer_load instructions claused.
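The 1 3 2 example from this message can be replayed with a small stand-alone model. This is only a sketch of the old pairwise chaining as the message describes it, using plain node numbers instead of `SUnit` objects; `buildClusterEdges` and `brokenChainExample` are hypothetical helpers, not LLVM functions.

```cpp
#include <cassert>
#include <cstddef>
#include <utility>
#include <vector>

// Chains neighbouring records pairwise and returns (pred, succ) edges.
// Mirrors the swap described in the commit message: without
// ReorderWhileClustering, the lower node number always becomes the pred.
inline std::vector<std::pair<unsigned, unsigned>>
buildClusterEdges(const std::vector<unsigned> &Order,
                  bool ReorderWhileClustering) {
  std::vector<std::pair<unsigned, unsigned>> Edges;
  for (std::size_t I = 0; I + 1 < Order.size(); ++I) {
    unsigned A = Order[I], B = Order[I + 1];
    if (!ReorderWhileClustering && A > B)
      std::swap(A, B);
    Edges.emplace_back(A, B); // A is recorded as a pred of B
  }
  return Edges;
}

// Replays the example: records ordered 1 3 2 yield edges 1->3 and 2->3,
// so nodes 1 and 2 are never linked to each other.
inline void brokenChainExample() {
  auto Edges = buildClusterEdges({1, 3, 2}, /*ReorderWhileClustering=*/false);
  assert(Edges.size() == 2);
  assert(Edges[0] == std::make_pair(1u, 3u));
  assert(Edges[1] == std::make_pair(2u, 3u));
}
```

In this model node 3 ends up with two preds and nothing connects 1 and 2, which is the broken chain the message refers to.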
2025-06-03[MISched] Add templates for creating custom schedulers (#141935)Pengcheng Wang1-35/+3
We rename `createGenericSchedLive` and `createGenericSchedPostRA` to `createSchedLive` and `createSchedPostRA`, and add a template parameter `Strategy` which is the generic implementation by default. This can simplify some code for targets that have custom scheduler strategy.
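The idea of a factory with a defaulted `Strategy` template parameter can be sketched with toy types. Everything below (`ToyContext`, `ToyStrategyBase`, `ToyGenericStrategy`, `MyTargetStrategy`, `ToyScheduler`, `createToySched`) is hypothetical and only stands in for the real LLVM classes and functions, whose exact signatures are not shown in this log.

```cpp
#include <cassert>
#include <memory>
#include <string>

struct ToyContext {}; // stand-in for the scheduler context

struct ToyStrategyBase {
  virtual ~ToyStrategyBase() = default;
  virtual std::string name() const = 0;
};

// Plays the role of the generic, default strategy.
struct ToyGenericStrategy : ToyStrategyBase {
  explicit ToyGenericStrategy(const ToyContext *) {}
  std::string name() const override { return "generic"; }
};

// Plays the role of a target-specific strategy.
struct MyTargetStrategy : ToyStrategyBase {
  explicit MyTargetStrategy(const ToyContext *) {}
  std::string name() const override { return "my-target"; }
};

struct ToyScheduler {
  std::unique_ptr<ToyStrategyBase> Strategy;
};

// One factory serves both cases: the strategy defaults to the generic one
// and a target can substitute its own type as the template argument.
template <typename Strategy = ToyGenericStrategy>
ToyScheduler createToySched(const ToyContext *C) {
  return ToyScheduler{std::make_unique<Strategy>(C)};
}

inline void factoryExample() {
  ToyContext C;
  assert(createToySched(&C).Strategy->name() == "generic");
  assert(createToySched<MyTargetStrategy>(&C).Strategy->name() == "my-target");
}
```

A target with a custom strategy then needs only the one-line `createToySched<MyTargetStrategy>(&C)` call instead of its own copy of the factory body.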
2025-05-28MachineScheduler: Reset next cluster candidate for each node (#139513)Ruiling, Song1-0/+7
When a node is picked, we should reset its next cluster candidate to null before releasing its successors/predecessors.
2025-05-17[llvm] Use llvm::is_sorted (NFC) (#140399)Kazu Hirata1-2/+1
2025-05-09[MISched] Add statistics for heuristics (#137981)Cullen Rhodes1-13/+228
When diagnosing scheduling issues it can be useful to know which heuristics are driving the scheduler. This adds pre-RA and post-RA statistics for all heuristics.
2025-05-07[MISched] Add statistics to quantify scheduling (#138090)Cullen Rhodes1-0/+38
When diagnosing scheduler issues it can be useful to know how scheduling changes the order of instructions, particularly for large functions when it's not trivial to figure out from the debug output by looking at the scheduling unit (SU) IDs. This adds pre-RA and post-RA statistics to track 1) the number of instructions that remain in source order after scheduling and 2) the total number of instructions scheduled, to compare 1) against.
2025-05-06[MISched] Fix off-by-one error in debug output with -misched-cutoff=<n> flag (#137988)Cullen Rhodes1-4/+6
This flag instructs the scheduler to stop scheduling after N instructions, but in the debug output it appears as if it's scheduling N+1 instructions, e.g.
  $ llc -misched-cutoff=10 -debug-only=machine-scheduler example.ll 2>&1 | grep "^Scheduling SU" | wc -l
  11
as it calls pickNode before calling checkSchedLimit.
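The ordering problem can be reproduced in a toy loop. This is a simplified model and not the scheduler's code: it assumes the debug line is printed when a node is picked, and the `CheckLimitFirst` mode is just one hypothetical way to reorder the two steps, not necessarily what the patch does. `countLogLines` and `cutoffExample` are made-up names.

```cpp
#include <cassert>

// Returns how many "Scheduling SU" lines the toy loop would print.
// Picking a node logs one line; the limit check stops the loop once
// Cutoff nodes have been counted.
inline unsigned countLogLines(unsigned NumNodes, unsigned Cutoff,
                              bool CheckLimitFirst) {
  unsigned Scheduled = 0, LogLines = 0;
  auto checkSchedLimit = [&]() -> bool {
    if (Scheduled == Cutoff)
      return false;
    ++Scheduled;
    return true;
  };
  for (unsigned Node = 0; Node < NumNodes; ++Node) {
    if (CheckLimitFirst && !checkSchedLimit())
      break;
    ++LogLines; // models the debug print made when a node is picked
    if (!CheckLimitFirst && !checkSchedLimit())
      break;
  }
  return LogLines;
}

inline void cutoffExample() {
  // Pick-then-check prints one line too many; check-then-pick does not.
  assert(countLogLines(20, 10, /*CheckLimitFirst=*/false) == 11);
  assert(countLogLines(20, 10, /*CheckLimitFirst=*/true) == 10);
}
```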
2025-04-18[Analysis] Remove implicit LocationSize conversion from uint64_t (#133342)Philip Reames1-1/+1
This change removes the uint64_t constructor on LocationSize preventing implicit conversion, and fixes up the using APIs to adapt to the change. Note that I'm adding a couple of explicit conversion points on routines where passing in a fixed offset as an integer seems likely to have well understood semantics. We had an unfortunate case which arose if you tried to pass a TypeSize value to a parameter of LocationSize type. We'd find the implicit conversion path through TypeSize -> uint64_t -> LocationSize which works just fine for fixed values, but loses information and fails assertions if the TypeSize was scalable. This change breaks the first link in that implicit conversion chain since that seemed to be the easier one.
2025-04-08[MachineScheduler] Add more debug prints w.r.t hazards and pending SUnits (#134328)Min-Yih Hsu1-10/+32
While we already have some detailed debug messages on the candidate selection process -- which selects a SUnit from the Available queue -- we didn't say much about why a SUnit was _not_ moved from the Pending queue to the Available queue in the first place, which is just as important as why we scheduled a node IMHO. Therefore, I added some debug prints for this very purpose. I decided to print these extra messages by default (instead of guarding them behind a command line flag like `-misched-detail-resource-booking`) because we have already been printing some of the hazard remarks, so I thought we might as well print these new messages -- which are mostly about hazards -- by default.
2025-03-04[CodeGen] Avoid repeated hash lookups (NFC) (#129821)Kazu Hirata1-4/+5
2025-03-04[CodeGen] Use Register in SDep interface. NFC (#129734)Craig Topper1-2/+1
2025-03-04[MachineScheduler] Optional scheduling of single-MI regions (#129704)Lucas Ramirez1-5/+4
Following 15e295d, the machine scheduler no longer filters out single-MI regions when emitting regions to schedule. While this has no functional impact at the moment, it generally has a negative compile-time impact (see #128739). Since no target other than AMDGPU relies on this behavior, this introduces an off-by-default flag in `ScheduleDAGInstrs` to control whether such regions are scheduled, effectively reverting 15e295d for all targets but AMDGPU (currently the only target enabling this flag).
2025-03-04[NPM][NFC] Chain PreservedAnalyses methods (#129505)Akshat Oke1-5/+4
2025-03-03[NFC]Make file-local cl::opt global variables static (#126486)chrisPyr1-4/+4
#125983
2025-02-27[MachineScheduler][AMDGPU] Allow scheduling of single-MI regions (#128739)Lucas Ramirez1-2/+5
The MI scheduler skips regions containing a single MI during scheduling. This can prevent targets that perform multi-stage scheduling and move MIs between regions during some stages from reasoning correctly about the entire IR, since some MIs will not be assigned to a region at the beginning. This makes the machine scheduler no longer skip single-MI regions. Only a few unit tests are affected (mainly those which check for the scheduler's debug output).
2025-02-24[MachineSched] Add a first valid reason [nfc]Philip Reames1-2/+3
For debugging, distinguish the first valid candidate encountered and a preference decision driven by node number.
2025-02-20Revert "[CodeGen] Remove static member function Register::isPhysicalRegister. NFC"Christopher Di Bella1-1/+2
This reverts commit 5fadb3d680909ab30b37eb559f80046b5a17045e.
2025-02-20[CodeGen] Remove static member function Register::isPhysicalRegister. NFCCraig Topper1-2/+1
Prefer the nonstatic member by converting unsigned to Register instead.
2025-02-13[MISched][NFC] Remove unused heuristic NextDefUse from enum (#125879)Cullen Rhodes1-1/+2
Heuristic was removed in 46533e614b78 due to being ineffective.
2025-02-12Reland "CodeGen][NewPM] Port MachineScheduler to NPM. (#125703)" (#126684)Akshat Oke1-89/+241
`RegisterClassInfo` was supposed to be kept alive between pass runs, but it was not, which led to recomputation and increased compile time. Now the Impl class is a member of the legacy and new passes so that it is not reconstructed on every pass run. Co-authored-by: Christudasan Devadasan <christudasan.devadasan@amd.com>
2025-02-08Revert "CodeGen][NewPM] Port MachineScheduler to NPM. (#125703)" (#126268)Akshat Oke1-205/+82
This reverts commit 5aa4979c47255770cac7b557f3e4a980d0131d69 while I investigate what's causing the compile-time regression.
2025-02-05[MISched] Small debug improvements (#125072)Cullen Rhodes1-16/+27
Changes:
1. Fix inconsistencies in register pressure set printing. "Max Pressure" printing is inconsistent with "Bottom Pressure" and "Top Pressure". For the former, register class begins on the same line vs newline for latter. Also for the former, the first register class is on the same line, but subsequent register classes are newline separated. That's removed so all are on the same line. Before: Max Pressure: FPR8=1 GPR32=14 Top Pressure: GPR32=2 Bottom Pressure: FPR8=7 GPR32=17 After: Max Pressure: FPR8=1 GPR32=14 Top Pressure: GPR32=2 Bottom Pressure: FPR8=7 GPR32=17
2. After scheduling an instruction, don't print pressure diff if there isn't one. Also s/UpdateRegP/UpdateRegPressure. E.g., Before: UpdateRegP: SU(3) %0:gpr64common = ADDXrr %58:gpr64common, gpr64 to UpdateRegP: SU(4) %393:gpr64sp = ADDXri %58:gpr64common, 390, 12 to GPR32 -1 After: UpdateRegPressure: SU(4) %393:gpr64sp = ADDXri %58:gpr64common, 12 to GPR32 -1
3. Don't print excess pressure sets if there are none.
2025-02-05CodeGen][NewPM] Port MachineScheduler to NPM. (#125703)Christudasan Devadasan1-82/+205
2025-02-05[CodeGen][MachineScheduler] Remove the unimplemented print method. (#125702)Christudasan Devadasan1-6/+0
2025-02-05[CodeGen] Move MISched target hooks into TargetMachine (#125700)Christudasan Devadasan1-4/+8
The createSIMachineScheduler & createPostMachineScheduler target hooks are currently placed in the PassConfig interface. Move them out to TargetMachine so that both the legacy and the new pass manager can use them.
2025-01-22[CodeGen] Rename RegisterMaskPair to VRegMaskOrUnit. NFC (#123799)Craig Topper1-6/+5
This holds a physical register unit or virtual register and mask. While I was here I've used emplace_back and removed an unneeded use of a template.
2024-12-12[MISched] Unify the way to specify scheduling direction (#119518)Pengcheng Wang1-46/+39
For pre-RA scheduling, we use two options, `-misched-topdown` and `-misched-bottomup`, to force the direction, while for post-RA scheduling we use `-misched-postra-direction` with enumerated values (`topdown`, `bottomup` and `bidirectional`). This is inconsistent and adds mental burden. Here we replace `-misched-topdown` and `-misched-bottomup` with `-misched-prera-direction`, which takes the same enumerated values. To avoid checking `getNumOccurrences() > 0`, we add a new enum value `Unspecified` and make it the default initial value. These options are hidden, so we needn't keep compatibility.
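The role of the `Unspecified` value can be shown with a small stand-alone model. The sketch below only assumes the four enum values named in the message; the `Direction` type, `resolveDirection`, and `directionExample` are hypothetical and do not reproduce the LLVM option-handling code.

```cpp
#include <cassert>

// Toy model of the direction option; Unspecified is the initial value,
// meaning "nothing was given on the command line".
enum class Direction { Unspecified, TopDown, BottomUp, Bidirectional };

// An explicit command-line choice wins; otherwise the caller-supplied
// default (for example a target's preference) is kept.
inline Direction resolveDirection(Direction FromCommandLine,
                                  Direction Default) {
  return FromCommandLine == Direction::Unspecified ? Default
                                                   : FromCommandLine;
}

inline void directionExample() {
  assert(resolveDirection(Direction::Unspecified, Direction::TopDown) ==
         Direction::TopDown);
  assert(resolveDirection(Direction::BottomUp, Direction::TopDown) ==
         Direction::BottomUp);
}
```

With a sentinel value the "was the option given?" question becomes a plain comparison, which is the simplification over counting option occurrences that the message describes.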
2024-12-10[MISched] Compare right next cluster node (#116584)Pengcheng Wang1-3/+6
We support bottom-up and bidirectional postra scheduling now, but we only compare the successive next cluster node, as if we were doing topdown scheduling. This makes load/store clustering and macro fusion wrong. This patch makes sure that we get the right cluster node for the scheduling direction.
2024-12-05[Sched] Skip MemOp with unknown size when clustering (#118443)Pengcheng Wang1-0/+3
In #83875, we changed the type of `Width` to `LocationSize`. To get the cluster bytes, we use `LocationSize::getValue()` to calculate the value. But when `Width` is a `LocationSize` of unknown size, the assertion "Getting value from an unknown LocationSize!" is triggered. This patch simply skips MemOps with unknown size to fix this issue and keep the logic the same as before. This issue was found when implementing the software pipeliner for RISC-V in #117546. The pipeliner may clone some memory operations with `BeforeOrAfterPointer` size.
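The guard described here can be sketched without LLVM by modelling an unknown size as an empty `std::optional` (C++17). `ToyMemOp`, `clusterBytes`, and `unknownSizeExample` are hypothetical names, and the real patch works on `LocationSize` rather than on `std::optional`.

```cpp
#include <cassert>
#include <cstdint>
#include <optional>
#include <vector>

// Toy stand-in for a memory operation; std::nullopt models an unknown size.
struct ToyMemOp {
  unsigned NodeNum;
  std::optional<std::uint64_t> Width;
};

// Sums the widths of the ops that have a known size. Ops with an unknown
// size are skipped instead of having their (nonexistent) value read.
inline std::uint64_t clusterBytes(const std::vector<ToyMemOp> &Ops) {
  std::uint64_t Bytes = 0;
  for (const ToyMemOp &Op : Ops) {
    if (!Op.Width)
      continue; // unknown size: skip this op
    Bytes += *Op.Width;
  }
  return Bytes;
}

inline void unknownSizeExample() {
  std::vector<ToyMemOp> Ops = {ToyMemOp{0, 8}, ToyMemOp{1, std::nullopt},
                               ToyMemOp{2, 4}};
  assert(clusterBytes(Ops) == 12); // the op with unknown size is ignored
}
```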
2024-11-27[MISched] Use right boundary when trying latency heuristics (#116592)Pengcheng Wang1-3/+7
We may do bottom-up or bidirectional scheduling, but previously we assumed top-down scheduling, which may cause some issues.
2024-11-12[MISched] Add a hook to override PostRA scheduling policy (#115455)Pengcheng Wang1-9/+22
PostRA scheduling supports different directions now, but we can only specify it via command line options. This patch adds a new hook `overridePostRASchedPolicy` for targets to override PostRA scheduling policy. Note that some options like tracking register pressure won't take effect in PostRA scheduling.
2024-11-08[CodeGen][MISched] Set DumpDirection after initPolicy (#115112)Pengcheng Wang1-16/+10
Previously we set the dump direction according to command line options, but we may override the scheduling direction in `initPolicy` and this results in mismatch between dump and actual policy. Here we simply set the dump direction after initializing the policy.
2024-09-24llvm-reduce: Don't print verifier failed machine functions (#109673)Matt Arsenault1-4/+4
This produces far too much terminal output, particularly for the instruction reduction. Since it doesn't consider the liveness of the instructions it's deleting, it produces quite a lot of verifier errors.
2024-08-29[ExtendLifetimes] Implement llvm.fake.use to extend variable lifetimes (#86149)Stephen Tozer1-1/+2
This patch is part of a set of patches that add an `-fextend-lifetimes` flag to clang, which extends the lifetimes of local variables and parameters for improved debuggability. In addition to that flag, the patch series adds a pragma to selectively disable `-fextend-lifetimes`, and an `-fextend-this-ptr` flag which functions as `-fextend-lifetimes` for this pointers only. All changes and tests in these patches were written by Wolfgang Pieb (@wolfy1961), while Stephen Tozer (@SLTozer) has handled review and merging. The extend lifetimes flag is intended to eventually be set on by `-Og`, as discussed in the RFC here: https://discourse.llvm.org/t/rfc-redefine-og-o1-and-add-a-new-level-of-og/72850 This patch implements a new intrinsic instruction in LLVM, `llvm.fake.use` in IR and `FAKE_USE` in MIR, that takes a single operand and has no effect other than "using" its operand, to ensure that its operand remains live until after the fake use. This patch does not emit fake uses anywhere; the next patch in this sequence causes them to be emitted from the clang frontend, such that for each variable (or this) a fake.use operand is inserted at the end of that variable's scope, using that variable's value. This patch covers everything post-frontend, which is largely just the basic plumbing for a new intrinsic/instruction, along with a few steps to preserve the fake uses through optimizations (such as moving them ahead of a tail call or translating them through SROA). Co-authored-by: Stephen Tozer <stephen.tozer@sony.com>
2024-08-04[CodeGen] Construct SmallVector with ArrayRef (NFC) (#101841)Kazu Hirata1-2/+2
2024-07-10[CodeGen][NewPM] Port `LiveIntervals` to new pass manager (#98118)paperchalice1-4/+4
- Add `LiveIntervalsAnalysis`. - Add `LiveIntervalsPrinterPass`. - Use `LiveIntervalsWrapperPass` in legacy pass manager. - Use `std::unique_ptr` instead of raw pointer for `LICalc`, so destructor and default move constructor can handle it correctly. This would be the last analysis required by `PHIElimination`.
2024-07-09[CodeGen][NewPM] Port `SlotIndexes` to new pass manager (#97941)paperchalice1-3/+3
- Add `SlotIndexesAnalysis`. - Add `SlotIndexesPrinterPass`. - Use `SlotIndexesWrapperPass` in legacy pass.
2024-07-09[CodeGen][NewPM] Port `machine-loops` to new pass manager (#97793)paperchalice1-6/+6
- Add `MachineLoopAnalysis`. - Add `MachineLoopPrinterPass`. - Convert to `MachineLoopInfoWrapperPass` in legacy pass manager.
2024-07-01[llvm][CodeGen] Avoid 'raw_string_ostream::str' (NFC) (#97318)Youngsuk Kim1-1/+1
Since `raw_string_ostream` doesn't own the string buffer, it is desirable (in terms of memory safety) for users to directly reference the string buffer rather than use `raw_string_ostream::str()`. Work towards TODO comment to remove `raw_string_ostream::str()`.
2024-06-11[CodeGen][NewPM] Split `MachineDominatorTree` into a concrete analysis result (#94571)paperchalice1-5/+5
Prepare for new pass manager version of `MachineDominatorTreeAnalysis`. We may need a machine dominator tree version of `DomTreeUpdater` to handle `SplitCriticalEdge` in some CodeGen passes.
2024-05-22[MISched][NFC] Add documentation comment in pickNode for ReadyQueue maintenance (#92976)Michael Maitland1-0/+15
I had some trouble understanding why `removeReady` removed nodes from the Pending queue, since my intuition told me that the Pending queue did not represent a node that was ready. I took a deeper look and found that pickOnlyNode and pickNodeFromQueue only picked nodes from the Available queue too. I found that we need to remove nodes from the Available and Pending queues that correspond to the opposite direction from the one we ended up choosing (IsTopNode vs !IsTopNode). It took me a little longer than I would have liked to understand this fact, so I figured that I would add a comment in the code that makes it clear for future readers.
2024-05-21MachineScheduler: Add parameter name commentsMatt Arsenault1-2/+4
2024-04-15[mi-sched] Suppress register pressure with i64. (#88256)laichunfeng1-2/+4
The machine scheduler suppresses register pressure when the scheduling window is too small, but it currently doesn't consider the i64 register type. This MR extends the check to the i64 register type, so architectures like RISCV64 that only support i64 integer registers will behave the same as RISCV32.
2024-04-02MachineScheduler: Simplify usage of TargetInstrInfoMatt Arsenault1-12/+4
2024-03-25[CodeGen][MISched] Add misched post-regalloc bidirectional scheduling (#77138)Michael Maitland1-12/+101
This PR is stacked on #76186. This PR keeps the default strategy as top-down since that is what existing targets expect. It can be enabled using `-misched-postra-direction=bidirectional`. It is up to targets to decide whether they would like to enable this option for themselves.
2024-03-06[Codegen] Make Width in getMemOperandsWithOffsetWidth a LocationSize. (#83875)David Green1-6/+7
This is another part of #70452 which makes getMemOperandsWithOffsetWidth use a LocationSize for Width, as opposed to the unsigned it currently uses. The advantages on its own are not super high if getMemOperandsWithOffsetWidth usually uses known sizes, but if the values can come from an MMO it can help be more accurate in case they are Unknown (and in the future, scalable).