aboutsummaryrefslogtreecommitdiff
path: root/llvm/lib/CodeGen/MachineRegisterInfo.cpp
diff options
context:
space:
mode:
authorLucas Ramirez <11032120+lucas-rami@users.noreply.github.com>2025-05-08 12:51:06 +0200
committerGitHub <noreply@github.com>2025-05-08 12:51:06 +0200
commit067caaafb58a156d0d77229422607782a639f5b5 (patch)
tree952f17d04aff192b3176074e28c5f7d73049dd64 /llvm/lib/CodeGen/MachineRegisterInfo.cpp
parentc4f723a7c3bb12ce4e247bcaed755c8d927f73a4 (diff)
downloadllvm-067caaafb58a156d0d77229422607782a639f5b5.zip
llvm-067caaafb58a156d0d77229422607782a639f5b5.tar.gz
llvm-067caaafb58a156d0d77229422607782a639f5b5.tar.bz2
[AMDGPU][Scheduler] Refactor ArchVGPR rematerialization during scheduling (#125885)
AMDGPU scheduler's `PreRARematStage` attempts to increase function occupancy w.r.t. ArchVGPR usage by rematerializing trivial ArchVGPR-defining instruction next to their single use. It first collects all eligible trivially rematerializable instructions in the function, then sinks them one-by-one while recomputing occupancy in all affected regions each time to determine if and when it has managed to increase overall occupancy. If it does, changes are committed to the scheduler's state; otherwise modifications to the IR are reverted and the scheduling stage gives up. In both cases, this scheduling stage currently involves repeated queries for up-to-date occupancy estimates and some state copying to enable reversal of sinking decisions when occupancy is revealed not to increase. The current implementation also does not accurately track register pressure changes in all regions affected by sinking decisions. This commit refactors this scheduling stage, improving RP tracking and splitting the stage into two distinct steps to avoid repeated occupancy queries and IR/state rollbacks. - Analysis and collection (`canIncreaseOccupancyOrReduceSpill`). The number of ArchVGPRs to save to reduce spilling or increase function occupancy by 1 (when there is no spilling) is computed. Then, instructions eligible for rematerialization are collected, stopping as soon as enough have been identified to be able to achieve our goal (according to slightly optimistic heuristics). If there aren't enough of such instructions, the scheduling stage stops here. - Rematerialization (`rematerialize`). Instructions collected in the first step are rematerialized one-by-one. Now we are able to directly update the scheduler's state since we have already done the occupancy analysis and know we won't have to rollback any state. Register pressures for impacted regions are recomputed only once, as opposed to at every sinking decision. In the case where the stage attempted to increase occupancy, and if both rematerializations alone and rescheduling after were unable to improve occupancy, then all rematerializations are rollbacked.
Diffstat (limited to 'llvm/lib/CodeGen/MachineRegisterInfo.cpp')
-rw-r--r--llvm/lib/CodeGen/MachineRegisterInfo.cpp5
1 files changed, 5 insertions, 0 deletions
diff --git a/llvm/lib/CodeGen/MachineRegisterInfo.cpp b/llvm/lib/CodeGen/MachineRegisterInfo.cpp
index 937f63f..b713525 100644
--- a/llvm/lib/CodeGen/MachineRegisterInfo.cpp
+++ b/llvm/lib/CodeGen/MachineRegisterInfo.cpp
@@ -432,6 +432,11 @@ bool MachineRegisterInfo::hasOneNonDBGUser(Register RegNo) const {
return hasSingleElement(use_nodbg_instructions(RegNo));
}
+MachineInstr *MachineRegisterInfo::getOneNonDBGUser(Register RegNo) const {
+ auto RegNoDbgUsers = use_nodbg_instructions(RegNo);
+ return hasSingleElement(RegNoDbgUsers) ? &*RegNoDbgUsers.begin() : nullptr;
+}
+
bool MachineRegisterInfo::hasAtMostUserInstrs(Register Reg,
unsigned MaxUsers) const {
return hasNItemsOrLess(use_instr_nodbg_begin(Reg), use_instr_nodbg_end(),