aboutsummaryrefslogtreecommitdiff
path: root/clang/lib/AST/ByteCode/Compiler.cpp
diff options
context:
space:
mode:
authorLucas Ramirez <11032120+lucas-rami@users.noreply.github.com>2025-05-13 11:11:00 +0200
committerGitHub <noreply@github.com>2025-05-13 11:11:00 +0200
commit6456ee056ff2ddce665df2032620207281fedaf8 (patch)
tree34e14dee0e75a389c0d63f932c2bbea9dc6cb274 /clang/lib/AST/ByteCode/Compiler.cpp
parent3de2fa91e17287c0bc45f8080064fc327e803314 (diff)
downloadllvm-6456ee056ff2ddce665df2032620207281fedaf8.zip
llvm-6456ee056ff2ddce665df2032620207281fedaf8.tar.gz
llvm-6456ee056ff2ddce665df2032620207281fedaf8.tar.bz2
Reapply "[AMDGPU][Scheduler] Refactor ArchVGPR rematerialization during scheduling (#125885)" (#139548)
This reapplies 067caaa and 382a085 (reverting b35f6e2) with fixes to issues detected by the address sanitizer (MIs have to be removed from live intervals before being removed from their parent MBB). Original commit description below. AMDGPU scheduler's `PreRARematStage` attempts to increase function occupancy w.r.t. ArchVGPR usage by rematerializing trivial ArchVGPR-defining instruction next to their single use. It first collects all eligible trivially rematerializable instructions in the function, then sinks them one-by-one while recomputing occupancy in all affected regions each time to determine if and when it has managed to increase overall occupancy. If it does, changes are committed to the scheduler's state; otherwise modifications to the IR are reverted and the scheduling stage gives up. In both cases, this scheduling stage currently involves repeated queries for up-to-date occupancy estimates and some state copying to enable reversal of sinking decisions when occupancy is revealed not to increase. The current implementation also does not accurately track register pressure changes in all regions affected by sinking decisions. This commit refactors this scheduling stage, improving RP tracking and splitting the stage into two distinct steps to avoid repeated occupancy queries and IR/state rollbacks. - Analysis and collection (`canIncreaseOccupancyOrReduceSpill`). The number of ArchVGPRs to save to reduce spilling or increase function occupancy by 1 (when there is no spilling) is computed. Then, instructions eligible for rematerialization are collected, stopping as soon as enough have been identified to be able to achieve our goal (according to slightly optimistic heuristics). If there aren't enough of such instructions, the scheduling stage stops here. - Rematerialization (`rematerialize`). Instructions collected in the first step are rematerialized one-by-one. Now we are able to directly update the scheduler's state since we have already done the occupancy analysis and know we won't have to rollback any state. Register pressures for impacted regions are recomputed only once, as opposed to at every sinking decision. In the case where the stage attempted to increase occupancy, and if both rematerializations alone and rescheduling after were unable to improve occupancy, then all rematerializations are rollbacked.
Diffstat (limited to 'clang/lib/AST/ByteCode/Compiler.cpp')
0 files changed, 0 insertions, 0 deletions