[GreedyRA] Improve RA for nested loop induction variables (#72093)

Imagine a loop of the form:
```
preheader:
  %r = def
header:
  bcc latch, inner
inner1:
  ..
inner2:
  b latch
latch:
  %r = subs %r
  bcc header
```

Code can spend a decent amount of time in the header<->latch loop without going into the inner part of the loop as much. The greedy register allocator can nevertheless prefer to spill _around_ %r, adding spills around the subs in the loop, which can be very detrimental for performance. (The case I am looking at is actually a very deeply nested set of loops that repeat the header<->latch pattern at multiple different levels.)

The greedy RA applies a preference to spill the IV because it is live through the header block. This patch attempts to add a heuristic to prevent that for variables that look like IVs, in a similar way to the extra spill weight that gets added to variables that look like IVs and are expensive to spill. That means spills are more likely to be pushed into the inner blocks, where they are less likely to be executed and not as expensive as spills around the IV.

This gives an 8% speedup in the exchange benchmark from SPEC2017 when compiled with flang-new, whilst importantly stabilising the scores so that they are less chaotic in response to other changes. Running CTMark showed no difference in compile time.

I've tried to run a range of benchmarks for performance, most of which were relatively flat, without many large differences. One matrix multiply case improved by 21.3% due to removing a cascading chain of spills, and some other knock-on effects happen which usually cause small differences in the scores.
author: David Green <david.green@arm.com> 2023-11-18 09:55:19 +0000
committer: GitHub <noreply@github.com> 2023-11-18 09:55:19 +0000
commit: 303a7835ff833278a0de20cf5a70085b2ae8fee1 (patch)
tree: c6a58b16ea7e57df2870f7de5cc9a15eb2b64c72 /llvm/lib/CodeGen/SplitKit.cpp
parent: 56d0e8ccf424ddcd74a505837b8966204aaba415 (diff)
download: llvm-303a7835ff833278a0de20cf5a70085b2ae8fee1.zip
llvm-303a7835ff833278a0de20cf5a70085b2ae8fee1.tar.gz
llvm-303a7835ff833278a0de20cf5a70085b2ae8fee1.tar.bz2
1 files changed, 12 insertions, 0 deletions
diff --git a/llvm/lib/CodeGen/SplitKit.cpp b/llvm/lib/CodeGen/SplitKit.cpp
index b1c8622..d6c0a78 100644
--- a/llvm/lib/CodeGen/SplitKit.cpp
+++ b/llvm/lib/CodeGen/SplitKit.cpp
@@ -45,6 +45,11 @@ using namespace llvm;
 
 #define DEBUG_TYPE "regalloc"
 
+static cl::opt<bool>
+    EnableLoopIVHeuristic("enable-split-loopiv-heuristic",
+                          cl::desc("Enable loop iv regalloc heuristic"),
+                          cl::init(true));
+
 STATISTIC(NumFinished, "Number of splits finished");
 STATISTIC(NumSimple,   "Number of splits that were simple");
 STATISTIC(NumCopies,   "Number of copies inserted for splitting");
@@ -293,6 +298,13 @@ void SplitAnalysis::calcLiveBlockInfo() {
       MFI = LIS.getMBBFromIndex(LVI->start)->getIterator();
   }
 
+  LooksLikeLoopIV = EnableLoopIVHeuristic && UseBlocks.size() == 2 &&
+                    any_of(UseBlocks, [this](BlockInfo &BI) {
+                      MachineLoop *L = Loops.getLoopFor(BI.MBB);
+                      return BI.LiveIn && BI.LiveOut && BI.FirstDef && L &&
+                             L->isLoopLatch(BI.MBB);
+                    });
+
   assert(getNumLiveBlocks() == countLiveBlocks(CurLI) && "Bad block count");
 }
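
The predicate added to `calcLiveBlockInfo()` can be read in isolation as the following minimal sketch. It is not the LLVM code: `BlockInfoModel`, `looksLikeLoopIV`, and `exampleFromCommitMessage` are hypothetical names introduced here for illustration, the `HasDef` flag stands in for `BI.FirstDef` being set, and `IsLoopLatch` stands in for the `Loops.getLoopFor(BI.MBB)` / `isLoopLatch` query. The example use blocks assume that `%r` in the loop from the commit message is used only in the preheader and the latch.

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

// Hypothetical, simplified stand-in for SplitAnalysis::BlockInfo.
struct BlockInfoModel {
  bool LiveIn;      // value is live on entry to the block
  bool LiveOut;     // value is live on exit from the block
  bool HasDef;      // models BI.FirstDef being set: the block writes the value
  bool IsLoopLatch; // models L && L->isLoopLatch(BI.MBB)
};

// Mirrors the shape of the condition in the patch: exactly two use blocks,
// at least one of which is a loop latch that the value is live into, live
// out of, and redefined in.
bool looksLikeLoopIV(const std::vector<BlockInfoModel> &UseBlocks,
                     bool Enabled = true) {
  return Enabled && UseBlocks.size() == 2 &&
         std::any_of(UseBlocks.begin(), UseBlocks.end(),
                     [](const BlockInfoModel &BI) {
                       return BI.LiveIn && BI.LiveOut && BI.HasDef &&
                              BI.IsLoopLatch;
                     });
}

// Assumed use blocks for %r in the commit message's loop:
// the preheader defines it, the latch updates it with subs.
std::vector<BlockInfoModel> exampleFromCommitMessage() {
  BlockInfoModel Preheader{false, true, true, false};
  BlockInfoModel Latch{true, true, true, true};
  return {Preheader, Latch};
}
```

Under these assumptions, `looksLikeLoopIV(exampleFromCommitMessage())` is true, and passing `Enabled = false` models running with `-enable-split-loopiv-heuristic=false`, which turns the check off.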