sched1: parameterize pressure scheduling spilling aggressiveness [PR/114729]

sched1 computes ECC (Excess Change Cost) for each insn, which represents
the register pressure attributed to the insn.
Currently the pressure-sensitive scheduling algorithm deliberately
ignores negative ECC values (pressure reduction), making them 0
(neutral), leading to more spills. This happens due to the assumption
that the compiler has a reasonably accurate processor pipeline
scheduling model and thus tries to aggressively fill pipeline bubbles
with spill slots.

This however might not be true, as the model might not be available for
certain uarches or even applicable, especially for modern out-of-order
cores.

The existing heuristic induces spill frenzy on RISC-V, noticeably so on
SPEC2017 507.Cactu. If insn scheduling is disabled completely
(-fno-schedule-insns), the total dynamic icount for this workload is
roughly halved, from ~2.5 trillion insns to ~1.3 trillion.

This patch adds --param=cycle-accurate-model={0,1} to gate the spill
behavior.

 - The default (1) preserves existing spill behavior.

 - targets/uarches sensitive to spilling can override the param to (0)
   to get the reverse effect. RISC-V backend does so too.

The actual perf numbers are very promising.

(1) On RISC-V BPI-F3 in-order CPU, -Ofast -march=rv64gcv_zba_zbb_zbs:

  Before:
  ------
  Performance counter stats for './cactusBSSN_r_base.rivos spec_ref.par':

       4,917,712.97 msec task-clock:u          #    1.000 CPUs utilized
              5,314      context-switches:u    #    1.081 /sec
                  3      cpu-migrations:u      #    0.001 /sec
            204,784      page-faults:u         #   41.642 /sec
  7,868,291,222,513      cycles:u              #    1.600 GHz
  2,615,069,866,153      instructions:u        #    0.33  insn per cycle
     10,799,381,890      branches:u            #    2.196 M/sec
         15,714,572      branch-misses:u       #    0.15% of all branches

  After:
  -----
  Performance counter stats for './cactusBSSN_r_base.rivos spec_ref.par':

       4,552,979.58 msec task-clock:u          #    0.998 CPUs utilized
            205,020      context-switches:u    #   45.030 /sec
                  2      cpu-migrations:u      #    0.000 /sec
            204,221      page-faults:u         #   44.854 /sec
  7,285,176,204,764      cycles:u (7.4% faster)             #    1.600 GHz
  2,145,284,345,397      instructions:u (17.96% fewer)      #    0.29  insn per cycle
     10,799,382,011      branches:u            #    2.372 M/sec
         16,235,628      branch-misses:u       #    0.15% of all branches

(2) Wilco reported 20% perf gains on aarch64 Neoverse V2 runs.

gcc/ChangeLog:
	PR target/114729
	* params.opt (--param=cycle-accurate-model=): New opt.
	* doc/invoke.texi (cycle-accurate-model): Document.
	* haifa-sched.cc (model_excess_group_cost): Return negative
	delta if param_cycle_accurate_model is 0.
	(model_excess_cost): Ceil negative baseECC to 0 only if
	param_cycle_accurate_model is 1.
	Dump the actual ECC value.
	* config/riscv/riscv.cc (riscv_option_override): Set param
	to 0.

gcc/testsuite/ChangeLog:
	PR target/114729
	* gcc.target/riscv/riscv.exp: Enable new tests to build.
	* gcc.target/riscv/sched1-spills/spill1.cpp: Add new test.

Signed-off-by: Vineet Gupta <vineetg@rivosinc.com>
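
Illustration only (not part of the patch): the gating described in the
commit message can be sketched roughly as below. The helper name
effective_ecc and its boolean argument are hypothetical stand-ins for
the param check; the real logic lives in model_excess_cost and
model_excess_group_cost in haifa-sched.cc and is more involved.

```cpp
#include <algorithm>
#include <cassert>

// Hypothetical helper, NOT the actual haifa-sched.cc code: shows how a raw
// ECC value might be filtered depending on --param=cycle-accurate-model.
constexpr int effective_ecc(int raw_ecc, bool cycle_accurate_model)
{
  // Param = 1 (default): a pressure reduction (negative ECC) is treated as
  // neutral (0), which preserves the existing behavior.
  if (cycle_accurate_model)
    return std::max(raw_ecc, 0);

  // Param = 0: keep the sign, so insns that reduce register pressure
  // remain visible to the scheduler as reductions.
  return raw_ecc;
}

// Small self-check covering both modes.
inline void check_effective_ecc()
{
  assert(effective_ecc(-3, true) == 0);    // negative value clamped to neutral
  assert(effective_ecc(-3, false) == -3);  // negative value preserved
  assert(effective_ecc(5, true) == 5);     // positive values pass through
  assert(effective_ecc(5, false) == 5);
}
```

Under this sketch, positive ECC values are unaffected by the param; only
the handling of pressure-reducing insns differs between the two modes.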
author: Vineet Gupta <vineetg@rivosinc.com> 2024-12-04 10:42:37 -0800
committer: Vineet Gupta <vineetg@rivosinc.com> 2024-12-04 10:59:46 -0800
commit: 7bef3482f27ce13ba7e6c4f43943f28a49e63a40 (patch)
tree: e6eb7f505e524eb12e13be74d138c1e7596ceadb /gcc/testsuite
parent: 2b75fe3708f062a8bbb432d4b0002a7a94149ab3 (diff)
2 files changed, 34 insertions, 0 deletions
diff --git a/gcc/testsuite/gcc.target/riscv/riscv.exp b/gcc/testsuite/gcc.target/riscv/riscv.exp
index 3620ece..ce84081 100644
--- a/gcc/testsuite/gcc.target/riscv/riscv.exp
+++ b/gcc/testsuite/gcc.target/riscv/riscv.exp
@@ -38,6 +38,8 @@ dg-init
 # Main loop.
 gcc-dg-runtest [lsort [glob -nocomplain $srcdir/$subdir/*.\[cS\]]] \
 	"" $DEFAULT_CFLAGS
+gcc-dg-runtest [lsort [glob -nocomplain $srcdir/$subdir/sched1-spills/*.{\[cS\],cpp}]] \
+	"" $DEFAULT_CFLAGS
 
 # Saturation alu
 foreach opt {
diff --git a/gcc/testsuite/gcc.target/riscv/sched1-spills/spill1.cpp b/gcc/testsuite/gcc.target/riscv/sched1-spills/spill1.cpp
new file mode 100644
index 0000000..8060ec2
--- /dev/null
+++ b/gcc/testsuite/gcc.target/riscv/sched1-spills/spill1.cpp
@@ -0,0 +1,32 @@
+/* { dg-options "-O2 -march=rv64gc -mabi=lp64d -save-temps -fverbose-asm" } */
+/* { dg-skip-if "" { *-*-* } { "-O0" "-O1" "-Og" "-Os" "-Oz" } } */
+
+/* Reduced from SPEC2017 Cactu ML_BSSN_Advect.cpp
+   by comparing -fschedule-insns and -fno-schedule-insns builds.
+   Without the patch, the verbose asm output shows one extra spill
+   (a pair of "sfp" spill markers).  */
+
+void s();
+double b, c, d, e, f, g, h, k, l, m, n, o, p, q, t, u, v;
+int *j;
+double *r, *w;
+long x;
+void y() {
+  double *a((double *)s);
+  for (;;)
+    for (; j[1];)
+      for (int i = 1; i < j[0]; i++) {
+        k = l;
+        m = n;
+        o = p = q;
+        r[0] = t;
+        a[0] = u;
+        x = g;
+        e = f;
+        v = w[x];
+        b = c;
+        d = h;
+      }
+}
+
+/* { dg-final { scan-assembler-not "%sfp" } } */