author | Vineet Gupta <vineetg@rivosinc.com> | 2024-12-04 10:42:37 -0800 |
---|---|---|
committer | Vineet Gupta <vineetg@rivosinc.com> | 2024-12-04 10:59:46 -0800 |
commit | 7bef3482f27ce13ba7e6c4f43943f28a49e63a40 (patch) | |
tree | e6eb7f505e524eb12e13be74d138c1e7596ceadb /gcc/testsuite | |
parent | 2b75fe3708f062a8bbb432d4b0002a7a94149ab3 (diff) | |
sched1: parameterize pressure scheduling spilling aggressiveness [PR/114729]
sched1 computes ECC (Excess Change Cost) for each insn, which represents
the register pressure attributed to the insn.
Currently the pressure-sensitive scheduling algorithm deliberately ignores
negative ECC values (pressure reduction), making them 0 (neutral), leading
to more spills. This happens due to the assumption that the compiler has
a reasonably accurate processor pipeline scheduling model and thus tries
to aggressively fill pipeline bubbles with spill slots.
This however might not be true, as the model might not be available for
certain uarches, or even applicable, especially for modern out-of-order cores.
The existing heuristic induces spill frenzy on RISC-V, noticeably so on
SPEC2017 507.Cactu. If insn scheduling is disabled completely, the
total dynamic icounts for this workload are roughly halved, from
~2.5 trillion insns to ~1.3 trillion (w/ -fno-schedule-insns).
This patch adds --param=cycle-accurate-model={0,1} to gate the spill
behavior.
- The default (1) preserves existing spill behavior.
- Targets/uarches sensitive to spilling can override the param to (0)
  so that negative ECC values are kept rather than zeroed. The RISC-V
  backend does so.
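
The gating described above can be illustrated with a minimal sketch. This
is not the haifa-sched.cc code: the helper name effective_ecc and its
signature are hypothetical, and only the clamping rule is taken from the
description and the ChangeLog below.

```cpp
#include <algorithm>

// Hypothetical helper (not from haifa-sched.cc): shows how a per-insn ECC
// value would be treated depending on --param=cycle-accurate-model.
int effective_ecc(int base_ecc, int cycle_accurate_model) {
  // Param 1 (default): a negative ECC, i.e. a pressure reduction, is
  // raised to 0, so the insn looks pressure-neutral to the scheduler.
  if (cycle_accurate_model == 1)
    return std::max(base_ecc, 0);
  // Param 0: the negative value is kept, so pressure-reducing insns
  // remain visible as such.
  return base_ecc;
}
```

Under these assumptions, an insn with ECC -2 is seen as 0 with the param at
1 and as -2 with the param at 0; non-negative values are unaffected either way.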
The actual perf numbers are very promising.
(1) On RISC-V BPI-F3 in-order CPU, -Ofast -march=rv64gcv_zba_zbb_zbs:
Before:
------
Performance counter stats for './cactusBSSN_r_base.rivos spec_ref.par':
4,917,712.97 msec task-clock:u # 1.000 CPUs utilized
5,314 context-switches:u # 1.081 /sec
3 cpu-migrations:u # 0.001 /sec
204,784 page-faults:u # 41.642 /sec
7,868,291,222,513 cycles:u # 1.600 GHz
2,615,069,866,153 instructions:u # 0.33 insn per cycle
10,799,381,890 branches:u # 2.196 M/sec
15,714,572 branch-misses:u # 0.15% of all branches
After:
-----
Performance counter stats for './cactusBSSN_r_base.rivos spec_ref.par':
4,552,979.58 msec task-clock:u # 0.998 CPUs utilized
205,020 context-switches:u # 45.030 /sec
2 cpu-migrations:u # 0.000 /sec
204,221 page-faults:u # 44.854 /sec
7,285,176,204,764 cycles:u (7.4% faster) # 1.600 GHz
2,145,284,345,397 instructions:u (17.96% fewer) # 0.29 insn per cycle
10,799,382,011 branches:u # 2.372 M/sec
16,235,628 branch-misses:u # 0.15% of all branches
(2) Wilco reported 20% perf gains on aarch64 Neoverse V2 runs.
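
The percentages annotated in the RISC-V "After" counters under (1) can be
rechecked from the raw values. The sketch below only redoes that
arithmetic; the function and constant names are made up for illustration,
and the counter values are copied from the perf output.

```cpp
#include <cstdint>

// Relative reduction, in percent, going from `before` to `after`.
double percent_reduction(std::uint64_t before, std::uint64_t after) {
  return 100.0 * static_cast<double>(before - after) /
         static_cast<double>(before);
}

// Counter values copied from the Before/After perf stat output.
constexpr std::uint64_t kCyclesBefore = 7868291222513ULL;
constexpr std::uint64_t kCyclesAfter = 7285176204764ULL;
constexpr std::uint64_t kInsnsBefore = 2615069866153ULL;
constexpr std::uint64_t kInsnsAfter = 2145284345397ULL;
```

Calling percent_reduction(kCyclesBefore, kCyclesAfter) gives about 7.4, and
percent_reduction(kInsnsBefore, kInsnsAfter) about 17.96, matching the
annotations.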
gcc/ChangeLog:
PR target/114729
* params.opt (--param=cycle-accurate-model=): New opt.
* doc/invoke.texi (cycle-accurate-model): Document.
* haifa-sched.cc (model_excess_group_cost): Return negative
delta if param_cycle_accurate_model is 0.
(model_excess_cost): Ceil negative baseECC to 0 only if
param_cycle_accurate_model is 1.
Dump the actual ECC value.
* config/riscv/riscv.cc (riscv_option_override): Set param
to 0.
gcc/testsuite/ChangeLog:
PR target/114729
* gcc.target/riscv/riscv.exp: Enable new tests to build.
* gcc.target/riscv/sched1-spills/spill1.cpp: Add new test.
Signed-off-by: Vineet Gupta <vineetg@rivosinc.com>
Diffstat (limited to 'gcc/testsuite')
-rw-r--r-- | gcc/testsuite/gcc.target/riscv/riscv.exp | 2 |
-rw-r--r-- | gcc/testsuite/gcc.target/riscv/sched1-spills/spill1.cpp | 32 |
2 files changed, 34 insertions, 0 deletions
diff --git a/gcc/testsuite/gcc.target/riscv/riscv.exp b/gcc/testsuite/gcc.target/riscv/riscv.exp
index 3620ece..ce84081 100644
--- a/gcc/testsuite/gcc.target/riscv/riscv.exp
+++ b/gcc/testsuite/gcc.target/riscv/riscv.exp
@@ -38,6 +38,8 @@ dg-init
 # Main loop.
 gcc-dg-runtest [lsort [glob -nocomplain $srcdir/$subdir/*.\[cS\]]] \
 	"" $DEFAULT_CFLAGS
+gcc-dg-runtest [lsort [glob -nocomplain $srcdir/$subdir/sched1-spills/*.{\[cS\],cpp}]] \
+	"" $DEFAULT_CFLAGS
 
 # Saturation alu
 foreach opt {
diff --git a/gcc/testsuite/gcc.target/riscv/sched1-spills/spill1.cpp b/gcc/testsuite/gcc.target/riscv/sched1-spills/spill1.cpp
new file mode 100644
index 0000000..8060ec2
--- /dev/null
+++ b/gcc/testsuite/gcc.target/riscv/sched1-spills/spill1.cpp
@@ -0,0 +1,32 @@
+/* { dg-options "-O2 -march=rv64gc -mabi=lp64d -save-temps -fverbose-asm" } */
+/* { dg-skip-if "" { *-*-* } { "-O0" "O1" "-Og" "-Os" "-Oz" } } */
+
+/* Reduced from SPEC2017 Cactu ML_BSSN_Advect.cpp
+   by comparing -fschedule-insn and -fno-schedule-insns builds.
+   Shows up one extra spill (pair of spill markers "sfp") in verbose asm
+   output which the patch fixes.  */
+
+void s();
+double b, c, d, e, f, g, h, k, l, m, n, o, p, q, t, u, v;
+int *j;
+double *r, *w;
+long x;
+void y() {
+  double *a((double *)s);
+  for (;;)
+    for (; j[1];)
+      for (int i = 1; i < j[0]; i++) {
+        k = l;
+        m = n;
+        o = p = q;
+        r[0] = t;
+        a[0] = u;
+        x = g;
+        e = f;
+        v = w[x];
+        b = c;
+        d = h;
+      }
+}
+
+/* { dg-final { scan-assembler-not "%sfp" } } */