path: root/gcc/digraph.h
author: Kyrylo Tkachov <ktkachov@nvidia.com> 2024-08-02 06:48:47 -0700
committer: Kyrylo Tkachov <ktkachov@nvidia.com> 2024-08-19 10:58:03 +0200
commit: cc572242688f0c6f8733c173038163efb09560fa
tree: a5afa627f312152f30dd4b371928321e1a8064ed /gcc/digraph.h
parent: 6d8b9b772e0b3969e6b3fcf0363d6afcce2e65c9
aarch64: Reduce FP reassociation width for Neoverse V2 and set AARCH64_EXTRA_TUNE_FULLY_PIPELINED_FMA

The fp reassociation width for Neoverse V2 was set to 6 since its
introduction and I guess it was empirically tuned.  But since
AARCH64_EXTRA_TUNE_FULLY_PIPELINED_FMA was added the tree reassociation
pass seems to be more deliberate in forming FMAs and when that flag is
used it seems to more properly evaluate the FMA vs non-FMA reassociation
widths.

According to the Neoverse V2 SWOG the core has a throughput of 4 for
most FP operations, so the value 6 is not accurate anyway.
Also, the SWOG does state that FMADD operations are pipelined and the
results can be forwarded from FP multiplies to the accumulation operands
of FMADD instructions, which seems to be what
AARCH64_EXTRA_TUNE_FULLY_PIPELINED_FMA expresses.

This patch sets the fp_reassoc_width field to 4 and enables
AARCH64_EXTRA_TUNE_FULLY_PIPELINED_FMA for -mcpu=neoverse-v2.

On SPEC2017 fprate I see the following changes on a Grace system:

503.bwaves_r       0.16%
507.cactuBSSN_r   -0.32%
508.namd_r         3.04%
510.parest_r       0.00%
511.povray_r       0.78%
519.lbm_r          0.35%
521.wrf_r          0.69%
526.blender_r     -0.53%
527.cam4_r         0.84%
538.imagick_r      0.00%
544.nab_r         -0.97%
549.fotonik3d_r   -0.45%
554.roms_r         0.97%
Geomean            0.35%

with -Ofast -mcpu=grace -flto.

So slight overall improvement with a meaningful improvement in
508.namd_r.

I think other tunings in aarch64 should look into
AARCH64_EXTRA_TUNE_FULLY_PIPELINED_FMA as well, but I'll leave the
benchmarking to someone else.

Signed-off-by: Kyrylo Tkachov <ktkachov@nvidia.com>

gcc/ChangeLog:

	* config/aarch64/tuning_models/neoversev2.h (fp_reassoc_width):
	Set to 4.
	(tune_flags): Add AARCH64_EXTRA_TUNE_FULLY_PIPELINED_FMA.
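
For readers unfamiliar with the term, the following is a rough hand-written sketch of what a reassociation width of 4 means for a chain of FP additions. It is not taken from the patch or from GCC's output; the function names `sum8_serial` and `sum8_width4` are hypothetical, and the exact expression shape the reassociation pass produces may differ. Note also that GCC only reassociates FP arithmetic when this is permitted (for example under -Ofast, as in the benchmark run above), since it can change rounding.

```c
/* Illustration only: a linear chain of 7 dependent FP adds versus the
   same sum regrouped so that up to 4 adds are independent at once.  */

/* Serial form: every add waits for the previous result, so the
   dependency chain is 7 adds long.  */
double sum8_serial(const double v[8])
{
    return ((((((v[0] + v[1]) + v[2]) + v[3]) + v[4]) + v[5]) + v[6]) + v[7];
}

/* Width-4 form: the first four adds have no dependencies on each other
   and can be issued in parallel on a core that sustains 4 FP operations
   per cycle.  The dependency chain shrinks to 3 adds.  */
double sum8_width4(const double v[8])
{
    double t0 = v[0] + v[1];
    double t1 = v[2] + v[3];
    double t2 = v[4] + v[5];
    double t3 = v[6] + v[7];
    double u0 = t0 + t1;
    double u1 = t2 + t3;
    return u0 + u1;
}
```

A width larger than the number of operations the core can actually issue per cycle adds temporaries without shortening the effective critical path, which is consistent with the message's point that 6 did not match the SWOG throughput of 4.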
Diffstat (limited to 'gcc/digraph.h')
0 files changed, 0 insertions, 0 deletions