riscv-gnu-toolchain/gcc.git

author	Di Zhao <dizhao@os.amperecomputing.com>	2023-12-15 03:22:32 +0800
committer	Di Zhao <dizhao@os.amperecomputing.com>	2023-12-15 03:39:37 +0800
commit	8afdbcdd7abe1e3c7a81e07f34c256e7f2dbc652 (patch)
tree	d2f725922c44cf4c107c3904e0f4f3095cc67ec6 /gcc/tree-vect-loop.cc
parent	95b70545331764c85079a1d0e1e19b605bda1456 (diff)

Consider fully pipelined FMA in get_reassociation_width

Add a new parameter param_fully_pipelined_fma.  If it is non-zero, reassociation considers the benefit of parallelizing FMA's multiplication part and addition part, assuming FMUL and FMA use the same units that can also do FADD.

With the patch and new option, there's ~2% improvement in spec2017 508.namd on AmpereOne.  (The other options are "-Ofast -mcpu=ampere1 -flto".)

	PR tree-optimization/110279

gcc/ChangeLog:

	* doc/invoke.texi: New parameter fully-pipelined-fma.
	* params.opt: New parameter fully-pipelined-fma.
	* tree-ssa-reassoc.cc (get_mult_latency_consider_fma): Return
	the latency of MULT_EXPRs that can't be hidden by the FMAs.
	(get_reassociation_width): Search for a smaller width
	considering the benefit of fully pipelined FMA.
	(rank_ops_for_fma): Return the number of MULT_EXPRs.
	(reassociate_bb): Pass the number of MULT_EXPRs to
	get_reassociation_width; avoid calling
	get_reassociation_width twice.

gcc/testsuite/ChangeLog:

	* gcc.dg/pr110279-2.c: New test.
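
As a rough illustration of the trade-off the commit message describes, the sketch below writes the same sum of products in two hand-associated shapes. It is not taken from the patch or from gcc.dg/pr110279-2.c, and the function names are hypothetical. It only shows why a narrower reassociation width keeps more multiplies fused into FMAs, while a wider one shortens the dependency chain at the cost of a standalone multiply and an extra add.

```c
#include <assert.h>

/* Width 1: a single dependent chain.  With FMA contraction every line
   can become one FMA, but each FMA has to wait for the previous one,
   so the critical path is four FMA latencies long.  */
double sum_products_chain(double a, double b, double c, double d,
                          double e, double f, double g, double h,
                          double x)
{
    double t = x + a * b;
    t = t + c * d;
    t = t + e * f;
    t = t + g * h;
    return t;
}

/* Width 2: two independent chains joined by a final add.  The first
   product of the second chain has no addend, so it stays a plain
   multiply; the critical path is shorter, but one multiply is no
   longer hidden inside an FMA and one extra add is needed.  */
double sum_products_width2(double a, double b, double c, double d,
                           double e, double f, double g, double h,
                           double x)
{
    double t0 = x + a * b;
    t0 = t0 + c * d;
    double t1 = e * f;
    t1 = t1 + g * h;
    return t0 + t1;
}

/* Both shapes compute the same value; small integers are used so the
   result is exact regardless of association or fusion.  */
void check_forms_agree(void)
{
    assert(sum_products_chain(1, 2, 3, 4, 5, 6, 7, 8, 9) == 109.0);
    assert(sum_products_width2(1, 2, 3, 4, 5, 6, 7, 8, 9) == 109.0);
}
```

To experiment with the new heuristic, a file like this could be compiled with "--param fully-pipelined-fma=1" added to the options quoted in the commit message. Whether GCC actually picks either shape depends on the target and compiler version, so the generated assembly has to be inspected rather than assumed.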

Diffstat (limited to 'gcc/tree-vect-loop.cc')

0 files changed, 0 insertions, 0 deletions
