path: root/gcc/tree-vect-loop.cc
author: Di Zhao <dizhao@os.amperecomputing.com> 2023-12-15 03:22:32 +0800
committer: Di Zhao <dizhao@os.amperecomputing.com> 2023-12-15 03:39:37 +0800
commit: 8afdbcdd7abe1e3c7a81e07f34c256e7f2dbc652
tree: d2f725922c44cf4c107c3904e0f4f3095cc67ec6 /gcc/tree-vect-loop.cc
parent: 95b70545331764c85079a1d0e1e19b605bda1456
Consider fully pipelined FMA in get_reassociation_width

Add a new parameter param_fully_pipelined_fma.  If it is non-zero,
reassociation considers the benefit of parallelizing FMA's
multiplication part and addition part, assuming FMUL and FMA use the
same units that can also do FADD.

With the patch and new option, there's ~2% improvement in spec2017
508.namd on AmpereOne.  (The other options are "-Ofast -mcpu=ampere1
-flto".)

	PR tree-optimization/110279

gcc/ChangeLog:

	* doc/invoke.texi: New parameter fully-pipelined-fma.
	* params.opt: New parameter fully-pipelined-fma.
	* tree-ssa-reassoc.cc (get_mult_latency_consider_fma): Return
	the latency of MULT_EXPRs that can't be hidden by the FMAs.
	(get_reassociation_width): Search for a smaller width
	considering the benefit of fully pipelined FMA.
	(rank_ops_for_fma): Return the number of MULT_EXPRs.
	(reassociate_bb): Pass the number of MULT_EXPRs to
	get_reassociation_width; avoid calling
	get_reassociation_width twice.

gcc/testsuite/ChangeLog:

	* gcc.dg/pr110279-2.c: New test.
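
For readers unfamiliar with reassociation width, here is a minimal sketch of the kind of expression the message is about. It is a hypothetical illustration and not the contents of gcc.dg/pr110279-2.c; the function names are invented for this example. Both functions compute the same sum of four products. The first writes it as a single dependency chain, where each step can become one FMA that waits on the previous one. The second writes it as two independent chains joined by a final add, which is roughly what a reassociation width of 2 looks like at the source level. Which shape the reassociation pass actually picks depends on the target and on options such as "-Ofast" (GCC only reassociates floating-point operations when that is permitted) and, with this patch, "--param fully-pipelined-fma=1".

```c
/* Hypothetical illustration only; not the test added by this commit.  */

/* Width 1: one serial chain.  Every multiply-add can be contracted to
   an FMA, but each FMA depends on the result of the previous one.  */
double
sum_products_chain (const double *a, const double *b)
{
  double acc = a[0] * b[0];
  acc += a[1] * b[1];
  acc += a[2] * b[2];
  acc += a[3] * b[3];
  return acc;
}

/* Width 2: two independent partial sums combined by one final add.
   The two halves can execute in parallel, at the cost of an extra
   addition that cannot be folded into an FMA.  */
double
sum_products_width2 (const double *a, const double *b)
{
  double lo = a[0] * b[0] + a[1] * b[1];
  double hi = a[2] * b[2] + a[3] * b[3];
  return lo + hi;
}
```

Comparing the assembly of the two functions with and without the new parameter (for example, via "gcc -Ofast -S") is one way to see the trade-off the commit message describes; the exact output is target-dependent and is not shown here.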
Diffstat (limited to 'gcc/tree-vect-loop.cc')
0 files changed, 0 insertions, 0 deletions