field | value | date |
---|---|---|
author | Di Zhao <dizhao@os.amperecomputing.com> | 2023-12-15 03:22:32 +0800 |
committer | Di Zhao <dizhao@os.amperecomputing.com> | 2023-12-15 03:39:37 +0800 |
commit | 8afdbcdd7abe1e3c7a81e07f34c256e7f2dbc652 | |
tree | d2f725922c44cf4c107c3904e0f4f3095cc67ec6 /gcc/tree-vect-loop.cc | |
parent | 95b70545331764c85079a1d0e1e19b605bda1456 | |
Consider fully pipelined FMA in get_reassociation_width

Add a new parameter param_fully_pipelined_fma.  If it is non-zero,
reassociation considers the benefit of parallelizing FMA's
multiplication part and addition part, assuming FMUL and FMA use the
same units that can also do FADD.

With the patch and new option, there's ~2% improvement in spec2017
508.namd on AmpereOne.  (The other options are "-Ofast -mcpu=ampere1
-flto".)
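
For illustration only, here is a minimal sketch of the kind of
expression the reassociation width applies to: a sum of products
written once as a single dependent chain (width 1) and once as two
independent chains (width 2).  The function names are hypothetical and
the code is not taken from the patch or from gcc.dg/pr110279-2.c; it
only shows the two shapes being traded off, not which one GCC picks on
a given target.

```c
#include <assert.h>

/* Width 1: a single dependent chain.  With floating-point contraction
   enabled, each step can become one FMA whose addend is the previous
   result.  */
double
dot4_chain (const double *a, const double *b, double acc)
{
  acc = a[0] * b[0] + acc;
  acc = a[1] * b[1] + acc;
  acc = a[2] * b[2] + acc;
  acc = a[3] * b[3] + acc;
  return acc;
}

/* Width 2: two independent chains joined by one extra add.  The chains
   can execute in parallel, at the cost of a standalone multiply and a
   standalone add.  */
double
dot4_split (const double *a, const double *b, double acc)
{
  double lo = a[0] * b[0] + acc;
  lo = a[1] * b[1] + lo;
  double hi = a[2] * b[2];
  hi = a[3] * b[3] + hi;
  return lo + hi;
}

/* Both shapes agree here because every intermediate value is exactly
   representable.  In general they can round differently, so this kind
   of rewrite is only valid when floating-point reassociation is
   allowed (for example under -Ofast).  */
void
check_shapes_agree (void)
{
  const double a[4] = { 1.0, 2.0, 3.0, 4.0 };
  const double b[4] = { 1.0, 2.0, 3.0, 4.0 };
  assert (dot4_chain (a, b, 1.0) == 31.0);
  assert (dot4_split (a, b, 1.0) == 31.0);
}
```

Assuming the usual "--param name=value" spelling, the new parameter
would be enabled with something like "-Ofast -mcpu=ampere1
--param fully-pipelined-fma=1"; comparing the generated assembly with
and without it is one way to see whether a narrower width was chosen.
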
	PR tree-optimization/110279

gcc/ChangeLog:

	* doc/invoke.texi: New parameter fully-pipelined-fma.
	* params.opt: New parameter fully-pipelined-fma.
	* tree-ssa-reassoc.cc (get_mult_latency_consider_fma): Return
	the latency of MULT_EXPRs that can't be hidden by the FMAs.
	(get_reassociation_width): Search for a smaller width
	considering the benefit of fully pipelined FMA.
	(rank_ops_for_fma): Return the number of MULT_EXPRs.
	(reassociate_bb): Pass the number of MULT_EXPRs to
	get_reassociation_width; avoid calling
	get_reassociation_width twice.

gcc/testsuite/ChangeLog:

	* gcc.dg/pr110279-2.c: New test.
Diffstat (limited to 'gcc/tree-vect-loop.cc')
0 files changed, 0 insertions, 0 deletions