diff options
author | Richard Sandiford <richard.sandiford@arm.com> | 2025-04-07 08:03:49 +0100 |
---|---|---|
committer | Richard Sandiford <richard.sandiford@arm.com> | 2025-04-07 08:03:49 +0100 |
commit | a1a0026c659196928113bad1c7889f5ca0999d06 (patch) | |
tree | 0c6dc1c5caa762d60c37ccbf64573f72ed3549bd /gcc/ada/gcc-interface/utils.cc | |
parent | 107a1b2126ceb42a79edbc388863c868bd4fbc2e (diff) | |
download | gcc-a1a0026c659196928113bad1c7889f5ca0999d06.zip gcc-a1a0026c659196928113bad1c7889f5ca0999d06.tar.gz gcc-a1a0026c659196928113bad1c7889f5ca0999d06.tar.bz2 |
combine: Limit insn searchs for 2->2 combinations [PR116398]
As noted in the previous patch, combine still takes >30% of
compile time in the original testcase for PR101523. The problem
is that try_combine uses linear insn searches for some dataflow
queries, so in the worst case, an unlimited number of 2->2
combinations for the same i2 can lead to quadratic behaviour.
This patch limits distribute_links to a certain number
of instructions when i2 is unchanged. As Segher said in the PR trail,
it would make more conceptual sense to apply the limit unconditionally,
but I thought it would be better to change as little as possible at
this development stage. Logically, in stage 1, the --param should
be applied directly by distribute_links with no input from callers.
As I mentioned in:
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=116398#c28
I think it's safe to drop log links even if a use exists. All
processing of log links seems to handle the absence of a link
for a particular register in a conservative way.
The initial set-up errs on the side of dropping links, since for example
create_log_links has:
/* flow.c claimed:
We don't build a LOG_LINK for hard registers contained
in ASM_OPERANDs. If these registers get replaced,
we might wind up changing the semantics of the insn,
even if reload can make what appear to be valid
assignments later. */
if (regno < FIRST_PSEUDO_REGISTER
&& asm_noperands (PATTERN (use_insn)) >= 0)
continue;
which excludes combinations by dropping log links, rather than during
try_combine. And:
/* If this register is being initialized using itself, and the
register is uninitialized in this basic block, and there are
no LOG_LINKS which set the register, then part of the
register is uninitialized. In that case we can't assume
anything about the number of nonzero bits.
??? We could do better if we checked this in
reg_{nonzero_bits,num_sign_bit_copies}_for_combine. Then we
could avoid making assumptions about the insn which initially
sets the register, while still using the information in other
insns. We would have to be careful to check every insn
involved in the combination. */
if (insn
&& reg_referenced_p (x, PATTERN (insn))
&& !REGNO_REG_SET_P (DF_LR_IN (BLOCK_FOR_INSN (insn)),
REGNO (x)))
{
struct insn_link *link;
FOR_EACH_LOG_LINK (link, insn)
if (dead_or_set_p (link->insn, x))
break;
if (!link)
{
rsp->nonzero_bits = GET_MODE_MASK (mode);
rsp->sign_bit_copies = 1;
return;
}
}
treats the lack of a log link as a possible sign of uninitialised data,
but that would be a missed optimisation rather than a correctness issue.
One question is what the default --param value should be. I went with
Jakub's suggestion of 3000 from:
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=116398#c25
Also, to answer Jakub's question in that comment, I tried bisecting:
int limit = atoi (getenv ("BISECT"));
(so applying the limit for all calls from try_combine) with an
abort in distribute_links if the limit caused a link to be skipped.
The minimum BISECT value that allowed an aarch64-linux-gnu bootstrap
to succeed with --enable-languages=all --enable-checking=yes,rtl,extra
was 142, so much lower than the parameter value. I realised too late
that --enable-checking=release would probably have been a more
interesting test.
The previous patch meant that distribute_links itself is now linear
for a given i2 definition, since each search starts at the previous
last use, rather than at i2 itself. This means that the limit has
to be applied cumulatively across all searches for the same link.
The patch does that by storing a counter in the insn_link structure.
There was a 32-bit hole there on LP64 hosts.
gcc/
PR testsuite/116398
* params.opt (-param=max-combine-search-insns=): New param.
* doc/invoke.texi: Document it.
* combine.cc (insn_link::insn_count): New field.
(alloc_insn_link): Initialize it.
(distribute_links): Add a limit parameter.
(try_combine): Use the new param to limit distribute_links
when only i3 has changed.
Diffstat (limited to 'gcc/ada/gcc-interface/utils.cc')
0 files changed, 0 insertions, 0 deletions