author | Roger Sayle <roger@nextmovesoftware.com> | 2023-07-28 09:39:46 +0100 |
---|---|---|
committer | Roger Sayle <roger@nextmovesoftware.com> | 2023-07-28 09:39:46 +0100 |
commit | 095eb138f736d94dabf9a07a6671bd351be0e66a (patch) | |
tree | 218dfdc0a8304e51355be6ce850b27a05f7b9fe0 /gcc/expr.cc | |
parent | b24acae8f4d315a5b071ffc2574ce91c7a0800ca (diff) | |
PR rtl-optimization/110587: Reduce useless moves in compile-time hog.
This patch is one of a series of fixes for PR rtl-optimization/110587,
a compile-time regression with -O0, that attempts to address the underlying
cause. As noted previously, the pathological test case pr28071.c contains
a large number of useless register-to-register moves that can produce
quadratic behaviour (in LRA). These moves are generated during RTL
expansion in emit_group_load_1, where the middle-end attempts to simplify
the source before calling extract_bit_field. This is reasonable if the
source is a complex expression (from before the tree-ssa optimizers), or
a SUBREG, or a hard register, but it's not particularly useful to copy
a pseudo register into a new pseudo register. This patch eliminates that
redundancy.
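
The change in the copy decision can be sketched as a small truth-table model. This is a toy illustration, not GCC code: `enum kind`, `struct operand`, `old_copies`, `new_copies` and `self_check` are hypothetical names standing in for the RTL predicates `MEM_P`, `CONSTANT_P`, `REG_P` and `HARD_REGISTER_P`, and mode `0` is assumed to model `VOIDmode`.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-ins for RTL operand kinds; not GCC's real types.  */
enum kind { K_MEM, K_CONST, K_PSEUDO_REG, K_HARD_REG, K_SUBREG, K_OTHER };

/* Toy operand: mode 0 is assumed to model VOIDmode.  */
struct operand { enum kind kind; int mode; };

/* Models the condition before the patch: copy anything that is not a
   MEM, unless it is a constant whose mode is VOIDmode or already
   matches the requested mode.  */
static bool old_copies(struct operand s, int mode)
{
    bool is_const = s.kind == K_CONST;
    return s.kind != K_MEM
           && (!is_const || (s.mode != mode && s.mode != 0));
}

/* Models the condition after the patch: copy only non-MEM,
   non-constant sources that are not already a pseudo register.  */
static bool new_copies(struct operand s)
{
    bool is_reg = s.kind == K_PSEUDO_REG || s.kind == K_HARD_REG;
    bool is_hard = s.kind == K_HARD_REG;
    return s.kind != K_MEM
           && (!is_reg || is_hard)
           && s.kind != K_CONST;
}

/* Checks the case the patch targets, plus two that stay unchanged.  */
static void self_check(void)
{
    struct operand pseudo = { K_PSEUDO_REG, 1 };
    struct operand hard = { K_HARD_REG, 1 };
    struct operand subreg = { K_SUBREG, 1 };

    assert(old_copies(pseudo, 1) && !new_copies(pseudo));
    assert(old_copies(hard, 1) && new_copies(hard));
    assert(old_copies(subreg, 1) && new_copies(subreg));
}
```

In this model the pseudo-register source is the only register case whose answer flips from "copy" to "use directly"; hard registers and SUBREGs are still copied, as the message describes. The model also shows that a constant whose mode differs from the requested one was copied by the old condition and is not copied by the new one.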
The -fdump-tree-expand for pr28071.c compiled with -O0 currently contains
777K lines; with this patch it contains 717K lines, a saving of about 60K
lines (admittedly of debugging text output, but it makes the point).

2023-07-28  Roger Sayle  <roger@nextmovesoftware.com>
	    Richard Biener  <rguenther@suse.de>

gcc/ChangeLog
	PR middle-end/28071
	PR rtl-optimization/110587
	* expr.cc (emit_group_load_1): Simplify logic for calling
	force_reg on ORIG_SRC, to avoid making a copy if the source
	is already in a pseudo register.

Diffstat (limited to 'gcc/expr.cc')
-rw-r--r-- | gcc/expr.cc | 13 |
1 files changed, 4 insertions, 9 deletions
diff --git a/gcc/expr.cc b/gcc/expr.cc
index fff09dc..174f8ac 100644
--- a/gcc/expr.cc
+++ b/gcc/expr.cc
@@ -2622,16 +2622,11 @@ emit_group_load_1 (rtx *tmps, rtx dst, rtx orig_src, tree type,
          be loaded directly into the destination. */
       src = orig_src;
       if (!MEM_P (orig_src)
-          && (!CONSTANT_P (orig_src)
-              || (GET_MODE (orig_src) != mode
-                  && GET_MODE (orig_src) != VOIDmode)))
+          && (!REG_P (orig_src) || HARD_REGISTER_P (orig_src))
+          && !CONSTANT_P (orig_src))
         {
-          if (GET_MODE (orig_src) == VOIDmode)
-            src = gen_reg_rtx (mode);
-          else
-            src = gen_reg_rtx (GET_MODE (orig_src));
-
-          emit_move_insn (src, orig_src);
+          gcc_assert (GET_MODE (orig_src) != VOIDmode);
+          src = force_reg (GET_MODE (orig_src), orig_src);
         }
 
       /* Optimize the access just a bit. */