aboutsummaryrefslogtreecommitdiff
path: root/gcc/rtl-ssa/functions.cc
diff options
context:
space:
mode:
authorRichard Sandiford <richard.sandiford@arm.com>2021-02-15 15:05:22 +0000
committerRichard Sandiford <richard.sandiford@arm.com>2021-02-15 15:05:22 +0000
commitabe07a74bb7a2692eff2af151ca54e749ed5eba6 (patch)
tree742ecbe9bb8810e1e4cc1fa5ae8cdd06eb0b3f09 /gcc/rtl-ssa/functions.cc
parent40f235b5f00009ea35fcd8fae08566e65a864a46 (diff)
downloadgcc-abe07a74bb7a2692eff2af151ca54e749ed5eba6.zip
gcc-abe07a74bb7a2692eff2af151ca54e749ed5eba6.tar.gz
gcc-abe07a74bb7a2692eff2af151ca54e749ed5eba6.tar.bz2
rtl-ssa: Reduce the amount of temporary memory needed [PR98863]
The rtl-ssa code uses an on-the-side IL and needs to build that IL for each block and RTL insn. I'd originally not used the classical dominance frontier method for placing phis on the basis that it seemed like more work in this context: we're having to visit everything in an RPO walk anyway, so for non-backedge cases we can tell immediately whether a phi node is needed. We then speculatively created phis for registers that are live across backedges and simplified them later. This avoided having to walk most of the IL twice (once to build the initial IL, and once to link uses to phis). However, as shown in PR98863, this leads to excessive temporary memory in extreme cases, since we had to record the value of every live register on exit from every block. In that PR, there were many registers that were live (but unused) across a large region of code. This patch does use the classical approach to placing phis, but tries to use the existing DF defs information to avoid two walks of the IL. We still use the previous approach for memory, since there is no up-front information to indicate whether a block defines memory or not. However, since memory is just treated as a single unified thing (like for gimple vops), memory doesn't suffer from the same scalability problems as registers. With this change, fwprop no longer seems to be a memory-hog outlier in the PR: the maximum RSS is similar with and without fwprop. The PR also shows the problems inherent in using bitmap operations involving the live-in and live-out sets, which in the testcase are very large. I've therefore tried to reduce those operations to the bare minimum. The patch also includes other compile-time optimisations motivated by the PR; see the changelog for details. I tried adding: for (int i = 0; i < 200; ++i) { crtl->ssa = new rtl_ssa::function_info (cfun); delete crtl->ssa; } to fwprop.c to stress the code. fwprop then took 35% of the compile time for the problematic partition in the PR (measured on a release build). fwprop takes less than .5% of the compile time when running normally. The command: git diff 0b76990a9d75d97b84014e37519086b81824c307~ gcc/fwprop.c | \ patch -p1 -R still gives a working compiler that uses the old fwprop.c. The compile time with that version is very similar. For a more reasonable testcase like optabs.ii at -O, I saw a 6.7% compile time regression with the loop above added (i.e. creating the info 201 times per pass instead of once per pass). That goes down to 4.8% with -O -g. I can't measure a significant difference with a normal compiler (no 200-iteration loop). So I think that (as expected) the patch does make things a bit slower in the normal case. But like Richi says, peak memory usage is harder for users to work around than slighter slower compile times. gcc/ PR rtl-optimization/98863 * rtl-ssa/functions.h (function_info::bb_live_out_info): Delete. (function_info::build_info): Turn into a declaration, moving the definition to internals.h. (function_info::bb_walker): Declare. (function_info::create_reg_use): Likewise. (function_info::calculate_potential_phi_regs): Take a build_info parameter. (function_info::place_phis, function_info::create_ebbs): Declare. (function_info::calculate_ebb_live_in_for_debug): Likewise. (function_info::populate_backedge_phis): Delete. (function_info::start_block, function_info::end_block): Declare. (function_info::populate_phi_inputs): Delete. (function_info::m_potential_phi_regs): Move information to build_info. * rtl-ssa/internals.h: New file. (function_info::bb_phi_info): New class. (function_info::build_info): Moved from functions.h. Add a constructor and destructor. (function_info::build_info::ebb_use): Delete. (function_info::build_info::ebb_def): Likewise. (function_info::build_info::bb_live_out): Likewise. (function_info::build_info::tmp_ebb_live_in_for_debug): New variable. (function_info::build_info::potential_phi_regs): Likewise. (function_info::build_info::potential_phi_regs_for_debug): Likewise. (function_info::build_info::ebb_def_regs): Likewise. (function_info::build_info::bb_phis): Likewise. (function_info::build_info::bb_mem_live_out): Likewise. (function_info::build_info::bb_to_rpo): Likewise. (function_info::build_info::def_stack): Likewise. (function_info::build_info::old_def_stack_limit): Likewise. * rtl-ssa/internals.inl (function_info::build_info::record_reg_def): Remove the regno argument. Push the previous definition onto the definition stack where necessary. * rtl-ssa/accesses.cc: Include internals.h. * rtl-ssa/changes.cc: Likewise. * rtl-ssa/blocks.cc: Likewise. (function_info::build_info::build_info): Define. (function_info::build_info::~build_info): Likewise. (function_info::bb_walker): New class. (function_info::bb_walker::bb_walker): Define. (function_info::add_live_out_use): Convert a logarithmic-complexity test into a linear one. Allow the same definition to be passed multiple times. (function_info::calculate_potential_phi_regs): Moved from functions.cc. Take a build_info parameter and store the information there instead. (function_info::place_phis): New function. (function_info::add_entry_block_defs): Update call to record_reg_def. (function_info::calculate_ebb_live_in_for_debug): New function. (function_info::add_phi_nodes): Use bb_phis to decide which registers need phi nodes and initialize ebb_def_regs accordingly. Do not add degenerate phis here. (function_info::add_artificial_accesses): Use create_reg_use. Assert that all definitions are listed in the DF LR sets. Update call to record_reg_def. (function_info::record_block_live_out): Record live-out register values in the phis of successor blocks. Use the live-out set when processing the last block in an EBB, instead of always using the live-in sets of successor blocks. AND the live sets with the set of registers that have been defined in the EBB, rather than with all potential phi registers. Cope correctly with branches back to the start of the current EBB. (function_info::start_block): New function. (function_info::end_block): Likewise. (function_info::populate_phi_inputs): Likewise. (function_info::create_ebbs): Likewise. (function_info::process_all_blocks): Rewrite into a multi-phase process. * rtl-ssa/functions.cc: Include internals.h. (function_info::calculate_potential_phi_regs): Move to blocks.cc. (function_info::init_function_data): Remove caller. * rtl-ssa/insns.cc: Include internals.h (function_info::create_reg_use): New function. Lazily any degenerate phis needed by the linear RPO view. (function_info::record_use): Use create_reg_use. When processing debug uses, use potential_phi_regs and test it before checking whether the register is live on entry to the current EBB. Lazily calculate ebb_live_in_for_debug. (function_info::record_call_clobbers): Update call to record_reg_def. (function_info::record_def): Likewise.
Diffstat (limited to 'gcc/rtl-ssa/functions.cc')
-rw-r--r--gcc/rtl-ssa/functions.cc20
1 files changed, 1 insertions, 19 deletions
diff --git a/gcc/rtl-ssa/functions.cc b/gcc/rtl-ssa/functions.cc
index 80821cd..257bc2b 100644
--- a/gcc/rtl-ssa/functions.cc
+++ b/gcc/rtl-ssa/functions.cc
@@ -26,6 +26,7 @@
#include "rtl.h"
#include "df.h"
#include "rtl-ssa.h"
+#include "rtl-ssa/internals.h"
#include "rtl-ssa/internals.inl"
using namespace rtl_ssa;
@@ -74,23 +75,6 @@ function_info::print (pretty_printer *pp) const
}
}
-// Calculate m_potential_phi_regs.
-void
-function_info::calculate_potential_phi_regs ()
-{
- auto *lr_info = DF_LR_BB_INFO (ENTRY_BLOCK_PTR_FOR_FN (m_fn));
- for (unsigned int regno = 0; regno < m_num_regs; ++regno)
- if (regno >= DF_REG_SIZE (DF)
- // Exclude registers that have a single definition that dominates
- // all uses. If the definition does not dominate all uses,
- // the register will be exposed upwards to the entry block but
- // will not be defined by the entry block.
- || DF_REG_DEF_COUNT (regno) > 1
- || (!bitmap_bit_p (&lr_info->def, regno)
- && bitmap_bit_p (&lr_info->out, regno)))
- bitmap_set_bit (m_potential_phi_regs, regno);
-}
-
// Initialize all member variables in preparation for (re)building
// SSA form from scratch.
void
@@ -107,8 +91,6 @@ function_info::init_function_data ()
m_last_insn = nullptr;
m_last_nondebug_insn = nullptr;
m_free_phis = nullptr;
-
- calculate_potential_phi_regs ();
}
// The initial phase of the phi simplification process. The cumulative