aboutsummaryrefslogtreecommitdiff
path: root/gcc/lto-streamer-out.c
diff options
context:
space:
mode:
authorJan Hubicka <jh@suse.cz>2020-05-20 15:58:22 +0200
committerJan Hubicka <jh@suse.cz>2020-05-20 15:58:22 +0200
commit03d90a20a1afcbb9c30da8d4adf4922b0685061f (patch)
tree400ebde593b20c6f604b5c44870275c92e159eef /gcc/lto-streamer-out.c
parenteb069ae8819c3a84d7f78becc5501e21ee3a9554 (diff)
downloadgcc-03d90a20a1afcbb9c30da8d4adf4922b0685061f.zip
gcc-03d90a20a1afcbb9c30da8d4adf4922b0685061f.tar.gz
gcc-03d90a20a1afcbb9c30da8d4adf4922b0685061f.tar.bz2
Avoid SCC hashing on unmergeable trees
This is new incarantion of patch to identify unmergeable tree at streaming out time rather than streaming in and to avoid pickling them to sccs with with hash codes. Building cc1 plus this patch reduces: [WPA] read 4452927 SCCs of average size 1.986030 [WPA] 8843646 tree bodies read in total [WPA] tree SCC table: size 524287, 205158 elements, collision ratio: 0.505204 [WPA] tree SCC max chain length 43 (size 1) [WPA] Compared 947551 SCCs, 780270 collisions (0.823460) [WPA] Merged 944038 SCCs [WPA] Merged 5253521 tree bodies [WPA] Merged 590027 types ... [WPA] Size of mmap'd section decls: 99229066 bytes [WPA] Size of mmap'd section function_body: 18398837 bytes [WPA] Size of mmap'd section refs: 733678 bytes [WPA] Size of mmap'd section jmpfuncs: 2965981 bytes [WPA] Size of mmap'd section pureconst: 170248 bytes [WPA] Size of mmap'd section profile: 17985 bytes [WPA] Size of mmap'd section symbol_nodes: 3392736 bytes [WPA] Size of mmap'd section inline: 2693920 bytes [WPA] Size of mmap'd section icf: 435557 bytes [WPA] Size of mmap'd section offload_table: 0 bytes [WPA] Size of mmap'd section lto: 4320 bytes [WPA] Size of mmap'd section ipa_sra: 651660 bytes ... to ... [WPA] read 3312246 unshared trees [WPA] read 1144381 mergeable SCCs of average size 4.833785 [WPA] 8843938 tree bodies read in total [WPA] tree SCC table: size 524287, 197767 elements, collision ratio: 0.506446 [WPA] tree SCC max chain length 43 (size 1) [WPA] Compared 946614 SCCs, 775077 collisions (0.818789) [WPA] Merged 943798 SCCs [WPA] Merged 5253336 tree bodies [WPA] Merged 590105 types .... [WPA] Size of mmap'd section decls: 81262144 bytes [WPA] Size of mmap'd section function_body: 14702611 bytes [WPA] Size of mmap'd section ext_symtab: 0 bytes [WPA] Size of mmap'd section refs: 733695 bytes [WPA] Size of mmap'd section jmpfuncs: 2332150 bytes [WPA] Size of mmap'd section pureconst: 170292 bytes [WPA] Size of mmap'd section profile: 17986 bytes [WPA] Size of mmap'd section symbol_nodes: 3393358 bytes [WPA] Size of mmap'd section inline: 2567939 bytes [WPA] Size of mmap'd section icf: 435633 bytes [WPA] Size of mmap'd section lto: 4320 bytes [WPA] Size of mmap'd section ipa_sra: 651824 bytes so results in about 22% reduction in global decl stream and 24% reduction on function bodies stream (which is read mostly by ICF) Martin, the zstd compression breaks the compression statistics (it works when GCC is configured for zlib) At first ltrans I get: [LTRANS] Size of mmap'd section decls: 3734248 bytes [LTRANS] Size of mmap'd section function_body: 4895962 bytes ... to ... [LTRANS] Size of mmap'd section decls: 3479850 bytes [LTRANS] Size of mmap'd section function_body: 3722935 bytes So 7% reduction of global stream and 31% reduction of function bodies. Stream in seems to get about 3% faster and stream out about 5% but it is close to noise factor of my experiment. I expect bigger speedups on Firefox but I did not test it today since my Firefox setup broke again. GCC is not very good example on the problem with anonymous namespace types since we do not have so many of them. Sice of object files in gcc directory is reduced by 11% (because hash numbers do not compress well I guess). The patch makes DFS walk to recognize trees that are not merged (anonymous namespace, local function/variable decls, anonymous types etc). As discussed on IRC this is now done during the SCC walk rather than during the hash computation. When local tree is discovered we know that SCC components of everything that is on the stack reffers to it and thus is also local. Moreover we mark trees into hash set in output block so if we get a cross edge referring to local tree it gets marked too. Patch also takes care of avoiding SCC wrappers around some trees. In particular 1) singleton unmergeable SCCs are now streamed inline in global decl stream This includes INTEGER_CSTs and IDENTIFIER_NODEs that are shared by different code than rest of tree merging. 2) We use LTO_trees instead of LTO_tree_scc to wrap unmergeable SCC components. It is still necessary to mark them because of forward references. LTO_trees has simple header with number of trees and then things are streamed same way as for LTO_tree_scc. That is tree headers first followed by pickled references so things may point to future. Of course it is not necessary for LTO_tree_scc to be single component and streamer out may group more components together, but I decided to not snowball the patch even more 3) In local streams when lto_output_tree is called and the topmost SCC components turns out to be singleton we stream the tree directly instead of LTO_tree_scc, hash code, pickled tree, reference to just stremaed tree. LTO_trees is used to wrap all trees needed to represent tree being streamed. It would make sense again to use only one LTO_trees rather than one per SCC but I think this can be done incrementally. In general local trees are now recognized by new predicate local_tree_p Bit subtle is handing of TRANLSATION_UNIT_DECL, INTEGER_CST and IDENTIFIER_NODE. TRANSLATION_UNIT_DECL a local tree but references to it does not make other trees local (because we also understand local decls now). So I check for it later after localness propagation is done. INTEGER_CST and IDENTIFIER_NODEs are merged but not via the tree merging machinery. So it makes sense to stream them as unmergeable trees but we still need to compute their hashes so SCCs referring them do not get too large collision chains. For this reason they are checked just prior stream out. lto-bootstrapped/regteted x86_64-linux, OK? gcc/ChangeLog: 2020-05-19 Jan Hubicka <hubicka@ucw.cz> * lto-streamer-in.c (lto_input_scc): Add SHARED_SCC parameter. (lto_input_tree_1): Strenghten sanity check. (lto_input_tree): Update call of lto_input_scc. * lto-streamer-out.c: Include ipa-utils.h (create_output_block): Initialize local_trees if merigng is going to happen. (destroy_output_block): Destroy local_trees. (DFS): Add max_local_entry. (local_tree_p): New function. (DFS::DFS): Initialize and maintain it. (DFS::DFS_write_tree): Decide on streaming format. (lto_output_tree): Stream inline singleton SCCs * lto-streamer.h (enum LTO_tags): Add LTO_trees. (struct output_block): Add local_trees. (lto_input_scc): Update prototype. gcc/lto/ChangeLog: 2020-05-19 Jan Hubicka <hubicka@ucw.cz> * lto-common.c (compare_tree_sccs_1): Sanity check that we never read TRANSLATION_UNIT_DECL. (process_dref): Break out from ... (unify_scc): ... here. (process_new_tree): Break out from ... (lto_read_decls): ... here; handle streaming of singleton trees. (print_lto_report_1): Update statistics.
Diffstat (limited to 'gcc/lto-streamer-out.c')
-rw-r--r--gcc/lto-streamer-out.c146
1 files changed, 127 insertions, 19 deletions
diff --git a/gcc/lto-streamer-out.c b/gcc/lto-streamer-out.c
index 3d94324..0e17946 100644
--- a/gcc/lto-streamer-out.c
+++ b/gcc/lto-streamer-out.c
@@ -46,6 +46,7 @@ along with GCC; see the file COPYING3. If not see
#include "tree-dfa.h"
#include "file-prefix-map.h" /* remap_debug_filename() */
#include "output.h"
+#include "ipa-utils.h"
static void lto_write_tree (struct output_block*, tree, bool);
@@ -75,6 +76,10 @@ create_output_block (enum lto_section_type section_type)
ob->section_type = section_type;
ob->decl_state = lto_get_out_decl_state ();
+ /* Only global decl stream in non-wpa will ever be considered by tree
+ merging. */
+ if (!flag_wpa && section_type == LTO_section_decls)
+ ob->local_trees = new (hash_set <tree>);
ob->main_stream = XCNEW (struct lto_output_stream);
ob->string_stream = XCNEW (struct lto_output_stream);
ob->writer_cache = streamer_tree_cache_create (!flag_wpa, true, false);
@@ -100,6 +105,7 @@ destroy_output_block (struct output_block *ob)
delete ob->string_hash_table;
ob->string_hash_table = NULL;
+ delete ob->local_trees;
free (ob->main_stream);
free (ob->string_stream);
@@ -532,6 +538,8 @@ private:
bool ref_p;
bool this_ref_p;
};
+ /* Maximum index of scc stack containing a local tree. */
+ int max_local_entry;
static int scc_entry_compare (const void *, const void *);
@@ -550,6 +558,41 @@ private:
struct obstack sccstate_obstack;
};
+/* Return true if type can not be merged with structurally same tree in
+ other translation unit. During stream out this information is propagated
+ to all trees referring to T and they are not streamed with additional
+ information needed by the tree merging in lto-common.c (in particular,
+ scc hash codes are not streamed).
+
+ TRANSLATION_UNIT_DECL is handled specially since references to it does
+ not make other trees local as well. */
+
+static bool
+local_tree_p (tree t)
+{
+ switch (TREE_CODE (t))
+ {
+ case LABEL_DECL:
+ return true;
+ case NAMESPACE_DECL:
+ return !DECL_NAME (t);
+ case VAR_DECL:
+ case FUNCTION_DECL:
+ return !TREE_PUBLIC (t) && !DECL_EXTERNAL (t);
+ case RECORD_TYPE:
+ case UNION_TYPE:
+ case ENUMERAL_TYPE:
+ /* Anonymous namespace types are local.
+ Only work hard for main variants;
+ variant types will inherit locality. */
+ return TYPE_MAIN_VARIANT (t) == t
+ && odr_type_p (t) && type_with_linkage_p (t)
+ && type_in_anonymous_namespace_p (t);
+ default:
+ return false;
+ }
+}
+
/* Emit the physical representation of tree node EXPR to output block OB,
using depth-first search on the subgraph. If THIS_REF_P is true, the
leaves of EXPR are emitted as references via lto_output_tree_ref.
@@ -560,6 +603,8 @@ DFS::DFS (struct output_block *ob, tree expr, bool ref_p, bool this_ref_p,
bool single_p)
{
unsigned int next_dfs_num = 1;
+
+ max_local_entry = -1;
gcc_obstack_init (&sccstate_obstack);
DFS_write_tree (ob, NULL, expr, ref_p, this_ref_p);
while (!worklist_vec.is_empty ())
@@ -586,6 +631,8 @@ DFS::DFS (struct output_block *ob, tree expr, bool ref_p, bool this_ref_p,
scc_entry e = { expr, 0 };
/* Not yet visited. DFS recurse and push it onto the stack. */
*slot = cstate = XOBNEW (&sccstate_obstack, struct sccs);
+ if (ob->local_trees && local_tree_p (expr))
+ max_local_entry = sccstack.length ();
sccstack.safe_push (e);
cstate->dfsnum = next_dfs_num++;
cstate->low = cstate->dfsnum;
@@ -640,7 +687,26 @@ DFS::DFS (struct output_block *ob, tree expr, bool ref_p, bool this_ref_p,
any merging there. */
hashval_t scc_hash = 0;
unsigned scc_entry_len = 0;
- if (!flag_wpa)
+ bool local_to_unit = !ob->local_trees
+ || max_local_entry >= (int)first;
+
+ /* Remember that trees are local so info gets propagated to other
+ SCCs. */
+ if (local_to_unit && ob->local_trees)
+ {
+ for (unsigned i = 0; i < size; ++i)
+ ob->local_trees->add (sccstack[first + i].t);
+ }
+
+ /* As a special case do not stream TRANSLATION_UNIT_DECL as shared
+ tree. We can not mark it local because references to it does not
+ make other trees local (all global decls reffer to it via
+ CONTEXT). */
+ if (size == 1
+ && TREE_CODE (sccstack[first].t) == TRANSLATION_UNIT_DECL)
+ local_to_unit = true;
+
+ if (!local_to_unit)
{
scc_hash = hash_scc (ob, first, size, ref_p, this_ref_p);
@@ -672,18 +738,47 @@ DFS::DFS (struct output_block *ob, tree expr, bool ref_p, bool this_ref_p,
gcc_checking_assert (scc_entry_len == 1);
}
- /* Write LTO_tree_scc. */
- streamer_write_record_start (ob, LTO_tree_scc);
- streamer_write_uhwi (ob, size);
- streamer_write_uhwi (ob, scc_hash);
+ worklist_vec.pop ();
+
+ /* Only global decl sections are considered by tree merging. */
+ if (ob->section_type != LTO_section_decls)
+ {
+ /* If this is the original tree we stream and it forms SCC
+ by itself then we do not need to stream SCC at all. */
+ if (worklist_vec.is_empty () && first == 0 && size == 1)
+ return;
+ streamer_write_record_start (ob, LTO_trees);
+ streamer_write_uhwi (ob, size);
+ }
+ /* Write LTO_tree_scc if tree merging is going to be performed. */
+ else if (!local_to_unit
+ /* These are special since sharing is not done by tree
+ merging machinery. We can not special case them earlier
+ because we still need to compute hash for further sharing
+ of trees referring to them. */
+ && (size != 1
+ || (TREE_CODE (sccstack[first].t) != IDENTIFIER_NODE
+ && (TREE_CODE (sccstack[first].t) != INTEGER_CST
+ || TREE_OVERFLOW (sccstack[first].t)))))
+
+ {
+ gcc_checking_assert (ob->section_type == LTO_section_decls);
+ streamer_write_record_start (ob, LTO_tree_scc);
+ streamer_write_uhwi (ob, size);
+ streamer_write_uhwi (ob, scc_hash);
+ }
+ /* Non-trivial SCCs must be packed to trees blocks so forward
+ references work correctly. */
+ else if (size != 1)
+ {
+ streamer_write_record_start (ob, LTO_trees);
+ streamer_write_uhwi (ob, size);
+ }
/* Write size-1 SCCs without wrapping them inside SCC bundles.
All INTEGER_CSTs need to be handled this way as we need
their type to materialize them. Also builtins are handled
- this way.
- ??? We still wrap these in LTO_tree_scc so at the
- input side we can properly identify the tree we want
- to ultimatively return. */
+ this way. */
if (size == 1)
lto_output_tree_1 (ob, expr, scc_hash, ref_p, this_ref_p);
else
@@ -722,10 +817,11 @@ DFS::DFS (struct output_block *ob, tree expr, bool ref_p, bool this_ref_p,
/* Finally truncate the vector. */
sccstack.truncate (first);
+ if ((int)first <= max_local_entry)
+ max_local_entry = first - 1;
if (from_state)
from_state->low = MIN (from_state->low, cstate->low);
- worklist_vec.pop ();
continue;
}
@@ -1569,7 +1665,14 @@ DFS::DFS_write_tree (struct output_block *ob, sccs *from_state,
/* Check if we already streamed EXPR. */
if (streamer_tree_cache_lookup (ob->writer_cache, expr, NULL))
- return;
+ {
+ /* Refernece to a local tree makes entry also local. We always process
+ top of stack entry, so set max to number of entries in stack - 1. */
+ if (ob->local_trees
+ && ob->local_trees->contains (expr))
+ max_local_entry = sccstack.length () - 1;
+ return;
+ }
worklist w;
w.expr = expr;
@@ -1641,15 +1744,20 @@ lto_output_tree (struct output_block *ob, tree expr,
DFS (ob, expr, ref_p, this_ref_p, false);
in_dfs_walk = false;
- /* Finally append a reference to the tree we were writing.
- ??? If expr ended up as a singleton we could have
- inlined it here and avoid outputting a reference. */
+ /* Finally append a reference to the tree we were writing. */
existed_p = streamer_tree_cache_lookup (ob->writer_cache, expr, &ix);
- gcc_assert (existed_p);
- streamer_write_record_start (ob, LTO_tree_pickle_reference);
- streamer_write_uhwi (ob, ix);
- streamer_write_enum (ob->main_stream, LTO_tags, LTO_NUM_TAGS,
- lto_tree_code_to_tag (TREE_CODE (expr)));
+
+ /* DFS walk above possibly skipped streaming EXPR itself to let us inline
+ it. */
+ if (!existed_p)
+ lto_output_tree_1 (ob, expr, 0, ref_p, this_ref_p);
+ else
+ {
+ streamer_write_record_start (ob, LTO_tree_pickle_reference);
+ streamer_write_uhwi (ob, ix);
+ streamer_write_enum (ob->main_stream, LTO_tags, LTO_NUM_TAGS,
+ lto_tree_code_to_tag (TREE_CODE (expr)));
+ }
if (streamer_dump_file)
{
print_node_brief (streamer_dump_file, " Finished SCC of ",