author | Giuliano Belinassi <giuliano.belinassi@usp.br> | 2020-08-14 17:00:47 -0300 |
---|---|---|
committer | Giuliano Belinassi <giuliano.belinassi@usp.br> | 2020-08-20 16:51:07 -0300 |
commit | 82d1c7e47a05fcfb7bd64e534b7e98d055047e72 (patch) | |
tree | dc5ff364e7d6ea40dc9d0d57cd7031975ec7f757 /gcc/tree-ssa-loop-ch.c | |
parent | f7d47acdf29b99d2c8ec597d74e644124af67154 (diff) | |
Implement fork-based parallelism engine
This patch belongs to the "Parallelize GCC with Processes" series.
Here, we implement the parallelism by forking the compiler into
multiple processes after what would be the LTO LTRANS stage,
partitioning the callgraph into several partitions, as implemented in
"maybe_compile_in_parallel". From a high level, what happens is:
1. If the partitioner manages to generate multiple partitions, the
compiler will then call lto_promote_cross_file_statics to compute
the partition boundary, and symbols are promoted to global only if
promote_statics is set to true. This option is controlled by the user
through --param=promote-statics, which is disabled by default.
2. The compiler will initialize the file passed by the driver through
the hidden "-fsplit-outputs=<file>" option, creating that file.
3. The compiler will fork into multiple processes and apply the
allocated partition to the symbol table, removing every node which
is unnecessary for the partition.
4. The parent process waits for all child processes to finish, and then
calls exit (0).
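
The fork-and-wait pattern in steps 3 and 4 can be sketched as follows. This is
a minimal illustration, not the code from the patch: compile_in_parallel,
compile_partition and fail_odd_partition are hypothetical names, and the
"work" callback stands in for applying a partition and compiling it. Unlike
the patch, where the parent calls exit (0), this sketch returns a failure
count so the result can be inspected.

```c
#include <assert.h>
#include <stdio.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical stand-in for "apply partition N and compile it".
   Returns 0 on success.  */
static int compile_partition(int partition)
{
    (void) partition;
    return 0;
}

/* Hypothetical worker that fails on odd partitions, to show how the
   parent observes child failures.  */
static int fail_odd_partition(int partition)
{
    return partition % 2;
}

/* Fork one child per partition and wait for all of them.  Returns the
   number of partitions that did not finish successfully (nonzero exit
   status, death by signal, or fork failure).  */
static int compile_in_parallel(int num_partitions, int (*work)(int))
{
    int started = 0;
    int failures = 0;

    assert(num_partitions >= 0);
    fflush(NULL); /* Do not let children replay buffered output.  */

    for (int i = 0; i < num_partitions; i++) {
        pid_t pid = fork();
        if (pid < 0)
            break; /* Could not fork; remaining partitions count as failed.  */
        if (pid == 0)
            _exit(work(i) == 0 ? 0 : 1); /* Child: do the work and leave.  */
        started++;
    }

    /* Parent: collect every child that was started.  */
    for (int i = 0; i < started; i++) {
        int status;
        if (wait(&status) < 0 || !WIFEXITED(status) || WEXITSTATUS(status) != 0)
            failures++;
    }

    return failures + (num_partitions - started);
}
```

Because a crashed child shows up in wait() as not having exited normally, the
parent needs no extra mechanism to notice it.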
To implement step 3, however, we had to do a more detailed analysis
and figure out a way to correctly remove reachable nodes from the
callgraph without corrupting any other node. LTO does this by simply
throwing everything into files and reloading it, but we had to avoid
this because it would result in a huge overhead. We implemented this in
"lto_apply_partition_mask" by classifying each node according to
a dependency analysis:
* Start by trusting what lto_promote_cross_file_statics
gave to us.
* Look for nodes which may need additional nodes to be
carried with them. For example, inline clones require that their body
be kept present, so we have to expand the boundary a little by adding
all nodes that they call.
* If the node is in the boundary, we release all unnecessary
information about it. Varpool nodes have to be declared external,
otherwise we end up with multiple instances of the same global
variable in the program, which results in incorrect linking.
* Avoid duplicated release of function summaries (ipa-fnsummary).
* Finally, we had to delay the assembler file initialization,
delay any early assembler output to file, and remove any initialized
RTL code if a certain variable needs to be renamed.
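
The classification above can be illustrated with a toy model. This is only a
sketch under stated assumptions: struct toy_node, enum node_state and
toy_apply_partition_mask are hypothetical and do not correspond to GCC's
cgraph_node, varpool_node or lto_apply_partition_mask. In particular, the
rule that an inline clone in the boundary keeps its body is an assumption of
this model. The initial "state" of each node plays the role of the result
trusted from lto_promote_cross_file_statics.

```c
#include <assert.h>
#include <stdbool.h>

#define MAX_CALLEES 4

enum node_state { NODE_REMOVED, NODE_BOUNDARY, NODE_IN_PARTITION };

/* Toy symbol-table node (hypothetical, not a GCC type).  */
struct toy_node {
    bool is_variable;      /* Variable rather than function.  */
    bool is_inline_clone;  /* Needs its callees to stay around.  */
    bool has_body;         /* Body or initializer still present.  */
    bool is_external;      /* Declared external in this process.  */
    int num_callees;
    int callees[MAX_CALLEES];
    enum node_state state; /* Initial value is the trusted classification.  */
};

static void toy_apply_partition_mask(struct toy_node *nodes, int n)
{
    /* Expand the boundary: every callee of a kept inline clone must be
       at least in the boundary.  Iterate until nothing changes.  */
    bool changed = true;
    while (changed) {
        changed = false;
        for (int i = 0; i < n; i++) {
            if (nodes[i].state == NODE_REMOVED || !nodes[i].is_inline_clone)
                continue;
            assert(nodes[i].num_callees <= MAX_CALLEES);
            for (int c = 0; c < nodes[i].num_callees; c++) {
                struct toy_node *callee = &nodes[nodes[i].callees[c]];
                if (callee->state == NODE_REMOVED) {
                    callee->state = NODE_BOUNDARY;
                    changed = true;
                }
            }
        }
    }

    /* Release what this partition does not own.  */
    for (int i = 0; i < n; i++) {
        if (nodes[i].state == NODE_IN_PARTITION)
            continue;
        /* Model assumption: inline clones in the boundary keep their body.  */
        if (!(nodes[i].state == NODE_BOUNDARY && nodes[i].is_inline_clone))
            nodes[i].has_body = false;
        /* Boundary variables become external to avoid duplicate definitions.  */
        if (nodes[i].state == NODE_BOUNDARY && nodes[i].is_variable)
            nodes[i].is_external = true;
    }
}
```

The fixed-point loop matters because a callee pulled into the boundary may
itself be an inline clone whose callees then also have to be kept.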
We also implemented a GNU Make Jobserver integration to this mechanism,
as implemented in jobserver.cc. This works as follows:
* If -fparallel-jobs=jobserver is given, we query the existence of a
jobserver by calling jobserver_initialize. This function checks whether
the file descriptors provided by Make are valid, and whether the
O_NONBLOCK flag is set on the read file descriptor.
* Then, the parent process returns the token which Make
originally gave to it, since each child blocks waiting for a
new token. To correctly block the child, there are two cases: (1)
when select is available on the host, and (2) when it is not. In
(1), we have to use it, since the read fd will have O_NONBLOCK set. In
(2), we can simply read the fd, as it is in blocking mode.
* Once the child reads a token, it compiles its part and returns
the token before finalizing. If the compilation crashes, however, the parent
process will correctly detect that the child was killed by a signal, so there
is no need for any fancy crash control in the jobserver engine.
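
A minimal sketch of the token protocol described above, for the case where
select is available. It is not the code in jobserver.cc: struct jobserver,
jobserver_parse, jobserver_get_token and jobserver_return_token are
hypothetical names. It assumes the pipe-based form in which GNU Make passes
"--jobserver-auth=R,W" (or the older "--jobserver-fds=R,W") in MAKEFLAGS;
other forms, such as a named FIFO, are simply rejected here.

```c
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <stdbool.h>
#include <stdio.h>
#include <string.h>
#include <sys/select.h>
#include <unistd.h>

/* Hypothetical handle: read and write ends of the jobserver pipe.  */
struct jobserver {
    int rfd;
    int wfd;
};

/* Extract the pipe fds from a MAKEFLAGS-style string and check that
   both descriptors are actually open.  */
static bool jobserver_parse(const char *makeflags, struct jobserver *js)
{
    static const char *const keys[] = { "--jobserver-auth=", "--jobserver-fds=" };
    const char *p = NULL;

    if (makeflags == NULL)
        return false;
    for (int i = 0; i < 2 && p == NULL; i++) {
        p = strstr(makeflags, keys[i]);
        if (p != NULL)
            p += strlen(keys[i]);
    }
    if (p == NULL || sscanf(p, "%d,%d", &js->rfd, &js->wfd) != 2)
        return false;
    /* fcntl fails with EBADF if the descriptor is not open.  */
    return fcntl(js->rfd, F_GETFD) != -1 && fcntl(js->wfd, F_GETFD) != -1;
}

/* Block until one token (a single byte) has been read.  select is used
   because the read end may be in O_NONBLOCK mode.  */
static bool jobserver_get_token(const struct jobserver *js, char *token)
{
    for (;;) {
        fd_set readfds;
        FD_ZERO(&readfds);
        FD_SET(js->rfd, &readfds);
        if (select(js->rfd + 1, &readfds, NULL, NULL, NULL) < 0) {
            if (errno == EINTR)
                continue;
            return false;
        }
        ssize_t n = read(js->rfd, token, 1);
        if (n == 1)
            return true;
        if (n == 0)
            return false; /* Write end closed: no jobserver anymore.  */
        if (errno != EAGAIN && errno != EINTR)
            return false;
        /* Another process took the token first; wait again.  */
    }
}

/* Give a token back by writing the same byte to the pipe.  */
static bool jobserver_return_token(const struct jobserver *js, char token)
{
    return write(js->wfd, &token, 1) == 1;
}
```

In the scheme described above, the parent would return its own token before
waiting, and each child would call the equivalent of jobserver_get_token
before compiling and jobserver_return_token before exiting.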
gcc/ChangeLog:
2020-08-20 Giuliano Belinassi <giuliano.belinassi@usp.br>
* jobserver.cc: New file.
* jobserver.h: New file.
* cgraph.c (cgraph_node::maybe_release_dominators): New function.
* cgraph.h (symtab_node::find_by_order): Declare.
(symtab_node::find_by_name): Declare.
(symtab_node::find_by_asm_name): Declare.
(maybe_release_dominators): Declare.
* cgraphunit.c (cgraph_node::expand): Quickly return if body removed.
(ipa_passes): Run all_regular_ipa_passes if split_outputs.
(is_number): New function.
(childno): New variable.
(maybe_compile_in_parallel): New function.
* ipa-fnsummary (pass_ipa_free_fn_summary::gate): Avoid running twice
when compiling in parallel.
* ipa-icf.c (sem_item_optimizer::filter_removed_items): Behaviour when
compiling in parallel should be the same as if in LTO.
* ipa-visibility (localize_node): Same as above.
* lto-cgraph.c (handle_node_in_boundary): New function.
(compute_boundary): New function.
(lto_apply_partition_mask): New function.
* symtab.c (symbol_table::change_decl_assembler_name): Discard RTL decl
if name changed.
(symtab_node::dump_base): Dump aux2.
(symtab_node::find_by_order): New function.
(symtab_node::find_by_name): New function.
(symtab_node::find_by_asm_name): New function.
* toplev.c (additional_asm_files): New variable.
(init_additional_asm_names_file): New function.
(handle_additional_asm): New function.
(toplev::main): Finalize the jobserver if initialized.
* toplev.h (init_additional_asm_names_file): Declare.
(handle_additional_asm): Declare.
* varasm.c (output_addressed_contants): Avoid writing to asm too
early.
(output_constants): Same as above.
(add_constant_to_table): Same as above.
(output_constant_def_contents): Same as above.
(output_addressed_constants): Same as above.