author Giuliano Belinassi <giuliano.belinassi@usp.br> 2020-08-14 17:00:47 -0300
committer Giuliano Belinassi <giuliano.belinassi@usp.br> 2020-08-20 16:51:07 -0300
commit 82d1c7e47a05fcfb7bd64e534b7e98d055047e72
tree dc5ff364e7d6ea40dc9d0d57cd7031975ec7f757 /gcc/tree-ssa-loop-ch.c
parent f7d47acdf29b99d2c8ec597d74e644124af67154
Implement fork-based parallelism engine
This patch belongs to the "Parallelize GCC with Processes" series. Here, we
implement the parallelism by forking the compiler into multiple processes
after what would be the LTO LTRANS stage, partitioning the callgraph into
several partitions, as implemented in "maybe_compile_in_parallel".

From a high level, what happens is:

1. If the partitioner manages to generate multiple partitions, the compiler
   calls lto_promote_cross_file_statics to compute the partition boundary.
   Symbols are promoted to global only if promote_statics is set to true.
   This option is controlled by the user through --param=promote-statics,
   which is disabled by default.

2. The compiler initializes the file passed by the driver through the hidden
   "-fsplit-outputs=<file>" option, creating that file.

3. The compiler forks into multiple processes and applies the allocated
   partition to the symbol table, removing every node that is unnecessary
   for the partition.

4. The parent process waits for all child processes to finish, and then
   calls exit (0).

Implementing 3., however, required a more detailed analysis to find a way to
correctly remove reachable nodes from the callgraph without corrupting any
other node. LTO does this by simply throwing everything into files and
reloading it, but we had to avoid that because it would result in a huge
overhead. We implemented this in "lto_apply_partition_mask" by classifying
each node according to a dependency analysis:

* Start by trusting what lto_promote_cross_file_statics gave us.

* Look for nodes that may need additional nodes to be carried with them.
  For example, inline clones require their bodies to remain present, so we
  have to expand the boundary a little by adding all nodes that they call.

* If the node is in the boundary, we release all unnecessary information
  about it. For varpool nodes, we have to declare them external; otherwise
  we end up with multiple instances of the same global variable in the
  program, which results in incorrect linking.

* Avoid releasing function summaries (ipa-fnsummary) twice.

* Finally, we had to delay the assembler file initialization, delay any
  early assembler output to the file, and remove any initialized RTL code
  if a variable needs to be renamed.

We also implemented a GNU Make Jobserver integration for this mechanism, as
implemented in jobserver.cc. It works as follows:

* If -fparallel-jobs=jobserver is given, we query the existence of a
  jobserver by calling jobserver_initialize. This function checks that the
  file descriptors provided by Make are valid, and that the O_NONBLOCK flag
  is set on the read file descriptor.

* Then, the parent process returns the token that Make originally gave to
  it, since the child is blocked waiting for a new token. To correctly
  block the child, there are two cases: (1) select is available on the
  host, and (2) it is not. In (1), we have to use it, since the read fd
  will have O_NONBLOCK. In (2), we can simply read the fd, as the read is
  set to blocking mode.

* Once the child reads a token, it compiles its part and returns the token
  before finalizing. If the compilation crashes, however, the parent process
  will correctly detect that a signal was sent to it, so there is no need
  for any fancy crash control in the jobserver engine.

gcc/ChangeLog:

2020-08-20  Giuliano Belinassi  <giuliano.belinassi@usp.br>

	* jobserver.cc: New file.
	* jobserver.h: New file.
	* cgraph.c (cgraph_node::maybe_release_dominators): New function.
	* cgraph.h (symtab_node::find_by_order): Declare.
	(symtab_node::find_by_name): Declare.
	(symtab_node::find_by_asm_name): Declare.
	(maybe_release_dominators): Declare.
	* cgraphunit.c (cgraph_node::expand): Quickly return if body removed.
	(ipa_passes): Run all_regular_ipa_passes if split_outputs.
	(is_number): New function.
	(childno): New variable.
	(maybe_compile_in_parallel): New function.
	* ipa-fnsummary (pass_ipa_free_fn_summary::gate): Avoid running twice
	when compiling in parallel.
	* ipa-icf.c (sem_item_optimizer::filter_removed_items): Behaviour when
	compiling in parallel should be the same as if in LTO.
	* ipa-visibility (localize_node): Same as above.
	* lto-cgraph.c (handle_node_in_boundary): New function.
	(compute_boundary): New function.
	(lto_apply_partition_mask): New function.
	* symtab.c (symbol_table::change_decl_assembler_name): Discard RTL decl
	if name changed.
	(symtab_node::dump_base): Dump aux2.
	(symtab_node::find_by_order): New function.
	(symtab_node::find_by_name): New function.
	(symtab_node::find_by_asm_name): New function.
	* toplev.c (additional_asm_files): New variable.
	(init_additional_asm_names_file): New function.
	(handle_additional_asm): New function.
	(toplev::main): Finalize the jobserver if initialized.
	* toplev.h (init_additional_asm_names_file): Declare.
	(handle_additional_asm): Declare.
	* varasm.c (output_addressed_contants): Avoid writing to asm too early.
	(output_constants): Same as above.
	(add_constant_to_table): Same as above.
	(output_constant_def_contents): Same as above.
	(output_addressed_constants): Same as above.
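
Steps 3 and 4 of the commit message (fork one process per partition, then have the parent wait and notice crashes) can be illustrated with a minimal POSIX sketch. This is not the code from maybe_compile_in_parallel: the names run_partitions, compile_partition and crashing_partition are hypothetical, and the per-partition work is a placeholder.

```c
#include <assert.h>
#include <stdlib.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical stand-in for compiling one partition; 0 means success.  */
int compile_partition(int partition)
{
  (void) partition;
  return 0;
}

/* Hypothetical worker that crashes on partition 1, to exercise the
   signal-detection path in the parent.  */
int crashing_partition(int partition)
{
  if (partition == 1)
    abort();
  return 0;
}

/* Fork one child per partition, wait for all of them, and return the
   number of children that failed (non-zero exit or killed by a signal).  */
int run_partitions(int n_partitions, int (*work)(int))
{
  int failures = 0;
  int forked = 0;

  for (int i = 0; i < n_partitions; i++) {
    pid_t pid = fork();
    if (pid < 0) {            /* fork failed: stop creating children */
      failures++;
      break;
    }
    if (pid == 0)             /* child: do the work and leave */
      _exit(work(i) == 0 ? 0 : 1);
    forked++;
  }

  for (int i = 0; i < forked; i++) {
    int status;
    if (wait(&status) < 0) {
      failures++;
      continue;
    }
    if (WIFSIGNALED(status))  /* child crashed */
      failures++;
    else if (WIFEXITED(status) && WEXITSTATUS(status) != 0)
      failures++;
  }
  return failures;
}
```

A caller would invoke run_partitions (n, compile_partition) and treat a non-zero result as a failed compilation. The wait status alone is enough to tell a crashed child from a clean exit, which matches the commit message's remark that no extra crash control is needed.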
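
Case (1) of the jobserver description (blocking on a read fd that has O_NONBLOCK set) can be sketched as below. This is an illustration under assumptions, not the contents of jobserver.cc: the names fd_is_nonblocking, acquire_token, release_token and make_token_pipe are hypothetical, and make_token_pipe only simulates the pipe that Make would normally provide.

```c
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <stddef.h>
#include <sys/select.h>
#include <unistd.h>

/* Return 1 if FD is open with O_NONBLOCK set, 0 if open and blocking,
   -1 if FD is not a valid descriptor.  */
int fd_is_nonblocking(int fd)
{
  int flags = fcntl(fd, F_GETFL);
  if (flags < 0)
    return -1;
  return (flags & O_NONBLOCK) ? 1 : 0;
}

/* Return one token to the jobserver pipe.  0 on success, -1 on error.  */
int release_token(int wfd, char token)
{
  ssize_t n;
  do
    n = write(wfd, &token, 1);
  while (n < 0 && errno == EINTR);
  return n == 1 ? 0 : -1;
}

/* Wait with select until RFD is readable, then try to read one token.
   Another process may take the token first, in which case the
   non-blocking read fails with EAGAIN and we wait again.  */
int acquire_token(int rfd, char *token)
{
  for (;;) {
    fd_set readable;
    FD_ZERO(&readable);
    FD_SET(rfd, &readable);
    if (select(rfd + 1, &readable, NULL, NULL, NULL) < 0) {
      if (errno == EINTR)
        continue;
      return -1;
    }
    ssize_t n = read(rfd, token, 1);
    if (n == 1)
      return 0;
    if (n == 0)               /* write end closed: no jobserver left */
      return -1;
    if (errno != EAGAIN && errno != EWOULDBLOCK && errno != EINTR)
      return -1;
  }
}

/* Demo-only stand-in for Make: create a pipe, make the read end
   non-blocking, and preload N_TOKENS tokens.  */
int make_token_pipe(int fds[2], int n_tokens)
{
  if (pipe(fds) < 0)
    return -1;
  int flags = fcntl(fds[0], F_GETFL);
  if (flags < 0 || fcntl(fds[0], F_SETFL, flags | O_NONBLOCK) < 0)
    return -1;
  for (int i = 0; i < n_tokens; i++)
    if (release_token(fds[1], '+') < 0)
      return -1;
  return 0;
}
```

A child would call acquire_token before compiling its partition and release_token before exiting. Using select avoids spinning on the non-blocking descriptor while other processes hold all the tokens.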
Diffstat (limited to 'gcc/tree-ssa-loop-ch.c')
0 files changed, 0 insertions, 0 deletions