riscv-gnu-toolchain/llvm.git

author	Fangrui Song <i@maskray.me>	2022-08-24 09:40:03 -0700
committer	Fangrui Song <i@maskray.me>	2022-08-24 09:40:03 -0700
commit	3b4d800911b52ae23da1a1e3f9105f53d8053397 (patch)
tree	536aa5d3abff5c05585bcb632b9db88f568e2956 /llvm/lib/CodeGen/MachineBlockPlacement.cpp
parent	e854c17b02f8cd82a303d223ba5f3b0d87579cd7 (diff)

[ELF] Parallelize writes of different OutputSections

We currently process one OutputSection at a time and for each OutputSection write contained input sections in parallel. This strategy does not leverage multi-threading well. Instead, parallelize writes of different OutputSections.

The default TaskSize for parallelFor often leads to inferior sharding. We prepare the task in the caller instead.

* Move llvm::parallel::detail::TaskGroup to llvm::parallel::TaskGroup
* Add llvm::parallel::TaskGroup::execute.
* Change writeSections to declare TaskGroup and pass it to writeTo.

Speed-up with --threads=8:

* clang -DCMAKE_BUILD_TYPE=Release: 1.11x as fast
* clang -DCMAKE_BUILD_TYPE=Debug: 1.10x as fast
* chrome -DCMAKE_BUILD_TYPE=Release: 1.04x as fast
* scylladb build/release: 1.09x as fast

On M1, many benchmarks are a small fraction of a percentage faster. Mozilla showed the largest difference with the patch being about 1.03x as fast.

Differential Revision: https://reviews.llvm.org/D131247
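The idea described in this message can be illustrated with a small standalone sketch. It is not the patch itself and does not use LLVM's API: `SimpleTaskGroup`, `InputChunk`, `OutputSec`, `runAll`, and the `taskBytes` threshold are all hypothetical stand-ins built on `std::thread`. The sketch only shows the shape of the change: one task group is declared in `writeSections`, every output section's `writeTo` adds caller-prepared batches of input chunks to it, and all batches from all sections then run concurrently.

```cpp
#include <algorithm>
#include <atomic>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <functional>
#include <thread>
#include <utility>
#include <vector>

// Hypothetical stand-ins for the linker's data structures (not lld's types).
struct InputChunk {
  std::size_t offset;             // offset inside the parent output section
  std::vector<std::uint8_t> data; // bytes to copy into the output file
};

struct OutputSec {
  std::size_t fileOffset = 0;     // where the section starts in the file
  std::vector<InputChunk> chunks;
};

// Hypothetical minimal task group: spawn() queues work, runAll() drains the
// queue on a fixed number of threads and returns when every task is done.
class SimpleTaskGroup {
  std::vector<std::function<void()>> tasks;

public:
  void spawn(std::function<void()> fn) { tasks.push_back(std::move(fn)); }

  void runAll(unsigned numThreads) {
    std::atomic<std::size_t> next{0};
    auto worker = [&] {
      for (std::size_t i; (i = next.fetch_add(1)) < tasks.size();)
        tasks[i]();
    };
    std::vector<std::thread> threads;
    for (unsigned t = 0; t < std::max(1u, numThreads); ++t)
      threads.emplace_back(worker);
    for (std::thread &th : threads)
      th.join();
    tasks.clear();
  }
};

// Batch consecutive chunks until a batch holds at least taskBytes bytes, then
// queue the batch as one task. The sharding is decided here, in the caller,
// rather than by a generic parallel-for default.
void writeTo(const OutputSec &sec, std::vector<std::uint8_t> &file,
             SimpleTaskGroup &tg, std::size_t taskBytes) {
  std::size_t begin = 0, bytes = 0;
  for (std::size_t i = 0; i < sec.chunks.size(); ++i) {
    bytes += sec.chunks[i].data.size();
    bool last = i + 1 == sec.chunks.size();
    if (bytes < taskBytes && !last)
      continue;
    std::size_t end = i + 1;
    tg.spawn([&sec, &file, begin, end] {
      for (std::size_t j = begin; j < end; ++j) {
        const InputChunk &c = sec.chunks[j];
        assert(sec.fileOffset + c.offset + c.data.size() <= file.size());
        std::copy(c.data.begin(), c.data.end(),
                  file.begin() + sec.fileOffset + c.offset);
      }
    });
    begin = end;
    bytes = 0;
  }
}

// One task group is shared by all output sections, so batches belonging to
// different sections can be written at the same time.
std::vector<std::uint8_t> writeSections(const std::vector<OutputSec> &secs,
                                        std::size_t fileSize,
                                        unsigned numThreads,
                                        std::size_t taskBytes) {
  std::vector<std::uint8_t> file(fileSize, 0);
  SimpleTaskGroup tg;
  for (const OutputSec &sec : secs)
    writeTo(sec, file, tg, taskBytes);
  tg.runAll(numThreads);
  return file;
}
```

The sketch assumes that chunks never overlap in the output file, so tasks can write without locking. The batch threshold is an illustrative parameter; the value the patch actually uses is not stated in this message.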

Diffstat (limited to 'llvm/lib/CodeGen/MachineBlockPlacement.cpp')

0 files changed, 0 insertions, 0 deletions