aboutsummaryrefslogtreecommitdiff
path: root/llvm/lib/Transforms/Utils/LoopUnrollRuntime.cpp
diff options
context:
space:
mode:
authorJez Ng <jezng@fb.com>2021-11-12 16:01:25 -0500
committerJez Ng <jezng@fb.com>2021-11-12 16:02:49 -0500
commitd9b6f7e312c11ff69779939bb2be0bd53936a333 (patch)
tree1cd2d792da0daa2338d0826359d4ea8a8b6ee3a6 /llvm/lib/Transforms/Utils/LoopUnrollRuntime.cpp
parent6c32dd4dfafe01d01e4b189a4eeda5c4497c71ec (diff)
downloadllvm-d9b6f7e312c11ff69779939bb2be0bd53936a333.zip
llvm-d9b6f7e312c11ff69779939bb2be0bd53936a333.tar.gz
llvm-d9b6f7e312c11ff69779939bb2be0bd53936a333.tar.bz2
[lld-macho] Teach ICF to dedup functions with identical unwind info
Dedup'ing unwind info is tricky because each CUE contains a different function address, if ICF operated naively and compared the entire contents of each CUE, entries with identical unwind info but belonging to different functions would never be considered identical. To work around this problem, we slice away the function address before performing ICF. We rely on `relocateCompactUnwind()` to correctly handle these truncated input sections. Here are the numbers before and after D109944, D109945, and this diff were applied, as tested on my 3.2 GHz 16-Core Intel Xeon W: Without any optimizations: base diff difference (95% CI) sys_time 0.849 ± 0.015 0.896 ± 0.012 [ +4.8% .. +6.2%] user_time 3.357 ± 0.030 3.512 ± 0.023 [ +4.3% .. +5.0%] wall_time 3.944 ± 0.039 4.032 ± 0.031 [ +1.8% .. +2.6%] samples 40 38 With `-dead_strip`: base diff difference (95% CI) sys_time 0.847 ± 0.010 0.896 ± 0.012 [ +5.2% .. +6.5%] user_time 3.377 ± 0.014 3.532 ± 0.015 [ +4.4% .. +4.8%] wall_time 3.962 ± 0.024 4.060 ± 0.030 [ +2.1% .. +2.8%] samples 47 30 With `-dead_strip` and `--icf=all`: base diff difference (95% CI) sys_time 0.935 ± 0.013 0.957 ± 0.018 [ +1.5% .. +3.2%] user_time 3.472 ± 0.022 6.531 ± 0.046 [ +87.6% .. +88.7%] wall_time 4.080 ± 0.040 5.329 ± 0.060 [ +30.0% .. +31.2%] samples 37 30 Unsurprisingly, ICF is now a lot slower, likely due to the much larger number of input sections it needs to process. But the rest of the linker only suffers a mild slowdown. Note that the compact-unwind-bad-reloc.s test was expanded because we now handle the relocation for CUE's function address in a separate code path from the rest of the CUE relocations. The extended test covers both code paths. Reviewed By: #lld-macho, oontvoo Differential Revision: https://reviews.llvm.org/D109946
Diffstat (limited to 'llvm/lib/Transforms/Utils/LoopUnrollRuntime.cpp')
0 files changed, 0 insertions, 0 deletions