aboutsummaryrefslogtreecommitdiff
path: root/llvm/lib/Transforms/Utils/InlineFunction.cpp
diff options
context:
space:
mode:
authorspupyrev <spupyrev@fb.com>2021-11-23 08:47:23 -0800
committerHongtao Yu <hoy@fb.com>2021-11-23 09:08:30 -0800
commitb00fc198224efa038a7469e068dd920b3f1aba75 (patch)
treeef08128782d1675055c75d74719d4f3868761a71 /llvm/lib/Transforms/Utils/InlineFunction.cpp
parent38211bbab1d949f682271abba0171424a5a335ab (diff)
downloadllvm-b00fc198224efa038a7469e068dd920b3f1aba75.zip
llvm-b00fc198224efa038a7469e068dd920b3f1aba75.tar.gz
llvm-b00fc198224efa038a7469e068dd920b3f1aba75.tar.bz2
profi - a flow-based profile inference algorithm: Part I (out of 3)
The benefits of sampling-based PGO crucially depends on the quality of profile data. This diff implements a flow-based algorithm, called profi, that helps to overcome the inaccuracies in a profile after it is collected. Profi is an extended and significantly re-engineered classic MCMF (min-cost max-flow) approach suggested by Levin, Newman, and Haber [2008, Complementing missing and inaccurate profiling using a minimum cost circulation algorithm]. It models profile inference as an optimization problem on a control-flow graph with the objectives and constraints capturing the desired properties of profile data. Three important challenges that are being solved by profi: - "fixing" errors in profiles caused by sampling; - converting basic block counts to edge frequencies (branch probabilities); - dealing with "dangling" blocks having no samples in the profile. The main implementation (and required docs) are in SampleProfileInference.cpp. The worst-time complexity is quadratic in the number of blocks in a function, O(|V|^2). However a careful engineering and extensive evaluation shows that the running time is (slightly) super-linear. In particular, instances with 1000 blocks are solved within 0.1 second. The algorithm has been extensively tested internally on prod workloads, significantly improving the quality of generated profile data and providing speedups in the range from 0% to 5%. For "smaller" benchmarks (SPEC06/17), it generally improves the performance (with a few outliers) but extra work in the compiler might be needed to re-tune existing optimization passes relying on profile counts. Reviewed By: wenlei, hoy Differential Revision: https://reviews.llvm.org/D109860
Diffstat (limited to 'llvm/lib/Transforms/Utils/InlineFunction.cpp')
0 files changed, 0 insertions, 0 deletions