author | Amir Ayupov <aaupov@fb.com> | 2025-10-01 15:25:34 -0700 |
---|---|---|
committer | GitHub <noreply@github.com> | 2025-10-01 15:25:34 -0700 |
commit | 780f69cd922d8925648e11e771e77f0b46190e5b (patch) | |
tree | b9e9b20576e3aeade74d615e102ea0d316f92578 /lldb/source/Plugins/ScriptInterpreter/Python/ScriptInterpreterPython.cpp | |
parent | 1e4d4bb584a1c35c5f7801c68b9dfccd6130caab (diff) | |
[Clang][CMake] Add CSSPGO support to LLVM_BUILD_INSTRUMENTED (#79942)
Build on the Clang-BOLT infrastructure to collect a sample profile for CSSPGO.
Add CSSPGO.cmake and BOLT-CSSPGO.cmake to automate CSSPGO and CSSPGO+BOLT
Clang builds.
Note that `CLANG_PGO_TRAINING_DATA_SOURCE_DIR` is required because the built-in
training set is inadequate for collecting a sampled profile.
Hardware compatibility: CSSPGO requires synchronized (0-skid) call
and branch stacks, which are only available with Intel PEBS (Sandy
Bridge+), AMD Zen3 with BRS, Zen4 with LBRv2+LBR_PMC_FREEZE, and Zen5
with LBRv2.
This patch adds support for the Intel `br_inst_retired.near_taken:uppp`
event.
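As a rough illustration of what collecting such a profile looks like, the sketch below assembles a `perf record` command line for the event named above. The helper name `perf_record_cmd` and the choice of `--call-graph fp` are assumptions made for this example; the exact invocation used by the patch is not shown here. The helper only prints the command, so it can be inspected on machines without the required hardware.

```shell
#!/bin/sh
# Hypothetical helper (not part of the patch): print a perf invocation that
# samples taken branches with branch stacks (-b) and frame-pointer call
# stacks (--call-graph fp). Assumes an Intel CPU exposing this event.
perf_record_cmd() {
    out="$1"   # path of the perf.data file to write
    shift      # remaining arguments: the workload command to profile
    echo "perf record --call-graph fp -b" \
         "-e br_inst_retired.near_taken:uppp" \
         "-o $out -- $*"
}

# Example: show the command that would profile one compiler invocation.
perf_record_cmd perf.data clang -c hello.c
```

Running the printed command requires `perf` and a CPU from the compatibility list above.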
Test Plan:
Added BOLT-CSSPGO.cmake, which is used the same way as BOLT-PGO.cmake.
For example, for a bootstrapped ThinLTO+CSSPGO+BOLT build, with the CSSPGO
profile collected from an LLVM build and the BOLT profile collected from
Hello World (instrumentation):
```
cmake -B clang-csspgo-bolt -S /path/to/llvm-project/llvm \
-DLLVM_ENABLE_LLD=ON -DBOOTSTRAP_LLVM_ENABLE_LLD=ON \
-DBOOTSTRAP_BOOTSTRAP_LLVM_ENABLE_LLD=ON \
-DPGO_INSTRUMENT_LTO=Thin \
-DBOOTSTRAP_CLANG_PGO_TRAINING_DATA_SOURCE_DIR=/path/to/llvm-project/llvm \
-GNinja -C /path/to/llvm-project/clang/cmake/caches/BOLT-CSSPGO.cmake
ninja stage2-clang-bolt
...
warning: Sample PGO is estimated to optimize better with 19.5x more samples. Please consider increasing sampling rate or profiling for longer duration to get more samples.
...
[2800/2801] Optimizing Clang with BOLT
BOLT-INFO: 8189 out of 106942 functions in the binary (7.7%) have non-empty execution profile
13776393 : taken branches (-42.1%)
```
Performance testing with Clang:
- Setup: Clang-BOLT testing harness
  https://github.com/aaupov/llvm-devmtg-2022/commit/9f2b46f67a1930a51c58a0e4894637a8c64c570e
  - CSSPGO training: building LLVM,
  - InstrPGO training: building Hello World,
  - BOLT training: building Hello World, instrumentation,
  - benchmark: building small LLVM tool (`not`),
  - 2S Intel SKX Xeon 6138 with 40C/80T and 256GB RAM, using 20C/40T for
    build,
- Results, wall time, lower is better
  - Baseline (bootstrapped build): 10.36s,
  - InstrPGO + ThinLTO: 9.34s,
  - CSSPGO + ThinLTO: 8.85s.
- BOLT results, for reference:
  - Baseline: 9.09s,
  - InstrPGO + ThinLTO: 9.09s,
  - CSSPGO + ThinLTO: 8.58s.
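
To put the wall-time numbers above in relative terms, the reductions can be computed with a short script. This is an illustrative sketch: the `reduction` helper is not part of the patch, and measuring every configuration against the 10.36s non-BOLT baseline is a choice made for this example.

```shell
#!/bin/sh
# Percent reduction in wall time of $2 relative to $1, to one decimal place.
reduction() {
    awk -v base="$1" -v new="$2" \
        'BEGIN { printf "%.1f\n", (base - new) / base * 100 }'
}

baseline=10.36   # bootstrapped build without PGO or BOLT, in seconds

echo "InstrPGO + ThinLTO:        $(reduction "$baseline" 9.34)%"   # 9.8%
echo "CSSPGO + ThinLTO:          $(reduction "$baseline" 8.85)%"   # 14.6%
echo "BOLT only:                 $(reduction "$baseline" 9.09)%"   # 12.3%
echo "InstrPGO + ThinLTO + BOLT: $(reduction "$baseline" 9.09)%"   # 12.3%
echo "CSSPGO + ThinLTO + BOLT:   $(reduction "$baseline" 8.58)%"   # 17.2%
```

On these measurements, CSSPGO + ThinLTO cuts wall time by about 14.6% versus about 9.8% for InstrPGO + ThinLTO, and adding BOLT on top of CSSPGO brings the total reduction to about 17.2%.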
---------
Co-authored-by: Matthias Braun <matze@braunis.de>