path: root/bolt/lib/Utils/CommandLineOpts.cpp
Age | Commit message | Author | Files | Lines
2025-05-30 | [BOLT][heatmap] Produce zoomed-out heatmaps (#140153) | Amir Ayupov | 1 | -4/+51
Add a capability to produce multiple heatmaps with given bucket sizes. The default heatmap block size (64B) could be too fine-grained for large binaries. Extend the option `block-size` to accept a list of bucket sizes for additional heatmaps with coarser granularity. The heatmap is simply rescaled so provided sizes should be multiples of each other. Human-readable suffixes can be used, e.g. 4K, 16kb, 1MiB. New defaults: 64B (base bucket size), 4KB (default page size), 256KB (for large binaries). Test Plan: updated heatmap-preagg.test
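The size suffixes and the rescaling step described above can be illustrated with a minimal sketch. Both `parseBucketSize` and `rescaleBuckets` are hypothetical helpers written for this note, not BOLT's implementation; the sketch assumes 1024-based multipliers (consistent with 4KB being the default page size) and does not guard against overflow.

```cpp
#include <cassert>
#include <cctype>
#include <cstddef>
#include <cstdint>
#include <optional>
#include <string>
#include <string_view>
#include <vector>

// Hypothetical helper: parse "64", "64B", "4K", "16kb", "1MiB" into bytes.
// Returns std::nullopt for missing digits or an unrecognized suffix.
std::optional<uint64_t> parseBucketSize(std::string_view Text) {
  size_t Pos = 0;
  uint64_t Value = 0;
  while (Pos < Text.size() &&
         std::isdigit(static_cast<unsigned char>(Text[Pos]))) {
    Value = Value * 10 + static_cast<uint64_t>(Text[Pos] - '0');
    ++Pos;
  }
  if (Pos == 0)
    return std::nullopt; // no leading digits

  // Lower-case the suffix so "KB", "kb", and "KiB" are treated alike.
  std::string Suffix;
  for (; Pos < Text.size(); ++Pos)
    Suffix += static_cast<char>(
        std::tolower(static_cast<unsigned char>(Text[Pos])));

  if (Suffix.empty() || Suffix == "b")
    return Value;
  if (Suffix == "k" || Suffix == "kb" || Suffix == "kib")
    return Value << 10;
  if (Suffix == "m" || Suffix == "mb" || Suffix == "mib")
    return Value << 20;
  return std::nullopt;
}

// Hypothetical helper: merge adjacent base buckets into coarser ones.
// This only works cleanly when CoarseSize is a multiple of BaseSize,
// which is why the bucket sizes should be multiples of each other.
std::vector<uint64_t> rescaleBuckets(const std::vector<uint64_t> &Counts,
                                     uint64_t BaseSize, uint64_t CoarseSize) {
  assert(BaseSize != 0 && CoarseSize >= BaseSize &&
         CoarseSize % BaseSize == 0 &&
         "coarse size must be a multiple of the base size");
  const uint64_t Factor = CoarseSize / BaseSize;
  std::vector<uint64_t> Result((Counts.size() + Factor - 1) / Factor, 0);
  for (size_t I = 0; I < Counts.size(); ++I)
    Result[I / Factor] += Counts[I];
  return Result;
}
```

With these assumptions, a 64B heatmap is folded into a 4KB one by summing every 64 consecutive buckets, so no re-aggregation of the profile is needed.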
2025-05-13 | [BOLT] Print heatmap from perf2bolt (#139194) | Amir Ayupov | 1 | -1/+5
Add perf2bolt `--heatmap` option to produce heatmaps during profile aggregation. Distinguish exclusive mode (`llvm-bolt-heatmap`) and optional mode (`perf2bolt --heatmap`), which impacts perf.data handling: exclusive mode covers all addresses, whereas optional mode consumes only the attached profile, which covers function addresses.

Test Plan: updated perf2bolt tests:
- pre-aggregated-perf.test: pre-aggregated data,
- bolt-address-translation-yaml.test: pre-aggregated + BOLTed input,
- perf_test.test: no-LBR perf data.
2025-05-08 | [BOLT][AArch64] Patch functions targeted by optional relocs (#138750) | Maksim Panchenko | 1 | -0/+5
On AArch64, we create optional/weak relocations that may not be processed because the relocated value overflows. When the overflow happens, we used to enforce patching for all functions in the binary via the --force-patch option. This PR relaxes the requirement and enforces patching only for functions that are targets of optional relocations. Moreover, if the compact code model is used, the relocation overflow is guaranteed not to happen and the patching will be skipped.
2025-03-21 | [NFC][BOLT] Refactor ForcePatch option (#127812) | Paschalis Mpeis | 1 | -0/+6
Move force-patch flag to CommandLineOpts and add details on PatchEntries.
2024-12-12 | [BOLT] Introduce binary analysis tool based on BOLT (#115330) | Kristof Beyls | 1 | -0/+2
This initial commit does not add any specific binary analyses yet; it merely contains the boilerplate to introduce a new BOLT-based tool.

This basically combines the first 4 patches from the prototype pac-ret and stack-clash binary analyzer discussed in RFC https://discourse.llvm.org/t/rfc-bolt-based-binary-analysis-tool-to-verify-correctness-of-security-hardening/78148 and published at https://github.com/llvm/llvm-project/compare/main...kbeyls:llvm-project:bolt-gadget-scanner-prototype

The introduction of such a BOLT-based binary analysis tool was proposed and discussed in at least the following places:
- The RFC pointed to above
- EuroLLVM 2024 round table https://discourse.llvm.org/t/summary-of-bolt-as-a-binary-analysis-tool-round-table-at-eurollvm/78441 The round table showed quite a few people interested in being able to build a custom binary analysis quickly with a tool like this.
- Also at the US LLVM dev meeting a few weeks ago, I heard interest from a few people, asking when the tool would be available upstream.
- The presentation "Adding Pointer Authentication ABI support for your ELF platform" (https://llvm.swoogo.com/2024devmtg/session/2512720/adding-pointer-authentication-abi-support-for-your-elf-platform) explicitly mentioned interest to extend the prototype tool to verify correct implementation of pauthabi.
2024-10-24 | [BOLT] Add profile density computation | Amir Ayupov | 1 | -0/+4
Reuse the definition of profile density from llvm-profgen (#92144):
- the density is computed in perf2bolt using raw samples (perf.data or pre-aggregated data),
- function density is the ratio of dynamically executed function bytes to the static function size in bytes,
- profile density:
  - functions are sorted by density in decreasing order, accumulating their respective sample counts,
  - profile density is the smallest density covering 99% of total sample count.

In other words, BOLT binary profile density is the minimum amount of profile information per function (excluding functions in tail 1% sample count) which is sufficient to optimize the binary well.

The density threshold of 60 was determined through experiments with large binaries by reducing the sample count and checking resulting profile density and performance. The threshold is conservative.

perf2bolt would print the warning if the density is below the threshold and suggest to increase the sampling duration and/or frequency to reach a given density, e.g.:
```
BOLT-WARNING: BOLT is estimated to optimize better with 2.8x more samples.
```

Test Plan: updated pre-aggregated-perf.test

Reviewers: maksfb, wlei-llvm, rafaelauler, ayermolo, dcci, WenleiHe
Reviewed By: WenleiHe, wlei-llvm
Pull Request: https://github.com/llvm/llvm-project/pull/101094
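The density definition above can be sketched as follows. This is a minimal illustration under stated assumptions, not perf2bolt's code: the `FunctionSample` struct, its field names, and `computeProfileDensity` are hypothetical, and functions with zero size or zero samples are simply skipped.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical per-function record; field names are illustrative only.
struct FunctionSample {
  double ExecutedBytes; // dynamically executed function bytes
  uint64_t SizeBytes;   // static function size in bytes
  uint64_t SampleCount; // samples attributed to the function
};

// Returns the smallest function density among the densest functions that
// together cover `Coverage` (99% by default) of the total sample count.
double computeProfileDensity(const std::vector<FunctionSample> &Funcs,
                             double Coverage = 0.99) {
  assert(Coverage > 0.0 && Coverage <= 1.0 && "coverage is a fraction");

  struct Entry {
    double Density;
    uint64_t Samples;
  };
  std::vector<Entry> Entries;
  uint64_t Total = 0;
  for (const FunctionSample &F : Funcs) {
    if (F.SizeBytes == 0 || F.SampleCount == 0)
      continue; // nothing to measure for this function
    Entries.push_back(
        {F.ExecutedBytes / static_cast<double>(F.SizeBytes), F.SampleCount});
    Total += F.SampleCount;
  }
  if (Entries.empty())
    return 0.0;

  // Sort by density in decreasing order.
  std::sort(Entries.begin(), Entries.end(),
            [](const Entry &A, const Entry &B) {
              return A.Density > B.Density;
            });

  // Accumulate sample counts until the coverage target is reached.
  const double Target = static_cast<double>(Total) * Coverage;
  uint64_t Accumulated = 0;
  double Density = 0.0;
  for (const Entry &E : Entries) {
    Accumulated += E.Samples;
    Density = E.Density;
    if (static_cast<double>(Accumulated) >= Target)
      break;
  }
  return Density;
}
```

In this sketch, the sparse functions in the tail 1% of samples never lower the result, which matches the intent of excluding them from the density estimate.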
2024-07-25 | [BOLT] Enable standalone build (#97130) | Tristan Ross | 1 | -3/+3
Continuing from #87196: as the original author did not have much time, I have taken over working on this PR. We would like to have this so it'll be easier to package for Nix. Can be tested by copying the cmake, bolt, third-party, and llvm directories out into their own directory with this PR applied and then building bolt. --------- Co-authored-by: pca006132 <john.lck40@gmail.com>
2024-07-15 | [BOLT] Add -print-mappings option to heatmaps (#97567) | Paschalis Mpeis | 1 | -0/+6
Emit a mapping in the legend between the characters/buckets and the text sections, using:
```sh
llvm-heatmap-bolt -print-mappings ..
```
Example:
```
Legend:
..
Sections:
a/A : .init 0x00000100-0x00000200
b/B : .plt 0x00000200-0x00000500
c/C : .text 0x00010000-0x000a0000
d/D : .fini 0x000a0000-0x000f0000
..
```
2024-06-29 | [BOLT] Match functions with exact hash (#96572) | Shaw Young | 1 | -0/+3
Added flag '--match-profile-with-function-hash' to match functions based on exact hash. After identical and LTO name matching, more functions can be recovered for inference with exact hash, in the case of function renaming with no functional changes. Collisions are possible in the unlikely case where multiple functions share the same exact hash. The flag is off by default because it requires processing all binary functions and is therefore expensive. Test Plan: added hashing-based-function-matching.test.
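The matching step described above can be sketched as an index keyed by hash. This is an illustrative sketch only: `FuncInfo` and `matchByExactHash` are hypothetical names, the hash is assumed to be a precomputed 64-bit value, and on a collision the first binary function seen wins, which is one possible policy rather than BOLT's documented behavior.

```cpp
#include <cassert>
#include <cstdint>
#include <string>
#include <unordered_map>
#include <vector>

// Hypothetical record: a function name plus its precomputed exact hash.
struct FuncInfo {
  std::string Name;
  uint64_t Hash;
};

// Maps each profiled function name to the binary function with the same
// hash. Building the index touches every binary function, which is the
// cost the commit message refers to.
std::unordered_map<std::string, std::string>
matchByExactHash(const std::vector<FuncInfo> &ProfileFuncs,
                 const std::vector<FuncInfo> &BinaryFuncs) {
  std::unordered_map<uint64_t, const FuncInfo *> ByHash;
  for (const FuncInfo &BF : BinaryFuncs)
    ByHash.emplace(BF.Hash, &BF); // emplace keeps the first entry on collision

  std::unordered_map<std::string, std::string> Matches;
  for (const FuncInfo &PF : ProfileFuncs) {
    auto It = ByHash.find(PF.Hash);
    if (It == ByHash.end())
      continue; // no binary function with this hash
    assert(It->second && "index never stores null pointers");
    Matches.emplace(PF.Name, It->second->Name);
  }
  return Matches;
}
```

In practice such a pass would run only on functions left unmatched by name, so a renamed but otherwise unchanged function can still receive its profile.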
2024-06-25 | Revert "[𝘀𝗽𝗿] initial version" | shawbyoung | 1 | -8/+0
This reverts commit bb5ab1ffe719f5e801ef08ac08be975546aa3266.
2024-06-25 | [𝘀𝗽𝗿] initial version | shawbyoung | 1 | -0/+8
Created using spr 1.3.4
2024-06-24 | Revert "[BOLT] Hash-based function matching" (#96568) | shaw young | 1 | -5/+0
Reverts llvm/llvm-project#95821
2024-06-24 | [BOLT] Hash-based function matching (#95821) | shaw young | 1 | -0/+5
Use the hashes of binary and profiled functions to recover functions with changed names. Test Plan: added hashing-based-function-matching.test.
2024-05-22 | [BOLT] Add NamedRegionTimer to inferStaleProfile (#93078) | shaw young | 1 | -0/+4
2024-03-21 | [BOLT] Output basic YAML profile in BAT mode | Amir Ayupov | 1 | -0/+4
Relax assumptions that YAML output is not supported in BAT mode. Set up basic infrastructure for emitting YAML for functions not covered by BAT, such as from `.bolt.org.text` section (code identical to input binary sans external refs), or non-rewritten functions in non-relocation mode (where the function stays in the same section but BAT mapping is not emitted).

This diff only produces YAML profile for non-BAT functions (skipped, non-simple). YAML profile for BAT functions is added in follow-up diffs:
- https://github.com/llvm/llvm-project/pull/76911 emits YAML profile with internal control flow information only (branch profile),
- https://github.com/llvm/llvm-project/pull/76896 adds cross-function profile (calls profile).

Test Plan: Added bolt/test/X86/bolt-address-translation-yaml.test

Reviewers: ayermolo, dcci, maksfb, rafaelauler
Reviewed By: rafaelauler
Pull Request: https://github.com/llvm/llvm-project/pull/76910
2024-01-30 | [BOLT] Detect Linux kernel based on ELF program headers (#80086) | Maksim Panchenko | 1 | -1/+0
Check if program header addresses fall into the kernel space to detect a Linux kernel binary on x86-64. Delete opts::LinuxKernelMode and use BinaryContext::IsLinuxKernel instead.
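The address check described above can be illustrated with a small sketch. Everything here is an assumption made for illustration: `ProgramHeader` is a simplified stand-in for an ELF program header, `KernelSpaceStartX86_64` is set to the conventional start of the x86-64 kernel text mapping, and flagging the binary when any segment starts in that range is one plausible reading of the commit message rather than a copy of BOLT's logic.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Simplified, hypothetical stand-in for a loadable ELF program header.
struct ProgramHeader {
  uint64_t VAddr;   // virtual address of the segment
  uint64_t MemSize; // size of the segment in memory
};

// Assumed lower bound of the x86-64 kernel text mapping.
constexpr uint64_t KernelSpaceStartX86_64 = 0xffffffff80000000ULL;

// Returns true if any segment starts inside the kernel address range.
bool looksLikeLinuxKernel(const std::vector<ProgramHeader> &Phdrs) {
  for (const ProgramHeader &P : Phdrs) {
    assert(P.VAddr + P.MemSize >= P.VAddr &&
           "segment must not wrap around the address space");
    if (P.VAddr >= KernelSpaceStartX86_64)
      return true;
  }
  return false;
}
```

Deriving the answer from the binary itself is what allows a separate command-line switch such as `opts::LinuxKernelMode` to be removed.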
2023-11-14 | [BOLT] Refactor --keep-nops option. NFC. (#72228) | Maksim Panchenko | 1 | -5/+0
Run RemoveNops pass only if --keep-nops is set to false (default).
2023-11-13 | [BOLT] Fix NOP instruction emission on x86 (#72186) | Maksim Panchenko | 1 | -0/+5
Use MCAsmBackend::writeNopData() interface to emit NOP instructions on x86. There are multiple forms of NOP instruction on x86 with different sizes. Currently, LLVM's assembly/disassembly does not support all forms correctly which can lead to a breakage of input code semantics, e.g. if the program relies on NOP instructions for reserving a patch space. Add "--keep-nops" option to preserve NOP instructions.
2023-11-09 | [BOLT] Fix typos (#68121) | spaette | 1 | -1/+1
Closes https://github.com/llvm/llvm-project/issues/63097

Before merging please make sure the change to bolt/include/bolt/Passes/StokeInfo.h is correct.

bolt/include/bolt/Passes/StokeInfo.h
```diff
  // This Pass solves the two major problems to use the Stoke program without
- // proting its code:
+ // probing its code:
```

I'm still not happy about the awkward wording in this comment.

bolt/include/bolt/Passes/FixRelaxationPass.h
```
$ ed -s bolt/include/bolt/Passes/FixRelaxationPass.h <<<'9,12p'
// This file declares the FixRelaxations class, which locates instructions with
// wrong targets and fixes them. Such problems usually occures when linker
// relaxes (changes) instructions, but doesn't fix relocations types properly
// for them.
$
```

bolt/docs/doxygen.cfg.in
bolt/include/bolt/Core/BinaryContext.h
bolt/include/bolt/Core/BinaryFunction.h
bolt/include/bolt/Core/BinarySection.h
bolt/include/bolt/Core/DebugData.h
bolt/include/bolt/Core/DynoStats.h
bolt/include/bolt/Core/Exceptions.h
bolt/include/bolt/Core/MCPlusBuilder.h
bolt/include/bolt/Core/Relocation.h
bolt/include/bolt/Passes/FixRelaxationPass.h
bolt/include/bolt/Passes/InstrumentationSummary.h
bolt/include/bolt/Passes/ReorderAlgorithm.h
bolt/include/bolt/Passes/StackReachingUses.h
bolt/include/bolt/Passes/StokeInfo.h
bolt/include/bolt/Passes/TailDuplication.h
bolt/include/bolt/Profile/DataAggregator.h
bolt/include/bolt/Profile/DataReader.h
bolt/lib/Core/BinaryContext.cpp
bolt/lib/Core/BinarySection.cpp
bolt/lib/Core/DebugData.cpp
bolt/lib/Core/DynoStats.cpp
bolt/lib/Core/Relocation.cpp
bolt/lib/Passes/Instrumentation.cpp
bolt/lib/Passes/JTFootprintReduction.cpp
bolt/lib/Passes/ReorderData.cpp
bolt/lib/Passes/RetpolineInsertion.cpp
bolt/lib/Passes/ShrinkWrapping.cpp
bolt/lib/Passes/TailDuplication.cpp
bolt/lib/Rewrite/BoltDiff.cpp
bolt/lib/Rewrite/DWARFRewriter.cpp
bolt/lib/Rewrite/RewriteInstance.cpp
bolt/lib/Utils/CommandLineOpts.cpp
bolt/runtime/instr.cpp
bolt/test/AArch64/got-ld64-relaxation.test
bolt/test/AArch64/unmarked-data.test
bolt/test/X86/Inputs/dwarf5-cu-no-debug-addr-helper.s
bolt/test/X86/Inputs/linenumber.cpp
bolt/test/X86/double-jump.test
bolt/test/X86/dwarf5-call-pc-function-null-check.test
bolt/test/X86/dwarf5-split-dwarf4-monolithic.test
bolt/test/X86/dynrelocs.s
bolt/test/X86/fallthrough-to-noop.test
bolt/test/X86/tail-duplication-cache.s
bolt/test/runtime/X86/instrumentation-ind-calls.s
2023-06-28 | [BOLT] Add -dump-cg option to dump call graph | Amir Ayupov | 1 | -3/+3
Reviewed By: #bolt, rafauler Differential Revision: https://reviews.llvm.org/D153994
2022-09-19 | [BOLT] Control aggregation mode output profile file format | Amir Ayupov | 1 | -0/+9
In perf2bolt and `-aggregate-only` BOLT mode, the output profile file is written in fdata format by default. Provide a knob `-profile-format=[fdata,yaml]` to control the format. Note that `-w` option still dumps in YAML format. Reviewed By: #bolt, maksfb Differential Revision: https://reviews.llvm.org/D133995
2022-08-02 | CommandLine: add and use cl::SubCommand::get{All,TopLevel} | Nicolai Hähnle | 1 | -12/+6
Prefer using these accessors to access the special sub-commands corresponding to the top-level (no subcommand) and all sub-commands. This is a preparatory step towards removing the use of ManagedStatic: with a subsequent change, these global instances will be moved to be regular function-scope statics. It is split up to give downstream projects an (albeit short) window in which they can switch to using the accessors in a forward-compatible way. Differential Revision: https://reviews.llvm.org/D129118
2022-07-11 | [BOLT] Increase coverage of shrink wrapping [3/5] | Rafael Auler | 1 | -0/+6
Add the option to run -equalize-bb-counts before shrink wrapping to avoid unnecessarily optimizing some CFGs where profile is inaccurate but we can prove two blocks have the same frequency. Reviewed By: Amir Differential Revision: https://reviews.llvm.org/D126113
2022-06-05 | [bolt] Remove unneeded cl::ZeroOrMore for cl::opt options | Fangrui Song | 1 | -58/+36
2022-06-04 | Remove unneeded cl::ZeroOrMore for cl::opt options | Fangrui Song | 1 | -1/+1
Similar to 557efc9a8b68628c2c944678c6471dac30ed9e8e. This commit handles options where cl::ZeroOrMore is more than one line below cl::opt.
2022-06-03 | [Hexagon][bolt] Remove unneeded cl::ZeroOrMore for cl::opt options. NFC | Fangrui Song | 1 | -1/+0
Similar to 557efc9a8b68628c2c944678c6471dac30ed9e8e
2022-04-15 | [BOLT] Check if LLVM_REVISION is defined | Amir Ayupov | 1 | -1/+6
Handle the case where LLVM_REVISION is undefined (due to LLVM_APPEND_VC_REV=OFF or otherwise) by setting "<unknown>" value as before D123549. Reviewed By: yota9 Differential Revision: https://reviews.llvm.org/D123852
2022-04-14 | [BOLT][NFC] Use LLVM_REVISION instead of BOLT_VERSION_STRING | Amir Ayupov | 1 | -2/+2
Remove duplicate version string identification Reviewed By: rafauler Differential Revision: https://reviews.llvm.org/D123549
2022-03-15 | [BOLT] Set cold sections alignment explicitly | Vladislav Khmelevsky | 1 | -0/+5
The cold text section alignment is set using the maximum alignment value passed to emitCodeAlignment. In order to calculate the tentative layout correctly, we set the minimum alignment of such sections to the maximum possible function alignment explicitly. Differential Revision: https://reviews.llvm.org/D121392
2022-02-07 | [BOLT] Refactor heatmap to be standalone tool | Vladislav Khmelevsky | 1 | -32/+17
Separate heatmap from bolt and build it as standalone tool. Reviewed By: maksfb Differential Revision: https://reviews.llvm.org/D118946
2022-01-27 | [BOLT] Prepare BOLT for unit-testing | Vladislav Khmelevsky | 1 | -8/+0
This patch adds unit testing support for BOLT. In order to do this we need to make at least these changes at the code level:
* Make createMCPlusBuilder accessible externally
* Remove positional InputFilename argument from bolt utility sources

And prepare the cmake and lit for the new tests.

Vladislav Khmelevsky, Advanced Software Technology Lab, Huawei

Reviewed By: maksfb, Amir
Differential Revision: https://reviews.llvm.org/D118271
2021-12-21 | [BOLT][NFC] Fix file-description comments | Maksim Panchenko | 1 | -1/+1
Summary: Fix comments at the start of source files. (cherry picked from FBD33274597)
2021-11-11 | [BOLT] Fix Windows build | Rafael Auler | 1 | -3/+2
Summary: Make BOLT build in VisualStudio compiler and run without crashing on a simple test. Other tests are not running. (cherry picked from FBD32378736)
2021-10-08 | Rebase: [NFC] Refactor sources to be buildable in shared mode | Rafael Auler | 1 | -0/+232
Summary: Moves source files into separate components, and makes the components' dependencies on each other explicit, so the LLVM build system knows how to build BOLT in BUILD_SHARED_LIBS=ON. Please use the -c merge.renamelimit=230 git option when rebasing your work on top of this change.

To achieve this, we create a new library to hold core IR files (most classes beginning with Binary in their names), a new library to hold Utils, some command line options shared across both RewriteInstance and core IR files, a new library called Rewrite to hold most classes concerned with running top-level functions coordinating the binary rewriting process, and a new library called Profile to hold classes dealing with profile reading and writing.

To remove the dependency from BinaryContext into X86-specific classes, we do some refactoring on the BinaryContext constructor to receive a reference to the specific backend directly from RewriteInstance. Then, the dependency on X86 or AArch64-specific classes is transferred to the Rewrite library. We can't have the Core library depend on targets because targets depend on Core (which would create a cycle).

Files implementing the entry point of a tool are transferred to the tools/ folder. All header files are transferred to the include/ folder. The src/ folder was renamed to lib/.

(cherry picked from FBD32746834)