path: root/bolt/lib/Rewrite/BinaryPassManager.cpp
Age | Commit message | Author | Files | Lines
2025-09-09 | [BOLT][AArch64] Inlining of Memcpy (#154929) | YafetBeyene | 1 | -1/+3
The pass for inlining memcpy in BOLT was X86-specific and used the `rep movsb` instruction. This patch implements a static size analysis for AArch64 memcpy inlining that extracts copy sizes from preceding instructions and uses them to generate optimal width-specific load/store sequences.
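As a rough illustration of what "width-specific load/store sequences" could mean, the sketch below splits a statically known copy size into access widths. The greedy largest-first policy, the width set, and the name `split_copy` are assumptions made for this example; the commit message does not describe the actual selection logic.

```python
# Hypothetical helper: split a known copy size into access widths (in bytes).
# The greedy largest-first policy is an assumption for illustration only,
# not the logic used by the BOLT pass.
WIDTHS = (16, 8, 4, 2, 1)


def split_copy(size):
    """Return a list of (offset, width) pairs that together cover `size` bytes."""
    chunks = []
    offset = 0
    for width in WIDTHS:
        # Emit as many accesses of this width as still fit.
        while size - offset >= width:
            chunks.append((offset, width))
            offset += width
    return chunks


print(split_copy(37))  # [(0, 16), (16, 16), (32, 4), (36, 1)]
```

Each pair would correspond to one load/store of that width at that offset; a real implementation would also have to cap the size at which inlining remains profitable.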
2025-08-22 | [BOLT] Add dump-dot-func option for selective function CFG dumping (#153007) | YafetBeyene | 1 | -1/+2
## Change:
* Added `--dump-dot-func` command-line option that lets users dump CFGs only for specific functions instead of dumping all functions (previously the only available option was `--dump-dot-all`)

## Usage:
* Users can now specify function names or regex patterns (e.g., `--dump-dot-func=main,helper` or `--dump-dot-func="init.*"`) to generate .dot files only for functions of interest
* Aims to save time when analysing specific functions in large binaries (e.g., only dumping graphs for performance-critical functions identified through profiling) and to reduce the output clutter of thousands of unnecessary .dot files

## Testing
The introduced test `dump-dot-func.test` confirms the new option does the following:
- [x] 1. `dump-dot-func` can correctly filter a specified function
- [x] 2. Can achieve the above with regexes
- [x] 3. Can do 1. with a list of functions
- [x] No option specified creates no dot files
- [x] Passing in a non-existent function generates no dumping messages
- [x] `dump-dot-all` continues to work as expected
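The name/regex selection described above can be sketched as follows. This is a minimal model, assuming a comma-separated spec in which each entry is treated as a regex that must match the whole function name; whether BOLT uses full or partial matching is not stated in the commit message, and `select_functions` is a hypothetical name.

```python
import re


def select_functions(names, spec):
    """Keep the names that fully match any comma-separated pattern in `spec`."""
    # Assumption: each entry is a regex and must match the entire name.
    patterns = [re.compile(p) for p in spec.split(",") if p]
    return [n for n in names if any(p.fullmatch(n) for p in patterns)]


all_functions = ["main", "helper", "init_a", "initb", "fini"]
print(select_functions(all_functions, "main,helper"))  # ['main', 'helper']
print(select_functions(all_functions, "init.*"))       # ['init_a', 'initb']
print(select_functions(all_functions, "nonexistent"))  # []
```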
2025-05-01 | [BOLT] Run PatchEntries pass before LongJmp (#137236) | Maksim Panchenko | 1 | -4/+4
With the --force-patch option, every original function entry point is overwritten with a trampoline to a new version of the function to prevent the execution of the original code. If the function size is too small for the trampoline code, we are forced to bail out on rewriting the function. That presented a problem on AArch64 due to the LongJmp pass, which assumed the presence of the new copy of the function. If the new copy was not emitted, it could have led to a relocation overflow. Run the PatchEntries pass before LongJmp and make the latter aware of the functions that are not going to be emitted. Make --force-patch option behavior on AArch64 consistent with other architectures.
2025-02-28 | [BOLT] Report flow conservation scores (#127954) | ShatianWang | 1 | -2/+2
Add two additional profile quality stats for CG (call graph) and CFG (control flow graph) flow conservations besides the CFG discontinuity stats introduced in #109683. The two new stats quantify how different "in-flow" is from "out-flow" in the following cases where they should be equal. The smaller the reported stats, the better the flow conservations are. CG flow conservation: for each function that is not a program entry, the number of times the function is called according to CG ("in-flow") should be equal to the number of times the transition from an entry basic block of the function to another basic block within the function is recorded ("out-flow"). CFG flow conservation: for each basic block that is not a function entry or exit, the number of times the transition into this basic block from another basic block within the function is recorded ("in-flow") should be equal to the number of times the transition from this basic block to another basic block within the function is recorded ("out-flow"). Use `-v=1` for more detailed bucketed stats, and use `-v=2` to dump functions / basic blocks with bad flow conservations.
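The CFG flow conservation check can be illustrated with a small sketch. The per-block score used here, `|in - out| / max(in, out)`, is an assumption chosen for the example; the commit message only says that smaller values are better and does not give the formula. The function name and the edge-map layout are likewise hypothetical.

```python
from collections import defaultdict


def cfg_flow_gaps(edges, entries, exits):
    """edges: {(src, dst): count}. Returns {block: relative in/out gap}."""
    inflow = defaultdict(int)
    outflow = defaultdict(int)
    for (src, dst), count in edges.items():
        if src == dst:
            continue  # only transitions between distinct blocks are counted
        outflow[src] += count
        inflow[dst] += count
    gaps = {}
    for block in set(inflow) | set(outflow):
        if block in entries or block in exits:
            continue  # the property only applies to interior blocks
        hi = max(inflow[block], outflow[block])
        # Assumed score: 0.0 means in-flow and out-flow agree exactly.
        gaps[block] = abs(inflow[block] - outflow[block]) / hi if hi else 0.0
    return gaps


edges = {("A", "B"): 100, ("B", "C"): 80, ("B", "D"): 20, ("C", "D"): 60}
print(cfg_flow_gaps(edges, entries={"A"}, exits={"D"}))
```

In this example block `B` conserves flow (100 in, 100 out), while block `C` receives 80 but only passes on 60, giving a gap of 0.25.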
2024-12-16 | [BOLT] Add support for safe-icf (#116275) | Alexander Yermolovich | 1 | -5/+5
Identical Code Folding (ICF) folds identical functions into one function and updates symbol addresses to the new address. This reduces the size of a binary, but can lead to problems, for example when function pointers are compared. Such comparisons can appear either explicitly in the code or in IR generated by optimization passes like Indirect Call Promotion (ICP). After ICF, what used to be two different addresses become the same address, which can lead to a different code path being taken. This is where safe ICF comes in. The linker (LLD) does it using the address-significance section generated by clang: if a symbol is in it, or an object doesn't have this section, symbols are not folded. BOLT does not have information about which objects lack this section, so it can't reuse this mechanism. This implementation scans the code section and conservatively marks function symbols as unsafe. It treats symbols as unsafe if they are used in a non-control-flow instruction. It also scans the data relocation sections and does the same for relocations that reference a function symbol. The latter handles the case when a function pointer is stored in a local or global variable, etc. If a relocation address points within a vtable, these symbols are skipped.
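The conservative marking described above can be modeled roughly as below. The input shapes (tuples for instructions and relocations, half-open vtable address ranges) and the name `unsafe_to_fold` are assumptions made for this sketch; BOLT works on its own instruction and relocation representations.

```python
def unsafe_to_fold(functions, instructions, data_relocs, vtable_ranges):
    """
    functions: set of function symbol names
    instructions: iterable of (is_control_flow, referenced_symbol)
    data_relocs: iterable of (address, referenced_symbol)
    vtable_ranges: list of (start, end) half-open address ranges
    """
    unsafe = set()
    # A function symbol used by a non-control-flow instruction may have its
    # address taken, so folding it is treated as unsafe.
    for is_control_flow, sym in instructions:
        if sym in functions and not is_control_flow:
            unsafe.add(sym)
    # Data relocations that reference a function are treated the same way,
    # except when the relocation lies inside a vtable.
    for addr, sym in data_relocs:
        in_vtable = any(start <= addr < end for start, end in vtable_ranges)
        if sym in functions and not in_vtable:
            unsafe.add(sym)
    return unsafe


result = unsafe_to_fold(
    functions={"f", "g", "h", "v"},
    instructions=[(True, "f"), (False, "g")],
    data_relocs=[(0x1000, "h"), (0x2008, "v")],
    vtable_ranges=[(0x2000, 0x2100)],
)
print(sorted(result))  # ['g', 'h']
```

Here `f` is only called, and `v` is only referenced from a vtable slot, so both remain foldable in this model.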
2024-12-16 | [BOLT][AArch64] Enable function print after ADRRelaxation (#119869) | Paschalis Mpeis | 1 | -1/+7
Introduce `--print-adr-relaxation` to print after ADR Relaxation pass.
2024-10-08 | [BOLT] Profile quality stats -- CFG discontinuity (#109683) | ShatianWang | 1 | -0/+3
In a perfect profile, each positive-execution-count block in the function’s CFG should be reachable from a positive-execution-count function entry block through a positive-execution-count path. This new pass checks how well the BOLT input profile satisfies this “CFG continuity” property. More specifically, for each of the hottest 1000 functions, the pass calculates the function’s fraction of basic block execution counts that is “unreachable”. It then reports the 95th percentile of the distribution of the 1000 unreachable fractions in a single BOLT-INFO line. The smaller the reported value is, the better the BOLT profile satisfies the CFG continuity property. The default value of 1000 above can be changed via the hidden BOLT option `-num-functions-for-continuity-check=[N]`. If more detailed stats are needed, `-v=1` can be added to the BOLT invocation: the hottest N functions will be grouped into 5 equally-sized buckets, from the hottest to the coldest; for each bucket, various summary statistics of the distribution of the fractions and the raw unreachable execution counts will be reported.
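For a single function, the "unreachable fraction" could be computed along the following lines. This sketch assumes that a positive-execution-count path means a chain of positive-count edges between positive-count blocks, and that the fraction is taken over the sum of all block counts; the function name and input layout are hypothetical.

```python
from collections import deque


def unreachable_fraction(block_counts, edge_counts, entries):
    """Fraction of total block execution count not reachable from hot entries."""
    # Keep only edges that were actually observed in the profile.
    succs = {}
    for (src, dst), count in edge_counts.items():
        if count > 0:
            succs.setdefault(src, []).append(dst)
    # Breadth-first search from every entry block with a positive count.
    seen = {b for b in entries if block_counts.get(b, 0) > 0}
    queue = deque(seen)
    while queue:
        block = queue.popleft()
        for nxt in succs.get(block, []):
            if nxt not in seen and block_counts.get(nxt, 0) > 0:
                seen.add(nxt)
                queue.append(nxt)
    total = sum(block_counts.values())
    missed = sum(c for b, c in block_counts.items() if b not in seen)
    return missed / total if total else 0.0


blocks = {"entry": 100, "a": 90, "b": 10, "orphan": 50}
edges = {("entry", "a"): 90, ("entry", "b"): 10, ("a", "orphan"): 0}
print(unreachable_fraction(blocks, edges, ["entry"]))  # 0.2
```

The block `orphan` has a positive count but no recorded edge leading to it, so its 50 executions out of 250 total are counted as unreachable.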
2024-08-07 | Revert "[BOLT] Move ADRRelaxationPass (#101371)" (#102333) | Vladislav Khmelevsky | 1 | -2/+2
This reverts commit 750b12f06badc4cdf767139c70090db62358bb44. The pass should run after splitting phase, but before nop removal
2024-08-07 | [BOLT] Move ADRRelaxationPass (#101371) | Vladislav Khmelevsky | 1 | -2/+2
For non-simple functions, nop instructions must be present to transform ADR into an ADRP+ADD sequence, so run this pass before the remove-nops pass.
2024-07-19 | [BOLT] Skip instruction shortening (#93032) | Daniel Hill | 1 | -1/+6
Add the ability to disable the instruction shortening pass through --shorten-instructions=false
2024-05-23 | [BOLT] Set InitialDynoStats after EstimateEdgeCounts (#93218) | Amir Ayupov | 1 | -6/+4
InitialDynoStats used to be assigned inside `runAllPasses`, but the assignment executed before any of the passes. As we've moved `EstimateEdgeCounts` into a pass out of ProfileReader, it needs to execute before initial dyno stats are set. Thus move `InitialDynoStats` into BinaryContext and assignment into `DynoStatsSetPass`.
2024-05-22 | [BOLT][NFC] Make estimateEdgeCounts a BinaryFunctionPass (#93074) | Amir Ayupov | 1 | -0/+9
2024-04-25 | [BOLT] Print program stats in perf2bolt/aggregate-only mode (#89763) | Amir Ayupov | 1 | -1/+1
2024-04-11 | [BOLT][NFC] Make RepRet X86-specific (#88286) | Nathan Sidwell | 1 | -2/+3
Bolt's RepRet pass is x86-specific, no need to add it for non-x86 targets.
2024-03-22 | [BOLT] Enable --keep-nops option for Linux kernel by default (#86349) | Maksim Panchenko | 1 | -1/+1
Preserve nop instructions in the Linux kernel since they could be used for runtime patching.
2024-02-12 | [BOLT][NFC] Log through JournalingStreams (#81524) | Amir Ayupov | 1 | -3/+4
Make core BOLT functionality more friendly to being used as a library instead of in our standalone driver llvm-bolt. To accomplish this, we augment BinaryContext with journaling streams that are to be used by most BOLT code whenever something needs to be logged to the screen. Users of the library can decide if logs should be printed to a file, no file or to the screen, as before. To illustrate this, this patch adds a new option `--log-file` that allows the user to redirect BOLT logging to a file on disk or completely hide it by using `--log-file=/dev/null`. Future BOLT code should now use `BinaryContext::outs()` for printing important messages instead of `llvm::outs()`. A new test log.test enforces this by verifying that no strings are printed to the screen once the `--log-file` option is used. In previous patches we also added a new BOLTError class to report common and fatal errors, so code shouldn't call exit(1) now. To easily handle problems as before (by quitting with exit(1)), callers can now use `BinaryContext::logBOLTErrorsAndQuitOnFatal(Error)` whenever code needs to deal with BOLT errors. To test this, we have fatal.s that checks we are correctly quitting and printing a fatal error to the screen. Because this is a significant change by itself, not all code has been ported yet. Code from Profiler libs (DataAggregator and friends) still prints errors directly to the screen. Co-authored-by: Rafael Auler <rafaelauler@fb.com> Test Plan: NFC
2024-02-12 | [BOLT][NFC] Propagate BOLTErrors from Core, RewriteInstance, and passes (2/2) (#81523) | Amir Ayupov | 1 | -8/+15
As part of the effort to refactor old error handling code that would directly call exit(1), in this patch continue the migration on libCore, libRewrite and libPasses to use the new BOLTError class whenever a failure occurs. Test Plan: NFC Co-authored-by: Rafael Auler <rafaelauler@fb.com>
2024-02-12 | [BOLT][NFC] Return Error from BinaryFunctionPass::runOnFunctions (#81521) | Amir Ayupov | 1 | -2/+2
As part of the effort to refactor old error handling code that would directly call exit(1), in this patch we change the interface to `BinaryFunctionPass` to return an Error on `runOnFunctions()`. This gives passes the ability to report a serious problem to the caller (RewriteInstance class), so the caller may decide how to best handle the exceptional situation. Co-authored-by: Rafael Auler <rafaelauler@fb.com> Test Plan: NFC
2023-11-29 | [BOLT] Add structure of CDSplit to SplitFunctions (#73430) | ShatianWang | 1 | -0/+7
This commit establishes the general structure of the CDSplit strategy in SplitFunctions without incorporating the exact splitting logic. With -split-functions -split-strategy=cdsplit, the SplitFunctions pass will run twice: the first time is before function reordering and functions are hot-cold split; the second time is after function reordering and functions are hot-warm-cold split based on the fixed function ordering. Currently, all functions are hot-warm split after the entry block in the second splitting pass. Subsequent commits will introduce the precise splitting logic. NFC.
2023-11-14 | [BOLT] Refactor --keep-nops option. NFC. (#72228) | Maksim Panchenko | 1 | -1/+7
Run RemoveNops pass only if --keep-nops is set to false (default).
2023-09-30 | [BOLT][NFC] Hide pass print options (#67718) | Vladislav Khmelevsky | 1 | -6/+6
Most of the print options are already hidden; hide them all.
2023-06-16 | [BOLT] Add minimal RISC-V 64-bit support | Job Noorman | 1 | -0/+11
Just enough features are implemented to process a simple "hello world" executable and produce something that still runs (including libc calls). This was mainly a matter of implementing support for various relocations. Currently, the following are handled:
- R_RISCV_JAL
- R_RISCV_CALL
- R_RISCV_CALL_PLT
- R_RISCV_BRANCH
- R_RISCV_RVC_BRANCH
- R_RISCV_RVC_JUMP
- R_RISCV_GOT_HI20
- R_RISCV_PCREL_HI20
- R_RISCV_PCREL_LO12_I
- R_RISCV_RELAX
- R_RISCV_NONE

Executables linked with linker relaxation will probably fail to be processed. BOLT relocates .text to a high address while leaving .plt at its original (low) address. This causes PC-relative PLT calls that were relaxed to a JAL to not fit their offset in an I-immediate anymore. This is something that will be addressed in a later patch.

Changes to the BOLT core are relatively minor. Two things were tricky to implement and needed slightly larger changes. I'll explain those below.

The R_RISCV_CALL(_PLT) relocation is put on the first instruction of an AUIPC/JALR pair; the second does not get any relocation (unlike other PCREL pairs). This causes issues with the combination of the way BOLT processes binaries and the way the RISC-V MC-layer handles relocations:
- BOLT reassembles instructions one by one and since the JALR doesn't have a relocation, it simply gets copied without modification;
- Even though the MC-layer handles R_RISCV_CALL properly (adjusts both the AUIPC and the JALR), it assumes the immediates of both instructions are 0 (to be able to or-in a new value). This will most likely not be the case for the JALR that got copied over.

To handle this difficulty without resorting to RISC-V-specific hacks in the BOLT core, a new binary pass was added that searches for AUIPC/JALR pairs and zeroes out the immediate of the JALR.

A second difficulty was supporting ABS symbols. As far as I can tell, ABS symbols were not handled at all, causing __global_pointer$ to break. RewriteInstance::analyzeRelocation was updated to handle these generically.

Tests are provided for all supported relocations. Note that in order to test the correct handling of PLT entries, an ELF file produced by GCC had to be used. While I tried to strip the YAML representation, it's still quite large. Any suggestions on how to improve this would be appreciated.

Reviewed By: rafauler
Differential Revision: https://reviews.llvm.org/D145687
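To make "zeroing out the immediate of the JALR" concrete, the sketch below clears the I-type immediate on a raw 32-bit encoding. The field layout (opcode `0x67`, funct3 `0`, immediate in bits 31:20) comes from the RISC-V base ISA; working on raw encodings is an assumption for illustration, since the BOLT pass operates on its own instruction representation, and the helper names are hypothetical.

```python
JALR_OPCODE = 0x67  # RISC-V base ISA: JALR opcode, with funct3 == 0


def is_jalr(insn):
    """Check the opcode (bits 6:0) and funct3 (bits 14:12) of a 32-bit word."""
    return (insn & 0x7F) == JALR_OPCODE and ((insn >> 12) & 0x7) == 0


def zero_jalr_imm(insn):
    """Clear the 12-bit I-type immediate held in bits 31:20."""
    if not is_jalr(insn):
        raise ValueError("not a JALR instruction")
    return insn & 0x000FFFFF


# jalr ra, 16(ra) becomes jalr ra, 0(ra)
print(hex(zero_jalr_imm(0x010080E7)))  # 0x80e7
```

With the immediate cleared, a later step that ORs a new offset into the instruction starts from a clean field, which is the property the commit message says the MC-layer relies on.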
2023-05-16 | [BOLT] Fix state of MCSymbols in lowering pass | Rafael Auler | 1 | -0/+4
We have mostly harmless data races when running BinaryContext::calculateEmittedSize() in parallel, while performing split function pass. However, it is possible to end up in a state where some MCSymbols are still registered and our clean up failed. This happens rarely but it does happen, and when it happens, it is a difficult to diagnose heisenbug. To avoid this, add a new clean pass to perform a last check on MCSymbols, before they undergo our final emission pass, to verify that they are in a sane state. If we fail to do this, we might resolve some symbols to zero and crash the output binary. Reviewed By: #bolt, Amir Differential Revision: https://reviews.llvm.org/D137984
2022-12-23 | [BOLT][AArch64] Handle adrp+ld64 linker relaxations | Vladislav Khmelevsky | 1 | -1/+10
The linker might relax adrp + ldr GOT address loading to adrp + add for local non-preemptible symbols (e.g. hidden/protected symbols in an executable). Since the linker usually doesn't update relocations properly after relaxation, we have to handle such cases ourselves. To do that, while reading relocations we change the LD64 relocation to ADD if an instruction mismatch is found, and introduce FixRelaxationPass, which searches for ADRP+ADD pairs and, after performing some checks, replaces the ADRP target symbol with the already-fixed ADD's one. Vladislav Khmelevsky, Advanced Software Technology Lab, Huawei Differential Revision: https://reviews.llvm.org/D138097
2022-12-20 | [BOLT][NFC] Remove unused PrintInstructions argument | Maksim Panchenko | 1 | -1/+1
PrintInstructions was unused in BinaryFunction::print() and dump(). Reviewed By: Amir Differential Revision: https://reviews.llvm.org/D140440
2022-11-04 | adds huge pages support of PIE/no-PIE binaries | Alexey Moksyakov | 1 | -0/+3
This patch adds huge pages support (-hugify) for PIE/no-PIE binaries. It also restores support for kernels < 5.10, where the dynamic loader has a problem with the alignment of page addresses. Differential Revision: https://reviews.llvm.org/D129107
2022-10-12 | [BOLT] Add pass to fix ambiguous memory references | Rafael Auler | 1 | -3/+6
This adds a round of checks to memory references, looking for incorrect references to jump table objects. Fix them by replacing the jump table reference with another object reference + offset. This solves bugs related to regular data references in code accidentally being bound to a jump table, and this reference being updated to a new (incorrect) location because we moved this jump table. Fixes #55004 Reviewed By: #bolt, maksfb Differential Revision: https://reviews.llvm.org/D134098
2022-07-13 | [BOLT][AArch64] Handle gold linker veneers | Vladislav Khmelevsky | 1 | -4/+4
The gold linker veneers are written between functions without symbols, so we have to handle them specially in BOLT. Vladislav Khmelevsky, Advanced Software Technology Lab, Huawei Differential Revision: https://reviews.llvm.org/D129260
2022-06-30 | [BOLT] Fix getDynoStats to handle BCs with no functions | Amir Ayupov | 1 | -2/+3
Address fuzzer crash Reviewed By: yota9 Differential Revision: https://reviews.llvm.org/D120696
2022-06-28 | Revert "[BOLT][AArch64] Handle gold linker veneers" | Rafael Auler | 1 | -4/+4
This reverts commit 425dda76e9fac93117289fd68a2abdfb1e4a0ba5. This commit is currently causing BOLT to crash in one of our binaries and needs a bit more checking to make sure it is safe to land.
2022-06-28 | [BOLT][AArch64] Handle gold linker veneers | Vladislav Khmelevsky | 1 | -4/+4
The gold linker veneers are written between functions without symbols, so we have to handle them specially in BOLT. Vladislav Khmelevsky, Advanced Software Technology Lab, Huawei Differential Revision: https://reviews.llvm.org/D128082
2022-06-05 | [bolt] Remove unneeded cl::ZeroOrMore for cl::opt options | Fangrui Song | 1 | -77/+72
2022-06-04 | Remove unneeded cl::ZeroOrMore for cl::opt options | Fangrui Song | 1 | -42/+37
Similar to 557efc9a8b68628c2c944678c6471dac30ed9e8e. This commit handles options where cl::ZeroOrMore is more than one line below cl::opt.
2022-06-03 | [BOLT] Cache-Aware Tail Duplication | spupyrev | 1 | -7/+1
A new "cache-aware" strategy for tail duplication. Differential Revision: https://reviews.llvm.org/D123050
2022-03-08 | [BOLT] CMOVConversion pass | Amir Ayupov | 1 | -0/+9
Convert simple hammocks into cmov based on misprediction rate.

Test Plan:
- Assembly test: `cmov-conversion.s`
- Testing on a binary:
  # Bootstrap clang with `-x86-cmov-converter-force-all` and `-Wl,--emit-relocs` (Release build)
  # Collect perf.data:
  - `clang++ <opts> bolt/lib/Core/BinaryFunction.cpp -E > bf.cpp`
  - `perf record -e cycles:u -j any,u -- clang-15 bf.cpp -O2 -std=c++14 -c -o bf.o`
  # Optimize clang-15 with and w/o -cmov-conversion:
  - `llvm-bolt clang-15 -p perf.data -o clang-15.bolt`
  - `llvm-bolt clang-15 -p perf.data -cmov-conversion -o clang-15.bolt.cmovconv`
  # Run perf experiment:
  - test: `clang-15.bolt.cmovconv`,
  - control: `clang-15.bolt`,
  - workload (clang options): `bf.cpp -O2 -std=c++14 -c -o bf.o`

Results:
```
task-clock [delta: -360.21 ± 356.75, delta(%): -1.7760 ± 1.7589, p-value: 0.047951, balance: -6]
instructions [delta: 44061118 ± 13246382, delta(%): 0.0690 ± 0.0207, p-value: 0.000001, balance: 50]
icache-misses [delta: -5534468 ± 2779620, delta(%): -0.4331 ± 0.2175, p-value: 0.028014, balance: -28]
branch-misses [delta: -1624270 ± 1113244, delta(%): -0.3456 ± 0.2368, p-value: 0.030300, balance: -22]
```

Reviewed By: rafauler
Differential Revision: https://reviews.llvm.org/D120177
2022-02-04 | [BOLT][NFC] Fix compiler warnings | Amir Ayupov | 1 | -1/+1
Summary:
- variable 'TotalSize' set but not used
- variable 'TotalCallsTopN' set but not used
- use of bitwise '|' with boolean operands

Reviewed By: maksfb
FBD33911129
2022-01-07 | [BOLT][NFC] Refactor command line options in BinaryPassManager | Maksim Panchenko | 1 | -159/+91
Summary: Reformat code and put options in lexicographical order. Compared to clang-format output, manual formatting looks cleaner to me. (cherry picked from FBD33481692)
2021-12-23 | [BOLTRewrite][NFC] Fix braces usages | Maksim Panchenko | 1 | -7/+4
Summary: Refactor bolt/*/Rewrite to follow the braces rule for if/else/loop from LLVM Coding Standards. (cherry picked from FBD33305364)
2021-12-21 | [BOLT][NFC] Fix file-description comments | Maksim Panchenko | 1 | -3/+1
Summary: Fix comments at the start of source files. (cherry picked from FBD33274597)
2021-12-18 | [BOLT] Fix profile and tests for nop-removal pass | Maksim Panchenko | 1 | -1/+1
Summary: Since nops are now removed in a separate pass, the profile is consumed on a CFG with nops. If previously a profile was generated without nops, the offsets in the profile could be different if branches included nops either as a source or a destination. This diff adjusts offsets to make the profile reading backwards compatible. (cherry picked from FBD33231254)
2021-12-18 | [BOLT] Move disassemble optimizations to optimization passes | Vladislav Khmelevsky | 1 | -0/+4
Summary: The patch moves shortenInstructions and nop removal to separate binary passes. As a result, when the llvm-bolt optimization stage begins, the instructions of the binary functions are exactly the same as they were in the binary. This is needed for golang support in llvm-bolt. Some of the tests must be changed, since bb alignment nops might create unreachable BBs in original functions. Vladislav Khmelevsky, Advanced Software Technology Lab, Huawei (cherry picked from FBD32896517)
2021-12-14 | [BOLT][NFC] Reformat with clang-format | Maksim Panchenko | 1 | -26/+17
Summary: Selectively apply clang-format to BOLT code base. (cherry picked from FBD33119052)
2021-12-01 | [BOLT] Add pass to normalize CFG | Maksim Panchenko | 1 | -0/+9
Summary: Some optimizations may remove all instructions in a basic block. The pass will clean up the CFG afterwards by removing empty basic blocks and merging duplicate CFG edges. The normalized CFG is printed under the '-print-normalized' option. (cherry picked from FBD32774360)
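The two normalization steps, removing empty blocks and merging duplicate edges, can be sketched on a toy CFG. The data layout and the name `normalize_cfg` are hypothetical, and the sketch assumes every empty block has exactly one successor and that empty blocks do not form a cycle; how the real pass treats other cases is not described here.

```python
def normalize_cfg(blocks, edges):
    """
    blocks: {name: [instructions]}; edges: list of (src, dst, count).
    Assumes each empty block has exactly one successor (a fall-through)
    and that empty blocks never form a cycle.
    """
    # Map every empty block to the block it falls through to.
    forward = {}
    for src, dst, _ in edges:
        if not blocks[src]:
            forward[src] = dst

    def resolve(block):
        # Follow chains of empty blocks to the first block that is kept.
        while block in forward:
            block = forward[block]
        return block

    merged = {}
    for src, dst, count in edges:
        if src in forward:
            continue  # edges leaving a removed block disappear with it
        key = (src, resolve(dst))
        # Duplicate edges between the same pair are merged by summing counts.
        merged[key] = merged.get(key, 0) + count
    kept = {name: insns for name, insns in blocks.items() if name not in forward}
    return kept, merged


blocks = {"A": ["cmp", "jcc"], "E": [], "B": ["ret"]}
edges = [("A", "E", 30), ("A", "B", 70), ("E", "B", 30)]
kept, merged = normalize_cfg(blocks, edges)
print(sorted(kept), merged)  # ['A', 'B'] {('A', 'B'): 100}
```

After removing the empty block `E`, both successors of `A` point at `B`, so the two edges collapse into one carrying the combined count.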
2021-09-27 | Rebase: [BOLT] AsmDump: dump function assembly and profile info | Rafael Auler | 1 | -0/+6
Summary: Added new functionality of dumping simple functions into assembly. This includes:
- function control flow (basic blocks, instructions),
- profile information as `FDATA` directives, to be consumed by link_fdata,
- data labels,
- CFI directives,
- symbols for callee functions,
- jump table symbols.

Envisioned usage:
1. Find a function that triggers BOLT crash (e.g. with `bughunter.sh`).
2. Generate reproducer asm source for that function (using `-funcs`).
3. Attach it to an issue.
4. Reduce and include as a test case.

Current limitations:
1. Emitted assembly won't match input file relocations.
2. No DWARF support.
3. Data is not emitted.

(cherry picked from FBD32746857)
2021-10-08 | Rebase: [NFC] Refactor sources to be buildable in shared mode | Rafael Auler | 1 | -0/+549
Summary: Moves source files into separate components and makes the dependencies between components explicit, so the LLVM build system knows how to build BOLT with BUILD_SHARED_LIBS=ON. Please use the -c merge.renamelimit=230 git option when rebasing your work on top of this change. To achieve this, we create a new library to hold core IR files (most classes beginning with Binary in their names), a new library to hold Utils, some command line options shared across both RewriteInstance and core IR files, a new library called Rewrite to hold most classes concerned with running top-level functions coordinating the binary rewriting process, and a new library called Profile to hold classes dealing with profile reading and writing. To remove the dependency from BinaryContext into X86-specific classes, we do some refactoring on the BinaryContext constructor to receive a reference to the specific backend directly from RewriteInstance. Then, the dependency on X86 or AArch64-specific classes is transferred to the Rewrite library. We can't have the Core library depend on targets because targets depend on Core (which would create a cycle). Files implementing the entry point of a tool are transferred to the tools/ folder. All header files are transferred to the include/ folder. The src/ folder was renamed to lib/. (cherry picked from FBD32746834)