aboutsummaryrefslogtreecommitdiff
path: root/llvm/lib/CodeGen/MachineBlockPlacement.cpp
AgeCommit message (Collapse)AuthorFilesLines
2025-06-23[CodeGen][CodeLayout] Fix segfault on access to deleted block in MBP. (#142357)Afanasyev Ivan1-10/+5
Problem 1: There is a typo which reassigns `BlockWorkList` to `EHPadWorkList` on attempt to remove `RemBB` from work lists. Problem 2: `Chain->UnscheduledPredecessors == 0` is an incorrect way to check whether `RemBB` is enqueued or not. The root cause is a postponed deletion of `WorkList` from already scheduled blocks in `selectBestCandidateBlock`. Bug happens in the following scenario: * `FunctionChain` is being processed with non-zero `UnscheduledPredecessors` * Block `B'` is added to the `BlockWorkList` * Block `B'` is chosen as the best successor (`selectBestSuccessor`) for some another block and added into `Chain` * Block `B'` is removed by tail duplicator. `RemovalCallback` erroneously won't erase `B'` from `BlockWorkList`, because `UnscheduledPredecessors` value of `FunctionChain` is not zero (and it is allowed to be non-zero). Proposed solution is to always cleanup worklists on block deletion by tail duplicator.
2025-06-20[CodeGen] Limit number of analyzed predecessorsAlexis Engelke1-0/+16
MachineBlockPlacement has quadratic runtime in the number of predecessors: in some situation, for an edge, all predecessors of the successor are considered. Limit the number of considered predecessors to bound compile time for large functions. Pull Request: https://github.com/llvm/llvm-project/pull/142584
2025-04-27[llvm] Use range constructors of *Set (NFC) (#137552)Kazu Hirata1-6/+6
2025-04-17[CodeGen][NPM] Port MachineBlockPlacementStats to NPM (#129853)Akshat Oke1-12/+33
2025-04-14[CodeGen] Avoid repeated hash lookups (NFC) (#135584)Kazu Hirata1-4/+6
2025-03-14[CodeGen][NPM] Port MachineBlockPlacement to NPM (#129828)Akshat Oke1-33/+94
2025-03-04[CodeGen] Avoid repeated hash lookups (NFC) (#129652)Kazu Hirata1-2/+2
2025-01-22[CodeGen] Avoid repeated hash lookups (NFC) (#123894)Kazu Hirata1-3/+3
2025-01-08[LLVM] Fix various cl::desc typos and whitespace issues (NFC) (#121955)Ryan Mansfield1-1/+1
2024-11-18[CodeLayout] Do not rebuild chains with -apply-ext-tsp-for-size (#115934)Ellis Hoag1-5/+7
https://github.com/llvm/llvm-project/pull/109711 disables `buildCFGChains()` when `-apply-ext-tsp-for-size` is used to improve codesize. Tail merging can change the layout and normally requires `buildCFGChains()` to be called again, but we want to prevent this when optimizing for codesize. We saw slight size improvement on large binaries with this change. If `-apply-ext-tsp-for-size` is not used, this should be a NFC.
2024-11-12[CodeLayout] Do not flip branch condition when using optsize (#114607)Ellis Hoag1-19/+25
* Do not use profile data when flipping a branch condition when optimizing for size. This should improving outlining and ICF due to more uniform instruction sequences. * Refactor `optimizeBranches()` to use early `continue`s * Use the correct debug location for `insertBranch()`
2024-10-28Check hasOptSize() in shouldOptimizeForSize() (#112626)Ellis Hoag1-4/+1
2024-10-10[CodeLayout] Do not verify after assigning blocks (#111754)Ellis Hoag1-6/+1
Rather than invariantly running `F->verify()` when asserts are enabled, run machine IR verification in LIT tests only. Swap `CHECK-PERF` and `CHECK-SIZE` in `code_placement_ext_tsp_large.ll`. Remove `={0,1,true,false}` from flags in tests.
2024-10-02[CodeLayout] Size-aware machine block placement (#109711)spupyrev1-33/+84
This is an implementation of a new "size-aware" machine block placement. The idea is to reorder blocks so that the number of fall-through jumps is maximized. Observe that profile data is ignored for the optimization, and it is applied only for instances with hasOptSize()=true. This strategy has two benefits: (i) it eliminates jump instructions, which results in smaller text size; (ii) we avoid using profile data while reordering blocks, which yields more "uniform" functions, thus helping ICF and machine outliner/merger. For large (mobile) apps, the size benefits of (i) and (ii) are roughly the same, combined providing up to 0.5% uncompressed and up to 1% compressed savings size on top of the current solution. The optimization is turned off by default.
2024-09-26[NFC][CodeLayout] Remove unused parameter (#110145)Ellis Hoag1-6/+5
The `NodeCounts` parameter of `calcExtTspScore()` is unused, so remove it. Use `SmallVector` since arrays are expected to be small since they represent MBBs.
2024-09-24llvm-reduce: Don't print verifier failed machine functions (#109673)Matt Arsenault1-1/+1
This produces far too much terminal output, particularly for the instruction reduction. Since it doesn't consider the liveness of of the instructions it's deleting, it produces quite a lot of verifier errors.
2024-09-24[CodeLayout][NFC] Format and minor refactoring of MBP (#109729)spupyrev1-281/+274
This PR has two (NFC) commits: - clang-format MBP - move a part of tail duplication and block aligning into helper functions for better readability.
2024-08-08[CodeGen] Don't renumber invalid domtree (#102427)Alexis Engelke1-1/+5
Machine block placement might remove nodes from the function but does not update the dominator tree accordingly. Instead of renumbering (which might crash due to accessing removed blocks), set the domtree to null to make clear that it is invalid at this point. Fixup of #102107.
2024-08-06[CodeGen] Use optimized domtree for MachineFunction (#102107)Alexis Engelke1-0/+1
The dominator tree gained an optimization to use block numbers instead of a DenseMap to store blocks. Given that machine basic blocks already have numbers, expose these via appropriate GraphTraits. For debugging, block number epochs are added to MachineFunction -- this greatly helps in finding uses of block numbers after RenumberBlocks(). In a few cases where dominator trees are preserved across renumberings, the dominator tree is updated to use the new numbers.
2024-07-24[CodeGen] Add an option to skip extTSP BB placement for huge functions. (#99310)Krzysztof Pszeniczny1-1/+8
The extTSP-based basic block layout algorithm improves the performance of the generated code, but unfortunately it has a super-linear time complexity. This leads to extremely long compilation times for certain relatively rare kinds of autogenerated code. This patch adds an `-mllvm` flag to optionally restrict extTSP only to functions smaller than a specified threshold. While commit bcdc0477319a26fd8dcdde5ace3bdd6743599f44 added a knob to to limit the maximum chain size, it's still possible that for certain huge functions the number of chains is very large, leading to a quadratic behaviour in ExtTSPImpl::mergeChainPairs.
2024-07-24[CodeGen] Restore MachineBlockPlacement block ordering (#99351)John Brawn1-2/+14
PR #91843 changed the algorithm used to find the next unplaced block so that it iterates through the blocks in BlockFilter instead of iterating through the blocks in the function and checking if they are in the block filter. Unfortunately this sometimes results in a different block ordering being chosen, as the order of blocks in BlockFilter comes from the order in MachineLoopInfo, and in some cases this differs from the order they are in the function. This can also give an end result that has worse performance. Fix this by making collectLoopBlockSet place blocks in its output in the order that they are in the function.
2024-07-12[CodeGen][NewPM] Port `machine-block-freq` to new pass manager (#98317)paperchalice1-6/+6
- Add `MachineBlockFrequencyAnalysis`. - Add `MachineBlockFrequencyPrinterPass`. - Use `MachineBlockFrequencyInfoWrapperPass` in legacy pass manager. - `LazyMachineBlockFrequencyInfo::print` is empty, drop it due to new pass manager migration.
2024-07-09[CodeGen][NewPM] Port `machine-loops` to new pass manager (#97793)paperchalice1-3/+3
- Add `MachineLoopAnalysis`. - Add `MachineLoopPrinterPass`. - Convert to `MachineLoopInfoWrapperPass` in legacy pass manager.
2024-06-28Reapply "[CodeGen][NewPM] Port machine-branch-prob to new pass manager" ↵paperchalice1-6/+6
(#96858) (#96869) This reverts commit ab58b6d58edf6a7c8881044fc716ca435d7a0156. In `CodeGen/Generic/MachineBranchProb.ll`, `llc` crashed with dumped MIR when targeting PowerPC. Move test to `llc/new-pm`, which is X86 specific.
2024-06-27Revert "[CodeGen][NewPM] Port machine-branch-prob to new pass manager" (#96858)paperchalice1-6/+6
Reverts llvm/llvm-project#96389 Some ppc bots failed.
2024-06-27[CodeGen][NewPM] Port machine-branch-prob to new pass manager (#96389)paperchalice1-6/+6
Like IR version `print<branch-prob>`, there is also a `print<machine-branch-prob>`.
2024-06-13[Codegen] (NFC) Faster algorithm for MachineBlockPlacement (#91843)William Junda Huang1-31/+85
In MachineBlockPlacement, the function getFirstUnplacedBlock is inefficient because in most cases (for usual loop CFG), this function fails to find a candidate, and its complexity becomes O(#(loops in function) * #(blocks in function)). This makes the compilation of very long functions slow. This update reduces it to O(k * #(blocks in function)) where k is the maximum loop nesting depth, by iterating through the BlockFilter instead.
2024-06-12[CodeGen][NewPM] Split `MachinePostDominators` into a concrete analysis ↵paperchalice1-4/+4
result (#95113) `MachinePostDominators` version of #94571.
2024-05-07[Analysis, CodeGen, DebugInfo] Use StringRef::operator== instead of ↵Kazu Hirata1-1/+1
StringRef::equals (NFC) (#91304) I'm planning to remove StringRef::equals in favor of StringRef::operator==. - StringRef::operator==/!= outnumber StringRef::equals by a factor of 53 under llvm/ in terms of their usage. - The elimination of StringRef::equals brings StringRef closer to std::string_view, which has operator== but not equals. - S == "foo" is more readable than S.equals("foo"), especially for !Long.Expression.equals("str") vs Long.Expression != "str".
2024-02-03[CodeGen] Use range-based for loops (NFC)Kazu Hirata1-2/+2
2023-11-21[MachineBlockPlacement][X86] Use max of MDAlign and TLIAlign to align Loops. ↵Freddy Ye1-4/+26
(#71026) This patch added backend consumption on a new loop metadata: !1 = !{!"llvm.loop.align", i32 64} which is generated from clang's new loop attribute: [[clang::code_align()]] clang patch: #70762
2023-10-24[ADT] Rename llvm::erase_value to llvm::erase (NFC) (#70156)Kazu Hirata1-1/+1
C++20 comes with std::erase to erase a value from std::vector. This patch renames llvm::erase_value to llvm::erase for consistency with C++20. We could make llvm::erase more similar to std::erase by having it return the number of elements removed, but I'm not doing that for now because nobody seems to care about that in our code base. Since there are only 50 occurrences of erase_value in our code base, this patch replaces all of them with llvm::erase and deprecates llvm::erase_value.
2023-10-05BlockFrequencyInfo: Add PrintBlockFreq helper (#67512)Matthias Braun1-10/+11
- Refactor the (Machine)BlockFrequencyInfo::printBlockFreq functions into a `PrintBlockFreq()` function returning a `Printable` object. This simplifies usage as it can be directly piped to a `raw_ostream` like `dbgs() << PrintBlockFreq(MBFI, Freq) << '\n';`. - Previously there was an interesting behavior where `BlockFrequencyInfoImpl` stores frequencies both as a `Scaled64` number and as an `uint64_t`. Most algorithms use the `BlockFrequency` abstraction with the integers, the print function for basic blocks printed the `Scaled64` number potentially showing higher accuracy than was used by the algorithm. This changes things to only print `BlockFrequency` values. - Replace some instances of `dbgs() << Freq.getFrequency()` with the new function.
2023-10-05Use BlockFrequency type in more places (NFC) (#68266)Matthias Braun1-72/+73
The `BlockFrequency` class abstracts `uint64_t` frequency values. Use it more consistently in various APIs and disable implicit conversion to make usage more consistent and explicit. - Use `BlockFrequency Freq` parameter for `setBlockFreq`, `getProfileCountFromFreq` and `setBlockFreqAndScale` functions. - Return `BlockFrequency` in `getEntryFreq()` functions. - While on it change some `const BlockFrequency& Freq` parameters to plain `BlockFreqency Freq`. - Mark `BlockFrequency(uint64_t)` constructor as explicit. - Add missing `BlockFrequency::operator!=`. - Remove `uint64_t BlockFreqency::getMaxFrequency()`. - Add `BlockFrequency BlockFrequency::max()` function.
2023-09-21[CodeLayout] Refactor std::vector uses, namespace, and EdgeCountT. NFCFangrui Song1-4/+4
* Place types and functions in the llvm::codelayout namespace * Change EdgeCountT from pair<pair<uint64_t, uint64_t>, uint64_t> to a struct and utilize structured bindings. It is not conventional to use the "T" suffix for structure types. * Remove a redundant copy in ChainT::merge. * Change {ExtTSPImpl,CDSortImpl}::run to use return value instead of an output parameter * Rename applyCDSLayout to computeCacheDirectedLayout: (a) avoid rare abbreviation "CDS" (cache-directed sort) (b) "compute" is more conventional for the specific use case * Change the parameter types from std::vector to ArrayRef so that SmallVector arguments can be used. * Similarly, rename applyExtTspLayout to computeExtTspLayout. Reviewed By: Amir Differential Revision: https://reviews.llvm.org/D159526
2023-09-14[NFC][CodeGen] Change CodeGenOpt::Level/CodeGenFileType into enum classes ↵Arthur Eubanks1-2/+2
(#66295) This will make it easy for callers to see issues with and fix up calls to createTargetMachine after a future change to the params of TargetMachine. This matches other nearby enums. For downstream users, this should be a fairly straightforward replacement, e.g. s/CodeGenOpt::Aggressive/CodeGenOptLevel::Aggressive or s/CGFT_/CodeGenFileType::
2023-06-21[MBP] Enable duplicating return block to remove jump to returnGuozhi Wei1-1/+1
Sometimes LLVM generates branch to return instruction, like PR63227. It is because in function MachineBlockPlacement::canTailDuplicateUnplacedPreds we avoid duplicating a BB into another already placed BB to prevent destroying computed layout. But if the successor BB is a return block, duplicating it will only reduce taken branches without hurt to any other branches. Differential Revision: https://reviews.llvm.org/D153093
2023-04-20Fix uninitialized class membersAkshay Khadse1-1/+1
Reviewed By: LuoYuanke Differential Revision: https://reviews.llvm.org/D148692
2023-04-17Fix uninitialized pointer members in CodeGenAkshay Khadse1-11/+11
This change initializes the members TSI, LI, DT, PSI, and ORE pointer feilds of the SelectOptimize class to nullptr. Reviewed By: LuoYuanke Differential Revision: https://reviews.llvm.org/D148303
2023-03-14[CodeGen] Use *{Set,Map}::contains (NFC)Kazu Hirata1-1/+1
2023-02-14Move global namespace cl::opt inside llvm::Fangrui Song1-2/+1
2022-11-07[NFC][BlockPlacement]Add an option to renumber blocks based on function ↵Mingming Liu1-0/+14
layout order. Use case: - When block layout is visualized after MBP pass, the basic blocks are labeled in layout order; meanwhile blocks could be numbered in a different order. - As a result, it's hard to map between the graph and pass output. With this option on, the basic blocks are renumbered in function layout order. This option is only useful when a function is to be visualized (i.e., when view options are on) to make it debugging only. Use https://godbolt.org/z/5WTW36bMr as an example: - As MBP pass output (shown in godbolt output window), `func2` is in a basic block numbered `2` (`bb.2`), and `func1` is in a basic block numbered `3` (`bb.3`); `bb.3` is a block with higher block frequency than `bb.2`, and `bb.3` is placed before `bb.2` in the functin layout. - Use [1] to get the dot graph (graph uploaded in [2]), the blocks are re-numbered. - `func1` is in 'if.end' block, and labeled `1` in visualized dot; `func2` is in 'if.then' blocks, and labeled `3` --> the labeled number and bb number won't map. - [[ https://github.com/llvm/llvm-project/blob/b5626ae9751f0d82aa04791a21689b289721738e/llvm/lib/CodeGen/MachineBlockFrequencyInfo.cpp#L127 | DOTGraphTraits<MachineBlockFrequencyInfo *>::getNodeLabel ]] is where labeled numbers are based on function layout number, and [[ https://github.com/llvm/llvm-project/blob/a8d93783f37c042ace67069ae4ca6f8fd849c2d0/llvm/include/llvm/Support/GraphWriter.h#L209 | called by graph writer ]]. So call 'MachineFunction::RenumberBlocks' would make labeled number (in dot graph) and block number (in pass output) consistent with each other. [1] `./bin/clang++ -O3 -S -mllvm -view-block-layout-with-bfi=count -mllvm -view-bfi-func-name=_Z9func_loopv -mllvm -print-after=block-placement -mllvm -filter-print-funcs=_Z9func_loopv test.c` [2] {F25201785} Reviewed By: davidxl Differential Revision: https://reviews.llvm.org/D137467
2022-08-24extending code layout algspupyrev1-4/+4
The diff modifies ext-tsp code layout algorithm in the following ways: (i) fixes merging of cold block chains (this is a port of D129397); (ii) adjusts the cost model utilized for optimization; (iii) adjusts some APIs so that the implementation can be used in BOLT; this is a prerequisite for D129895. The only non-trivial change is (ii). Here we introduce different weights for conditional and unconditional branches in the cost model. Based on the new model it is slightly more important to increase the number of "fall-through unconditional" jumps, which makes sense, as placing two blocks with an unconditional jump next to each other reduces the number of jump instructions in the generated code. Experimentally, this makes a mild impact on the performance; I've seen up to 0.2%-0.3% perf win on some benchmarks. Reviewed By: hoy Differential Revision: https://reviews.llvm.org/D129893
2022-07-17[CodeGen] Qualify auto variables in for loops (NFC)Kazu Hirata1-4/+4
2022-06-16[MachineBlockPlacementStats] Added check for "-filter-print-funcs"Mingming Liu1-0/+4
option to the machine-block-placement-stats. Differential Revision: https://reviews.llvm.org/D128019
2022-06-16Revert "[MachineBlockPlacementStats] Add check for `-filter-print-funcs` ↵Mingming Liu1-4/+0
option to machine-block-placement stats." This reverts commit 46d45df4516e9a5bc43460429cd02cd04a85db1a. Going to add differential revision link to commit message and re-commit.
2022-06-16[MachineBlockPlacementStats] Add check for `-filter-print-funcs` option to ↵Mingming Liu1-0/+4
machine-block-placement stats.
2022-03-16Cleanup codegen includesserge-sans-paille1-1/+1
This is a (fixed) recommit of https://reviews.llvm.org/D121169 after: 1061034926 before: 1063332844 Discourse thread: https://discourse.llvm.org/t/include-what-you-use-include-cleanup Differential Revision: https://reviews.llvm.org/D121681
2022-03-10Revert "Cleanup codegen includes"Nico Weber1-1/+1
This reverts commit 7f230feeeac8a67b335f52bd2e900a05c6098f20. Breaks CodeGenCUDA/link-device-bitcode.cu in check-clang, and many LLVM tests, see comments on https://reviews.llvm.org/D121169
2022-03-10Cleanup codegen includesserge-sans-paille1-1/+1
after: 1061034926 before: 1063332844 Differential Revision: https://reviews.llvm.org/D121169