path: root/llvm/lib/CodeGen/MachineOutliner.cpp
Age | Commit message | Author | Files | Lines
2025-09-09 | [MachineOutliner] Add profile guided outlining (#154437) | Ellis Hoag | 1 | -10/+86
2025-08-27 | [DebugInfo] Drop extra DIBuilder::finalizeSubprogram() calls (NFC) (#155618) | Vladislav Dzhidzhoev | 1 | -3/+0
After #139914, `DIBuilder::finalize()` finalizes both declaration and definition DISubprograms. Therefore, there is no need to call `DIBuilder::finalizeSubprogram()` right before `DIBuilder::finalize()`.
2025-06-30 | [MachineOutliner] Remove LOHs from outlined candidates (#143617) | Ellis Hoag | 1 | -0/+17
Remove Linker Optimization Hints (LOHs) from outlining candidates instead of simply preventing outlining if LOH labels are found in the candidate. This will improve the effectiveness of the machine outliner when LOHs are enabled (which is the default). In https://discourse.llvm.org/t/loh-conflicting-with-machineoutliner/83279/1 it was observed that the machine outliner is much more effective when LOHs are disabled. Rather than completely disabling LOHs, this PR aims to keep them in most places and remove them from outlined functions, where they could be illegal. Note that we are conservatively removing all LOHs from outlined functions for simplicity, but I believe we could retain LOHs that are in the intersection of all candidates. It should be ok to remove these LOHs since these blocks are being outlined anyway, which will harm performance much more than the gain from keeping the LOHs.
2025-05-22 | [LLVM][CodeGen] Add convenience accessors for MachineFunctionProperties (#140002) | Rahul Joshi | 1 | -6/+5
Add per-property has<Prop>/set<Prop>/reset<Prop> functions to MachineFunctionProperties.
2025-02-26 | [MachineOutliner] Add skipModule call for opt-bisect-limit. (#128836) | Craig Topper | 1 | -0/+3
2025-01-13 | [aarch64][win] Update Called Globals info when updating Call Site info (#122762) | Daniel Paoliello | 1 | -2/+2
Fixes the "use after poison" issue introduced by #121516 (see <https://github.com/llvm/llvm-project/pull/121516#issuecomment-2585912395>). The root cause of this issue is that #121516 introduced "Called Global" information for call instructions modeling how "Call Site" info is stored in the machine function, HOWEVER it didn't copy the copy/move/erase operations for call site information. The fix is to rename and update the existing copy/move/erase functions so they also take care of Called Global info.
2024-12-16 | [NFC] Remove some unnecessary semicolons | David Green | 1 | -1/+1
All inside LLVM_DEBUG, some of which have been cleaned up by adding block scopes to allow them to format more nicely.
2024-11-12 | [CodeGen] Remove unused includes (NFC) (#115996) | Kazu Hirata | 1 | -1/+0
Identified with misc-include-cleaner.
2024-10-23 | [llvm] Remove redundant calls to std::unique_ptr<T>::get (NFC) (#113415) | Kazu Hirata | 1 | -1/+1
2024-09-19 | [LLVM] Use {} instead of std::nullopt to initialize empty ArrayRef (#109133) | Jay Foad | 1 | -2/+1
It is almost always simpler to use {} instead of std::nullopt to initialize an empty ArrayRef. This patch changes all occurrences I could find in LLVM itself. In future the ArrayRef(std::nullopt_t) constructor could be deprecated or removed.
2024-09-10 | Attempt to fix [CGData][MachineOutliner] Global Outlining (#90074) (#108037) | Kyungwoo Lee | 1 | -1/+1
2024-09-10 | [CGData][MachineOutliner] Global Outlining (#90074) | Kyungwoo Lee | 1 | -1/+253
This commit introduces support for outlining functions across modules using codegen data generated from previous codegen. The codegen data currently manages the outlined hash tree, which records outlining instances that occurred locally in the past. The machine outliner now operates in one of three modes:
1. CGDataMode::None: This is the default outliner mode that uses the suffix tree to identify (local) outlining candidates within a module. This mode is also used by (full)LTO to maintain optimal behavior with the combined module.
2. CGDataMode::Write (`-codegen-data-generate`): This mode is identical to the default mode, but it also publishes the stable hash sequences of instructions in the outlined functions into a local outlined hash tree. It then encodes this into the `__llvm_outline` section, which will be dead-stripped at link time.
3. CGDataMode::Read (`-codegen-data-use-path={.cgdata}`): This mode reads a codegen data file (.cgdata) and initializes a global outlined hash tree. This tree is used to generate global outlining candidates. Note that the codegen data file has been post-processed with the raw `__llvm_outline` sections from all native objects using the `llvm-cgdata` tool (or a linker, `LLD`, or a new ThinLTO pipeline later).
This depends on https://github.com/llvm/llvm-project/pull/105398. After this PR, LLD (https://github.com/llvm/llvm-project/pull/90166) and Clang (https://github.com/llvm/llvm-project/pull/90304) will follow for each client side support. This is a patch for https://discourse.llvm.org/t/rfc-enhanced-machine-outliner-part-2-thinlto-nolto/78753.
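The three modes described in this message can be pictured with a minimal, self-contained sketch. Everything here is a stand-in written for illustration: the enum only mirrors the names in the message, `selectCGDataMode` and `describe` are hypothetical helpers and not LLVM APIs, and the rule that a non-empty use path takes priority over the generate flag is an assumption.

```cpp
#include <cassert>
#include <string>

// Stand-in for the three modes named in the commit message.
enum class CGDataMode { None, Write, Read };

// Hypothetical helper (not an LLVM API): derive the mode from the two
// options the message mentions. Assumption: a non-empty use path selects
// Read even if generation was also requested.
inline CGDataMode selectCGDataMode(bool Generate, const std::string &UsePath) {
  if (!UsePath.empty())
    return CGDataMode::Read;
  return Generate ? CGDataMode::Write : CGDataMode::None;
}

// One-line summary of each mode, paraphrasing the commit message.
inline const char *describe(CGDataMode Mode) {
  switch (Mode) {
  case CGDataMode::None:
    return "local outlining via the suffix tree";
  case CGDataMode::Write:
    return "local outlining, plus publishing stable hashes";
  case CGDataMode::Read:
    return "global candidates from a .cgdata hash tree";
  }
  return "unknown";
}
```

With neither option set the sketch yields `CGDataMode::None`, which matches the message's statement that None is the default.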
2024-09-04 | [MachineOutliner] Preserve instruction bundles (#106402) | Simon Tatham | 1 | -4/+3
When the machine outliner copies instructions from a source function into an outlined function, it was doing it using `CloneMachineInstr`, which is documented as not preserving the interior of any instruction bundle. So outlining code that includes an instruction bundle would fail, because in the outlined version, the bundle would be empty, so instructions would go missing in the move. This occurs when any bundled instruction appears in the outlined code, so there was no need to construct an unusual test case: I've just copied a function from the existing `stp-opt-with-renaming.mir`, which happens to contain an SVE instruction bundle. Including two identical copies of that function makes the outliner merge them, and then we check that it didn't destroy the interior of the bundle in the process.
2024-08-27 | [MachineOutliner][NFC] Refactor (#105398) | Kyungwoo Lee | 1 | -32/+38
This patch prepares the NFC groundwork for global outlining using CGData, which will follow https://github.com/llvm/llvm-project/pull/90074. - The `MinRepeats` parameter is now explicitly passed to the `getOutliningCandidateInfo` function, rather than relying on a default value of 2. For local outlining, the minimum number of repetitions is typically 2, but for the global outlining (mentioned above), we will optimistically create a single `Candidate` for each `OutlinedFunction` if stable hashes match a specific code sequence. This parameter is adjusted accordingly in global outlining scenarios. - I have also implemented `unique_ptr` for `OutlinedFunction` to ensure safe and efficient memory management within `FunctionList`, avoiding unnecessary implicit copies. This depends on https://github.com/llvm/llvm-project/pull/101461. This is a patch for https://discourse.llvm.org/t/rfc-enhanced-machine-outliner-part-2-thinlto-nolto/78753.
2024-07-24 | MachineOutliner: Use PM to query MachineModuleInfo (#99688) | Matt Arsenault | 1 | -22/+21
Avoid getting this from the MachineFunction
2024-07-01 | [llvm][CodeGen] Avoid 'raw_string_ostream::str' (NFC) (#97318) | Youngsuk Kim | 1 | -2/+1
Since `raw_string_ostream` doesn't own the string buffer, it is desirable (in terms of memory safety) for users to directly reference the string buffer rather than use `raw_string_ostream::str()`. Work towards TODO comment to remove `raw_string_ostream::str()`.
2024-06-29 | [IRBuilder] Don't include Module.h (NFC) (#97159) | Nikita Popov | 1 | -0/+1
This used to be necessary to fetch the DataLayout, but isn't anymore.
2024-06-18 | [MachineOutliner] Leaf Descendants (#90275) | Xuan Zhang | 1 | -1/+7
This PR depends on https://github.com/llvm/llvm-project/pull/90264

In the current implementation, only leaf children of each internal node in the suffix tree are included as candidates for outlining. But all leaf descendants are outlining candidates, which we include in the new implementation. This is enabled on a flag `outliner-leaf-descendants`, which defaults to true.

The reason for _enabling this on a flag_ is that the machine outliner is not the only pass that uses the suffix tree. The reason for _having this default to true_ is that including all leaf descendants shows a consistent size win.

* For Clang/LLD, it shows around 3% reduction in text segment size when compared to the baseline `-Oz` linker binary.
* For selected benchmark tests in LLVM test suite

| run (CTMark/)    | only leaf children | all leaf descendants | reduction % |
|------------------|--------------------|----------------------|-------------|
| lencod           | 349624             | 348564               | -0.2004%    |
| SPASS            | 219672             | 218440               | -0.4738%    |
| kc               | 271956             | 250068               | -0.4506%    |
| sqlite3          | 223920             | 222484               | -0.5471%    |
| 7zip-benchmark   | 405364             | 401244               | -0.3428%    |
| bullet           | 139820             | 138340               | -0.8315%    |
| consumer-typeset | 295684             | 286628               | -1.2295%    |
| pairlocalalign   | 72236              | 71936                | -0.2164%    |
| tramp3d-v4       | 189572             | 183676               | -2.9668%    |

This is part of an enhanced version of machine outliner -- see [RFC](https://discourse.llvm.org/t/rfc-enhanced-machine-outliner-part-1-fulllto-part-2-thinlto-nolto-to-come/78732).
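The difference between "leaf children" and "leaf descendants" can be shown on a toy tree. This is only a sketch: `Node`, `makeLeaf`, `leafChildren`, `leafDescendants`, and `buildExample` are hypothetical names invented here, and the structure is a plain tree rather than LLVM's actual suffix tree.

```cpp
#include <cassert>
#include <memory>
#include <utility>
#include <vector>

// Toy tree node: a leaf carries the start index of one suffix.
struct Node {
  int SuffixIdx = -1; // meaningful only for leaves
  std::vector<std::unique_ptr<Node>> Children;
  bool isLeaf() const { return Children.empty(); }
};

inline std::unique_ptr<Node> makeLeaf(int SuffixIdx) {
  auto N = std::make_unique<Node>();
  N->SuffixIdx = SuffixIdx;
  return N;
}

// Old behaviour as described: only leaves directly below the node.
inline std::vector<int> leafChildren(const Node &N) {
  std::vector<int> Out;
  for (const auto &C : N.Children)
    if (C->isLeaf())
      Out.push_back(C->SuffixIdx);
  return Out;
}

// New behaviour as described: every leaf anywhere below the node.
inline void collectLeafDescendants(const Node &N, std::vector<int> &Out) {
  for (const auto &C : N.Children) {
    if (C->isLeaf())
      Out.push_back(C->SuffixIdx);
    else
      collectLeafDescendants(*C, Out);
  }
}

inline std::vector<int> leafDescendants(const Node &N) {
  std::vector<int> Out;
  collectLeafDescendants(N, Out);
  return Out;
}

// Root has one leaf child (0) and one internal child with leaves 3 and 5.
inline Node buildExample() {
  Node Root;
  Root.Children.push_back(makeLeaf(0));
  auto Inner = std::make_unique<Node>();
  Inner->Children.push_back(makeLeaf(3));
  Inner->Children.push_back(makeLeaf(5));
  Root.Children.push_back(std::move(Inner));
  return Root;
}
```

On the example tree, the root has a single leaf child but three leaf descendants, so the second traversal surfaces two extra occurrences of the repeated substring that the root represents.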
2024-06-07 | [MachineOutliner] Sort by Benefit to Cost Ratio (#90264) | Xuan Zhang | 1 | -2/+4
This PR depends on https://github.com/llvm/llvm-project/pull/90260

We changed the order in which functions are outlined in Machine Outliner. The formula for priority is found via a black-box Bayesian optimization toolbox. Using this formula for sorting consistently reduces the uncompressed size of large real-world mobile apps. We also ran a few benchmarks using LLVM test suites, and showed that sorting by priority consistently reduces the text segment size.

| run (CTMark/)    | baseline (1) | priority (2) | diff (1 -> 2) |
|------------------|--------------|--------------|---------------|
| lencod           | 349624       | 349264       | -0.1030%      |
| SPASS            | 219672       | 219480       | -0.0874%      |
| kc               | 271956       | 251200       | -7.6321%      |
| sqlite3          | 223920       | 223708       | -0.0947%      |
| 7zip-benchmark   | 405364       | 402624       | -0.6759%      |
| bullet           | 139820       | 139500       | -0.2289%      |
| consumer-typeset | 295684       | 290196       | -1.8560%      |
| pairlocalalign   | 72236        | 72092        | -0.1993%      |
| tramp3d-v4       | 189572       | 189292       | -0.1477%      |

This is part of an enhanced version of machine outliner -- see [RFC](https://discourse.llvm.org/t/rfc-enhanced-machine-outliner-part-1-fulllto-part-2-thinlto-nolto-to-come/78732).
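As a rough illustration of ordering by a benefit-to-cost ratio, the sketch below sorts hypothetical outlining options so that the highest ratio comes first. The message does not spell out the tuned priority formula, so the plain ratio used here is an assumption, as are the `OutlineOption` struct and the `sortByBenefitToCost` helper.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <string>
#include <vector>

// Hypothetical record for one potential outlined function.
struct OutlineOption {
  std::string Name;
  unsigned Benefit; // bytes saved if outlined
  unsigned Cost;    // bytes the outlined version costs; assumed > 0
};

// Sort so that the highest Benefit/Cost ratio comes first. The comparison
// cross-multiplies (A.Benefit * B.Cost > B.Benefit * A.Cost) to avoid
// floating point; this is equivalent to comparing ratios when costs are
// positive. stable_sort keeps the original order for equal ratios.
inline void sortByBenefitToCost(std::vector<OutlineOption> &Options) {
  std::stable_sort(Options.begin(), Options.end(),
                   [](const OutlineOption &A, const OutlineOption &B) {
                     return static_cast<std::uint64_t>(A.Benefit) * B.Cost >
                            static_cast<std::uint64_t>(B.Benefit) * A.Cost;
                   });
}
```

For example, options with ratios 12/4, 8/2, and 10/10 would be visited in the order 8/2, 12/4, 10/10.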
2024-06-03 | [MachineOutliner] Efficient Implementation of MachineOutliner::findCandidates() (#90260) | Xuan Zhang | 1 | -10/+11
This reduces the time complexity of the main loop of the `findCandidates()` method from $O(n^2)$ to $O(n \log n)$. For small $n$, the modification does not regress the build time, but it helps significantly when $n$ is large. For one application, this reduces the runtime of the main loop from 120 seconds to 28 seconds. This is the first commit for an enhanced version of machine outliner -- see [RFC](https://discourse.llvm.org/t/rfc-enhanced-machine-outliner-part-1-fulllto-part-2-thinlto-nolto-to-come/78732).
2024-06-02 | [IR] Do not set `none` for function uwtable (#93387) | Joshua Cao | 1 | -2/+1
This avoids the pitfall where we set the uwtable to none: ``` func.setUWTableKind(llvm::UWTableKind::None) ``` `Attribute::getAsString()` would see an unknown attribute and fail an assertion. In this patch, we assert that we do not see a None uwtable kind. This also skips the check of `UWTableKind::Async`. It is dominated by the check of `UWTableKind::Default`, which has the same enum value (nfc).
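The point about `Async` and `Default` sharing a value can be seen in a small stand-alone sketch. The enum below is a local stand-in whose numeric values are an assumption, chosen only so that `Default` aliases `Async` as the message describes; `uwtableAsString` is a hypothetical helper, not the real `Attribute::getAsString()`.

```cpp
#include <cassert>
#include <string>

// Local stand-in; values assumed so that Default aliases Async.
enum class UWTableKind { None = 0, Sync = 1, Async = 2, Default = 2 };

// Hypothetical printer: asserts that None never reaches it, then needs only
// one comparison because a check against Default also covers Async.
inline std::string uwtableAsString(UWTableKind Kind) {
  assert(Kind != UWTableKind::None && "uwtable attribute should not be none");
  if (Kind == UWTableKind::Default) // also true for Async: same enum value
    return "uwtable";
  return "uwtable(sync)";
}
```

Because the two enumerators compare equal, a separate `Async` branch after the `Default` check would be unreachable, which is why dropping it changes no behaviour.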
2024-03-11 | [CodeGen] Do not pass MF into MachineRegisterInfo methods. NFC. (#84770) | Jay Foad | 1 | -1/+1
MachineRegisterInfo already knows the MF so there is no need to pass it in as an argument.
2024-01-23 | [MachineOutliner] Refactor iterating over Candidate's instructions (#78972) | Anatoly Trosinenko | 1 | -14/+13
Make Candidate's front() and back() functions return references to MachineInstr and introduce begin() and end() returning iterators, the same way it is usually done in other container-like classes. This makes it possible to iterate over the instructions contained in a Candidate the same way one can iterate over a MachineBasicBlock (note that begin() and end() return bundled iterators, just like MachineBasicBlock does, but no instr_begin() and instr_end() are defined yet).
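The container-like interface described above can be sketched without any LLVM types. Here `Block` (a `std::list<std::string>`) stands in for a basic block, and `CandidateSketch` and `countInstrs` are hypothetical names; the real `Candidate` class differs in its members and iterator types.

```cpp
#include <cassert>
#include <iterator>
#include <list>
#include <string>

// Stand-in for a basic block: a linked list of "instructions".
using Block = std::list<std::string>;

// Hypothetical candidate: an inclusive [First, Last] range inside a block.
class CandidateSketch {
  Block::iterator First, Last;

public:
  CandidateSketch(Block::iterator F, Block::iterator L) : First(F), Last(L) {}

  // References rather than iterators, as the message describes.
  std::string &front() { return *First; }
  std::string &back() { return *Last; }

  // Half-open range so range-based for loops work.
  Block::iterator begin() { return First; }
  Block::iterator end() { return std::next(Last); }
};

// Counts the instructions of a candidate with a range-based for loop.
inline unsigned countInstrs(CandidateSketch &C) {
  unsigned N = 0;
  for (std::string &I : C) {
    (void)I;
    ++N;
  }
  return N;
}
```

Exposing `begin()` and `end()` is what lets callers write a range-based `for` directly over the candidate instead of stepping an iterator from `front()` to `back()` by hand.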
2023-05-15 | [MachineOutliner] NFC: Add debug output to MachineOutliner::outline | Jessica Paquette | 1 | -2/+36
Add some debug output to `outline` to assist in debugging + understanding the code. This will say
- How many things we found worth turning into outlined functions
- Whether or not candidates were pruned via the outlining algorithm
- The function created (if it was created)
- Where the calls were inserted
- What instruction was used to create the call

Sample output below:
```
NUMBER OF POTENTIAL FUNCTIONS: 5
WALKING FUNCTION LIST
PRUNED: 0/2 candidates
OUTLINE: Expected benefit (12 B) > threshold (1 B)
NEW FUNCTION: OUTLINED_FUNCTION_0
CREATE OUTLINED CALLS
CALL: OUTLINED_FUNCTION_0 in bar:<unknown>
.. BL @OUTLINED_FUNCTION_0, implicit-def $lr, implicit $sp
CALL: OUTLINED_FUNCTION_0 in bar:<unknown>
.. BL @OUTLINED_FUNCTION_0, implicit-def $lr, implicit $sp
PRUNED: 2/2 candidates
SKIP: Expected benefit (0 B) < threshold (1 B)
PRUNED: 0/2 candidates
OUTLINE: Expected benefit (8 B) > threshold (1 B)
NEW FUNCTION: OUTLINED_FUNCTION_1
CREATE OUTLINED CALLS
CALL: OUTLINED_FUNCTION_1 in bar:<unknown>
.. BL @OUTLINED_FUNCTION_1, implicit-def $lr, implicit $sp
CALL: OUTLINED_FUNCTION_1 in bar:<unknown>
.. BL @OUTLINED_FUNCTION_1, implicit-def $lr, implicit $sp
PRUNED: 2/2 candidates
SKIP: Expected benefit (0 B) < threshold (1 B)
PRUNED: 2/2 candidates
SKIP: Expected benefit (0 B) < threshold (1 B)
```
2023-04-10 | [MachineOutliner] Add IsOutlined to MachineFunction | wangpc | 1 | -0/+1
We add a field `IsOutlined` to indicate whether a MachineFunction is outlined and set it true for outlined functions in MachineOutliner. Reviewed By: paquette Differential Revision: https://reviews.llvm.org/D146191
2023-04-08 | [Outliner] Add an option to only enable outlining of patterns above a certain threshold | Nathan Lanza | 1 | -2/+7
Outlining isn't always a win when the saved instruction count is >= 1. The overhead of representing a new function in the binary depends on exception metadata and alignment. So parameterize this for local tuning. Reviewed By: paquette Differential Revision: https://reviews.llvm.org/D136774
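A threshold check of this kind can be sketched with a simple size model. The cost formulas below (each call site pays a call overhead, the outlined body is emitted once plus a frame overhead) are an illustrative assumption rather than the exact accounting in the pass, and `OutlineCosts`, `benefit`, and `shouldOutline` are hypothetical names.

```cpp
#include <cassert>

// Hypothetical size model for one repeated sequence, all sizes in bytes.
struct OutlineCosts {
  unsigned Occurrences;   // how many times the sequence appears
  unsigned SequenceSize;  // size of one copy of the sequence
  unsigned CallOverhead;  // size added at each call site
  unsigned FrameOverhead; // size added once in the outlined function
};

// Size if nothing is outlined: every occurrence keeps its own copy.
inline unsigned notOutlinedCost(const OutlineCosts &C) {
  return C.Occurrences * C.SequenceSize;
}

// Size if outlined: one call per occurrence plus a single shared body.
inline unsigned outlinedCost(const OutlineCosts &C) {
  return C.Occurrences * C.CallOverhead + C.SequenceSize + C.FrameOverhead;
}

// Bytes saved, clamped at zero when outlining would not help.
inline unsigned benefit(const OutlineCosts &C) {
  unsigned Before = notOutlinedCost(C), After = outlinedCost(C);
  return Before > After ? Before - After : 0;
}

// Assumption: outline only when the benefit reaches the tunable threshold.
inline bool shouldOutline(const OutlineCosts &C, unsigned Threshold) {
  return benefit(C) >= Threshold;
}
```

Raising the threshold lets a build skip outlining whose nominal saving is small enough to be cancelled out by overheads the size model does not capture, such as the exception metadata and alignment mentioned above.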
2023-03-20 | [NFC][Outliner] Delete default ctors for Candidate & OutlinedFunction. | Amara Emerson | 1 | -5/+5
I think it's good practice to avoid having default ctors unless they're really valid/useful. For OutlinedFunction the default ctor was used to represent a bail-out value for getOutliningCandidateInfo(), so I changed the API to return an optional<OutlinedFunction> instead, which seems a tad cleaner. Differential Revision: https://reviews.llvm.org/D146375
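The pattern of returning an optional instead of a default-constructed "bail-out" object can be sketched as follows. `OutlinedFunctionSketch` and `getCandidateInfoSketch` are hypothetical stand-ins, and the bail-out condition (fewer candidates than a minimum repeat count) is an assumption made only to give the function something to reject.

```cpp
#include <cassert>
#include <optional>

// Stand-in type with no default constructor, so an "empty" instance
// cannot be used to signal failure.
struct OutlinedFunctionSketch {
  unsigned NumCandidates;
  unsigned SequenceSize;

  OutlinedFunctionSketch() = delete;
  OutlinedFunctionSketch(unsigned N, unsigned Size)
      : NumCandidates(N), SequenceSize(Size) {}
};

// Hypothetical query: returns std::nullopt to bail out instead of a
// default-constructed object.
inline std::optional<OutlinedFunctionSketch>
getCandidateInfoSketch(unsigned NumCandidates, unsigned SequenceSize,
                       unsigned MinRepeats = 2) {
  if (NumCandidates < MinRepeats)
    return std::nullopt;
  return OutlinedFunctionSketch(NumCandidates, SequenceSize);
}
```

Callers then test the result before using it, so the failure case is visible in the type rather than encoded in a sentinel object.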
2023-02-03 | [MachineOutliner] Improve mapper statistics | Jessica Paquette | 1 | -7/+10
Add a test for statistics as well. The mapper size stats were nested in a loop unnecessarily. Move them out. Give existing stats better names, and add one which also tracks the number of sentinels added.
2023-02-03 | [MachineOutliner] NFC: Add debug output to populateMapper | Jessica Paquette | 1 | -10/+24
Adding debug output to improve outliner debuggability + testability. Move `nooutline` attribute test into the new debug output test.
2023-02-03 | [MachineOutliner] NFC: Add debug output to overlap pruning code | Jessica Paquette | 1 | -16/+40
This had no debug output. Since it was committed as NFC, it had no testcase. The me of today was nerdsniped by the me of 6 years ago and decided that this ought to have a testcase and some debug output.
2023-02-03 | [MachineOutliner] NFC: Pull variable out from erase_if | Jessica Paquette | 1 | -8/+8
`Mapper.UnsignedVec.begin()` never changes throughout the call to `erase_if`, so no need to recalculate it. Also drop some redundant braces.
2023-02-03 | [NFC] Remove redundant check for MBB being empty in outliner | Jessica Paquette | 1 | -1/+1
If the size is < 2, then we just break anyway.
2023-02-03 | [NFC] Remove unnecessary `llvm::` in MachineOutliner/SuffixTree | Jessica Paquette | 1 | -11/+11
We have `using namespace llvm`, so we don't need to say `llvm::`.
2023-02-03 | [NFC] Use SmallVector/ArrayRef in MachineOutliner/SuffixTree for small types | Jessica Paquette | 1 | -8/+8
The MachineOutliner + SuffixTree both used `std::vector` everywhere because I didn't know any better at the time. At least for small types, such as `unsigned` and iterators, I can't see any particular reason to use std::vector over `SmallVector` here.
2023-02-03 | [MachineOutliner][AArch64] NFC: Split MBBs into "outlinable ranges" | Jessica Paquette | 1 | -25/+64
Recommit with bug fixes + added testcases to the outliner. Also adds some debug output. We found a case in the Swift benchmarks where the MachineOutliner introduces about a 20% compile time overhead in comparison to building without the MachineOutliner. The origin of this slowdown is that the benchmark has long blocks which incur lots of LRU checks for lots of candidates. Imagine a case like this: ``` bb: i1 i2 i3 ... i123456 ``` Now imagine that all of the outlining candidates appear early in the block, and that something like, say, NZCV is defined at the end of the block. The outliner has to check liveness for certain registers across all candidates, because outlining from areas where those registers are used is unsafe at call boundaries. This is fairly wasteful because in the previously-described case, the outlining candidates will never appear in an area where those registers are live. To avoid this, precalculate areas where we will consider outlining from. Anything outside of these areas is mapped to illegal and not included in the outlining search space. This allows us to reduce the size of the outliner's suffix tree as well, giving us a potential memory win. By precalculating areas, we can also optimize other checks too, like whether or not LR is live across an outlining candidate. Doing all of this is about a 16% compile time improvement on the case. This is likely useful for other targets (e.g. ARM + RISCV) as well, but for now, this only implements the AArch64 path. The original "is the MBB safe" method still works as before.
2022-12-22 | [IR/MachineOutliner] Add a "nooutline" function attr and respect it | Jessica Paquette | 1 | -3/+6
Add `nooutline` + update LangRef to say it exists. This makes it possible to say "don't outline from this function ever." We want to be able to toggle whether or not a function should be in the search set regardless of default behaviour. Add testcases for the IR Outliner + Machine Outliner. Also remove an unnecessary check for an empty function in the Machine Outliner. Differential Revision: https://reviews.llvm.org/D140438
2022-12-02 | [CodeGen] Use std::nullopt instead of None (NFC) | Kazu Hirata | 1 | -1/+2
This patch mechanically replaces None with std::nullopt where the compiler would warn if None were deprecated. The intent is to reduce the amount of manual work required in migrating from Optional to std::optional. This is part of an effort to migrate from llvm::Optional to std::optional: https://discourse.llvm.org/t/deprecating-llvm-optional-x-hasvalue-getvalue-getvalueor/63716
2022-06-10 | Fix interaction of CFI instructions with MachineOutliner. | Eli Friedman | 1 | -8/+11
1. When checking if a candidate contains a CFI instruction, actually iterate over all of the instructions, instead of stopping halfway through. 2. Make sure copied CFI directives refer to the correct instruction. Fixes https://github.com/llvm/llvm-project/issues/55842 Differential Revision: https://reviews.llvm.org/D126930
2022-06-05 | [llvm] Convert for_each to range-based for loops (NFC) | Kazu Hirata | 1 | -3/+4
2022-03-16 | Cleanup codegen includes | serge-sans-paille | 1 | -0/+2
This is a (fixed) recommit of https://reviews.llvm.org/D121169 after: 1061034926 before: 1063332844 Discourse thread: https://discourse.llvm.org/t/include-what-you-use-include-cleanup Differential Revision: https://reviews.llvm.org/D121681
2022-03-10 | Revert "Cleanup codegen includes" | Nico Weber | 1 | -2/+0
This reverts commit 7f230feeeac8a67b335f52bd2e900a05c6098f20. Breaks CodeGenCUDA/link-device-bitcode.cu in check-clang, and many LLVM tests, see comments on https://reviews.llvm.org/D121169
2022-03-10 | Cleanup codegen includes | serge-sans-paille | 1 | -0/+2
after: 1061034926 before: 1063332844 Differential Revision: https://reviews.llvm.org/D121169
2022-02-23 | Revert "[MachineOutliner][AArch64] NFC: Split MBBs into "outlinable ranges"" | Jessica Paquette | 1 | -42/+25
This reverts commit d97f997eb79d91b2872ac13619f49cb3a7120781. This commit was not NFC. (See: https://reviews.llvm.org/rGd97f997eb79d91b2872ac13619f49cb3a7120781)
2022-02-21 | [MachineOutliner][AArch64] NFC: Split MBBs into "outlinable ranges" | Jessica Paquette | 1 | -25/+42
We found a case in the Swift benchmarks where the MachineOutliner introduces about a 20% compile time overhead in comparison to building without the MachineOutliner. The origin of this slowdown is that the benchmark has long blocks which incur lots of LRU checks for lots of candidates. Imagine a case like this: ``` bb: i1 i2 i3 ... i123456 ``` Now imagine that all of the outlining candidates appear early in the block, and that something like, say, NZCV is defined at the end of the block. The outliner has to check liveness for certain registers across all candidates, because outlining from areas where those registers are used is unsafe at call boundaries. This is fairly wasteful because in the previously-described case, the outlining candidates will never appear in an area where those registers are live. To avoid this, precalculate areas where we will consider outlining from. Anything outside of these areas is mapped to illegal and not included in the outlining search space. This allows us to reduce the size of the outliner's suffix tree as well, giving us a potential memory win. By precalculating areas, we can also optimize other checks too, like whether or not LR is live across an outlining candidate. Doing all of this is about a 16% compile time improvement on the case. This is likely useful for other targets (e.g. ARM + RISCV) as well, but for now, this only implements the AArch64 path. The original "is the MBB safe" method still works as before.
2022-02-17 | [MachineOutliner] Add statistics for unsigned vector size | Jessica Paquette | 1 | -0/+16
Useful for debugging + evaluating improvements to the outliner. Stats are the number of illegal, legal, and invisible instructions in the unsigned vector, and its total length.
2022-02-14 | Extend the `uwtable` attribute with unwind table kind | Momchil Velikov | 1 | -0/+9
We have the `clang -cc1` command-line option `-funwind-tables=1|2` and the codegen option `VALUE_CODEGENOPT(UnwindTables, 2, 0) ///< Unwind tables (1) or asynchronous unwind tables (2)`. However, this is encoded in LLVM IR by the presence or the absence of the `uwtable` attribute, i.e. we lose the information whether to generate just some unwind tables or asynchronous unwind tables.

Asynchronous unwind tables take more space in the runtime image, I'd estimate something like 80-90% more, as the difference is adding roughly the same number of CFI directives as for prologues, only a bit simpler (e.g. `.cfi_offset reg, off` vs. `.cfi_restore reg`). Or even more, if you consider tail duplication of epilogue blocks. Asynchronous unwind tables could also restrict code generation to having only a finite number of frame pointer adjustments (an example of *not* having a finite number of `SP` adjustments is on AArch64 when untagging the stack (MTE): in some cases the compiler can modify `SP` in a loop). Having the CFI precise up to an instruction generally also means one cannot bundle together CFI instructions once the prologue is done; they need to be interspersed with ordinary instructions, which means extra `DW_CFA_advance_loc` commands, further increasing the unwind tables size.

That is to say, async unwind tables impose a non-negligible overhead, yet for the most common use cases (like C++ exceptions), they are not even needed.

This patch extends the `uwtable` attribute with an optional value:
- `uwtable` (default to `async`)
- `uwtable(sync)`, synchronous unwind tables
- `uwtable(async)`, asynchronous (instruction precise) unwind tables

Reviewed By: MaskRay Differential Revision: https://reviews.llvm.org/D114543
2021-12-01 | [ARM] Implement BTI placement pass for PACBTI-M | Ties Stuij | 1 | -14/+3
This patch implements a new MachineFunction pass in the ARM backend for placing BTI instructions. It is similar to the existing AArch64 aarch64-branch-targets pass.

BTI instructions are inserted into basic blocks that:
- Have their address taken
- Are the entry block of a function, if the function has external linkage or has its address taken
- Are mentioned in jump tables
- Are exception/cleanup landing pads

Each BTI instruction is placed at the beginning of a BB after the so-called meta instructions (e.g. exception handler labels).

Each outlining candidate and the outlined function need to be in agreement about whether BTI placement is enabled or not. If branch target enforcement is disabled for a function, the outliner should not covertly enable it by emitting a call to an outlined function, which begins with BTI. The cost model of the outliner is adjusted to account for the extra BTI instructions in the outlined function.

The ARM Constant Islands pass will maintain the count of the jump tables which reference a block. A `BTI` instruction is removed from a block only if the reference count reaches zero. PAC instructions in entry blocks are replaced with PACBTI instructions (tests for this case will be added in a later patch because the compiler currently does not generate PAC instructions). The ARM Constant Islands pass is adjusted to handle BTI instructions correctly.

Functions with static linkage that don't have their address taken can still be called indirectly by linker-generated veneers and thus their entry points need to be marked with BTI or PACBTI.

The changes are tested using "LLVM IR -> assembly" tests; jump tables also have a MIR test. Unfortunately it is not possible to add MIR tests for exception handling and computed gotos because of MIR parser limitations.
This patch is part of a series that adds support for the PACBTI-M extension of the Armv8.1-M architecture, as detailed here: https://community.arm.com/arm-community-blogs/b/architectures-and-processors-blog/posts/armv8-1-m-pointer-authentication-and-branch-target-identification-extension The PACBTI-M specification can be found in the Armv8-M Architecture Reference Manual: https://developer.arm.com/documentation/ddi0553/latest The following people contributed to this patch: - Mikhail Maltsev - Momchil Velikov - Ties Stuij Reviewed By: ostannard Differential Revision: https://reviews.llvm.org/D112426
2021-11-17 | Fix the side effect of outlined function when the register is implicit use and implicit-def in the same instruction. | DianQK | 1 | -1/+4
This is the diff associated with D95267, and we need to mark $x0 as live whether or not $x0 is dead. The compiler also needs to mark register $x0 as live in for the following case.
```
$x1 = ADDXri $sp, 16, 0
BL @spam, csr_darwin_aarch64_aapcs, implicit-def dead $lr, implicit $sp, implicit $x0, implicit killed $x1, implicit-def $sp, implicit-def $x0
```
This change fixes an issue where the wrong registers were used when -machine-outliner-reruns>0. As an example:
```c
typedef struct {
  double v1;
  double v2;
} D16;

typedef struct {
  D16 v1;
  D16 v2;
} D32;

typedef long long LL8;

typedef struct {
  long long v1;
  long long v2;
} LL16;

typedef struct {
  LL16 v1;
  LL16 v2;
} LL32;

typedef struct {
  LL32 v1;
  LL32 v2;
} LL64;

LL8 needx0(LL8 v0, LL8 v1);

void bar(LL64 v1, LL32 v2, LL16 v3, LL32 v4, LL8 v5, D16 v6, D16 v7, D16 v8);

LL8 foo(LL8 v0, LL64 v1, LL32 v2, LL16 v3, LL32 v4, LL8 v5, D16 v6, D16 v7, D16 v8) {
  LL8 result = needx0(v0, 0);
  bar(v1, v2, v3, v4, v5, v6, v7, v8);
  return result + 1;
}
```
As you can see from the `foo` function, we should not modify the value of `x0` until we call `needx0`. This code is compiled to give the following MIR code.
```
$sp = frame-setup SUBXri $sp, 256, 0
frame-setup STPDi killed $d13, killed $d12, $sp, 16
frame-setup STPDi killed $d11, killed $d10, $sp, 18
frame-setup STPDi killed $d9, killed $d8, $sp, 20
frame-setup STPXi killed $x26, killed $x25, $sp, 22
frame-setup STPXi killed $x24, killed $x23, $sp, 24
frame-setup STPXi killed $x22, killed $x21, $sp, 26
frame-setup STPXi killed $x20, killed $x19, $sp, 28
...
$x1 = MOVZXi 0, 0
BL @needx0, csr_darwin_aarch64_aapcs, implicit-def dead $lr, implicit $sp, implicit $x0, implicit $x1, implicit-def $sp, implicit-def $x0
...
```
Since there are some other instruction sequences that duplicate `foo`, after the first execution of Machine Outliner you will get:
```
$sp = frame-setup SUBXri $sp, 256, 0
frame-setup STPDi killed $d13, killed $d12, $sp, 16
frame-setup STPDi killed $d11, killed $d10, $sp, 18
frame-setup STPDi killed $d9, killed $d8, $sp, 20
$x7 = ORRXrs $xzr, $lr, 0
BL @OUTLINED_FUNCTION_0, implicit-def $lr, implicit $sp, implicit-def $lr, implicit $sp, implicit $xzr, implicit $x7, implicit $x19, implicit $x20, implicit $x21, implicit $x22, implicit $x23, implicit $x24, implicit $x25, implicit $x26
$lr = ORRXrs $xzr, $x7, 0
...
BL @OUTLINED_FUNCTION_1, implicit-def $lr, implicit $sp, implicit-def $lr, implicit-def $sp, implicit-def $x0, implicit-def $x1, implicit $sp
...
```
The first time, we outlined the following sequence:
```
frame-setup STPXi killed $x26, killed $x25, $sp, 22
frame-setup STPXi killed $x24, killed $x23, $sp, 24
frame-setup STPXi killed $x22, killed $x21, $sp, 26
frame-setup STPXi killed $x20, killed $x19, $sp, 28
```
and
```
$x1 = MOVZXi 0, 0
BL @needx0, csr_darwin_aarch64_aapcs, implicit-def dead $lr, implicit $sp, implicit $x0, implicit $x1, implicit-def $sp, implicit-def $x0
```
When we run the outliner again, we will get:
```
$x0 = ORRXrs $xzr, $lr, 0 <---- here
BL @OUTLINED_FUNCTION_2_0, implicit-def $lr, implicit $sp, implicit-def $sp, implicit-def $lr, implicit $sp, implicit $xzr, implicit $d8, implicit $d9, implicit $d10, implicit $d11, implicit $d12, implicit $d13, implicit $x0
$lr = ORRXrs $xzr, $x0, 0
$x7 = ORRXrs $xzr, $lr, 0
BL @OUTLINED_FUNCTION_0, implicit-def $lr, implicit $sp, implicit-def $lr, implicit $sp, implicit $xzr, implicit $x7, implicit $x19, implicit $x20, implicit $x21, implicit $x22, implicit $x23, implicit $x24, implicit $x25, implicit $x26
$lr = ORRXrs $xzr, $x7, 0
...
BL @OUTLINED_FUNCTION_1, implicit-def $lr, implicit $sp, implicit-def $lr, implicit-def $sp, implicit-def $x0, implicit-def $x1, implicit $sp
```
When calling `OUTLINED_FUNCTION_2_0`, we used `x0` to save the `lr` register. The reason for the above error appears to be that:
```
BL @OUTLINED_FUNCTION_1, implicit-def $lr, implicit $sp, implicit-def $lr, implicit-def $sp, implicit-def $x0, implicit-def $x1, implicit $sp
```
should be:
```
BL @OUTLINED_FUNCTION_1, implicit-def $lr, implicit $sp, implicit-def $lr, implicit-def $sp, implicit-def $x0, implicit-def $x1, implicit $sp, implicit $x0
```
When processing the same instruction with both `implicit-def $x0` and `implicit $x0` we should keep `implicit $x0`. A reproducible demo is available at: [https://github.com/DianQK/reproduce_outlined_function_use_live_x0](https://github.com/DianQK/reproduce_outlined_function_use_live_x0). Reviewed By: jinlin Differential Revision: https://reviews.llvm.org/D112911
2021-03-03 | Add the use of register r for outlined function when register r is live in and defined later. | Jin Lin | 1 | -1/+1
The compiler needs to mark register $x0 as live in for the following case.
$x1 = ADDXri $sp, 16, 0
BL @spam, csr_darwin_aarch64_aapcs, implicit-def dead $lr, implicit $sp, implicit $x0, implicit killed $x1, implicit-def $sp, implicit-def dead $x0
Reviewed By: paquette Differential Revision: https://reviews.llvm.org/D95267
2021-02-15 | [CodeGen] Use range-based for loops (NFC) | Kazu Hirata | 1 | -2/+1