path: root/llvm/lib/CodeGen/MachineSink.cpp
Age  Commit message  Author  Files  Lines
4 days[MachineSink] Remove subrange of live-ins from super register as well. (#159145)Pete Chou1-4/+2
Post-RA machine sinking could sink a copy of a sub-register into a successor. However, the sub-register might not be removed from the live-in bitmask of its super register in the successor, and a later pass, e.g., the if-converter, may then add an implicit use of the register from the live-ins, resulting in a use of an undefined register. This change makes sure the subrange of live-ins from the super register is removed as well.
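As a rough illustration of the live-in bookkeeping described above, the following toy model keeps a lane bitmask per physical register and clears only the lanes covered by a sunk sub-register. `ToyLiveIns`, `PhysReg`, `LaneMask` and the register numbers are hypothetical names for this sketch; they are not the types or the code used in MachineSink.cpp.

```cpp
#include <cassert>
#include <cstdint>
#include <map>

// Hypothetical stand-ins for a physical register id and a lane bitmask.
using PhysReg = unsigned;
using LaneMask = std::uint32_t;

// Toy model of a block's live-in list: register -> live lanes.
class ToyLiveIns {
public:
  // Mark the given lanes of Reg as live-in.
  void add(PhysReg Reg, LaneMask Lanes) { Live[Reg] |= Lanes; }

  // Clear the given lanes of Reg; drop the entry once no lane is left.
  void removeLanes(PhysReg Reg, LaneMask Lanes) {
    auto It = Live.find(Reg);
    if (It == Live.end())
      return;
    It->second &= ~Lanes;
    if (It->second == 0)
      Live.erase(It);
  }

  // Return the live lanes of Reg, or 0 if Reg is not live-in at all.
  LaneMask lanes(PhysReg Reg) const {
    auto It = Live.find(Reg);
    return It == Live.end() ? LaneMask{0} : It->second;
  }

private:
  std::map<PhysReg, LaneMask> Live;
};

// Usage sketch: a super register (id 7) with two lanes is live-in; a copy
// defining the low lane is sunk into the block, so that lane is cleared.
inline void toyLiveInsExample() {
  ToyLiveIns LiveIns;
  LiveIns.add(7, 0x3);
  LiveIns.removeLanes(7, 0x1);
  assert(LiveIns.lanes(7) == 0x2);
}
```

In this model, leaving the low lane set after the sink is the situation the commit message warns about: a later pass reading the live-in list would still believe that lane is defined on entry.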
2025-07-15[CodeGen] Use setNoVRegs. NFC. (#148831)Jay Foad1-2/+1
2025-07-10[CodeGen][NewPM] Port "PostRAMachineSink" pass to NPM (#129690)Vikram Hegde1-29/+53
2025-06-12[DLCov][NFC] Propagate annotated DebugLocs through transformations (#138047)Stephen Tozer1-2/+2
Part of the coverage-tracking feature, following #107279. In order for DebugLoc coverage testing to work, we firstly have to set annotations for intentionally-empty DebugLocs, and secondly we have to ensure that we do not drop these annotations as we propagate DebugLocs throughout compilation. As the annotations exist as part of the DebugLoc class, and not the underlying DILocation, they will not survive a DebugLoc->DILocation->DebugLoc roundtrip. Therefore this patch modifies a number of places in the compiler to propagate DebugLocs directly rather than via the underlying DILocation. This has no effect on the output of normal builds; it only ensures that during coverage builds, we do not incorrectly drop annotations and therefore create false positives. The bulk of these changes are in replacing DILocation::getMergedLocation(s) with a DebugLoc equivalent, and in changing the IRBuilder to store a DebugLoc directly rather than storing DILocations in its general Metadata array. We also use a new function, `DebugLoc::orElse`, which selects the "best" DebugLoc out of a pair (valid location > annotated > empty), preferring the current DebugLoc on a tie - this encapsulates the existing behaviour at a few sites where we _may_ assign a DebugLoc to an existing instruction, while extending the logic to handle annotation DebugLocs at the same time.
2025-05-22[LLVM][CodeGen] Add convenience accessors for MachineFunctionProperties (#140002)Rahul Joshi1-2/+1
Add per-property has<Prop>/set<Prop>/reset<Prop> functions to MachineFunctionProperties.
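The has<Prop>/set<Prop>/reset<Prop> pattern can be sketched on a small stand-in class. `ToyFunctionProperties` and its reduced property list below are assumptions made for illustration; this is not the LLVM class, only the shape of per-property accessors layered over generic ones.

```cpp
#include <bitset>
#include <cassert>
#include <cstddef>

// Toy stand-in for a property set; NOT the real MachineFunctionProperties.
class ToyFunctionProperties {
public:
  enum class Property : unsigned { IsSSA, NoPHIs, NoVRegs, LastProperty = NoVRegs };

  // Generic accessors keyed by the enum.
  bool hasProperty(Property P) const { return Bits.test(index(P)); }
  ToyFunctionProperties &set(Property P) {
    Bits.set(index(P));
    return *this;
  }
  ToyFunctionProperties &reset(Property P) {
    Bits.reset(index(P));
    return *this;
  }

  // Per-property convenience accessors that forward to the generic ones.
  bool hasIsSSA() const { return hasProperty(Property::IsSSA); }
  ToyFunctionProperties &setIsSSA() { return set(Property::IsSSA); }
  ToyFunctionProperties &resetIsSSA() { return reset(Property::IsSSA); }

  bool hasNoVRegs() const { return hasProperty(Property::NoVRegs); }
  ToyFunctionProperties &setNoVRegs() { return set(Property::NoVRegs); }
  ToyFunctionProperties &resetNoVRegs() { return reset(Property::NoVRegs); }

private:
  static std::size_t index(Property P) { return static_cast<std::size_t>(P); }
  std::bitset<static_cast<std::size_t>(Property::LastProperty) + 1> Bits;
};

// Usage sketch: the convenience form reads shorter than set(Property::NoVRegs).
inline void toyPropertiesExample() {
  ToyFunctionProperties Props;
  Props.setNoVRegs();
  assert(Props.hasNoVRegs() && !Props.hasIsSSA());
}
```

The convenience accessors add no state; they only shorten call sites, which is consistent with the one-line-per-call-site diffs seen in this log.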
2025-03-29[CodeGen] Use llvm::append_range (NFC) (#133603)Kazu Hirata1-2/+1
2025-03-03[MachineSink] Fix typo in loop sinking (#127133)Jeffrey Byrnes1-1/+1
Failure to sink a candidate should not block us from attempting to sink other candidates. There are mechanisms in place to handle the case where the instruction that failed to sink uses an instruction that does get sunk (we do not delete the original instruction corresponding to the sunk instruction if it still has uses).
2025-03-03[CodeGen][NewPM] Port MachineSink to NPM (#115434)Akshat Oke1-45/+129
Targets can set the EnableSinkAndFold option in CGPassBuilderOptions for the NPM pipeline in buildCodeGenPipeline(... &Opts, ...)
2025-03-01[MachineSink] Use Register and MCRegUnit. NFCCraig Topper1-17/+17
2025-01-23[CodeGen] Fix a warningKazu Hirata1-2/+1
This patch fixes: llvm/lib/CodeGen/MachineSink.cpp:1667:22: error: unused variable 'Preheader' [-Werror,-Wunused-variable]
2025-01-23[MachineSink] Extend loop sinking capability (#117247)Jeffrey Byrnes1-91/+176
The current MIR cycle sinking capabilities are rather limited. They only support sinking copies into a single successor block while obeying limits. This opt-in feature adds a more aggressive option that is not limited by the above concerns. The feature will try to "sink" by duplicating any top-level preheader instruction (that we are sure is safe to sink) into any user block, and then does some dead code cleanup. In particular, this is useful for high RP situations when loop bodies have control flow.
2025-01-18[CodeGen] Avoid repeated hash lookups (NFC) (#123447)Kazu Hirata1-2/+3
2024-12-18[MachineSink] Use `RegisterClassInfo::getRegPressureSetLimit` (#119830)Pengcheng Wang1-1/+1
`RegisterClassInfo::getRegPressureSetLimit` is a wrapper around `TargetRegisterInfo::getRegPressureSetLimit` with some logic to adjust the limit by removing reserved registers. It seems that we shouldn't use `TargetRegisterInfo::getRegPressureSetLimit` directly, just as the comment "This limit must be adjusted dynamically for reserved registers" says. Separate from https://github.com/llvm/llvm-project/pull/118787
2024-12-13Reapply "[DomTreeUpdater] Move critical edge splitting code to updater" (#119547)paperchalice1-1/+5
This relands commit #115111. Use the traditional way to update the post dominator tree, i.e. break critical edge splitting into an insert, insert, delete sequence. When splitting critical edges, the post dominator tree may change its root node, and `setNewRoot` only works in normal dominator tree... See https://github.com/llvm/llvm-project/blob/6c7e5827eda26990e872eb7c3f0d7866ee3c3171/llvm/include/llvm/Support/GenericDomTree.h#L684-L687
2024-12-11Revert "[DomTreeUpdater] Move critical edge splitting code to updater" (#119512)paperchalice1-5/+1
Reverts llvm/llvm-project#115111 Causes #119511
2024-12-11[DomTreeUpdater] Move critical edge splitting code to updater (#115111)paperchalice1-1/+5
Support critical edge splitting in dominator tree updater. Continue the work in #100856. Compile time check: https://llvm-compile-time-tracker.com/compare.php?from=87c35d782795b54911b3e3a91a5b738d4d870e55&to=42b3e5623a9ab4c3648564dc0926b36f3b438a3a&stat=instructions%3Au
2024-11-25[TTI][RISCV] Unconditionally break critical edges to sink ADDI (#108889)Philip Reames1-1/+3
This looks like a rather weird change, so let me explain why this isn't as unreasonable as it looks. Let's start with the problem it's solving.
```
define signext i32 @overlap_live_ranges(ptr %arg, i32 signext %arg1) {
bb:
  %i = icmp eq i32 %arg1, 1
  br i1 %i, label %bb2, label %bb5

bb2:                                              ; preds = %bb
  %i3 = getelementptr inbounds nuw i8, ptr %arg, i64 4
  %i4 = load i32, ptr %i3, align 4
  br label %bb5

bb5:                                              ; preds = %bb2, %bb
  %i6 = phi i32 [ %i4, %bb2 ], [ 13, %bb ]
  ret i32 %i6
}
```
Right now, we codegen this as:
```
	li	a3, 1
	li	a2, 13
	bne	a1, a3, .LBB0_2
	lw	a2, 4(a0)
.LBB0_2:
	mv	a0, a2
	ret
```
In this example, we have two values which must be assigned to a0 per the ABI (%arg, and the return value). SelectionDAG ensures that all values used in a successor phi are defined before exiting the predecessor block. This creates an ADDI to materialize the immediate in the entry block. Currently, this ADDI is not sunk into the tail block because we'd have to split a critical edge to do so. Note that if our immediate was anything large enough to require two instructions we *would* split this critical edge. Looking at other targets, we notice that they don't seem to have this problem. They perform the sinking, and tail duplication that we don't. Why? Well, it turns out for AArch64 that this is entirely an accident of the existence of the gpr32all register class. The immediate is materialized into the gpr32 class, and then copied into the gpr32all register class. The existence of that copy puts us right back into the two instruction case noted above. This change essentially just bypasses this emergent behavior aspect of the aarch64 behavior, and implements the same "always sink immediates" behavior for RISCV as well.
2024-11-19[MachineSink] Fix stable sort comparator (#116705)Ellis Hoag1-1/+2
Fix the comparator in `stable_sort()` to satisfy the strict weak ordering requirement. In https://github.com/llvm/llvm-project/pull/115367 this comparator was changed to use `getCycleDepth()` when `shouldOptimizeForSize()` is true. However, I mistakenly changed the logic so that we use `LHSFreq < RHSFreq` if **either** of them is zero. This causes us to fail the last requirement (https://en.cppreference.com/w/cpp/named_req/Compare). > if comp(a, b) == true and comp(b, c) == true then comp(a, c) == true
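The transitivity requirement quoted above can be checked by brute force on a small sample. The sketch below uses two hypothetical comparators over a toy `ToyBlock` record; neither is copied from MachineSink.cpp, and the sample values are made up. It only shows how switching the sort key depending on the pair being compared can break transitivity, while a comparator that is equivalent to one fixed key does not.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Toy record with the two sort keys mentioned in the commit message.
struct ToyBlock {
  std::uint64_t Freq; // block frequency, 0 meaning "unknown"
  unsigned Depth;     // cycle depth
};

// Hypothetical comparator A: falls back to depth when EITHER frequency is 0.
// The key used depends on the pair, which is what can break transitivity.
inline bool mixedLess(const ToyBlock &L, const ToyBlock &R) {
  if (L.Freq == 0 || R.Freq == 0)
    return L.Depth < R.Depth;
  return L.Freq < R.Freq;
}

// Hypothetical comparator B: falls back to depth only when BOTH are 0.
// This orders by frequency first, with depth as a tie-break among zeros.
inline bool consistentLess(const ToyBlock &L, const ToyBlock &R) {
  if (L.Freq == 0 && R.Freq == 0)
    return L.Depth < R.Depth;
  return L.Freq < R.Freq;
}

// Brute-force check of: comp(a, b) && comp(b, c) implies comp(a, c).
template <typename Comp>
bool isTransitive(const std::vector<ToyBlock> &Blocks, Comp Less) {
  for (const ToyBlock &A : Blocks)
    for (const ToyBlock &B : Blocks)
      for (const ToyBlock &C : Blocks)
        if (Less(A, B) && Less(B, C) && !Less(A, C))
          return false;
  return true;
}

// Made-up sample: {Freq, Depth}.
inline std::vector<ToyBlock> toySample() {
  return {{5, 1}, {0, 2}, {3, 3}, {0, 1}};
}
```

With this sample, comparator A orders {5,1} before {0,2} and {0,2} before {3,3} by depth, yet refuses to order {5,1} before {3,3} by frequency, so `isTransitive(toySample(), mixedLess)` is false while the same check with `consistentLess` is true. Passing a non-transitive comparator to `std::stable_sort` is undefined behavior, which is why such a bug matters even when the sort appears to work.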
2024-11-14[NFC][CodeGen] Clang format MachineSink.cpp (#114027)Akshat Oke1-201/+208
Preparing to port this pass to new pass manager.
2024-11-12[CodeGen] Remove unused includes (NFC) (#115996)Kazu Hirata1-3/+0
Identified with misc-include-cleaner.
2024-11-12[MachineSink] Sink into consistent blocks for optsize funcs (#115367)Ellis Hoag1-4/+10
Do not consider profile data when choosing a successor block to sink into for optsize functions. This should result in more consistent instruction sequences, which will improve outlining and ICF. We've observed a slight codesize improvement in a large binary. This is similar reasoning to https://github.com/llvm/llvm-project/pull/114607. Using profile data to select a block to sink into was originally added in https://github.com/llvm/llvm-project/commit/d04f7596e79d7c5cf7e4249ad62690afaecd01ec.
2024-09-25[MachineSink] Update register dependency correctly (#109763)Ruiling, Song1-2/+3
The call to accumulateUsedDefed() was missing when the block prologue interference check does not pass. This caused incorrect register dependencies, which in turn caused incorrect sinking.
2024-08-29[ExtendLifetimes] Implement llvm.fake.use to extend variable lifetimes (#86149)Stephen Tozer1-1/+1
This patch is part of a set of patches that add an `-fextend-lifetimes` flag to clang, which extends the lifetimes of local variables and parameters for improved debuggability. In addition to that flag, the patch series adds a pragma to selectively disable `-fextend-lifetimes`, and an `-fextend-this-ptr` flag which functions as `-fextend-lifetimes` for this pointers only. All changes and tests in these patches were written by Wolfgang Pieb (@wolfy1961), while Stephen Tozer (@SLTozer) has handled review and merging. The extend lifetimes flag is intended to eventually be set on by `-Og`, as discussed in the RFC here: https://discourse.llvm.org/t/rfc-redefine-og-o1-and-add-a-new-level-of-og/72850 This patch implements a new intrinsic instruction in LLVM, `llvm.fake.use` in IR and `FAKE_USE` in MIR, that takes a single operand and has no effect other than "using" its operand, to ensure that its operand remains live until after the fake use. This patch does not emit fake uses anywhere; the next patch in this sequence causes them to be emitted from the clang frontend, such that for each variable (or this) a fake.use operand is inserted at the end of that variable's scope, using that variable's value. This patch covers everything post-frontend, which is largely just the basic plumbing for a new intrinsic/instruction, along with a few steps to preserve the fake uses through optimizations (such as moving them ahead of a tail call or translating them through SROA). Co-authored-by: Stephen Tozer <stephen.tozer@sony.com>
2024-08-22[CodeGen] Construct SmallVector with iterator ranges (NFC) (#105622)Kazu Hirata1-2/+1
2024-07-26[CodeGen] Remove AA parameter of isSafeToMove (#100691)Pengcheng Wang1-4/+4
This `AA` parameter is not used and for most uses they just pass a nullptr. The use of `AA` was removed since 8d0383e.
2024-07-17[MachineSink][RISCV] Only call isConstantPhysReg or isIgnorableUse for uses. (#99363)Craig Topper1-1/+1
The included test case contains X0 as a def register. X0 is considered a constant register when it is a use. When it's a def, it means to throw away the result value. If we treat it as a constant register here, we will execute the continue and not assign `DefReg` to any register. This will cause a crash when trying to get the register class for `DefReg` after the loop. By only checking isConstantPhysReg for uses, we will reach the `return false` a little further down and stop processing this instruction.
2024-07-13[MachineSink] Check predecessor/successor relationship between two basic blocks involved in critical edge splitting (#98540)yozhu1-1/+1
Fix an issue in #97618 - if the two basic blocks involved are not predecessor / successor to each other, treat the candidate as illegal for critical edge splitting. Closes #98477 (checked in test copied from its comment).
2024-07-12[CodeGen][NewPM] Port `machine-block-freq` to new pass manager (#98317)paperchalice1-2/+4
- Add `MachineBlockFrequencyAnalysis`.
- Add `MachineBlockFrequencyPrinterPass`.
- Use `MachineBlockFrequencyInfoWrapperPass` in legacy pass manager.
- `LazyMachineBlockFrequencyInfo::print` is empty, drop it due to new pass manager migration.
2024-07-11Revert "[MachineSink] Only add sink candidate if ToBB is a successor of fromBB"YongKang Zhu1-1/+1
This reverts commit 546c09018a615388a36bdf898649fffbd2df529f.
2024-07-11[MachineSink] Only add sink candidate if ToBB is a successor of fromBBYongKang Zhu1-1/+1
2024-07-09[MachineSink] Fix missing sinks along critical edges (#97618)Min-Yih Hsu1-15/+66
4e0bd3f improved early MachineLICM's capabilities to hoist COPY from physical registers out of a loop. However, it accidentally broke one of MachineSink's preconditions on sinking cheap instructions (in this case, COPY), which considered those instructions profitable to sink only when there are at least two of them in the same def-use chain in the same basic block. So if early MachineLICM hoisted one of them out, MachineSink no longer sinks the rest of the cheap instructions. This results in redundant load immediate instructions in the motivating example we've seen on RISC-V. This patch fixes this by teaching MachineSink that if there is more than one demand to sink a register into the same block from different critical edges, it should be considered profitable as it increases the CSE opportunities. This change also improves two of the AArch64 cases.
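The "more than one demand from different critical edges" rule can be pictured with a small bookkeeping structure. `ToySinkDemand`, the integer ids and the method names below are hypothetical and chosen for this sketch; the actual data structures in MachineSink.cpp may differ.

```cpp
#include <cassert>
#include <map>
#include <set>
#include <utility>

// Hypothetical ids for registers and basic blocks.
using RegId = unsigned;
using BlockId = unsigned;

// Toy bookkeeping: for each (register, destination block) pair, remember
// which source blocks asked to sink that register across a critical edge.
class ToySinkDemand {
public:
  // Record that sinking Reg along the critical edge From -> To was requested.
  void record(RegId Reg, BlockId From, BlockId To) {
    Demands[std::make_pair(Reg, To)].insert(From);
  }

  // Treat the sink as profitable once at least two different source blocks
  // want the same register in the same destination block.
  bool isProfitable(RegId Reg, BlockId To) const {
    auto It = Demands.find(std::make_pair(Reg, To));
    return It != Demands.end() && It->second.size() > 1;
  }

private:
  std::map<std::pair<RegId, BlockId>, std::set<BlockId>> Demands;
};

// Usage sketch: two predecessors (10 and 11) both want register 1 in block 20.
inline void toySinkDemandExample() {
  ToySinkDemand Demand;
  Demand.record(1, 10, 20);
  Demand.record(1, 11, 20);
  assert(Demand.isProfitable(1, 20));
}
```

Using a set of source blocks means repeated requests along the same edge do not count twice, which matches the "different critical edges" wording in the message.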
2024-07-09[CodeGen][NewPM] Port `machine-loops` to new pass manager (#97793)paperchalice1-1/+1
- Add `MachineLoopAnalysis`.
- Add `MachineLoopPrinterPass`.
- Convert to `MachineLoopInfoWrapperPass` in legacy pass manager.
2024-06-28Reapply "[CodeGen][NewPM] Port machine-branch-prob to new pass manager" (#96858) (#96869)paperchalice1-3/+3
This reverts commit ab58b6d58edf6a7c8881044fc716ca435d7a0156. In `CodeGen/Generic/MachineBranchProb.ll`, `llc` crashed with dumped MIR when targeting PowerPC. Move test to `llc/new-pm`, which is X86 specific.
2024-06-27Revert "[CodeGen][NewPM] Port machine-branch-prob to new pass manager" (#96858)paperchalice1-3/+3
Reverts llvm/llvm-project#96389 Some ppc bots failed.
2024-06-27[CodeGen][NewPM] Port machine-branch-prob to new pass manager (#96389)paperchalice1-3/+3
Like IR version `print<branch-prob>`, there is also a `print<machine-branch-prob>`.
2024-06-15[MachineSink] Use SmallDenseMap (NFC) (#95676)Kazu Hirata1-1/+1
The use of SmallDenseMap saves 0.39% of heap allocations during the compilation of a large preprocessed file, namely X86ISelLowering.cpp, for the X86 target.
2024-06-12[CodeGen][NewPM] Split `MachinePostDominators` into a concrete analysis result (#95113)paperchalice1-2/+2
`MachinePostDominators` version of #94571.
2024-06-11[CodeGen][NewPM] Split `MachineDominatorTree` into a concrete analysis result (#94571)paperchalice1-3/+3
Prepare for new pass manager version of `MachineDominatorTreeAnalysis`. We may need a machine dominator tree version of `DomTreeUpdater` to handle `SplitCriticalEdge` in some CodeGen passes.
2024-04-24[CodeGen] Make the parameter TRI required in some functions. (#85968)Xu Zhang1-1/+1
Fixes #82659 There are some functions, such as `findRegisterDefOperandIdx` and `findRegisterDefOperand`, that have too many default parameters. As a result, we have encountered some issues due to the lack of TRI parameters, as shown in issue #82411. Following @RKSimon 's suggestion, this patch refactors 9 functions, including `{reads, kills, defines, modifies}Register`, `registerDefIsDead`, and `findRegister{UseOperandIdx, UseOperand, DefOperandIdx, DefOperand}`, adjusting the order of the TRI parameter and making it required. In addition, all the places that call these functions have also been updated correctly to ensure no additional impact. After this, the caller of these functions should explicitly know whether to pass the `TargetRegisterInfo` or just a `nullptr`.
2024-02-15[CodeGen] Simplify updateLiveIn in MachineSink (#79831)Jay Foad1-7/+2
When a whole register is added to a basic block's liveins, use LaneBitmask::getAll for the live lanes instead of trying to calculate an accurate mask of the lanes that comprise the register. This simplifies the code and matches other places where a whole register is marked as livein. This also avoids problems when regunits that are synthesized by TableGen to represent ad hoc aliasing have a lane mask of 0. Fixes #78942
2023-12-13[MachineSink] Clear kill flags of sunk addressing mode registers (#75072)Momchil Velikov1-5/+16
When doing sink-and-fold, the MachineSink clears the "killed" flags of the operands of the sunk (and deleted) instruction. However, this is not always sufficient. In some cases we can create the new load/store instruction with operands other than the ones present in the deleted instruction. One such example is folding a zero word extend into a memory load on AArch64. The zero-extend is represented by a pair of instructions - `MOV` (i.e. `ORRwrs`) followed by a `SUBREG_TO_REG`. The `SUBREG_TO_REG` is deleted (it is the sunk instruction), but the new load instruction mentions operands "killed" in the `MOV`, which is no longer correct. To fix this, clear the "killed" flags of the registers participating in the addressing mode.
2023-11-24[MachineSink] Some more preserving of debug location when rematerialising an instruction to replace a COPY (#73155)Momchil Velikov1-1/+3
Somewhat similar to ef9bcace834e63f25bbbc5e8e2b615f89d85fb2f ([MachineSink][AArch64] Preserve debug location when rematerialising an instruction to replace a COPY (#72685)), reuse the debug location of the COPY, iff the rematerialised instruction did not have a location. Fixes a regression in `DebugInfo/AArch64/constant-dbgloc.ll` after enabling sink-and-fold.
2023-11-21[MachineSink][AArch64] Preserve debug location when rematerialising an instruction to replace a COPY (#72685)Momchil Velikov1-3/+1
Fixes a regression in `tools/lldb-dap/optimized/TestDAP_optimized.py` caused by enabling "sink-and-fold" in MachineSink.
2023-11-11[MachineSink] Drop debug info for instructions deleted by sink-and-fold (#71443)Momchil Velikov1-19/+12
After performing sink-and-fold over a COPY, the original instruction is replaced with one that produces its output in the destination of the copy. Its value is still available (in a hard register), so if there are debug instructions which refer to the (now deleted) virtual register they could be updated to refer to the hard register, in principle. However, it's not clear how to do that; moreover, in some cases the debug instructions may need to be replicated proportionally to the number of the COPY instructions replaced, and in some extreme cases we can end up with a quadratic increase in the number of debug instructions, e.g.:
    int f(int);
    void g(int x) {
      int y = x + 1;
      int t0 = y;
      f(t0);
      int t1 = y;
      f(t1);
    }
2023-10-12[MachineSink] Reduce the number of unnecessary invalidations of StoreInstrCache (NFC) (#68676)Momchil Velikov1-2/+3
Don't invalidate the cache when erasing instructions which cannot ever appear in the cache.
2023-10-12[MachineSink] Use LLVM ADTs (NFC) (#68677)Momchil Velikov1-10/+10
Replace a few uses of `std::map` with `llvm::DenseMap`.
2023-10-06[MachineSink] Fix crash due to use-after-free in a MachineInstr* cache.Amara Emerson1-0/+2
After the SinkAndFold optimization was enabled, we saw some crashes with GISel due to SinkAndFold erasing an MI while a reference was being held in a cache.
2023-10-06AMDGPU: Fix temporal divergence introduced by machine-sink (#67456)Petar Avramovic1-0/+4
Temporal divergence that was present in input or introduced in IR transforms, like code-sinking or LICM, is handled in SIFixSGPRCopies by changing sgpr source instr to vgpr instr. After 5b657f5, that moved LICM after AMDGPUCodeGenPrepare, machine-sinking can introduce temporal divergence by sinking instructions outside of the cycle. Add isSafeToSink callback in TargetInstrInfo.
2023-10-06Revert "MachineSink: Fix sinking VGPR def out of a divergent loop"Petar Avramovic1-11/+4
This reverts commit 3f8ef57bede94445b1a1042c987cc914a886e7ff.
2023-10-04[AArch64] Fix an incorrect handling of debug values in MachineSink (#68107)Momchil Velikov1-1/+4