path: root/llvm/lib/CodeGen/PeepholeOptimizer.cpp
Age | Commit message | Author | Files | Lines
7 days | PeepholeOpt: Fix losing subregister indexes on full copies (#161310) | Matt Arsenault | 1 | -1/+21
Previously, if we had a subregister extract reading from a full copy, the no-subregister incoming copy would overwrite the DefSubReg index of the folding context. There is one ugly RVV regression, but it is a downstream issue of this change; an unnecessary same-class reg-to-reg full copy was previously avoided.
9 days | PeepholeOpt: Try to constrain uses to support subregister (#161338) | Matt Arsenault | 1 | -0/+24
This allows removing a special-case hack in ARM. ARM's implementation of getExtractSubregLikeInputs has the strange property that it reports a register with a class that does not support the reported subregister index. We can, however, reconstrain the register to support this usage. This is an alternative to #159600. I've included the test, but the output is different: in this version the VMOVSR is replaced with an ordinary subregister extract copy.
13 days | PeepholeOpt: Use initializer list (#160898) | Matt Arsenault | 1 | -2/+1
2025-05-24 | [CodeGen] Remove unused includes (NFC) (#141320) | Kazu Hirata | 1 | -1/+0
These are identified by misc-include-cleaner. I've filtered out those that break builds. Also, I'm staying away from llvm-config.h, config.h, and Compiler.h, which likely cause platform- or compiler-specific build failures.
2025-05-22 | [LLVM][CodeGen] Add convenience accessors for MachineFunctionProperties (#140002) | Rahul Joshi | 1 | -2/+1
Add per-property has<Prop>/set<Prop>/reset<Prop> functions to MachineFunctionProperties.
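Illustration, not part of the commit above: a minimal sketch of how the new convenience accessors read in pass code. The helper name demoteToNonSSA and the specific properties used (IsSSA, NoVRegs) are assumptions based on existing MachineFunctionProperties members and the has<Prop>/set<Prop>/reset<Prop> pattern named in the message.

```cpp
// Sketch only: assumes per-property accessors hasIsSSA()/resetIsSSA()/
// setNoVRegs() were generated for the existing IsSSA and NoVRegs properties.
#include "llvm/CodeGen/MachineFunction.h"

using namespace llvm;

static void demoteToNonSSA(MachineFunction &MF) {
  MachineFunctionProperties &Props = MF.getProperties();
  // Previously spelled via the enum, e.g.
  // Props.hasProperty(MachineFunctionProperties::Property::IsSSA).
  if (Props.hasIsSSA())
    Props.resetIsSSA();
  Props.setNoVRegs();
}
```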
2025-05-02 | [llvm] Remove redundant control flow (NFC) (#138304) | Kazu Hirata | 1 | -1/+1
2025-03-13 | PeepholeOpt: Do not skip reg_sequence sources with subregs (#125667) | Matt Arsenault | 1 | -3/+1
Contrary to the comment, this particular code is not responsible for handling any composes that may be required, and unhandled cases are already rejected later. Lift this restriction to permit composes and reg_sequence subregisters later.
2025-03-07 | PeepholeOpt: Remove subreg def check for bitcast (#130086) | Matt Arsenault | 1 | -5/+4
Subregister defs are illegal in SSA. Surprisingly this enables folding into subregister insert patterns in one test.
2025-03-07 | PeepholeOpt: Remove subreg def check for insert_subreg (#130085) | Matt Arsenault | 1 | -6/+1
2025-03-07 | PeepholeOpt: Remove dead checks for subregister def mismatch (#130084) | Matt Arsenault | 1 | -4/+1
2025-02-26 | PeepholeOpt: Remove pointless check for subregister def (#128850) | Matt Arsenault | 1 | -5/+0
Subregister defs are illegal in SSA
2025-02-26 | PeepholeOpt: Immediately check if a reg_sequence compose supports a subregister (#128279) | Matt Arsenault | 1 | -4/+11
This is a quick fix for EXPENSIVE_CHECKS bot failures. I still think we could defer looking for a compatible subregister further up the use-def chain, and should be able to check compatibility with the ultimate found source.
2025-02-22 | PeepholeOpt: Allow introducing subregister uses on reg_sequence (#127052) | Matt Arsenault | 1 | -6/+0
This reverts d246cc618adc52fdbd69d44a2a375c8af97b6106. We now handle composing subregister extracts through reg_sequence.
2025-02-18 | PeepholeOpt: Handle subregister compose when looking through reg_sequence (#127051) | Matt Arsenault | 1 | -1/+32
Previously this would give up on folding subregister copies through a reg_sequence if the input operand already had a subregister index. d246cc618adc52fdbd69d44a2a375c8af97b6106 stopped introducing these subregister uses, and this is the first step toward lifting that restriction. I was expecting to be able to implement this purely with compose / reverse compose, but I wasn't able to make it work, so this relies on testing the lane masks to check whether the copy reads a subset of the input.
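Illustration, not the pass's actual code: a minimal sketch of the lane-mask test described above, assuming both subregister indexes are valid for the register being read. It returns true only when every lane the copy reads is covered by the reg_sequence input.

```cpp
// Hedged sketch of a lane-mask subset check; helper name and signature are
// illustrative, not taken from PeepholeOptimizer.cpp.
#include "llvm/CodeGen/TargetRegisterInfo.h"

using namespace llvm;

static bool copyReadsSubsetOfInput(const TargetRegisterInfo &TRI,
                                   unsigned CopySubReg, unsigned InputSubReg) {
  // Lanes the copy reads from the reg_sequence result.
  LaneBitmask ReadLanes = TRI.getSubRegIndexLaneMask(CopySubReg);
  // Lanes the reg_sequence input operand actually defines.
  LaneBitmask InputLanes = TRI.getSubRegIndexLaneMask(InputSubReg);
  // The fold is only valid if no read lane lies outside the input's lanes.
  return (ReadLanes & ~InputLanes).none();
}
```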
2025-02-05 | PeepholeOpt: Fix looking for def of current copy to coalesce (#125533) | Matt Arsenault | 1 | -14/+32
This fixes the handling of subregister extract copies. This will allow AMDGPU to remove its implementation of shouldRewriteCopySrc, which exists as a 10-year-old workaround for this bug. peephole-opt-fold-reg-sequence-subreg.mir will show the expected improvement once the custom implementation is removed.

The copy coalescing processing here is overly abstracted from what's actually happening. Previously, when visiting coalescable copy-like instructions, we would parse the sources one at a time and then pass the def of the root instruction into findNextSource. This means that the first thing the newly constructed ValueTracker would do is getVRegDef to find the instruction we are currently processing. This adds an unnecessary step, places a useless entry in the RewriteMap, and required skipping the no-op case where getNewSource would return the original source operand. This was a problem since, in the case of a subregister extract, shouldRewriteCopySrc would always say that it is useful to rewrite and the use-def chain walk would abort, returning the original operand. Move the process to start looking at the source operand to begin with.

This does not fix the confused handling in the uncoalescable copy case, which is proving to be more difficult. Some currently handled cases have multiple defs from a single source, and other handled cases have 0 input operands. It would be simpler if this were implemented with isCopyLikeInstr, rather than guessing at the operand structure as it does now.

There are some improvements and some regressions. The regressions appear to be downstream issues for the most part. One of the uglier regressions is in PPC, where a sequence of insert_subregs is used to build registers. I opened #125502 to use reg_sequence instead, which may help. The worst regression is an absurd SPARC testcase using a <251 x fp128>, which uses a very long chain of insert_subregs. We need improved subregister handling locally in PeepholeOptimizer, and in other passes like MachineCSE, to fix some of the other regressions. We should handle subregister composes and fold more indexes into insert_subreg and reg_sequence.
2025-02-03 | PeepholeOpt: Make copy ID methods static | Matt Arsenault | 1 | -2/+2
2025-01-31 | PeepholeOpt: Fix copy current source index accounting bug | Matt Arsenault | 1 | -2/+2
We were essentially using the current source index as a binary value, and didn't actually use it for indexing so it did not matter. Use the operand to ensure the value is actually correct.
2025-01-30 | PeepholeOpt: Avoid double map lookup (#124531) | Matt Arsenault | 1 | -3/+5
2025-01-30 | PeepholeOpt: Remove check for reg_sequence def of subregister (#124512) | Matt Arsenault | 1 | -16/+1
The verifier does not allow reg_sequence to have subregister defs, even if undef.
2025-01-30 | PeepholeOpt: Simplify tracking of current op for copy and reg_sequence (#124224) | Matt Arsenault | 1 | -23/+8
Set the starting index in the constructor instead of treating 0 as a special case. There should also be no need for bounds checking in the rewrite.
2025-01-30 | PeepholeOpt: Do not add subregister indexes to reg_sequence operands (#124111) | Matt Arsenault | 1 | -0/+6
Given the rest of the pass just gives up when it needs to compose subregisters, folding a subregister extract directly into a reg_sequence is counterproductive. Later fold attempts in the function will give up on the subregister operand, preventing looking up through the reg_sequence. It may still be profitable to do these folds if we start handling the composes. There are some test regressions, but this mostly looks better.
2025-01-23 | PeepholeOpt: Remove check for subreg index on a def operand (#123943) | Matt Arsenault | 1 | -2/+2
This is looking at operand 0 of a REG_SEQUENCE, which can never have a subregister index.
2025-01-23 | PeepholeOpt: Stop allocating tiny helper classes (NFC) (#123936) | Matt Arsenault | 1 | -347/+338
This was allocating tiny helper classes for every instruction visited. We can just dispatch over the cases in the visitor function instead.
2025-01-23 | PeepholeOpt: Remove null TargetRegisterInfo check (#123933) | Matt Arsenault | 1 | -3/+3
This cannot happen. Also simplify the LaneBitmask check from !none to any.
2025-01-23 | PeepholeOpt: Remove unnecessary check for null TargetInstrInfo (#123929) | Matt Arsenault | 1 | -15/+0
This can never happen.
2025-01-13 | [aarch64][win] Update Called Globals info when updating Call Site info (#122762) | Daniel Paoliello | 1 | -3/+3
Fixes the "use after poison" issue introduced by #121516 (see <https://github.com/llvm/llvm-project/pull/121516#issuecomment-2585912395>). The root cause of this issue is that #121516 introduced "Called Global" information for call instructions, modeling how "Call Site" info is stored in the machine function; however, it didn't replicate the copy/move/erase operations that exist for call site information. The fix is to rename and update the existing copy/move/erase functions so they also take care of Called Global info.
2024-11-18 | [CodeGen][NewPM] Port PeepholeOptimizer to NPM (#116326) | Akshat Oke | 1 | -35/+68
With this, all machine SSA optimization passes are available in the new codegen pipeline.
2024-11-18 | [NFC] Clang format PeepholeOptimizer (#116325) | Akshat Oke | 1 | -333/+320
2024-07-09 | [CodeGen][NewPM] Port `machine-loops` to new pass manager (#97793) | paperchalice | 1 | -4/+4
- Add `MachineLoopAnalysis`.
- Add `MachineLoopPrinterPass`.
- Convert to `MachineLoopInfoWrapperPass` in legacy pass manager.
2024-06-26 | [CodeGen] Use range-based for loops (NFC) (#96777) | Kazu Hirata | 1 | -2/+1
2024-06-11 | [CodeGen][NewPM] Split `MachineDominatorTree` into a concrete analysis result (#94571) | paperchalice | 1 | -4/+5
Prepare for new pass manager version of `MachineDominatorTreeAnalysis`. We may need a machine dominator tree version of `DomTreeUpdater` to handle `SplitCriticalEdge` in some CodeGen passes.
2024-04-24 | [CodeGen] Make the parameter TRI required in some functions. (#85968) | Xu Zhang | 1 | -1/+1
Fixes #82659. There are some functions, such as `findRegisterDefOperandIdx` and `findRegisterDefOperand`, that have too many default parameters. As a result, we have encountered some issues due to the lack of TRI parameters, as shown in issue #82411. Following @RKSimon's suggestion, this patch refactors 9 functions, including `{reads, kills, defines, modifies}Register`, `registerDefIsDead`, and `findRegister{UseOperandIdx, UseOperand, DefOperandIdx, DefOperand}`, adjusting the order of the TRI parameter and making it required. In addition, all the places that call these functions have been updated to ensure no additional impact. After this, the caller of these functions should explicitly decide whether to pass the `TargetRegisterInfo` or just a `nullptr`.
2024-01-26 | [NFC] Rename TargetInstrInfo::FoldImmediate to TargetInstrInfo::foldImmediate and simplify implementation for X86 | Shengchen Kan | 1 | -2/+2
2023-10-27 | [X86, Peephole] Enable FoldImmediate for X86 | Guozhi Wei | 1 | -31/+92
Enable FoldImmediate for X86 by implementing X86InstrInfo::FoldImmediate. Also enhanced the peephole optimizer by deleting identical instructions after FoldImmediate.
Differential Revision: https://reviews.llvm.org/D151848
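Illustration, not the patch's verbatim code: a hedged sketch of the FoldImmediate step described above. TII.foldImmediate is the real TargetInstrInfo hook (spelled FoldImmediate at the time of this commit); the helper name and the dead-definition cleanup around it are assumptions.

```cpp
// Sketch: ask the target whether UseMI can absorb the constant defined by
// DefMI, then erase the move-immediate once it has no remaining users.
#include "llvm/CodeGen/MachineInstr.h"
#include "llvm/CodeGen/MachineRegisterInfo.h"
#include "llvm/CodeGen/TargetInstrInfo.h"

using namespace llvm;

static bool tryFoldImmediate(MachineInstr &UseMI, MachineInstr &DefMI,
                             Register Reg, MachineRegisterInfo &MRI,
                             const TargetInstrInfo &TII) {
  // DefMI is assumed to be a move-immediate defining Reg; the hook rewrites
  // UseMI in place if the immediate can be encoded directly.
  if (!TII.foldImmediate(UseMI, DefMI, Reg, &MRI))
    return false;
  // With no non-debug users left, the move-immediate itself is dead.
  if (MRI.use_nodbg_empty(Reg))
    DefMI.eraseFromParent();
  return true;
}
```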
2023-10-24 | Revert 24633ea and 760e7d0 "Enable FoldImmediate for X86" | Mogball | 1 | -46/+16
This reverts commits 24633eac38d46cd4b253ba53258165ee08d886cd and 760e7d00d142ba85fcf48c00e0acc14a355da7c3. I have confirmed that these commits are introducing a new crash in the peephole optimizer. I have minimized a test case, which you can find below.

```llvmir
; ModuleID = 'bugpoint-reduced-simplified.bc'
source_filename = "/mnt/big/modular/Kernels/mojo/Mogg/MOGG.mojo"
target datalayout = "e-m:e-p270:32:32-p271:32:32-p272:64:64-i64:64-i128:128-f80:128-n8:16:32:64-S128"
target triple = "x86_64-unknown-linux-gnu"

declare dso_local void @foo({ { ptr, [4 x i64], [4 x i64], i1 }, { ptr, [4 x i64], [4 x i64], i1 } }, { ptr }, { ptr, i64, i8 })

define dso_local void @bad_fn(ptr %0, ptr %1, ptr %2) {
  %4 = load i64, ptr null, align 8
  %5 = insertvalue [4 x i64] poison, i64 12, 1
  %6 = insertvalue [4 x i64] %5, i64 poison, 2
  %7 = insertvalue [4 x i64] %6, i64 poison, 3
  %8 = insertvalue { ptr, [4 x i64], [4 x i64], i1 } poison, [4 x i64] %7, 1
  %9 = insertvalue { ptr, [4 x i64], [4 x i64], i1 } %8, [4 x i64] poison, 2
  %10 = insertvalue { ptr, [4 x i64], [4 x i64], i1 } %9, i1 poison, 3
  %11 = icmp ne i64 %4, 1
  %12 = or i1 false, %11
  %13 = select i1 %12, i64 %4, i64 0
  %14 = zext i1 %12 to i64
  %15 = insertvalue [4 x i64] poison, i64 12, 1
  %16 = insertvalue [4 x i64] %15, i64 poison, 2
  %17 = insertvalue [4 x i64] %16, i64 %13, 3
  %18 = insertvalue [4 x i64] poison, i64 %14, 3
  %19 = icmp eq i64 0, 0
  %20 = icmp eq i64 0, 0
  %21 = icmp eq i64 %13, 0
  %22 = and i1 %20, %19
  %23 = select i1 %22, i1 %21, i1 false
  %24 = select i1 %23, i1 %12, i1 false
  %25 = insertvalue { ptr, [4 x i64], [4 x i64], i1 } poison, [4 x i64] %17, 1
  %26 = insertvalue { ptr, [4 x i64], [4 x i64], i1 } %25, [4 x i64] %18, 2
  %27 = insertvalue { ptr, [4 x i64], [4 x i64], i1 } %26, i1 %24, 3
  %28 = insertvalue { { ptr, [4 x i64], [4 x i64], i1 }, { ptr, [4 x i64], [4 x i64], i1 } } undef, { ptr, [4 x i64], [4 x i64], i1 } %10, 0
  %29 = insertvalue { { ptr, [4 x i64], [4 x i64], i1 }, { ptr, [4 x i64], [4 x i64], i1 } } %28, { ptr, [4 x i64], [4 x i64], i1 } %27, 1
  br label %31

30:                                               ; preds = %3
  br label %softmax_pass

31:                                               ; preds = %31
  %exitcond.not.i = icmp eq i64 poison, 3
  br i1 %exitcond.not.i, label %37, label %31

32:                                               ; preds = %31
  br i1 poison, label %34, label %33

33:                                               ; preds = %32
  br label %34

34:                                               ; preds = %33, %32
  br i1 poison, label %35, label %36

35:                                               ; preds = %34
  br label %softmax_pass

36:                                               ; preds = %34
  br i1 poison, label %37, label %.critedge.i

37:                                               ; preds = %36
  br i1 poison, label %38, label %.critedge.i

38:                                               ; preds = %37
  br i1 poison, label %40, label %39

39:                                               ; preds = %38
  br label %40

40:                                               ; preds = %39, %38
  br i1 poison, label %.lr.ph28.i, label %._crit_edge.i

.lr.ph28.i:                                       ; preds = %40
  br label %41

41:                                               ; preds = %51, %.lr.ph28.i
  br i1 poison, label %.thread, label %42

42:                                               ; preds = %41
  br i1 poison, label %43, label %44

43:                                               ; preds = %42
  br label %45

44:                                               ; preds = %42
  br label %45

45:                                               ; preds = %44, %43
  br i1 poison, label %46, label %.thread

46:                                               ; preds = %45
  br label %47

.thread:                                          ; preds = %45, %41
  br label %47

47:                                               ; preds = %.thread, %46
  br i1 poison, label %51, label %48

48:                                               ; preds = %47
  br i1 poison, label %49, label %50

49:                                               ; preds = %48
  br label %51

50:                                               ; preds = %48
  br label %51

51:                                               ; preds = %50, %49, %47
  call void @foo({ { ptr, [4 x i64], [4 x i64], i1 }, { ptr, [4 x i64], [4 x i64], i1 } } %29, { ptr } poison, { ptr, i64, i8 } poison)
  br i1 poison, label %._crit_edge.i, label %41

._crit_edge.i:                                    ; preds = %51, %40
  br label %softmax_pass

.critedge.i:                                      ; preds = %37, %36
  br i1 poison, label %.lr.ph.i, label %softmax_pass

.lr.ph.i:                                         ; preds = %.lr.ph.i, %.critedge.i
  store { ptr, [4 x i64], [4 x i64], i1 } %10, ptr poison, align 8
  br i1 poison, label %.lr.ph.i, label %softmax_pass

softmax_pass:                                     ; preds = %.lr.ph.i, %.critedge.i, %._crit_edge.i, %35, %30
  ret void
}
```
2023-10-20 | [Peephole] Check instructions from CopyMIs are still COPY (#69511) | weiguozhi | 1 | -1/+3
The function foldRedundantCopy records COPY instructions in CopyMIs and uses them later, but other optimizations may delete or modify them. So before using a recorded instruction, we should check that it still exists and is still a COPY instruction.
2023-10-17 | [X86, Peephole] Enable FoldImmediate for X86 | Guozhi Wei | 1 | -16/+44
Enable FoldImmediate for X86 by implementing X86InstrInfo::FoldImmediate. Also enhanced the peephole optimizer by deleting identical instructions after FoldImmediate.
Differential Revision: https://reviews.llvm.org/D151848
2023-04-17 | Fix uninitialized pointer members in CodeGen | Akshay Khadse | 1 | -5/+5
This change initializes the TSI, LI, DT, PSI, and ORE pointer fields of the SelectOptimize class to nullptr.
Reviewed By: LuoYuanke
Differential Revision: https://reviews.llvm.org/D148303
2023-01-13 | [CodeGen] Remove uses of Register::isPhysicalRegister/isVirtualRegister. NFC | Craig Topper | 1 | -8/+7
Use isPhysical/isVirtual methods.
Reviewed By: foad
Differential Revision: https://reviews.llvm.org/D141715
2022-12-13 | [CodeGen] llvm::Optional => std::optional | Fangrui Song | 1 | -3/+2
2022-05-16 | Teach PeepholeOpt to eliminate redundant copy from constant physreg (e.g. VLENB on RISCV) | Philip Reames | 1 | -5/+6
The existing redundant copy elimination required a virtual register source, but the same logic works for any physreg where we don't have to worry about clobbers. On RISCV, this helps eliminate redundant CSR reads from VLENB.
Differential Revision: https://reviews.llvm.org/D125564
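Illustration, not the patch's code: a minimal sketch of the generalized source check. Virtual sources have a single SSA definition, so an identical earlier copy is always reusable; a physical source is only safe when the target marks it constant for the function, as RISC-V does for VLENB. The helper name is hypothetical.

```cpp
// Sketch of the eligibility test for redundant-copy elimination sources.
#include "llvm/CodeGen/MachineRegisterInfo.h"
#include "llvm/CodeGen/Register.h"

using namespace llvm;

static bool isSafeRedundantCopySource(Register Src,
                                      const MachineRegisterInfo &MRI) {
  // Virtual registers: single SSA def, so no clobber can intervene.
  if (Src.isVirtual())
    return true;
  // Physical registers: only constant physregs (never clobbered) qualify.
  return Src.isPhysical() && MRI.isConstantPhysReg(Src.asMCReg());
}
```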
2022-03-16 | [NFC][CodeGen] Rename some functions in MachineInstr.h and remove duplicated comments | Shengchen Kan | 1 | -1/+1
2022-03-16 | Cleanup codegen includes | serge-sans-paille | 1 | -1/+0
This is a (fixed) recommit of https://reviews.llvm.org/D121169
after: 1061034926
before: 1063332844
Discourse thread: https://discourse.llvm.org/t/include-what-you-use-include-cleanup
Differential Revision: https://reviews.llvm.org/D121681
2022-03-10 | Revert "Cleanup codegen includes" | Nico Weber | 1 | -0/+1
This reverts commit 7f230feeeac8a67b335f52bd2e900a05c6098f20. Breaks CodeGenCUDA/link-device-bitcode.cu in check-clang, and many LLVM tests, see comments on https://reviews.llvm.org/D121169
2022-03-10 | Cleanup codegen includes | serge-sans-paille | 1 | -1/+0
after: 1061034926
before: 1063332844
Differential Revision: https://reviews.llvm.org/D121169
2022-02-06 | [CodeGen] Use = default (NFC) | Kazu Hirata | 1 | -1/+1
Identified with modernize-use-equals-default
2021-08-30 | [InstrInfo] Use 64-bit immediates for analyzeCompare() (NFCI) | Nikita Popov | 1 | -1/+1
The backend generally uses 64-bit immediates (e.g. what MachineOperand::getImm() returns), so use that for analyzeCompare() and optimizeCompareInst() as well. This avoids truncation for targets that support immediates larger than 32 bits. In particular, we can avoid the bug-prone value normalization hack in the AArch64 target. This is a follow-up to D108076.
Differential Revision: https://reviews.llvm.org/D108875
2021-06-28 | Teach peephole optimizer to not emit sub-register defs | Ahsan Saghir | 1 | -7/+22
The peephole optimizer should not introduce sub-register definitions, as they are illegal in the machine SSA phase. This patch modifies the optimizer to not emit sub-register definitions.
Reviewed By: arsenm
Differential Revision: https://reviews.llvm.org/D103408
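Illustration, not part of the patch: a small sketch of the invariant the change enforces. In machine SSA, a def of a virtual register must not carry a subregister index; partial values are built with INSERT_SUBREG or REG_SEQUENCE instead. The helper is hypothetical.

```cpp
// Sketch: reject the kind of def operand the peephole optimizer must not
// create while the function is still in SSA form.
#include "llvm/CodeGen/MachineOperand.h"

using namespace llvm;

static bool isLegalSSADefOperand(const MachineOperand &MO) {
  if (!MO.isReg() || !MO.isDef())
    return true; // Not a register def; nothing to check.
  // A subregister index on a virtual-register def would be a partial
  // definition, which machine SSA does not allow.
  return !MO.getReg().isVirtual() || MO.getSubReg() == 0;
}
```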
2020-12-17 | Make LLVM build in C++20 mode | Barry Revzin | 1 | -1/+1
Part of the <=> changes in C++20 make certain patterns of writing equality operators ambiguous with themselves (sorry!). This patch goes through and adjusts all the comparison operators such that they should work in both C++17 and C++20 modes. It also makes two other small C++20-specific changes (adding a constructor to a type that ceases to be an aggregate, and adding casts from u8 literals, which no longer have type const char*).

There were four categories of errors that this review fixes. Here are canonical examples of them, ordered from most to least common:

// 1) Missing const
namespace missing_const {
  struct A {
#ifndef FIXED
    bool operator==(A const&);
#else
    bool operator==(A const&) const;
#endif
  };

  bool a = A{} == A{}; // error
}

// 2) Type mismatch on CRTP
namespace crtp_mismatch {
  template <typename Derived>
  struct Base {
#ifndef FIXED
    bool operator==(Derived const&) const;
#else
    // in one case changed to taking Base const&
    friend bool operator==(Derived const&, Derived const&);
#endif
  };

  struct D : Base<D> { };

  bool b = D{} == D{}; // error
}

// 3) iterator/const_iterator with only mixed comparison
namespace iter_const_iter {
  template <bool Const>
  struct iterator {
    using const_iterator = iterator<true>;

    iterator();

    template <bool B, std::enable_if_t<(Const && !B), int> = 0>
    iterator(iterator<B> const&);

#ifndef FIXED
    bool operator==(const_iterator const&) const;
#else
    friend bool operator==(iterator const&, iterator const&);
#endif
  };

  bool c = iterator<false>{} == iterator<false>{} // error
        || iterator<false>{} == iterator<true>{}
        || iterator<true>{} == iterator<false>{}
        || iterator<true>{} == iterator<true>{};
}

// 4) Same-type comparison but only have mixed-type operator
namespace ambiguous_choice {
  enum Color { Red };

  struct C {
    C();
    C(Color);
    operator Color() const;
    bool operator==(Color) const;
    friend bool operator==(C, C);
  };

  bool c = C{} == C{}; // error
  bool d = C{} == Red;
}

Differential revision: https://reviews.llvm.org/D78938
2020-09-24 | Improve 723fea23079f9c85800e5cdc90a75414af182bfd - Silence 'warning: unused variable' when compiling with Clang 10.0 | Alexandre Ganea | 1 | -4/+2