path: root/llvm/lib/CodeGen/PeepholeOptimizer.cpp
Age  Commit message  Author  Files  Lines
2014-10-15Avoid caching the MachineFunction, we don't use it outside ofEric Christopher1-9/+7
runOnMachineFunction. llvm-svn: 219847
2014-10-14[AArch64] Optimize CSINC-branch sequenceGerolf Hoflehner1-0/+12
Peephole optimization that generates a single conditional branch for csinc-branch sequences like in the examples below. This is possible when the csinc sets or clears a register based on a condition code and the branch checks that register. Also, the condition code must not be modified between the csinc and the original branch.

Examples:

1. Convert csinc w9, wzr, wzr, <CC>; tbnz w9, #0, 0x44
   to b.<invCC>

2. Convert csinc w9, wzr, wzr, <CC>; tbz w9, #0, 0x44
   to b.<CC>

rdar://problem/18506500

llvm-svn: 219742
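The mapping in the two examples above can be sketched as a small table lookup. This is a toy model in plain C++, not the AArch64 backend code: `TestBranch`, `invertCC` and `foldCsincBranch` are hypothetical names, and condition codes are handled as mnemonic strings for readability.

```cpp
#include <cassert>
#include <string>

// Hypothetical toy model; none of these names come from LLVM.
enum class TestBranch { TBZ, TBNZ };

// Invert an AArch64 condition-code mnemonic (eq<->ne, ge<->lt, ...).
std::string invertCC(const std::string &cc) {
  static const char *pairs[][2] = {{"eq", "ne"}, {"hs", "lo"}, {"mi", "pl"},
                                   {"vs", "vc"}, {"hi", "ls"}, {"ge", "lt"},
                                   {"gt", "le"}};
  for (auto &p : pairs) {
    if (cc == p[0]) return p[1];
    if (cc == p[1]) return p[0];
  }
  assert(false && "condition code not modelled in this sketch");
  return cc;
}

// csinc wN, wzr, wzr, <CC> produces 0 when CC holds and 1 otherwise.
// tbnz (branch if bit 0 is set) therefore fires on the inverted condition,
// while tbz (branch if bit 0 is clear) fires on CC itself.
std::string foldCsincBranch(const std::string &cc, TestBranch br) {
  std::string taken = (br == TestBranch::TBNZ) ? invertCC(cc) : cc;
  return "b." + taken;
}
```

For instance, `foldCsincBranch("eq", TestBranch::TBNZ)` gives `b.ne`. The sketch leaves out the precondition from the message that nothing between the csinc and the branch may modify the condition flags.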
2014-10-14Instead of the TargetMachine cache the MachineFunctionEric Christopher1-14/+13
and TargetRegisterInfo in the peephole optimizer. This makes it easier to grab subtarget dependent variables off of the MachineFunction rather than the TargetMachine. llvm-svn: 219669
2014-08-21[PeepholeOptimizer] Enable the advanced copy optimization by default.Quentin Colombet1-1/+1
The advanced copy optimization does not yield any difference on the whole llvm test-suite + SPECs, either in compile time or runtime (binaries are identical), but has a big potential when data go back and forth between register files as demonstrated with test/CodeGen/ARM/adv-copy-opt.ll. Note: This was measured for both Os and O3 for armv7s, arm64, and x86_64. <rdar://problem/12702965> llvm-svn: 216236
2014-08-21[PeepholeOptimizer] Update the kill flags when extending the live-range of theQuentin Colombet1-1/+5
source of a copy. <rdar://problem/12702965> llvm-svn: 216229
2014-08-21[PeepholeOptimizer] Take advantage of the isInsertSubreg property in theQuentin Colombet1-32/+15
advanced copy optimization.

This is the final step patch toward transforming:
    udiv r0, r0, r2
    udiv r1, r1, r3
    vmov.32 d16[0], r0
    vmov.32 d16[1], r1
    vmov r0, r1, d16
    bx lr
into:
    udiv r0, r0, r2
    udiv r1, r1, r3
    bx lr

Indeed, thanks to this patch, this optimization is able to look through
    vmov.32 d16[0], r0
    vmov.32 d16[1], r1
and is able to rewrite the following sequence:
    vmov.32 d16[0], r0
    vmov.32 d16[1], r1
    vmov r0, r1, d16
into simple generic GPR copies that the coalescer managed to remove.

<rdar://problem/12702965>

llvm-svn: 216144
2014-08-20[PeepholeOptimizer] Take advantage of the isExtractSubreg property in theQuentin Colombet1-24/+12
advanced copy optimization.

This patch is a step toward transforming:
    udiv r0, r0, r2
    udiv r1, r1, r3
    vmov.32 d16[0], r0
    vmov.32 d16[1], r1
    vmov r0, r1, d16
    bx lr
into:
    udiv r0, r0, r2
    udiv r1, r1, r3
    bx lr

Indeed, thanks to this patch, this optimization is able to look through
    vmov r0, r1, d16
but it does not yet understand
    vmov.32 d16[0], r0
    vmov.32 d16[1], r1
Coming patches will fix that and update the related test case.

<rdar://problem/12702965>

llvm-svn: 216136
2014-08-20[PeepholeOptimizer] Refactor the advanced copy optimization to take advantage ofQuentin Colombet1-169/+607
the isRegSequence property.

This is a follow-up of r215394 and r215404, which respectively introduce the isRegSequence property and use it for ARM.

Thanks to the property introduced by the previous commits, this patch is able to optimize the following sequence:
    vmov d0, r2, r3
    vmov d1, r0, r1
    vmov r0, s0
    vmov r1, s2
    udiv r0, r1, r0
    vmov r1, s1
    vmov r2, s3
    udiv r1, r2, r1
    vmov.32 d16[0], r0
    vmov.32 d16[1], r1
    vmov r0, r1, d16
    bx lr
into:
    udiv r0, r0, r2
    udiv r1, r1, r3
    vmov.32 d16[0], r0
    vmov.32 d16[1], r1
    vmov r0, r1, d16
    bx lr

This patch refactors how the copy optimizations are done in the peephole optimizer. Prior to this patch, we had one copy-related optimization that replaced a copy or bitcast by a generic, more suitable (in terms of register file), copy.

With this patch, the peephole optimizer features two copy-related optimizations:
1. One for rewriting generic copies to generic copies: PeepholeOptimizer::optimizeCoalescableCopy.
2. One for replacing non-generic copies with generic copies: PeepholeOptimizer::optimizeUncoalescableCopy.

The goals of these two optimizations are slightly different: one rewrites the operand of the instruction (#1), the other kills off the non-generic instruction and replaces it with a (sequence of) generic instruction(s).

Both optimizations rely on the ValueTracker introduced in r212100.

The ValueTracker has been refactored to use the information from the TargetInstrInfo for non-generic instructions. As part of the refactoring, we switched the tracking from the index of the definition to the actual register (virtual or physical). This one change is to provide better consistency with register-related APIs and to ease the use of the TargetInstrInfo.

Moreover, this patch introduces a new helper class CopyRewriter used to ease the rewriting of generic copies (i.e., #1).

Finally, this patch adds a dead code elimination pass right after the peephole optimizer to get rid of dead code that may appear after rewriting.
This is related to <rdar://problem/12702965>. Review: http://reviews.llvm.org/D4874 llvm-svn: 216088
2014-08-11PeepholeOptimizer: make parameter ref to SmallPtrSetImplHans Wennborg1-2/+2
This makes the function type independent of the in-line size of LocalMIs. llvm-svn: 215356
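Why a size-erased parameter type removes the dependency can be illustrated with a toy analogue. The classes below are stand-ins written for this sketch (`PtrSetImpl`, `PtrSet`, `recordInstr` are hypothetical names), not the real `llvm::SmallPtrSet` implementation; the only point carried over is that the inline size is a template parameter of the derived class, not of the base.

```cpp
#include <cstddef>
#include <set>

// Toy stand-in for a size-erased base class (hypothetical, not LLVM's).
template <typename T> class PtrSetImpl {
public:
  bool insert(T p) { return items.insert(p).second; }
  std::size_t size() const { return items.size(); }

private:
  std::set<T> items;
};

// The inline size N only appears in the derived type.
template <typename T, unsigned N> class PtrSet : public PtrSetImpl<T> {};

// Taking the base by reference accepts a PtrSet of any inline size,
// so changing N at the call site no longer touches this signature.
bool recordInstr(int *MI, PtrSetImpl<int *> &LocalMIs) {
  return LocalMIs.insert(MI);
}
```

Had `recordInstr` taken `PtrSet<int *, 8> &` instead, passing a `PtrSet<int *, 16>` would fail to compile, because the two instantiations are unrelated types.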
2014-08-11Re-commit "Increase the size of this SmallVector in PeepholeOptimizer." ↵Hans Wennborg1-3/+3
(r215340) This time, also update the function that receives a reference to the SmallPtrSet as a parameter. llvm-svn: 215342
2014-08-11Revert "Increase the size of this SmallVector in PeepholeOptimizer." (r215340)Hans Wennborg1-1/+1
That broke the build:

/data/buildslave/clang-amd64-freebsd/src-llvm/lib/CodeGen/PeepholeOptimizer.cpp:729:46: error: non-const lvalue reference to type 'SmallPtrSet<[...], 8>' cannot bind to a value of unrelated type 'SmallPtrSet<[...], 16>'
        Changed |= optimizeExtInstr(MI, MBB, LocalMIs);
                                             ^~~~~~~~
/data/buildslave/clang-amd64-freebsd/src-llvm/lib/CodeGen/PeepholeOptimizer.cpp:265:49: note: passing argument to parameter 'LocalMIs' here
                 SmallPtrSet<MachineInstr*, 8> &LocalMIs) {
                                                ^

llvm-svn: 215341
2014-08-11Increase the size of this SmallVector in PeepholeOptimizer.Hans Wennborg1-1/+1
During a Clang build, the median size of this was 9 llvm-svn: 215340
2014-08-04Remove the TargetMachine forwards for TargetSubtargetInfo basedEric Christopher1-5/+8
information and update all callers. No functional change. llvm-svn: 214781
2014-07-01[PeepholeOptimizer] Fix a typo in a comment.Quentin Colombet1-1/+1
Spotted by Amara Emerson. llvm-svn: 212106
2014-07-01[PeepholeOptimizer] Advanced rewriting of copies to avoid cross register banksQuentin Colombet1-13/+368
copies.

This patch extends the peephole optimization introduced in r190713 to produce register-coalescer friendly copies when possible. This extension taught the existing cross-bank copy optimization how to deal with the instructions that generate cross-bank copies, i.e., insert_subreg, extract_subreg, reg_sequence, and subreg_to_reg.

E.g.
    b = insert_subreg e, A, sub0 <-- cross-bank copy
    ...
    C = copy b.sub0 <-- cross-bank copy

Would produce the following code:
    b = insert_subreg e, A, sub0 <-- cross-bank copy
    ...
    C = copy A <-- same-bank copy

This patch also introduces a new helper class for that: ValueTracker. This class implements the logic to look through the copy-related instructions and get the related source.

For now, the advanced rewriting is disabled by default as we lack the semantics of target-specific instructions needed to catch the motivating examples.

Related to <rdar://problem/12702965>.

llvm-svn: 212100
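The look-through described in this message can be sketched on a toy IR. The code below is not the ValueTracker from the patch; `Def` and `trackLane` are hypothetical names, registers are plain strings, and only copy and insert_subreg are modelled.

```cpp
#include <map>
#include <string>

// Hypothetical toy IR: each register has at most one defining instruction.
struct Def {
  enum Kind { Copy, InsertSubreg } kind;
  std::string base;     // Copy: the source; InsertSubreg: the register updated
  std::string inserted; // InsertSubreg only: the value written into the lane
  int subIdx;           // InsertSubreg only: which lane is written
};

// Walk definitions backwards to find which register supplied lane `subIdx`
// of `reg`. Returns "" when the chain ends without an answer.
std::string trackLane(const std::map<std::string, Def> &defs, std::string reg,
                      int subIdx) {
  for (auto it = defs.find(reg); it != defs.end(); it = defs.find(reg)) {
    const Def &d = it->second;
    if (d.kind == Def::InsertSubreg && d.subIdx == subIdx)
      return d.inserted; // this instruction wrote the lane we are tracking
    reg = d.base;        // lane untouched here: keep looking through
  }
  return "";
}
```

With `b = insert_subreg e, A, sub0` recorded as `defs["b"] = {Def::InsertSubreg, "e", "A", 0}`, the query `trackLane(defs, "b", 0)` yields `A`, which is the source the rewritten `C = copy A` uses. The sketch assumes single definitions per register, so the walk cannot cycle.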
2014-04-22[Modules] Remove potential ODR violations by sinking the DEBUG_TYPEChandler Carruth1-1/+2
define below all header includes in the lib/CodeGen/... tree. While the current modules implementation doesn't check for this kind of ODR violation yet, it is likely to grow support for it in the future. It also removes one layer of macro pollution across all the included headers. Other sub-trees will follow. llvm-svn: 206837
2014-04-14[C++11] More 'nullptr' conversion. In some cases just using a boolean check ↵Craig Topper1-6/+6
instead of comparing to nullptr. llvm-svn: 206142
2014-04-03[CodeGen] Fix peephole optimizer bug introduced in r205481. Fixes PR19318.Lang Hames1-9/+11
I should have read that comment a little more carefully. ;) Regression test in the works, committing in the mean time to un-break people. llvm-svn: 205511
2014-04-02[CodeGen] Teach the peephole optimizer to remember (and exploit) all foldingLang Hames1-35/+44
opportunities in the current basic block, rather than just the last one seen. <rdar://problem/16478629> llvm-svn: 205481
2014-03-31Disable each MachineFunctionPass for 'optnone' functions, unless thatPaul Robinson1-0/+3
pass normally runs at optimization level None, or is part of the register allocation pipeline. llvm-svn: 205228
2014-03-17Switch a number of loops in lib/CodeGen over to range-based for-loops, now thatOwen Anderson1-13/+6
the MachineRegisterInfo iterators are compatible with it. llvm-svn: 204075
2014-03-13Phase 2 of the great MachineRegisterInfo cleanup. This time, we're changingOwen Anderson1-7/+7
operator* on the by-operand iterators to return a MachineOperand& rather than a MachineInstr&. At this point they almost behave like normal iterators! Again, this requires making some existing loops more verbose, but should pave the way for the big range-based for-loop cleanups in the future. llvm-svn: 203865
2014-03-13Fix for http://llvm.org/bugs/show_bug.cgi?id=18590Ekaterina Romanova1-3/+11
This patch fixes the bug in peephole optimization that folds a load which defines one vreg into the one and only use of that vreg. With debug info, a DBG_VALUE that referenced the vreg was considered to be a use, preventing the optimization. The fix is to ignore DBG_VALUEs during the optimization, and undef a DBG_VALUE that references a vreg that gets removed. Patch by Trevor Smigiel! llvm-svn: 203829
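The two halves of the fix (do not count debug uses, then undef them once the register is gone) can be sketched as follows. This is a toy model: `RegUse`, `hasSingleRealUse` and `foldLoadIfPossible` are hypothetical names, and a register's uses are reduced to a flat vector.

```cpp
#include <algorithm>
#include <vector>

// Hypothetical toy use record for one virtual register.
struct RegUse {
  bool isDebugValue; // true for a DBG_VALUE-style use
  bool isUndef;      // set once the referenced vreg has been removed
};

// Count only real uses, so debug info cannot change the decision.
bool hasSingleRealUse(const std::vector<RegUse> &uses) {
  return std::count_if(uses.begin(), uses.end(), [](const RegUse &u) {
           return !u.isDebugValue;
         }) == 1;
}

// If the fold is allowed, the vreg disappears, so any debug use that still
// refers to it is marked undef instead of being left dangling.
bool foldLoadIfPossible(std::vector<RegUse> &uses) {
  if (!hasSingleRealUse(uses))
    return false;
  for (RegUse &u : uses)
    if (u.isDebugValue)
      u.isUndef = true;
  return true;
}
```

The design point is that the fold decision is the same with and without debug info, which is what the bug report was about.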
2014-03-07[C++11] Add 'override' keyword to virtual methods that override their base ↵Craig Topper1-2/+2
class. llvm-svn: 203220
2014-03-07Replace PROLOG_LABEL with a new CFI_INSTRUCTION.Rafael Espindola1-1/+1
The old system was fairly convoluted:
* A temporary label was created.
* A single PROLOG_LABEL was created with it.
* A few MCCFIInstructions were created with the same label.

The semantics were that the cfi instructions were mapped to the PROLOG_LABEL via the temporary label. The output position was that of the PROLOG_LABEL. The temporary label itself was used only for doing the mapping.

The new CFI_INSTRUCTION has a 1:1 mapping to MCCFIInstructions and points to one by holding an index into the CFI instructions of this function.

I did consider removing MMI.getFrameInstructions completely and having CFI_INSTRUCTION own a MCCFIInstruction, but MCCFIInstructions have non-trivial constructors and destructors and are somewhat big, so this setup is probably better.

The net result is that we don't create temporary labels that are never used.

llvm-svn: 203204
2013-09-13[Peephole] Rewrite copies to avoid cross register banks copies.Quentin Colombet1-84/+166
By definition, copies across register banks are not coalescable. Still, it may be possible to get rid of such a copy when the value is available in another register of the same register file.

Consider the following example, where capital and lowercase letters denote different register files:
    b = copy A <-- cross-bank copy
    ...
    C = copy b <-- cross-bank copy

This could have been optimized this way:
    b = copy A <-- cross-bank copy
    ...
    C = copy A <-- same-bank copy

Note: b and C's definitions may be in different basic blocks.

This patch adds a peephole optimization that looks through a chain of copies leading to a cross-bank copy and reuses a source that is on the same register file if available.

This solution could also be used to get rid of some copies (e.g., A could have been used instead of C). However, we do not do so because:
- It may over-constrain the coloring of the source register for coalescing.
- The register allocator may not be able to find a nice split point for the longer live-range, leading to more spills.

<rdar://problem/14742333>

llvm-svn: 190713
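The copy-chain walk in this message can be sketched with a toy model that borrows the message's own convention: the case of a register's first letter encodes its bank. `sameBank` and `findSameBankSource` are hypothetical names, and the sketch assumes each register is defined by at most one copy, so the chain cannot loop.

```cpp
#include <cctype>
#include <map>
#include <string>

// Toy convention taken from the example: uppercase and lowercase names
// live in different register banks.
bool sameBank(const std::string &a, const std::string &b) {
  bool aUpper = std::isupper(static_cast<unsigned char>(a[0])) != 0;
  bool bUpper = std::isupper(static_cast<unsigned char>(b[0])) != 0;
  return aUpper == bUpper;
}

// `copies` maps a destination register to the source of its defining copy.
// Follow the chain from `dst` and return the first source that sits in the
// same bank as `dst`, or "" if the chain ends without one.
std::string findSameBankSource(const std::map<std::string, std::string> &copies,
                               const std::string &dst) {
  for (auto it = copies.find(dst); it != copies.end();
       it = copies.find(it->second)) {
    if (sameBank(it->second, dst))
      return it->second;
  }
  return "";
}
```

For the example above, `copies = {{"b", "A"}, {"C", "b"}}` and the query for `C` returns `A`, so `C = copy b` can become `C = copy A`. The query for `b` returns the empty string: no same-bank source exists, and that copy stays cross-bank.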
2012-12-17Add debug prints for when optimizeLoadInstr folds a load.Craig Topper1-0/+6
llvm-svn: 170298
2012-12-11Add comment for load foldingJoel Jones1-0/+5
llvm-svn: 169880
2012-12-03Use the new script to sort the includes of every file under lib.Chandler Carruth1-5/+5
Sooooo many of these had incorrect or strange main module includes. I have manually inspected all of these, and fixed the main module include to be the nearest plausible thing I could find. If you own or care about any of these source files, I encourage you to take some time and check that these edits were sensible. I can't have broken anything (I strictly added headers, and reordered them, never removed), but they may not be the headers you'd really like to identify as containing the API being implemented. Many forward declarations and missing includes were added to header files to allow them to parse cleanly when included first. The main module rule does in fact have its merits. =] llvm-svn: 169131
2012-10-15Make sure we iterate over newly created instructions. Fixes pr13625. Testcase toRafael Espindola1-0/+5
follow in one sec. llvm-svn: 165951
2012-08-17Use standard pattern for iterate+erase.Jakob Stoklund Olesen1-9/+2
Increment the MBB iterator at the top of the loop to properly handle the current (and previous) instructions getting erased. This fixes PR13625. llvm-svn: 162099
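The iterate-and-erase pattern named here can be shown on a `std::list`, which has the same property as an intrusive instruction list: erasing a node only invalidates iterators to that node. The function name `eraseNegatives` and the "negative means dead" rule are made up for the sketch.

```cpp
#include <list>

// Erase every negative element while walking the list once.
// Returns how many elements were erased.
int eraseNegatives(std::list<int> &insts) {
  int erased = 0;
  for (auto it = insts.begin(), end = insts.end(); it != end;) {
    auto cur = it++; // advance first: erasing *cur cannot invalidate `it`
    if (*cur < 0) {
      insts.erase(cur);
      ++erased;
    }
  }
  return erased;
}
```

Because `it` already points past `cur` when the body runs, the body is free to erase the current element (or ones before it) without skipping or revisiting anything.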
2012-08-16Add an MCID::Select flag and TII hooks for optimizing selects.Jakob Stoklund Olesen1-16/+27
Select instructions pick one of two virtual registers based on a condition, like x86 cmov. On targets like ARM that support predication, selects can sometimes be eliminated by predicating the instruction defining one of the operands. Teach PeepholeOptimizer to recognize select instructions, and ask the target to optimize them. llvm-svn: 162059
2012-08-02X86 Peephole: fold loads to the source register operand if possible.Manman Ren1-14/+15
Add more comments and use early returns to reduce nesting in isLoadFoldable. Also disable folding for V_SET0 to avoid introducing a const pool entry and a const pool load. rdar://10554090 and rdar://11873276 llvm-svn: 161207
2012-08-02X86 Peephole: fold loads to the source register operand if possible.Manman Ren1-0/+57
Machine CSE and other optimizations can remove instructions so folding is possible at peephole while not possible at ISel. This patch is a rework of r160919 and was tested on clang self-host on my local machine. rdar://10554090 and rdar://11873276 llvm-svn: 161152
2012-07-29Revert r160920 and r160919 due to dragonegg and clang selfhost failureManman Ren1-22/+0
llvm-svn: 160927
2012-07-28X86 Peephole: fold loads to the source register operand if possible.Manman Ren1-0/+22
Machine CSE and other optimizations can remove instructions so folding is possible at peephole while not possible at ISel. rdar://10554090 and rdar://11873276 llvm-svn: 160919
2012-06-29Add SrcReg2 to analyzeCompare and optimizeCompareInstr to handle CompareManman Ren1-4/+5
instructions with two register operands. llvm-svn: 159465
2012-06-19Implement PPCInstrInfo::isCoalescableExtInstr().Jakob Stoklund Olesen1-3/+19
The PPC::EXTSW instruction preserves the low 32 bits of its input, just like some of the x86 instructions. Use it to reduce register pressure when the low 32 bits have multiple uses. This requires a small change to PeepholeOptimizer since EXTSW takes a 64-bit input register. This is related to PR5997. llvm-svn: 158743
2012-06-19Style: Don't reuse variables for multiple purposes.Jakob Stoklund Olesen1-8/+7
No functional change. llvm-svn: 158742
2012-06-06Revert r157755.Manman Ren1-1/+0
The commit is intended to fix rdar://11540023. It is implemented as part of peephole optimization. We can actually implement this in the SelectionDAG lowering phase. llvm-svn: 158122
2012-05-31X86: replace SUB with CMP if possibleManman Ren1-0/+1
This patch will optimize the following
    movq %rdi, %rax
    subq %rsi, %rax
    cmovsq %rsi, %rdi
    movq %rdi, %rax
to
    cmpq %rsi, %rdi
    cmovsq %rsi, %rdi
    movq %rdi, %rax

Perform this optimization if the actual result of SUB is not used.

rdar: 11540023

llvm-svn: 157755
2012-05-20Constrain regclasses in PeepholeOptimizer.Jakob Stoklund Olesen1-1/+10
It can be necessary to restrict to a sub-class before accessing sub-registers. llvm-svn: 157164
2012-05-11ARM: peephole optimization to remove cmp instructionManman Ren1-0/+9
This patch will optimize the following cases:
    sub r1, r3 | sub r1, imm
    cmp r3, r1 or cmp r1, r3 | cmp r1, imm
    bge L1

TO
    subs r1, r3
    bge L1 or ble L1

If the branch instruction can use flag from "sub", then we can replace "sub" with "subs" and eliminate the "cmp" instruction.

rdar: 10734411

llvm-svn: 156599
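The "bge L1 or ble L1" alternative comes from the operand order of the removed cmp: if it compared the same two values as the sub but in the opposite order, the branch condition has to be swapped. A minimal sketch of that decision, with hypothetical names (`swappedCondition`, `foldCmpIntoSubs`) and only a few condition codes modelled:

```cpp
#include <string>

// Condition to test once the two compared operands are exchanged
// (a >= b is the same test as b <= a). Returns "" for codes this
// sketch does not model.
std::string swappedCondition(const std::string &cc) {
  if (cc == "ge") return "le";
  if (cc == "le") return "ge";
  if (cc == "gt") return "lt";
  if (cc == "lt") return "gt";
  if (cc == "eq" || cc == "ne") return cc; // symmetric
  return "";
}

// The sub computes subLhs - subRhs; the cmp compares cmpLhs with cmpRhs.
// Returns the branch condition to use after the cmp is removed and the
// sub becomes a flag-setting subs, or "" when the cmp cannot be folded.
std::string foldCmpIntoSubs(const std::string &subLhs, const std::string &subRhs,
                            const std::string &cmpLhs, const std::string &cmpRhs,
                            const std::string &cc) {
  if (cmpLhs == subLhs && cmpRhs == subRhs)
    return cc; // same order: flags already match
  if (cmpLhs == subRhs && cmpRhs == subLhs)
    return swappedCondition(cc); // reversed order: swap the condition
  return "";
}
```

So a cmp of `r3, r1` after a subtraction of `r1 - r3` turns `bge` into `ble`, while a cmp of `r1, r3` leaves `bge` unchanged. The sketch ignores the other requirement in the message, that nothing between the sub and the branch clobbers the flags.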
2012-05-10Revert: 156550 "ARM: peephole optimization to remove cmp instruction"Manman Ren1-9/+0
This commit broke an external linux bot and gave a compile-time warning. llvm-svn: 156556
2012-05-10ARM: peephole optimization to remove cmp instructionManman Ren1-0/+9
This patch will optimize the following cases:
    sub r1, r3 | sub r1, imm
    cmp r3, r1 or cmp r1, r3 | cmp r1, imm
    bge L1

TO
    subs r1, r3
    bge L1 or ble L1

If the branch instruction can use flag from "sub", then we can replace "sub" with "subs" and eliminate the "cmp" instruction.

rdar: 10734411

llvm-svn: 156550
2012-05-01Tidy up. Naming conventions.Jim Grosbach1-16/+16
llvm-svn: 155960
2012-02-25Make the peephole optimizer clear kill flags on a vreg if it's about to add newLang Hames1-0/+4
uses of the vreg, since the old kills may no longer be valid. This was causing -verify-machineinstrs to complain about uses after kills, and could potentially have been causing subtle register allocation issues, but I haven't come across a test case yet. llvm-svn: 151425
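A minimal sketch of the idea, assuming a toy operand list rather than LLVM's MachineOperand: before a new use of a register is appended, every existing kill marker on that register is dropped, since an earlier "last use" may no longer be last. `UseOperand` and `addUse` are hypothetical names.

```cpp
#include <vector>

// Hypothetical toy operand: one use of a virtual register.
struct UseOperand {
  int vreg;
  bool isKill; // true if this use is recorded as the last use of vreg
};

// Append a new use of `vreg`, first clearing stale kill markers on it.
void addUse(std::vector<UseOperand> &uses, int vreg) {
  for (UseOperand &u : uses)
    if (u.vreg == vreg)
      u.isKill = false; // conservative: no use claims to be the last one
  uses.push_back({vreg, false});
}
```

Clearing is the conservative choice: a missing kill flag only loses precision, whereas a stale one describes a use after the register's supposed death, which is what the verifier complained about.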
2012-02-25Fixed typo.Lang Hames1-1/+1
llvm-svn: 151417
2012-02-08Codegen pass definition cleanup. No functionality.Andrew Trick1-4/+1
Moving toward a uniform style of pass definition to allow easier target configuration.
Globally declare Pass ID.
Globally declare pass initializer.
Use INITIALIZE_PASS consistently.
Add a call to the initializer from CodeGen.cpp.
Remove redundant "createPass" functions and "getPassName" methods.

While cleaning up declarations, cleaned up comments (sorry for large diff).

llvm-svn: 150100
2012-02-08whitespaceAndrew Trick1-7/+7
llvm-svn: 150094