path: root/llvm/lib/CodeGen/CodeGenPrepare.cpp
Age | Commit message | Author | Files | Lines
2021-08-17 | [CodeGenPrepare] The instruction to be sunk should be inserted before its user in a block | Tiehu Zhang | 1 | -2/+12
In the current implementation, the instruction to be sunk is inserted before the target instruction without considering the def-use tree, which may cause an "Instruction does not dominate all uses" error. We need to choose a suitable insertion point according to the use chain. Reviewed By: dmgreen Differential Revision: https://reviews.llvm.org/D107262
2021-07-24 | [CGP] despeculateCountZeros - Don't create is-zero branch if cttz/ctlz source is known non-zero | Simon Pilgrim | 1 | -0/+4
If value tracking can confirm that the cttz/ctlz source is known non-zero then we don't need to create a branch (which DAG will struggle to recover from). Differential Revision: https://reviews.llvm.org/D106685
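The idea can be illustrated outside of LLVM with a minimal sketch. The function names below are hypothetical, and the loop-based count stands in for a hardware instruction whose result is undefined for a zero input; this is not the pass's actual code.

```cpp
#include <cassert>
#include <cstdint>

// Counts trailing zeros. Only valid for a non-zero input, mirroring a
// count-zeros operation whose result is undefined for zero.
unsigned cttzNonZero(uint32_t X) {
  assert(X != 0 && "caller must guarantee a non-zero input");
  unsigned N = 0;
  while ((X & 1u) == 0) {
    X >>= 1;
    ++N;
  }
  return N;
}

// Guarded form: an explicit is-zero branch protects the count.
unsigned cttzGuarded(uint32_t X) {
  if (X == 0)
    return 32; // bit width of the type
  return cttzNonZero(X);
}

// When the input is provably non-zero (here the low bit is forced to 1),
// the is-zero branch is dead and can be omitted.
unsigned cttzOfOddValue(uint32_t X) { return cttzNonZero(X | 1u); }
```

In the guarded form the branch is the cost the commit avoids whenever the source is known to be non-zero.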
2021-07-22 | [WebAssembly] Implementation of global.get/set for reftypes in LLVM IR | Paulo Matos | 1 | -0/+4
Reland of 31859f896. This change implements new DAG nodes GLOBAL_GET/GLOBAL_SET, and lowering methods for loads and stores of reference types from IR globals. Once the lowering creates the new nodes, tablegen pattern matches those and converts them to Wasm global.get/set. Reviewed By: tlively Differential Revision: https://reviews.llvm.org/D104797
2021-07-05 | [DebugInfo] CGP+HWasan: Handle dbg.values with duplicate location ops | Stephen Tozer | 1 | -1/+3
This patch fixes an issue which occurred in CodeGenPrepare and HWAddressSanitizer, which both at some point create a map of Old->New instructions and update dbg.value uses of these. They did this by iterating over the dbg.value's location operands, and if an instance of the old instruction was found, replaceVariableLocationOp would be called on that dbg.value. This would cause an error if the same operand appeared multiple times as a location operand, as the first call to replaceVariableLocationOp would update all uses of the old instruction, invalidating the old iterator and eventually hitting an assertion. This has been fixed by no longer iterating over the dbg.value's location operands directly, but by first collecting them into a set and then iterating over that, ensuring that we never attempt to replace a duplicated operand multiple times. Differential Revision: https://reviews.llvm.org/D105129
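The fix described above (collect the operands into a set, then iterate over the set) can be sketched in standalone C++. `DebugValue` and `replaceLocationOp` are hypothetical stand-ins for a dbg.value and `replaceVariableLocationOp`; operands are modelled as plain integers.

```cpp
#include <cassert>
#include <set>
#include <vector>

// Hypothetical stand-in for a dbg.value with a list of location operands.
struct DebugValue {
  std::vector<int> LocationOps;
  int ReplaceCalls = 0;

  // One call rewrites every occurrence of OldOp; calling it again for an
  // operand that is no longer present is treated as an error.
  void replaceLocationOp(int OldOp, int NewOp) {
    bool Found = false;
    for (int &Op : LocationOps) {
      if (Op == OldOp) {
        Op = NewOp;
        Found = true;
      }
    }
    assert(Found && "OldOp must still be a location operand");
    ++ReplaceCalls;
  }
};

// Collect the distinct operands first, then replace each matching operand
// exactly once, so a duplicated operand never triggers a second call.
void replaceAllOps(DebugValue &DV, int OldOp, int NewOp) {
  std::set<int> Unique(DV.LocationOps.begin(), DV.LocationOps.end());
  for (int Op : Unique)
    if (Op == OldOp)
      DV.replaceLocationOp(OldOp, NewOp);
}
```

Iterating over `LocationOps` directly would reach the second copy of the operand after the first call had already rewritten it, which is the situation the commit message describes.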
2021-07-02 | Revert "[WebAssembly] Implementation of global.get/set for reftypes in LLVM IR" | Roman Lebedev | 1 | -4/+0
This reverts commit 4facbf213c51e4add2e8c19b08d5e58ad71c72de.
```
********************
FAIL: LLVM :: CodeGen/WebAssembly/funcref-call.ll (44466 of 44468)
******************** TEST 'LLVM :: CodeGen/WebAssembly/funcref-call.ll' FAILED ********************
Script:
--
: 'RUN: at line 1'; /builddirs/llvm-project/build-Clang12/bin/llc < /repositories/llvm-project/llvm/test/CodeGen/WebAssembly/funcref-call.ll --mtriple=wasm32-unknown-unknown -asm-verbose=false -mattr=+reference-types | /builddirs/llvm-project/build-Clang12/bin/FileCheck /repositories/llvm-project/llvm/test/CodeGen/WebAssembly/funcref-call.ll
--
Exit Code: 2

Command Output (stderr):
--
llc: /repositories/llvm-project/llvm/include/llvm/Support/LowLevelTypeImpl.h:44: static llvm::LLT llvm::LLT::scalar(unsigned int): Assertion `SizeInBits > 0 && "invalid scalar size"' failed.
```
2021-07-02 | [WebAssembly] Implementation of global.get/set for reftypes in LLVM IR | Paulo Matos | 1 | -0/+4
Reland of 31859f896. This change implements new DAG nodes GLOBAL_GET/GLOBAL_SET, and lowering methods for loads and stores of reference types from IR globals. Once the lowering creates the new nodes, tablegen pattern matches those and converts them to Wasm global.get/set. Differential Revision: https://reviews.llvm.org/D104797
2021-06-23 | [CGP][RISCV] Teach CodeGenPrepare::optimizeSwitchInst to honor isSExtCheaperThanZExt. | Craig Topper | 1 | -6/+14
This optimization pre-promotes the input and constants for a switch instruction to a legal type so that all the generated compares share the same extend. Since RISCV prefers sext for i32 to i64 extends, we should honor that to use sext.w instead of a pair of shifts. Reviewed By: jrtc27 Differential Revision: https://reviews.llvm.org/D104612
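Why either extension is acceptable can be shown with a small standalone sketch: as long as the switch condition and every case constant are widened the same way, equality is preserved. `ExtKind`, `extend`, and `selectCase` are hypothetical names used only for this illustration.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

enum class ExtKind { ZExt, SExt };

// Widen a 32-bit value to 64 bits with the chosen extension.
uint64_t extend(uint32_t V, ExtKind Kind) {
  if (Kind == ExtKind::SExt)
    return static_cast<uint64_t>(
        static_cast<int64_t>(static_cast<int32_t>(V)));
  return static_cast<uint64_t>(V);
}

// Returns the index of the matching case, or -1 for the default
// destination. Condition and case constants share the same extension.
int selectCase(uint32_t Cond, const std::vector<uint32_t> &Cases,
               ExtKind Kind) {
  uint64_t WideCond = extend(Cond, Kind);
  for (int I = 0; I < static_cast<int>(Cases.size()); ++I)
    if (WideCond == extend(Cases[I], Kind))
      return I;
  return -1;
}
```

Because the result does not depend on `Kind`, a target is free to pick whichever extension is cheaper for it.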
2021-06-11 | [NFC][OpaquePtr] Explicitly pass GEP source type in optimizeGatherScatterInst() | Arthur Eubanks | 1 | -3/+17
Reviewed By: craig.topper Differential Revision: https://reviews.llvm.org/D103480
2021-06-10 | Revert "Implementation of global.get/set for reftypes in LLVM IR" | David Spickett | 1 | -4/+0
This reverts commit 31859f896cf90d64904134ce7b31230f374c3fcc. Causing SVE and RISCV-V test failures on bots.
2021-06-10 | Implementation of global.get/set for reftypes in LLVM IR | Paulo Matos | 1 | -0/+4
This change implements new DAG nodes GLOBAL_GET/GLOBAL_SET, and lowering methods for loads and stores of reference types from IR globals. Once the lowering creates the new nodes, tablegen pattern matches those and converts them to Wasm global.get/set. Reviewed By: tlively Differential Revision: https://reviews.llvm.org/D95425
2021-05-16 | [CPG][ARM] Optimize towards branch on zero in codegenprepare | David Green | 1 | -0/+63
This adds a simple fold into codegenprepare that converts comparison of branches towards comparison with zero if possible. For example:
  %c = icmp ult %x, 8
  br %c, bla, blb
  %tc = lshr %x, 3
becomes
  %tc = lshr %x, 3
  %c = icmp eq %tc, 0
  br %c, bla, blb
As a first order approximation, this can reduce the number of instructions needed to perform the branch as the shift is (often) needed anyway. At the moment this does not affect very much, as llvm tends to prefer the opposite form. But it can protect against regressions from commits like rG9423f78240a2. Simple cases of Add and Sub are added along with Shift, equally as the comparison to zero can often be folded with cpsr flags. Differential Revision: https://reviews.llvm.org/D101778
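The shift case of this fold relies on the identity that, for unsigned values, `x < 8` holds exactly when `x >> 3` is zero. A minimal sketch that checks the identity (function names are hypothetical):

```cpp
#include <cassert>
#include <cstdint>

// Original form: the branch condition compares x with a power of two.
bool condCompareWithConstant(uint32_t X) { return X < 8u; }

// Rewritten form: reuse the shifted value and compare it with zero.
bool condCompareWithZero(uint32_t X) {
  uint32_t Tc = X >> 3; // %tc = lshr %x, 3
  return Tc == 0;       // %c = icmp eq %tc, 0
}

// Returns true if both forms agree on every value in [0, Limit).
bool formsAgreeBelow(uint32_t Limit) {
  for (uint32_t X = 0; X < Limit; ++X)
    if (condCompareWithConstant(X) != condCompareWithZero(X))
      return false;
  return true;
}
```

The same reasoning generalises to any comparison `x <u 2^k` paired with a logical shift right by `k`.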
2021-04-23 | [TTI] NFC: Change getIntImmCost[Inst|Intrin] to return InstructionCost | Sander de Smalen | 1 | -3/+2
This patch migrates the TTI cost interfaces to return an InstructionCost. See this patch for the introduction of the type: https://reviews.llvm.org/D91174 See this thread for context: http://lists.llvm.org/pipermail/llvm-dev/2020-November/146408.html Differential Revision: https://reviews.llvm.org/D100565
2021-04-19 | [DebugInfo] Move the findDbg* functions into DebugInfo.cpp | OCHyams | 1 | -0/+1
Move the findDbg* functions into lib/IR/DebugInfo.cpp from lib/Transforms/Utils/Local.cpp. D99169 adds a call to a function (findDbgUsers) that lives in lib/Transforms/Utils/Local.cpp (LLVMTransformUtils) from lib/IR/Value.cpp (LLVMCore). The Core lib doesn't include TransformUtils. The buildbots caught this here: https://lab.llvm.org/buildbot/#/builders/109/builds/12664. This patch moves the function, and the 3 similar ones for consistency, into DebugInfo.cpp which is part of LLVMCore. Reviewed By: dblaikie, rnk Differential Revision: https://reviews.llvm.org/D100632
2021-04-07 | [CSSPGO] Move pseudo probes to the beginning of a block to unblock SelectionDAG combine. | Hongtao Yu | 1 | -0/+24
Pseudo probes, when scattered in a block, can be chained dependencies of other regular DAG nodes and block DAG combine optimizations. To fix this, scattered probes in a block are grouped and placed at the beginning of the block. This shouldn't affect the profile quality. Test Plan: Reviewed By: wenlei, wmi Differential Revision: https://reviews.llvm.org/D100002
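Grouping scattered probes at the start of a block while keeping every other instruction in its original relative order is a stable partition. A minimal standalone sketch, assuming a hypothetical `Inst` record and ignoring details such as PHI nodes that a real pass would have to respect:

```cpp
#include <algorithm>
#include <cassert>
#include <string>
#include <vector>

// Hypothetical instruction record: a name plus a pseudo-probe flag.
struct Inst {
  std::string Name;
  bool IsPseudoProbe;
};

// Moves all pseudo probes to the front of the block. std::stable_partition
// preserves the relative order within the probes and within the rest.
void groupProbesAtStart(std::vector<Inst> &Block) {
  std::stable_partition(Block.begin(), Block.end(),
                        [](const Inst &I) { return I.IsPseudoProbe; });
}
```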
2021-04-06 | Use AssumeInst in a few more places [nfc] | Philip Reames | 1 | -5/+3
Follow up to a6d2a8d6f5. These were found by simply grepping for "::assume", and are the subset of that result which looked cleaner to me using the isa/dyn_cast patterns.
2021-03-22 | [TargetTransformInfo] move branch probability query from TargetLoweringInfo | Sanjay Patel | 1 | -1/+1
This is no-functional-change intended (NFC), but needed to allow optimizer passes to use the API. See D98898 for a proposed usage by SimplifyCFG. I'm simplifying the code by removing the cl::opt. That was added back with the original commit in D19488, but I don't see any evidence in regression tests that it was used. Target-specific overrides can use the usual patterns to adjust as necessary. We could also restore that cl::opt, but it was not clear to me exactly how to do it in the convoluted TTI class structure.
2021-03-17 | Reapply "[DebugInfo] Handle multiple variable location operands in IR" | Stephen Tozer | 1 | -34/+63
Fixed section of code that iterated through a SmallDenseMap and added instructions in each iteration, causing non-deterministic code; replaced SmallDenseMap with MapVector to prevent non-determinism. This reverts commit 01ac6d1587e8613ba4278786e8341f8b492ac941.
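The remedy described above swaps a hash map for a container that iterates in insertion order. The standard-library sketch below shows the general shape of such a container (a vector for ordering plus a hash map used only as an index); `OrderedMap` is a hypothetical illustration, not LLVM's `MapVector` implementation.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <unordered_map>
#include <utility>
#include <vector>

// Minimal insertion-ordered map: the vector holds entries in insertion
// order, and the hash map only stores each key's index into that vector.
class OrderedMap {
  std::vector<std::pair<std::string, int>> Entries;
  std::unordered_map<std::string, std::size_t> Index;

public:
  // Returns a reference to the value for Key, inserting 0 if it is new.
  int &operator[](const std::string &Key) {
    auto It = Index.find(Key);
    if (It == Index.end()) {
      It = Index.emplace(Key, Entries.size()).first;
      Entries.emplace_back(Key, 0);
    }
    return Entries[It->second].second;
  }

  // Iteration walks the vector, so the order never depends on hashing.
  std::vector<std::string> keysInOrder() const {
    std::vector<std::string> Keys;
    for (const auto &E : Entries)
      Keys.push_back(E.first);
    return Keys;
  }
};
```

Any code that emits instructions while walking such a container produces them in the same order on every run.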
2021-03-17 | Revert "[DebugInfo] Handle multiple variable location operands in IR" | Hans Wennborg | 1 | -63/+34
This caused non-deterministic compiler output; see comment on the code review.
> This patch updates the various IR passes to correctly handle dbg.values with a DIArgList location. This patch does not actually allow DIArgLists to be produced by salvageDebugInfo, and it does not affect any pass after codegen-prepare. Other than that, it should cover every IR pass.
>
> Most of the changes simply extend code that operated on a single debug value to operate on the list of debug values in the style of any_of, all_of, for_each, etc. Instances of setOperand(0, ...) have been replaced with replaceVariableLocationOp, which takes the value that is being replaced as an additional argument. In places where this value isn't readily available, we have to track the old value through to the point where it gets replaced.
>
> Differential Revision: https://reviews.llvm.org/D88232
This reverts commit df69c69427dea7f5b3b3a4d4564bc77b0926ec88.
2021-03-13 | Restore fixed version of "[CodeGenPrepare] Fix isIVIncrement (PR49466)" | Philip Reames | 1 | -1/+2
Change was reverted in commit 8d20f2c2c66eb486ff23cc3d55a53bd840b36971 because it was causing an infinite loop. 9228f2f32 fixed the root issue in the code structure, this change just reapplies the original change w/adaptation to the new code structure.
2021-03-13 | [CGP] Consolidate logic for getIVIncrement and isIVIncrement | Philip Reames | 1 | -16/+38
This fixes the bug demonstrated by the test case in the commit message of 8d20f2c2 (which was a revert of cf82700). The root issue was that we have two transforms which are inverses of each other. We use one for simple induction variables (where we can use the post-inc form), and the other for everything else. The problem was that the two transforms could disagree about whether something was an induction variable. The reverted commit made a change to one of the matcher routines which was used for one of the two transforms without updating the other matcher. However, it's worth noting the existing code w/o the reverted change also has cases where the decision could differ between the two paths. The fix is simply to consolidate the code such that two paths must agree by construction, and to add an assert to catch any potential future re-divergence. Triggering the infinite loop requires side stepping the SunkAddrs cache. The SunkAddrs cache has the effect of suppressing the iteration in the common case, but there are codepaths through CGP which restart iteration and clear this cache. Unfortunately, I have not been able to construct a standalone IR test case for this. The original test case is a c++ program which when compiled by clang demonstrates the infinite loop, but all of my attempts at extracting an IR test case runnable through opt/llc have failed to reproduce. (Including capturing the IR at point of the transform itself!) I have no idea what weird state clang is creating here. I also tried creating a test case by hand, but gave up after about an hour of trying to find the right combination to dance through multiple transforms to create the end result needed to trip the bug.
2021-03-12 | Revert "[CodeGenPrepare] Fix isIVIncrement (PR49466)" | Jordan Rupprecht | 1 | -2/+1
This reverts commit cf82700af8c658ae09b14c3d01bb1e73e48d3bd3 due to a compile timeout when building the following with `clang -O2`:
```
template <class, class = int> class a;
struct b {
  using d = int *;
};
struct e {
  using f = b::d;
};
class g {
public:
  e::f h;
  e::f i;
};
template <class, class> class a : g {
public:
  long j() const { return i - h; }
  long operator[](long) const noexcept;
};
template <class c, class k>
long a<c, k>::operator[](long l) const noexcept {
  return h[l];
}
template <typename m, typename n> int fn1(m, n, const char *);
int o, p;
class D {
  void q(const a<long> &);
  long r;
};
void D::q(const a<long> &l) {
  int s;
  if (l[0])
    for (; l.j(); ++s) {
      if (l[s])
        while (fn1(o, 0, ""))
          ;
      r = l[s] / p;
    }
}
```
2021-03-12 | [OpaquePtrs] Remove some uses of type-less CreateGEP() (NFC) | Nikita Popov | 1 | -2/+5
This removes some (but not all) uses of type-less CreateGEP() and CreateInBoundsGEP() APIs, which are incompatible with opaque pointers. There are still a number of tricky uses left, as well as many more variation APIs for CreateGEP.
2021-03-09 | [cgp] improve robustness of uadd/usub transforms | Philip Reames | 1 | -10/+11
LSR prefers to schedule iv increments just before the latch. The recent 80511565 broadened this to moving increments in the original IR. This pointed out a robustness problem with the CGP transform. When we have a use of an induction increment outside of the loop (we canonicalize away from this form, but it happens, e.g. in unanalyzable loops) we'd avoid performing the uadd/usub transform. Interestingly, all of these involve moving the increment closer to its operands, so there's no concern about dominating all uses. We can handle that case cheaply, resulting in a more robust transform.
2021-03-09 | [cgp] group related code together [nfc] | Philip Reames | 1 | -3/+4
2021-03-09 | [DebugInfo] Handle multiple variable location operands in IR | gbtozers | 1 | -34/+63
This patch updates the various IR passes to correctly handle dbg.values with a DIArgList location. This patch does not actually allow DIArgLists to be produced by salvageDebugInfo, and it does not affect any pass after codegen-prepare. Other than that, it should cover every IR pass. Most of the changes simply extend code that operated on a single debug value to operate on the list of debug values in the style of any_of, all_of, for_each, etc. Instances of setOperand(0, ...) have been replaced with replaceVariableLocationOp, which takes the value that is being replaced as an additional argument. In places where this value isn't readily available, we have to track the old value through to the point where it gets replaced. Differential Revision: https://reviews.llvm.org/D88232
2021-03-09 | [CodeGenPrepare] Fix isIVIncrement (PR49466) | Ta-Wei Tu | 1 | -1/+2
In the NFC commit 8d835f42a57f15c0b9053bd7c41ea95821a40e5f, the check for `!L` is moved to a separate function `getIVIncrement` which, instead of using `BO->getParent()`, uses `PN->getParent()`. However, these two basic blocks are not necessarily the same. https://bugs.llvm.org/show_bug.cgi?id=49466 demonstrates a case where `PN` is contained in a loop while `BO` is not, causing the null-pointer dereference in `L->getLoopLatch()`. This patch checks whether both `BO` and `PN` belong to the same loop before entering `getIVIncrement`. Reviewed By: mkazantsev Differential Revision: https://reviews.llvm.org/D98144
2021-03-08 | [DebugInfo] Support DIArgList in DbgVariableIntrinsic | gbtozers | 1 | -8/+4
This patch updates DbgVariableIntrinsics to support use of a DIArgList for the location operand, resulting in a significant change to its interface. This patch does not update all IR passes to support multiple location operands in a dbg.value; the only change is to update the DbgVariableIntrinsic interface and its uses. All code outside of the intrinsic classes assumes that an intrinsic will always have exactly one location operand; they will still support DIArgLists, but only if they contain exactly one Value. Among other changes, the setOperand and setArgOperand functions in DbgVariableIntrinsic have been made private. This is to prevent code from setting the operands of these intrinsics directly, which could easily result in incorrect/invalid operands being set. This does not prevent these functions from being called on a debug intrinsic at all, as they can still be called on any CallInst pointer; it is assumed that any code directly setting the operands on a generic call instruction is doing so safely. The intention for making these functions private is to prevent DIArgLists from being overwritten by code that's naively trying to replace one of the Values it points to, and also to fail fast if a DbgVariableIntrinsic is updated to use a DIArgList without a valid corresponding DIExpression.
2021-03-04 | [cgp] Defer lazy domtree usage to last possible point | Philip Reames | 1 | -2/+5
This is a compile time optimization for d9e93e8e5. Not sure this matters or not, but why not do it just in case. This does involve querying TLI with a potentially invalid addressing mode for the using instruction, but since we don't actually pass the using instruction to the TLI callback, that should be fine.
2021-03-04 | [CGP] Lazily compute domtree only when needed during address matching | Philip Reames | 1 | -12/+20
This is a compile time optimization for d9e93e8e5. As pointed out in post-commit review on the original review (D96399), there was a moderately large compile time regression with this patch, and the eager computation of domtree on matcher construction is the first obvious candidate for why.
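The general pattern of replacing eager computation with computation on first use can be sketched as follows. `DomTreeLike` and `LazyAnalysis` are hypothetical names for this illustration and do not reflect how the pass itself stores or requests its dominator tree.

```cpp
#include <cassert>
#include <functional>
#include <optional>
#include <utility>

// Hypothetical stand-in for an expensive analysis result.
struct DomTreeLike {
  int NumNodes;
};

// Wraps a builder callback and runs it only on the first request.
class LazyAnalysis {
  std::function<DomTreeLike()> Build;
  std::optional<DomTreeLike> Cached;

public:
  int BuildCount = 0; // how many times the expensive builder actually ran

  explicit LazyAnalysis(std::function<DomTreeLike()> B)
      : Build(std::move(B)) {}

  // Computes the result on the first call and reuses it afterwards.
  const DomTreeLike &get() {
    if (!Cached) {
      Cached = Build();
      ++BuildCount;
    }
    return *Cached;
  }
};
```

A caller that never reaches the code path needing the analysis never pays for it, which is the compile-time saving the commit is after.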
2021-03-04 | [CodeGenPrepare] Eliminate llvm.assume before removing empty blocks | Jann Horn | 1 | -12/+30
CodeGenPrepare currently first removes empty blocks, then in a loop performs other optimizations. One of those optimizations is the removal of call instructions that invoke @llvm.assume, which can create new empty blocks. This means that when a branch only contains a call to __builtin_assume(), the empty branch will survive into MIR, and will then only be half-removed by MIR-level optimizations (e.g. removing the branch but leaving the condition intact). Fix it by eliminating @llvm.assume builtin calls before removing empty blocks. Reviewed By: bkramer Differential Revision: https://reviews.llvm.org/D97848
2021-03-04 | [X86][CodeGenPrepare] Try to reuse IV's incremented value instead of adding the offset, part 2 | Max Kazantsev | 1 | -2/+4
This patch enables the case where we do not completely eliminate the offset. Supposedly in this case we reduce live range overlap, which never harms, but since there are doubts that this is true, this goes in as a separate change. Differential Revision: https://reviews.llvm.org/D96399 Reviewed By: reames
2021-03-04 | [X86][CodeGenPrepare] Try to reuse IV's incremented value instead of adding the offset, part 1 | Max Kazantsev | 1 | -20/+80
While optimizing the memory instruction, we sometimes need to add an offset to the value of `IV`. We could avoid doing so if `IV.next` is already defined at the point of interest. In this case, we may get two possible advantages:
- If the `IV` step happens to match the offset, we don't need to add the offset at all;
- We reduce the overlap of the live ranges of `IV` and `IV.next`. They may stop overlapping, which leads to better register allocation. Even if the overlap is preserved, we are not introducing a new overlap, so it should be a neutral transform (disabled in this patch; it will come with a follow-up).
Currently I've only added support for IVs that get decremented using the `usub` intrinsic. We could also support `AddInstr`, however there is some weird interaction with some other transform that may lead to infinite compilation in this case (it seems the same transform is done and undone over and over). I need to investigate why it happens, but generally we could do that too. The first part only handles the case where this reuse fully eliminates the offset. Differential Revision: https://reviews.llvm.org/D96399 Reviewed By: reames
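The arithmetic behind the reuse is simple: if `IV.next = IV + Step`, then `IV + Offset` equals `IV.next + (Offset - Step)`, and the remaining offset is zero when the step matches the offset. A minimal sketch with hypothetical names:

```cpp
#include <cassert>
#include <cstdint>

// An address expressed as a base value plus a constant offset.
struct BasePlusOffset {
  int64_t Base;
  int64_t Offset;
};

// Rewrites IV + Offset in terms of the already incremented value
// IVNext = IV + Step, leaving Offset - Step as the remaining offset.
BasePlusOffset reuseIncrementedIV(int64_t IV, int64_t Step, int64_t Offset) {
  int64_t IVNext = IV + Step;
  return {IVNext, Offset - Step};
}
```

Part 1 of the change corresponds to the case where the remaining offset is zero; part 2 covers a non-zero remainder.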
2021-03-01 | [NFC] Detect IV increment expressed as uadd_with_overflow and usub_with_overflow | Max Kazantsev | 1 | -0/+6
Current callers do not call it with such argument, so this is NFC. But for further changes, it can be very useful to detect such cases.
2021-03-01 | [NFC] Introduce function getIVStep for further reuse | Max Kazantsev | 1 | -10/+23
2021-03-01 | [NFC] Whitespace fix | Max Kazantsev | 1 | -1/+1
2021-03-01 | [NFC] Factor out IV detector function for further reuse | Max Kazantsev | 1 | -13/+20
2021-02-26 | [cgp] Minor code improvement - reuse an existing named helper [NFC] | Philip Reames | 1 | -3/+3
2021-02-24 | [InstructionCost] NFC: Fix up missing cases in LoopVectorize and CodeGenPrep. | Sander de Smalen | 1 | -2/+2
This fixes the types of a few more cost variables to be of type InstructionCost.
2021-02-11 | [CodeGen] Use range-based for loops (NFC) | Kazu Hirata | 1 | -27/+19
2021-02-11 | Return "[Codegenprepare][X86] Use usub with overflow opt for IV increment" | Max Kazantsev | 1 | -1/+36
The patch did not account for one corner case where cmp does not dominate the loop latch. This patch adds this check, hopefully it's cheap because the CFG does not change during the transform, so DT queries should be executed quickly. If you see compile time slowness from this, please revert. Differential Revision: https://reviews.llvm.org/D96119
2021-02-11 | Revert "[Codegenprepare][X86] Use usub with overflow opt for IV increment" | Max Kazantsev | 1 | -31/+2
This reverts commit 3d15b7e7dfc3e2cefc47791d1e8d95909e937842. We've found an internal failure, need to analyze.
2021-02-11 | [Codegenprepare][X86] Use usub with overflow opt for IV increment | Max Kazantsev | 1 | -2/+31
Function `replaceMathCmpWithIntrinsic` artificially limits the scope of the optimization by requiring the two instructions to be in the same block, for two reasons:
- usage of DT for a more general check is costly in terms of compile time;
- risk of creating a new value that lives through multiple blocks.
Because of this, two semantically equivalent tests may or may not be subject to this opt depending on where the binary operation is located. See `test/CodeGen/X86/usub_inc_iv.ll` for motivation. There is one important particular case where this limitation is too strict: when the binary operation is the increment of the induction variable. As a result, the application of this opt becomes fragile and highly reliant on where other passes decide to place the IV increment. In most cases, they place it at the end of the latch block, killing the opt opportunity (when in fact it does not matter where to insert the actual instruction). This patch handles this particular case separately.
- The detector does not use the dom tree and has constant cost;
- The value of IV or IV.next lives through the whole loop in any case, so this should not create a new unexpected long-living value.
As a result, the transform becomes more robust. It also seems to lead to better code generation in some cases (see `test/CodeGen/X86/lsr-loop-exit-cond.ll`). Differential Revision: https://reviews.llvm.org/D96119 Reviewed By: spatel, reames
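The math-plus-compare pattern this optimization targets can be illustrated with plain unsigned arithmetic. In the sketch below, `usubWithOverflow` is a hypothetical model of an unsigned subtract-with-overflow operation, and the loop shape (decrement by one, exit when the IV is zero) is an assumed example rather than a quote from the referenced tests.

```cpp
#include <cassert>
#include <cstdint>

struct SubResult {
  uint32_t Value; // wrapped difference
  bool Overflow;  // true when the subtraction borrowed, i.e. A < B
};

// Models an unsigned subtract that also reports overflow.
SubResult usubWithOverflow(uint32_t A, uint32_t B) { return {A - B, A < B}; }

// Before: a separate decrement and compare on the induction variable.
bool exitTestSeparate(uint32_t IV, uint32_t &IVNext) {
  IVNext = IV - 1;
  return IV == 0;
}

// After: a single operation yields both the decrement and the condition,
// because IV == 0 is the same as IV <u 1.
bool exitTestCombined(uint32_t IV, uint32_t &IVNext) {
  SubResult R = usubWithOverflow(IV, 1);
  IVNext = R.Value;
  return R.Overflow;
}
```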
2021-02-05 | [NFC] inline variable | Guillaume Chatelet | 1 | -2/+1
2021-02-04 | [TargetLowering] Use Align in allowsMisalignedMemoryAccesses. | Craig Topper | 1 | -2/+2
Reviewed By: arsenm Differential Revision: https://reviews.llvm.org/D96097
2021-02-01 | [CodeGenPrepare] Also skip lifetime.end intrinsic when check return block in dupRetToEnableTailCallOpts. | Jun Ma | 1 | -15/+19
Differential Revision: https://reviews.llvm.org/D95424
2021-01-21 | [CodeGen] Use llvm::append_range (NFC) | Kazu Hirata | 1 | -4/+2
2021-01-18 | [llvm] Use the default value of drop_begin (NFC) | Kazu Hirata | 1 | -2/+2
2021-01-14 | [llvm] Use llvm::drop_begin (NFC) | Kazu Hirata | 1 | -2/+2
2021-01-10 | [CodeGen, DebugInfo] Use llvm::find_if (NFC) | Kazu Hirata | 1 | -2/+2
2021-01-10 | [CodeGen] Update transformations to use poison for shufflevector/insertelem's initial vector elem | Juneyoung Lee | 1 | -3/+2
This patch is a part of D93817 and makes transformations in CodeGen use poison for shufflevector/insertelem's initial vector element. The change in CodeGenPrepare.cpp is fine because the mask of shufflevector should always be zero. It doesn't touch the second element (which is poison). The change in InterleavedAccessPass.cpp is also fine because the mask is of the form <a, a+m, a+2m, .., a+km> where a+km is smaller than the size of the first vector operand. This is guaranteed by the caller of replaceBinOpShuffles, which is lowerInterleavedLoad. It calls isDeInterleaveMask and isDeInterleaveMaskOfFactor to check that the mask has the desired form. isDeInterleaveMask has the check that a+km is smaller than the vector size. To check my understanding, I added an assertion and a test to show that this optimization doesn't fire in such a case. Reviewed By: spatel Differential Revision: https://reviews.llvm.org/D94056
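The argument for the CodeGenPrepare.cpp change (an all-zero shuffle mask only ever reads lane 0, which has been written) can be modelled in a few lines. In this sketch poison is represented by an empty `std::optional`, and `shuffle` and `splat` are hypothetical helpers, not LLVM APIs.

```cpp
#include <cassert>
#include <cstddef>
#include <optional>
#include <vector>

// A vector lane; an empty optional models a poison lane.
using Lane = std::optional<int>;

// Picks lanes from the concatenation of A and B according to Mask.
std::vector<Lane> shuffle(const std::vector<Lane> &A,
                          const std::vector<Lane> &B,
                          const std::vector<std::size_t> &Mask) {
  std::vector<Lane> Out;
  for (std::size_t Idx : Mask) {
    assert(Idx < A.size() + B.size() && "mask index out of range");
    Out.push_back(Idx < A.size() ? A[Idx] : B[Idx - A.size()]);
  }
  return Out;
}

// Builds a splat of X: insert X into lane 0 of an all-poison vector, then
// shuffle with an all-zero mask so every output lane reads lane 0.
std::vector<Lane> splat(int X, std::size_t N) {
  assert(N > 0 && "splat needs at least one lane");
  std::vector<Lane> Init(N); // default-constructed lanes are poison
  Init[0] = X;               // models insertelement at index 0
  std::vector<Lane> Second(N); // second operand, all poison
  return shuffle(Init, Second, std::vector<std::size_t>(N, 0));
}
```

No output lane is poison, because the mask never selects any lane other than the one that was explicitly written.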