path: root/llvm/lib/CodeGen/CodeGenPrepare.cpp
Age | Commit message | Author | Files | Lines
2021-03-17 | Reapply "[DebugInfo] Handle multiple variable location operands in IR" | Stephen Tozer | 1 | -34/+63
Fixed a section of code that iterated through a SmallDenseMap and added instructions in each iteration, causing non-deterministic output; replaced the SmallDenseMap with a MapVector to prevent the non-determinism. This reverts commit 01ac6d1587e8613ba4278786e8341f8b492ac941.
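As a rough illustration of why the container swap fixes the ordering issue (a sketch with invented names, not code from the patch): MapVector iterates in insertion order, while the DenseMap family iterates in a hash-dependent order.
```
// Sketch only: visit replacements in a deterministic (insertion) order.
#include "llvm/ADT/MapVector.h"
#include "llvm/ADT/SmallVector.h"

void visitInInsertionOrder(const llvm::MapVector<int, int> &Replacements,
                           llvm::SmallVectorImpl<int> &Out) {
  // MapVector preserves insertion order, so anything emitted per entry
  // appears in the same order on every run; a SmallDenseMap would iterate
  // in an unspecified, allocation-dependent order.
  for (const auto &KV : Replacements)
    Out.push_back(KV.first);
}
```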
2021-03-17 | Revert "[DebugInfo] Handle multiple variable location operands in IR" | Hans Wennborg | 1 | -63/+34
This caused non-deterministic compiler output; see the comment on the code review.
> This patch updates the various IR passes to correctly handle dbg.values with a
> DIArgList location. This patch does not actually allow DIArgLists to be produced
> by salvageDebugInfo, and it does not affect any pass after codegen-prepare.
> Other than that, it should cover every IR pass.
>
> Most of the changes simply extend code that operated on a single debug value to
> operate on the list of debug values in the style of any_of, all_of, for_each,
> etc. Instances of setOperand(0, ...) have been replaced with
> replaceVariableLocationOp, which takes the value that is being replaced as an
> additional argument. In places where this value isn't readily available, we have
> to track the old value through to the point where it gets replaced.
>
> Differential Revision: https://reviews.llvm.org/D88232
This reverts commit df69c69427dea7f5b3b3a4d4564bc77b0926ec88.
2021-03-13Restore fixed version of "[CodeGenPrepare] Fix isIVIncrement (PR49466)"Philip Reames1-1/+2
Change was reverted in commit 8d20f2c2c66eb486ff23cc3d55a53bd840b36971 because it was causing an infinite loop. 9228f2f32 fixed the root issue in the code structure, this change just reapplies the original change w/adaptation to the new code structure.
2021-03-13 | [CGP] Consolidate logic for getIVIncrement and isIVIncrement | Philip Reames | 1 | -16/+38
This fixes the bug demonstrated by the test case in the commit message of 8d20f2c2 (which was a revert of cf82700). The root issue was that we have two transforms which are inverses of each other. We use one for simple induction variables (where we can use the post-inc form), and the other for everything else. The problem was that the two transforms could disagree about whether something was an induction variable. The reverted commit made a change to one of the matcher routines used for one of the two transforms without updating the other matcher. However, it's worth noting that the existing code without the reverted change also has cases where the decision could differ between the two paths. The fix is simply to consolidate the code such that the two paths must agree by construction, and to add an assert to catch any potential future re-divergence. Triggering the infinite loop requires side-stepping the SunkAddrs cache. The SunkAddrs cache has the effect of suppressing the iteration in the common case, but there are codepaths through CGP which restart iteration and clear this cache. Unfortunately, I have not been able to construct a standalone IR test case for this. The original test case is a C++ program which, when compiled by clang, demonstrates the infinite loop, but all of my attempts at extracting an IR test case runnable through opt/llc have failed to reproduce it. (Including capturing the IR at the point of the transform itself!) I have no idea what weird state clang is creating here. I also tried creating a test case by hand, but gave up after about an hour of trying to find the right combination to dance through multiple transforms to create the end result needed to trip the bug.
2021-03-12 | Revert "[CodeGenPrepare] Fix isIVIncrement (PR49466)" | Jordan Rupprecht | 1 | -2/+1
This reverts commit cf82700af8c658ae09b14c3d01bb1e73e48d3bd3 due to a compile timeout when building the following with `clang -O2`:
```
template <class, class = int> class a;
struct b {
  using d = int *;
};
struct e {
  using f = b::d;
};
class g {
public:
  e::f h;
  e::f i;
};
template <class, class> class a : g {
public:
  long j() const { return i - h; }
  long operator[](long) const noexcept;
};
template <class c, class k> long a<c, k>::operator[](long l) const noexcept {
  return h[l];
}
template <typename m, typename n> int fn1(m, n, const char *);
int o, p;
class D {
  void q(const a<long> &);
  long r;
};
void D::q(const a<long> &l) {
  int s;
  if (l[0])
    for (; l.j(); ++s) {
      if (l[s])
        while (fn1(o, 0, ""))
          ;
      r = l[s] / p;
    }
}
```
2021-03-12 | [OpaquePtrs] Remove some uses of type-less CreateGEP() (NFC) | Nikita Popov | 1 | -2/+5
This removes some (but not all) uses of the type-less CreateGEP() and CreateInBoundsGEP() APIs, which are incompatible with opaque pointers. There are still a number of tricky uses left, as well as many more variation APIs for CreateGEP.
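For context, a minimal sketch of the API difference this series moves toward (illustrative function, not code from the patch):
```
// With opaque pointers the element type cannot be deduced from the pointer
// operand, so the explicit-type CreateGEP overloads must be used.
#include "llvm/IR/IRBuilder.h"

llvm::Value *emitGEP(llvm::IRBuilder<> &B, llvm::Type *ElemTy,
                     llvm::Value *Ptr, llvm::Value *Idx) {
  // Type-less form (incompatible with opaque pointers):
  //   B.CreateGEP(Ptr, Idx);
  // Explicit-type form:
  return B.CreateGEP(ElemTy, Ptr, Idx);
}
```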
2021-03-09 | [cgp] improve robustness of uadd/usub transforms | Philip Reames | 1 | -10/+11
LSR prefers to schedule iv increments just before the latch. The recent 80511565 broadened this to moving increments in the original IR. This pointed out a robustness problem with the CGP transform. When we have a use of an induction increment outside of the loop (we canonicalize away from this form, but it happens e.g. with unanalyzable loops), we'd avoid performing the uadd/usub transform. Interestingly, all of these involve moving the increment closer to its operands, so there's no concern about dominating all uses. We can handle that case cheaply, resulting in a more robust transform.
2021-03-09 | [cgp] group related code together [nfc] | Philip Reames | 1 | -3/+4
2021-03-09 | [DebugInfo] Handle multiple variable location operands in IR | gbtozers | 1 | -34/+63
This patch updates the various IR passes to correctly handle dbg.values with a DIArgList location. This patch does not actually allow DIArgLists to be produced by salvageDebugInfo, and it does not affect any pass after codegen-prepare. Other than that, it should cover every IR pass. Most of the changes simply extend code that operated on a single debug value to operate on the list of debug values in the style of any_of, all_of, for_each, etc. Instances of setOperand(0, ...) have been replaced with replaceVariableLocationOp, which takes the value that is being replaced as an additional argument. In places where this value isn't readily available, we have to track the old value through to the point where it gets replaced. Differential Revision: https://reviews.llvm.org/D88232
2021-03-09 | [CodeGenPrepare] Fix isIVIncrement (PR49466) | Ta-Wei Tu | 1 | -1/+2
In the NFC commit 8d835f42a57f15c0b9053bd7c41ea95821a40e5f, the check for `!L` is moved to a separate function `getIVIncrement` which, instead of using `BO->getParent()`, uses `PN->getParent()`. However, these two basic blocks are not necessarily the same. https://bugs.llvm.org/show_bug.cgi?id=49466 demonstrates a case where `PN` is contained in a loop while `BO` is not, causing the null-pointer dereference in `L->getLoopLatch()`. This patch checks whether both `BO` and `PN` belong to the same loop before entering `getIVIncrement`. Reviewed By: mkazantsev Differential Revision: https://reviews.llvm.org/D98144
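A hypothetical sketch of the kind of same-loop guard described above (invented helper, not the actual patch):
```
#include "llvm/Analysis/LoopInfo.h"
#include "llvm/IR/Instructions.h"

// Only treat BO as an IV increment when it lives in the same loop as the
// phi; otherwise L->getLoopLatch() could be called on a null loop.
bool inSameLoop(const llvm::BinaryOperator *BO, const llvm::PHINode *PN,
                const llvm::LoopInfo &LI) {
  const llvm::Loop *L = LI.getLoopFor(BO->getParent());
  return L && L == LI.getLoopFor(PN->getParent());
}
```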
2021-03-08 | [DebugInfo] Support DIArgList in DbgVariableIntrinsic | gbtozers | 1 | -8/+4
This patch updates DbgVariableIntrinsics to support use of a DIArgList for the location operand, resulting in a significant change to its interface. This patch does not update all IR passes to support multiple location operands in a dbg.value; the only change is to update the DbgVariableIntrinsic interface and its uses. All code outside of the intrinsic classes assumes that an intrinsic will always have exactly one location operand; they will still support DIArgLists, but only if they contain exactly one Value. Among other changes, the setOperand and setArgOperand functions in DbgVariableIntrinsic have been made private. This is to prevent code from setting the operands of these intrinsics directly, which could easily result in incorrect/invalid operands being set. This does not prevent these functions from being called on a debug intrinsic at all, as they can still be called on any CallInst pointer; it is assumed that any code directly setting the operands on a generic call instruction is doing so safely. The intention for making these functions private is to prevent DIArgLists from being overwritten by code that's naively trying to replace one of the Values it points to, and also to fail fast if a DbgVariableIntrinsic is updated to use a DIArgList without a valid corresponding DIExpression.
2021-03-04 | [cgp] Defer lazy domtree usage to last possible point | Philip Reames | 1 | -2/+5
This is a compile time optimization for d9e93e8e5. Not sure whether this matters or not, but why not do it just in case. This does involve querying TLI with a potentially invalid addressing mode for the using instruction, but since we don't actually pass the using instruction to the TLI callback, that should be fine.
2021-03-04 | [CGP] Lazily compute domtree only when needed during address matching | Philip Reames | 1 | -12/+20
This is a compile time optimization for d9e93e8e5. As pointed out in post-commit review on the original review (D96399), there was a moderately large compile time regression with this patch, and the eager computation of the domtree on matcher construction is the first obvious candidate for why.
2021-03-04 | [CodeGenPrepare] Eliminate llvm.expect before removing empty blocks | Jann Horn | 1 | -12/+30
CodeGenPrepare currently first removes empty blocks, then in a loop performs other optimizations. One of those optimizations is the removal of call instructions that invoke @llvm.assume, which can create new empty blocks. This means that when a branch only contains a call to __builtin_assume(), the empty branch will survive into MIR, and will then only be half-removed by MIR-level optimizations (e.g. removing the branch but leaving the condition intact). Fix it by eliminating @llvm.expect builtin calls before removing empty blocks. Reviewed By: bkramer Differential Revision: https://reviews.llvm.org/D97848
2021-03-04 | [X86][CodeGenPrepare] Try to reuse IV's incremented value instead of adding the offset, part 2 | Max Kazantsev | 1 | -2/+4
This patch enables the case where we do not completely eliminate the offset. Supposedly in this case we reduce live range overlap, which never harms, but since there are doubts this is true, this goes as a separate change. Differential Revision: https://reviews.llvm.org/D96399 Reviewed By: reames
2021-03-04 | [X86][CodeGenPrepare] Try to reuse IV's incremented value instead of adding the offset, part 1 | Max Kazantsev | 1 | -20/+80
While optimizing the memory instruction, we sometimes need to add an offset to the value of `IV`. We could avoid doing so if `IV.next` is already defined at the point of interest. In this case, we may get two possible advantages from this:
- If the `IV` step happens to match the offset, we don't need to add the offset at all;
- We reduce overlap of the live ranges of `IV` and `IV.next`. They may stop overlapping, which will lead to better register allocation. Even if the overlap persists, we are not introducing a new overlap, so it should be a neutral transform (disabled in this patch, will come with a follow-up).
Currently I've only added support for IVs that get decremented using the `usub` intrinsic. We could also support `AddInstr`, however there is some weird interaction with some other transform that may lead to infinite compilation in this case (it seems like the same transform is done and undone over and over). I need to investigate why it happens, but generally we could do that too. The first part only handles the case where this reuse fully eliminates the offset. Differential Revision: https://reviews.llvm.org/D96399 Reviewed By: reames
2021-03-01 | [NFC] Detect IV increment expressed as uadd_with_overflow and usub_with_overflow | Max Kazantsev | 1 | -0/+6
Current callers do not call it with such an argument, so this is NFC. But for further changes, it can be very useful to detect such cases.
2021-03-01 | [NFC] Introduce function getIVStep for further reuse | Max Kazantsev | 1 | -10/+23
2021-03-01 | [NFC] Whitespace fix | Max Kazantsev | 1 | -1/+1
2021-03-01 | [NFC] Factor out IV detector function for further reuse | Max Kazantsev | 1 | -13/+20
2021-02-26 | [cgp] Minor code improvement - reuse an existing named helper [NFC] | Philip Reames | 1 | -3/+3
2021-02-24 | [InstructionCost] NFC: Fix up missing cases in LoopVectorize and CodeGenPrep. | Sander de Smalen | 1 | -2/+2
This fixes the types of a few more cost variables to be of type InstructionCost.
2021-02-11 | [CodeGen] Use range-based for loops (NFC) | Kazu Hirata | 1 | -27/+19
2021-02-11 | Return "[Codegenprepare][X86] Use usub with overflow opt for IV increment" | Max Kazantsev | 1 | -1/+36
The patch did not account for one corner case where cmp does not dominate the loop latch. This patch adds this check; hopefully it's cheap because the CFG does not change during the transform, so DT queries should be executed quickly. If you see compile time slowness from this, please revert. Differential Revision: https://reviews.llvm.org/D96119
2021-02-11 | Revert "[Codegenprepare][X86] Use usub with overflow opt for IV increment" | Max Kazantsev | 1 | -31/+2
This reverts commit 3d15b7e7dfc3e2cefc47791d1e8d95909e937842. We've found an internal failure, need to analyze.
2021-02-11 | [Codegenprepare][X86] Use usub with overflow opt for IV increment | Max Kazantsev | 1 | -2/+31
Function `replaceMathCmpWithIntrinsic` artificially limits the scope of the optimization, setting a requirement that the two instructions be in the same block, due to two reasons:
- usage of DT for a more general check is costly in terms of compile time;
- risk of creating a new value that lives through multiple blocks.
Because of this, two semantically equivalent tests may or may not be subject to this opt depending on where the binary operation is located. See `test/CodeGen/X86/usub_inc_iv.ll` for motivation. There is one important particular case where this limitation is too strict: when the binary operation is the increment of the induction variable. As a result, the application of this opt becomes fragile and highly reliant on where other passes decide to place the IV increment. In most cases, they place it at the end of the latch block, killing the opt opportunity (when in fact it does not matter where to insert the actual instruction). This patch handles this particular case separately.
- The detector does not use the dom tree and has constant cost;
- The value of IV or IV.next lives through the whole loop in any case, so this should not create a new unexpected long-living value.
As a result, the transform becomes more robust. It also seems to lead to better code generation in some cases (see `test/CodeGen/X86/lsr-loop-exit-cond.ll`). Differential Revision: https://reviews.llvm.org/D96119 Reviewed By: spatel, reames
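For illustration, a sketch of the kind of rewrite replaceMathCmpWithIntrinsic performs (hypothetical helper; the real pass matches and rewrites instructions in place):
```
#include "llvm/IR/IRBuilder.h"
#include "llvm/IR/Intrinsics.h"

// Fold a sub feeding an unsigned-overflow compare into a single
// usub.with.overflow call whose two results replace the original values.
void rewriteSubCmp(llvm::IRBuilder<> &B, llvm::Value *LHS, llvm::Value *RHS,
                   llvm::Instruction *Sub, llvm::Instruction *Cmp) {
  llvm::Value *Call =
      B.CreateBinaryIntrinsic(llvm::Intrinsic::usub_with_overflow, LHS, RHS);
  Sub->replaceAllUsesWith(B.CreateExtractValue(Call, 0)); // the difference
  Cmp->replaceAllUsesWith(B.CreateExtractValue(Call, 1)); // the overflow bit
}
```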
2021-02-05 | [NFC] inline variable | Guillaume Chatelet | 1 | -2/+1
2021-02-04 | [TargetLowering] Use Align in allowsMisalignedMemoryAccesses. | Craig Topper | 1 | -2/+2
Reviewed By: arsenm Differential Revision: https://reviews.llvm.org/D96097
2021-02-01 | [CodeGenPrepare] Also skip lifetime.end intrinsic when check return block in dupRetToEnableTailCallOpts. | Jun Ma | 1 | -15/+19
Differential Revision: https://reviews.llvm.org/D95424
2021-01-21 | [CodeGen] Use llvm::append_range (NFC) | Kazu Hirata | 1 | -4/+2
2021-01-18 | [llvm] Use the default value of drop_begin (NFC) | Kazu Hirata | 1 | -2/+2
2021-01-14 | [llvm] Use llvm::drop_begin (NFC) | Kazu Hirata | 1 | -2/+2
2021-01-10 | [CodeGen, DebugInfo] Use llvm::find_if (NFC) | Kazu Hirata | 1 | -2/+2
2021-01-10 | [CodeGen] Update transformations to use poison for shufflevector/insertelem's initial vector elem | Juneyoung Lee | 1 | -3/+2
This patch is a part of D93817 and makes transformations in CodeGen use poison for shufflevector/insertelem's initial vector element. The change in CodeGenPrepare.cpp is fine because the mask of the shufflevector should always be zero. It doesn't touch the second element (which is poison). The change in InterleavedAccessPass.cpp is also fine because the mask is of the form <a, a+m, a+2m, .., a+km> where a+km is smaller than the size of the first vector operand. This is guaranteed by the caller of replaceBinOpShuffles, which is lowerInterleavedLoad. It calls isDeInterleaveMask and isDeInterleaveMaskOfFactor to check that the mask is of the desirable form. isDeInterleaveMask has the check that a+km is smaller than the vector size. To check my understanding, I added an assertion & added a test to show that this optimization doesn't fire in such a case. Reviewed By: spatel Differential Revision: https://reviews.llvm.org/D94056
2021-01-06 | [llvm] Use llvm::all_of (NFC) | Kazu Hirata | 1 | -2/+1
2021-01-03 | [llvm] Call *(Set|Map)::erase directly (NFC) | Kazu Hirata | 1 | -3/+1
We can erase an item in a set or map without checking its membership first.
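A minimal example of the pattern being cleaned up (illustrative container and values):
```
#include "llvm/ADT/DenseSet.h"

void removeIfPresent(llvm::DenseSet<int> &S, int V) {
  // Before: if (S.count(V)) S.erase(V);
  // After: erase() is already a no-op for absent keys and reports whether
  // anything was removed, so the membership check is redundant.
  S.erase(V);
}
```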
2021-01-01 | [CodeGen] recognize select form of and/ors when splitting branch conditions | Juneyoung Lee | 1 | -8/+14
Recently a few patches have been made to move towards using select i1 instead of and/or i1 to represent "a && b"/"a || b" in C/C++. "a && b" in C/C++ does not evaluate b if a is false, whereas 'and a, b' in IR evaluates b and uses its result regardless of the result of a. This is problematic because it can cause miscompilation if b was an erroneous operation (https://llvm.org/pr48353). In C/C++, the result is simply false because b is not evaluated, but in IR the result is poison. The discussion at D93065 has more context about this. This patch makes two branch-splitting optimizations (one in SelectionDAGBuilder, one in CodeGenPrepare) recognize the select form of and/or as well, using m_LogicalAnd/Or. Since it is CodeGen, I think this is semantically ok (at least as safe as what codegen already did). Reviewed By: nikic Differential Revision: https://reviews.llvm.org/D93853
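A sketch of the matcher usage the commit refers to (illustrative function; not taken from the patch):
```
#include "llvm/IR/PatternMatch.h"

// Returns true for both `and i1 %a, %b` and the short-circuit select form
// `select i1 %a, i1 %b, i1 false`, i.e. either IR spelling of "a && b".
bool isLogicalAndCond(llvm::Value *Cond) {
  using namespace llvm::PatternMatch;
  llvm::Value *A, *B;
  return match(Cond, m_LogicalAnd(m_Value(A), m_Value(B)));
}
```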
2020-12-31 | [CodeGen] Construct SmallVector with iterator ranges (NFC) | Kazu Hirata | 1 | -3/+3
2020-12-30 | Use unary CreateShuffleVector if possible | Juneyoung Lee | 1 | -2/+1
As mentioned in D93793, there are quite a few places where unary `IRBuilder::CreateShuffleVector(X, Mask)` can be used instead of `IRBuilder::CreateShuffleVector(X, Undef, Mask)`. Let's update them. Actually, it would have been more natural if the patches were made in this order: (1) let them use unary CreateShuffleVector first (2) update IRBuilder::CreateShuffleVector to use poison as a placeholder value (D93793) The order is swapped, but in terms of correctness it is still fine. Reviewed By: spatel Differential Revision: https://reviews.llvm.org/D93923
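A sketch of the call-site simplification (illustrative broadcast mask, not a specific call site from the patch):
```
#include "llvm/IR/IRBuilder.h"

llvm::Value *broadcastLane0(llvm::IRBuilder<> &B, llvm::Value *Vec,
                            unsigned NumElts) {
  llvm::SmallVector<int, 16> Mask(NumElts, 0); // splat element 0
  // Before: B.CreateShuffleVector(Vec, UndefValue::get(Vec->getType()), Mask);
  // After: the unary overload supplies the placeholder second operand itself.
  return B.CreateShuffleVector(Vec, Mask);
}
```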
2020-12-28 | [CodeGen] Use llvm::append_range (NFC) | Kazu Hirata | 1 | -1/+1
2020-12-17 | [IR][PGO] Add hot func attribute and use hot/cold attribute in func section | Rong Xu | 1 | -2/+10
Clang FE currently has hot/cold function attributes, but we only have the cold function attribute in LLVM IR. This patch adds support for a hot function attribute to LLVM IR. This attribute will be used in setting the function section prefix/suffix. Currently the .hot and .unlikely suffixes are only added in PGO (Sample PGO) compilation (through isFunctionHotInCallGraph and isFunctionColdInCallGraph). This patch changes the behavior. The new behavior is: (1) If the user annotates a function as hot or isFunctionHotInCallGraph is true, this function will be marked as hot. Otherwise, (2) If the user annotates a function as cold or isFunctionColdInCallGraph is true, this function will be marked as cold. The changes are: (1) the user-annotated function attribute will be used in setting the function section prefix/suffix; (2) the hot attribute overrides profile-count-based hotness; (3) profile-count-based hotness overrides the user-annotated cold attribute. The intention of these changes is to provide the user a way to mark certain functions as hot in cases where it is hard for the training input to cover all the hot functions. Differential Revision: https://reviews.llvm.org/D92493
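A hypothetical snippet showing the IR-level attribute being set and queried (assumes the Attribute::Hot enumerator this change introduces):
```
#include "llvm/IR/Function.h"

void markHot(llvm::Function &F) {
  F.addFnAttr(llvm::Attribute::Hot); // user/front-end annotation
  // Section-prefix assignment can later combine this with profile hotness.
  if (F.hasFnAttribute(llvm::Attribute::Hot)) {
    // e.g. choose the ".hot" text section prefix
  }
}
```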
2020-12-15 | [CodeGenPrepare] Update optimizeGatherScatterInst for scalable vectors. | Paul Walker | 1 | -9/+5
optimizeGatherScatterInst does nothing specific to fixed length vectors but uses FixedVectorType to extract the number of elements. This patch simply updates the code to use VectorType and getElementCount instead. For testing I just copied Transforms/CodeGenPrepare/X86/gather-scatter-opt.ll replacing `<4 x ` with `<vscale x 4`. Differential Revision: https://reviews.llvm.org/D92572
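A sketch of the generalization (illustrative helper, not from the patch):
```
#include "llvm/IR/DerivedTypes.h"

llvm::ElementCount numElements(llvm::Type *Ty) {
  // Before (fixed-width vectors only):
  //   cast<FixedVectorType>(Ty)->getNumElements();
  // After (works for both fixed and scalable vectors):
  return llvm::cast<llvm::VectorType>(Ty)->getElementCount();
}
```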
2020-12-08 | [CodeGen] Add text section prefix for COFF object file | Pan, Tao | 1 | -3/+3
The text section prefix is created in CodeGenPrepare, which is a file-format-independent implementation; the text section name is written into the object file in TargetLoweringObjectFile, which is file-format-dependent. This patch ports the code that adds the text section prefix to the text section name from ELF to COFF. Unlike ELF, which uses '.' as the concatenation character, COFF uses '$'. That is, the concatenation character varies by format, so it is split out from the text section prefix. The text section prefix is an existing ELF feature: it helps reduce icache and itlb misses, and it also makes it possible to aggregate sections with the same prefix created by other compilers, e.g. v8. Furthermore, the recent Machine Function Splitter feature (basic-block-level text prefix sections) is based on the text section prefix. Reviewed By: pengfei, rnk Differential Revision: https://reviews.llvm.org/D92073
2020-11-24 | Clear NewGEPBases after finish using them in CodeGenPrep pass | Yichao Yu | 1 | -0/+1
AFAICT all other sets/maps are correctly cleared in `runOnFunction`. With assertions enabled this causes a crash when the module is freed, and potentially if a later pass deletes the instruction (not observed in the real world though). Without assertions this can potentially cause confusing results when running on a new Function/Module. Reviewed By: loladiro Differential Revision: https://reviews.llvm.org/D84031
2020-11-22 | [CodeGen] Use pred_empty (NFC) | Kazu Hirata | 1 | -3/+3
2020-11-20 | [CSSPGO] IR intrinsic for pseudo-probe block instrumentation | Hongtao Yu | 1 | -16/+9
This change introduces a new IR intrinsic named `llvm.pseudoprobe` for pseudo-probe block instrumentation. Please refer to https://reviews.llvm.org/D86193 for the whole story. A pseudo probe is used to collect the execution count of the block where the probe is instrumented. This requires a pseudo probe to be persistent. The LLVM PGO instrumentation also instruments in similar places by placing a counter in the form of atomic read/write operations or runtime helper calls. While these operations are very persistent or optimization-resilient, in theory we can borrow the atomic read/write implementation from PGO counters and cut it off at the end of compilation with all the atomics converted into binary data. This was our initial design and we’ve seen promising sample correlation quality with it. However, the atomics approach has a couple of issues:
1. IR optimizations are blocked unexpectedly. Those atomic instructions are not going to be physically present in the binary code, but since they are on the IR till the very end of compilation, they can still prevent certain IR optimizations and result in lower code quality.
2. The counter atomics may not be fully cleaned up from the code stream eventually.
3. Extra work is needed for re-targeting.
We choose to implement pseudo probes based on a special LLVM intrinsic, which is expected to have most of the semantics that come with an atomic operation but does not block desired optimizations as much as possible. More specifically, the semantics associated with the new intrinsic enforce that a pseudo probe is virtually executed exactly the same number of times before and after an IR optimization. The intrinsic also comes with certain flags that are carefully chosen so that the places they are probing are not going to be messed up by the optimizer while most of the IR optimizations still work. The core flag given to the special intrinsic is `IntrInaccessibleMemOnly`, which means the intrinsic accesses memory and does have a side effect so that it is not removable, but it does not access memory locations that are accessible by any original instructions. This way the intrinsic does not alias with any original instruction and thus it does not block optimizations as much as an atomic operation does. We also assign a function GUID and a block index to an intrinsic so that they are uniquely identified and not merged, in order to achieve good correlation quality. Let's now look at an example. Given the following LLVM IR:
```
define internal void @foo2(i32 %x, void (i32)* %f) !dbg !4 {
bb0:
  %cmp = icmp eq i32 %x, 0
  br i1 %cmp, label %bb1, label %bb2
bb1:
  br label %bb3
bb2:
  br label %bb3
bb3:
  ret void
}
```
The instrumented IR will look like below. Note that each `llvm.pseudoprobe` intrinsic call represents a pseudo probe at a block, of which the first parameter is the GUID of the probe’s owner function and the second parameter is the probe’s ID.
```
define internal void @foo2(i32 %x, void (i32)* %f) !dbg !4 {
bb0:
  %cmp = icmp eq i32 %x, 0
  call void @llvm.pseudoprobe(i64 837061429793323041, i64 1)
  br i1 %cmp, label %bb1, label %bb2
bb1:
  call void @llvm.pseudoprobe(i64 837061429793323041, i64 2)
  br label %bb3
bb2:
  call void @llvm.pseudoprobe(i64 837061429793323041, i64 3)
  br label %bb3
bb3:
  call void @llvm.pseudoprobe(i64 837061429793323041, i64 4)
  ret void
}
```
Reviewed By: wmi Differential Revision: https://reviews.llvm.org/D86490
2020-10-27 | [Utils] Skip RemoveRedundantDbgInstrs in MergeBlockIntoPredecessor (PR47746) | Vedant Kumar | 1 | -0/+9
This patch changes MergeBlockIntoPredecessor to skip the call to RemoveRedundantDbgInstrs, in effect partially reverting D71480 due to some compile-time issues spotted in LoopUnroll and SimplifyCFG. The call to RemoveRedundantDbgInstrs appears to have changed the worst-case behavior of the merging utility. Loosely speaking, it seems to have gone from O(#phis) to O(#insts). It might not be possible to mitigate this by scanning a block to determine whether there are any debug intrinsics to remove, since such a scan costs O(#insts). So: skip the call to RemoveRedundantDbgInstrs. There's surprisingly little fallout from this, and most of it can be addressed by doing RemoveRedundantDbgInstrs later. The exception is (the block-local version of) SimplifyCFG, where it might just be too expensive to call RemoveRedundantDbgInstrs. Differential Revision: https://reviews.llvm.org/D88928
2020-09-14 | [CGP] Limit converting phi types to simple loads and stores | David Green | 1 | -1/+3
Instcombine limits converting phi types to simple loads and stores. This does the same in codegenprepare, not processing phis that are not simple. Note that for volatile loads/stores, ISel will happily convert between float and int. Atomics are more likely to always be integer. This just keeps things simple and doesn't process either. Differential Revision: https://reviews.llvm.org/D83770
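An illustrative guard in the spirit of this change (invented helper name, not the patch's code):
```
#include "llvm/IR/Instructions.h"

// Only loads/stores that are neither volatile nor atomic are considered
// when rewriting phi types.
bool isSimpleMemoryUse(const llvm::Value *V) {
  if (auto *LI = llvm::dyn_cast<llvm::LoadInst>(V))
    return LI->isSimple();
  if (auto *SI = llvm::dyn_cast<llvm::StoreInst>(V))
    return SI->isSimple();
  return false;
}
```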
2020-09-14 | [CodeGenPrepare] Fix zapping dead operands of assume | Yevgeny Rouban | 1 | -3/+5
This patch fixes a problem with commit 52cc97a0. A test case is created to demonstrate the crash caused by the instruction iterator being invalidated by the recursive removal of dead operands of assume. The solution restarts from the block's first instruction in case CurInstIterator is invalidated by RecursivelyDeleteTriviallyDeadInstructions(). Reviewed By: bkramer Differential Revision: https://reviews.llvm.org/D87434
2020-09-13 | [CGP] Prevent optimizePhiType from iterating forever | David Green | 1 | -6/+22
The recently added optimizePhiType algorithm had no checks to make sure it didn't continually iterate back and forth between float and int types. This means that given an input like store(phi(bitcast(load))), we could convert that back and forth to store(bitcast(phi(load))). This particular case would usually have been simplified to a different load type (folding the bitcast into the load) before CGP, but other cases can occur. The one that came up was phi(bitcast(phi)), where the two phis of different types were bitcast between. That was not helped by a dead bitcast being kept around, which could make conversion look profitable. This adds an extra check of the bitcast Uses or Defs, to make sure that at least one is grounded and will not end up being converted back. It also makes sure that dead bitcasts are removed, and there is a minor change to include newly created Phi nodes in the Visited set so that they do not need to be revisited. Differential Revision: https://reviews.llvm.org/D82676