path: root/llvm/lib/CodeGen/CodeGenPrepare.cpp
Age | Commit message | Author | Files | Lines
2024-01-04 | [IR] Fix GEP offset computations for vector GEPs (#75448) | Jannik Silvanus | 1 | -1/+1
Vectors are always bit-packed and don't respect the elements' alignment requirements. This is different from arrays. This means offsets of vector GEPs need to be computed differently than offsets of array GEPs. This PR fixes many places that rely on an incorrect pattern that always uses `DL.getTypeAllocSize(GTI.getIndexedType())`. We replace these with uses of `GTI.getSequentialElementStride(DL)`, which is a new helper function added in this PR. This changes behavior for GEPs into vectors with element types for which the (bit) size and alloc size are different. This includes two cases:
* Types with a bit size that is not a multiple of a byte, e.g. i1. GEPs into such vectors are questionable to begin with, as some elements are not even addressable.
* Overaligned types, e.g. i16 with 32-bit alignment.
Existing tests are unaffected, but a miscompilation of a new test is fixed.
Co-authored-by: Nikita Popov <github@npopov.com>
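To make the difference concrete, here is a minimal standalone sketch of the two offset computations. It does not use LLVM's API: `ElemType`, `storeSize`, `allocSize`, `arrayOffset` and `vectorOffset` are hypothetical names that only model the stride rule described in this commit message for byte-sized element types.

```cpp
#include <cstdint>

// Hypothetical model of an element type: size in bits and ABI alignment in
// bytes. This is an illustration only, not an LLVM type.
struct ElemType {
  std::uint64_t bits;
  std::uint64_t abiAlignBytes;
};

// Bytes needed to store one value, ignoring alignment padding.
constexpr std::uint64_t storeSize(ElemType t) { return (t.bits + 7) / 8; }

// Store size rounded up to the alignment: the stride between array elements.
constexpr std::uint64_t allocSize(ElemType t) {
  return (storeSize(t) + t.abiAlignBytes - 1) / t.abiAlignBytes *
         t.abiAlignBytes;
}

// Arrays pad every element to its alloc size.
constexpr std::uint64_t arrayOffset(ElemType t, std::uint64_t index) {
  return index * allocSize(t);
}

// Vectors are packed, so the stride is the unpadded store size.
constexpr std::uint64_t vectorOffset(ElemType t, std::uint64_t index) {
  return index * storeSize(t);
}
```

For an overaligned i16 (16 bits, 4-byte alignment) the two strides differ (4 vs. 2 bytes); for a naturally aligned i32 they agree, which is consistent with existing tests being unaffected.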
2023-12-15 | [llvm] Remove no-op ptr-to-ptr casts (NFC) | Youngsuk Kim | 1 | -9/+1
Remove calls to CreatePointerCast which are just doing no-op ptr-to-ptr bitcasts. Opaque ptr cleanup effort (NFC).
2023-12-15 | [CodeGenPrepare] Remove unused TypePromotionTransaction::moveBefore to fix gcc Wunused-function warning. NFC. | Simon Pilgrim | 1 | -11/+0
2023-12-13 | [DebugInfo][RemoveDIs] Switch some insertion routines to use iterators (#75330) | Jeremy Morse | 1 | -10/+14
As part of RemoveDIs, we need instruction insertion to be done with iterators rather than instruction pointers, so that we can communicate some debug-info facts about the position. This patch is an entirely mechanical replacement of Instruction * with BasicBlock::iterator, plus using insertBefore to insert some instructions because we don't have iterator-taking constructors yet. Sadly it's not NFC because it causes dbg.value intrinsics / their DPValue equivalents to shift location.
2023-12-06 | [DebugInfo][RemoveDIs] Maintain DPValues on skipped instrs in CGP (#74602) | Jeremy Morse | 1 | -2/+14
It turns out that CodeGenPrepare will skip over consecutive select instructions as it knows it can optimise them all at the same time. This is unfortunate for the RemoveDIs project to remove intrinsic-based debug-info, because that means debug-info attached to those skipped instructions doesn't get seen by optimizeInst and so updated. Add code to handle debug-info on those skipped instructions manually. This code will also have been slower when it had dbg.values stuffed in between instructions, but with RemoveDIs it'll go faster because the dbg.values won't break up the select sequence.
2023-12-05 | [CGP][AArch64] Rebase the common base offset for better ISel | zhongyunde 00443407 | 1 | -29/+50
When all the large constant offsets share the same value in bits 12 to 23, fold
  add x8, x0, #2031, lsl #12
  add x8, x8, #960
  ldr x9, [x8, x8]
  ldr x8, [x8, #2056]
into
  add x8, x0, #2031, lsl #12
  ldr x9, [x8, #960]
  ldr x8, [x8, #3016]
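The arithmetic behind this fold can be sketched as follows. This is a standalone illustration, not the CodeGenPrepare implementation: `SplitOffset` and `splitOffset` are hypothetical names, and the sketch assumes that the two loads in the example access offsets `(2031 << 12) + 960` and `(2031 << 12) + 960 + 2056` from `x0`.

```cpp
#include <cstdint>

// Hypothetical helper type: an offset split into a shared high part
// (bits 12 to 23) and a per-access low part (bits 0 to 11).
struct SplitOffset {
  std::uint64_t base;
  std::uint64_t low;
};

constexpr SplitOffset splitOffset(std::uint64_t offset) {
  return {offset & 0xFFF000u, offset & 0xFFFu};
}

// The two absolute offsets assumed for the example above.
constexpr std::uint64_t kFirst = (2031u << 12) + 960;
constexpr std::uint64_t kSecond = (2031u << 12) + 960 + 2056;
```

Both offsets yield the same `base`, so a single `add ... lsl #12` can be shared, and the remaining low parts (960 and 3016) become the immediates of the two loads.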
2023-11-30 | [DebugInfo][RemoveDIs] Support maintaining DPValues in CodeGenPrepare (#73660) | Jeremy Morse | 1 | -70/+151
CodeGenPrepare needs to support the maintenance of DPValues, the non-instruction replacement for dbg.value intrinsics. This means there are a few functions we need to duplicate or replicate the functionality of:
* fixupDbgValue for setting users of sunk addr GEPs,
* The remains of placeDbgValues needs a DPValue implementation for sinking
* Rollback of RAUWs needs to update DPValues
* Rollback of instruction removal needs supporting (see github #73350)
* A few places where we have to use iterators rather than instructions.
There are three places where we have to use the setHeadBit call on iterators to indicate which portion of debug-info records we're about to splice around. This is because CodeGenPrepare, unlike other optimisation passes, is very much concerned with which block an operation occurs in and where in the block instructions are because it's preparing things to be in a format that's good for SelectionDAG. There isn't a large amount of test coverage for debuginfo behaviours in this pass, hence I've added some more.
2023-11-14 | [CGP] Drop nneg flag when moving zext past instruction (#72103) | Nikita Popov | 1 | -26/+12
Fix the issue by not reusing the zext at all. The code already handles creation of new zexts if more than one is needed. Always use that code-path instead of trying to reuse the old zext in some case. (Alternatively we could also drop poison-generating flags on the old zext, but it seems cleaner to not reuse it at all, especially if it's not always possible anyway.) Fixes https://github.com/llvm/llvm-project/issues/72046.
2023-11-07 | [AArch64] Sink vscale calls into loops for better isel (#70304) | Graham Hunter | 1 | -1/+0
For more recent sve capable CPUs it is beneficial to use the inc* instruction to increment a value by vscale (potentially shifted or multiplied) even in short loops. This patch tells codegenprepare to sink appropriate vscale calls into blocks where they are used so that isel can match them.
2023-10-13 | [CodeGenPrepare] Check types when unmerging GEPs across indirect branches (#68587) | Maurice Heumann | 1 | -0/+2
The optimization in CodeGenPrepare, where GEPs are unmerged across indirect branches, must respect the types of both GEPs and their sizes when adjusting the indices. The sample here shows the bug: https://godbolt.org/z/8e9o5sYPP The value `%elementValuePtr` addresses the second field of the `%struct.Blub`. It is therefore a GEP with index 1 and type i8. The value `%nextArrayElement` addresses the next array element. It is therefore a GEP with index 1 and type `%struct.Blub`. Both values point to completely different addresses, even if the indices are the same, due to the types being different. However, after CodeGenPrepare has run, `%nextArrayElement` is a bitcast from `%elementValuePtr`, meaning both were treated as equal. The cause for this is that the unmerging optimization does not take types into consideration. It sees both GEPs have `%currentArrayElement` as source operand and therefore tries to rewrite `%nextArrayElement` in terms of `%elementValuePtr`. It changes the index to the difference of the two GEPs. As both indices are `1`, the difference is `0`. As the indices are `0`, the GEP is later replaced with a simple bitcast in CodeGenPrepare. Before adjusting the indices, the types of the GEPs would have to be aligned and the indices scaled accordingly for the optimization to be correct. Due to the size of the struct being `16` and the `%elementValuePtr` pointing to offset `1`, the correct index for the unmerged `%nextArrayElement` would be 15. I assume this bug emerged from the opaque pointer change, as GEPs like `%elementValuePtr` that access the struct field based on type i8 did not naturally occur before. In light of future migration to ptradd, simply not performing the optimization if the types mismatch should be sufficient.
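The index arithmetic from this description can be checked with a small standalone sketch. The function names are hypothetical and model only the byte-offset reasoning; the actual patch, as stated above, simply skips the optimization when the types mismatch.

```cpp
#include <cstdint>

// Byte offset of a single-index GEP: element size times index.
constexpr std::int64_t gepByteOffset(std::int64_t elemSizeBytes,
                                     std::int64_t index) {
  return elemSizeBytes * index;
}

// The buggy rewrite: subtract raw indices and ignore the element types.
constexpr std::int64_t naiveRebasedIndex(std::int64_t baseIndex,
                                         std::int64_t targetIndex) {
  return targetIndex - baseIndex;
}

// A type-aware rewrite: compute the byte distance between the two GEPs and
// express it in units of the element size used by the rewritten GEP.
constexpr std::int64_t rebasedIndex(std::int64_t baseElemSize,
                                    std::int64_t baseIndex,
                                    std::int64_t targetElemSize,
                                    std::int64_t targetIndex,
                                    std::int64_t resultElemSize) {
  return (gepByteOffset(targetElemSize, targetIndex) -
          gepByteOffset(baseElemSize, baseIndex)) /
         resultElemSize;
}
```

With an i8 GEP at index 1 (byte offset 1) and a 16-byte struct GEP at index 1 (byte offset 16), the naive difference is 0, while the byte-accurate i8 index is 15, matching the numbers in the commit message.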
2023-10-05 | Use BlockFrequency type in more places (NFC) (#68266) | Matthias Braun | 1 | -4/+3
The `BlockFrequency` class abstracts `uint64_t` frequency values. Use it more consistently in various APIs and disable implicit conversion to make usage more consistent and explicit.
- Use `BlockFrequency Freq` parameter for `setBlockFreq`, `getProfileCountFromFreq` and `setBlockFreqAndScale` functions.
- Return `BlockFrequency` in `getEntryFreq()` functions.
- While at it, change some `const BlockFrequency& Freq` parameters to plain `BlockFrequency Freq`.
- Mark `BlockFrequency(uint64_t)` constructor as explicit.
- Add missing `BlockFrequency::operator!=`.
- Remove `uint64_t BlockFrequency::getMaxFrequency()`.
- Add `BlockFrequency BlockFrequency::max()` function.
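As an illustration of what marking the constructor `explicit` buys, here is a minimal standalone wrapper. `Frequency` and `scaleByTwo` are hypothetical stand-ins, not the real `BlockFrequency` class; the sketch only shows that a raw `uint64_t` no longer converts silently.

```cpp
#include <cstdint>
#include <type_traits>

// Hypothetical stand-in for a frequency wrapper with an explicit constructor.
class Frequency {
  std::uint64_t value;

public:
  explicit constexpr Frequency(std::uint64_t v) : value(v) {}
  constexpr std::uint64_t getFrequency() const { return value; }

  friend constexpr bool operator==(Frequency a, Frequency b) {
    return a.value == b.value;
  }
  friend constexpr bool operator!=(Frequency a, Frequency b) {
    return !(a == b);
  }
};

// Taking the wrapper by value is cheap: it is just a 64-bit integer.
constexpr Frequency scaleByTwo(Frequency f) {
  return Frequency(f.getFrequency() * 2);
}
```

A call such as `scaleByTwo(21)` would no longer compile under this design; the caller has to write `scaleByTwo(Frequency(21))`, which keeps raw counts and frequencies from being mixed up by accident.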
2023-09-29 | [llvm] Use more explicit cast methods (NFC) | Nikita Popov | 1 | -1/+1
Instead of ConstantExpr::getCast() with a fixed opcode, use the corresponding getXYZ methods instead. For the one place creating a pointer bitcast drop it entirely, as this is redundant with opaque pointers.
2023-09-14 | Avoid BlockFrequency overflow problems (#66280) | Matthias Braun | 1 | -3/+3
Multiplying raw block frequency with an integer carries a high risk of overflow.
- Add `BlockFrequency::mul`, which returns a `std::optional` with the product or `nullopt` to indicate an overflow.
- Fix two instances where overflow was likely.
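The overflow-reporting pattern described here can be sketched on plain integers. `checkedMul` is a hypothetical free function written for illustration (C++17); it is not the signature of `BlockFrequency::mul`, only the same idea of returning `std::nullopt` instead of a wrapped product.

```cpp
#include <cstdint>
#include <limits>
#include <optional>

// Multiply two unsigned 64-bit values, reporting overflow as std::nullopt
// instead of silently wrapping around.
constexpr std::optional<std::uint64_t> checkedMul(std::uint64_t freq,
                                                  std::uint64_t factor) {
  // freq * factor overflows exactly when freq > max / factor (factor != 0).
  if (factor != 0 &&
      freq > std::numeric_limits<std::uint64_t>::max() / factor)
    return std::nullopt;
  return freq * factor;
}
```

Callers are then forced to decide what an overflow means for them, for example saturating to the maximum or skipping a heuristic, rather than continuing with a wrapped value.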
2023-09-11 | [NFC][RemoveDIs] Use iterators over inst-pointers when using IRBuilder | Jeremy Morse | 1 | -1/+1
This patch adds a two-argument SetInsertPoint method to IRBuilder that takes a block/iterator instead of an instruction, and updates many call sites to use it. The motivating reason for doing this is given here [0], we'd like to pass around more information about the position of debug-info in the iterator object. That necessitates passing iterators around most of the time. [0] https://discourse.llvm.org/t/rfc-instruction-api-changes-needed-to-eliminate-debug-intrinsics-from-ir/68939 Differential Revision: https://reviews.llvm.org/D152468
2023-09-11 | [NFC][RemoveDIs] Prefer iterator-insertion over instructions | Jeremy Morse | 1 | -1/+2
Continuing the patch series to get rid of debug intrinsics [0], instruction insertion needs to be done with iterators rather than instruction pointers, so that we can communicate information in the iterator class. This patch adds an iterator-taking insertBefore method and converts various call sites to take iterators. These are all sites where such debug-info needs to be preserved so that a stage2 clang can be built identically; it's likely that many more will need to be changed in the future. At this stage, this is just changing the spelling of a few operations, which will eventually become significant once the debug-info bearing iterator is used. [0] https://discourse.llvm.org/t/rfc-instruction-api-changes-needed-to-eliminate-debug-intrinsics-from-ir/68939 Differential Revision: https://reviews.llvm.org/D152537
2023-08-29 | [CGP] Remove dead PHI nodes before elimination of mostly empty blocks | Serguei Katkov | 1 | -1/+10
Before eliminating mostly empty blocks it makes sense to remove dead PHI nodes. This opens up more opportunities for elimination and also removes the dead code itself. The change caused many unit tests to fail; I have updated some of them, and for another one I disabled this optimization. The pattern I observed in the tests is an infinite loop without side effects. As a result, after the dead PHI node is eliminated, all other related instructions are removed as well and the tests stop checking what they are expected to check. Reviewed By: efriedma Differential Revision: https://reviews.llvm.org/D158503
2023-08-26 | [CodeGenPrepare] Fix modification status bug | Aiden Grossman | 1 | -0/+1
This was exposed in https://reviews.llvm.org/D158250 in CodeGen/X86/statepoint-stack-usage.ll. There was no update to the modification status in this section. Co-Authored-By: nikic Reviewed By: nikic Differential Revision: https://reviews.llvm.org/D158898
2023-08-13 | [llvm] Drop some more typed pointer bitcasts etc. | Bjorn Pettersson | 1 | -3/+1
2023-08-03 | [llvm] Drop some typed pointer handling/bitcasts | Bjorn Pettersson | 1 | -2/+2
Differential Revision: https://reviews.llvm.org/D157016
2023-08-01 | Revert "[CodeGenPrepare][NFC] Update the dominator tree instead of rebuilding it" | Jordan Rupprecht | 1 | -117/+87
This reverts commit 0b1d1cdb89322c277baf5221218a830195fef9d4. It causes a clang crash. Details will be posted to D153638.
2023-08-01 | [CodeGenPrepare][NFC] Update the dominator tree instead of rebuilding it | Momchil Velikov | 1 | -87/+117
Reviewed By: nikic Differential Revision: https://reviews.llvm.org/D153638
2023-07-19 | [CodeGenPrepare] Refactor optimizeSelectInst (NFC) | Momchil Velikov | 1 | -73/+65
Refactor to use BasicBlockUtils functions and make life easier for a subsequent patch for updating the dominator tree. Reviewed By: dmgreen Differential Revision: https://reviews.llvm.org/D154053
2023-07-09 | [CGP] Enable CodeGenPrepare's phi type conversion. | David Green | 1 | -1/+1
This is a recommit of 67121d7, enabling the CodeGenPrepare OptimizePhiTypes option that can help with the type of phi instructions into ISel.
2023-06-28 | [CodeGenPrepare] Implement releaseMemory | Sven van Haastregt | 1 | -4/+9
Release BlockFrequencyInfo and BranchProbabilityInfo results and other per function information immediately afterwards, instead of holding onto the memory until the next `CodeGenPrepare::runOnFunction` call. Differential Revision: https://reviews.llvm.org/D152552 Co-authored-by: Erik Hogeman <erik.hogeman@arm.com>
2023-06-19 | [CodeGenPrepare] Fix for using outdated/corrupt LoopInfo | Momchil Velikov | 1 | -12/+45
Some transformations in the CodeGenPrepare pass may create and/or delete basic blocks, but they don't update the LoopInfo, so the LoopInfo may end up containing dangling pointers and sometimes reused basic blocks, which leads to "interesting" non-deterministic behaviour. These transformations do not seem to alter the loop structure of the function, and updating the loop info is quite straightforward. Reviewed By: efriedma Differential Revision: https://reviews.llvm.org/D150384 Change-Id: If8ab3905749ea6be94fbbacd54c5cfab5bc1fba1
2023-06-18 | [CodeGenPrepare][RISCV] Remove asserting VH references before erasing the dead GEP | Yingwei Zheng | 1 | -1/+3
Fixes issue https://github.com/llvm/llvm-project/issues/63365 Reviewed By: craig.topper Differential Revision: https://reviews.llvm.org/D153194
2023-06-16 | [CGP] Fix infinite loop in icmp operand swapping | Nikita Popov | 1 | -1/+1
Don't swap the operands if they're the same. Fixes the issue reported at https://reviews.llvm.org/D152541#4427017.
2023-06-15 | [InstCombine][CGP] Move swapMayExposeCSEOpportunities() fold | Nikita Popov | 1 | -0/+34
InstCombine tries to swap compare operands to match sub instructions in order to expose "CSE opportunities". However, it doesn't really make sense to perform this transform in the middle-end, as we cannot actually CSE the instructions there. The backend already performs this fold in https://github.com/llvm/llvm-project/blob/18f5446a45da5a61dbfb1b7667d27fb441ac62db/llvm/lib/CodeGen/SelectionDAG/TargetLowering.cpp#L4236 on the SDAG level, however this only works within a single basic block. To handle cross-BB cases, we do need to handle this in the IR layer. This patch moves the fold from InstCombine to CGP in the backend, while keeping the same (somewhat dubious) heuristic. Differential Revision: https://reviews.llvm.org/D152541
2023-06-02 | [AArch64] Don't use tbl lowering if ZExt can be folded into user. | Florian Hahn | 1 | -3/+3
If the ZExt can be lowered to a single ZExt to the next power-of-2 and the remaining ZExt folded into the user, don't use tbl lowering. Fixes #62620. Reviewed By: efriedma Differential Revision: https://reviews.llvm.org/D150482
2023-05-27 | [CGP] Disable default copy ctor and copy assignment operator for InstructionRemover class | Bing1 Yu | 1 | -0/+3
InstructionRemover manages resources such as dynamically allocated memory, so it's generally good practice to either implement a custom copy constructor or disable the default one. Reviewed By: pengfei Differential Revision: https://reviews.llvm.org/D151543
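The general pattern behind this change looks roughly like the sketch below. `OperandBackup` is a hypothetical resource-owning class invented for illustration; it is not LLVM's `InstructionRemover`, and only demonstrates deleting the copy operations of a class that owns raw memory.

```cpp
#include <cstddef>
#include <type_traits>

// Hypothetical class that owns a heap allocation through a raw pointer.
class OperandBackup {
  int *saved;
  std::size_t count;

public:
  explicit OperandBackup(std::size_t n) : saved(new int[n]()), count(n) {}
  ~OperandBackup() { delete[] saved; }

  // A compiler-generated copy would duplicate the raw pointer, and both
  // copies would later delete the same array. Forbid copying instead.
  OperandBackup(const OperandBackup &) = delete;
  OperandBackup &operator=(const OperandBackup &) = delete;

  std::size_t size() const { return count; }
};
```

With the copy operations deleted, an accidental copy becomes a compile-time error rather than a double free at run time.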
2023-05-23 | [CodeGen] Fix crash in CodeGenPrepare::optimizeGatherScatterInst. | Joshua Cranmer | 1 | -1/+2
Reviewed By: craig.topper Differential Revision: https://reviews.llvm.org/D151141
2023-05-19 | [NFC] Fix typo in CodeGenPrepare.cpp | Thomas Symalla | 1 | -1/+1
2023-05-12 | [GlobalISel] Handle ptr size != index size in IRTranslator, CodeGenPrepare | Krzysztof Drewniak | 1 | -3/+3
While the original motivation for this patch (address space 7 on AMDGPU) has been reworked and is not presently planned to reach IR translation, the incorrect (by the spec) handling of index offset width in IR translation and CodeGenPrepare is likely to trip someone - possibly future AMD, since we have a p7:160:256:256:32 now, so we convert to the other API now. Reviewed By: aemerson, arsenm Differential Revision: https://reviews.llvm.org/D143526
2023-05-03 | Restore CodeGen/MachineValueType.h from `Support` | NAKAMURA Takumi | 1 | -1/+1
This is a rework of rG13e77db2df94 (r328395; MVT). Since `LowLevelType.h` has been restored to `CodeGen`, `MachineValueType.h` can be restored as well. Depends on D148767 Differential Revision: https://reviews.llvm.org/D149024
2023-04-28 | [NFC] Fix 2 logic dead code | Wang, Xin10 | 1 | -3/+1
First, in CodeGenPrepare.cpp, line 6891, VectorCond will always be false, because otherwise the function would already have returned at line 6888. Second, in SelectionDAGBuilder.cpp, line 5443, getSExtValue() returns a signed integer, but the result is kept in an unsigned Val, which makes the if condition at line 5452 meaningless. Reviewed By: skan Differential Revision: https://reviews.llvm.org/D149033
2023-04-27 | Revert "[CodeGenPrepare] Estimate liveness of loop invariants when checking for address folding profitability" | Jordan Rupprecht | 1 | -25/+1
This reverts commit 5344d8e10bb7d8672d4bfae8adb010465470d51b. It causes non-determinism when building clang. See the review thread on D143897.
2023-04-24 | [CodeGenPrepare] Estimate liveness of loop invariants when checking for address folding profitability | Momchil Velikov | 1 | -1/+25
When checking the profitability of folding an address computation into a memory instruction, the compiler tries to determine the liveness of the values comprising the address at the point of the memory instruction. This patch improves on the live variable estimates by including the loop invariants which are referenced in the loop body. Reviewed By: dmgreen Differential Revision: https://reviews.llvm.org/D143897
2023-04-21 | [NFC][CodeGenPrepare] Match against the correct instruction when checking profitability of folding an address | Momchil Velikov | 1 | -10/+11
The "nested" `AddressingModeMatcher`s in `AddressingModeMatcher::isProfitableToFoldIntoAddressingMode` are constructed using the original memory instruction, even though they check whether the address operand of a different memory instruction is foldable. The memory instruction is used only for a dominance check (when not checking for profitability), and using the wrong memory instruction does not change the outcome of the test: if an address is foldable, the dominance test affects which of the two possible ways to fold is chosen, but this result is discarded. As an example, in

  target triple = "x86_64-linux"

  declare i1 @check(i64, i64)

  define i32 @f(i1 %cc, ptr %p, ptr %q, i64 %n) {
  entry:
    br label %loop

  loop:
    %iv = phi i64 [ %i, %C ], [ 0, %entry ]
    %offs = mul i64 %iv, 4
    %c.0 = icmp ult i64 %iv, %n
    br i1 %c.0, label %A, label %fail

  A:
    br i1 %cc, label %B, label %C

  C:
    %u = phi i32 [0, %A], [%w, %B]
    %i = add i64 %iv, 1
    %a.0 = getelementptr i8, ptr %p, i64 %offs
    %a.1 = getelementptr i8, ptr %a.0, i64 4
    %v = load i32, ptr %a.1
    %c.1 = icmp eq i32 %v, %u
    br i1 %c.1, label %exit, label %loop

  B:
    %a.2 = getelementptr i8, ptr %p, i64 %offs
    %a.3 = getelementptr i8, ptr %a.2, i64 4
    %w = load i32, ptr %a.3
    br label %C

  exit:
    ret i32 -1

  fail:
    ret i32 0
  }

the dominance test is performed between `%i = ...` and `%v = ...` at the moment we're checking whether `%a.3 = ...` is foldable. Using the memory instruction which uses the interesting address is "more correct", and this change is needed by a future patch. Reviewed By: mkazantsev Differential Revision: https://reviews.llvm.org/D143896
2023-04-21 | Recommit "[AArch64] Fix incorrect `isLegalAddressingMode`" | Momchil Velikov | 1 | -31/+39
This patch recommits 0827e2fa3fd15b49fd2d0fc676753f11abb60cab after reverting it in ed7ada259f665a742561b88e9e6c078e9ea85224. Added a workaround for `TargetLowering::AddrMode` no longer being an aggregate in C++20. `AArch64TargetLowering::isLegalAddressingMode` has a number of defects, including accepting an addressing mode which consists of only an immediate operand, or not checking the offset range for an addressing mode in the form `1*ScaledReg + Offs`. This patch fixes the above issues. Reviewed By: dmgreen Differential Revision: https://reviews.llvm.org/D143895 Change-Id: I41a520c13ce21da503ca45019979bfceb8b648fa
2023-04-20 | Revert "[AArch64] Fix incorrect `isLegalAddressingMode`" | Momchil Velikov | 1 | -39/+31
This reverts commit 0827e2fa3fd15b49fd2d0fc676753f11abb60cab. Failing buildbot, perhaps due to `-std=c++20`.
2023-04-20 | [AArch64] Fix incorrect `isLegalAddressingMode` | Momchil Velikov | 1 | -31/+39
`AArch64TargetLowering::isLegalAddressingMode` has a number of defects, including accepting an addressing mode which consists of only an immediate operand, or not checking the offset range for an addressing mode in the form `1*ScaledReg + Offs`. This patch fixes the above issues. Reviewed By: dmgreen Differential Revision: https://reviews.llvm.org/D143895 Change-Id: I756fa21941844ded44f082ac7eea4391219f9851
2023-04-20 | Fix uninitialized class members | Akshay Khadse | 1 | -1/+1
Reviewed By: LuoYuanke Differential Revision: https://reviews.llvm.org/D148692
2023-04-17 | Fix uninitialized pointer members in CodeGen | Akshay Khadse | 1 | -5/+5
This change initializes the TSI, LI, DT, PSI, and ORE pointer fields of the SelectOptimize class to nullptr. Reviewed By: LuoYuanke Differential Revision: https://reviews.llvm.org/D148303
2023-03-30 | [CodeGenPrepare] Increase the limit on the number of instructions to scan | Momchil Velikov | 1 | -7/+7
... when finding all memory uses for an address and make it a parameter. Now that we have avoided potentially exponential run time of `FindAllMemoryUses` in D143893, it'd be beneficial to increase the limit up from 20. Reviewed By: mkazantsev Differential Revision: https://reviews.llvm.org/D143894 Change-Id: I3abdf40332ef65e9b2f819ac32ac60e4200ec51d
2023-03-30 | [CodeGenPrepare] Fix counting uses when folding addresses into memory instructions | Momchil Velikov | 1 | -4/+13
The counter of the number of instructions seen in `FindAllMemoryUses` is reset after returning from a recursive invocation of `FindAllMemoryUses` to the value it had before the call. In effect, depending on the shape of the uses graph, the function may scan up to `2^N-1` instructions where `N` is the scan limit (`MaxMemoryUsesToScan`). This does not look intuitive or intended. This patch changes the counting to just count the scanned instructions, independent of the shape of the references. Reviewed By: mkazantsev Differential Revision: https://reviews.llvm.org/D143893 Change-Id: I99f5de55e84843cf2fbea287d6ae4312fa196240
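The effect of resetting the counter can be reproduced with a toy model. The sketch below is not the `FindAllMemoryUses` code: it assumes a complete binary tree of uses of a given depth, and `scanPerPath`, `scanShared` and their wrappers are hypothetical names. The only difference between the two scans is whether the counter is passed by value (restored after each recursive call) or by reference (shared across the whole walk).

```cpp
// Old behaviour (modelled): `seen` is passed by value, so each recursive call
// works on a copy and the caller's count is unchanged when the call returns.
constexpr bool scanPerPath(unsigned depth, unsigned seen, unsigned limit,
                           unsigned &visited) {
  if (depth == 0)
    return false; // leaf: no further users
  for (unsigned user = 0; user < 2; ++user) {
    if (seen++ >= limit)
      return true; // give up: limit reached
    ++visited;
    if (scanPerPath(depth - 1, seen, limit, visited))
      return true;
  }
  return false;
}

// New behaviour (modelled): `seen` is shared, so every scanned instruction
// counts against the limit regardless of the shape of the use graph.
constexpr bool scanShared(unsigned depth, unsigned &seen, unsigned limit,
                          unsigned &visited) {
  if (depth == 0)
    return false;
  for (unsigned user = 0; user < 2; ++user) {
    if (seen++ >= limit)
      return true;
    ++visited;
    if (scanShared(depth - 1, seen, limit, visited))
      return true;
  }
  return false;
}

// Convenience wrappers returning how many instructions were actually visited.
constexpr unsigned visitedPerPath(unsigned depth, unsigned limit) {
  unsigned visited = 0;
  scanPerPath(depth, 0, limit, visited);
  return visited;
}

constexpr unsigned visitedShared(unsigned depth, unsigned limit) {
  unsigned seen = 0, visited = 0;
  scanShared(depth, seen, limit, visited);
  return visited;
}
```

In this model, a tree of depth 3 has 14 user nodes. With a limit of 6 the by-value variant visits all 14 of them, while the shared counter stops after exactly 6.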
2023-03-30 | [CodeGen] Remove redundant instructions generated by combineAddrModes. | Peter Rong | 1 | -1/+14
CodeGenPrepare may optimize memory access modes. During such optimization, it might create a new instruction representing the combined value. Later, if the optimization fails, the generated value is not removed and remains as a dead instruction. Normally this won't be a problem, as dead code will be eliminated later. However, in this case (Issue 58538), the generated instruction may trigger an infinite loop. The infinite loop involves `sinkCmpExpression`, where it tries to optimize the placeholder generated by us. (See the test case detailed in the issue.) To fix this, we remove the unnecessary placeholder immediately when we abort the optimization. `AddressingModeCombiner` will keep track of the placeholder, and remove it if it is an inserted placeholder and has no uses. This patch fixes https://github.com/llvm/llvm-project/issues/58538; a test is also included. Reviewed By: skatkov Differential Revision: https://reviews.llvm.org/D147041
2023-03-27 | [CodeGenPrepare][RISCV] Correct the MathUsed flag for shouldFormOverflowOp | Craig Topper | 1 | -2/+4
For add, if we match the constant edge case the add isn't used by the compare so we shouldn't check for 2 users. For sub, the compare is not a user of the sub so the math is used if the sub has any users. This regresses RISC-V which I will work on other patches for. Reviewed By: RKSimon Differential Revision: https://reviews.llvm.org/D146786
2023-03-23 | [CodeGenPrepare] Don't give up if unable to sink first arg to a cold call | Momchil Velikov | 1 | -1/+2
Reviewed By: mkazantsev Differential Revision: https://reviews.llvm.org/D143892
2023-03-15 | [llvm] Use *{Map,Set}::contains (NFC) | Kazu Hirata | 1 | -4/+4
2023-03-14 | [CodeGen] Use *{Set,Map}::contains (NFC) | Kazu Hirata | 1 | -1/+1