aboutsummaryrefslogtreecommitdiff
path: root/llvm/lib/Transforms/Scalar/LoopInterchange.cpp
AgeCommit message (Collapse)AuthorFilesLines
5 days[LoopInterchange] Bail out when finding a dependency with all `*` elements ↵Ryotaro Kasuga1-0/+11
(#149049) If a direction vector with all `*` elements, like `[* * *]`, is present, it indicates that none of the loop pairs are legal to interchange. In such cases, continuing the analysis is meaningless. This patch introduces a check to detect such direction vectors and exits early when one is found. This slightly reduces compile time.
2025-07-25[LoopInterchange] Consider forward/backward dependency in vectorize ↵Ryotaro Kasuga1-14/+100
heuristic (#133672) The vectorization heuristic of LoopInterchange attempts to move a vectorizable loop to the innermost position. Before this patch, a loop was deemed vectorizable if there are no loop-carried dependencies induced by the loop. This patch extends the vectorization heuristic by introducing the concept of forward and backward dependencies, inspired by LoopAccessAnalysis. Specifically, an additional element is appended to each direction vector to indicate whether it represents a forward dependency (`<`) or not (`*`). Among these, only the forward dependencies (i.e., those whose last element is `<`) affect the vectorization heuristic. Accordingly, the check is conservative, and dependencies are considered forward only when this can be proven. Currently, we only support perfectly nested loops whose body consists of a single basic block. For other cases, dependencies are pessimistically treated as non-forward.
2025-07-18[LoopInterchange] Ignore the cost-model, force interchange if legal (#148858)Sjoerd Meijer1-4/+20
This is and has been proven useful for testing purposes, to get more test coverage.
2025-07-15[Scalar] Fix a warningKazu Hirata1-0/+1
This patch fixes: llvm/lib/Transforms/Scalar/LoopInterchange.cpp:863:20: error: unused variable 'OpCode' [-Werror,-Wunused-variable]
2025-07-15[LoopInterchange] Drop nuw/nsw flags from reduction ops when interchanging ↵Ryotaro Kasuga1-6/+84
(#148612) Before this patch, when a reduction exists in the loop, the legality check of LoopInterchange only verified if there exists a non-reassociative floating-point instruction in the reduction calculation. However, it is insufficient, because reordering integer reductions can also lead to incorrect transformations. Consider the following example: ```c int A[2][2] = { { INT_MAX, INT_MAX }, { INT_MIN, INT_MIN }, }; int sum = 0; for (int i = 0; i < 2; i++) for (int j = 0; j < 2; j++) sum += A[j][i]; ``` To make this exchange legal, we must drop nuw/nsw flags from the instructions involved in the reduction operations. This patch extends the legality check to correctly handle such cases. In particular, for integer addition and multiplication, it verifies that the nsw and nuw flags are set on involved instructions, and drop them when the transformation actually performed. This patch also introduces explicit checks for the kind of reduction and permits only those that are known to be safe for interchange. Consequently, some "unknown" reductions (at the moment, `FindFirst*` and `FindLast*`) are rejected. Fix #148228
2025-07-08[LoopInterchange] Defer CacheCost calculation until needed (#146874)Ryotaro Kasuga1-34/+71
LoopInterchange currently stop and exit its process when the number of loads/stores in the loop is greater than `MaxMemInstrCount`. This prevents excessive querying to DependenceAnalysis. However, computing `CacheCost` also involves DependenceAnalysis queries, and their number can grow to `O(N^2)` in the worst case, where `N` is the number of loads/stores in the loop. Therefore, we should also avoid calculating it if the loads/stores count exceeds `MaxMemInstrCount`. This patch defers the calculation of `CacheCost` until it is actually needed to reduce compile time. This avoids computing `CacheCost` when the number of loads/stores is large. Additionally, since this patch delays its calculation as much as possible, it is also effective in other scenarios, e.g., when there are no legal loop pairs to exchange.
2025-06-28[LoopInterchange] Modernize loops (NFC) (#146105)Ramkumar Ramachandra1-12/+9
2025-06-27[LoopInterchange] Use ArrayRef in more places (NFC) (#146077)Ramkumar Ramachandra1-10/+8
2025-06-25[Transforms] Use range-based for loops (NFC) (#145252)Kazu Hirata1-2/+2
Co-authored-by: Matt Arsenault <arsenm2@gmail.com>
2025-06-05[LoopInterchange] Handle confused dependence correctly (#140709)Ryotaro Kasuga1-0/+8
This patch fixes the handling of a confused `Dependence` object. Such an object doesn’t contain any information about dependencies, so we must process it conservatively. However, it was converted into a direction vector like `[I I ... I]`. As a result, it was treated as if there are no loop-carried dependencies, which can lead to illegal loop exchanges. Fixes #140238
2025-05-26[Scalar] Use llvm::count (NFC) (#141445)Kazu Hirata1-5/+2
2025-05-13[LoopInterchange] Relax the legality check to accept more patterns (#139690)Ryotaro Kasuga1-6/+2
When proving the legality of exchanging two loops, it doesn't need to check the elements of the direction vectors associated with the loops outside of the two target loops. Before this patch, the legality check looked at all elements of a direction vector to calculate the lexicographically order of the vector, which may reject some legal exchanges. For example, if a direction vector is `[* < =]`, it is safe to swap the last two loops because the corresponding subsequence of the vector (`[< =]`) is lexicographically positive for both before and after the exchange. However, the its order is unknown if we don't drop the prefix since the first element is `*`. This patch improves the logic of legality check to ignore such unrelated prefixes of direction vectors.
2025-05-13[LoopInterchange] Skip legality check if surrounding loops already guarantee ↵Ryotaro Kasuga1-6/+23
dependency (NFC) (#139254) The legality check in LoopInterchange allows two loops to be exchanged if all direction vectors are lexicographically positive (or zero) for both before and after the swap. The current implementation performs this routine naively. However, if a direction vector is lexicographically positive due to an element corresponding to a loop that is outside the given two loops (i.e., if there is an element `<` before the loops we are trying to interchange), then obviously it is also positive after exchanging them. For example, for a direction vector `[< < >]`, swapping the last two elements doesn't make it lexicographically negative because the first element is `<`. This patch adds a code to skip legality check if surrounding loops already guarantee that the direction vector is lexicographically positive. Note that this is only a small improvement on its own, but it's necessary to relax the legality check I'm working on. Split off from #118267 --------- Co-authored-by: Michael Kruse <llvm-project@meinersbur.de>
2025-05-04[Transforms] Remove unused local variables (NFC) (#138442)Kazu Hirata1-1/+0
2025-04-18[Transforms] Construct SmallVector with iterator ranges (NFC) (#136259)Kazu Hirata1-6/+4
2025-04-03[LoopInterchange] Fix the vectorizable check for a loop (#133667)Ryotaro Kasuga1-17/+27
In the profitability check for vectorization, the dependency matrix was not handled correctly. This can result to make a wrong decision: It may say "this loop can be vectorized" when in fact it cannot. The root cause of this is that the check process early returns when it finds '=' or 'I' in the dependency matrix. To make sure that we can actually vectorize the loop, we need to check all the rows of the matrix. This patch fixes the process of checking whether we can vectorize the loop or not. Now it won't make a wrong decision for a loop that cannot be vectorized. Related: #131130
2025-04-02[LoopInterchange] Add an option to control the cost heuristics applied (#133664)Ryotaro Kasuga1-23/+33
LoopInterchange has several heuristic functions to determine if exchanging two loops is profitable or not. Whether or not to use each heuristic and the order in which to use them were fixed, but #125830 allows them to be changed internally at will. This patch adds a new option to control them via the compiler option. The previous patch also added an option to prioritize the vectorization heuristic. This patch also removes it to avoid conflicts between it and the newly introduced one, e.g., both `-loop-interchange-prioritize-vectorization=1` and `-loop-interchange-profitabilities='cache,vectorization'` are specified.
2025-03-21[LoopInterchange] Prevent from undoing its own transformation (#127473)Ryotaro Kasuga1-5/+3
LoopInterchange uses the bubble-sort fashion algorithm to sort the loops in a loop nest. For two loops `A` and `B`, it calls `isProfitable(A, B)` to determine whether these loops should be exchanged. The result of `isProfitable(A, B)` should be conservative, that is, it should return true only when we are sure exchanging `A` and `B` will improve performance. If it's not conservative enough, it's possible that a subsequent `isProfitable(B, A)` will also return true, in which case LoopInterchange will undo its previous transformation. To avoid such cases, `isProfitable(B, A)` must not return true if `isProfitable(A, B)` returned true in the past. However, the current implementation can be in such a situation. This patch resolves it by correcting the handling of two loops that have the same cache cost. This resolve the problem mentioned in https://github.com/llvm/llvm-project/pull/118267#issuecomment-2510759354.
2025-03-21[LoopInterchange] Add an option to prioritize vectorization (#131988)Ryotaro Kasuga1-13/+49
The LoopInterchange cost-model consists of several decision rules. They are called one by one, and if some rule can determine the profitability, then the subsequent rules aren't called. In the current implementation, the rule for `CacheCostAnalysis` is called first, and if it fails to determine the profitability, then the rule for vectorization is called. However, there are cases where interchanging loops for vectorization makes the code faster even if such exchanges are detrimental to the cache. For example, exchanging the inner two loops in the following example looks about x3 faster in my local (compiled with `-O3 -mcpu=neoverse-v2 -mllvm -cache-line-size=64`), even though it's rejected by the rule based on cache cost. (NOTE: LoopInterchange cannot exchange these loops due to legality checks. This should also be improved.) ```c __attribute__((aligned(64))) float aa[256][256],bb[256][256],cc[256][256], dd[256][256],ee[256][256],ff[256][256]; // Alternative of TSVC s231 with more array accesses than the original. void s231_alternative() { for (int nl = 0; nl < 100*(100000/256); nl++) { for (int i = 0; i < 256; ++i) { for (int j = 1; j < 256; j++) { aa[j][i] = aa[j-1][i] + bb[j][i] + cc[i][j] + dd[i][j] + ee[i][j] + ff[i][j]; } } } } ``` This patch introduces a new option to prioritize the vectorization rule over the cache cost rule. Related issue: #131130 --------- Co-authored-by: Florian Hahn <flo@fhahn.com>
2025-03-12[Transforms] Avoid repeated hash lookups (NFC) (#130890)Kazu Hirata1-15/+18
2025-03-04[LoopInterchange] Move some processes to another function (NFC) (#129514)Ryotaro Kasuga1-13/+14
Some post-processing involved in exchanging a pair of loops has been done separately from `processLoop`, which is a main function that does the transformation. It's better to consolidate these processes into the same function. This patch is a preparation of #127474.
2025-02-11[DependenceAnalysis][NFC] Removing PossiblyLoopIndependent parameter (#124615)Alireza Torabian1-1/+1
Parameter PossiblyLoopIndependent has lost its intended purpose. This flag is always set to true in all cases when depends() is called, hence we want to reconsider the utility of this variable and remove it from the function signature entirely. This is an NFC patch.
2025-02-05[LoopInterchange] Hoist isComputableLoopNest() in the control flow (#124247)Madhur Amilkanthwar1-24/+35
The profiling of the LLVM Test-suite reveals that a significant portion, specifically 14,090 out of 139,323, loop nests were identified as non-viable candidates for transformation, leading to the transform exiting from isComputableLoopNest() without any action. More importantly, dependence information was computed for these loop nests before reaching the function isComputableLoopNest(), which does not require DI and relies solely on scalar evolution (SE). To enhance compile-time efficiency, this patch moves the call to isComputableLoopNest() earlier in the control-flow, thereby avoiding unnecessary dependence calculations. The impact of this change is evident on the compile-time-tracker, with the overall geometric mean improvement recorded at 0.11%, while the lencode benchmark gets a more substantial benefit of 0.44%. This improvement can be tracked in the isc-ln-exp-2 branch under my repo.
2025-01-29[LoopInterchange] Handle LE and GE correctly (#124901)Ryotaro Kasuga1-3/+8
LoopInterchange have converted `DVEntry::LE` and `DVEntry::GE` in direction vectors to '<' and '>' respectively. This handling is incorrect because the information about the '=' it lost. This leads to miscompilation in some cases. To resolve this issue, convert them to '*' instead. Resolve #123920
2025-01-24[NFC][DebugInfo] Use iterator-flavour getFirstNonPHI at many call-sites ↵Jeremy Morse1-8/+9
(#123737) As part of the "RemoveDIs" project, BasicBlock::iterator now carries a debug-info bit that's needed when getFirstNonPHI and similar feed into instruction insertion positions. Call-sites where that's necessary were updated a year ago; but to ensure some type safety however, we'd like to have all calls to getFirstNonPHI use the iterator-returning version. This patch changes a bunch of call-sites calling getFirstNonPHI to use getFirstNonPHIIt, which returns an iterator. All these call sites are where it's obviously safe to fetch the iterator then dereference it. A follow-up patch will contain less-obviously-safe changes. We'll eventually deprecate and remove the instruction-pointer getFirstNonPHI, but not before adding concise documentation of what considerations are needed (very few). --------- Co-authored-by: Stephen Tozer <Melamoto@gmail.com>
2025-01-24[NFC][DebugInfo] Use iterator moveBefore at many call-sites (#123583)Jeremy Morse1-2/+2
As part of the "RemoveDIs" project, BasicBlock::iterator now carries a debug-info bit that's needed when getFirstNonPHI and similar feed into instruction insertion positions. Call-sites where that's necessary were updated a year ago; but to ensure some type safety however, we'd like to have all calls to moveBefore use iterators. This patch adds a (guaranteed dereferenceable) iterator-taking moveBefore, and changes a bunch of call-sites where it's obviously safe to change to use it by just calling getIterator() on an instruction pointer. A follow-up patch will contain less-obviously-safe changes. We'll eventually deprecate and remove the instruction-pointer insertBefore, but not before adding concise documentation of what considerations are needed (very few).
2025-01-23[LoopInterchange] Constrain LI within supported loop nest depth (#118656)Madhur Amilkanthwar1-13/+29
This patch is an extension to #115128. After profiling LLVM test-suite, I see a lot of loop nest of depth more than `MaxLoopNestDepth` which is 10. Early exit for them would save compile-time as it would avoid computing DependenceInfo and CacheCost. Please see 'bound-max-depth' branch on compile-time-tracker.
2025-01-21[LoopInterchange] Constrain number of load/stores in a loop (#118973)Madhur Amilkanthwar1-13/+29
In the current state of the code, the transform computes entries for the dependency matrix until `MaxMemInstrCount` which is 100. After 99th entry, it terminates and thus overall wastes compile-time. It would be nice if we can compute total number of entries upfront and early exit if the number of entries > 100. However, computing the number of entries is not always possible as it depends on two factors: 1. Number of load-store pairs in a loop. 2. Number of common loop levels for each of the pair. This patch constrains the whole computation on the number of loads and stores instructions in the loop. In another approach, I experimented with computing 1 and constraining the number of pairs, but that did not lead to any additional benefit in terms of compile time. However, when other issues are fixed, I can revisit this approach.
2025-01-20[LoopInterchange] Remove 'S' Scalar Dependencies (#119345)Sjoerd Meijer1-17/+11
We are not handling 'S' scalar dependencies correctly and have at least the following miscompiles related to that: [LoopInterchange] incorrect handling of scalar dependencies and dependence vectors starting with ">" #54176 [LoopInterchange] Interchange breaks program correctness #46867 [LoopInterchange] Loops should not interchanged due to dependencies #47259 [LoopInterchange] Loops should not interchanged due to control flow #47401 This patch does no longer insert the "S" dependency/direction into the dependency matrix, so a dependency is never "S". We seem to have forgotten what the exact meaning is of this dependency type, and don't see why it should be treated differently. We prefer correctness over incorrect and more aggressive results. I.e., this prevents the miscompiles at the expense of handling less cases, i.e. making interchange more pessimistic. However, some of the cases that are now rejected for dependence analysis reasons, were rejected before too but for other reasons (e.g. profitability). So at least for the llvm regression tests, the number of regression are very reasonable. This should be a stopgap. We would like to get interchange enabled by default and thus prefer correctness over unsafe transforms, and later see if we can get solve the regressions.
2024-11-19[LoopInterchange] Make the entries of the Dependency Matrix unique (#116195)Sjoerd Meijer1-10/+14
The entries in the dependency matrix can contain a lot of duplicates, which is unnecessary and results in more checks that we can avoid, and this patch adds that.
2024-11-19[LoopInterchange] Bail out early if minimum loop nest is not met (#115128)Madhur Amilkanthwar1-4/+16
This patch bails out early if minimum depth is not met. As it stands today, the pass computes CacheCost before it attempts to do the transform. This is not needed if minimum depth is not met. This handles basic cases where depth is typically 1. As the patch avoids unnecessary computation, it is aimed to improve compile-time.
2024-11-02[Scalar] Remove unused includes (NFC) (#114645)Kazu Hirata1-1/+0
Identified with misc-include-cleaner.
2024-08-03[Transforms] Construct SmallVector with ArrayRef (NFC) (#101851)Kazu Hirata1-1/+1
2024-04-12Fix typos (#88565)Victor Toni1-1/+1
2023-09-07[NFC][RemoveDIs] Create a new spelling of the moveBefore methodJeremy Morse1-1/+1
As outlined in my proposal of how to get rid of debug intrinsics, this patch adds a moveBefore method that signals the caller /intends/ the order of moved instructions is to stay the same. This semantic difference has an effect on debug-info, as it signals whether debug-info needs to move with instructions or not. The patch just replaces a few calls to moveBefore with calls to moveBeforePreserving -- and the latter just calls the former, so it's all NFC right now. A future patch will add an implementation of moveBeforePreserving that takes action to correctly preserve debug-info, but that's tightly coupled with our non-instruction debug-info representation that's still being reviewed. [0] https://discourse.llvm.org/t/rfc-instruction-api-changes-needed-to-eliminate-debug-intrinsics-from-ir/68939 Differential Revision: https://reviews.llvm.org/D156369
2023-06-05Revert "[LCSSA] Remove unused ScalarEvolution argument (NFC)"Nikita Popov1-2/+2
This reverts commit 5362a0d859d8e96b3f7c0437b7866e17a818a4f7. In preparation for reverting a dependent revision.
2023-05-25[SCEVExpander] Remember phi nodes inserted by LCSSA constructionNikita Popov1-3/+1
SCEVExpander keeps track of all instructions it inserted. However, it currently misses some phi nodes created during LCSSA construction. Fix this by collecting these into another argument. This also removes the IRBuilder argument, which was added for essentially the same purpose, but only handles the root LCSSA nodes, not those inserted by SSAUpdater. This was reported as a regression on D149344, but the reduced test case also reproduces without it. Differential Revision: https://reviews.llvm.org/D150681
2023-05-02[LCSSA] Remove unused ScalarEvolution argument (NFC)Nikita Popov1-2/+2
After D149435, LCSSA formation no longer needs access to ScalarEvolution, so remove the argument from the utilities.
2023-04-17Remove several no longer needed includes. NFCIBjorn Pettersson1-3/+0
Mostly removing includes of InitializePasses.h and Pass.h in passes that no longer has support for the legacy PM.
2023-04-16[Scalar] Use range-based for loops (NFC)Kazu Hirata1-2/+1
2023-03-14[Transforms] Use *{Set,Map}::contains (NFC)Kazu Hirata1-2/+1
2023-03-03[LoopInterchange] Remove unused RecurrenceDescriptor object. NFCCraig Topper1-1/+0
2023-02-15[LoopInterchange] Remove legacy pass (unused in the pipeline)Fangrui Song1-46/+0
Following recent changes to remove non-core legacy passes.
2023-01-16[LoopInterchange] Correcting the profitability checkRam-NK1-49/+96
Before D135808, There would be endless loop interchange posibility (no proper priority was there in profitability check. Any profitable check may leads to loop-interchange). With this patch, there is no endless interchange (priority in profitable check is defined. Order of decision is 'Cache cost' check, 'InstrOrderCost', 'Vectorization'). Corrected the dependency checking inside isProfitableForVectorization(), corrected the checking of bad order loops in isProfitablePerInstrOrderCost(). Reviewed By: Meinersbur, bmahjour, #loopoptwg Differential Revision: https://reviews.llvm.org/D135808
2023-01-12[NFC][LoopFlatten][LoopInterchange] Do not explicitly forget subloopsJoshua Cao1-1/+0
We don't need to explicitly forget subloops because forgetting parent loops will automatically forget their subloops Differential Revision: https://reviews.llvm.org/D141029
2022-12-01[NFC] Cleanup: Replaces BB->getInstList().splice() with BB->splice().Vasileios Porpodas1-4/+3
This is part of a series of cleanup patches towards making BasicBlock::getInstList() private. Differential Revision: https://reviews.llvm.org/D138979
2022-11-17[LoopInterchange] Refactor and rewrite validDepInterchange()Mengxuan Cai1-44/+17
The current code of validDepInterchange() enumerates cases that are legal for interchange. This could be simplified by checking lexicographically order of the swapped direction matrix. Reviewed By: congzhe, Meinersbur, bmahjour Differential Revision: https://reviews.llvm.org/D137461
2022-11-04[LoopInterchange] Check phis in all subloopsCongzhe Cao1-12/+20
This is the bugfix to the miscompile mentioned in https://reviews.llvm.org/D132055#3814831. The IR that reproduced the bug is added as the test case in this patch. What this patch does is that, during legality phase instead of checking the phi nodes only in `InnerLoop` and `OuterLoop`, we check phi nodes in all subloops of the `OuterLoop`. Suppose if the loop nest is triply nested, and `InnerLoop` and `OuterLoop` is the middle loop and the outermost loop respectively, we'll check phi nodes in the innermost loop as well, in addition to the ones in the middle and outermost loops. Reviewed By: Meinersbur, #loopoptwg Differential Revision: https://reviews.llvm.org/D134930
2022-10-04[NFC][LoopInterchange] Clean up of irrelevent dependency checking withRam-NK1-17/+0
isOuterMostDepPositive() The function isOuterMostDepPositive() is checked after negative dependence vectors are normalized to be non-negative, so there will not be any negative dependency ('>' as the outermost non-equal sign) after normalization. And therefore the check in isOuterMostDepPositive() is irrelevent and redundant. Reviewed By: congzhe Differential Revision: https://reviews.llvm.org/D132982
2022-09-22[LoopInterchange][PR57148] Ensure the correct form of IR after transformationCongzhe Cao1-7/+5
This is a bugfix patch that resolves the following two bugs in loop interchange: 1. PR57148 which is an assertion error due to of loss of LCSSA form after interchange, as referred to test1() in pr57148.ll. 2. Use before def for the outermost loop induction variables after interchange, as referred to test2() in pr57148.ll. The fix in this patch is that: 1. In cases where the LCSSA form is not maintained after interchange, we update the IR to the LCSSA form again. 2. We split the phi nodes in the inner loop header into a separate basic block to avoid the situation where use of the outer indvar appears before its def after interchange. Previously we already did this for innermost loops, now we do it for non-innermost loops (e.g., middle loops) as well. Reviewed By: bmahjour, Meinersbur, #loopoptwg Differential Revision: https://reviews.llvm.org/D132055