aboutsummaryrefslogtreecommitdiff
path: root/llvm/lib/Analysis/LoopAccessAnalysis.cpp
AgeCommit message (Collapse)AuthorFilesLines
2024-07-06[LAA] Only invalidate loops that require runtime checks (NFCI).Florian Hahn1-0/+16
LAA doesn't keep references to IR outside the loop or references to SCEVs that may be invalidated, unless runtime checks are needed (either memory or SCEV predicates). For the current LAA users, it should be sufficient to invalidate entries for loops that require runtime checks, thus avoiding analyzing loops again unnecessarily. This helps reduce compile-time, in particular when removing the restrictions added in 234cc40adc6. https://llvm-compile-time-tracker.com/compare.php?from=73894dba2cdbcc00678d0c13a6b61765675f60b4&to=05c6bdc41b5f63696ebeb7116325725fa94f66d6&stat=instructions:u
2024-07-04[LAA] Cache pointer bounds expansions (NFCI).Florian Hahn1-9/+16
This avoids expanding the same bounds multiple times, which helps reduce the compile-time impact of removing the restrictions added in 234cc40adc6, notably -0.06% on stage1-O3 and -0.05% on both stage1-ReleaseThinLTO and stage1-ReleaseLTO-g. https://llvm-compile-time-tracker.com/compare.php?from=8b9ebc4bb86cf0979e05908cbb04336f2d01dda5&to=fabd36f96c31e47ea72653f5a404feaadfc7b5b5&stat=instructions:u
2024-06-27[IR] Add getDataLayout() helpers to BasicBlock and Instruction (#96902)Nikita Popov1-7/+7
This is a helper to avoid writing `getModule()->getDataLayout()`. I regularly try to use this method only to remember it doesn't exist... `getModule()->getDataLayout()` is also a common (the most common?) reason why code has to include the Module.h header.
2024-06-25[Analysis] Use range-based for loops (NFC) (#96587)Kazu Hirata1-10/+7
2024-06-25LoopInfo: introduce Loop::getLocStr; unify debug output (#93051)Ramkumar Ramachandra1-3/+5
Introduce a Loop::getLocStr stolen from LoopVectorize's static function getDebugLocString in order to have uniform debug output headers across LoopVectorize, LoopAccessAnalysis, and LoopDistribute. The motivation for this change is to have UpdateTestChecks recognize the headers and automatically generate CHECK lines for debug output, with minimal special-casing.
2024-06-24LAA: strip unnecessary getUniqueCastUse (#92119)Ramkumar Ramachandra1-28/+6
733b8b2 ([LAA] Simplify identification of speculatable strides [nfc]) refactored getStrideFromPointer() to compute directly on SCEVs, and return an SCEV expression instead of a Value. However, it left behind a call to getUniqueCastUse(), which is completely unnecessary. Remove this, showing a positive test update, and simplify the surrounding program logic.
2024-06-11LAA: refactor analyzeLoop to return bool (NFC) (#93824)Ramkumar Ramachandra1-33/+25
Avoid wastefully setting CanVecMem in several places in analyzeLoop, complicating the logic, to get the function to return a bool, and set CanVecMem in the caller.
2024-06-04[LAA] Use PSE::getSymbolicMaxBackedgeTakenCount. (#93499)Florian Hahn1-30/+33
Update LAA to use PSE::getSymbolicMaxBackedgeTakenCount which returns the minimum of the countable exits. When analyzing dependences and computing runtime checks, we need the smallest upper bound on the number of iterations. In terms of memory safety, it shouldn't matter if any uncomputable exits leave the loop, as long as we prove that there are no dependences given the minimum of the countable exits. The same should apply also for generating runtime checks. Note that this shifts the responsiblity of checking whether all exit counts are computable or handling early-exits to the users of LAA. Depends on https://github.com/llvm/llvm-project/pull/93498 PR: https://github.com/llvm/llvm-project/pull/93499
2024-05-29[LAA] Move getDependenceDistanceStrideAndSize to MemoryDepChecker (NFC).Florian Hahn1-33/+10
This avoids unnecessarily passing a number of parameters, and avoids needing to add extra parameters in the future.
2024-05-29[LAA] Store reference to SymbolicStrides in MemoryDepChecker (NFC).Florian Hahn1-12/+10
This reduces the need for explicitly passing it through multiple layers of function calls.
2024-05-28[LAA] Limit no-overlap check to at least one loop-invariant accesses.Florian Hahn1-14/+19
Limit the logic added in https://github.com/llvm/llvm-project/pull/9230 to cases where either sink or source are loop-invariant, to avoid compile-time increases. This is not needed for correctness. I am working on follow-up changes to reduce the compile-time impact in general to allow us to enable this again for any source/sink. This should fix the compile-time regression introduced by this change: * compile-time improvement with this change: https://llvm-compile-time-tracker.com/compare.php?from=4351787fb650da6d1bfb8d6e58753c90dcd4c418&to=b89010a2eb5f98494787c1c3b77f25208c59090c&stat=instructions:u * compile-time improvement with original patch reverted on top of this change: https://llvm-compile-time-tracker.com/compare.php?from=b89010a2eb5f98494787c1c3b77f25208c59090c&to=19a1103fe68115cfd7d6472c6961f4fabe81a593&stat=instructions:u
2024-05-23[LAA] refactor program logic (NFC) (#92101)Ramkumar Ramachandra1-54/+45
Implement NFC improvements spotted during a cursory reading of LoopAccessAnalysis.
2024-05-21[LAA] Check accesses don't overlap early to determine NoDep (#92307)Florian Hahn1-4/+24
Use getStartAndEndForAccess to compute the start and end of both src and sink (factored out to helper in bce3680f45b57f). If they do not overlap (i.e. SrcEnd <= SinkStart || SinkEnd <= SrcStart), there is no dependence, regardless of stride. PR: https://github.com/llvm/llvm-project/pull/92307
2024-05-20[LAA] Move logic to compute start and end of a pointer to helper (NFC).Florian Hahn1-6/+16
This allows use at other places, in particular an updated version of https://github.com/llvm/llvm-project/pull/92307.
2024-05-17[LAA] refactor sortPtrAccesses (NFC) (#92256)Ramkumar Ramachandra1-12/+7
Use the destructuring syntax in C++ and llvm::enumerate to make sortPtrAccesses a little more readable.
2024-05-14[LAA] Delay applying loop guards until after isSafeDependenceDistance.Florian Hahn1-5/+3
Applying the loop guards to the distance may prevent isSafeDependenceDistance from determining NoDep, unless loop guards are also applied to the backedge-taken-count. Instead of applying the guards to both Dist and the backedge-taken-count, just apply them after handling isSafeDependenceDistance and constant distances; there is no benefit to applying the guards before then. This fixes a regression flagged by @bjope due to ecae3ed958481cba7d60868cf3504292f7f4fdf5.
2024-05-14[LAA] refactor tryToCreateDiffCheck (NFC) (#92110)Ramkumar Ramachandra1-35/+20
tryToCreateDiffCheck has one caller, and exits early if CanUseDiffCheck is false. Hence, we can get/set CanUseDiffCheck in the caller to avoid wastefully calling tryToCreateDiffCheck. This patch is an NFC simplification of program logic.
2024-05-10[LAA] Support backward dependences with non-constant distance. (#91525)Florian Hahn1-26/+67
Following up to 933f49248, also update the code reasoning about backwards dependences to support non-constant distances. Update the code to use the signed minimum distance instead of a constant distance This means e checked the lower bound of the dependence distance and the distance may be larger at runtime (and safe for vectorization). Whether to classify it as Unknown or Backwards depends on the vector width and LAA was updated to take TTI to get the maximum vector register width. If the minimum dependence distance is larger than the max vector width, we consider it as backwards-vectorizable. Otherwise we classify them as Unknown, so we re-try with runtime checks. PR: https://github.com/llvm/llvm-project/pull/91525
2024-05-09[LAA] Apply loop guards to dependence distance.Florian Hahn1-0/+3
After supporting non-constant dependence distances in 933f49248bf, applying information from loop guards can help further disambiguate dependencies.
2024-05-05[LAA] Directly pass DepChecker to getSource/getDestination (NFC).Florian Hahn1-2/+2
Instead of passing LoopAccessInfo only to fetch the MemoryDepChecker, directly pass MemoryDepChecker. This simplifies the code and also allows new uses in places where no LAI is available.
2024-05-04[LV,LAA] Don't vectorize loops with load and store to invar address.Florian Hahn1-5/+9
Code checking stores to invariant addresses and reductions made an incorrect assumption that the case of both a load & store to the same invariant address does not need to be handled. In some cases when vectorizing with runtime checks, there may be dependences with a load and store to the same address, storing a reduction value. Update LAA to separately track if there was a store-store and a load-store dependence with an invariant addresses. Bail out early if there as a load-store dependence with invariant address. If there was a store-store one, still apply the logic checking if they all store a reduction.
2024-04-30[LAA] Pass maximum stride to isSafeDependenceDistance. (#90036)Florian Hahn1-15/+16
As discussed in https://github.com/llvm/llvm-project/pull/88039, support different strides with isSafeDependenceDistance by passing the maximum of both strides. isSafeDependenceDistance tries to prove that |Dist| > BackedgeTakenCount * Step holds. Chosing the maximum stride computes the maximum range accesed by the loop for all strides. PR: https://github.com/llvm/llvm-project/pull/90036
2024-04-25[LAA] Support different strides & non constant dep distances using SCEV. ↵Florian Hahn1-46/+91
(#88039) Extend LoopAccessAnalysis to support different strides and as a consequence non-constant distances between dependences using SCEV to reason about the direction of the dependence. In multiple places, logic to rule out dependences using the stride has been updated to only be used if StrideA == StrideB, i.e. there's a common stride. We now also may bail out at multiple places where we may have to set FoundNonConstantDistanceDependence. This is done when we need to bail out and the distance is not constant to preserve original behavior. Fixes https://github.com/llvm/llvm-project/issues/87336 PR: https://github.com/llvm/llvm-project/pull/88039
2024-04-22[LAA] Document reasoning in multiple places in isDependent (NFC). (#89381)Florian Hahn1-4/+13
As suggested in https://github.com/llvm/llvm-project/pull/88039, add extra documentation for reasoning in isDependent. PR: https://github.com/llvm/llvm-project/pull/89381
2024-04-10[LAA] Replace std::tuple with struct (NFCI).Florian Hahn1-5/+22
As suggested in https://github.com/llvm/llvm-project/pull/88039, replace the tuple with a struct, to make it easier to extend.
2024-03-12[LAA] Fix typo IndidrectUnsafe -> IndirectUnsafe.Florian Hahn1-1/+1
Fix type in textual analysis output.
2024-01-24[LAA] Drop alias scope metadata that is not valid across iterations (#79161)Nikita Popov1-7/+46
LAA currently adds memory locations with their original AATags to AST. However, scoped alias AATags may be valid only within one loop iteration, while LAA reasons across iterations. Fix this by determining which alias scopes are defined inside the loop, and drop AATags that reference these scopes. Fixes https://github.com/llvm/llvm-project/issues/79137.
2024-01-17[AST] Don't merge memory locations in AliasSetTracker (#65731)Bruno De Fraine1-10/+13
This changes the AliasSetTracker to track memory locations instead of pointers in its alias sets. The motivation for this is outlined in an RFC posted on LLVM discourse: https://discourse.llvm.org/t/rfc-dont-merge-memory-locations-in-aliassettracker/73336 In the data structures of the AST implementation, I made the choice to replace the linked list of `PointerRec` entries (that had to go anyway) with a simple flat vector of `MemoryLocation` objects, but for the `AliasSet` objects referenced from a lookup table, I retained the mechanism of a linked list, reference counting, forwarding, etc. The data structures could be revised in a follow-up change.
2024-01-04[IR] Fix GEP offset computations for vector GEPs (#75448)Jannik Silvanus1-1/+4
Vectors are always bit-packed and don't respect the elements' alignment requirements. This is different from arrays. This means offsets of vector GEPs need to be computed differently than offsets of array GEPs. This PR fixes many places that rely on an incorrect pattern that always relies on `DL.getTypeAllocSize(GTI.getIndexedType())`. We replace these by usages of `GTI.getSequentialElementStride(DL)`, which is a new helper function added in this PR. This changes behavior for GEPs into vectors with element types for which the (bit) size and alloc size is different. This includes two cases: * Types with a bit size that is not a multiple of a byte, e.g. i1. GEPs into such vectors are questionable to begin with, as some elements are not even addressable. * Overaligned types, e.g. i16 with 32-bit alignment. Existing tests are unaffected, but a miscompilation of a new test is fixed. --------- Co-authored-by: Nikita Popov <github@npopov.com>
2023-12-18[LoopVectorize] Enable hoisting of runtime checks by default (#71538)David Sherwood1-1/+1
With commit https://reviews.llvm.org/D152366 I introduced functionality that permitted the hoisting of runtime memory checks from a vectorised inner loop to the preheader of the next outer-most loop. This is useful for benchmarks like SPEC2017's x264 where the inner loop is vectorised and only has a small trip count. In such cases the runtime memory checks become expensive and since the checks never fail in the case of x264 it makes sense to do this. However, this behaviour was controlled by the flag -hoist-runtime-checks which was off by default. This patch enables this flag by default for all targets, since I believe this is a generally beneficial thing to do. I have tested this with SPEC2017 and I see 2.3% and 2.6% improvements with x264 on neoverse-v1 and neoverse-n1, respectively. Similarly, I saw slight improvements in the overall geomean on both machines. The only other notable changes were a 1% drop in the roms benchmark, which was compensated for by a 1% improvement in fotonik3d.
2023-12-13[AST] Switch to MemoryLocation in add method (NFC)Bruno De Fraine1-2/+2
Pass MemoryLocation as one argument, instead of passing all its parts separately.
2023-12-12[LoopVectorize] Improve algorithm for hoisting runtime checks (#73515)David Sherwood1-1/+6
When attempting to hoist runtime checks out of a loop we currently avoid creating pointer diff checks and prefer to do expanded range checks instead. This gives us the opportunity to hoist runtime checks out of a loop, since these checks are loop invariant. However, in some cases the pointer diff checks would also be loop invariant and so will naturally get hoisted. Therefore, since diff checks are cheaper so we should prefer to use those instead.
2023-12-05[LAA] Fix incorrect dependency classification. (#70819)Alexandros Lamprineas1-6/+3
As shown in #70473, the following loop was not considered safe to vectorize. When determining the memory access dependencies in a loop which has negative iteration step, we invert the source and sink of the dependence. Perhaps we should just invert the operands to getMinusSCEV(). This way the dependency is not regarded to be true, since the users of the `IsWrite` variables, which correspond to each of the memory accesses, rely on program order and therefore should not be swapped. void vectorizable_Read_Write(int *A) { for (unsigned i = 1022; i >= 0; i--) A[i+1] = A[i] + 1; }
2023-11-27[LAA] Check HasSameSize before couldPreventStoreLoadForward.Florian Hahn1-2/+2
After 9645267, TypeByteSize is 0 if both access do not have the same size (i.e. HasSameSize will be false). This can cause an infinite loop in couldPreventStoreLoadForward, if HasSameSize is not checked first. So check HasSameSize first instead of after couldPreventStoreLoadForward. Checking HasSameSize first is also cheaper.
2023-11-23[LAA] Factor out logic to compute dependence distance. (NFCI)Florian Hahn1-25/+59
This patch refactors the logic to compute the dependence distance, stride, size and write info to a separate function. This limits the scope of various A* and B* variables, which in turn makes it easier to reason about their uses. In particular this makes it more explicit why dropping the various std::swaps as done in https://github.com/llvm/llvm-project/pull/70819/ is valid.
2023-11-23Revert "mp"Florian Hahn1-81/+61
Commit was pushed by accident. This reverts commit 0e15f245557000038f27f2dc8926a65bf3d4ccaf.
2023-11-23mpFlorian Hahn1-61/+81
2023-11-15[LAA] Check if dependencies access loop-varying underlying objects.Florian Hahn1-14/+57
This patch adds a new dependence kind UnsafeIndirect, for cases where at least one of the memory access instructions may access a loop varying object, e.g. the address of underlying object is loaded inside the loop, like A[B[i]]. We cannot determine direction or distance in those cases, and also are unable to generate any runtime checks. This fixes a miscompile, if we attempt to generate runtime checks for unknown dependencies. Note that in most cases we do not attempt to generate runtime checks for unknown dependences, except if FoundNonConstantDistanceDependence is true. Fixes https://github.com/llvm/llvm-project/issues/69744.
2023-10-28[LoopDist] Update the pragma info of loop distribute, NFC (#69825)Allen1-1/+1
Base on D19403, the exact pragma of distribute is `#pragma clang loop distribute`
2023-10-22[llvm] Stop including llvm/ADT/iterator_range.h (NFC)Kazu Hirata1-1/+0
Identified with misc-include-cleaner.
2023-10-22[llvm] Stop including llvm/ADT/DepthFirstIterator.h (NFC)Kazu Hirata1-1/+0
Identified with misc-include-cleaner.
2023-09-19[LAA] Analyze pointers forked by a phi (#65834)Allen1-0/+16
Given a function like the following: https://godbolt.org/z/T9c99fr88 ```c 1161_noReadWrite(int *Preds) { for (int i = 0; i < LEN_1D-1; ++i) { if (Preds[i] != 0) b[i] = c[i] + 1; else a[i] = i * i; } } ``` LLVM will optimize the IR to a single store by a phi instruction: ```llvm %1 = load ptr, ptr @a, align 64 %2 = load ptr, ptr @b, align 64 ... for.inc: %.sink = phi ptr [ %1, %if.then ], [ %2, %if.else ] %add.sink = phi double [ %add, %if.then ], [ %conv8, %if.else ] %arrayidx7 = getelementptr inbounds double, ptr %.sink, i64 %indvars.iv store double %add.sink, ptr %arrayidx7, align 8 ``` LAA is currently unable to analyze such IR, since ScalarEvolution will return a SCEVUnknown for the forked pointer operand of the store. This patch adds initial optional support for analyzing both possibilities for the pointer and allowing LAA to generate runtime checks for the bounds if required, refers to D108699, but here address the phi node. Fixes https://github.com/llvm/llvm-project/issues/64888 Reviewed By: huntergr-arm, fhahn Differential Revision: https://reviews.llvm.org/D158965
2023-09-16[LAA] Improve the output remark for LoopVectorize (#65832)Allen1-5/+17
Don't report 'Use #pragma loop distribute(enable) to allow loop distribution' when we already add #pragma clang loop distribute(enable) Fixes https://github.com/llvm/llvm-project/issues/64637
2023-08-24[LoopVectorize] Allow inner loop runtime checks to be hoisted above an outer ↵David Sherwood1-0/+30
loop Suppose we have a nested loop like this: void foo(int32_t *dst, int32_t *src, int m, int n) { for (int i = 0; i < m; i++) { for (int j = 0; j < n; j++) { dst[(i * n) + j] += src[(i * n) + j]; } } } We currently generate runtime memory checks as a precondition for entering the vectorised version of the inner loop. However, if the runtime-determined trip count for the inner loop is quite small then the cost of these checks becomes quite expensive. This patch attempts to mitigate these costs by adding a new option to expand the memory ranges being checked to include the outer loop as well. This leads to runtime checks that can then be hoisted above the outer loop. For example, rather than looking for a conflict between the memory ranges: 1. &dst[(i * n)] -> &dst[(i * n) + n] 2. &src[(i * n)] -> &src[(i * n) + n] we can instead look at the expanded ranges: 1. &dst[0] -> &dst[((m - 1) * n) + n] 2. &src[0] -> &src[((m - 1) * n) + n] which are outer-loop-invariant. As with many optimisations there is a trade-off here, because there is a danger that using the expanded ranges we may never enter the vectorised inner loop, whereas with the smaller ranges we might enter at least once. I have added a HoistRuntimeChecks option that is turned off by default, but can be enabled for workloads where we know this is guaranteed to be of real benefit. In future, we can also use PGO to determine if this is worthwhile by using the inner loop trip count information. When enabling this option for SPEC2017 on neoverse-v1 with the flags "-Ofast -mcpu=native -flto" I see an overall geomean improvement of ~0.5%: SPEC2017 results (+ is an improvement, - is a regression): 520.omnetpp: +2% 525.x264: +2% 557.xz: +1.2% ... GEOMEAN: +0.5% I didn't investigate all the differences to see if they are genuine or noise, but I know the x264 improvement is real because it has some hot nested loops with low trip counts where I can see this hoisting is beneficial. Tests have been added here: Transforms/LoopVectorize/runtime-checks-hoist.ll Differential Revision: https://reviews.llvm.org/D152366
2023-08-16[Analysis] Fix an unused variable warningKazu Hirata1-0/+1
This patch fixes: llvm/lib/Analysis/LoopAccessAnalysis.cpp:2001:12: error: unused variable 'MinDepDistBytesOld' [-Werror,-Wunused-variable]
2023-08-16[LAA] Rename and fix semantics of MaxSafeDepDistBytes to MinDepDistBytesMichael Maitland1-14/+27
`MaxSafeDepDistBytes` was not correct based on its name an semantics in instances when there was a non-unit stride loop. For example, ``` for (int k = 0; k < len; k+=3) { a[k] = a[k+4]; a[k+2] = a[k+6]; } ``` Here, the smallest dependence distance is 24 bytes, but only vectorizing 8 bytes is safe. `MaxSafeVectorWidthInBits` reported the correct number of bits that could be vectorized as 64 bits. The semantics of of `MaxSafeDepDistBytes` should be: The smallest dependence distance in bytes in the loop. This may not be the same as the maximum number of bytes that are safe to operate on simultaneously. The name of this variable should reflect those semantics and its docstring should be updated accordingly, `MinDepDistBytes`. A debug message that used `MaxSafeDepDistBytes` to signify to the user how many bytes could be accessed in parallel is updated to use `MaxSafeVectorWidthInBits` instead. That way, the same message if communicated to the user, just in different units. This patch makes sure that when `MinDepDistBytes` is modified in a way that should impact `MaxSafeVectorWidthInBits`, that we update the latter accordingly. This patch also clarifies why `MaxSafeVectorWidthInBits` does not to be updated when `MinDepDistBytes` is (i.e. in the case of a forward dependency). Differential Revision: https://reviews.llvm.org/D156158
2023-08-13[LegacyPM] Drop unused includes in passes no longer supporting legacy PMBjorn Pettersson1-2/+0
2023-07-31[IR] Mark `llvm.assume` as `memory(inaccessiblemem: write)`Johannes Doerfert1-7/+7
It was `inaccessiblemem: readwrite` before, no need for the read. No real benefit is expected but it can help debugging and other efforts. Differential Revision: https://reviews.llvm.org/D156478
2023-07-24[LAA] Add assertion to check both Start and End are invariant (NFC).Florian Hahn1-0/+3
Add extra assert to check invariant of RuntimePointerChecking::insert to guard against subtle changes when extending the scope of LAA.
2023-07-18[llvm] Remove some uses of isOpaqueOrPointeeTypeEquals() (NFC)Nikita Popov1-4/+0