path: root/llvm/lib/Transforms/Scalar
Age | Commit message | Author | Files, Lines
24 hours | [DropUnnecessaryAssumes] Make the ephemeral value check more precise (#160700) | Nikita Popov | 1 file, -7/+43
The initial implementation used a very crude check where a value was considered ephemeral if it has only one use. This is insufficient if there are multiple assumes acting on the same value, or in more complex cases like cyclic phis. Generalize this to a more typical ephemeral value check, i.e. make sure that all transitive users are in assumes, while stopping at side-effecting instructions.
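The generalized check described above can be sketched as a worklist walk over the def-use graph: a value is ephemeral if every transitive user ends up in an assume, stopping (and failing) at side-effecting instructions. This is a toy model with invented names, not the LLVM API; the `seen` set is what makes cyclic phis terminate.

```python
def is_ephemeral(value, users, is_assume, has_side_effects):
    """users: dict mapping a value to the list of its users (a toy def-use graph)."""
    worklist = [value]
    seen = set()
    while worklist:
        v = worklist.pop()
        if v in seen:
            continue  # revisiting a node: handles cyclic phis
        seen.add(v)
        for user in users.get(v, []):
            if is_assume(user):
                continue  # a use inside an assume keeps the value ephemeral
            if has_side_effects(user):
                return False  # the value escapes into real work
            worklist.append(user)  # keep walking transitive users
    return True
```

With this shape, two assumes acting on the same value no longer defeat the check the way the old one-use heuristic did.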
4 days | [LoopFusion] Detecting legal dependencies for fusion using DA info (#146383) | Alireza Torabian | 1 file, -0/+42
Loop fusion pass will use the information provided by the recent DA patch to fuse additional legal loops, including those with forward loop-carried dependencies.
5 days | [LoopInterchange] Bail out when finding a dependency with all `*` elements (#149049) | Ryotaro Kasuga | 1 file, -0/+11
If a direction vector with all `*` elements, like `[* * *]`, is present, it indicates that none of the loop pairs are legal to interchange. In such cases, continuing the analysis is meaningless. This patch introduces a check to detect such direction vectors and exits early when one is found. This slightly reduces compile time.
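The early exit is a simple scan: an all-`*` vector means no loop pair in the nest is interchangeable, so nothing further needs to be analyzed. A minimal sketch with toy data structures (a direction vector as a list of characters), not LoopInterchange's actual representation:

```python
def all_star(direction_vector):
    # '*' means the dependence direction is unknown in that loop dimension
    return all(d == '*' for d in direction_vector)

def worth_analyzing(direction_vectors):
    for dv in direction_vectors:
        if all_star(dv):
            return False  # bail out early: no pair is legal to interchange
    return True  # per-pair legality analysis would continue here
```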
5 days | [DropUnnecessaryAssumes] Add support for operand bundles (#160311) | Nikita Popov | 1 file, -12/+60
This extends the DropUnnecessaryAssumes pass to also handle operand bundle assumes. For this purpose, export the affected value analysis for operand bundles from AssumptionCache. If the bundle only affects ephemeral values, drop it. If all bundles on an assume are dropped, drop the whole assume.
6 days | [LLVMContext] Add OB_align assume bundle op ID. (#158078) | Florian Hahn | 1 file, -1/+1
Assume operand bundles are now emitted in a few more places, and are used in various places in libc++. Add a dedicated ID for them. PR: https://github.com/llvm/llvm-project/pull/158078
6 days | [AssumptionCache] Don't use ResultElem for assumption list (NFC) (#160462) | Nikita Popov | 1 file, -2/+2
ResultElem stores a weak handle of an assume, plus an index for referring to a specific operand bundle. This makes sense for the results of assumptionsFor(), which refers to specific operands of assumes. However, assumptions() is a plain list of assumes. It does *not* contain separate entries for each operand bundle. The operand bundle index is always ExprResultIdx. As such, we should be directly using WeakVH for this case, without the additional wrapper.
8 days | [InferAlignment] Fix updating alignment when larger than i32 (#160109) | Joseph Huber | 1 file, -1/+2
Summary: The changes made in https://github.com/llvm/llvm-project/pull/156057 allow the alignment value to be increased. We assert effectively infinite alignment when the pointer argument is invalid / null. The problem is that the masked load / store functions use i32 for their alignment value, which means this gets truncated to zero. Add a special check for this; long term we probably want to just remove this argument entirely.
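The failure mode is a plain width mismatch: the "effectively infinite" alignment asserted for a null/invalid pointer is a power of two larger than 32 bits, so writing it into an i32 operand zeroes it. A numeric sketch of that truncation, assuming the maximum alignment is 2^32 (as LLVM models it):

```python
MAX_ALIGN = 1 << 32          # larger than fits in 32 bits

def as_i32(value):
    # What storing an alignment into an i32 operand effectively does
    return value & 0xFFFFFFFF

assert as_i32(MAX_ALIGN) == 0   # the huge alignment truncates to zero
assert as_i32(16) == 16         # ordinary alignments survive
```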
10 days | [SROA] Use tree-structure merge to remove alloca (#152793) | Chengjun | 1 file, -7/+306
This patch introduces a new optimization in SROA that handles the pattern where multiple non-overlapping vector `store`s completely fill an `alloca`. The current approach to this pattern introduces many `.vecexpand` and `.vecblend` instructions, which can dramatically slow down compilation when dealing with large `alloca`s built from many small vector `store`s. For example, consider an `alloca` of type `<128 x float>` filled by 64 `store`s of `<2 x float>` each. The current implementation requires:
- 64 `shufflevector`s (`.vecexpand`)
- 64 `select`s (`.vecblend`)
- All operations use masks of size 128
- These operations form a long dependency chain
This kind of IR is both difficult to optimize and slow to compile, particularly impacting the `InstCombine` pass. This patch introduces a tree-structured merge approach that significantly reduces the number of operations and improves compilation performance. Key features:
- Detects when vector `store`s completely fill an `alloca` without gaps
- Ensures no loads occur in the middle of the store sequence
- Uses a tree-based approach with `shufflevector`s to merge stored values
- Reduces the number of intermediate operations compared to linear merging
- Eliminates the long dependency chains that hurt optimization
Example transformation:
```
// Before: (stores do not have to be in order)
%alloca = alloca <8 x float>
store <2 x float> %val0, ptr %alloca     ; offset 0-1
store <2 x float> %val2, ptr %alloca+16  ; offset 4-5
store <2 x float> %val1, ptr %alloca+8   ; offset 2-3
store <2 x float> %val3, ptr %alloca+24  ; offset 6-7
%result = load <8 x float>, ptr %alloca

// After (tree-structured merge):
%shuffle0 = shufflevector %val0, %val1, <4 x i32> <i32 0, i32 1, i32 2, i32 3>
%shuffle1 = shufflevector %val2, %val3, <4 x i32> <i32 0, i32 1, i32 2, i32 3>
%result = shufflevector %shuffle0, %shuffle1, <8 x i32> <i32 0, i32 1, i32 2, i32 3, i32 4, i32 5, i32 6, i32 7>
```
Benefits:
- Logarithmic depth (O(log n)) instead of linear dependency chains
- Fewer total operations for large vectors
- Better optimization opportunities for subsequent passes
- Significant compilation time improvements for large vector patterns
For some large cases, the compile time can be reduced from about 60s to less than 3s.
---------
Co-authored-by: chengjunp <chengjunp@nividia.com>
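The tree-structured merge above can be sketched as repeated pairwise concatenation: adjacent chunks are joined each round (one `shufflevector` per pair), so the depth is ceil(log2(n)) rounds rather than an n-long expand/blend chain. Plain Python lists stand in for LLVM vector values here; this models the shape of the transform, not SROA's implementation.

```python
def tree_merge(chunks):
    """Merge a list of chunks pairwise until one vector remains.
    Returns the merged vector and the number of merge rounds (tree depth)."""
    rounds = 0
    while len(chunks) > 1:
        nxt = []
        for i in range(0, len(chunks), 2):
            pair = chunks[i] + (chunks[i + 1] if i + 1 < len(chunks) else [])
            nxt.append(pair)  # one shufflevector per adjacent pair
        chunks = nxt
        rounds += 1
    return chunks[0], rounds

# 64 chunks of <2 x float> -> one <128 x float> in 6 rounds, not 64 chained steps.
merged, depth = tree_merge([[i, i + 1] for i in range(0, 128, 2)])
assert merged == list(range(128)) and depth == 6
```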
11 days | [SeparateConstOffsetFromGEP] Check if non-extracted indices may be negative when preserving inbounds (#159515) | Fabian Ritter | 1 file, -6/+7
If we know that the initial GEP was inbounds, and we change it to a sequence of GEPs from the same base pointer where every offset is non-negative, then the new GEPs are inbounds. So far, the implementation only checked whether the extracted offsets are non-negative. In cases where non-extracted offsets can be negative, this would cause the inbounds flag to be wrongly preserved. Fixes an issue in #130617 found by nikic.
11 days | [InferAddressSpaces] Mark ConstantAggregateZero as safe to cast to a ConstantExpr addrspacecast (#159695) | Wenju He | 1 file, -1/+1
This PR extends isSafeToCastConstAddrSpace to treat ConstantAggregateZero like ConstantPointerNull. Tests show that an extra addrspacecast instruction is removed and that an icmp pointer vector operand's address space is now inferred. This change is motivated by inspecting the test in commit f7629f5945f6.
11 days | [InferAddressSpaces] Extend undef pointer operand support to phi inst (#159548) | Wenju He | 1 file, -61/+40
Previously an undef pointer operand was only supported for select instructions, where undef in the generic AS behaves like `take the other side`. This PR extends the support to other instructions, e.g. phi instructions. Joining and inferring a constant pointer operand is deferred until all other operands' AS states have been considered.
---------
Co-authored-by: Matt Arsenault <arsenm2@gmail.com>
12 days | [DropUnnecessaryAssumes] Add pass for dropping assumes (#159403) | Nikita Popov | 2 files, -0/+63
This adds a new pass for dropping assumes that are unlikely to be useful for further optimization. It works by discarding any assumes whose affected values are one-use (which implies that they are only used by the assume, i.e. ephemeral). This pass currently runs at the start of the module optimization pipeline, that is post-inline and post-link. Before that point, it is more likely for previously "useless" assumes to become useful again, e.g. because an additional user of the value is introduced after inlining + CSE.
12 days | [NewPM] Remove BranchProbabilityInfo from FunctionToLoopPassAdaptor. NFCI (#159516) | Luke Lau | 1 file, -6/+0
No loop pass seems to use it now, after LoopPredication stopped using it in https://reviews.llvm.org/D111668
12 days | [NewPM] Don't preserve BlockFrequencyInfo in FunctionToLoopPassAdaptor (#157888) | Luke Lau | 1 file, -4/+0
Function analyses in LoopStandardAnalysisResults are marked as preserved by the loop pass adaptor, because LoopAnalysisManagerFunctionProxy manually invalidates most of them. However, the proxy doesn't invalidate BFI, since it is only preserved on a "lossy" basis: see https://reviews.llvm.org/D86156 and https://reviews.llvm.org/D110438. So any changes to the CFG will result in BFI giving incorrect results, which is fine for loop passes, which deal with the lossiness. But the loop pass adaptor still marks it as preserved, which causes the lossy result to leak out into function passes. This causes incorrect results when viewed from e.g. LoopVectorizer, where an innermost loop header may be reported to have a smaller frequency than its successors. This patch fixes that by dropping the call to preserve, and adds a test with the -O1 pipeline which shows the effects whenever the CFG is changed and UseBlockFrequencyInfo is set. I've also dropped it for BranchProbabilityAnalysis, but I couldn't test for it since UseBranchProbabilityInfo always seems to be false? This may be dead code.
12 days | [PatternMatch] Introduce match functor (NFC) (#159386) | Ramkumar Ramachandra | 1 file, -4/+3
A common idiom is the usage of the PatternMatch match function within a functional algorithm like all_of. Introduce a match functor to shorten this idiom. Co-authored-by: Luke Lau <luke@igalia.com>
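The idiom being shortened can be modeled outside LLVM: instead of wrapping `match(V, pattern)` in a lambda at every `all_of` call site, a functor partially applies the pattern once and is passed directly to the algorithm. `match_fn`, `match`, and `m_zero` below are invented Python stand-ins, not the PatternMatch API.

```python
def m_zero():
    return lambda v: v == 0          # toy pattern: matches the constant zero

def match(value, pattern):
    return pattern(value)            # toy analog of PatternMatch's match()

def match_fn(pattern):
    """Functor form: bind the pattern once so the result can be handed
    straight to all(), any(), filter(), etc."""
    return lambda value: match(value, pattern)

ops = [0, 0, 0]
# Before: an explicit lambda at every call site.
assert all(match(v, m_zero()) for v in ops)
# After: reads like the C++ `all_of(Ops, match_fn(m_Zero()))`.
assert all(map(match_fn(m_zero()), ops))
```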
14 days | [LSR] Add an addressing mode that considers all addressing modes (#158110) | John Brawn | 1 file, -16/+14
The way that loop strength reduction works is that the target has to decide upfront whether it wants its addressing to be preindex, postindex, or neither. This choice affects:
* Which potential solutions we generate
* Whether we consider a pre/post index load/store as costing an AddRec or not.
None of these choices is a good fit for either AArch64 or ARM, where both preindex and postindex addressing are typically free:
* If we pick None, then we count pre/post index addressing as costing one AddRec more than is correct, so we don't pick them when we should.
* If we pick PreIndexed or PostIndexed, then we get the correct cost for that addressing type, but still get it wrong for the other, and also exclude potential solutions using offset addressing that could have less cost.
This patch adds an "all" addressing mode that causes all potential solutions to be generated and counts both pre and postindex as having an AddRecCost of zero. Unfortunately this reveals problems elsewhere in how we calculate the cost of things, which need to be fixed before we can make use of it.
2025-09-15 | [StructurizeCFG] bug fix in zero cost hoist (#157969) | Vigneshwar Jayakumar | 1 file, -15/+19
This fixes a bug where a zero-cost instruction was hoisted to the nearest common dominator, but the hoisted instruction's operands didn't dominate the common dominator, causing poison values.
2025-09-12 | [IR] Add `MD_prof` to the `Keep` list of `dropUBImplyingAttrsAndMetadata` (#154635) | Mircea Trofin | 1 file, -4/+1
`MD_prof` is safe to keep when e.g. hoisting instructions. Issue #147390
2025-09-12 | Revert "[LoopInfo] Pointer to stack object may not be loop invariant in a coroutine function (#149936)" (#157986) | Weibo He | 1 file, -4/+4
Since #156788 has resolved #149604, we can revert this workaround now.
2025-09-10 | [profcheck] Require `unknown` metadata have an origin parameter (#157594) | Mircea Trofin | 1 file, -1/+1
Rather than passes using `!prof = !{!"unknown"}` for cases where they don't have enough information to emit profile values, this patch captures the pass (or some other information) that can help diagnostics - i.e. `!{!"unknown", !"some-pass-name"}`. For example, suppose we emitted a `select` with the unknown metadata and, later, end up needing to lower that to a conditional branch. If we observe (via sample profiling, for example) that the branch is biased and would have benefitted from a valid profile, the extra information can help speed up debugging. We can also (in a subsequent pass) generate optimization remarks about such lowered selects, with a similar aim - identify patterns lowering to `select` that may be worth some extra investment in extracting a more precise profile.
2025-09-10 | [JumpThreading][DebugInfo] Propagate DebugLocs when simplifying loads (#157683) | Stephen Tozer | 1 file, -1/+10
In simplifyPartiallyRedundantLoad we may replace a load with a PHI of available values in predecessor blocks. As part of this process, we may need to cast those values, which we do by inserting a new cast at the end of the predecessor. These cast instructions should take their debug location from the load instruction, just as the PHI does; we make an exception if the predecessor does not unconditionally branch to the load's block, as in that case we are not guaranteed to reach the load and must therefore drop its debug location. Found using https://github.com/llvm/llvm-project/pull/107279.
2025-09-10 | [DFAJumpThreading] Prevent pass from using too much memory. (#153193) | Bushev Dmitry | 1 file, -15/+25
The limit 'dfa-max-num-paths', used to control the number of enumerated paths, was not checked inside getPathsFromStateDefMap. This may lead to large memory consumption for sufficiently complex switch statements. Reland llvm/llvm-project#145482
2025-09-09 | [AArch64] Enable RT and partial unrolling with reductions for Apple CPUs. (#149699) | Florian Hahn | 1 file, -0/+2
Update unrolling preferences for Apple Silicon CPUs to enable partial unrolling and runtime unrolling for small loops with reductions. This builds on top of unroller changes to introduce parallel reduction phis, if possible: https://github.com/llvm/llvm-project/pull/149470. PR: https://github.com/llvm/llvm-project/pull/149699
2025-09-09 | [DFAJumpThreading] Fix possible null dereference (#157461) | Daniel Kuts | 1 file, -1/+1
Fixes #157450 --------- Co-authored-by: Nikita Popov <github@npopov.com>
2025-09-05 | [LoopIdiom] Use HashRecognize to optimize CRC (#143208) | Ramkumar Ramachandra | 1 file, -1/+175
Optimize CRC loops using a Sarwate table-lookup by using the results of HashRecognize in LoopIdiomRecognize. The optimization is checked for correctness using the SingleSource/UnitTests/HashRecognize tests in llvm-test-suite.
2025-09-05 | [SimpleLoopUnswitch] Adjust cost multiplier accounting for parent loop size | Antonio Frighetto | 1 file, -5/+22
When estimating the cost to avoid exponential unswitches of non-trivial invariant conditions, also consider the number of basic blocks in the parent loop, ensuring it does not grow unexpectedly. Fixes: https://github.com/llvm/llvm-project/issues/138509.
2025-09-04 | [SeparateConstOffsetFromGEP] Highlight that trunc is handled. NFC (#154563) | Björn Pettersson | 1 file, -37/+39
Update code comments and variable/function names to make it clearer that we handle trunc instructions (and not only sext/zext) when extracting constant offsets from GEP index expressions. For example, this renames the vector ExtInsts to CastInsts.
2025-09-03 | [ConstraintElim] Bail out on non-canonical GEPs (#156688) | Yingwei Zheng | 1 file, -2/+6
In most cases, GEPs should be canonicalized by InstCombine. Bail out on non-canonical forms for simplicity. Fixes https://github.com/llvm/llvm-project/pull/155253#issuecomment-3248457478.
2025-09-02 | [GVN] Turn off ScalarPRE for TokenLike Types (#156513) | Farzon Lotfi | 1 file, -1/+2
fixes #154407 In HLSL the GVNPass was adding a phi node on a target extension type. https://hlsl.godbolt.org/z/sc14YenEe This is something we cleaned up in a past PR (https://github.com/llvm/llvm-project/pull/154620) by introducing `isTokenLikeTy`. In the case of the GVN pass, the target extension type was still making its way through. This change makes it so that if we see this type, we don't do PRE.
2025-09-02 | [ConstraintElim] Use constraints from bounded memory accesses (#155253) | Yingwei Zheng | 1 file, -7/+105
This patch removes bound checks that are dominated by bounded memory accesses. For example, if we have an array `int A[5]` and `A[idx]` is performed successfully, we know that `idx u< 5` after the load. compile-time impact (+0.1%): https://llvm-compile-time-tracker.com/compare.php?from=f0e9bba024d44b55d54b02025623ce4a3ba5a37c&to=5227b08a4a514159ec524d1b1ca18ed8ab5407df&stat=instructions%3Au llvm-opt-benchmark: https://github.com/dtcxzyw/llvm-opt-benchmark/pull/2709 Proof: https://alive2.llvm.org/ce/z/JEyjA2
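The fact the pass exploits can be modeled directly: once `A[idx]` has executed without UB on an array of length 5, `idx u< 5` holds, so any later dominated check `idx < 5` always folds to true and can be removed. The helper names below are invented for illustration; the real pass reasons over IR facts, not runtime values.

```python
def access(array, idx):
    # A successful in-bounds access is the fact established at this point.
    assert 0 <= idx < len(array), "out-of-bounds access would be UB"
    return array[idx]

def later_bound_check(idx, n):
    # Dominated by a successful access(array, idx) with len(array) == n,
    # this check is provably true and thus redundant.
    return idx < n

A = [10, 20, 30, 40, 50]
idx = 3
access(A, idx)                    # establishes idx u< 5
assert later_bound_check(idx, 5)  # the dominated check always folds to True
```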
2025-09-02 | [InferAlignment] Increase alignment in masked load / store intrinsics if known (#156057) | Joseph Huber | 1 file, -2/+33
Summary: The masked load / store LLVM intrinsics take an argument for the alignment. If the user is pessimistic about alignment, they can provide a value of `1` for an unaligned load. This patch updates infer-alignment to increase the value of the alignment argument if the known alignment is greater than the provided one. The gather / scatter versions are ignored for now since they contain many pointers.
2025-08-28 | [StructurizeCFG] nested-if zerocost hoist bugfix (#155408) | Vigneshwar Jayakumar | 1 file, -1/+2
When zero-cost instructions are hoisted, the simplifyHoistedPhi function was setting incoming phi values which did not dominate the use, causing runtime failures. These values are now set to poison by the rebuildSSA function. This commit fixes the issue.
2025-08-22 | [LoopDist] Add metadata for checking post process state of distributed loops (#153902) | Michael Berg | 1 file, -0/+11
Add a count of the number of partitions LoopDist made when distributing a loop to the metadata, then check for loops which are already distributed to prevent reprocessing. We see this happen on some SPEC apps; LD is on by default at SiFive.
2025-08-22 | [llvm] Remove unused includes of SmallSet.h (NFC) (#154893) | Kazu Hirata | 6 files, -6/+0
We just replaced SmallSet<T *, N> with SmallPtrSet<T *, N>, bypassing the redirection found in SmallSet.h. With that, we no longer need to include SmallSet.h in many files.
2025-08-22 | [SeparateConstOffsetFromGEP] Avoid miscompiles related to trunc nuw/nsw (#154582) | Bjorn Pettersson | 1 file, -0/+7
Drop poison-generating flags on trunc when distributing trunc over add/sub/or. We need to do this since, for example, (add (trunc nuw A), (trunc nuw B)) is more poisonous than (trunc nuw (add A, B)). In some situations it is pessimistic to drop the flags, such as when the add in the example above also has the nuw flag. For now we keep it simple and always drop the flags. Worth mentioning is that we drop the flags when cloning instructions and rebuilding the chain. This is done after the "allowsPreservingNUW" checks in ConstantOffsetExtractor::Extract. So we still take the "trunc nuw" into consideration when determining if nuw can be preserved in the gep (which should be ok since that check also requires that all the involved binary operations have nuw). Fixes #154116
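A numeric sketch of why distributing the flag is unsound: `trunc nuw` from i16 to i8 is poison when the value doesn't fit in 8 bits, and distributing it over an add can introduce poison the original `trunc nuw (add A, B)` never had. `None` models poison; the helper names are invented for illustration.

```python
def trunc_nuw_i16_to_i8(v):
    # trunc nuw: poison (None) if any of the dropped high bits are set
    return v if v < 256 else None

def add_i8(a, b):
    if a is None or b is None:
        return None                  # poison propagates through the add
    return (a + b) & 0xFF            # i8 wrapping add

A, B = 300, 65436                    # B is -100 interpreted as an i16
before = trunc_nuw_i16_to_i8((A + B) & 0xFFFF)   # trunc nuw (add A, B) = 200
after = add_i8(trunc_nuw_i16_to_i8(A), trunc_nuw_i16_to_i8(B))
assert before == 200   # the original expression is well-defined
assert after is None   # the distributed form is poison: operands don't fit in i8
```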
2025-08-21 | [Scalar] Use SmallSetVector instead of SmallVector (NFC) (#154678) | Kazu Hirata | 1 file, -13/+6
insertParsePoints collects live variables and then deduplicates them while retaining the original insertion order, which is exactly what SetVector is designed for. This patch replaces SmallVector with SmallSetVector while deleting unique_unsorted. While we are at it, this patch reduces the number of inline elements to a reasonable level for linear search.
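What SetVector buys over a vector plus manual dedup is exactly the contract described above: set-like uniqueness with insertion-order iteration. A plain-Python equivalent of that contract (not LLVM's implementation, which also provides small-size inline storage):

```python
class SetVector:
    """Insertion-ordered set: rejects duplicates, iterates in insert order."""
    def __init__(self):
        self._seen = set()
        self._items = []

    def insert(self, x):
        if x in self._seen:
            return False          # duplicate: ignored, order unchanged
        self._seen.add(x)
        self._items.append(x)
        return True

    def __iter__(self):
        return iter(self._items)

sv = SetVector()
for v in ["a", "b", "a", "c", "b"]:
    sv.insert(v)
assert list(sv) == ["a", "b", "c"]   # duplicates dropped, order retained
```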
2025-08-21 | [SeparateConstOffsetFromGEP] propagate const offset through GEP chains (#143470) | Alex MacLean | 1 file, -11/+37
When separating the constant offset from a GEP, if the pointer operand is a constant ptradd (likely generated when we performed this transform on that GEP), we accumulate the offset into the current offset. This ensures that when there is a chain of GEPs the constant offset reaches the final memory instruction where it can likely be folded into the addressing.
2025-08-20 | [Scalar] Use SmallPtrSet directly instead of SmallSet (NFC) (#154473) | Kazu Hirata | 1 file, -1/+1
I'm trying to remove the redirection in SmallSet.h:
template <typename PointeeType, unsigned N>
class SmallSet<PointeeType*, N> : public SmallPtrSet<PointeeType*, N> {};
to make it clear that we are using SmallPtrSet. There are only a handful of places that rely on this redirection. This patch replaces SmallSet with SmallPtrSet where the element type is a pointer.
2025-08-20 | [IR] Add utilities for manipulating length of MemIntrinsic [nfc] (#153856) | Philip Reames | 2 files, -6/+3
Goal is simply to reduce direct usage of getLength and setLength so that if we end up moving memset.pattern (whose length is in elements) there are fewer places to audit.
2025-08-19 | [SCCP] Enable PredicateInfo for non-interprocedural SCCP (#153003) | Nikita Popov | 1 file, -4/+12
SCCP can use PredicateInfo to constrain ranges based on assume and branch conditions. Currently, this is only enabled during IPSCCP. This enables it for SCCP as well, which runs after functions have already been simplified, while IPSCCP runs pre-inline. To a large degree, CVP already handles range-based optimizations, but SCCP is more reliable for the cases it can handle. In particular, SCCP works reliably inside loops, which is something that CVP struggles with due to LVI cycles. I have made various optimizations to make PredicateInfo more efficient, but unfortunately this still has significant compile-time cost (around 0.1-0.2%).
2025-08-19 | [LICM] Avoid assertion failure on stale MemoryDef | Nikita Popov | 1 file, -2/+7
It can happen that the call is originally created as a MemoryDef, and then later transforms show it is actually read-only and could be a MemoryUse -- however, this is not guaranteed to be reflected in MSSA.
2025-08-18 | Revert "[SimpleLoopUnswitch] Record loops from unswitching non-trivial conditions" | Antonio Frighetto | 1 file, -26/+23
This reverts commit e9de32fd159d30cfd6fcc861b57b7e99ec2742ab due to multiple performance regressions observed across downstream Numba benchmarks (https://github.com/llvm/llvm-project/issues/138509#issuecomment-3193855772). While avoiding non-trivial unswitches on newly-cloned loops helps mitigate the pathological case reported in https://github.com/llvm/llvm-project/issues/138509, it may also make the IR less friendly to vectorization / loop canonicalization (in the reported test, previously no select with a loop-carried dependence existed in the new specialized loops), leading the abovementioned approach to be reconsidered.
2025-08-18 | [llvm] Replace SmallSet with SmallPtrSet (NFC) (#154068) | Kazu Hirata | 8 files, -19/+19
This patch replaces SmallSet<T *, N> with SmallPtrSet<T *, N>. Note that SmallSet.h "redirects" SmallSet to SmallPtrSet for pointer element types:
template <typename PointeeType, unsigned N>
class SmallSet<PointeeType*, N> : public SmallPtrSet<PointeeType*, N> {};
We only have 140 instances that rely on this "redirection", with the vast majority of them under llvm/. Since relying on the redirection doesn't improve readability, this patch replaces SmallSet with SmallPtrSet for pointer element types.
2025-08-16 | [licm] don't drop `MD_prof` when dropping other metadata (#152420) | Mircea Trofin | 1 file, -2/+6
Part of Issue #147390
2025-08-15 | [JTS] Use common branch weight downscaling (#153738) | Mircea Trofin | 1 file, -8/+6
This also fixes a bug introduced accidentally in #153651, whereby `JumpTableToSwitch` would convert all the branch weights to 0 except for one. It didn't trip the test because `update_test_checks` wasn't run with `-check-globals`. It is now. This also made it noticeable that the direct calls promoted from the indirect call inherited the `VP` metadata, which should be dropped as it no longer makes sense.
2025-08-14 | [LoopDist] Consider reads and writes together for runtime checks (#145623) | Michael Berg | 1 file, -2/+4
Emit safety guards for pointer accesses when cross-partition loads exist which have a corresponding store to the same address in a different partition. This will emit the necessary pointer checks for these accesses. The test case was obtained from SuperTest, which SiFive runs regularly. We enabled LoopDistribution by default in our downstream compiler; this change was part of that enablement.
2025-08-14 | Add dependency on `ProfileData` from ScalarOpts (#153651) | Mircea Trofin | 2 files, -3/+9
Fixing buildbot failures after PR #153305, e.g. https://lab.llvm.org/buildbot/#/builders/203/builds/19861 Analysis already depends on `ProfileData`, so the transitive closure of the dependencies of `ScalarOpts` doesn't change. Also avoided an extra (and quite unnecessary) dependency on `Instrumentation`. The API previously used doesn't need to live in Instrumentation to begin with, but that's something to address in a follow-up.
2025-08-14 | [NFC] Use `[[maybe_unused]]` for variable used in assertion (#153639) | Mircea Trofin | 1 file, -2/+1
2025-08-14 | [JTS] Propagate profile info (#153305) | Mircea Trofin | 1 file, -5/+69
If the indirect call target being recognized as a jump table has profile info, we can accurately synthesize the branch weights of the switch that replaces the indirect call. Otherwise we insert the "unknown" `MD_prof` to indicate this is the best we can do here. Part of Issue #147390
2025-08-14 | [llvm][LICM] Limit multi-use BOAssociation to FP and Vector (#149829) | Theodoros Theodoridis | 1 file, -1/+1
Limit the re-association of BOps with multiple users to FP and Vector arithmetic.