riscv-gnu-toolchain/llvm.git

Age	Commit message	Author	Files	Lines
2024-09-13	[IRSim] Avoid repeated hash lookups (NFC) (#108483)	Kazu Hirata	1	-10/+2

2024-09-13	[ValueTracking] Infer is-power-of-2 from dominating conditions (#107994)	Yingwei Zheng	1	-9/+28
	Addresses downstream rustc issue: https://github.com/rust-lang/rust/issues/129795
2024-09-10	[ValueTracking] Infer is-power-of-2 from assumptions. (#107745)	Yingwei Zheng	1	-3/+36
	This patch tries to infer is-power-of-2 from assumptions. I don't see that this kind of assumption exists in my dataset. Related issue: https://github.com/rust-lang/rust/issues/129795 Close https://github.com/llvm/llvm-project/issues/58996.
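The two entries above let ValueTracking conclude that a value is a power of two from an assumption or a dominating branch condition. The commit messages do not list the exact patterns matched, so this plain C++ sketch only illustrates the property itself: two common idioms for "is a power of two", plus one kind of rewrite (unsigned remainder to mask) that such a fact can enable. None of these functions are LLVM APIs.

```cpp
#include <cassert>
#include <cstdint>

// Count set bits with a simple shift loop (portable, no C++20 <bit> needed).
int popcount32(uint32_t x) {
  int n = 0;
  for (; x != 0; x >>= 1)
    n += static_cast<int>(x & 1u);
  return n;
}

// Two common spellings of "x is a power of two".
bool isPow2ByPopcount(uint32_t x) { return popcount32(x) == 1; }
bool isPow2ByMask(uint32_t x) { return x != 0 && (x & (x - 1)) == 0; }

// Under a dominating condition such as popcount32(x) == 1, an unsigned
// remainder by x can be computed with a mask instead of a division.
uint32_t unsignedRem(uint32_t y, uint32_t x) {
  if (popcount32(x) == 1)
    return y & (x - 1); // valid only because x is a power of two here
  return y % x;         // general path; x must be nonzero
}

// Exhaustively confirm that the two idioms agree on every 16-bit value.
bool idiomsAgreeOn16Bits() {
  for (uint32_t x = 0; x <= 0xFFFFu; ++x)
    if (isPow2ByPopcount(x) != isPow2ByMask(x))
      return false;
  return true;
}
```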
2024-09-07	[ThinLTO] Shrink FunctionSummary by 8 bytes (#107706)	Kazu Hirata	1	-2/+2
	During the ThinLTO indexing step for one of our large applications, we create 4 million instances of FunctionSummary. Changing `std::vector<EdgeTy> CallGraphEdgeList;` to `SmallVector<EdgeTy, 0> CallGraphEdgeList;` in FunctionSummary reduces the size of each instance by 8 bytes. The rest of the patch makes the same change to other places so that the types stay compatible across function boundaries.
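The saving comes from the container header. A rough sketch under stated assumptions: a three-pointer `std::vector` layout (typical, but implementation-defined) and 32-bit size/capacity fields for `SmallVector<T, 0>` (assumed here for an element type of this size). Both structs below are hypothetical stand-ins, not the real class definitions.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical stand-in for a three-pointer std::vector header
// (begin, end, end-of-storage). Real layouts are implementation-defined.
template <typename T>
struct ThreePointerHeader {
  T *Begin;
  T *End;
  T *StorageEnd;
};

// Hypothetical stand-in for a SmallVector<T, 0> header: one data pointer
// plus 32-bit size and capacity, with no inline elements because N == 0.
template <typename T>
struct PointerSizeCapHeader {
  T *Begin;
  uint32_t Size;
  uint32_t Capacity;
};

// Hypothetical element type standing in for EdgeTy.
struct Edge {
  void *Callee;
  uint64_t Info;
};
```

On a 64-bit host this gives 24 versus 16 bytes, consistent with the 8-byte per-instance saving reported in the entry.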
2024-09-06	[NFCI] Remove EntryCount from FunctionSummary and clean up surrounding synthetic count passes. (#107471)	Mingming Liu	1	-4/+4
	The primary motivation is to remove `EntryCount` from `FunctionSummary`. This frees 8 bytes out of `sizeof(FunctionSummary)` (136 bytes as of https://github.com/llvm/llvm-project/commit/64498c54831bed9cf069e0923b9b73678c6451d8). While I'm at it, this PR cleans up {SummaryBasedOptimizations, SyntheticCountsPropagation} since they were not used and there are no plans to invest further in them. With this patch, the bitcode writer writes a placeholder 0 at the byte offset of `EntryCount`, and the bitcode reader can parse the function entry count at the correct byte offset. Added a TODO to stop writing `EntryCount` and bump the bitcode version.
2024-09-06	[ctx_prof] Handle case when no root is in this Module. (#107463)	Mircea Trofin	1	-4/+17
	If none of the functions in this `Module` are roots in the contextual profile, we can't use it and should just return the `{}` case.
2024-09-06	[ThinLTO] Shrink GlobalValueSummary by 8 bytes (#107342)	Kazu Hirata	1	-17/+19
	During the ThinLTO indexing step for one of our large applications, we create 7.5 million instances of GlobalValueSummary. Changing `std::vector<ValueInfo> RefEdgeList;` to `SmallVector<ValueInfo, 0> RefEdgeList;` in GlobalValueSummary reduces the size of each instance by 8 bytes. The rest of the patch makes the same change to other places so that the types stay compatible across function boundaries.
2024-09-06	[IRSim] Avoid repeated hash lookups (NFC) (#107510)	Kazu Hirata	1	-10/+6

2024-09-06	[TBAA] Fix the case where a subobject gets accessed at a non-zero offset. (#101485)	Ivan Kosarev	1	-4/+5

2024-09-05	[NFC] Rename the `Nr` abbreviation to `Num` (#107151)	Mircea Trofin	2	-5/+5
	It's more clear. (This isn't exhaustive).
2024-09-05	[SCEV] BECount to zero if `((-C + (C smax %x)) /u %x), C > 0` holds	Antonio Frighetto	1	-0/+16
	The SCEV expression `((-C + (C smax %x)) /u %x)` can be folded to zero for any positive constant C. Proof: https://alive2.llvm.org/ce/z/_dLm8C.
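The fold can be spot-checked by brute force at a small bit width. This sketch evaluates the expression in 8-bit two's-complement arithmetic for every positive `C` and every nonzero `x` (the `x == 0` case is skipped rather than modelled); it is a sanity check, not a substitute for the Alive2 proof linked above. The `C == 0` call in the checks shows why positivity matters.

```cpp
#include <cassert>
#include <cstdint>

// Evaluate ((-C + (C smax x)) /u x) in 8-bit arithmetic.
uint8_t foldedExpr(int8_t C, int8_t x) {
  int8_t smax = x > C ? x : C;                   // C smax x (signed max)
  uint8_t num = static_cast<uint8_t>(smax - C);  // -C + (C smax x)
  return static_cast<uint8_t>(num / static_cast<uint8_t>(x)); // unsigned div
}

// Check every positive C against every nonzero 8-bit x.
bool foldHoldsFor8Bits() {
  for (int c = 1; c <= 127; ++c)
    for (int v = -128; v <= 127; ++v) {
      if (v == 0)
        continue; // division by zero is excluded from this check
      if (foldedExpr(static_cast<int8_t>(c), static_cast<int8_t>(v)) != 0)
        return false;
    }
  return true;
}
```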
2024-09-05	[ConstantRange] Perform increment on APInt (NFC)	Nikita Popov	1	-1/+1
	This handles the edge case where BitWidth is 1 and doing the increment gets a value that's not valid in that width, while we just want wrap-around. Split out of https://github.com/llvm/llvm-project/pull/80309.
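A minimal illustration of the edge case, using a hypothetical `FixedWidthUInt` rather than `llvm::APInt`: the increment must wrap modulo 2^width, so in a 1-bit width, 1 + 1 has to become 0 rather than 2, which is not representable. Widths are limited to 1..63 here to keep the shift well defined.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical fixed-width unsigned value (1 <= Width <= 63) that wraps
// like an APInt of the same width.
struct FixedWidthUInt {
  unsigned Width;
  uint64_t Value; // kept within [0, 2^Width)

  uint64_t mask() const { return (uint64_t{1} << Width) - 1; }

  // Increment modulo 2^Width.
  FixedWidthUInt incremented() const { return {Width, (Value + 1) & mask()}; }
};
```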
2024-09-04	Consolidate all IR logic for getting the identity value of a reduction [nfc]	Philip Reames	1	-46/+0
	This change merges the three different places (at the IR layer) for finding the identity value of a reduction into a single copy. This depends on several prior commits which fix omissions and bugs in the distinct copies, but this patch itself should be fully non-functional. As the new comments and naming try to make clear, the identity value is a property of the @llvm.vector.reduce.* intrinsic, not of e.g. the recurrence descriptor. (We still provide an interface for clients using recurrence descriptors, but the implementation simply translates to the intrinsic which each corresponds to.) As a note, the getIntrinsicIdentity API does not support fminnum/fmaxnum or fminimum/fmaximum, which is why we still need manual logic (but at least only one copy of manual logic) for those cases.
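For the integer cases, each reduction's identity follows from the operation alone, which is the point the entry makes about it being a property of the intrinsic. A small sketch over 32-bit lanes; `ReduceKind`, `identityFor`, `combine` and `reduce` are hypothetical names, not `getIntrinsicIdentity` or any other LLVM API, and the floating-point min/max kinds mentioned in the entry are deliberately left out.

```cpp
#include <cassert>
#include <cstdint>
#include <limits>
#include <vector>

// Hypothetical subset of integer reduction kinds.
enum class ReduceKind { Add, Mul, And, Or, Xor, SMin, SMax, UMin, UMax };

// Identity element: combine(K, identityFor(K), x) == x for every x.
uint32_t identityFor(ReduceKind K) {
  switch (K) {
  case ReduceKind::Add: case ReduceKind::Or:
  case ReduceKind::Xor: case ReduceKind::UMax:
    return 0;
  case ReduceKind::Mul:
    return 1;
  case ReduceKind::And: case ReduceKind::UMin:
    return std::numeric_limits<uint32_t>::max();
  case ReduceKind::SMin:
    return static_cast<uint32_t>(std::numeric_limits<int32_t>::max());
  case ReduceKind::SMax:
    return static_cast<uint32_t>(std::numeric_limits<int32_t>::min());
  }
  return 0; // unreachable for valid kinds
}

// One reduction step on two lanes (signed kinds reinterpret the bits).
uint32_t combine(ReduceKind K, uint32_t A, uint32_t B) {
  const int32_t SA = static_cast<int32_t>(A), SB = static_cast<int32_t>(B);
  switch (K) {
  case ReduceKind::Add:  return A + B;
  case ReduceKind::Mul:  return A * B;
  case ReduceKind::And:  return A & B;
  case ReduceKind::Or:   return A | B;
  case ReduceKind::Xor:  return A ^ B;
  case ReduceKind::SMin: return static_cast<uint32_t>(SA < SB ? SA : SB);
  case ReduceKind::SMax: return static_cast<uint32_t>(SA > SB ? SA : SB);
  case ReduceKind::UMin: return A < B ? A : B;
  case ReduceKind::UMax: return A > B ? A : B;
  }
  return 0; // unreachable for valid kinds
}

// Reduce all lanes, starting from the identity.
uint32_t reduce(ReduceKind K, const std::vector<uint32_t> &Lanes) {
  uint32_t Acc = identityFor(K);
  for (uint32_t V : Lanes)
    Acc = combine(K, Acc, V);
  return Acc;
}
```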
2024-09-04	IVDescriptors: improve readability of a function (NFC) (#106219)	Ramkumar Ramachandra	1	-7/+6
	Avoid dereferencing operand to llvm::isa.
2024-09-04	[Lint] Skip null args when checking noalias	Nikita Popov	1	-1/+2
	Do not emit a warning if there are two null noalias arguments, as they cannot be dereferenced anyway. This is a common pattern for `@.omp_outlined`, which has some optional noalias arguments.
2024-09-04	[Lint] Fix another scalable vector crash	Nikita Popov	1	-1/+2
	We also need to check that the memory access LocationSize is not scalable.
2024-09-04	[Lint] Fix crash for insert/extract on scalable vector	Nikita Popov	1	-8/+9
	Don't assume the vector is fixed size. For scalable vectors, do not report an error, as indices outside the minimum range may be valid.
2024-09-04	[Lint] Fix crash with scalable alloca	Nikita Popov	1	-2/+2

2024-09-03	[ctx_prof] Add Inlining support (#106154)	Mircea Trofin	1	-5/+24
	Add an overload of `InlineFunction` that updates the contextual profile. If there is no contextual profile, this overload is equivalent to the non-contextual profile variant. Post-inlining, the update mainly consists of:
	- making the PGO instrumentation of the callee "the caller's": the owner function (the "name" parameter of the instrumentation instructions) becomes the caller, and new index values are allocated for each of the callee's indices (this happens for both increment and callsite instrumentation instructions)
	- in the contextual profile:
	  - each context corresponding to the caller has its counters updated to incorporate the counters inherited from the callee at the inlined callsite. Counter values are copied as-is because no scaling is required since the profile is contextual.
	  - the contexts of the callee (at the inlined callsite) are moved to the caller.
	  - the callee context at the inlined callsite is deleted.
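The counter bookkeeping described above can be pictured with a toy context tree. This is a hypothetical model: the `Context` struct, `inlineAtCallsite`, and the caller-supplied `FirstNewCallsite` are invented for illustration and are not the `InlineFunction` overload or the real contextual-profile types. It also updates a single caller context, whereas the entry describes updating every context of the caller.

```cpp
#include <cassert>
#include <cstdint>
#include <map>
#include <utility>
#include <vector>

// Hypothetical contextual-profile node: counter values indexed by
// instrumentation index, plus child contexts keyed by callsite index.
// (std::map with a not-yet-complete value type works on mainstream
// standard libraries, though the standard does not guarantee it.)
struct Context {
  std::vector<uint64_t> Counters;
  std::map<uint32_t, Context> Callsites;
};

// Inline the callee reached through `CallsiteIdx` into `Caller`.
// `FirstNewCallsite` is a hypothetical base for the callsite indices the
// callee's own callsites receive after inlining.
void inlineAtCallsite(Context &Caller, uint32_t CallsiteIdx,
                      uint32_t FirstNewCallsite) {
  auto It = Caller.Callsites.find(CallsiteIdx);
  if (It == Caller.Callsites.end())
    return; // no profile data for this callsite
  Context Callee = std::move(It->second);
  // The callee context at the inlined callsite is deleted.
  Caller.Callsites.erase(It);
  // Callee counters become the caller's at freshly allocated indices
  // (appended after the caller's own); values are copied without scaling.
  Caller.Counters.insert(Caller.Counters.end(), Callee.Counters.begin(),
                         Callee.Counters.end());
  // The callee's child contexts move to the caller under new indices.
  for (auto &KV : Callee.Callsites)
    Caller.Callsites[FirstNewCallsite + KV.first] = std::move(KV.second);
}
```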
2024-09-03	[LV] Prefer FLT_MIN/MAX for fmin/fmax reductions with ninf (#107141)	Philip Reames	1	-8/+9
	Analogous to 2c7786e94a1058bd4f96794a1d4f70dcb86e5cc5, clean up a case where the vectorizer is emitting a non-canonical identity value given the available flags. We use the largest/smallest value during ISEL and VP expansion, but not during vectorization. Since the fmin/fmax/fminimum/fmaximum intrinsics don't require a start value, this difference is only visible when masking of inactive lanes is required. The primary motivation of this change is simply to remove a difference between versions of code which reason about the identity value of a reduction, so I can kill all but one of them off. In review, it was pointed out that this is actually a functional fix as well. The old code used inf on a noinf reduction instruction, whose result is poison! That wasn't the intent of the code.
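A sketch of the start value for masked-off lanes, reading the title's "FLT_MIN" as the most negative finite float (-FLT_MAX), which is what `std::numeric_limits<float>::lowest()` returns. Note that C's `FLT_MIN` is the smallest positive normal float and would not be a valid identity for fmax. The function names are hypothetical and this is not the vectorizer's code.

```cpp
#include <cassert>
#include <cmath>
#include <limits>

// Identity for an fmin reduction. With ninf, +inf would make the result
// poison, so the largest finite value is used instead.
float fminIdentity(bool ninf) {
  return ninf ? std::numeric_limits<float>::max()
              : std::numeric_limits<float>::infinity();
}

// Identity for an fmax reduction: the most negative finite value under
// ninf (lowest(), i.e. -FLT_MAX), otherwise -inf.
float fmaxIdentity(bool ninf) {
  return ninf ? std::numeric_limits<float>::lowest()
              : -std::numeric_limits<float>::infinity();
}
```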
2024-09-03	[LV] Separate AnyOf recurrence from getRecurrenceIdentity [NFC]	Philip Reames	1	-3/+2
	These recurrence types don't have a meaningful identity, and the routine was abused to return the start value instead. Out of the three callers to this routine, only one actually wants this behavior. This is a prep change for removing the routine entirely and commoning it with other copies of the same logic.
2024-09-03	[Analysis] getIntrinsicForCallSite - add vectorization support for acos/asin/atan and cosh/sinh/tanh libcalls (#106844)	Simon Pilgrim	1	-0/+24
	Follow-up to #106584: ensure acos/asin/atan and cosh/sinh/tanh libcalls correctly map to their LLVM intrinsic equivalents.
2024-09-02	[Analysis] Add getPredicatedExitCount to ScalarEvolution (#105649)	David Sherwood	1	-26/+60
	Due to a reviewer request on PR #88385 I have created this patch to add a getPredicatedExitCount function, which is similar to getExitCount except that it uses the predicated backedge taken information. With PR #88385 we will start to care about more loops with multiple exits, and want the ability to query exit counts for a particular exiting block. Such loops may require predicates in order to be vectorised. New tests added here: Analysis/ScalarEvolution/predicated-exit-count.ll
2024-09-02	[BasicAA] Track nuw through decomposed expressions (#106512)	Nikita Popov	1	-10/+21
	When we decompose the GEP offset expression, and the arithmetic is not performed using nuw operations, we cannot retain the nuw flag on the decomposed GEP. For example, if we have `gep nuw p, (a-1)`, this is not at all the same as `gep nuw (gep nuw p, a), -1`. Fix this by tracking NUW through linear expression decomposition, similarly to what we already do for the NSW flag. This fixes the miscompilation reported in https://github.com/llvm/llvm-project/pull/105496#issuecomment-2315322220.
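The `gep nuw p, (a-1)` example can be replayed with plain unsigned 64-bit arithmetic standing in for pointer offsets (a simplification: real GEP semantics also involve the index type and scaling). The helper is hypothetical; it only checks whether an unsigned addition wraps, which is exactly what a `nuw` flag promises does not happen.

```cpp
#include <cassert>
#include <cstdint>

// True if base + off (both unsigned) wraps around 2^64, i.e. the
// addition would violate an unsigned no-wrap (nuw) guarantee.
bool addWrapsUnsigned(uint64_t base, uint64_t off) { return base + off < base; }
```

In the checks, the single-step form never wraps, while the split form's second step adds `-1` reinterpreted as 2^64 - 1 and wraps, so carrying `nuw` onto the decomposed `-1` would be wrong.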
2024-09-02	[SLP] Add vectorization support for [u|s]cmp (#106747)	Yingwei Zheng	1	-0/+4
	This patch adds vectorization support for [u|s]cmp intrinsic calls.
2024-08-31	[DXIL][Analysis] Collect Function properties in Metadata Analysis (#105728)	S. Bharadwaj Yadavalli	1	-1/+43
	Basic infrastructure to collect Function properties in Metadata Analysis - Add a `SmallVector` of entry properties to the metadata information. - Add a structure to represent function properties. Currently `numthreads` and shader kind properties of shader entry functions are represented.
2024-08-30	[PtrUseVisitor] Allow using Argument as a starting point (#106308)	Artem Belevich	1	-1/+1
	Argument is another possible starting point for the pointer traversal, and PtrUseVisitor should be able to handle it.
2024-08-30	[IVDesc] Reuse getBinOpIdentity in getRecurrenceIdentity [nfc]	Philip Reames	1	-18/+3
	Avoid duplication so that we can easily tell these lists are in sync.
2024-08-30	[Analysis] isTriviallyVectorizable - add vectorization support for acos/asin/atan and cosh/sinh/tanh intrinsics (#106584)	Simon Pilgrim	1	-0/+6
	Show fallback cases in amdlibm tests where it doesn't have that specific op
2024-08-30	[ValueTracking] use KnownBits to compute fpclass from bitcast (#97762)	Alex MacLean	1	-0/+55
	When we encounter a bitcast from an integer type we can use the information from `KnownBits` to glean some information about the fpclass: - If the sign bit is known, we can transfer this information over. - If the float is IEEE format and enough of the bits are known, we may be able to prove or rule out some fpclasses such as NaN, Zero, or Inf.
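The reasoning in this entry can be sketched for IEEE-754 binary32 with a hypothetical known-bits pair (`KnownBits32` is invented here and is not LLVM's `KnownBits` or `FPClassTest` API): a known sign bit transfers directly, any exponent bit known to be 0 rules out NaN and Inf (both need an all-ones exponent), and any exponent bit known to be 1 rules out zero and subnormals.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical known-bits pair for a 32-bit integer: bits set in Zero are
// known to be 0, bits set in One are known to be 1.
struct KnownBits32 {
  uint32_t Zero, One;
};

// IEEE-754 binary32 field masks.
constexpr uint32_t SignMask = 0x80000000u;
constexpr uint32_t ExpMask = 0x7F800000u;
constexpr uint32_t MantMask = 0x007FFFFFu;

// Sign bit known: the bitcast float's sign is known (NaNs included).
bool signKnownPositive(KnownBits32 K) { return (K.Zero & SignMask) != 0; }
bool signKnownNegative(KnownBits32 K) { return (K.One & SignMask) != 0; }

// NaN and Inf need an all-ones exponent; one known-zero exponent bit
// rules out both.
bool cannotBeNaNOrInf(KnownBits32 K) { return (K.Zero & ExpMask) != 0; }

// Zero and subnormals need an all-zero exponent; one known-one exponent
// bit rules out both.
bool cannotBeZeroOrSubnormal(KnownBits32 K) { return (K.One & ExpMask) != 0; }

// Exponent known all ones and some mantissa bit known one: always NaN.
bool knownNaN(KnownBits32 K) {
  return (K.One & ExpMask) == ExpMask && (K.One & MantMask) != 0;
}
```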
2024-08-29	Reapply "[nfc][mlgo] Incrementally update DominatorTreeAnalysis in FunctionPropertiesAnalysis (#104867)" (#106309)	Mircea Trofin	2	-3/+54
	Reverts c992690179eb5de6efe47d5c8f3a23f2302723f2. The problem is that if there is a sequence "{delete A->B} {delete A->B} {insert A->B}", the net result is "{delete A->B}", which is not what we want. Duplicate successors may happen in cases like switch statements (as shown in the unit test). The second problem was that in `invoke` cases, some edges that we speculate may get deleted are not deleted, but they are also not reachable from the inlined call site's basic block. We just need to check which edges are actually not present anymore. The fix is to sanitize the list of deletes, just like we do for inserts.
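The delete sanitization described in this entry amounts to a filter: keep each speculated deletion once, and only if the edge is really absent from the CFG afterwards. A standalone sketch with integer block ids; `Edge`, `CFG` and `sanitizeDeletes` are hypothetical and not the code in FunctionPropertiesAnalysis. The surviving list is what one would then hand to an incremental dominator-tree update.

```cpp
#include <cassert>
#include <set>
#include <utility>
#include <vector>

using Edge = std::pair<int, int>; // (From, To) basic-block ids
using CFG = std::set<Edge>;       // edges present after inlining

// Drop duplicate deletions (e.g. from switch cases sharing a successor)
// and deletions of edges that still exist.
std::vector<Edge> sanitizeDeletes(const std::vector<Edge> &Speculated,
                                  const CFG &Current) {
  std::set<Edge> Seen;
  std::vector<Edge> Result;
  for (const Edge &E : Speculated)
    if (Current.count(E) == 0 && Seen.insert(E).second)
      Result.push_back(E);
  return Result;
}
```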
2024-08-29	Revert "[Analysis] Guard logf128 cst folding"	Thomas Preud'homme	1	-2/+4
	This reverts commit 42d3cccffd203ff6dc967d4243588ca466c0faf7 which caused a test failure.
2024-08-29	[Analysis] Guard logf128 cst folding (#106543)	Thomas Preud'homme	1	-4/+2
	LLVM has a CMake variable to control whether to consider logf128 constant folding which libAnalysis ignores. This patch changes the logf128 check to rely on the global LLVM_HAS_LOGF128 setting made in config-ix.cmake.
2024-08-28	[ValueLattice] Move intersect from LVI into ValueLattice API (NFC)	Nikita Popov	2	-69/+74
	So we can reuse the logic inside IPSCCP.
2024-08-28	[LoopUnrollAnalyzer] Don't simplify signed pointer comparison	Nikita Popov	1	-1/+6
	We're generally not able to simplify signed pointer comparisons (because we don't have no-wrap flags that would permit it), so we shouldn't pretend that we can in the cost model. The unsigned comparison case is also not modelled correctly, as explained in the added comment. As this is a cost model inaccuracy at worst, I'm leaving it alone for now.
2024-08-28	[LoopUnrollAnalyzer] Use constant folding API for loads	Nikita Popov	1	-30/+6
	Use ConstantFoldLoadFromConst() instead of a partial re-implementation. This makes the code slightly more generic by not depending on the exact structure of the constant.
2024-08-27	[ctx_prof] Add support for ICP (#105469)	Mircea Trofin	1	-29/+50
	An overload of `llvm::promoteCallWithIfThenElse` that updates the contextual profile. At a high level, this is very simple: after creating the `if... then (direct call) else (indirect call)` structure, we instrument the new callsites and BBs (the instrumentation will help with tracking for other IPO transformations, and, ultimately, to match counter values before flattening to `MD_prof`). In more detail:
	- move the callsite instrumentation of the indirect call to the `else` BB, before the indirect call
	- create a new callsite instrumentation for the direct call
	- create instrumentation for both the `then` and `else` BBs. We could instrument just one (MST-style), but we're not running the binary with this instrumentation, and at most this would save some space (fewer counters tracked). For simplicity, we instrument both at this point.
	- update each context belonging to the caller by updating the counters, and moving the indirect callee to the new, direct callsite ID
	Issue #89287
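At source level, the `if... then (direct call) else (indirect call)` shape and the two basic-block counters look roughly like the sketch below. `hotTarget`, `coldTarget` and the `Counters` struct are hypothetical; the real transformation works on IR, and its counters live in the contextual profile rather than in a struct.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical per-BB counters for the promoted callsite.
struct Counters {
  uint64_t ThenBB = 0, ElseBB = 0;
};

int hotTarget(int x) { return x + 1; }
int coldTarget(int x) { return x * 2; }

// Before promotion: a single indirect call.
int callIndirect(int (*fp)(int), int x) { return fp(x); }

// After promotion: compare against the likely target and call it directly,
// falling back to the original indirect call. Both BBs are counted.
int callPromoted(int (*fp)(int), int x, Counters &C) {
  if (fp == &hotTarget) {
    ++C.ThenBB;
    return hotTarget(x); // new direct callsite
  }
  ++C.ElseBB;
  return fp(x); // original indirect callsite, now in the else BB
}
```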
2024-08-27	IVDescriptors: clarify getSCEV use in a function (NFC) (#106222)	Ramkumar Ramachandra	1	-2/+2
	getSCEV will assert unless the operand is SCEVable. Replace an instance of the implementation of ScalarEvolution::isSCEVable (which checks that the operand is either integer or pointer type) with a call to the function, to make it clear that the subsequent use of getSCEV will not fail.
2024-08-27	[LoopUnrollAnalyzer] Fix icmp simplification	Nikita Popov	1	-3/+6
	Fix a bug I introduced in 721fdf1c9a73269280a504cbba847f4979512b66.
2024-08-27	Revert "[nfc][mlgo] Incrementally update DominatorTreeAnalysis in FunctionPropertiesAnalysis (#104867)"	Hans Wennborg	2	-41/+3
	This seems to cause asserts in our builds:
	  llvm/include/llvm/Support/GenericDomTreeConstruction.h:927: static void llvm::DomTreeBuilder::SemiNCAInfo<llvm::DominatorTreeBase<BasicBlock, false>>::DeleteEdge(DomTreeT &, const BatchUpdatePtr, const NodePtr, const NodePtr) [DomTreeT = llvm::DominatorTreeBase<BasicBlock, false>]: Assertion `!IsSuccessor(To, From) && "Deleted edge still exists in the CFG!"' failed.
	and
	  llvm/lib/Analysis/FunctionPropertiesAnalysis.cpp:390: DominatorTree &llvm::FunctionPropertiesUpdater::getUpdatedDominatorTree(FunctionAnalysisManager &) const: Assertion `DT.getNode(BB)' failed.
	See comment on the PR.
	> We need the dominator tree analysis for loop info analysis, which we need to get features like most nested loop and number of top level loops. Invalidating and recomputing these from scratch after each successful inlining can sometimes lead to lengthy compile times. We don't need to recompute from scratch, though, since we have some boundary information about where the changes to the CFG happen; moreover, for dom tree, the API supports incrementally updating the analysis result.
	>
	> This change addresses the dom tree part. The loop info is still recomputed from scratch. This does reduce the compile time quite significantly already, though (~5x in a specific case)
	>
	> The loop info change might be more involved and would follow in a subsequent PR.
	This reverts commit a2a5508bdae7d115b6c3ace461beb7a987a44407 and the follow-up commit cdd11d694a406a98a16d6265168ee2fbe1b6a87c.
2024-08-27	[LoopUnrollAnalyzer] Use computeConstantDifference()	Nikita Popov	1	-3/+3
	This is faster than checking for a SCEVConstant getMinusSCEV() result. The results should be the same for non-degenerate cases.
2024-08-27	[LoopUnrollAnalyzer] Store SimplifiedAddress offset as APInt (NFC)	Nikita Popov	1	-8/+7

2024-08-27	Revert "[LAA] Remove loop-invariant check added in 234cc40adc61."	Florian Hahn	1	-58/+23
	This reverts commit a80053322b765eec93951e21db490c55521da2d8. The new asserts exposed an underlying issue where the expanded bounds could wrap, causing the parts of the code to incorrectly determine that accesses do not overlap. Reproducer below based on @mstorsjo's test case.
	opt -passes='print<access-info>'
	target datalayout = "e-m:e-p:32:32-Fi8-i64:64-v128:64:128-a:0:32-n32-S64"
	define i32 @j(ptr %P, i32 %x, i32 %y) {
	entry:
	  %gep.P.4 = getelementptr inbounds nuw i8, ptr %P, i32 4
	  %gep.P.8 = getelementptr inbounds nuw i8, ptr %P, i32 8
	  br label %loop
	loop:
	  %1 = phi i32 [ %x, %entry ], [ %sel, %loop.latch ]
	  %iv = phi i32 [ %y, %entry ], [ %iv.next, %loop.latch ]
	  %gep.iv = getelementptr inbounds i64, ptr %gep.P.8, i32 %iv
	  %l = load i32, ptr %gep.iv, align 4
	  %c.1 = icmp eq i32 %l, 3
	  br i1 %c.1, label %loop.latch, label %if.then
	if.then:                                          ; preds = %for.body
	  store i64 0, ptr %gep.iv, align 4
	  %l.2 = load i32, ptr %gep.P.4
	  br label %loop.latch
	loop.latch:
	  %sel = phi i32 [ %l.2, %if.then ], [ %1, %loop ]
	  %iv.next = add nsw i32 %iv, 1
	  %c.2 = icmp slt i32 %iv.next, %sel
	  br i1 %c.2, label %loop, label %exit
	exit:
	  %res = phi i32 [ %iv.next, %loop.latch ]
	  ret i32 %res
	}
2024-08-27	[Analysis][NFC] Use SmallVectorImpl consistently in ScalarEvolution (#105663)	David Sherwood	1	-8/+7
	Use SmallVectorImpl instead of SmallVector for function arguments to give the caller greater flexibility in choice of initial size.
2024-08-26	[LAA] Remove loop-invariant check added in 234cc40adc61.	Florian Hahn	1	-23/+58
	234cc40adc61 introduced a loop-invariance check to limit the compile-time impact of the newly added checks. This patch removes the restriction and avoids extra compile-time impact by sinking the check to exits where we would return an unknown dependence. This notably reduces the number of times the extra checks are executed while not missing out on any improvements from them. https://llvm-compile-time-tracker.com/compare.php?from=33e7cd6ff23f6c904314d17c68dc58168fd32d09&to=7c55e66d4f31ce8262b90c119a8e84e1f9515ff1&stat=instructions:u
2024-08-25	Revert "Enable logf128 constant folding for hosts with 128bit long double (#104929)"	NAKAMURA Takumi	2	-24/+12
	ConstantFolding behaves differently depending on host's `HAS_IEE754_FLOAT128`. LLVM should not change the behavior depending on host configurations. This reverts commit 14c7e4a1844904f3db9b2dc93b722925a8c66b27. (llvmorg-20-init-3262-g14c7e4a18449 and llvmorg-20-init-3498-g001e423ac626)
2024-08-24	[Analysis] Copy-construct SmallVector (NFC) (#105911)	Kazu Hirata	1	-2/+2

2024-08-24	[ConstantFolding] Ensure TLI is valid when simplifying fp128 intrinsics.	David Green	1	-1/+1
	TLI might not be valid for all contexts that constant folding is performed. Add a quick guard that it is not null.
2024-08-23	Fix bot failures after PR #104867	Mircea Trofin	1	-22/+7
	An assert was left over after addressing feedback. In the process of fixing, realized the way I addressed the feedback was also incomplete.
2024-08-23	[nfc][mlgo] Incrementally update DominatorTreeAnalysis in FunctionPropertiesAnalysis (#104867)	Mircea Trofin	2	-3/+56
	We need the dominator tree analysis for loop info analysis, which we need to get features like most nested loop and number of top level loops. Invalidating and recomputing these from scratch after each successful inlining can sometimes lead to lengthy compile times. We don't need to recompute from scratch, though, since we have some boundary information about where the changes to the CFG happen; moreover, for dom tree, the API supports incrementally updating the analysis result. This change addresses the dom tree part. The loop info is still recomputed from scratch. This does reduce the compile time quite significantly already, though (~5x in a specific case). The loop info change might be more involved and would follow in a subsequent PR.