Age | Commit message (Collapse) | Author | Files | Lines |
|
|
|
Addresses downstream rustc issue:
https://github.com/rust-lang/rust/issues/129795
|
|
This patch tries to infer is-power-of-2 from assumptions. I don't see
that this kind of assumption exists in my dataset.
Related issue: https://github.com/rust-lang/rust/issues/129795
Close https://github.com/llvm/llvm-project/issues/58996.
|
|
During the ThinLTO indexing step for one of our large applications, we
create 4 million instances of FunctionSummary.
Changing:
std::vector<EdgeTy> CallGraphEdgeList;
to:
SmallVector<EdgeTy, 0> CallGraphEdgeList;
in FunctionSummary reduces the size of each instance by 8 bytes. The
rest of the patch makes the same change to other places so that the
types stay compatible across function boundaries.
|
|
synthetic count passes. (#107471)
The primary motivation is to remove `EntryCount` from `FunctionSummary`.
This frees 8 bytes out of `sizeof(FunctionSummary)` (136 bytes as of
https://github.com/llvm/llvm-project/commit/64498c54831bed9cf069e0923b9b73678c6451d8).
While I'm at it, this PR clean up {SummaryBasedOptimizations,
SyntheticCountsPropagation} since they were not used and there are no
plans to further invest on them.
With this patch, bitcode writer writes a placeholder 0 at the byte
offset of `EntryCount` and bitcode reader can parse the function entry
count at the correct byte offset. Added a TODO to stop writing
`EntryCount` and bump bitcode version
|
|
If none of the functions in this `Module` are roots in the contextual profile, we can't use it and should just return the `{}` case.
|
|
During the ThinLTO indexing step for one of our large applications, we
create 7.5 million instances of GlobalValueSummary.
Changing:
std::vector<ValueInfo> RefEdgeList;
to:
SmallVector<ValueInfo, 0> RefEdgeList;
in GlobalValueSummary reduces the size of each instance by 8 bytes.
The rest of the patch makes the same change to other places so that
the types stay compatible across function boundaries.
|
|
|
|
(#101485)
|
|
It's more clear. (This isn't exhaustive).
|
|
The SCEV expression `((-C + (C smax %x)) /u %x)` can be folded
to zero for any positive constant C.
Proof: https://alive2.llvm.org/ce/z/_dLm8C.
|
|
This handles the edge case where BitWidth is 1 and doing the
increment gets a value that's not valid in that width, while we
just want wrap-around.
Split out of https://github.com/llvm/llvm-project/pull/80309.
|
|
This change merges the three different places (at the IR layer) for
finding the identity value of a reduction into a single copy. This
depends on several prior commits which fix ommissions and bugs in
the distinct copies, but this patch itself should be fully
non-functional.
As the new comments and naming try to make clear, the identity value
is a property of the @llvm.vector.reduce.* intrinsic, not of e.g.
the recurrence descriptor. (We still provide an interface for
clients using recurrence descriptors, but the implementation simply
translates to the intrinsic which each corresponds to.)
As a note, the getIntrinsicIdentity API does not support fminnum/fmaxnum
or fminimum/fmaximum which is why we still need manual logic (but at
least only one copy of manual logic) for those cases.
|
|
Avoid dereferencing operand to llvm::isa.
|
|
Do not emit a warning if there are two null noalias arguments,
as they cannot be dereferenced anyway.
This is a common pattern for `@.omp_outlined`, which has some
optional noalias arguments.
|
|
We also need to check that the memory access LocationSize is not
scalable.
|
|
Don't assume the vector is fixed size. For scalable vectors, do
not report an error, as indices outside the minimum range may be
valid.
|
|
|
|
Add an overload of `InlineFunction` that updates the contextual profile. If there is no contextual profile, this overload is equivalent to the non-contextual profile variant.
Post-inlining, the update mainly consists of:
- making the PGO instrumentation of the callee "the caller's": the owner function (the "name" parameter of the instrumentation instructions) becomes the caller, and new index values are allocated for each of the callee's indices (this happens for both increment and callsite instrumentation instructions)
- in the contextual profile:
- each context corresponding to the caller has its counters updated to incorporate the counters inherited from the callee at the inlined callsite. Counter values are copied as-is because no scaling is required since the profile is contextual.
- the contexts of the callee (at the inlined callsite) are moved to the caller.
- the callee context at the inlined callsite is deleted.
|
|
Analogous to 2c7786e94a1058bd4f96794a1d4f70dcb86e5cc5, cleanup a case
where the vectorizer is emitting a non-canonical identity value given
the available flags. We use largest/smallest value during ISEL, and VP
expansion, but not during vectorization.
Since the fmin/fmax/fminimum/fmaximum intrinsics don't require a start
value, this difference is only visible when masking of inactive lanes is
required.
Primary motivation of this change is simply to remove a difference
between version of code which reason about the identity value of a
reduction so I can kill all but one off.
In review, it was pointed out that this is actually a functional fix as well.
The old code used inf on a noinf reduction instruction - whose
result is poison! That wasn't the intent of the code.
|
|
These recurrence types don't have a meaningful identity, and the
routine was abused to return the start value instead. Out of the
three callers to this routine, only one actually wants this
behavior. This is a prep change for removing the routine entirely
and commoning it with other copies of the same logic.
|
|
acos/asin/atan and cosh/sinh/tanh libcalls (#106844)
Followup to #106584 - ensure acos/asin/atan and cosh/sinh/tanh libcalls correctly map to the llvm intrinsic equivalents
|
|
Due to a reviewer request on PR #88385 I have created this patch
to add a getPredicatedExitCount function, which is similar to
getExitCount except that it uses the predicated backedge taken
information. With PR #88385 we will start to care about more
loops with multiple exits, and want the ability to query exit
counts for a particular exiting block. Such loops may require
predicates in order to be vectorised.
New tests added here:
Analysis/ScalarEvolution/predicated-exit-count.ll
|
|
When we decompose the GEP offset expression, and the arithmetic is not
performed using nuw operations, we cannot retain the nuw flag on the
decomposed GEP.
For example, if we have `gep nuw p, (a-1)`, this is not at all the same
as `gep nuw (gep nuw p, a), -1`.
Fix this by tracking NUW through linear expression decomposition,
similarly to what we already do for the NSW flag.
This fixes the miscompilation reported in
https://github.com/llvm/llvm-project/pull/105496#issuecomment-2315322220.
|
|
This patch adds vectorization support for [u|s]cmp intrinsic calls.
|
|
Basic infrastructure to collect Function properties in Metadata Analysis
- Add a `SmallVector` of entry properties to the metadata information.
- Add a structure to represent function properties. Currently
`numthreads` and shader kind properties of shader entry functions are
represented.
|
|
Argument is another possible starting point for the pointer traversal,
and PtrUseVisitor should be able to handle it.
|
|
Avoid duplication so that we can easily tell these lists are in sync.
|
|
acos/asin/atan and cosh/sinh/tanh intrinsics (#106584)
Show fallback cases in amdlibm tests where it doesn't have that specific op
|
|
When we encounter a bitcast from an integer type we can use the
information from `KnownBits` to glean some information about the
fpclass:
- If the sign bit is known, we can transfer this information over.
- If the float is IEEE format and enough of the bits are known, we may
be able to prove or rule out some fpclasses such as NaN, Zero, or Inf.
|
|
FunctionPropertiesAnalysis (#104867) (#106309)
Reverts c992690179eb5de6efe47d5c8f3a23f2302723f2.
The problem is that if there is a sequence "{delete A->B} {delete A->B}
{insert A->B}" the net result is "{delete A->B}", which is not what we
want.
Duplicate successors may happen in cases like switch statements (as
shown in the unit test).
The second problem was that in `invoke` cases, some edges we speculate may get deleted don't, but are also not reachable from the inlined call site's basic block. We just need to check which edges are actually not present anymore.
The fix is to sanitize the list of deletes, just like we do for inserts.
|
|
This reverts commit 42d3cccffd203ff6dc967d4243588ca466c0faf7 which
caused a test failure.
|
|
LLVM has a CMake variable to control whether to consider logf128
constant folding which libAnalysis ignores. This patch changes the
logf128 check to rely on the global LLVM_HAS_LOGF128 setting made in
config-ix.cmake.
|
|
So we can reuse the logic inside IPSCCP.
|
|
We're generally not able to simplify signed pointer comparisons
(because we don't have no-wrap flags that would permit it), so
we shouldn't pretend that we can in the cost model.
The unsigned comparison case is also not modelled correctly,
as explained in the added comment. As this is a cost model
inaccuracy at worst, I'm leaving it alone for now.
|
|
Use ConstantFoldLoadFromConst() instead of a partial re-implementation.
This makes the code slightly more generic by not depending on the
exact structure of the constant.
|
|
An overload of `llvm::promoteCallWithIfThenElse` that updates the contextual profile.
High-level, this is very simple: after creating the `if... then (direct call) else (indirect call)` structure, we instrument the new callsites and BBs (the instrumentation will help with tracking for other IPO transformations, and, ultimately, to match counter values before flattening to `MD_prof`).
In more detail:
- move the callsite instrumentation of the indirect call to the `else` BB, before the indirect call
- create a new callsite instrumentation for the direct call
- create instrumentation for both the `then` and `else` BBs - we could instrument just one (MST-style) but we're not running the binary with this instrumentation, and at most this would save some space (less counters tracked). For simplicity instrumenting both at this point
- update each context belonging to the caller by updating the counters, and moving the indirect callee to the new, direct callsite ID
Issue #89287
|
|
getSCEV will assert unless the operand is SCEVable. Replace an instance
of the implementation of ScalarEvolution::isSCEVable (which checks that
the operand is either integer or pointer type) with a call to the
function, to make it clear that the subsequent use of getSCEV will not
fail.
|
|
Fix a bug I introduced in 721fdf1c9a73269280a504cbba847f4979512b66.
|
|
FunctionPropertiesAnalysis (#104867)"
This seems to cause asserts in our builds:
llvm/include/llvm/Support/GenericDomTreeConstruction.h:927:
static void llvm::DomTreeBuilder::SemiNCAInfo<llvm::DominatorTreeBase<BasicBlock, false>>::DeleteEdge(DomTreeT &, const BatchUpdatePtr, const NodePtr, const NodePtr) [DomTreeT = llvm::DominatorTreeBase<BasicBlock, false>]:
Assertion `!IsSuccessor(To, From) && "Deleted edge still exists in the CFG!"' failed.
and
llvm/lib/Analysis/FunctionPropertiesAnalysis.cpp:390:
DominatorTree &llvm::FunctionPropertiesUpdater::getUpdatedDominatorTree(FunctionAnalysisManager &) const:
Assertion `DT.getNode(BB)' failed.
See comment on the PR.
> We need the dominator tree analysis for loop info analysis, which we need to get features like most nested loop and number of top level loops. Invalidating and recomputing these from scratch after each successful inlining can sometimes lead to lengthy compile times. We don't need to recompute from scratch, though, since we have some boundary information about where the changes to the CFG happen; moreover, for dom tree, the API supports incrementally updating the analysis result.
>
> This change addresses the dom tree part. The loop info is still recomputed from scratch. This does reduce the compile time quite significantly already, though (~5x in a specific case)
>
> The loop info change might be more involved and would follow in a subsequent PR.
This reverts commit a2a5508bdae7d115b6c3ace461beb7a987a44407 and the
follow-up commit cdd11d694a406a98a16d6265168ee2fbe1b6a87c.
|
|
This is faster than checking for a SCEVConstant getMinusSCEV()
result. The results should be the same for non-degenerate cases.
|
|
|
|
This reverts commit a80053322b765eec93951e21db490c55521da2d8.
The new asserts exposed an underlying issue where the expanded bounds
could wrap, causing the parts of the code to incorrectly determine that
accesses do not overlap.
Reproducer below based on @mstorsjo's test case.
opt -passes='print<access-info>'
target datalayout = "e-m:e-p:32:32-Fi8-i64:64-v128:64:128-a:0:32-n32-S64"
define i32 @j(ptr %P, i32 %x, i32 %y) {
entry:
%gep.P.4 = getelementptr inbounds nuw i8, ptr %P, i32 4
%gep.P.8 = getelementptr inbounds nuw i8, ptr %P, i32 8
br label %loop
loop:
%1 = phi i32 [ %x, %entry ], [ %sel, %loop.latch ]
%iv = phi i32 [ %y, %entry ], [ %iv.next, %loop.latch ]
%gep.iv = getelementptr inbounds i64, ptr %gep.P.8, i32 %iv
%l = load i32, ptr %gep.iv, align 4
%c.1 = icmp eq i32 %l, 3
br i1 %c.1, label %loop.latch, label %if.then
if.then: ; preds = %for.body
store i64 0, ptr %gep.iv, align 4
%l.2 = load i32, ptr %gep.P.4
br label %loop.latch
loop.latch:
%sel = phi i32 [ %l.2, %if.then ], [ %1, %loop ]
%iv.next = add nsw i32 %iv, 1
%c.2 = icmp slt i32 %iv.next, %sel
br i1 %c.2, label %loop, label %exit
exit:
%res = phi i32 [ %iv.next, %loop.latch ]
ret i32 %res
}
|
|
Use SmallVectorImpl instead of SmallVector for function arguments
to give the caller greater flexibility in choice of initial size.
|
|
234cc40adc61 introduced a loop-invariance check to limit the
compile-time impact of the newly added checks.
This patch removes the restriction and avoids extra compile-time impact
by sinking the check to exits where we would return an unknown
dependence. This notably reduces the amount the extra checks are
executed while not missing out on any improvements from them.
https://llvm-compile-time-tracker.com/compare.php?from=33e7cd6ff23f6c904314d17c68dc58168fd32d09&to=7c55e66d4f31ce8262b90c119a8e84e1f9515ff1&stat=instructions:u
|
|
(#104929)"
ConstantFolding behaves differently depending on host's `HAS_IEE754_FLOAT128`.
LLVM should not change the behavior depending on host configurations.
This reverts commit 14c7e4a1844904f3db9b2dc93b722925a8c66b27.
(llvmorg-20-init-3262-g14c7e4a18449 and llvmorg-20-init-3498-g001e423ac626)
|
|
|
|
TLI might not be valid for all contexts that constant folding is performed. Add
a quick guard that it is not null.
|
|
An assert was left over after addressing feedback. In the process of
fixing, realized the way I addressed the feedback was also incomplete.
|
|
FunctionPropertiesAnalysis (#104867)
We need the dominator tree analysis for loop info analysis, which we need to get features like most nested loop and number of top level loops. Invalidating and recomputing these from scratch after each successful inlining can sometimes lead to lengthy compile times. We don't need to recompute from scratch, though, since we have some boundary information about where the changes to the CFG happen; moreover, for dom tree, the API supports incrementally updating the analysis result.
This change addresses the dom tree part. The loop info is still recomputed from scratch. This does reduce the compile time quite significantly already, though (~5x in a specific case)
The loop info change might be more involved and would follow in a subsequent PR.
|