|
We should handle allocator attributes not only on function
declarations, but also at the call site. That way we can, for
example, also optimize cases where the allocator function is a
virtual function call.
This was already supported in some of the MemoryBuiltins helpers,
but not all of them. This adds support for allocsize, alloc-family
and allockind("free").
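A minimal IR sketch of what this enables (callee and attribute values
are illustrative):
```
; The allocator is reached through an indirect (e.g. virtual) call, so
; there is no declaration to carry the attributes; they are placed on
; the call site itself and are now honored by MemoryBuiltins.
define ptr @alloc_via_vtable(ptr %fn, i64 %n) {
  %p = call ptr %fn(i64 %n) allocsize(0) "alloc-family"="my_alloc"
  ret ptr %p
}
```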
|
|
This reverts commit ccb2b011e577e861254f61df9c59494e9e122b38.
Causes buildbot failures, e.g. on ppc64le builders.
|
|
Hosts that support a 128-bit float type can benefit from constant
fp128 folding.
|
|
This reverts commit 967185eeb85abb77bd6b6cdd2b026d5c54b7d4f3.
The problem was link dependencies; `UseCtxProfile` has been moved to
`Analysis`.
|
|
MustExec has special logic to determine whether the first loop iteration
will always be executed, by simplifying the IV comparison with the start
value. Currently, this code assumes that the IV is on the LHS of the
comparison, but this is not guaranteed. Make sure it handles the
commuted variant as well.
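A hedged sketch of the two forms (names illustrative); MustExec
simplifies the comparison against the IV's start value (0 here) to
decide whether %body runs on the first iteration:
```
define void @f(ptr %p, i32 %n) {
entry:
  br label %loop
loop:
  %iv = phi i32 [ 0, %entry ], [ %iv.next, %latch ]
  ; previously recognized: IV on the LHS
  ;   %c = icmp slt i32 %iv, %n
  ; now also recognized: the commuted form with the IV on the RHS
  %c = icmp sgt i32 %n, %iv
  br i1 %c, label %body, label %latch
body:
  %v = load i32, ptr %p
  br label %latch
latch:
  %iv.next = add i32 %iv, 1
  %done = icmp eq i32 %iv.next, %n
  br i1 %done, label %exit, label %loop
exit:
  ret void
}
```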
The changed PhaseOrdering test previously performed peeling to make the
loads dereferenceable -- as a side effect, this also reduced the exit
count by one, avoiding the awkward <= MAX case.
Now we know up-front that the loads are dereferenceable and can simply be
hoisted. As such, we retain the original exit count and now have to
handle it by widening the exit count calculation to i128. This is a
regression, but at least it preserves the vectorization, which was the
original goal. I'm not sure what else can be done about that test.
|
|
target intrinsics. (#97070)"
This reverts commit e8ad87c7d06afe8f5dde2e4c7f13c314cb3a99e9.
This reverts commit d3c9bb0cf811424dcb8c848cf06773dbdde19965.
A few buildbots trip up on asan-rvv-intrinsics.ll. I've also reverted
the follow-up commit d3c9bb0cf8.
https://lab.llvm.org/buildbot/#/builders/46/builds/2895
|
|
intrinsics. (#97070)
Previously, asan treated target intrinsics as black boxes, so it
could not instrument accurate checks. This patch provides TTI hooks
that let targets describe their intrinsic information to asan.
Note:
1. this patch renames InterestingMemoryOperand to MemoryRefInfo.
2. this patch does not support RVV indexed/segment load/store.
|
|
This is an immutable analysis that loads the contextual profile and
makes it available to other passes. This patch introduces the analysis
and an analysis printer pass. Subsequent patches will introduce the
APIs that IPO passes will call to modify the profile as a result of
their changes.
|
|
The index doesn't matter here.
|
|
If the GEP is both nuw and inbounds/nusw, the offset is non-negative.
Pass this information to CastedValue and make use of it when determining
the value range.
Proof for nusw+nuw->nneg: https://alive2.llvm.org/ce/z/a_CKAw
Proof for the test case: https://alive2.llvm.org/ce/z/yJ3ymP
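A minimal sketch (function and value names illustrative):
```
define ptr @idx(ptr %p, i64 %off) {
  ; nusw means the signed offset arithmetic does not wrap; combined
  ; with nuw this implies %off is non-negative, which CastedValue can
  ; now use when determining the value range.
  %q = getelementptr nusw nuw i8, ptr %p, i64 %off
  ret ptr %q
}
```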
|
|
Add support for detecting the __size_returning_new variants defined
in proposal P0901R5, which extends operator new; see
http://www.open-std.org/jtc1/sc22/wg21/docs/papers/2019/p0901r5.html
for details.
This PR matches the declarations exported by tcmalloc in
https://github.com/google/tcmalloc/blob/f2516691d01051defc558679f37720bba88d9862/tcmalloc/malloc_extension.h#L707-L711
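At the IR level these show up as declarations along these lines (a
sketch, assuming the sized-pointer result is modeled as a { ptr, i64 }
pair; the aligned variant's signature here is an assumption):
```
; returns both the allocation and the number of usable bytes
declare { ptr, i64 } @__size_returning_new(i64)
declare { ptr, i64 } @__size_returning_new_aligned(i64, i64)
```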
|
|
For the offset scaling, this is sufficient to guarantee nsw. The
other checks for inbounds in this file do need proper inbounds.
|
|
|
|
x -nsw y < -C is false when x > y and C >= 0
Alive2 proof for sgt, sge: https://alive2.llvm.org/ce/z/tupvfi
Note: It only really makes sense in the context of signed comparison for
"X - Y must be positive if X >= Y and no overflow".
Fixes https://github.com/llvm/llvm-project/issues/54735
|
|
|
|
Use const SCEV * explicitly in more places to prepare for
https://github.com/llvm/llvm-project/pull/91961. Split off as suggested.
|
|
The hash may contain fewer than 14 significant digits, which caused the
test to fail.
|
|
Produces -Wrange-loop-construct on some buildbots.
|
|
Currently it only deals with the case where we're subtracting adds with
at most one non-constant operand. This patch extends it to cancel out
common operands for the subtraction of arbitrary add expressions.
The background here is that I want to replace a getMinusSCEV() call in
LAA with computeConstantDifference():
https://github.com/llvm/llvm-project/blob/93fecc2577ece0329f3bbe2719bbc5b4b9b30010/llvm/lib/Analysis/LoopAccessAnalysis.cpp#L1602-L1603
This particular call is very expensive in some cases (e.g. lencod with
LTO) and computeConstantDifference() could achieve this much more
cheaply, because it does not need to construct new SCEV expressions.
However, the current computeConstantDifference() implementation is too
weak for this and misses many basic cases. This is a step towards making
it more powerful while still keeping it pretty fast.
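As a sketch of the kind of case this enables (SCEVs shown in
comments, names illustrative):
```
define i64 @diff(ptr %base, i64 %i) {
  ; SCEV(%q) = (%base + %i)
  ; SCEV(%p) = (16 + %base + %i)
  ; computeConstantDifference(SCEV(%p), SCEV(%q)) can now cancel the
  ; common operands %base and %i and return 16, without constructing
  ; a new SCEV subtraction expression as getMinusSCEV() would.
  %q = getelementptr i8, ptr %base, i64 %i
  %p = getelementptr i8, ptr %q, i64 16
  %pi = ptrtoint ptr %p to i64
  ret i64 %pi
}
```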
|
|
Add a common constantFoldAndGroupOps() helper that takes care of
constant folding and grouping transforms that are common to all nary
ops. This moves the constant folding prior to grouping, which is more
efficient, and excludes any constant from the sort.
The constant folding has hooks for folding, identity constants and
absorber constants.
This gives a compile-time improvement for SCEV-heavy workloads like
lencod.
|
|
Transitioned from inheritance to a has-a relationship in 9db7948e
|
|
We have existing code which reasons that a step evenly dividing the
iteration space of a finite loop with a single exit implies
no-self-wrap. The sign of the step doesn't affect this.
---------
Co-authored-by: Nikita Popov <github@npopov.com>
|
|
(#101380)
SCEV has logic for inferring wrap flags on AddRecs which are known to
control an exit based on whether the step is a power of two. This logic
only considered constants, and thus did not trigger for steps such as (4
x vscale) which are common in scalably vectorized loops.
The net effect is that we were very sensitive to the preservation of
nsw/nuw flags on such IVs, and could not infer trip counts if they got
lost for any reason.
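A hedged sketch of the kind of loop this now handles (assuming
vscale is a power of two on the target, as on typical scalable-vector
targets):
```
define void @f(i64 %n) vscale_range(1,16) {
entry:
  %vs = call i64 @llvm.vscale.i64()
  %step = shl i64 %vs, 2      ; step = (4 x vscale), a power of two
  br label %loop
loop:
  %iv = phi i64 [ 0, %entry ], [ %iv.next, %loop ]
  %iv.next = add i64 %iv, %step
  %ec = icmp ult i64 %iv.next, %n
  br i1 %ec, label %loop, label %exit
exit:
  ret void
}
declare i64 @llvm.vscale.i64()
```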
---------
Co-authored-by: Nikita Popov <github@npopov.com>
|
|
Change eraseNode to require that the basic block is still contained
inside the function. This is a preparation for using numbers of basic
blocks inside the dominator tree, which are invalid for blocks that are
not inside a function.
|
|
Unfortunately storing a `MaybeAlign` in ResourceInfo deletes our move
constructor in compilers that haven't implemented [P0602R4], like GCC 7.
Since we only ever use the alignment in ways where alignment 1 and unset
are ambiguous anyway, we'll just store the integer AlignLog2 value that
we'll eventually use directly.
[P0602R4]:
https://www.open-std.org/jtc1/sc22/wg21/docs/papers/2018/p0602r4.html
This reverts commit c22171f12fa9f260e2525cf61b93c136889e17f2, reapplying
a94edb6b8e321a46fe429934236aaa4e2e9fb97f.
|
|
The EqCache parameter has been removed.
|
|
The cache almost never triggers, and seems unlikely to help with
performance. However, when it does trigger, it is likely to cause the
comparator to become inconsistent, due to a bad interaction between
the depth limit and cache hits. This leads to crashes in debug builds.
See the new unit test for a reproducer.
|
|
No need to do the second one if the first one already failed.
|
|
Fix the build failure caused by
https://github.com/llvm/llvm-project/pull/94944
Fixes https://github.com/llvm/llvm-project/issues/100296
|
|
Seeing build failures, reverting to investigate.
Reverts llvm/llvm-project#100697
|
|
HLSL allows StructuredBuffer<> to be defined with scalar or
up-to-4-element vectors as well as with structs, but when doing so
`dxc` doesn't set the alignment. Emulate this.
Pull Request: https://github.com/llvm/llvm-project/pull/100697
|
|
This simplifies making sure we set all of the members of the unions,
and adds asserts to help catch it if we do something wrong.
Pull Request: https://github.com/llvm/llvm-project/pull/100696
|
|
Even if memory is valid from LLVM's point of view,
e.g. a local alloca, sanitizers have APIs for
user-specific memory annotations.
These annotations can be used to track the size of a
local object, e.g. inline vectors may prevent
accesses beyond the current vector size.
So valid programs should not access those parts of
the alloca before checking preconditions.
Fixes #100639.
|
|
And extract `suppressSpeculativeLoadForSanitizers`.
For #100639.
|
|
...and also in TTI::getMemcpyLoopResidualLoweringType.
|
|
Compile-time improvement:
http://llvm-compile-time-tracker.com/compare.php?from=13996378d81c8fa9a364aeaafd7382abbc1db83a&to=861ffa4ec5f7bde5a194a7715593a1b5359eb581&stat=instructions:u
baseline: 803eaf29267c6aae9162d1a83a4a2ae508b440d3
```
Top 5 improvements:
stockfish/movegen.ll 2541620819 2538599412 -0.12%
minetest/profiler.cpp.ll 431724935 431246500 -0.11%
abc/luckySwap.c.ll 581173720 580581935 -0.10%
abc/kitTruth.c.ll 2521936288 2519445570 -0.10%
abc/extraUtilTruth.c.ll 1216674614 1215495502 -0.10%
Top 5 regressions:
openssl/libcrypto-shlib-sm4.ll 1155054721 1155943201 +0.08%
openssl/libcrypto-lib-sm4.ll 1155054838 1155943063 +0.08%
spike/vsm4r_vv.ll 1296430080 1297039258 +0.05%
spike/vsm4r_vs.ll 1312496906 1313093460 +0.05%
nuttx/lib_rand48.c.ll 126201233 126246692 +0.04%
Overall: -0.02112308%
```
|
|
Update getDependenceDistanceStrideAndSize to reason about different
combinations of strides directly and explicitly.
Update getPtrStride to return 0 for invariant pointers.
Then proceed by checking the strides.
If either the source or the sink is neither strided by a constant
(i.e. a non-wrapping AddRec) nor invariant, the accesses may overlap
with earlier or later iterations and we cannot generate runtime
checks to disambiguate them.
Otherwise they are either loop invariant or strided. In that case, we
can generate a runtime check to disambiguate them.
If both are strided by constants, we proceed as previously.
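For instance (a sketch, names illustrative), an invariant store and a
constant-strided load can now be disambiguated with a runtime check:
```
define void @f(ptr %inv, ptr %a, i64 %n) {
entry:
  br label %loop
loop:
  %iv = phi i64 [ 0, %entry ], [ %iv.next, %loop ]
  ; invariant access: getPtrStride now returns 0 for %inv
  store i32 0, ptr %inv
  ; strided access with a constant stride of one element
  %gep = getelementptr i32, ptr %a, i64 %iv
  %l = load i32, ptr %gep
  %iv.next = add i64 %iv, 1
  %ec = icmp ult i64 %iv.next, %n
  br i1 %ec, label %loop, label %exit
exit:
  ret void
}
```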
This is an alternative to
https://github.com/llvm/llvm-project/pull/99239 and also replaces
additional checks if the underlying object is loop-invariant.
Fixes https://github.com/llvm/llvm-project/issues/87189.
PR: https://github.com/llvm/llvm-project/pull/99577
|
|
This makes the binding structure in a DXILResource default to empty
and need a separate call to set up, and also moves the unique ID into
it since bindings are the only place where those are actually used.
This will put us in a better position when dealing with resource
handles in libraries.
Pull Request: https://github.com/llvm/llvm-project/pull/100623
|
|
Pull Request: https://github.com/llvm/llvm-project/pull/100622
|
|
I had put this in Transforms/Utils, but that doesn't actually make
sense if we want to populate these structures via an analysis pass.
Pull Request: https://github.com/llvm/llvm-project/pull/100621
|
|
|
|
It is now translated to `<1 x i64>`, which allows the removal of a bunch
of special casing.
This _incompatibly_ changes the ABI of any LLVM IR function with
`x86_mmx` arguments or returns: instead of passing in mmx registers,
they will now be passed via integer registers. However, the real-world
incompatibility caused by this is expected to be minimal, because Clang
never uses the x86_mmx type -- it lowers `__m64` to either `<1 x i64>`
or `double`, depending on ABI.
This change does _not_ eliminate the SelectionDAG `MVT::x86mmx` type.
That type simply no longer corresponds to an IR type, and is used only
by MMX intrinsics and inline-asm operands.
Because SelectionDAGBuilder only knows how to generate the
operands/results of intrinsics based on the IR type, it thus now
generates the intrinsics with the type MVT::v1i64, instead of
MVT::x86mmx. We need to fix this before DAG type legalization, and
thus have the X86 backend fix the types up in DAGCombine. (This may be a
short-lived hack, if all the MMX intrinsics can be removed in upcoming
changes.)
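For illustration, a sketch using one real MMX intrinsic:
```
; SelectionDAGBuilder now emits this intrinsic with v1i64 operands and
; results instead of MVT::x86mmx; the X86 backend moves the values
; back into MMX registers in DAGCombine.
declare <1 x i64> @llvm.x86.mmx.padd.d(<1 x i64>, <1 x i64>)
```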
Works towards issue #98272.
|
|
If a result is potentially based on a not yet proven assumption,
BasicAA will remember it inside AssumptionBasedResults and remove
the cache entry if an assumption higher up is later disproved.
However, we currently miss the case where another cache entry ends
up depending on such an AssumptionBased result.
Fix this by introducing an additional AssumptionBased state for
cache entries. If such a result is used, we'll still increment
AAQI.NumAssumptionUses, which means that the using entry will
also become AssumptionBased and be cleared if the assumption is
disproved.
At the end of the root query, convert remaining AssumptionBased
results into definitive results.
Fixes https://github.com/llvm/llvm-project/issues/98978.
|
|
Retrieve `!tbaa` metadata via `!tbaa.struct` in `adjustForAccess`,
unless it already exists, so that struct-path aware `MDNodes` emitted
via `new-struct-path-tbaa` can be leveraged. Since `!tbaa.struct`
carries memcpy padding semantics among struct fields, while `!tbaa`
is already meant to describe alias semantics, it should be possible
to drop `!tbaa.struct` once the memcpy has been simplified.
The `SROA/tbaa-struct.ll` test has gone out of scope, as `!tbaa` has
already replaced `!tbaa.struct` in SROA.
Fixes: https://github.com/llvm/llvm-project/issues/95661.
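For reference, a self-contained sketch of the two shapes involved
(field offsets and sizes illustrative):
```
!0 = !{!"Simple C/C++ TBAA"}
!1 = !{!"omnipotent char", !0, i64 0}
!2 = !{!"int", !1, i64 0}
; a plain !tbaa access tag: (base type, access type, offset)
!3 = !{!2, !2, i64 0}
; a !tbaa.struct descriptor: (offset, size, tag) per field, which is
; what a memcpy carries and what adjustForAccess can translate from
!4 = !{i64 0, i64 4, !3, i64 8, i64 4, !3}
```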
|
|
|
|
Alive2: https://alive2.llvm.org/ce/z/g3xxnM
|
|
(#100316)
See the following case:
```
define i16 @pr100298() {
entry:
br label %for.inc
for.inc:
%indvar = phi i32 [ -15, %entry ], [ %mask, %for.inc ]
%add = add nsw i32 %indvar, 9
%mask = and i32 %add, 65535
%cmp1 = icmp ugt i32 %mask, 5
br i1 %cmp1, label %for.inc, label %for.end
for.end:
%conv = trunc i32 %add to i16
%cmp2 = icmp ugt i32 %mask, 3
%shl = shl nuw i16 %conv, 14
%res = select i1 %cmp2, i16 %conv, i16 %shl
ret i16 %res
}
```
When computing knownbits of `%shl` with `%cmp2=false`, we cannot use
this condition in the analysis of `%mask (%for.inc -> %for.inc)`.
Fixes https://github.com/llvm/llvm-project/issues/100298.
|
|
unpredictable. (#98495)
The `!unpredictable` metadata has been present for a long time, but
its use in optimizations is still limited. This patch teaches
`FoldTwoEntryPHINode()` to be more aggressive with an unpredictable
branch to reduce mispredictions.
A TTI interface `getBranchMispredictPenalty()` is added to distinguish
between different hardware, to ensure we don't go too far for simpler
cores. For simplicity, only a naive x86 implementation is included for
the time being.
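A sketch of the kind of diamond this targets (names illustrative):
```
define i32 @f(i1 %c, i32 %a, i32 %b) {
entry:
  ; with !unpredictable, FoldTwoEntryPHINode() is now more willing to
  ; turn this diamond into a select, trading extra work for avoiding
  ; a branch misprediction
  br i1 %c, label %then, label %else, !unpredictable !0
then:
  br label %merge
else:
  br label %merge
merge:
  %r = phi i32 [ %a, %then ], [ %b, %else ]
  ret i32 %r
}
!0 = !{}
```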
|
|
This addresses an optimization regression in Rust we have observed after
https://github.com/llvm/llvm-project/pull/82458. We now only perform
pointer replacement if they have the same underlying object. However,
getUnderlyingObject() by default only looks through linear chains, not
selects/phis. In particular, this means that we miss cases involving
pointer induction variables.
This patch fixes this by introducing a new helper
getUnderlyingObjectAggressive() which basically does what
getUnderlyingObjects() does, just specialized to the case where we must
arrive at a single underlying object in the end, and with a limit on the
number of inspected values.
Doing this more expensive underlying object check has no measurable
compile-time impact on CTMark.
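A sketch of the pattern (names illustrative): a pointer induction
variable whose single underlying object is only visible through the
phi:
```
define void @f(i64 %n) {
entry:
  %buf = alloca [16 x i8]
  br label %loop
loop:
  ; getUnderlyingObject() stops at the phi; looking through both
  ; incoming values shows the single underlying object %buf
  %p = phi ptr [ %buf, %entry ], [ %p.next, %loop ]
  %iv = phi i64 [ 0, %entry ], [ %iv.next, %loop ]
  store i8 0, ptr %p
  %p.next = getelementptr i8, ptr %p, i64 1
  %iv.next = add i64 %iv, 1
  %ec = icmp ult i64 %iv.next, %n
  br i1 %ec, label %loop, label %exit
exit:
  ret void
}
```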
|
|
isDereferenceableAndAlignedInLoop (#99490)
This patch now bails out explicitly for negative offsets so that it's
more consistent with the unsigned remainder and add calculations,
and it fixes a genuine bug as shown with the new test.
|