Age | Commit message (Collapse) | Author | Files | Lines |
|
This reverts commits 56a1cbb ([LAA] Fix non-NFC parts of 1aded51),
1aded51 ([LAA] Prepare to handle diff type sizes (NFC)). The original
NFC patch caused some regressions, which the later patch tried to fix.
However, the later patch is the cause of some crashes, and it would be
best to revert both for now, and re-land after thorough testing.
|
|
This reverts commit aa08b1a9963f33ded658d3ee655429e1121b5212.
|
|
Split out from #151300 to isolate TargetTransformInfo cost modelling for
fault-only-first loads from VPlan implementation details. This change
adds costing support for vp.load.ff independently of the VPlan work.
For now, model a vp.load.ff as cost-equivalent to a vp.load.
|
|
Fix the NaN-handling semantics of various NVVM intrinsics converting
from fp types to integer types.
Previously in ConstantFolding, NaN inputs would be constant-folded to 0.
However, v9.0 of the PTX spec states that:
In float-to-integer conversions, depending upon conversion types, NaN
input results in following value:
* Zero if source is not `.f64` and destination is not `.s64`, .`u64`.
* Otherwise `1 << (BitWidth(dst) - 1)` corresponding to the value of
`(MAXINT >> 1) + 1` for unsigned type or `MININT` for signed type.
Also, support for constant-folding +/-Inf and values which
overflow/underflow the integer output type has been added (they clamp to
min/max int).
Because of this NaN-handling semantic difference, we also need to
disable transforming several intrinsics to FPToSI/FPToUI, as the LLVM
intstruction will return poison, but the intrinsics have defined
behaviour for these edge-cases like NaN/Inf/overflow.
|
|
1aded51 ([LAA] Prepare to handle diff type sizes (NFC)) was supposed to
be a non-functional patch, but introduced functional changes as
known-non-negative and known-non-positive is not equivalent to
!known-non-zero. Fix this.
|
|
This extends the DropUnnecessaryAssumes pass to also handle operand
bundle assumes. For this purpose, export the affected value analysis for
operand bundles from AssumptionCache.
If the bundle only affects ephemeral values, drop it. If all bundles on
an assume are dropped, drop the whole assume.
|
|
ResultElem stores a weak handle of an assume, plus an index for
referring to a specific operand bundle. This makes sense for the results
of assumptionsFor(), which refers to specific operands of assumes.
However, assumptions() is a plain list of assumes. It does *not* contain
separate entries for each operand bundles. The operand bundle index is
always ExprResultIdx.
As such, we should be directly using WeakVH for this case, without the
additional wrapper.
|
|
Scalable get_active_lane_mask intrinsic calls can be simplified to i1
splat (ptrue) when its constant range is larger than or equal to the
maximum possible number of elements, which can be inferred from
vscale_range(x, y)
|
|
Avoiding any new inttoptr is unnecessarily restrictive for "plain"
non-integral pointers, but it is important for unstable pointers and
pointers with external state. Fixes another test codegen regression
from https://github.com/llvm/llvm-project/pull/105735.
Reviewed By: nikic
Pull Request: https://github.com/llvm/llvm-project/pull/159959
|
|
An attempt to resolve the issue flagged in
[PR150531](https://github.com/llvm/llvm-project/issues/150531)
|
|
embed in MemIntrinsicInfo #157863 (#159713)
[Previously reverted due to failures on asan-rvv-intrinsics.ll, the test
case is riscv only and it is triggered by other target]
Reland [#157863](https://github.com/llvm/llvm-project/pull/157863), and
add `; REQUIRES: riscv-registered-target` in test case to skip the
configuration that doesn't register riscv target.
Previously asan considers target intrinsics as black boxes, so asan
could not instrument accurate check. This patch make
SmallVector<InterestingMemoryOperand> a member of MemIntrinsicInfo so
that TTI can make targets describe their intrinsic informations to asan.
Note,
1. This patch move InterestingMemoryOperand from Transforms to Analysis.
2. Extend MemIntrinsicInfo by adding a
SmallVector<InterestingMemoryOperand> member.
3. This patch does not support RVV indexed/segment load/store.
|
|
When building an assert-enabled target, silence the following:
```
C:\git\llvm-project\llvm\include\llvm/Analysis/DependenceAnalysis.h(290): warning C4018: '<=': signed/unsigned mismatch
```
|
|
comparisons with `ConstantFPRange` (#159315)
Follow up of #158097
Similar to `simplifyAndOrOfICmpsWithConstants`, we can do so for
floating point comparisons.
|
|
Alive2: https://alive2.llvm.org/ce/z/8rX5Rk
Closes https://github.com/llvm/llvm-project/issues/118106.
|
|
When there is a dependency between two memory instructions in separate loops that have the same iteration space and depth, SIV will be able to test them and compute the direction and the distance of the dependency.
|
|
This is a common pattern to initialize Knownbits that occurs before
loops that call intersectWith.
|
|
compares. (#141798)
My usecase is simplifying the control flow generated by LoopVectorize
when vectorising loops whose tripcount is a function of the runtime
vector length. This can be problematic because:
* CSE is a pre-LoopVectorize transform and so it's common for an IR
function to include several calls to llvm.vscale(). (NOTE: Code
generation will typically remove the duplicates)
* Pre-LoopVectorize instcombines will rewrite some multiplies as shifts.
This leads to a mismatch between VL based maths of the scalar loop and
that created for the vector loop, which prevents some obvious
simplifications.
SCEV does not suffer these issues because it effectively does CSE during
construction and shifts are represented as multiplies.
|
|
This patch adds an overflow check to the `exactSIVtest` function to fix
the issue demonstrated in the test case added in #157085. This patch
only fixes one of the routines. To fully resolve the test case, the
other functions need to be addressed as well.
|
|
embed in MemIntrinsicInfo" (#159700)
Reverts llvm/llvm-project#157863
|
|
MemIntrinsicInfo (#157863)
Previously asan considers target intrinsics as black boxes, so asan
could not instrument accurate check. This patch make
SmallVector<InterestingMemoryOperand> a member of MemIntrinsicInfo so
that TTI can make targets describe their intrinsic informations to asan.
Note,
1. This patch move InterestingMemoryOperand from Transforms to Analysis.
2. Extend MemIntrinsicInfo by adding a
SmallVector<InterestingMemoryOperand> member.
3. This patch does not support RVV indexed/segment load/store.
|
|
As depend_diff_types shows, there are several places where the
HasSameSize check can be relaxed for higher analysis precision. As a
first step, return both the source size and the sink size from
getDependenceDistanceStrideAndSize, along with a HasSameSize boolean for
the moment.
|
|
A common idiom is the usage of the PatternMatch match function within a
functional algorithm like all_of. Introduce a match functor to shorten
this idiom.
Co-authored-by: Luke Lau <luke@igalia.com>
|
|
This reverts commit fd58f235f8c5bd40d98acfd8e7fb11d41de301c7.
The recommit contains an extra check to make sure that D is a multiple of
C2, if C2 > C1. This fixes the issue causing the revert fd58f235f8c. Tests
have been added in 6a726e9a4d3d0.
Original message:
If C2 >u C1 and C1 >u 1, fold to A /u (C2 /u C1).
Depends on https://github.com/llvm/llvm-project/pull/157555.
Alive2 Proof: https://alive2.llvm.org/ce/z/BWvQYN
PR: https://github.com/llvm/llvm-project/pull/157656
|
|
We currently infer `captures(none)` for calls that are read-only,
nounwind and willreturn. As pointed out in
https://github.com/llvm/llvm-project/issues/129090, this is not correct
even with this set of pre-conditions, because the function could
conditionally cause UB depending on the address.
As such, change this logic to instead report `captures(address)`. This
also allows dropping the nounwind and willreturn checks, as these can
also only capture the address.
|
|
This is a minor fixup for scalable vectors after f9f62ef4ae555a. It
handles them in the same way as other memory locations that are larger
than errno, preventing the failure on implicit conversion from a
scalable location.
|
|
This patch introduces a new option, `da-run-siv-routines-only`, which
runs only the SIV family routines in the DA. This is useful for testing
(regression tests, not dependence tests) as it helps detect behavioral
changes in the SIV routines. Actually, regarding the test cases added in
#157085, fixing the incorrect result requires changes across multiple
functions (at a minimum, `exactSIVtest`, `gcdMIVtest` and
`symbolicRDIVtest`). It is difficult to address all of them at once.
This patch also generates the CHECK directives using the new option for
`ExactSIV.ll` as it is necessary for subsequent patches. However, I
believe it will also be useful for other `xxSIV.ll` tests. Notably, the
SIV family routines tend to be affected by other routines, as they are
typically invoked at the beginning of the overall analysis.
|
|
Ensure alias analyses mask out `errnomem` location, refining the
resulting modref info, when the given access/location does not
alias errno. This may occur either when TBAA proves there is no
alias with errno (e.g., float TBAA for the same root would be
disjoint with the int-only compatible TBAA node for errno); or
if the memory access size is larger than the integer size, or
when the underlying object is a potentially-escaping alloca.
Previous discussion: https://discourse.llvm.org/t/rfc-modelling-errno-memory-effects/82972.
|
|
inst's (#158097)
Resolves #157371
We can eliminate one of the `fcmp` when we have two same `olt` or `ogt`
instructions matched in `or`/`and` simplification.
|
|
Closes https://github.com/llvm/llvm-project/issues/157238.
|
|
This patch removes base pointers from subscripts when delinearization
fails. Previously, in such cases, the pointer type SCEVs were used
instead of offset SCEVs derived from them.
For example, here is a portion of the debug output when analyzing
`strong0` in `test/Analysis/DependenceAnalysis/StrongSIV.ll`:
```
testing subscript 0, SIV
src = {(8 + %A),+,4}<nuw><%for.body>
dst = {(8 + %A),+,4}<nuw><%for.body>
Strong SIV test
Coeff = 4, i64
SrcConst = (8 + %A), ptr
DstConst = (8 + %A), ptr
Delta = 0, i64
UpperBound = (-1 + %n), i64
Distance = 0
Remainder = 0
```
As shown above, the `SrcConst` and `DstConst` are pointer values rather
than integer offsets. `%A` should be removed.
This change is necessary for #157086, since
`ScalarEvolution::willNotOverflow` expects integer type SCEVs as
arguments. This change alone alone should not affect the analysis
results.
|
|
When adding a new predicate to a union, we currently do a bidirectional
implication for all the contained predicates. This means that the number
of implication checks is quadratic in the number of total predicates (if
they don't end up being eliminated).
Fix this by not checking for implication if the number of predicates
grows too large. The expectation is that if there is a large number of
predicates, we should be discarding them later anyway, as expanding them
would be too expensive.
Fixes https://github.com/llvm/llvm-project/issues/156114.
|
|
Reverts llvm/llvm-project#157656
There are multiple reports that this is causing miscompiles in the MSan
test suite after bootstrapping and that this is causing miscompiles in
rustc. Let's revert for now, and work to capture a reproducer next week.
|
|
If we have a phi where one of it's source blocks is an unreachable
block, we don't want to traverse back into the unreachable region. Doing
so allows e.g. finding a trivial self loop when walking back the
predecessor chain.
|
|
When the second argument passed to the get.active.lane.mask intrinsic is
zero we can simplify the instruction to return an all-false mask
regardless of the first operand.
|
|
Stacked on #156923
In https://godbolt.org/z/8svWaredK, we spill a lot on RISC-V because
whilst the largest element type is i8, we generate a bunch of pointer
vectors for gathers and scatters. This means the VF chosen is quite high
e.g. <vscale x 16 x i8>, but we end up using a bunch of <vscale x 16 x
i64> m8 registers for the pointers.
This was briefly fixed by #132190 where we computed register pressure in
VPlan and used it to prune VFs that were likely to spill. The legacy
cost model wasn't able to do this pruning because it didn't have
visibility into the pointer vectors that were needed for the
gathers/scatters.
However VF pruning was restricted again to just the case when max
bandwidth was enabled in #141736 to avoid an AArch64 regression, and
restricted again in #149056 to only prune VFs that had max bandwidth
enabled.
On RISC-V we take advantage of register grouping for performance and
choose a default of LMUL 2, which means there are 16 registers to work
with – half the number as SVE, so we encounter higher register pressure
more frequently.
As such, we likely want to always consider pruning VFs with high
register pressure and not just the VFs from max bandwidth.
This adds a TTI hook to opt into this behaviour for RISC-V which fixes
the motivating godbolt example above. When last checked this
significantly reduces the number of spills on SPEC CPU 2017, up to
80% on 538.imagick_r.
|
|
coroutine function (#149936)" (#157986)
Since #156788 has resolved #149604, we can revert this workaround now.
|
|
Check if operands are ConstantInt to avoid crashing on constant
expression after https://github.com/llvm/llvm-project/pull/156659.
|
|
Scalable get_active_lane_mask intrinsics with a range of 0 can be
lowered to zeroinitializer. This helps remove no-op scalable masked
stores and loads.
|
|
If C2 >u C1 and C1 >u 1, fold to A /u (C2 /u C1).
Depends on https://github.com/llvm/llvm-project/pull/157555.
Alive2 Proof: https://alive2.llvm.org/ce/z/BWvQYN
PR: https://github.com/llvm/llvm-project/pull/157656
|
|
proof: https://alive2.llvm.org/ce/z/8emkHY
|
|
|
|
The specification for get.active.lanes.mask says a limit value of zero
results in poison. This seems like an artificial restriction and means
you cannot use the intrinsic to create minimal loops of the form:
```
foo(int count, ....) {
int i = 0;
while (mask = get.active.lane.mask(i, count)) {
; do work
i += count_bits(mask);
}
}
```
I cannot see any code that generates poison in this case, in fact
ConstantFoldFixedVectorCall returns the logical result (i.e. an all
false vector).
There are also cases like `can_overflow_i64_induction_var` in
sve-tail-folding-overflow-checks.ll that look broken by the current
definition? for the case when "%N <= vscale * 4".
|
|
Treat negative constants C as -1 * abs(C1) when folding multiplies and
udivs.
Alive2 Proof: https://alive2.llvm.org/ce/z/bdj9W2
PR: https://github.com/llvm/llvm-project/pull/157555
|
|
Remove a level of indirection due to findForkedPointer, in an effort to
improve code.
|
|
Currently this fold only supports a single GEP. However, in ptradd
representation, it may be split across multiple GEPs. In particular, PR
#151333 will split off constant offset GEPs.
To support this, add a new helper decomposeLinearExpression(), which
decomposes a pointer into a linear expression of the form BasePtr +
Index * Scale + Offset.
I plan to also extend this helper to look through mul/shl on the index
and use it in more places that currently use collectOffset() to extract
a single index * scale. This will make sure such optimizations are not
affected by the ptradd migration.
|
|
(#157159)
Generalize fold added in 74ec38fad0a1289
(https://github.com/llvm/llvm-project/pull/156730) to support multiplying and
dividing by different constants, given they are both powers-of-2 and C1 is a
multiple of C2, checked via logBase2.
https://alive2.llvm.org/ce/z/eqJ2xj
PR: https://github.com/llvm/llvm-project/pull/157159
|
|
Unify explanation for GF(2^n) and GF(2), which was previously
convoluted.
|
|
Checking if trip-count exceeds 256 is no longer necessary, as we have
moved away from KnownBits computations to pattern-matching, which is
very cheap and independent of TC.
|
|
This patch addresses
https://github.com/llvm/llvm-project/pull/155216#discussion_r2297724663.
This patch adds a helper function to put the inverse cast on constants,
with cast flags preserved(optional).
Follow-up patches will add trunc/ext handling on VectorCombine and flags
preservation on InstCombine.
|
|
It is tied to the vocab having had been set. Checking that vector's
`emtpy` is sufficient. Less state to track (for a maintainer)
|