|
RHS (#179055)
Add a new helper function `matchEquivZeroRHS()` that recognizes
comparisons with constants that are equivalent to comparisons with zero,
and transforms the predicate accordingly.
This handles the following transformations:
- icmp sgt X, -1 --> icmp sge X, 0
- icmp sle X, -1 --> icmp slt X, 0
- icmp [us]ge X, 1 --> icmp [us]gt X, 0
- icmp [us]lt X, 1 --> icmp [us]le X, 0
This enables more optimization opportunities in `simplifyICmpWithZero`,
such as folding icmp sgt X, -1 when X is known to be non-negative.
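As an illustrative example of the kind of fold this enables (the `lshr` here is just one hypothetical way to make X known non-negative):
```llvm
define i1 @known_nonneg(i32 %x) {
  ; lshr clears the sign bit, so %nn is known non-negative
  %nn = lshr i32 %x, 1
  ; sgt %nn, -1 is now treated as sge %nn, 0 and folds to true
  %c = icmp sgt i32 %nn, -1
  ret i1 %c
}
```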
---
- IR Impact: https://github.com/dtcxzyw/llvm-opt-benchmark/pull/3414
|
|
Convert "denormal-fp-math" and "denormal-fp-math-f32" into a first
class denormal_fpenv attribute. Previously the query for the effective
denormal mode involved two string attribute queries with parsing. I'm
introducing more uses of this, so it makes sense to convert this
to a more efficient encoding. The old representation was also awkward
since it was split across two separate attributes. The new encoding
just stores the default and float modes as bitfields, largely avoiding
the need to consider if the other mode is set.
The syntax in the common cases looks like this:
`denormal_fpenv(preservesign,preservesign)`
`denormal_fpenv(float: preservesign,preservesign)`
`denormal_fpenv(dynamic,dynamic float: preservesign,preservesign)`
I wasn't sure about reusing the float type name instead of adding a
new keyword. It's parsed as a type but only accepts float. I'm also
debating switching the name to subnormal to match the current
preferred IEEE terminology (also used by nofpclass and other
contexts).
This is a behavior change when using the debug command-line flags to
set the denormal mode. Previously the flags ignored functions with an
explicit attribute set, separately for the default and f32 versions.
Now that these are one attribute, the flag logic can't distinguish
which of the two components was explicitly set on the function. Only
one test appeared to rely on this behavior, so I just avoided using
the flags in it.
This also does not perform all the code cleanups this enables.
In particular the attributor handling could be cleaned up.
I also guessed at how to support this in MLIR. I followed
MemoryEffects as a reference; it appears bitfields are expanded
into arguments to attributes, so the representation there is a bit
uglier, with the two 2-element fields flattened into four arguments.
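For illustration, a minimal sketch of a function carrying the new attribute, using the first syntax form above (the attribute position mirrors other function attributes; that placement is an assumption, not taken from the patch):
```llvm
define float @daz(float %x) denormal_fpenv(preservesign,preservesign) {
  ; denormal inputs/outputs of this fadd may be flushed, preserving sign
  %r = fadd float %x, 1.0
  ret float %r
}
```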
|
|
|
|
Following on from #170796, this PR implements the second part of
https://discourse.llvm.org/t/rfc-allow-non-constant-offsets-in-llvm-vector-splice/88974
by allowing non-constant offsets in the vector splice intrinsics.
Previously @llvm.vector.splice had a restriction enforced by the
verifier that the offset had to be known to be within the range of the
vector at compile time. Because we can't enforce this with non-constant
offsets, it's been relaxed so that offsets that would slide the vector
out of bounds return a poison value, similar to
insertelement/extractelement.
@llvm.vector.splice.left also previously only allowed offsets within the
range 0 <= Offset < N, but this has been relaxed to 0 <= Offset <= N so
that it's consistent with @llvm.vector.splice.right.
In lieu of the verifier checks that were removed, InstSimplify has been
taught to fold splices to poison when the offset is out of bounds.
The cost model isn't implemented in this PR, and just returns invalid
for any non-constant offsets for now. I think the correct way to cost
these non-constant offsets isn't through getShuffleCost, because it
can't handle variable masks, but instead through
getIntrinsicInstrCost.
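For illustration, a sketch of the new out-of-bounds folding in IR (the v4i32 mangling and i32 offset type are assumptions about the intrinsic signature):
```llvm
define <4 x i32> @splice_oob(<4 x i32> %a, <4 x i32> %b) {
  ; offset 5 is outside 0 <= Offset <= 4 for a 4-element vector,
  ; so InstSimplify now folds the call to poison
  %r = call <4 x i32> @llvm.vector.splice.left.v4i32(<4 x i32> %a, <4 x i32> %b, i32 5)
  ret <4 x i32> %r
}
```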
|
|
isn't successful when simplifying fcmp (#176159)
Fixes #175949.
|
|
In the special case where both operands are the same value,
ensure the value isn't undef.
|
|
Changes this to getSigned() to match the signedness of the calculation.
However, we still need to allow truncation because the addition
result may overflow, and the operation is specified to truncate
in that case.
Fixes https://github.com/llvm/llvm-project/issues/175159.
|
|
In the special case for fdiv/frem with the same operands, make
sure the input isn't undef.
|
|
(#174142)
Previously, `canReplacePointersIfEqual` unconditionally returned
`true` for vectors of pointers (e.g., `<2 x ptr>`) because it only
checked for scalar pointer types.
This resulted in a failure to perform appropriate verification for
these types. This patch fixes the logic to ensure they are properly
validated.
Fixes https://github.com/llvm/llvm-project/issues/174045
|
|
`ptrtoaddr(p1) - ptrtoaddr(p2) == non-zero` implies `p1 != p2`, same as
for ptrtoint.
|
|
Addresses https://github.com/llvm/llvm-project/issues/173267.
- folds log(x) -> NaN for negative x
- folds log(0) -> -inf
- also folds log(1) -> 0.0 without host libm
> note: log(inf) is also doable, but it causes some other tests to fail,
so I avoided it for now
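A minimal sketch of the new constant folding, using the generic `llvm.log` intrinsic (whether the libcall form is also covered is not shown here):
```llvm
define double @log_one() {
  ; folds to 0.0 without consulting the host libm
  %r = call double @llvm.log.f64(double 1.0)
  ret double %r
}
```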
|
|
Add support for ptrtoaddr in isKnownNonZero(). We can directly forward
to isKnownNonZero() for the pointer here, as we define nonnull as
applying to the address bits.
Also adjust the ptrtoint implementation to match, by requiring that the
result type >= address size (rather than >= pointer size). This is just
for clarity; in practice this is a non-canonical form.
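A sketch of what the main change enables (assuming a target where i64 is the address type):
```llvm
define i1 @addr_nonzero(ptr nonnull %p) {
  ; nonnull applies to the address bits, so the address is known non-zero
  %a = ptrtoaddr ptr %p to i64
  %c = icmp ne i64 %a, 0
  ret i1 %c   ; folds to true
}
```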
|
|
Currently these intrinsics output `undef` on poison, which triggers CI
errors on PRs that want to add poison tests for funnel shifts (such as
#172723). Let's make `fshl` and `fshr` propagate poison instead.
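For example (illustrative):
```llvm
define i32 @fshl_poison(i32 %x) {
  ; any poison operand now makes the result poison (previously undef)
  %r = call i32 @llvm.fshl.i32(i32 poison, i32 %x, i32 8)
  ret i32 %r
}
```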
|
|
This is basically the same change as #162653, but for InstSimplify
instead of ConstantFolding.
It folds `icmp (ptrtoaddr x, ptrtoaddr y)` to `icmp (x, y)` and `icmp
(ptrtoaddr x, C)` to `icmp (x, inttoptr C)`.
The fold is restricted to the case where the result type is the address
type, as icmp on pointers only compares the address bits. As in the other PR, I think
in practice all the folds are also going to work if the ptrtoint result
type is larger than the address size, but it's unclear how to justify
this in general.
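A sketch of the first fold (assuming a datalayout where the address type is i64):
```llvm
define i1 @cmp_addrs(ptr %x, ptr %y) {
  %ax = ptrtoaddr ptr %x to i64
  %ay = ptrtoaddr ptr %y to i64
  ; folds to icmp ult ptr %x, %y
  %c = icmp ult i64 %ax, %ay
  ret i1 %c
}
```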
|
|
This folds `icmp (ptrtoaddr x, ptrtoaddr y)` to `icmp (x, y)`, matching
the existing ptrtoint fold. Restrict both folds to only the case where
the result type matches the address type.
I think that all folds this can do in practice end up actually being
valid for ptrtoint to a type larger than the address size as well, but I
don't really see a way to justify this generically without making
assumptions about what kind of folding the recursive calls may do.
This is based on the icmp semantics specified in
https://github.com/llvm/llvm-project/pull/163936.
|
|
The mask doesn't really affect the reverse; it only poisons the
masked-off elements in the result. It should be OK to ignore the mask
if we can eliminate the pair.
I don't have a specific use case for this, but it matches what I had
implemented in our downstream before the current upstream
implementation. Submitting upstream so I can remove the delta
in my downstream.
|
|
Basically identical to nearbyint and rint, which we already treat as
rounding to nearest with ties to even during constant folding.
|
|
The libcall versions will later be optimized and constant-folded.
|
|
The behaviour of constant-folding `maxnum(sNaN, x)` and `minnum(sNaN,
x)` has become controversial, and there are ongoing discussions about
which behaviour we want to specify in the LLVM IR LangRef.
See:
- https://github.com/llvm/llvm-project/issues/170082
- https://github.com/llvm/llvm-project/pull/168838
- https://github.com/llvm/llvm-project/pull/138451
- https://github.com/llvm/llvm-project/pull/170067
-
https://discourse.llvm.org/t/rfc-a-consistent-set-of-semantics-for-the-floating-point-minimum-and-maximum-operations/89006
This patch removes optimizations and constant-folding support for
`maxnum(sNaN, x)` but keeps it folded/optimized for `qNaN`. This should
allow some more flexibility, so the implementation can conform to
either the old or the new version of the semantics without further
changes.
As far as I am aware, optimizations involving constant `sNaN` should
generally be edge cases that rarely occur, so there should hopefully be
very little real-world performance impact from disabling these
optimizations.
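For example, a qNaN constant still folds (sketch; the bit pattern below is the canonical qNaN):
```llvm
define float @max_qnan(float %x) {
  ; qNaN operand: maxnum still folds to the other operand, %x
  %r = call float @llvm.maxnum.f32(float 0x7FF8000000000000, float %x)
  ret float %r
}
```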
|
|
When comparing additions with the same base where one has `nsw`, the
following simplification can be performed:
```llvm
icmp slt/sgt/sle/sge (x + C1), (x +nsw C2)
=>
icmp slt/sgt/sle/sge C1, C2
```
Previously this was only done for `slt`. This patch extends it to the
`sgt`, `sle`, and `sge` predicates when either of the following
conditions holds:
- `C1 <= C2 && C1 >= 0`, or
- `C2 <= C1 && C1 <= 0`
This patch also handles the `C1 == C2` case, which was previously
excluded.
Proof: https://alive2.llvm.org/ce/z/LtmY4f
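A concrete instance of the extended fold (illustrative):
```llvm
define i1 @cmp_offsets(i32 %x) {
  %a = add i32 %x, 1
  %b = add nsw i32 %x, 2
  ; C1 = 1 <= C2 = 2 and C1 >= 0, so this folds to icmp sle 1, 2, i.e. true
  %c = icmp sle i32 %a, %b
  ret i1 %c
}
```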
|
|
intrinsics. (#168668)
We can constant fold interleave of identical splat vectors to a larger
splat vector.
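For example (the mangled intrinsic name here is written from memory and may differ):
```llvm
define <8 x i32> @interleave_splats() {
  ; interleaving two identical splats yields a wider splat of the same value
  %r = call <8 x i32> @llvm.vector.interleave2.v8i32(
            <4 x i32> <i32 7, i32 7, i32 7, i32 7>,
            <4 x i32> <i32 7, i32 7, i32 7, i32 7>)
  ret <8 x i32> %r   ; folds to a constant <8 x i32> splat of 7
}
```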
|
|
deinterleave3-8. (#168640)
|
|
This matches how IR is printed.
|
|
interleave3-8. (#168473)
|
|
[andv, eorv, orv, s/uaddv, s/umaxv, s/uminv]
sve_reduce_##(none, ?) -> op's neutral value
sve_reduce_##(any, neutral) -> op's neutral value
[andv, orv, s/umaxv, s/uminv]
sve_reduce_##(all, splat(X)) -> X
[eorv]
sve_reduce_##(all, splat(X)) -> 0
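For illustration, the `all, splat(X)` case for `orv` might look like this in IR (the SVE intrinsic signatures are written from memory; treat this as a sketch):
```llvm
define i32 @orv_splat(i32 %x) {
  %pg = call <vscale x 4 x i1> @llvm.aarch64.sve.ptrue.nxv4i1(i32 31) ; SV_ALL
  %ins = insertelement <vscale x 4 x i32> poison, i32 %x, i64 0
  %splat = shufflevector <vscale x 4 x i32> %ins, <vscale x 4 x i32> poison, <vscale x 4 x i32> zeroinitializer
  ; an OR reduction over an all-active splat folds to the splatted value
  %r = call i32 @llvm.aarch64.sve.orv.nxv4i32(<vscale x 4 x i1> %pg, <vscale x 4 x i32> %splat)
  ret i32 %r   ; folds to %x
}
```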
|
|
vectors (#168055)
When simplifying min/max intrinsics with fixed-size vector constants,
InstructionSimplify attempts to optimize element-wise. However,
getAggregateElement() can return null for certain constant expressions
like bitcasts, leading to a null pointer dereference.
This patch adds a check to bail out of the optimization when
getAggregateElement() returns null, preventing the crash while
maintaining correct behavior for normal constant vectors.
Fixes crash with patterns like:
call <2 x half> @llvm.minnum.v2f16(<2 x half> %x,
<2 x half> bitcast (<1 x i32> <i32 N> to <2 x half>))
|
|
vectors. (#165437)
|
|
|
|
This adds support for ptrtoaddr in the `ptradd p, ptrtoaddr(p2) -
ptrtoaddr(p) -> p2` fold.
This fold requires that p and p2 have the same underlying object
(otherwise the provenance may not be the same).
The argument I would like to make here is that because the underlying
objects are the same (and the pointers in the same address space), the
non-address bits of the pointer must be the same. Looking at some
specific cases of underlying object relationship:
* phi/select: Trivially true.
* getelementptr: Only modifies address bits, non-address bits must
remain the same.
* addrspacecast round-trip cast: Must preserve all bits because we
optimize such round-trip casts away.
* non-interposable global alias: I'm a bit unsure about this one, but I
guess the alias and the aliasee must have the same non-address bits?
* various intrinsics like launder.invariant.group, ptrmask. I think
these all either preserve all pointer bits (like the invariant.group
ones) or at least the non-address bits (like ptrmask). There are some
interesting cases like amdgcn.make.buffer.rsrc, but those are cross
address-space.
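Putting the pieces together, a sketch of the fold (here the gep guarantees the shared underlying object):
```llvm
define ptr @ptradd_fold(ptr %p, i64 %i) {
  %p2 = getelementptr i8, ptr %p, i64 %i
  %a2 = ptrtoaddr ptr %p2 to i64
  %a1 = ptrtoaddr ptr %p to i64
  %d = sub i64 %a2, %a1
  ; %p and %p2 share an underlying object, so this folds to %p2
  %r = getelementptr i8, ptr %p, i64 %d
  ret ptr %r
}
```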
-----
There is a second `gep (gep p, C), (sub 0, ptrtoint(p)) -> C` transform
in this function, which I am not extending to handle ptrtoaddr, adding
negative tests instead. This transform is overall dubious for provenance
reasons, but especially dubious with ptrtoaddr, as then we don't have
the guarantee that the provenance of `p` has been exposed.
|
|
All the existing tests test code either in ConstantFolding or
InstSimplify, so move them to use -passes=instsimplify instead of
-passes=instcombine. This makes sure we keep InstSimplify coverage
even if there are subsuming InstCombine folds.
This requires writing some of the constant folding tests in a
different way, as InstSimplify does not try to re-fold already
existing constant expressions.
|
|
Splats include two poison values, but only the poison-ness of the
splatted value actually matters.
|
|
based operands. (#159358)
Simplification of vector.reduce intrinsics is prevented by an early
bailout for ConstantInt base operands. This PR removes the bailout and
updates the tests to show matching output when
-use-constant-int-for-*-splat is used.
|
|
Treat it the same way as ptrtoint. ptrmask only operates on the
address bits of the pointer.
|
|
We can fold ptrdiff(ptradd(p, x), p) to x regardless of whether the
ptradd is inbounds.
Proof: https://alive2.llvm.org/ce/z/Xuvc7N
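In IR terms, a sketch of the fold:
```llvm
define i64 @ptrdiff(ptr %p, i64 %x) {
  %q = getelementptr i8, ptr %p, i64 %x   ; note: no inbounds needed
  %a = ptrtoint ptr %q to i64
  %b = ptrtoint ptr %p to i64
  %d = sub i64 %a, %b
  ret i64 %d   ; folds to %x
}
```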
|
|
Turns out there already was a test for the non-inbounds variant,
so remove the duplicate. Rename the tests to be more meaningful.
Drop irrelevant target triple.
|
|
Also regenerate the test in current format.
|
|
isEliminableCastPair() currently tries to support elimination of
ptrtoint/inttoptr cast pairs by assuming that the maximum possible
pointer size is 64 bits. Of course, this is no longer the case nowadays.
This PR changes isEliminableCastPair() to accept an optional DataLayout
argument, which is required to eliminate pointer casts.
This means that we no longer eliminate these cast pairs during ConstExpr
construction, and instead only do it during DL-aware constant folding.
This had a lot of annoying fallout on tests, most of which I've
addressed in advance of this change.
|
|
Add support for the new maximumnum and minimumnum intrinsics in various
optimizations in InstSimplify.
Also, change the behavior of optimizing maxnum(sNaN, x) to simplify to
qNaN instead of x to better match the LLVM IR spec, and add more tests
for sNaN behavior for all 3 max/min intrinsic types.
|
|
The intermediate integer type is too small to hold the full value.
|
|
This patch simplifies an fcmp into true/false if it is implied by a
dominating fcmp.
As an initial support, it only handles two cases:
+ `fcmp pred1, X, Y -> fcmp pred2, X, Y`: use set operations.
+ `fcmp pred1, X, C1 -> fcmp pred2, X, C2`: use `ConstantFPRange` and
set operations.
Note: It doesn't fix https://github.com/llvm/llvm-project/issues/70985,
as the second fcmp in the motivating case is not dominated by the edge.
We may need to adjust JumpThreading to handle this case.
Compile-time impact (~+0.1%):
https://llvm-compile-time-tracker.com/compare.php?from=a728f213c863e4dd19f8969a417148d2951323c0&to=8ca70404fb0d66a824f39d83050ac38e2f1b25b9&stat=instructions:u
IR diff: https://github.com/dtcxzyw/llvm-opt-benchmark/pull/2848
|
|
Fix the NaN-handling semantics of various NVVM intrinsics converting
from fp types to integer types.
Previously in ConstantFolding, NaN inputs would be constant-folded to 0.
However, v9.0 of the PTX spec states that:
In float-to-integer conversions, depending upon conversion types, NaN
input results in the following value:
* Zero if the source is not `.f64` and the destination is not `.s64`
or `.u64`.
* Otherwise `1 << (BitWidth(dst) - 1)`, corresponding to the value of
`(MAXINT >> 1) + 1` for an unsigned type or `MININT` for a signed type.
Also, support for constant-folding +/-Inf and values which
overflow/underflow the integer output type has been added (they clamp to
min/max int).
Because of this NaN-handling semantic difference, we also need to
disable transforming several intrinsics to FPToSI/FPToUI, as the LLVM
instruction will return poison, but the intrinsics have defined
behaviour for edge cases like NaN/Inf/overflow.
|
|
Refactor all the tests in `fminmax-folds.ll` so that they are grouped by
optimization, rather than by intrinsic.
Instead of calling 1 intrinsic per function, each function now tests all
6 variants of the intrinsic. Results are stored to named pointers to
maintain readability in this more compact form. This makes it much
easier to compare the outputs from each intrinsic, rather than having
them scattered in different functions in different parts of the file. It
is also much more compact, so despite adding >50% more tests, the file
is ~500 lines shorter.
The tests added include:
* Adding `maximumnum` and `minimumnum` everywhere (currently not
optimized, but added as a baseline for future optimizations in #139581).
* Adding separate tests for SNaN and QNaN (as a baseline for correctness
improvements in #139581)
* Adding tests for scalable vectors
* Increasing the variety of types used in various tests by using more
f16, f64, and vector types in tests.
The only coverage removed is for tests with undef (only poison is now
tested for).
Overall, this refactor should increase coverage, improve readability
with more comments and clear section headers, and make the tests much
more compact and easier to review in #139581 by providing a clear
baseline for each intrinsic's current behaviour.
|
|
Scalable get_active_lane_mask intrinsic calls can be simplified to an
i1 splat (ptrue) when their constant range is known to be at least the
maximum possible number of elements, which can be inferred from
vscale_range(x, y).
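For example (sketch; the exact bounds the patch handles may differ):
```llvm
define <vscale x 4 x i1> @all_active() vscale_range(1,4) {
  ; max element count is 4 * vscale_max = 16, and 0 + i < 16 holds for
  ; every lane, so the mask folds to an all-true splat (ptrue)
  %m = call <vscale x 4 x i1> @llvm.get.active.lane.mask.nxv4i1.i64(i64 0, i64 16)
  ret <vscale x 4 x i1> %m
}
```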
|
|
Avoiding any new inttoptr is unnecessarily restrictive for "plain"
non-integral pointers, but it is important for unstable pointers and
pointers with external state. Fixes another test codegen regression
from https://github.com/llvm/llvm-project/pull/105735.
Reviewed By: nikic
Pull Request: https://github.com/llvm/llvm-project/pull/159959
|
|
This commit adds finer-grained versions of isNonIntegralAddressSpace() and
isNonIntegralPointerType() where the current semantics prohibit
introduction of both ptrtoint and inttoptr instructions. The current
semantics are too strict for some targets (e.g. AMDGPU/CHERI) where
ptrtoint has a stable value, but the pointer has additional metadata.
Currently, marking a pointer address space as non-integral also marks it
as having an unstable bitwise representation (e.g. when pointers can be
changed by a copying GC). This property inhibits a lot of
optimizations that are perfectly legal for other non-integral pointers
such as fat pointers or CHERI capabilities that have a well-defined
bitwise representation but can't be created with only an address.
This change splits the properties of non-integral pointers and allows
for address spaces to be marked as unstable or non-integral (or both)
independently using the 'p' part of the DataLayout string.
A 'u' following the 'p' marks the address space as unstable, and
specifying an index width != representation width marks it as
non-integral.
Finally, we also add an 'e' flag to mark pointers with external state
(such as the CHERI capability validity). These pointers require
special handling of loads and stores in addition to being non-integral.
This does not change the checks in any of the passes yet - we
currently keep the existing non-integral behaviour. In the future I plan
to audit calls to DL.isNonIntegral[PointerType]() and replace them with
the DL.mustNotIntroduce{IntToPtr,PtrToInt}() checks that allow for more
optimizations.
RFC: https://discourse.llvm.org/t/rfc-finer-grained-non-integral-pointer-properties/83176
Reviewed By: nikic, krzysz00
Pull Request: https://github.com/llvm/llvm-project/pull/105735
|
|
|
|
When the second argument passed to the get.active.lane.mask intrinsic
is zero, we can simplify the instruction to return an all-false mask
regardless of the first operand.
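For example:
```llvm
define <vscale x 4 x i1> @none_active(i64 %base) {
  ; no lane satisfies %base + i < 0 (unsigned), so this folds to zeroinitializer
  %m = call <vscale x 4 x i1> @llvm.get.active.lane.mask.nxv4i1.i64(i64 %base, i64 0)
  ret <vscale x 4 x i1> %m
}
```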
|
|
Check if the operands are ConstantInt to avoid crashing on constant
expressions after https://github.com/llvm/llvm-project/pull/156659.
|
|
Scalable get_active_lane_mask intrinsics with a range of 0 can be
lowered to zeroinitializer. This helps remove no-op scalable masked
stores and loads.
|
|
|