Age | Commit message | Author | Files | Lines |
|
The two remaining uses are related to fneg and foldFPToIntToFP; some
AMDGPU tests are duplicated and regenerated.
|
|
Remove NoSignedZerosFPMath in the visitFSUB part; we should always use
instruction-level fast math flags.
|
|
Remove NoSignedZerosFPMath in the TargetLowering part; users should
always use instruction-level fast math flags.
|
|
Fixes #160981
The exponent of a floating-point number is signed. This patch
prevents it from being treated as unsigned.
|
|
On targets where f32 maximumnum is legal, but maximumnum on vectors of
smaller types is not legal (e.g. v2f16), try unrolling the vector first
as part of the expansion.
Only fall back to expanding the full maximumnum computation into
compares + selects if maximumnum on the scalar element type cannot be
supported.
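A minimal sketch of the new ordering, assuming a DAG expansion context where `N`, `TLI`, and `DAG` are in scope (the exact predicate used may differ):
```
// Sketch only: prefer unrolling when the scalar element type can handle the
// operation (legal, custom, or promotable, e.g. f16 promoted to f32).
EVT VT = N->getValueType(0);
if (VT.isVector() && TLI.isOperationLegalOrCustomOrPromote(
                         N->getOpcode(), VT.getVectorElementType()))
  return DAG.UnrollVectorOp(N);
// Otherwise fall back to the full compare+select expansion of maximumnum.
```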
|
|
Remove these global flags and use node level flags instead.
|
|
ucmp (#159889)
Same deal as we use for determining ucmp vs scmp: using selects on
platforms that prefer selects is better than using usubo.
Rename the function to be more general, fitting this new description.
|
|
computeKnownBits/ComputeNumSignBits (#159692)
Resolves #158332
|
|
Fixes [157370](https://github.com/llvm/llvm-project/issues/157370)
UREM General proof: https://alive2.llvm.org/ce/z/b_GQJX
SREM General proof: https://alive2.llvm.org/ce/z/Whkaxh
I have added rv32i and rv64i tests because those are the only architectures where I could verify that it works.
|
|
Fixes #159912
|
|
This is a common pattern for initializing KnownBits that occurs before
loops that call intersectWith.
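For reference, a small self-contained sketch of that pattern; the helper name here is illustrative, not the new API:
```
#include "llvm/ADT/ArrayRef.h"
#include "llvm/Support/KnownBits.h"
using namespace llvm;

// Seed with the "all bits known" identity for intersection, then intersect
// the known bits of every operand.
static KnownBits intersectAll(ArrayRef<KnownBits> Operands, unsigned BitWidth) {
  KnownBits Known(BitWidth);
  Known.Zero.setAllBits();
  Known.One.setAllBits();
  for (const KnownBits &K : Operands)
    Known = Known.intersectWith(K);
  return Known;
}
```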
|
|
AIX has "millicode" routines, which are functions loaded at boot time
into fixed addresses in kernel memory. This allows them to be customized
for the processor. The __strlen routine is a millicode implementation;
we use millicode for the strlen function instead of a library call to
improve performance.
|
|
If we can't fold a PTRADD's offset into its users, lowering them to
disjoint ORs is preferable: Often, a 32-bit OR instruction suffices
where we'd otherwise use a pair of 32-bit additions with carry.
This needs to be a DAGCombine (and not a selection rule) because its
main purpose is to enable subsequent DAGCombines for bitwise operations.
We don't want to just turn PTRADDs into disjoint ORs whenever that's
sound because this transform loses the information that the operation
implements pointer arithmetic, which AMDGPU for instance needs when
folding constant offsets.
For SWDEV-516125.
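Roughly, the new combine looks like the sketch below; the real implementation has additional legality and profitability checks, and the variable names are illustrative:
```
// If the base and offset of the PTRADD share no set bits, the addition is
// equivalent to an OR, and we can mark it disjoint for later combines.
SDValue Base = N->getOperand(0), Off = N->getOperand(1);
if (DAG.haveNoCommonBitsSet(Base, Off)) {
  SDNodeFlags Flags;
  Flags.setDisjoint(true);
  return DAG.getNode(ISD::OR, SDLoc(N), N->getValueType(0), Base, Off, Flags);
}
```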
|
|
This PR adds a TargetLowering hook, canTransformPtrArithOutOfBounds,
that targets can use to allow transformations to introduce out-of-bounds
pointer arithmetic. It also moves two such transformations from the
AMDGPU-specific DAG combines to the generic DAGCombiner.
This is motivated by target features like AArch64's checked pointer
arithmetic (CPA), which does not tolerate the introduction of
out-of-bounds pointer arithmetic.
|
|
There are more places in SIISelLowering.cpp and AMDGPUISelDAGToDAG.cpp
that check for ISD::ADD in a pointer context, but as far as I can tell
those are only relevant for 32-bit pointer arithmetic (like frame
indices/scratch addresses and LDS), for which we don't enable PTRADD
generation yet.
For SWDEV-516125.
|
|
As reported in https://github.com/llvm/llvm-project/issues/141034,
SelectionDAG::getNode had some unexpected behavior when creating
vectors with UNDEF elements. Since we treat both UNDEF and POISON as
undefined (when using isUndef()), we can't just fold away
INSERT_VECTOR_ELT/INSERT_SUBVECTOR based on isUndef(), as that could
make the resulting vector more poisonous.
The same kind of bug existed in DAGCombiner::visitINSERT_SUBVECTOR.
Here are some examples:
This fold was done even if vec[idx] was POISON:
INSERT_VECTOR_ELT vec, UNDEF, idx -> vec
This fold was done even if any of vec[idx..idx+size] was POISON:
INSERT_SUBVECTOR vec, UNDEF, idx -> vec
This fold was done even if the elements not extracted from vec could
be POISON:
sub = EXTRACT_SUBVECTOR vec, idx
INSERT_SUBVECTOR UNDEF, sub, idx -> vec
With this patch we avoid such folds unless we can prove that the
result isn't more poisonous when eliminating the insert.
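A conservative sketch of the kind of guard this implies, using isGuaranteedNotToBePoison (the real check can be more precise, e.g. per element; variable names are illustrative):
```
// Only drop the insert of an undef element if the existing vector is known
// not to be poison; otherwise the fold could make the result more poisonous.
if (InVal.isUndef() && DAG.isGuaranteedNotToBePoison(InVec))
  return InVec; // INSERT_VECTOR_ELT InVec, undef, Idx -> InVec
```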
Fixes https://github.com/llvm/llvm-project/issues/141034
|
|
NFC. (#159381)
We have assertions above confirming VT == N1.getValueType() for INSERT_VECTOR_ELT nodes.
|
|
The partial reduction intrinsics are no longer experimental, because
they've been used in production for a while and are unlikely to change.
|
|
The PowerPC changes are caused by shifts created by different IR
operations now being CSEd. This allows consecutive loads to be turned
into vectors earlier, which affects the ordering of other combines and
legalizations, leading to some improvements and some regressions.
|
|
(#155141)
Hi, I compared the code generated by GCC and Clang, and there is a small difference between the two. The LLVM IR is:
```
define i64 @test_smin_neg_one(i64 %a) {
%1 = tail call i64 @llvm.smin.i64(i64 %a, i64 -1)
%retval.0 = xor i64 %1, -1
ret i64 %retval.0
}
```
GCC generates:
```
cmp x0, 0
csinv x0, xzr, x0, ge
ret
```
Clang generates:
```
cmn x0, #1
csinv x8, x0, xzr, lt
mvn x0, x8
ret
```
Clang keeps flipping x0 through x8 unnecessarily.
So I added the following folds to DAGCombiner:
fold (xor (smax(x, C), C)) -> select (x > C), xor(x, C), 0
fold (xor (smin(x, C), C)) -> select (x < C), xor(x, C), 0
alive2: https://alive2.llvm.org/ce/z/gffoir
---------
Co-authored-by: Yui5427 <785369607@qq.com>
Co-authored-by: Matt Arsenault <arsenm2@gmail.com>
Co-authored-by: Simon Pilgrim <llvm-dev@redking.me.uk>
|
|
(#155256)
Based on the comment at
https://github.com/llvm/llvm-project/pull/153600#discussion_r2285729269,
add a helper function, isTailCall, for getting a libcall in SelectionDAG.
|
|
This fixes a typo from #158553: we should use AmtVT instead of VT.
I guess VT and AmtVT are always the same at this point for the tested
targets.
|
|
(#158553)
|
|
Many of the shifts in LegalizeIntegerTypes.cpp were using getPointerTy.
|
|
(#158388)
This ensures we don't need to fixup the shift amount later.
Unfortunately, this enabled the
(SRA (SHL X, ShlConst), SraConst) -> (SRA (sext_in_reg X), SraConst -
ShlConst) combine in combineShiftRightArithmetic for some cases in
is_fpclass-fp80.ll. So we need to also update checkSignTestSetCCCombine
to look through sign_extend_inreg to prevent a regression.
|
|
This function contained old code for handling the case where the type
returned by getScalarShiftAmountTy can't hold the shift amount.
These days this is handled by getShiftAmountTy, which is used by
getShiftAmountConstant.
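For illustration, the modern path, with `ShiftAmt`, `Val`, `VT`, and `DL` as assumed local variables:
```
// getShiftAmountConstant uses getShiftAmountTy, which already picks a type
// wide enough to hold any in-range shift amount for VT.
SDValue Amt = DAG.getShiftAmountConstant(ShiftAmt, VT, DL);
SDValue Res = DAG.getNode(ISD::SHL, DL, VT, Val, Amt);
```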
|
|
|
|
This is a low-level utility for parsing the MCInstrInfo and should
not depend on the state of the function.
|
|
Reverts llvm/llvm-project#157658
Causes hangs, see
https://github.com/llvm/llvm-project/pull/157658#issuecomment-3276441812
|
|
Checking `isOperationLegalOrCustom` instead of `isOperationLegal` allows
more optimization opportunities. In particular, if a target wants to
mark `extract_vector_elt` as `Custom` rather than `Legal` in order to
optimize certain cases, this combiner would otherwise miss some
improvements.
Previously, using `isOperationLegalOrCustom` was avoided due to the risk
of getting stuck in infinite loops (as noted in
https://github.com/llvm/llvm-project/commit/61ec738b60a4fb47ec9b7195de55f1ecb5cbdb45).
After testing, the issue no longer reproduces, but the coverage is
limited to the regression/unit tests and the test-suite.
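The relaxed check is essentially the following sketch (surrounding combine context assumed):
```
// Previously this bailed out unless EXTRACT_VECTOR_ELT was fully Legal;
// now targets that mark it Custom also get the combine.
if (!TLI.isOperationLegalOrCustom(ISD::EXTRACT_VECTOR_ELT, VT))
  return SDValue();
```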
|
|
Generalize `fold (not (neg x)) -> (add X, -1)` to `fold (not (sub Y, X)) -> (add X, ~Y)`
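A quick two's-complement sanity check of the generalized identity (illustrative only, not part of the patch):
```
// ~(y - x) == -(y - x) - 1 == (x - y) - 1 == x + (-y - 1) == x + ~y
static_assert(~(7 - 3) == 3 + ~7, "not(sub y, x) == add x, not(y)");
```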
---------
Co-authored-by: Yui5427 <785369607@qq.com>
Co-authored-by: Simon Pilgrim <llvm-dev@redking.me.uk>
|
|
Factor out from #151275.
|
|
SimplifyDemandedVectorElts (#157027)
This patch reverts changes from commit 585e65d3307f5f0
(https://reviews.llvm.org/D104250), as they no longer seem to be
needed.
The removed code made a recursive call to SimplifyDemandedVectorElts,
trying to simplify the vector %vec when finding things like
(SCALAR_TO_VECTOR (EXTRACT_VECTOR_ELT %vec, 0))
I figure that (EXTRACT_VECTOR_ELT %vec, 0) would be simplified based on
demanding only element zero, regardless of whether it is used in a
SCALAR_TO_VECTOR operation or not.
It would have been different if the code had tried to simplify the
whole expression rather than just %vec. That could also have motivated
making element zero a special case. But it only simplified %vec without
folding away the SCALAR_TO_VECTOR.
|
|
undef/poison (#157056)
AVGFLOORS: https://alive2.llvm.org/ce/z/6TdoQ_
AVGFLOORU: https://alive2.llvm.org/ce/z/4pfi4i
AVGCEILS: https://alive2.llvm.org/ce/z/nWu8WM
AVGCEILU: https://alive2.llvm.org/ce/z/CGvWiA
Fixes #147696
|
|
This patch adds a new `TargetLowering` hook `lowerEHPadEntry()` that is
called at the start of lowering EH pads in SelectionDAG. This allows the
insertion of target-specific actions on entry to exception handlers.
This is used on AArch64 to insert SME streaming-mode switches at landing
pads. This is needed as exception handlers are always entered with
PSTATE.SM off, and the function needs to resume the streaming mode of
the function body.
|
|
Support tail calls to whole wave functions (trivial) and from whole wave
functions (slightly more involved, because we need a new pseudo for the
tail call return that patches up the EXEC mask).
Move the expansion of whole wave function return pseudos (regular and
tail call returns) to prolog/epilog insertion, since that's where we
patch up the EXEC mask.
|
|
Proof: https://alive2.llvm.org/ce/z/cdbzSL
Closes https://github.com/llvm/llvm-project/issues/156559.
|
|
(#117007)
It can be unsafe to load a vector from an address and write a vector to
an address if those two addresses have overlapping lanes within a
vectorised loop iteration.
This PR adds intrinsics designed to create a mask with lanes disabled if
they overlap between the two pointer arguments, so that only safe lanes
are loaded, operated on and stored. The `loop.dependence.war.mask`
intrinsic represents cases where the store occurs after the load, and
the opposite for `loop.dependence.raw.mask`. The distinction between
write-after-read and read-after-write is important, since the ordering
of the read and write operations affects whether the chain of those
instructions can be executed safely.
Along with the two pointer parameters, the intrinsics also take an
immediate that represents the size in bytes of the vector element types.
This will be used by #100579.
|
|
friends. NFC (#156224)
Instead of std::optional<uint64_t>. Shift amounts must be less than or
equal to our maximum supported bit width, which fits in unsigned. Most
of the callers already assumed they fit in unsigned.
|
|
expandABD (#156193)
Not all paths in expandABD are using LHS and RHS twice.
|
|
|
|
Closes #154331
This PR makes the minimum changes needed to compile LLVM and MLIR with
the C++23 standard.
It is a work in progress and should be reviewed for better ways of
handling the parts of the build broken by C++23.
|
|
|
|
Add operators to shift left or right and insert unknown bits.
|
|
instructions. (#153033)
This PR adds support for making VP intrinsics aware of nontemporal
metadata.
|
|
uses (#155427)
Closes https://github.com/llvm/llvm-project/issues/155345.
In the original case, we have one frozen use and two unfrozen uses:
```
t73: i8 = select t81, Constant:i8<0>, t18
t75: i8 = select t10, t18, t73
t59: i8 = freeze t18 (combining)
t80: i8 = freeze t59 (another user of t59)
```
In `DAGCombiner::visitFREEZE`, we replace all uses of `t18` with `t59`.
After updating the uses, `t59: i8 = freeze t18` will be updated to `t59:
i8 = freeze t59` (`AddModifiedNodeToCSEMaps`) and CSEed into `t80: i8 =
freeze t59` (`ReplaceAllUsesWith`). As the previous call to
`AddModifiedNodeToCSEMaps` already removed `t59` from the CSE map,
`ReplaceAllUsesWith` cannot remove `t59` again.
For clarity, see the following call graph:
```
ReplaceAllUsesOfValueWith(t18, t59)
  ReplaceAllUsesWith(t18, t59)
    RemoveNodeFromCSEMaps(t73)
    update t73
    AddModifiedNodeToCSEMaps(t73)
    RemoveNodeFromCSEMaps(t75)
    update t75
    AddModifiedNodeToCSEMaps(t75)
    RemoveNodeFromCSEMaps(t59) <- first deletion
    update t59
    AddModifiedNodeToCSEMaps(t59)
      ReplaceAllUsesWith(t59, t80)
        RemoveNodeFromCSEMaps(t59) <- second deletion
        Boom!
```
This patch unfreezes all the uses first to avoid triggering CSE when
introducing cycles.
|
|
(#155455)
When going through the ISD::EXTRACT_ELEMENT case,
`KnownSign - rIndex * BitWidth` could produce a negative value. When
that happens, the lower bound of the `std::clamp` is returned. Change
that lower bound to one to avoid potential underflows, because
`ComputeNumSignBits` is expected to always return at least 1.
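A sketch of the adjusted computation, using the names from the description above:
```
// Clamp to a lower bound of 1: a negative intermediate can no longer
// underflow, and ComputeNumSignBits still returns at least 1.
return std::clamp<int64_t>(KnownSign - rIndex * BitWidth, 1, BitWidth);
```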
Fixes #155452.
|
|
combine. (#155043)
If the srl+shl have the same shift amount and the shl has the nuw flag,
we can remove both.
In the affected test, the InterleavedAccess pass will emit a udiv after
the `mul nuw`. We expect them to combine away. The remaining shifts on
the RV64 tests are because we didn't add the zeroext attribute to the
incoming evl operand.
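The combine is roughly the following sketch, with `N0` as the shl operand and `N1` as the srl's shift amount (names assumed):
```
// (srl (shl X, C), C) -> X when the shl is 'nuw', i.e. no set bits were
// shifted out, so the right shift just undoes the left shift.
if (N0.getOpcode() == ISD::SHL && N0.getOperand(1) == N1 &&
    N0->getFlags().hasNoUnsignedWrap())
  return N0.getOperand(0);
```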
|
|
(#155063)
The original version of this change inadvertently dropped
b6e19b35cd87f3167a0f04a61a12016b935ab1ea. This version retains that fix,
adds tests for it, and explains why it is needed.
|
|
|