path: root/llvm/lib/CodeGen/SelectionDAG
Age | Commit message | Author | Files | Lines
7 hours | [DAGCombiner] Remove most `NoSignedZerosFPMath` uses (#161180) | paperchalice | 1 | -9/+4
The two remaining uses are related to fneg and foldFPToIntToFP; some AMDGPU tests were duplicated and regenerated.
23 hours | [DAGCombiner] Remove NoSignedZerosFPMath uses in visitFSUB (#160974) | paperchalice | 1 | -6/+3
Remove the NoSignedZerosFPMath uses in the visitFSUB part; we should always use instruction-level fast-math flags.
28 hours | [TargetLowering] Remove NoSignedZerosFPMath uses (#160975) | paperchalice | 1 | -7/+5
Remove the NoSignedZerosFPMath uses in the TargetLowering part; users should always use instruction-level fast-math flags.
2 days | [SDAG] Constant fold frexp in a signed way (#161015) | Hongyu Chen | 1 | -2/+2
Fixes #160981. The exponent of a floating-point number is signed; this patch prevents treating it as unsigned.
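As a rough illustration of the point above, here is a hedged sketch (not the actual patch; the helper name and surrounding context are assumptions) of materializing the frexp exponent as a signed constant:

```cpp
#include "llvm/ADT/APFloat.h"
#include "llvm/CodeGen/SelectionDAG.h"
using namespace llvm;

// Sketch only: frexp's integer result is negative for nonzero inputs with
// magnitude below 0.5, so it must not be treated as an unsigned value.
static SDValue foldFrexpExponent(SelectionDAG &DAG, const SDLoc &DL,
                                 const ConstantFPSDNode *C, EVT ExpVT) {
  int Exp = 0;
  (void)frexp(C->getValueAPF(), Exp, APFloat::rmNearestTiesToEven);
  // getSignedConstant sign-extends a negative Exp into ExpVT, unlike a plain
  // unsigned getConstant of the raw bit pattern.
  return DAG.getSignedConstant(Exp, DL, ExpVT);
}
```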
4 days | [SelectionDAG] Improve v2f16 maximumnum expansion (#160723) | Lewis Crawford | 1 | -1/+3
On targets where f32 maximumnum is legal, but maximumnum on vectors of smaller types is not legal (e.g. v2f16), try unrolling the vector first as part of the expansion. Only fall back to expanding the full maximumnum computation into compares + selects if maximumnum on the scalar element type cannot be supported.
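A minimal sketch of that preference, assuming the real code lives in vector-op legalization, also covers FMINIMUMNUM, and accounts for promoted element types; the helper name is invented:

```cpp
#include "llvm/CodeGen/SelectionDAG.h"
#include "llvm/CodeGen/TargetLowering.h"
using namespace llvm;

// Sketch only: unroll a vector FMAXIMUMNUM into scalar ops when the scalar
// element type can be supported in some way; otherwise signal the caller to
// do the full compare+select expansion.
static SDValue expandVectorMaximumNum(SDNode *N, SelectionDAG &DAG,
                                      const TargetLowering &TLI) {
  EVT EltVT = N->getValueType(0).getVectorElementType();
  if (TLI.getOperationAction(ISD::FMAXIMUMNUM, EltVT) != TargetLowering::Expand)
    return DAG.UnrollVectorOp(N); // per-element maximumnum nodes
  return SDValue();               // fall back to compare + select expansion
}
```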
4 days | [DAGCombiner] Remove `NoSignedZerosFPMath` uses in `visitFADD` (#160635) | paperchalice | 1 | -7/+5
Remove these global flags and use node level flags instead.
5 days | [TargetLowering][ExpandABD] Prefer selects over usubo if we do the same for ucmp (#159889) | AZero13 | 1 | -6/+8
Same criterion as we use for choosing between ucmp and scmp: on platforms that prefer selects, using selects is better than using usubo. Rename the function to be more general, fitting this new description.
8 days | [DAG] Add ISD::VECTOR_COMPRESS handling in computeKnownBits/ComputeNumSignBits (#159692) | Kavin Gnanapandithan | 1 | -0/+22
Resolves #158332
8 days | [DAG] Fold rem(rem(A, BCst), Op1Cst) -> rem(A, Op1Cst) (#159517) | kper | 1 | -0/+18
Fixes [157370](https://github.com/llvm/llvm-project/issues/157370). UREM general proof: https://alive2.llvm.org/ce/z/b_GQJX; SREM general proof: https://alive2.llvm.org/ce/z/Whkaxh. Tests are added for rv32i and rv64i because those are the only architectures where I could verify that the fold fires.
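To illustrate, a minimal DAGCombiner-style sketch of the unsigned case (the helper name is invented, and the divisibility precondition is an assumption based on the alive2 proofs rather than a quote of the patch):

```cpp
#include "llvm/CodeGen/SelectionDAG.h"
#include "llvm/CodeGen/SelectionDAGNodes.h"
using namespace llvm;

// (A urem C1) urem C2 --> A urem C2, assuming C1 is a multiple of C2, so the
// inner remainder cannot change the result modulo C2.
static SDValue foldNestedURem(SDNode *N, SelectionDAG &DAG) {
  if (N->getOpcode() != ISD::UREM)
    return SDValue();
  SDValue Inner = N->getOperand(0);
  if (Inner.getOpcode() != ISD::UREM)
    return SDValue();
  ConstantSDNode *C1 = isConstOrConstSplat(Inner.getOperand(1));
  ConstantSDNode *C2 = isConstOrConstSplat(N->getOperand(1));
  if (!C1 || !C2 || C2->isZero() ||
      !C1->getAPIntValue().urem(C2->getAPIntValue()).isZero())
    return SDValue();
  return DAG.getNode(ISD::UREM, SDLoc(N), N->getValueType(0),
                     Inner.getOperand(0), N->getOperand(1));
}
```

The signed variant would need the analogous divisibility check on the signed constant values; the commit's SREM alive2 link covers that case.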
9 days | [DAG] Skip `mstore` combine for `<1 x ty>` vectors (#159915) | Abhishek Kaushik | 1 | -0/+6
Fixes #159912
11 days | [KnownBits] Add setAllConflict to set all bits in Zero and One. NFC (#159815) | Craig Topper | 2 | -16/+10
This is a common pattern for initializing KnownBits that occurs before loops that call intersectWith.
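For illustration, a hedged sketch of that pattern (the function and loop are invented context; setAllConflict and its semantics are taken from the commit message):

```cpp
#include "llvm/CodeGen/SelectionDAG.h"
#include "llvm/Support/KnownBits.h"
using namespace llvm;

// Start from the "all bits set in both Zero and One" conflict state, which is
// the identity for intersectWith: the first intersection simply adopts that
// operand's known bits.
static KnownBits intersectOperandKnownBits(const SDNode *N,
                                           const SelectionDAG &DAG,
                                           unsigned Depth) {
  KnownBits Known(N->getValueType(0).getScalarSizeInBits());
  Known.setAllConflict(); // previously spelled Known.Zero/One.setAllBits()
  for (SDValue Op : N->op_values()) {
    Known = Known.intersectWith(DAG.computeKnownBits(Op, Depth + 1));
    if (Known.isUnknown())
      break; // nothing left to learn
  }
  return Known;
}
```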
11 days | [PowerPC] Use millicode call for strlen instead of lib call (#153600) | zhijian lin | 2 | -3/+29
AIX has "millicode" routines, which are functions loaded at boot time into fixed addresses in kernel memory. This allows them to be customized for the processor. The __strlen routine is a millicode implementation; we use millicode for the strlen function instead of a library call to improve performance.
11 days | [AMDGPU][SDAG] DAGCombine PTRADD -> disjoint OR (#146075) | Fabian Ritter | 1 | -0/+13
If we can't fold a PTRADD's offset into its users, lowering them to disjoint ORs is preferable: Often, a 32-bit OR instruction suffices where we'd otherwise use a pair of 32-bit additions with carry. This needs to be a DAGCombine (and not a selection rule) because its main purpose is to enable subsequent DAGCombines for bitwise operations. We don't want to just turn PTRADDs into disjoint ORs whenever that's sound because this transform loses the information that the operation implements pointer arithmetic, which AMDGPU for instance needs when folding constant offsets. For SWDEV-516125.
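A rough sketch of the core rewrite, omitting the "can't fold the offset into its users" check described above (the helper name is invented):

```cpp
#include "llvm/CodeGen/SelectionDAG.h"
using namespace llvm;

// If the base and the offset of a PTRADD share no set bits, the addition can
// be expressed as a disjoint OR, which later bitwise combines understand.
static SDValue lowerPtrAddToDisjointOr(SDNode *N, SelectionDAG &DAG) {
  SDValue Base = N->getOperand(0), Off = N->getOperand(1);
  if (!DAG.haveNoCommonBitsSet(Base, Off))
    return SDValue();
  SDNodeFlags Flags;
  Flags.setDisjoint(true);
  return DAG.getNode(ISD::OR, SDLoc(N), N->getValueType(0), Base, Off, Flags);
}
```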
11 days | [SDAG][AMDGPU] Allow opting in to OOB-generating PTRADD transforms (#146074) | Fabian Ritter | 1 | -51/+75
This PR adds a TargetLowering hook, canTransformPtrArithOutOfBounds, that targets can use to allow transformations to introduce out-of-bounds pointer arithmetic. It also moves two such transformations from the AMDGPU-specific DAG combines to the generic DAGCombiner. This is motivated by target features like AArch64's checked pointer arithmetic, CPA, which does not tolerate the introduction of out-of-bounds pointer arithmetic.
11 days | [AMDGPU][SDAG] Handle ISD::PTRADD in various special cases (#145330) | Fabian Ritter | 2 | -5/+16
There are more places in SIISelLowering.cpp and AMDGPUISelDAGToDAG.cpp that check for ISD::ADD in a pointer context, but as far as I can tell those are only relevant for 32-bit pointer arithmetic (like frame indices/scratch addresses and LDS), for which we don't enable PTRADD generation yet. For SWDEV-516125.
13 days | [SelectionDAG] Deal with POISON for INSERT_VECTOR_ELT/INSERT_SUBVECTOR (#143102) | Björn Pettersson | 3 | -19/+110
As reported in https://github.com/llvm/llvm-project/issues/141034, SelectionDAG::getNode had some unexpected behaviors when trying to create vectors with UNDEF elements. Since we treat both UNDEF and POISON as undefined (when using isUndef()) we can't just fold away INSERT_VECTOR_ELT/INSERT_SUBVECTOR based on isUndef(), as that could make the resulting vector more poisonous. The same kind of bug existed in DAGCombiner::visitINSERT_SUBVECTOR. Here are some examples:
- This fold was done even if vec[idx] was POISON: INSERT_VECTOR_ELT vec, UNDEF, idx -> vec
- This fold was done even if any of vec[idx..idx+size] was POISON: INSERT_SUBVECTOR vec, UNDEF, idx -> vec
- This fold was done even if the elements not extracted from vec could be POISON: sub = EXTRACT_SUBVECTOR vec, idx; INSERT_SUBVECTOR UNDEF, sub, idx -> vec
With this patch we avoid such folds unless we can prove that the result isn't more poisonous when eliminating the insert. Fixes https://github.com/llvm/llvm-project/issues/141034
13 days | [DAG] getNode() - reuse result type instead of calling getValueType again. NFC. (#159381) | Simon Pilgrim | 1 | -2/+2
We have assertions above confirming VT == N1.getValueType() for INSERT_VECTOR_ELT nodes.
13 days | [IR] NFC: Remove 'experimental' from partial.reduce.add intrinsic (#158637) | Sander de Smalen | 1 | -1/+1
The partial reduction intrinsics are no longer experimental, because they've been used in production for a while and are unlikely to change.
14 days | [SelectionDAGBuilder][PPC] Use getShiftAmountConstant. (#158400) | Craig Topper | 1 | -16/+11
The PowerPC changes are caused by shifts created by different IR operations being CSEd now. This allows consecutive loads to be turned into vectors earlier. This has effects on the ordering of other combines and legalizations. This leads to some improvements and some regressions.
14 days | [DAGCombiner] add fold (xor (smin(x, C), C)) and fold (xor (smax(x, C), C)) (#155141) | guan jian | 1 | -0/+49
I compared the output of GCC and Clang for the following pattern, and there is a small difference between the two. The LLVM IR is:
```
define i64 @test_smin_neg_one(i64 %a) {
  %1 = tail call i64 @llvm.smin.i64(i64 %a, i64 -1)
  %retval.0 = xor i64 %1, -1
  ret i64 %retval.0
}
```
GCC generates:
```
cmp x0, 0
csinv x0, xzr, x0, ge
ret
```
Clang generates:
```
cmn x0, #1
csinv x8, x0, xzr, lt
mvn x0, x8
ret
```
Clang keeps flipping x0 through x8 unnecessarily, so I added the following folds to DAGCombiner:
fold (xor (smax(x, C), C)) -> select (x > C), xor(x, C), 0
fold (xor (smin(x, C), C)) -> select (x < C), xor(x, C), 0
alive2: https://alive2.llvm.org/ce/z/gffoir

Co-authored-by: Yui5427 <785369607@qq.com>
Co-authored-by: Matt Arsenault <arsenm2@gmail.com>
Co-authored-by: Simon Pilgrim <llvm-dev@redking.me.uk>
14 days | [NFC] Add a helper function isTailCall for getting libcall in SelectionDAG (#155256) | zhijian lin | 1 | -12/+16
Based on a comment in https://github.com/llvm/llvm-project/pull/153600#discussion_r2285729269, add a helper function isTailCall for getting a libcall in SelectionDAG.
2025-09-15 | [LegalizeTypes] Use correct type for constant in PromoteIntRes_FunnelShift. | Craig Topper | 1 | -1/+1
This fixes a typo from #158553: we should use AmtVT instead of VT. VT and AmtVT are presumably always the same at this point for the tested targets.
2025-09-15 | [LegalizeTypes] Use getShiftAmountConstant in PromoteIntRes_FunnelShift. (#158553) | Craig Topper | 1 | -4/+5
2025-09-12 | [SelectionDAG] Use getShiftAmountConstant. (#158395) | Craig Topper | 4 | -72/+46
Many of the shifts in LegalizeIntegerTypes.cpp were using getPointerTy.
2025-09-12 | [LegalizeTypes][X86] Use getShiftAmountConstant in ExpandIntRes_SIGN_EXTEND. (#158388) | Craig Topper | 1 | -5/+4
This ensures we don't need to fix up the shift amount later. Unfortunately, it enabled the (SRA (SHL X, ShlConst), SraConst) -> (SRA (sext_in_reg X), SraConst - ShlConst) combine in combineShiftRightArithmetic for some cases in is_fpclass-fp80.ll, so we also need to update checkSignTestSetCCCombine to look through sign_extend_inreg to prevent a regression.
2025-09-12 | [LegalizeTypes] Use getShiftAmountConstant in SplitInteger. (#158392) | Craig Topper | 1 | -8/+3
This function contained old code for handling the case where the type returned by getScalarShiftAmountTy can't hold the shift amount. These days this is handled by getShiftAmountTy, which is used by getShiftAmountConstant.
2025-09-12 | [LegalizeIntegerTypes] Use getShiftAmountConstant. | Craig Topper | 1 | -3/+3
2025-09-12 | CodeGen: Remove MachineFunction argument from getRegClass (#158188) | Matt Arsenault | 3 | -7/+6
This is a low level utility to parse the MCInstrInfo and should not depend on the state of the function.
2025-09-10 | Revert "[DAGCombiner] Relax condition for extract_vector_elt combine" (#157953) | Arthur Eubanks | 1 | -1/+2
Reverts llvm/llvm-project#157658, which causes hangs; see https://github.com/llvm/llvm-project/pull/157658#issuecomment-3276441812
2025-09-10 | [DAGCombiner] Relax condition for extract_vector_elt combine (#157658) | ZhaoQi | 1 | -2/+1
Checking `isOperationLegalOrCustom` instead of `isOperationLegal` allows more optimization opportunities. In particular, if a target wants to mark `extract_vector_elt` as `Custom` rather than `Legal` in order to optimize certain cases, this combiner would otherwise miss some improvements. Previously, using `isOperationLegalOrCustom` was avoided due to the risk of getting stuck in infinite loops (as noted in https://github.com/llvm/llvm-project/commit/61ec738b60a4fb47ec9b7195de55f1ecb5cbdb45). After testing, the issue no longer reproduces, but the coverage is limited to the regression/unit tests and the test-suite.
2025-09-08 | [DAG] Generalize fold (not (neg x)) -> (add X, -1) (#154348) | guan jian | 1 | -7/+10
Generalize `fold (not (neg x)) -> (add X, -1)` to `fold (not (sub Y, X)) -> (add X, ~Y)`.

Co-authored-by: Yui5427 <785369607@qq.com>
Co-authored-by: Simon Pilgrim <llvm-dev@redking.me.uk>
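As a quick sanity check of the identity: ~(Y - X) = -(Y - X) - 1 = X + (-Y - 1) = X + ~Y. A minimal sketch of the rewrite side, assuming the caller has already matched an xor with an all-ones operand (helper name invented):

```cpp
#include "llvm/CodeGen/SelectionDAG.h"
using namespace llvm;

// (xor (sub Y, X), -1) --> (add X, (xor Y, -1)); getNOT builds the
// xor-with-all-ones node for ~Y.
static SDValue foldNotOfSub(SDValue Sub, SelectionDAG &DAG, const SDLoc &DL) {
  if (Sub.getOpcode() != ISD::SUB)
    return SDValue();
  EVT VT = Sub.getValueType();
  SDValue NotY = DAG.getNOT(DL, Sub.getOperand(0), VT);
  return DAG.getNode(ISD::ADD, DL, VT, Sub.getOperand(1), NotY);
}
```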
2025-09-06 | [SelectionDAG][ARM] Propagate fast math flags in visitBRCOND (#156647) | paperchalice | 1 | -4/+4
Factor out from #151275.
2025-09-05 | [SelectionDAG] Clean up SCALAR_TO_VECTOR handling in SimplifyDemandedVectorElts (#157027) | Björn Pettersson | 1 | -21/+0
This patch reverts the changes from commit 585e65d3307f5f0 (https://reviews.llvm.org/D104250), as they no longer seem to be needed. The removed code made a recursive call to SimplifyDemandedVectorElts, trying to simplify the vector %vec when finding patterns like (SCALAR_TO_VECTOR (EXTRACT_VECTOR_ELT %vec, 0)). However, (EXTRACT_VECTOR_ELT %vec, 0) is already simplified based on only element zero being demanded, regardless of whether it is used in a SCALAR_TO_VECTOR operation. It would have been different if the code had tried to simplify the whole expression as %vec, which could also have motivated treating element zero as a special case, but it only simplified %vec without folding away the SCALAR_TO_VECTOR.
2025-09-05 | [DAG] SelectionDAG::canCreateUndefOrPoison - AVGFLOOR/AVGCEIL don't create undef/poison (#157056) | Simon Pilgrim | 1 | -0/+4
AVGFLOORS: https://alive2.llvm.org/ce/z/6TdoQ_
AVGFLOORU: https://alive2.llvm.org/ce/z/4pfi4i
AVGCEILS: https://alive2.llvm.org/ce/z/nWu8WM
AVGCEILU: https://alive2.llvm.org/ce/z/CGvWiA
Fixes #147696
2025-09-04 | [AArch64][SME] Resume streaming-mode on entry to exception handlers (#156638) | Benjamin Maxwell | 1 | -1/+9
This patch adds a new `TargetLowering` hook `lowerEHPadEntry()` that is called at the start of lowering EH pads in SelectionDAG. This allows the insertion of target-specific actions on entry to exception handlers. This is used on AArch64 to insert SME streaming-mode switches at landing pads. This is needed as exception handlers are always entered with PSTATE.SM off, and the function needs to resume the streaming mode of the function body.
2025-09-04 | [AMDGPU] Tail call support for whole wave functions (#145860) | Diana Picus | 2 | -21/+37
Support tail calls to whole wave functions (trivial) and from whole wave functions (slightly more involved, because we need a new pseudo for the tail-call return that patches up the EXEC mask). Move the expansion of whole wave function return pseudos (regular and tail-call returns) to prologue/epilogue insertion, since that's where we patch up the EXEC mask.
2025-09-04 | [DAGCombine] Propagate nuw when evaluating sub with narrower types (#156710) | Yingwei Zheng | 1 | -1/+9
Proof: https://alive2.llvm.org/ce/z/cdbzSL Closes https://github.com/llvm/llvm-project/issues/156559.
2025-09-02 | [Intrinsics][AArch64] Add intrinsics for masking off aliasing vector lanes (#117007) | Sam Tebbs | 6 | -0/+137
It can be unsafe to load a vector from an address and write a vector to an address if those two addresses have overlapping lanes within a vectorised loop iteration. This PR adds intrinsics designed to create a mask with lanes disabled if they overlap between the two pointer arguments, so that only safe lanes are loaded, operated on and stored. The `loop.dependence.war.mask` intrinsic represents cases where the store occurs after the load, and the opposite for `loop.dependence.raw.mask`. The distinction between write-after-read and read-after-write is important, since the ordering of the read and write operations affects if the chain of those instructions can be done safely. Along with the two pointer parameters, the intrinsics also take an immediate that represents the size in bytes of the vector element types. This will be used by #100579.
2025-08-31 | [SelectionDAG] Return std::optional<unsigned> from getValidShiftAmount and friends. NFC (#156224) | Craig Topper | 2 | -27/+27
Instead of std::optional<uint64_t>. Shift amounts must be less than or equal to our maximum supported bit widths, which fit in unsigned. Most of the callers already assumed the value fit in unsigned.
2025-08-31 | [TargetLowering] Only freeze LHS and RHS if they are used multiple times in expandABD (#156193) | AZero13 | 1 | -10/+13
Not all paths in expandABD use LHS and RHS twice.
2025-08-30 | [SelectionDAG] Add computeKnownBits for ISD::ROTL/ROTR. (#156142) | Craig Topper | 1 | -0/+16
2025-08-29 | [llvm] Support building with c++23 (#154372) | Kyle Krüger | 1 | -0/+2
Closes #154331. This PR contains the minimum changes needed to compile LLVM and MLIR with the C++23 standard. It is a work in progress and is to be reviewed for better ways of handling the parts of the build that C++23 breaks.
2025-08-28 | [ValueTracking][SelectionDAG] Use KnownBits::reverseBits/byteSwap. NFC (#155847) | Craig Topper | 1 | -4/+2
2025-08-28 | [KnownBits] Add operator<<=(unsigned) and operator>>=(unsigned). NFC (#155751) | Craig Topper | 2 | -18/+9
Add operators to shift left or right and insert unknown bits.
2025-08-27 | [CodeGen][RISCV] Add support for RISCV nontemporal to vector predication instructions. (#153033) | daniel-trujillo-bsc | 2 | -16/+32
This PR makes VP intrinsics aware of nontemporal metadata.
2025-08-27 | [DAGCombiner] Avoid double deletion when replacing multiple frozen/unfrozen uses (#155427) | Yingwei Zheng | 1 | -0/+2
Closes https://github.com/llvm/llvm-project/issues/155345. In the original case, we have one frozen use and two unfrozen uses:
```
t73: i8 = select t81, Constant:i8<0>, t18
t75: i8 = select t10, t18, t73
t59: i8 = freeze t18 (combining)
t80: i8 = freeze t59 (another user of t59)
```
In `DAGCombiner::visitFREEZE`, we replace all uses of `t18` with `t59`. After updating the uses, `t59: i8 = freeze t18` will be updated to `t59: i8 = freeze t59` (`AddModifiedNodeToCSEMaps`) and CSEed into `t80: i8 = freeze t59` (`ReplaceAllUsesWith`). As the previous call to `AddModifiedNodeToCSEMaps` already removed `t59` from the CSE map, `ReplaceAllUsesWith` cannot remove `t59` again. For clarity, see the following call graph:
```
ReplaceAllUsesOfValueWith(t18, t59)
ReplaceAllUsesWith(t18, t59)
RemoveNodeFromCSEMaps(t73)
update t73
AddModifiedNodeToCSEMaps(t73)
RemoveNodeFromCSEMaps(t75)
update t75
AddModifiedNodeToCSEMaps(t75)
RemoveNodeFromCSEMaps(t59) <- first deletion
update t59
AddModifiedNodeToCSEMaps(t59)
ReplaceAllUsesWith(t59, t80)
RemoveNodeFromCSEMaps(t59) <- second deletion Boom!
```
This patch unfreezes all the uses first to avoid triggering CSE when introducing cycles.
2025-08-26 | [DAG] ComputeNumSignBits - ISD::EXTRACT_ELEMENT needs to return at least 1 (#155455) | Miguel Saldivar | 1 | -1/+1
When going through the ISD::EXTRACT_ELEMENT case, `KnownSign - rIndex * BitWidth` could produce a negative value. When that happens, the lower bound of the `std::clamp` is returned. Change that lower bound to one to avoid potential underflow, because `ComputeNumSignBits` is expected to always return at least 1. Fixes #155452.
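A hedged sketch of the clamp in question, using the names quoted above (the surrounding EXTRACT_ELEMENT handling is paraphrased, not copied from the patch):

```cpp
#include <algorithm>
#include <cstdint>

// Clamp to a lower bound of 1, not 0: ComputeNumSignBits must always report
// at least one sign bit, and KnownSign - rIndex * BitWidth can go negative
// when extracting the high half.
static unsigned numSignBitsForExtractElement(int64_t KnownSign, int64_t rIndex,
                                             int64_t BitWidth) {
  return static_cast<unsigned>(
      std::clamp<int64_t>(KnownSign - rIndex * BitWidth, 1, BitWidth));
}
```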
2025-08-25 | [DAGCombiner] Preserve nuw when converting mul to shl. Use nuw in srl+shl combine. (#155043) | Craig Topper | 1 | -33/+41
If the srl+shl have the same shift amount and the shl has the nuw flag, we can remove both. In the affected test, the InterleavedAccess pass will emit a udiv after the `mul nuw`. We expect them to combine away. The remaining shifts on the RV64 tests are because we didn't add the zeroext attribute to the incoming evl operand.
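A compact sketch of the srl+shl part (helper name invented): with nuw on the shl, no set bits are shifted out, so shifting back right by the same amount restores the original value.

```cpp
#include "llvm/CodeGen/SelectionDAG.h"
using namespace llvm;

// (srl (shl nuw X, C), C) --> X
static SDValue foldSrlOfShlNuw(SDNode *N) {
  if (N->getOpcode() != ISD::SRL)
    return SDValue();
  SDValue Shl = N->getOperand(0);
  if (Shl.getOpcode() != ISD::SHL || !Shl->getFlags().hasNoUnsignedWrap() ||
      Shl.getOperand(1) != N->getOperand(1))
    return SDValue();
  return Shl.getOperand(0);
}
```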
2025-08-25 | Reland "[NVPTX] Legalize aext-load to zext-load to expose more DAG combines" (#155063) | Alex MacLean | 1 | -1/+1
The original version of this change inadvertently dropped b6e19b35cd87f3167a0f04a61a12016b935ab1ea. This version retains that fix, adds tests for it, and explains why it is needed.
2025-08-25 | DAG: Avoid comparing Register to unsigned 0 (#155164) | Matt Arsenault | 1 | -1/+1