aboutsummaryrefslogtreecommitdiff
path: root/llvm/lib/CodeGen/SelectionDAG/LegalizeVectorOps.cpp
AgeCommit message (Collapse)AuthorFilesLines
2025-09-02[Intrinsics][AArch64] Add intrinsics for masking off aliasing vector lanes ↵Sam Tebbs1-0/+51
(#117007) It can be unsafe to load a vector from an address and write a vector to an address if those two addresses have overlapping lanes within a vectorised loop iteration. This PR adds intrinsics designed to create a mask with lanes disabled if they overlap between the two pointer arguments, so that only safe lanes are loaded, operated on and stored. The `loop.dependence.war.mask` intrinsic represents cases where the store occurs after the load, and the opposite for `loop.dependence.raw.mask`. The distinction between write-after-read and read-after-write is important, since the ordering of the read and write operations affects if the chain of those instructions can be done safely. Along with the two pointer parameters, the intrinsics also take an immediate that represents the size in bytes of the vector element types. This will be used by #100579.
2025-08-15[CodeGen] Give ArgListEntry a proper constructor (NFC) (#153817)Nikita Popov1-9/+3
This ensures that the required fields are set, and also makes the construction more convenient.
2025-07-22[SelectionDAG] Pass SDNodeFlags through getNode instead of setFlags. (#149852)Craig Topper1-5/+4
getNode updates flags correctly for CSE. Calling setFlags after getNode may set the flags where they don't apply. I've added a Flags argument to getSelectCC and the signature of getNode that takes an ArrayRef of EVTs.
2025-06-20[LLVM][CodeGen][SVE] Add isel for bfloat unordered reductions. (#143540)Paul Walker1-9/+36
The omissions are VECREDUCE_SEQ_* and MUL. The former goes down a different code path and the latter is unsupported across all element types.
2025-06-09[SDAG] Add partial_reduce_sumla node (#141267)Philip Reames1-0/+2
We have recently added the partial_reduce_smla and partial_reduce_umla nodes to represent Acc += ext(b) * ext(b) where the two extends have to have the same source type, and have the same extend kind. For riscv64 w/zvqdotq, we have the vqdot and vqdotu instructions which correspond to the existing nodes, but we also have vqdotsu which represents the case where the two extends are sign and zero respective (i.e. not the same type of extend). This patch adds a partial_reduce_sumla node which has sign extension for A, and zero extension for B. The addition is somewhat mechanical.
2025-05-29[SDAG] Split the partial reduce legalize table by opcode [nfc] (#141970)Philip Reames1-2/+3
On it's own, this change should be non-functional. This is a preparatory change for https://github.com/llvm/llvm-project/pull/141267 which adds a new form of PARTIAL_REDUCE_*MLA. As noted in the discussion on that review, AArch64 needs a different set of legal and custom types for the PARTIAL_REDUCE_SUMLA variant than the currently existing PARTIAL_REDUCE_UMLA/SMLA.
2025-05-08[DAG/RISCV] Continue mitgrating to getInsertSubvector and getExtractSubvectorPhilip Reames1-4/+2
Follow up to 6e654caab, use the new routines in more places. Note that I've excluded from this patch any case which uses a getConstant index instead of a getVectorIdxConstant index just to minimize room for error. I'll get those in a separate follow up.
2025-04-23[AArch64][SVE] Add dot product lowering for PARTIAL_REDUCE_MLA node (#130933)Nicholas Guy1-2/+5
Add lowering in tablegen for PARTIAL_REDUCE_U/SMLA ISD nodes. Only happens when the combine has been performed on the ISD node. Also adds in check to only do the DAG combine when the node can then eventually be lowered, so changes neon tests too. --------- Co-authored-by: James Chesterman <james.chesterman@arm.com>
2025-02-28[SelectionDAG][RISCV] Promote VECREDUCE_{FMAX,FMIN,FMAXIMUM,FMINIMUM} (#128800)Jim Lin1-1/+7
This patch also adds the tests for VP_REDUCE_{FMAX,FMIN,FMAXIMUM,FMINIMUM}, which have been supported for a while.
2025-02-18[SelectionDAG] Add PARTIAL_REDUCE_U/SMLA ISD Nodes (#125207)James Chesterman1-0/+6
Add signed and unsigned PARTIAL_REDUCE_MLA ISD nodes. Add command line argument (aarch64-enable-partial-reduce-nodes) that indicates whether the intrinsic experimental_vector_partial_ reduce_add will be transformed into the new ISD node. Lowering with the new ISD nodes will, for now, always be done as an expand.
2025-02-11[RTLIB] Rename getFSINCOS() to getSINCOS (NFC) (#126705)Benjamin Maxwell1-1/+1
This makes the name more consistent with the other helpers.
2025-02-11[IR] Add llvm.sincospi intrinsic (#125873)Benjamin Maxwell1-3/+7
This adds the `llvm.sincospi` intrinsic, legalization, and lowering (mostly reusing the lowering for sincos and frexp). The `llvm.sincospi` intrinsic takes a floating-point value and returns both the sine and cosine of the value multiplied by pi. It computes the result more accurately than the naive approach of doing the multiplication ahead of time, especially for large input values. ``` declare { float, float } @llvm.sincospi.f32(float %Val) declare { double, double } @llvm.sincospi.f64(double %Val) declare { x86_fp80, x86_fp80 } @llvm.sincospi.f80(x86_fp80 %Val) declare { fp128, fp128 } @llvm.sincospi.f128(fp128 %Val) declare { ppc_fp128, ppc_fp128 } @llvm.sincospi.ppcf128(ppc_fp128 %Val) declare { <4 x float>, <4 x float> } @llvm.sincospi.v4f32(<4 x float> %Val) ``` Currently, the default lowering of this intrinsic relies on the `sincospi[f|l]` functions being available in the target's runtime (e.g. libc).
2025-02-07[IR] Add `llvm.modf` intrinsic (#121948)Benjamin Maxwell1-0/+9
This adds the `llvm.modf` intrinsic, legalization, and lowering (mostly reusing the lowering for sincos and frexp). The `llvm.modf` intrinsic takes a floating-point value and returns both the integral and fractional parts (as a struct). ``` declare { float, float } @llvm.modf.f32(float %Val) declare { double, double } @llvm.modf.f64(double %Val) declare { x86_fp80, x86_fp80 } @llvm.modf.f80(x86_fp80 %Val) declare { fp128, fp128 } @llvm.modf.f128(fp128 %Val) declare { ppc_fp128, ppc_fp128 } @llvm.modf.ppcf128(ppc_fp128 %Val) declare { <4 x float>, <4 x float> } @llvm.modf.v4f32(<4 x float> %Val) ``` This corresponds to the libm `modf` function but returns multiple values in a struct (rather than take output pointers), which makes it easier to vectorize.
2025-01-20[SDAG] Add an ISD node to help lower vector.extract.last.active (#118810)Graham Hunter1-0/+4
Based on feedback from the clastb codegen PR, I'm refactoring basic codegen for the vector.extract.last.active intrinsic to lower to an ISD node in SelectionDAGBuilder then expand in LegalizeVectorOps, instead of doing everything in the builder. The new ISD node (vector_find_last_active) only covers finding the index of the last active element of the mask, and extracting the element + handling passthru is left to existing ISD nodes.
2025-01-13[LegalizeVectorOps][RISCV] Use VP_FP_EXTEND/ROUND when promoting VP_FP* ↵Craig Topper1-3/+20
operations. (#122784) This preserves the original VL leading to more reuse of VL for vsetvli. The VLOptimizer can also clean up a lot of this, but I'm not sure if it gets all of it. There are some regressions in here from propagating the mask too, but I'm not sure if that's a concern.
2025-01-08[X86] Combine `uitofp <v x i32> to <v x half>` (#121809)abhishek-kaushik221-0/+25
Closes #121793
2025-01-06[DAG] VectorLegalizer::ExpandUINT_TO_FLOAT- pull out repeated getValueType ↵Simon Pilgrim1-23/+19
calls. NFC.
2025-01-06[X86] Support lowering of FMINIMUMNUM/FMAXIMUMNUM (#121464)Phoebe Wang1-0/+6
2025-01-03[LegalizeVectorOps] Use getBoolConstant instead of getAllOnesConstant in ↵Craig Topper1-1/+3
VectorLegalizer::UnrollVSETCC. (#121526) This code should follow the target preference for boolean contents of a vector type. We shouldn't assume that true is negative one.
2024-11-06[SDAG] Merge multiple-result libcall expansion into ↵Benjamin Maxwell1-3/+5
DAG.expandMultipleResultFPLibCall() (#114792) This merges the logic for expanding both FFREXP and FSINCOS into one method `DAG.expandMultipleResultFPLibCall()`. This reduces duplication and also allows FFREXP to benefit from the stack slot elimination implemented for FSINCOS. This method will also be used in future to implement more multiple-result intrinsics (such as modf and sincospi).
2024-10-31[SDAG] Support expanding `FSINCOS` to vector library calls (#114039)Benjamin Maxwell1-0/+5
This shares most of its code with the scalar sincos expansion. It allows expanding vector FSINCOS nodes to a library call from the specified `-vector-library`. The upside of this is it will mean the vectorizer only needs to handle the sincos intrinsic, which has no memory effects, and this can handle lowering the intrinsic to a call that takes output pointers.
2024-10-31[SDAG] Simplify `SDNodeFlags` with bitwise logic (#114061)Yingwei Zheng1-9/+3
This patch allows using enumeration values directly and simplifies the implementation with bitwise logic. It addresses the comment in https://github.com/llvm/llvm-project/pull/113808#discussion_r1819923625.
2024-10-29[IR] Add `llvm.sincos` intrinsic (#109825)Benjamin Maxwell1-0/+1
This adds the `llvm.sincos` intrinsic, legalization, and lowering. The `llvm.sincos` intrinsic takes a floating-point value and returns both the sine and cosine (as a struct). ``` declare { float, float } @llvm.sincos.f32(float %Val) declare { double, double } @llvm.sincos.f64(double %Val) declare { x86_fp80, x86_fp80 } @llvm.sincos.f80(x86_fp80 %Val) declare { fp128, fp128 } @llvm.sincos.f128(fp128 %Val) declare { ppc_fp128, ppc_fp128 } @llvm.sincos.ppcf128(ppc_fp128 %Val) declare { <4 x float>, <4 x float> } @llvm.sincos.v4f32(<4 x float> %Val) ``` The lowering is built on top of the existing FSINCOS ISD node, with additional type legalization to allow for f16, f128, and vector values.
2024-10-16[X86][CodeGen] Add base atan2 intrinsic lowering (p4) (#110760)Tex Riddell1-0/+1
This change is part of this proposal: https://discourse.llvm.org/t/rfc-all-the-math-intrinsics/78294 Based on example PR #96222 and fix PR #101268, with some differences due to 2-arg intrinsic and intermediate refactor (RuntimeLibCalls.cpp). - Add llvm.experimental.constrained.atan2 - Intrinsics.td, ConstrainedOps.def, LangRef.rst - Add to ISDOpcodes.h and TargetSelectionDAG.td, connect to intrinsic in BasicTTIImpl.h, and LibFunc_ in SelectionDAGBuilder.cpp - Update LegalizeDAG.cpp, LegalizeFloatTypes.cpp, LegalizeVectorOps.cpp, and LegalizeVectorTypes.cpp - Update isKnownNeverNaN in SelectionDAG.cpp - Update SelectionDAGDumper.cpp - Update libcalls - RuntimeLibcalls.def, RuntimeLibcalls.cpp - TargetLoweringBase.cpp - Expand for vectors, promote f16 - X86ISelLowering.cpp - Expand f80, promote f32 to f64 for MSVC Part 4 for Implement the atan2 HLSL Function #70096.
2024-10-07[LLVM][CodeGen] Add lowering for scalable vector bfloat operations. (#109803)Paul Walker1-0/+23
Specifically: fabs, fadd, fceil, fdiv, ffloor, fma, fmax, fmaxnm, fmin, fminnm, fmul, fnearbyint, fneg, frint, fround, froundeven, fsub, fsqrt & ftrunc
2024-09-30[LegalizeVectorOps] Enable ExpandFABS/COPYSIGN to use integer ops for fixed ↵Craig Topper1-6/+17
vectors in some cases. (#109232) Copy the same FSUB check from ExpandFNEG to avoid breaking AArch64 and ARM.
2024-09-18[LegalizeVectorOps][RISCV] Don't scalarize FNEG in ExpandFNEG if FSUB is ↵Craig Topper1-1/+1
marked Promote. We have a special check that tries to determine if vector FP operations are supported for the type to determine whether to scalarize or not. If FP arithmetic would be promoted, don't unroll. This improves Zvfhmin codegen on RISC-V.
2024-09-17Revert "[LegalizeVectorOps] Make the AArch64 hack in ExpandFNEG more specific."Craig Topper1-7/+3
This reverts commit 884ff9e3f9741ac282b6cf8087b8d3f62b8e138a. Regression was reported in Halide for arm32.
2024-09-17[LegalizeVectorOps] Remove calls to DAG.UnrollVectorsOps from some expansion ↵Craig Topper1-45/+56
handlers. NFC (#108930) Instead, return SDValue() to tell the caller to do the unrolling. This is consistent with how some other handler work. Especially the handlers that live in TLI. ExpandBITREVERSE was rewritten to not take the Results vector an argument.
2024-09-16[LegalizeVectorOps] Make the AArch64 hack in ExpandFNEG more specific.Craig Topper1-3/+7
Only scalarize single element vectors when vector FSUB is not supported and scalar FNEG is supported.
2024-09-03[LegalizeDAG][RISCV] Don't promote f16 vector ISD::FNEG/FABS/FCOPYSIGN to ↵Craig Topper1-1/+62
f32 when we don't have Zvfh. (#106652) The fp_extend will canonicalize NaNs which is not the semantics of FNEG/FABS/FCOPYSIGN. For fixed vectors I'm scalarizing due to test changes on other targets where the scalarization is expected. I will try to address in a follow up. For scalable vectors, we bitcast to integer and use integer logic ops.
2024-09-02[LegalizeVectorOps] Defer UnrollVectorOp in ExpandFNEG to caller. (#106783)Craig Topper1-12/+15
Make ExpandFNEG return SDValue() when it doesn't expand. The caller already knows how to Unroll when Results is empty.
2024-09-01[SDAG] Expand vector [u|s]cmp in VectorLegalizer (#106883)Yingwei Zheng1-0/+4
Address comment https://github.com/llvm/llvm-project/pull/106747#issuecomment-2322922855.
2024-08-30[LegalizeVectorOps][RISCV] Don't promote VP_FABS/FNEG/FCOPYSIGN. (#106659)Craig Topper1-0/+100
Promoting canonicalizes NaNs which changes the semantics. Bitcast to integer and use logic ops instead.
2024-08-29[LegalizeVectorOps][PowerPC] Use xor to expand fneg. (#106595)Craig Topper1-5/+11
This preserves the semantis of fneg and matches what we do in LegalizeDAG. I kept the legal FSUB check to force unrolling for some targets that don't have FSUB but have XOR. On Aarch64, using xor broke some tests that expected to see a (v1f64 (fma (insertvector_elt (f64 (fneg (extractvectorelt X)))))) pattern.
2024-08-21Scalarize the vector inputs to llvm.lround intrinsic by default. (#101054)Sumanth Gundapaneni1-0/+2
Verifier is updated in a different patch to let the vector types for llvm.lround and llvm.llround intrinsics.
2024-07-17[LLVM] Add `llvm.experimental.vector.compress` intrinsic (#92289)Lawrence Benson1-0/+4
This PR adds a new vector intrinsic `@llvm.experimental.vector.compress` to "compress" data within a vector based on a selection mask, i.e., it moves all selected values (i.e., where `mask[i] == 1`) to consecutive lanes in the result vector. A `passthru` vector can be provided, from which remaining lanes are filled. The main reason for this is that the existing `@llvm.masked.compressstore` has very strong constraints in that it can only write values that were selected, resulting in guard branches for all targets except AVX-512 (and even there the AMD implementation is _very_ slow). More instruction sets support "compress" logic, but only within registers. So to store the values, an additional store is needed. But this combination is likely significantly faster on many target as it avoids branches. In follow up PRs, my plan is to add target-specific lowerings for x86, SVE, and possibly RISCV. I also want to combine this with a store instruction, as this is probably a common case and we can avoid some memory writes in that case. See [discussion in forum](https://discourse.llvm.org/t/new-intrinsic-for-masked-vector-compress-without-store/78663) for initial discussion on the design.
2024-07-11[X86][CodeGen] Add base trig intrinsic lowerings (#96222)Farzon Lotfi1-0/+6
This change is an implementation of https://github.com/llvm/llvm-project/issues/87367's investigation on supporting IEEE math operations as intrinsics. Which was discussed in this RFC: https://discourse.llvm.org/t/rfc-all-the-math-intrinsics/78294 This change adds constraint intrinsics and some lowering cases for `acos`, `asin`, `atan`, `cosh`, `sinh`, and `tanh`. The only x86 specific change was for f80. https://github.com/llvm/llvm-project/issues/70079 https://github.com/llvm/llvm-project/issues/70080 https://github.com/llvm/llvm-project/issues/70081 https://github.com/llvm/llvm-project/issues/70083 https://github.com/llvm/llvm-project/issues/70084 https://github.com/llvm/llvm-project/issues/95966 The x86 lowering is going to be done in three pr changes with this being the first. A second PR will be put up for Loop Vectorizing and then SLPVectorizer. The constraint intrinsics is also going to be in multiple parts, but just 2. This part covers just the llvm specific changes, part2 will cover clang specifc changes and legalization for backends than have special legalization requirements like aarch64 and wasm.
2024-06-21Revert "Intrinsic: introduce minimumnum and maximumnum (#93841)"Nikita Popov1-10/+0
As far as I can tell, this pull request was not approved, and did not go through an RFC on discourse. This reverts commit 89881480030f48f83af668175b70a9798edca2fb. This reverts commit 225d8fc8eb24fb797154c1ef6dcbe5ba033142da.
2024-06-21Intrinsic: introduce minimumnum and maximumnum (#93841)YunQiang Su1-0/+10
Currently, on different platform, the behaivor of llvm.minnum is different if one operand is sNaN: When we compare sNaN vs NUM: ARM/AArch64/PowerPC: follow the IEEE754-2008's minNUM: return qNaN. RISC-V/Hexagon follow the IEEE754-2019's minimumNumber: return NUM. X86: Returns NUM but not same with IEEE754-2019's minimumNumber as +0.0 is not always greater than -0.0. MIPS/LoongArch/Generic: return NUM. LIBCALL: returns qNaN. So, let's introduce llvm.minmumnum/llvm.maximumnum, which always follow IEEE754-2019's minimumNumber/maximumNumber. Half-fix: #93033
2024-06-17[SelectionDAG] Add support for the 3-way comparison intrinsics [US]CMP (#91871)Poseydon421-0/+2
This PR adds initial support for the `scmp`/`ucmp` 3-way comparison intrinsics in the SelectionDAG. Some of the expansions/lowerings are not optimal yet.
2024-06-12[DAG] Add legalization handling for AVGCEIL/AVGFLOOR nodes (#92096)Simon Pilgrim1-0/+13
Always match AVG patterns pre-legalization, and use TargetLowering::expandAVG to expand again during legalization. I've removed the X86 custom AVGCEILU pattern detection and replaced with combines to try and convert other AVG nodes to AVGCEILU.
2024-06-06DAG: Improve fminimum/fmaximum vector expansion logic (#93579)Matt Arsenault1-5/+2
First, expandFMINIMUM_FMAXIMUM should be a never-fail API. The client wanted it expanded, and it can always be expanded. This logic was tied up with what the VectorLegalizer wanted. Prefer using the min/max opcodes, and unrolling if we don't have a vselect. This seems to produce better code in all the changed tests.
2024-06-05[LegalizeVectorOps] Move VP_STORE legalization from LegalizeDAG to ↵Craig Topper1-1/+2
LegalizeVectorOps. 705636a1130551ab105aec95b909a35a0305fc9f moved reductions from LegalizeVectorOps to LegalizeDAG, but the way it was done inadvertently moved stores from LegalizeVectorOps to LegalizeDAG too. This was not intended or desired. Found when this was pulled into my downstream which has other changes that make the distinction important.
2024-06-05[x86] Add tan intrinsic part 4 (#90503)Farzon Lotfi1-0/+1
This change is an implementation of #87367's investigation on supporting IEEE math operations as intrinsics. Which was discussed in this RFC: https://discourse.llvm.org/t/rfc-all-the-math-intrinsics/78294 Much of this change was following how G_FSIN and G_FCOS were used. Changes: - `llvm/docs/GlobalISel/GenericOpcode.rst` - Document the `G_FTAN` opcode - `llvm/docs/LangRef.rst` - Document the tan intrinsic - `llvm/include/llvm/Analysis/VecFuncs.def` - Associate the tan intrinsic as a vector function similar to the tanf libcall. - `llvm/include/llvm/CodeGen/BasicTTIImpl.h` - Map the tan intrinsic to `ISD::FTAN` - `llvm/include/llvm/CodeGen/ISDOpcodes.h` - Define ISD opcodes for `FTAN` and `STRICT_FTAN` - `llvm/include/llvm/IR/Intrinsics.td` - Create the tan intrinsic - `llvm/include/llvm/IR/RuntimeLibcalls.def` - Define tan libcall mappings - `llvm/include/llvm/Target/GenericOpcodes.td` - Define the `G_FTAN` Opcode - `llvm/include/llvm/Support/TargetOpcodes.def` - Create a `G_FTAN` Opcode handler - `llvm/include/llvm/Target/GlobalISel/SelectionDAGCompat.td` - Map `G_FTAN` to `ftan` - `llvm/include/llvm/Target/TargetSelectionDAG.td` - Define `ftan`, `strict_ftan`, and `any_ftan` and map them to the ISD opcodes for `FTAN` and `STRICT_FTAN` - `llvm/lib/Analysis/VectorUtils.cpp` - Associate the tan intrinsic as a vector intrinsic - `llvm/lib/CodeGen/GlobalISel/IRTranslator.cpp` Map the tan intrinsic to `G_FTAN` Opcode - `llvm/lib/CodeGen/GlobalISel/LegalizerHelper.cpp` - Add `G_FTAN` to the list of floating point math operations also associate `G_FTAN` with the `TAN_F` runtime lib. - `llvm/lib/CodeGen/GlobalISel/Utils.cpp` - More floating point math operation common behaviors. - llvm/lib/CodeGen/SelectionDAG/LegalizeDAG.cpp - List the function expansion operations for `FTAN` and `STRICT_FTAN`. Also define both opcodes in `PromoteNode`. - `llvm/lib/CodeGen/SelectionDAG/LegalizeFloatTypes.cpp` - More `FTAN` and `STRICT_FTAN` handling in the legalizer - `llvm/lib/CodeGen/SelectionDAG/LegalizeTypes.h` - Define `SoftenFloatRes_FTAN` and `ExpandFloatRes_FTAN`. - `llvm/lib/CodeGen/SelectionDAG/LegalizeVectorOps.cpp` - Define `FTAN` as a legal vector operation. - `llvm/lib/CodeGen/SelectionDAG/LegalizeVectorTypes.cpp` - Define `FTAN` as a legal vector operation. - `llvm/lib/CodeGen/SelectionDAG/SelectionDAG.cpp` - define tan as an intrinsic that doesn't return NaN. - `llvm/lib/CodeGen/SelectionDAG/SelectionDAGBuilder.cpp` Map `LibFunc_tan`, `LibFunc_tanf`, and `LibFunc_tanl` to `ISD::FTAN`. Map `Intrinsic::tan` to `ISD::FTAN` and add selection dag handling for `Intrinsic::tan`. - `llvm/lib/CodeGen/SelectionDAG/SelectionDAGDumper.cpp` - Define `ftan` and `strict_ftan` names for the equivalent ISD opcodes. - `llvm/lib/CodeGen/TargetLoweringBase.cpp` -Define a Tan128 libcall and ISD::FTAN as a target lowering action. - `llvm/lib/Target/X86/X86ISelLowering.cpp` - Add x86_64 lowering for tan intrinsic resolves https://github.com/llvm/llvm-project/issues/70082
2024-05-15[LegalizeVectorOps][X86] Add ISD::ABDS/ABSDU to the list of opcodes handled ↵Craig Topper1-0/+2
by LegalizeVectorOps. (#92332) The expand code is present, but we were missing the type query code so the nodes would be ignored until LegalizeDAG.
2024-04-29[SelectionDAG][RISCV] Move VP_REDUCE* legalization to LegalizeDAG.cpp. (#90522)Craig Topper1-68/+5
LegalizeVectorType is responsible for legalizing nodes that perform an operation on each element may need to scalarize. This is not true for nodes like VP_REDUCE.*, BUILD_VECTOR, SHUFFLE_VECTOR, EXTRACT_SUBVECTOR, etc. This patch drops any nodes with a scalar result from LegalizeVectorOps and handles them in LegalizeDAG instead. This required moving the reduction promotion to LegalizeDAG. I have removed the support integer promotion as it was incorrect for integer min/max reductions. Since it was untested, it was best to assert on it until it was really needed. There are a couple regressions that can be fixed with a small DAG combine which I will do as a follow up.
2024-04-29[Legalizer] Expand fmaximum and fminimum (#67301)Qiu Chaofan1-0/+7
According to langref, llvm.maximum/minimum has -0.0 < +0.0 semantics and propagates NaN. Expand the nodes on targets not supporting the operation, by adding extra check for NaN and using is_fpclass to check zero signs.
2024-04-24AMDGPU: Fix vector handling of fptrunc_roundMatt Arsenault1-0/+1
2024-04-16[AMDGPU] In VectorLegalizer::Expand, if UnrollVectorOp returns Load, … ↵choikwa1-2/+8
(#88475) …return only Load since other output is chain. Added testcase that showed mismatched expected arity when Load and chain were returned as separate items after 003b58f65bdd5d9c7d0c1b355566c9ef430c0e7d