|
(#117007)
It can be unsafe to load a vector from an address and write a vector to
an address if those two addresses have overlapping lanes within a
vectorised loop iteration.
This PR adds intrinsics designed to create a mask with lanes disabled if
they overlap between the two pointer arguments, so that only safe lanes
are loaded, operated on and stored. The `loop.dependence.war.mask`
intrinsic represents cases where the store occurs after the load, and
the opposite for `loop.dependence.raw.mask`. The distinction between
write-after-read and read-after-write is important, since the ordering
of the read and write operations affects whether the chain of those
instructions can be executed safely.
Along with the two pointer parameters, the intrinsics also take an
immediate that represents the size in bytes of the vector element types.
This will be used by #100579.
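For illustration, a minimal sketch of how such a mask might be combined
with the existing masked load/store intrinsics; the exact name mangling
and parameter order of the new intrinsics shown here are assumptions
based on the description above.
```
; Sketch only: the loop.dependence.war.mask signature/mangling is assumed.
declare <vscale x 4 x i1> @llvm.loop.dependence.war.mask.nxv4i1(ptr, ptr, i64)
declare <vscale x 4 x i32> @llvm.masked.load.nxv4i32.p0(ptr, i32, <vscale x 4 x i1>, <vscale x 4 x i32>)
declare void @llvm.masked.store.nxv4i32.p0(<vscale x 4 x i32>, ptr, i32, <vscale x 4 x i1>)

define void @war_safe_copy(ptr %src, ptr %dst) {
  ; Disable any lane whose store to %dst would overlap the earlier load
  ; from %src within this vectorised iteration (element size 4 bytes is
  ; passed as the immediate).
  %mask = call <vscale x 4 x i1> @llvm.loop.dependence.war.mask.nxv4i1(ptr %src, ptr %dst, i64 4)
  %v = call <vscale x 4 x i32> @llvm.masked.load.nxv4i32.p0(ptr %src, i32 4, <vscale x 4 x i1> %mask, <vscale x 4 x i32> zeroinitializer)
  call void @llvm.masked.store.nxv4i32.p0(<vscale x 4 x i32> %v, ptr %dst, i32 4, <vscale x 4 x i1> %mask)
  ret void
}
```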
|
|
This ensures that the required fields are set, and also makes the
construction more convenient.
|
|
getNode updates flags correctly for CSE. Calling setFlags after getNode
may set the flags where they don't apply.
I've added a Flags argument to getSelectCC and to the getNode overload
that takes an ArrayRef of EVTs.
|
|
The omissions are VECREDUCE_SEQ_* and MUL. The former goes down a
different code path and the latter is unsupported across all element types.
|
|
We have recently added the partial_reduce_smla and partial_reduce_umla
nodes to represent Acc += ext(a) * ext(b), where the two extends have to
have the same source type and the same extend kind.
For riscv64 w/zvqdotq, we have the vqdot and vqdotu instructions which
correspond to the existing nodes, but we also have vqdotsu, which
represents the case where the two extends are sign and zero respectively
(i.e. not the same kind of extend).
This patch adds a partial_reduce_sumla node which has sign extension for
A and zero extension for B. The addition is somewhat mechanical.
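For context, the IR pattern that would now map onto partial_reduce_sumla
looks roughly like the following (a sketch using the existing
partial-reduction intrinsic; the specific types are illustrative).
```
declare <4 x i32> @llvm.experimental.vector.partial.reduce.add.v4i32.v16i32(<4 x i32>, <16 x i32>)

define <4 x i32> @sumla_pattern(<4 x i32> %acc, <16 x i8> %a, <16 x i8> %b) {
  ; Acc += sext(a) * zext(b): the mixed-extension (vqdotsu-style) case.
  %a.ext = sext <16 x i8> %a to <16 x i32>
  %b.ext = zext <16 x i8> %b to <16 x i32>
  %mul = mul <16 x i32> %a.ext, %b.ext
  %res = call <4 x i32> @llvm.experimental.vector.partial.reduce.add.v4i32.v16i32(<4 x i32> %acc, <16 x i32> %mul)
  ret <4 x i32> %res
}
```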
|
|
On its own, this change should be non-functional. This is a preparatory
change for https://github.com/llvm/llvm-project/pull/141267 which adds a
new form of PARTIAL_REDUCE_*MLA. As noted in the discussion on that
review, AArch64 needs a different set of legal and custom types for the
PARTIAL_REDUCE_SUMLA variant than the currently existing
PARTIAL_REDUCE_UMLA/SMLA.
|
|
Follow up to 6e654caab, use the new routines in more places. Note that
I've excluded from this patch any case which uses a getConstant index
instead of a getVectorIdxConstant index just to minimize room for
error. I'll get those in a separate follow up.
|
|
Add lowering in tablegen for the PARTIAL_REDUCE_U/SMLA ISD nodes. This
only happens when the combine has been performed on the ISD node. Also
adds a check to only do the DAG combine when the node can then eventually
be lowered, which changes some NEON tests too.
---------
Co-authored-by: James Chesterman <james.chesterman@arm.com>
|
|
This patch also adds the tests for VP_REDUCE_{FMAX,FMIN,FMAXIMUM,FMINIMUM}, which have been supported for a while.
|
|
Add signed and unsigned PARTIAL_REDUCE_MLA ISD nodes. Add a command-line
argument (aarch64-enable-partial-reduce-nodes) that indicates whether the
intrinsic experimental_vector_partial_reduce_add will be transformed
into the new ISD node. Lowering with the new ISD nodes will, for now,
always be done as an expand.
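For reference, a minimal sketch of the intrinsic controlled by the new
flag (the types are chosen for illustration only).
```
declare <4 x i32> @llvm.experimental.vector.partial.reduce.add.v4i32.v16i32(<4 x i32>, <16 x i32>)

define <4 x i32> @partial_reduce(<4 x i32> %acc, <16 x i32> %in) {
  ; Adds the 16 input lanes into the 4 accumulator lanes; with
  ; aarch64-enable-partial-reduce-nodes this goes through the new
  ; PARTIAL_REDUCE_*MLA nodes (expanded for now).
  %r = call <4 x i32> @llvm.experimental.vector.partial.reduce.add.v4i32.v16i32(<4 x i32> %acc, <16 x i32> %in)
  ret <4 x i32> %r
}
```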
|
|
This makes the name more consistent with the other helpers.
|
|
This adds the `llvm.sincospi` intrinsic, legalization, and lowering
(mostly reusing the lowering for sincos and frexp).
The `llvm.sincospi` intrinsic takes a floating-point value and returns
both the sine and cosine of the value multiplied by pi. It computes the
result more accurately than the naive approach of doing the
multiplication ahead of time, especially for large input values.
```
declare { float, float } @llvm.sincospi.f32(float %Val)
declare { double, double } @llvm.sincospi.f64(double %Val)
declare { x86_fp80, x86_fp80 } @llvm.sincospi.f80(x86_fp80 %Val)
declare { fp128, fp128 } @llvm.sincospi.f128(fp128 %Val)
declare { ppc_fp128, ppc_fp128 } @llvm.sincospi.ppcf128(ppc_fp128 %Val)
declare { <4 x float>, <4 x float> } @llvm.sincospi.v4f32(<4 x float> %Val)
```
Currently, the default lowering of this intrinsic relies on the
`sincospi[f|l]` functions being available in the target's runtime (e.g.
libc).
|
|
This adds the `llvm.modf` intrinsic, legalization, and lowering (mostly
reusing the lowering for sincos and frexp).
The `llvm.modf` intrinsic takes a floating-point value and returns both
the integral and fractional parts (as a struct).
```
declare { float, float } @llvm.modf.f32(float %Val)
declare { double, double } @llvm.modf.f64(double %Val)
declare { x86_fp80, x86_fp80 } @llvm.modf.f80(x86_fp80 %Val)
declare { fp128, fp128 } @llvm.modf.f128(fp128 %Val)
declare { ppc_fp128, ppc_fp128 } @llvm.modf.ppcf128(ppc_fp128 %Val)
declare { <4 x float>, <4 x float> } @llvm.modf.v4f32(<4 x float> %Val)
```
This corresponds to the libm `modf` function but returns multiple values
in a struct (rather than taking output pointers), which makes it easier to
vectorize.
|
|
Based on feedback from the clastb codegen PR, I'm refactoring basic codegen for the vector.extract.last.active intrinsic to lower to an ISD node in SelectionDAGBuilder and then expand it in LegalizeVectorOps, instead of doing everything in the builder.
The new ISD node (vector_find_last_active) only covers finding the index of the last active element of the mask; extracting the element and handling the passthru is left to existing ISD nodes.
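Roughly, the IR being lowered looks like this (a sketch; the exact
intrinsic signature should be checked against LangRef).
```
; Returns the element of %data at the highest active lane of %mask,
; or the passthru value if no lane is active.
declare i32 @llvm.experimental.vector.extract.last.active.v4i32(<4 x i32>, <4 x i1>, i32)

define i32 @last_active(<4 x i32> %data, <4 x i1> %mask) {
  %r = call i32 @llvm.experimental.vector.extract.last.active.v4i32(<4 x i32> %data, <4 x i1> %mask, i32 0)
  ret i32 %r
}
```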
|
|
operations. (#122784)
This preserves the original VL, leading to more reuse of VL for vsetvli.
The VLOptimizer can also clean up a lot of this, but I'm not sure if it
gets all of it.
There are some regressions in here from propagating the mask too, but
I'm not sure if that's a concern.
|
|
Closes #121793
|
|
calls. NFC.
|
|
|
|
VectorLegalizer::UnrollVSETCC. (#121526)
This code should follow the target preference for boolean contents of a
vector type. We shouldn't assume that true is negative one.
|
|
DAG.expandMultipleResultFPLibCall() (#114792)
This merges the logic for expanding both FFREXP and FSINCOS into one
method `DAG.expandMultipleResultFPLibCall()`. This reduces duplication
and also allows FFREXP to benefit from the stack slot elimination
implemented for FSINCOS. This method will also be used in future to
implement more multiple-result intrinsics (such as modf and sincospi).
|
|
This shares most of its code with the scalar sincos expansion. It allows
expanding vector FSINCOS nodes to a library call from the specified
`-vector-library`. The upside of this is that the vectorizer
only needs to handle the sincos intrinsic, which has no memory effects,
and this expansion can handle lowering the intrinsic to a call that
takes output pointers.
|
|
This patch allows using enumeration values directly and simplifies the
implementation with bitwise logic. It addresses the comment in
https://github.com/llvm/llvm-project/pull/113808#discussion_r1819923625.
|
|
This adds the `llvm.sincos` intrinsic, legalization, and lowering.
The `llvm.sincos` intrinsic takes a floating-point value and returns
both the sine and cosine (as a struct).
```
declare { float, float } @llvm.sincos.f32(float %Val)
declare { double, double } @llvm.sincos.f64(double %Val)
declare { x86_fp80, x86_fp80 } @llvm.sincos.f80(x86_fp80 %Val)
declare { fp128, fp128 } @llvm.sincos.f128(fp128 %Val)
declare { ppc_fp128, ppc_fp128 } @llvm.sincos.ppcf128(ppc_fp128 %Val)
declare { <4 x float>, <4 x float> } @llvm.sincos.v4f32(<4 x float> %Val)
```
The lowering is built on top of the existing FSINCOS ISD node, with
additional type legalization to allow for f16, f128, and vector values.
|
|
This change is part of this proposal:
https://discourse.llvm.org/t/rfc-all-the-math-intrinsics/78294
Based on example PR #96222 and fix PR #101268, with some differences due
to the 2-arg intrinsic and an intermediate refactor (RuntimeLibCalls.cpp).
- Add llvm.experimental.constrained.atan2 - Intrinsics.td,
ConstrainedOps.def, LangRef.rst
- Add to ISDOpcodes.h and TargetSelectionDAG.td, connect to intrinsic in
BasicTTIImpl.h, and LibFunc_ in SelectionDAGBuilder.cpp
- Update LegalizeDAG.cpp, LegalizeFloatTypes.cpp, LegalizeVectorOps.cpp,
and LegalizeVectorTypes.cpp
- Update isKnownNeverNaN in SelectionDAG.cpp
- Update SelectionDAGDumper.cpp
- Update libcalls - RuntimeLibcalls.def, RuntimeLibcalls.cpp
- TargetLoweringBase.cpp - Expand for vectors, promote f16
- X86ISelLowering.cpp - Expand f80, promote f32 to f64 for MSVC
Part 4 of implementing the atan2 HLSL function (#70096).
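A hedged usage sketch of the new constrained intrinsic:
```
declare double @llvm.experimental.constrained.atan2.f64(double, double, metadata, metadata)

define double @my_atan2(double %y, double %x) strictfp {
  ; Rounding mode and exception behavior are the usual constrained-FP
  ; metadata operands.
  %r = call double @llvm.experimental.constrained.atan2.f64(double %y, double %x, metadata !"round.dynamic", metadata !"fpexcept.strict")
  ret double %r
}
```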
|
|
Specifically:
fabs, fadd, fceil, fdiv, ffloor, fma, fmax, fmaxnm, fmin, fminnm,
fmul, fnearbyint, fneg, frint, fround, froundeven, fsub, fsqrt &
ftrunc
|
|
vectors in some cases. (#109232)
Copy the same FSUB check from ExpandFNEG to avoid breaking AArch64 and
ARM.
|
|
marked Promote.
We have a special check that tries to determine whether vector FP
operations are supported for the type, in order to decide whether to
scalarize or not. If FP arithmetic would be promoted, don't unroll.
This improves Zvfhmin codegen on RISC-V.
|
|
This reverts commit 884ff9e3f9741ac282b6cf8087b8d3f62b8e138a.
Regression was reported in Halide for arm32.
|
|
handlers. NFC (#108930)
Instead, return SDValue() to tell the caller to do the unrolling. This
is consistent with how some other handlers work, especially the handlers
that live in TLI.
ExpandBITREVERSE was rewritten to not take the Results vector as an
argument.
|
|
Only scalarize single element vectors when vector FSUB is not
supported and scalar FNEG is supported.
|
|
f32 when we don't have Zvfh. (#106652)
The fp_extend will canonicalize NaNs, which is not the semantics of
FNEG/FABS/FCOPYSIGN.
For fixed vectors I'm scalarizing due to test changes on other targets
where the scalarization is expected. I will try to address this in a
follow-up.
For scalable vectors, we bitcast to integer and use integer logic ops.
|
|
Make ExpandFNEG return SDValue() when it doesn't expand. The caller
already knows how to Unroll when Results is empty.
|
|
Address comment
https://github.com/llvm/llvm-project/pull/106747#issuecomment-2322922855.
|
|
Promoting canonicalizes NaNs, which changes the semantics. Bitcast to
integer and use logic ops instead.
|
|
This preserves the semantics of fneg and matches what we do in
LegalizeDAG.
I kept the legal FSUB check to force unrolling for some targets that
don't have FSUB but have XOR. On AArch64, using xor broke some tests that
expected to see a (v1f64 (fma (insertvector_elt (f64 (fneg
(extractvectorelt X)))))) pattern.
|
|
The verifier is updated in a different patch to allow vector types for
the llvm.lround and llvm.llround intrinsics.
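As an illustration, the vector forms this permits look roughly like the
following (the particular type combinations are assumptions).
```
declare <4 x i64> @llvm.lround.v4i64.v4f32(<4 x float>)
declare <4 x i64> @llvm.llround.v4i64.v4f32(<4 x float>)

define <4 x i64> @round_to_long(<4 x float> %x) {
  ; Rounds each lane to the nearest integer, halfway cases away from zero.
  %r = call <4 x i64> @llvm.lround.v4i64.v4f32(<4 x float> %x)
  ret <4 x i64> %r
}
```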
|
|
This PR adds a new vector intrinsic `@llvm.experimental.vector.compress`
to "compress" data within a vector based on a selection mask, i.e., it
moves all selected values (i.e., where `mask[i] == 1`) to consecutive
lanes in the result vector. A `passthru` vector can be provided, from
which remaining lanes are filled.
The main reason for this is that the existing
`@llvm.masked.compressstore` has very strong constraints in that it can
only write values that were selected, resulting in guard branches for
all targets except AVX-512 (and even there the AMD implementation is
_very_ slow). More instruction sets support "compress" logic, but only
within registers. So to store the values, an additional store is needed.
But this combination is likely significantly faster on many targets as it
avoids branches.
In follow up PRs, my plan is to add target-specific lowerings for x86,
SVE, and possibly RISCV. I also want to combine this with a store
instruction, as this is probably a common case and we can avoid some
memory writes in that case.
See [discussion in
forum](https://discourse.llvm.org/t/new-intrinsic-for-masked-vector-compress-without-store/78663)
for initial discussion on the design.
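A minimal sketch of the intrinsic as described above (fixed-width types
chosen for illustration):
```
; Selected lanes of %vec are packed into the low lanes of the result;
; remaining lanes are taken from %passthru.
declare <4 x i32> @llvm.experimental.vector.compress.v4i32(<4 x i32>, <4 x i1>, <4 x i32>)

define <4 x i32> @compress(<4 x i32> %vec, <4 x i1> %mask, <4 x i32> %passthru) {
  %r = call <4 x i32> @llvm.experimental.vector.compress.v4i32(<4 x i32> %vec, <4 x i1> %mask, <4 x i32> %passthru)
  ret <4 x i32> %r
}
```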
|
|
This change is an implementation of
https://github.com/llvm/llvm-project/issues/87367's investigation on
supporting IEEE math operations as intrinsics, which was discussed in
this RFC:
https://discourse.llvm.org/t/rfc-all-the-math-intrinsics/78294
This change adds constrained intrinsics and some lowering cases for
`acos`, `asin`, `atan`, `cosh`, `sinh`, and `tanh`.
The only x86-specific change was for f80.
https://github.com/llvm/llvm-project/issues/70079
https://github.com/llvm/llvm-project/issues/70080
https://github.com/llvm/llvm-project/issues/70081
https://github.com/llvm/llvm-project/issues/70083
https://github.com/llvm/llvm-project/issues/70084
https://github.com/llvm/llvm-project/issues/95966
The x86 lowering is going to be done in three PRs, with this being the
first.
A second PR will be put up for the LoopVectorizer and then the
SLPVectorizer.
The constrained intrinsics are also going to be split into multiple
parts, but just two.
This part covers just the LLVM-specific changes; part 2 will cover the
clang-specific changes and legalization for backends that have special
legalization requirements, like AArch64 and Wasm.
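One of the added constrained intrinsics, shown as a hedged usage sketch
(the others follow the same shape):
```
declare float @llvm.experimental.constrained.tanh.f32(float, metadata, metadata)

define float @my_tanh(float %x) strictfp {
  %r = call float @llvm.experimental.constrained.tanh.f32(float %x, metadata !"round.dynamic", metadata !"fpexcept.strict")
  ret float %r
}
```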
|
|
As far as I can tell, this pull request was not approved, and
did not go through an RFC on discourse.
This reverts commit 89881480030f48f83af668175b70a9798edca2fb.
This reverts commit 225d8fc8eb24fb797154c1ef6dcbe5ba033142da.
|
|
Currently, on different platforms, the behavior of llvm.minnum is
different if one operand is sNaN. When we compare sNaN vs NUM:
- ARM/AArch64/PowerPC: follow IEEE 754-2008's minNum and return qNaN.
- RISC-V/Hexagon: follow IEEE 754-2019's minimumNumber and return NUM.
- X86: returns NUM, but not the same as IEEE 754-2019's minimumNumber, as
+0.0 is not always greater than -0.0.
- MIPS/LoongArch/Generic: return NUM.
- LIBCALL: returns qNaN.
So, let's introduce llvm.minimumnum/llvm.maximumnum, which always follow
IEEE 754-2019's minimumNumber/maximumNumber.
Half-fix: #93033
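A usage sketch of the new intrinsics (scalar float shown for brevity):
```
declare float @llvm.minimumnum.f32(float, float)
declare float @llvm.maximumnum.f32(float, float)

define float @clamp(float %x, float %lo, float %hi) {
  ; NaN operands are dropped in favor of the numeric operand.
  %t = call float @llvm.maximumnum.f32(float %x, float %lo)
  %r = call float @llvm.minimumnum.f32(float %t, float %hi)
  ret float %r
}
```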
|
|
This PR adds initial support for the `scmp`/`ucmp` 3-way comparison
intrinsics in the SelectionDAG. Some of the expansions/lowerings
are not optimal yet.
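A brief sketch of the intrinsics (the result and operand widths are
independent; i8/i32 are chosen here only for illustration):
```
; Return -1 if a < b, 0 if a == b, and 1 if a > b.
declare i8 @llvm.scmp.i8.i32(i32, i32)
declare i8 @llvm.ucmp.i8.i32(i32, i32)

define i8 @cmp3(i32 %a, i32 %b) {
  %r = call i8 @llvm.scmp.i8.i32(i32 %a, i32 %b)
  ret i8 %r
}
```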
|
|
Always match AVG patterns pre-legalization, and use TargetLowering::expandAVG to expand again during legalization.
I've removed the X86 custom AVGCEILU pattern detection and replaced it with combines that try to convert other AVG nodes to AVGCEILU.
|
|
First, expandFMINIMUM_FMAXIMUM should be a never-fail API. The client
wanted it expanded, and it can always be expanded. This logic was tied
up with what the VectorLegalizer wanted.
Prefer using the min/max opcodes, and unrolling if we don't have a
vselect.
This seems to produce better code in all the changed tests.
|
|
LegalizeVectorOps.
705636a1130551ab105aec95b909a35a0305fc9f moved reductions from
LegalizeVectorOps to LegalizeDAG, but the way it was done inadvertently
moved stores from LegalizeVectorOps to LegalizeDAG too. This was
not intended or desired.
Found when this was pulled into my downstream which has other changes
that make the distinction important.
|
|
This change is an implementation of #87367's investigation on supporting
IEEE math operations as intrinsics, which was discussed in this RFC:
https://discourse.llvm.org/t/rfc-all-the-math-intrinsics/78294
Much of this change was following how G_FSIN and G_FCOS were used.
Changes:
- `llvm/docs/GlobalISel/GenericOpcode.rst` - Document the `G_FTAN`
opcode
- `llvm/docs/LangRef.rst` - Document the tan intrinsic
- `llvm/include/llvm/Analysis/VecFuncs.def` - Associate the tan
intrinsic as a vector function similar to the tanf libcall.
- `llvm/include/llvm/CodeGen/BasicTTIImpl.h` - Map the tan intrinsic to
`ISD::FTAN`
- `llvm/include/llvm/CodeGen/ISDOpcodes.h` - Define ISD opcodes for
`FTAN` and `STRICT_FTAN`
- `llvm/include/llvm/IR/Intrinsics.td` - Create the tan intrinsic
- `llvm/include/llvm/IR/RuntimeLibcalls.def` - Define tan libcall
mappings
- `llvm/include/llvm/Target/GenericOpcodes.td` - Define the `G_FTAN`
Opcode
- `llvm/include/llvm/Support/TargetOpcodes.def` - Create a `G_FTAN`
Opcode handler
- `llvm/include/llvm/Target/GlobalISel/SelectionDAGCompat.td` - Map
`G_FTAN` to `ftan`
- `llvm/include/llvm/Target/TargetSelectionDAG.td` - Define `ftan`,
`strict_ftan`, and `any_ftan` and map them to the ISD opcodes for `FTAN`
and `STRICT_FTAN`
- `llvm/lib/Analysis/VectorUtils.cpp` - Associate the tan intrinsic as a
vector intrinsic
- `llvm/lib/CodeGen/GlobalISel/IRTranslator.cpp` Map the tan intrinsic
to `G_FTAN` Opcode
- `llvm/lib/CodeGen/GlobalISel/LegalizerHelper.cpp` - Add `G_FTAN` to
the list of floating-point math operations and associate `G_FTAN` with
the `TAN_F` runtime libcall.
- `llvm/lib/CodeGen/GlobalISel/Utils.cpp` - Extend the common
floating-point math operation behaviors to cover `G_FTAN`.
- `llvm/lib/CodeGen/SelectionDAG/LegalizeDAG.cpp` - List the function
expansion operations for `FTAN` and `STRICT_FTAN`. Also define both
opcodes in `PromoteNode`.
- `llvm/lib/CodeGen/SelectionDAG/LegalizeFloatTypes.cpp` - More `FTAN`
and `STRICT_FTAN` handling in the legalizer
- `llvm/lib/CodeGen/SelectionDAG/LegalizeTypes.h` - Define
`SoftenFloatRes_FTAN` and `ExpandFloatRes_FTAN`.
- `llvm/lib/CodeGen/SelectionDAG/LegalizeVectorOps.cpp` - Define `FTAN`
as a legal vector operation.
- `llvm/lib/CodeGen/SelectionDAG/LegalizeVectorTypes.cpp` - Define
`FTAN` as a legal vector operation.
- `llvm/lib/CodeGen/SelectionDAG/SelectionDAG.cpp` - Define tan as an
intrinsic that doesn't return NaN.
- `llvm/lib/CodeGen/SelectionDAG/SelectionDAGBuilder.cpp` Map
`LibFunc_tan`, `LibFunc_tanf`, and `LibFunc_tanl` to `ISD::FTAN`. Map
`Intrinsic::tan` to `ISD::FTAN` and add selection dag handling for
`Intrinsic::tan`.
- `llvm/lib/CodeGen/SelectionDAG/SelectionDAGDumper.cpp` - Define `ftan`
and `strict_ftan` names for the equivalent ISD opcodes.
- `llvm/lib/CodeGen/TargetLoweringBase.cpp` - Define a Tan128 libcall and
the ISD::FTAN target lowering action.
- `llvm/lib/Target/X86/X86ISelLowering.cpp` - Add x86_64 lowering for the
tan intrinsic.
Resolves https://github.com/llvm/llvm-project/issues/70082
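A minimal usage sketch of the new intrinsic:
```
declare float @llvm.tan.f32(float)

define float @tan_example(float %x) {
  ; Lowered to ISD::FTAN and, ultimately, to a tanf libcall or a
  ; target-specific expansion.
  %r = call float @llvm.tan.f32(float %x)
  ret float %r
}
```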
|
|
by LegalizeVectorOps. (#92332)
The expand code is present, but we were missing the type query code
so the nodes would be ignored until LegalizeDAG.
|
|
LegalizeVectorOps is responsible for legalizing nodes that perform an
operation on each element and may need to be scalarized.
This is not true for nodes like VP_REDUCE.*, BUILD_VECTOR,
SHUFFLE_VECTOR, EXTRACT_SUBVECTOR, etc.
This patch drops any nodes with a scalar result from LegalizeVectorOps
and handles them in LegalizeDAG instead.
This required moving the reduction promotion to LegalizeDAG. I have
removed the integer promotion support as it was incorrect for integer
min/max reductions. Since it was untested, it was best to assert on it
until it is really needed.
There are a couple of regressions that can be fixed with a small DAG
combine, which I will do as a follow-up.
|
|
According to the LangRef, llvm.maximum/minimum have -0.0 < +0.0 semantics
and propagate NaN.
Expand the nodes on targets not supporting the operation by adding an
extra check for NaN and using is_fpclass to check the signs of zeros.
|
|
|
|
(#88475)
…return only Load since other output is chain.
Added a testcase that showed mismatched expected arity when the Load and
chain were returned as separate items after
003b58f65bdd5d9c7d0c1b355566c9ef430c0e7d
|