|
This PR provides the support for the capability
`TensorFloat32RoundingINTEL` and the instruction `OpRoundFToTF32INTEL`,
as specified by the `SPV_INTEL_tensor_float32_conversion` extension.
This extension introduces a rounding instruction that converts standard
32-bit floating-point values to the TensorFloat32 (TF32) format.
Reference Specification:
https://github.com/KhronosGroup/SPIRV-Registry/blob/main/extensions/INTEL/SPV_INTEL_tensor_float32_conversion.asciidoc
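For illustration, a minimal sketch of the instruction at the MLIR level; the op name and the `f32 to f32` form are assumptions based on the dialect's naming conventions for INTEL conversion ops, since TF32 keeps the 32-bit container:
```mlir
// Assumed op name/syntax; TF32 reuses the f32 container, so the
// result type stays f32 while the mantissa is rounded to 10 bits.
%r = spirv.INTEL.RoundFToTF32 %x : f32 to f32
```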
|
|
This expands on https://github.com/llvm/llvm-project/pull/102562.
It also allows propagating attributes for `scf.while` lowering.
|
|
This will ensure that:
- The `field` of a class can have an initial value
- The `field` op is emitted correctly
- The `getfield` op is emitted correctly
|
|
This PR enforces that SPIR-V instructions only allow fixed-size vectors
(no scalable vectors) by updating `SPIRV_VectorOf` to use
`FixedVectorOfLengthAndType` instead of `VectorOfLengthAndType`. The
affected MLIR tests are also updated accordingly.
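A sketch of the distinction (illustrative op):
```mlir
// Fixed-length vectors remain valid SPIR-V operands:
%0 = spirv.FAdd %a, %b : vector<4xf32>
// Scalable vectors such as vector<[4]xf32> no longer satisfy
// SPIRV_VectorOf and are rejected by the verifier.
```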
|
|
(#151168)
…composites of multi-dimensional array
This fixes a bug in verification and serialization of replicated
constant composite ops where the splat value can potentially be a
multi-dimensional array.
---------
Signed-off-by: Mohammadreza Ameri Mahabadian <mohammadreza.amerimahabadian@arm.com>
|
|
The private clause is the first with 'recipes', so a lot of
infrastructure is included here, including some MLIR dialect changes
that allow simply adding a privatization. We'll likely get similar ones
for firstprivate and reduction.
Also, we have quite a bit of infrastructure in clause lowering to make
sure we have most cases we could think of covered.
At the moment, ONLY private is implemented, so all it requires is an
'init' segment (that doesn't call any copy operations), and potentially
a 'destroy' segment. However, actually calling the 'init' functions on
each of the elements is not properly implemented yet; that will come in
a followup patch.
This patch implements all of that, and adds tests in a way that will be
useful for firstprivate as well.
|
|
parity (#150284)
This PR ensures parity in folding/canonicalizing of vector.broadcast
(from a scalar) and vector.splat. This means that by using
vector.broadcast instead of vector.splat (which is currently
deprecated), there is no loss in optimizations performed. All tests
that previously checked folding/canonicalization of vector.splat now do
so for vector.broadcast. The vector.splat canonicalization
tests are now in a separate file, ready for removal when, in the future,
we remove vector.splat completely.
This PR also adds a canonicalizer to vector.splat to always convert it
to vector.broadcast. This is to reduce the 'traffic' through
vector.splat.
There is a chance that this PR will break downstream users who create
or expect vector.splat. Changing such logic to use vector.broadcast
instead should fix this.
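For reference, the new canonicalization rewrites (a sketch):
```mlir
// Before:
%0 = vector.splat %s : vector<4xf32>
// After:
%0 = vector.broadcast %s : f32 to vector<4xf32>
```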
|
|
Pass (#150598)
Previous behaviour was to only run the check for constant operands when
the "strict-op-spec-alignment" check was enabled. However, this type of
check seems more generally useful without this option enabled. For
example, tosa-to-linalg doesn't enable the "strict-op-spec-alignment"
option when running the tosa-validate pass, but it does expect operands
to be constant.
By enabling this type of checking by default, lowerings that don't
support non-constant operands and don't use the
"strict-op-spec-alignment" option will fail early, rather than later in
the lowering pipeline.
Should a use-case need to disable this type of checking,
`--tosa-validate="extension=dynamic"` can be used.
|
|
format (#150991)
This prefixes the output with the DEBUG_TYPE. Dialect conversion uses
a ScopedPrinter; we insert the raw_ldbg_ostream to consistently prefix
each new line.
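Illustrative output (the exact messages depend on the input IR and are an assumption here); each line of the `-debug-only=dialect-conversion` output now carries the prefix:
```
[dialect-conversion] Legalizing operation : 'func.func'
```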
|
|
(#151485)
…tNull
This patch enables (de)serialization to/from OpConstantNull for null
TensorARM constants.
---------
Signed-off-by: Mohammadreza Ameri Mahabadian <mohammadreza.amerimahabadian@arm.com>
|
|
This PR fixes several bugs in the `ClampIsNoOp` pattern.
- The static shape check is not needed.
- Ensure i1 values are zero-extended to support folding clamps of
boolean types.
Fixes #130016.
|
|
Operations (NFC) (#151547)
On the model of OpWithFlags, this modifier allows streaming an
operation using a custom AsmPrinter.
|
|
The `gpu.module` operation can contain `spirv.target_env` attributes
within an array attribute named `"targets"`. This patch accounts for
that case by iterating over the `"targets"` attribute, if present, and
looking up `spirv.target_env`.
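A sketch of the handled case (attribute values illustrative):
```mlir
gpu.module @kernels [#spirv.target_env<
    #spirv.vce<v1.4, [Shader], []>, #spirv.resource_limits<>>] {
}
```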
---------
Co-authored-by: Jakub Kuderski <kubakuderski@gmail.com>
|
|
linalg.unpack (#151503)
This patch introduces a new helper, `getCollapsedVecType`, and updates
`vectorizeAsTensorUnpackOp` to use it. The motivation stems from improving how
`vector.shape_cast` operations are generated when vectorizing `linalg.unpack`.
Previously, the vectorizer relied on
* `tensor::CollapseShapeOp::inferCollapsedType`
to compute the collapsed vector type. This approach is suboptimal
because:
* `inferCollapsedType` lacks awareness of scalable vector flags.
* Linalg vectorization should not depend on Tensor dialect utilities.
Instead of relocating `inferCollapsedType`, we introduce
`getCollapsedVecType` — a lightweight, specialized hook that:
* Assumes no dynamic sizes.
* Handles scalable flags alongside shape dimensions.
This change also reduces temporary variables in
`vectorizeAsTensorUnpackOp` and paves the way for a cleaner update in
#149293.
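A sketch of the kind of cast the new hook enables, with the scalable flag carried through the collapsed dimension (shapes illustrative):
```mlir
// Collapsing the trailing [4] x 2 group yields a single scalable [8] dim.
%0 = vector.shape_cast %v : vector<4x[4]x2xf32> to vector<4x[8]xf32>
```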
|
|
Add support for the `@llvm.nvvm.stmatrix` intrinsic series. These
correspond to PTX stmatrix operations, as documented in the [PTX ISA
reference](https://docs.nvidia.com/cuda/parallel-thread-execution/index.html#warp-level-matrix-store-instruction-stmatrix).
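A rough sketch of the corresponding NVVM op; the exact assembly format and attributes are assumptions, see the NVVM dialect docs for the authoritative form:
```mlir
// Stores one 8x8 matrix fragment held in %r to shared memory at %ptr.
nvvm.stmatrix %ptr, %r {layout = #nvvm.mma_layout<row>} : !llvm.ptr<3>, i32
```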
|
|
The folder `shape_cast(splat constant) -> splat constant` was first
introduced
[here](https://github.com/llvm/llvm-project/commit/36480657d8ce97836f76bf5fa8c36677b9cdc19a#diff-484cea976e0c96459027c951733bf2d22d34c5a0c0de6f577069870ef4588983R2600)
(Nov 2020). In that commit there is a comment to _Only handle splat for
now_. Based on that I assume the intention was to, at a later time,
support a general `shape_cast(constant) -> constant` folder. That is
what this PR does.
One minor downside: with this folder it is possible to end up with 2
large constants instead of 1 large constant and 1 shape_cast:
```mlir
func.func @foo() -> (vector<4xi32>, vector<2x2xi32>) {
%cst = arith.constant dense<[1, 2, 3, 4]> : vector<4xi32> // 'large' constant 1
%0 = vector.shape_cast %cst : vector<4xi32> to vector<2x2xi32>
return %cst, %0 : vector<4xi32>, vector<2x2xi32>
}
```
gets folded with this new folder to
```mlir
func.func @foo() -> (vector<4xi32>, vector<2x2xi32>) {
%cst = arith.constant dense<[1, 2, 3, 4]> : vector<4xi32> // 'large' constant 1
%cst_0 = arith.constant dense<[[1, 2], [3, 4]]> : vector<2x2xi32> // 'large' constant 2
return %cst, %cst_0 : vector<4xi32>, vector<2x2xi32>
}
```
Notes on the above case:
1) This only affects the textual IR; the actual values share the same
context storage (I've verified this by checking pointer values in the
`DenseIntOrFPElementsAttrStorage`
[constructor](https://github.com/llvm/llvm-project/blob/da5c442550a3823fff05c14300c1664d0fbf68c8/mlir/lib/IR/AttributeDetail.h#L59))
so no compile-time memory overhead to this folding. At the LLVM
IR level the constant is shared, too.
2) This only happens when the pre-folded constant cannot be dead-code
eliminated (i.e. when it has 2+ uses), which I don't think is common.
|
|
spirv. (#151552)
This patch adds support for lowering several float classification ops
from the Math dialect to the SPIR-V dialect.
### Highlights:
- Introduced a new `spirv.IsFinite` operation corresponding to the
SPIR-V `OpIsFinite` instruction.
- Lowered `math.isfinite`, `math.isinf`, and `math.isnan` to SPIR-V
using `CheckedElementwiseOpPattern`.
- Added corresponding tests for op definition and conversion lowering.
This addresses the discussion in:
https://github.com/llvm/llvm-project/issues/150778
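A sketch of the new lowering:
```mlir
// Input (Math dialect); the result type is i1.
%0 = math.isfinite %x : f32
// After lowering to SPIR-V:
%1 = spirv.IsFinite %x : f32
```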
---
Let me know if any additional adjustments are needed!
---------
Co-authored-by: Jakub Kuderski <kubakuderski@gmail.com>
|
|
Implements the `inferResultRanges` method from the
`InferIntRangeInterface` interface for `vector.step`. The implementation
is similar to that of `arith.constant`, since the exact result values are
statically known.
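For example:
```mlir
// Produces [0, 1, 2, 3], so the inferred result range is [0, 3].
%0 = vector.step : vector<4xindex>
```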
Signed-off-by: Max Dawkins <max.dawkins@gmail.com>
|
|
Enabling users to explicitly specify which regions should be preserved
uncovers additional opportunities to utilize the `scf.execute_region` op.
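For reference, a minimal example of the op this targets:
```mlir
%r = scf.execute_region -> i32 {
  %c = arith.constant 42 : i32
  scf.yield %c : i32
}
```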
|
|
Implements the `inferResultRanges` method from the
`InferIntRangeInterface` interface for `vector.transpose`. The result
ranges simply match the source ranges.
Signed-off-by: Max Dawkins <max.dawkins@gmail.com>
|
|
When adding a predicated field to non-attribute properties /
implementing PropConstraint, a call to genPropertyVerifiers() wasn't
added to the generation sequence for [Op]GenericAdaptor::verify. This
commit fixes the issue.
|
|
|
|
trait (#116544)
|
|
This is part of the vector.splat deprecation.
Reference: https://discourse.llvm.org/t/rfc-mlir-vector-deprecate-then-remove-vector-splat/87143/5
Instead of creating vector::SplatOp, create vector::BroadcastOp.
|
|
Adds support for module-level assembly in the LLVM IR dialect.
---------
Co-authored-by: Tobias Gysi <tobias.gysi@nextsilicon.com>
|
|
Previously, when dropping a unit dim from a pad with mixed
dynamic/static input/output shapes, the resulting shape would take on
the type of the input, resulting in invalid IR.
Also did some minor cleanup to the formatting of the
`drop_unit_dim_corresponding_to_dynamic_dim` test to make it match the
rest of the file.
---------
Signed-off-by: dan <danimal197@gmail.com>
|
|
tdesc (#150576)
Add a variant of load/store/prefetch that allows offsets. The new
xegpu.load variant accepts memref+offset, and the existing tdesc
operand will be removed in a future PR.
The semantics are a combination of "creating scattered_tdesc +
xegpu.load with scattered_tdesc". The current xegpu.load accepts a
tdesc operand, which encapsulates "memref+offset". This PR folds
"memref+offset" directly into xegpu.load, replacing "tdesc".
Create_tdesc will be removed, since scatter_tdesc only contains the
base address once the offsets are taken away, so there is no point in
keeping it.
```mlir
// wi level code example
%2 = xegpu.load %src[%offsets], %mask <{chunk_size = 2}> : ui64, vector<1xindex>, vector<1xi1> -> vector<2xf32>
xegpu.store %val, %src[%offsets], %mask: vector<1xf16>, memref<?xf16>, vector<1xindex>, vector<1xi1>
xegpu.prefetch %src[%0] : ui64, vector<1xindex>
```
|
|
8-bit floats are not supported in SPIR-V. They are emulated as 8-bit
integers during conversion.
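A sketch of the emulation at a function boundary (types illustrative):
```mlir
// Before conversion: an f8 value flows through the function.
func.func @id(%arg0: f8E4M3FN) -> f8E4M3FN {
  return %arg0 : f8E4M3FN
}
// After conversion to SPIR-V, the value is carried as i8 (sketch):
//   spirv.func @id(%arg0: i8) -> i8 "None" { ... }
```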
|
|
In https://github.com/llvm/llvm-project/pull/149156, I ensured that we
no longer generate spurious `tensor.empty` ops when vectorizing
`linalg.unpack`.
This follow-up removes leftover code that is now redundant but was
missed in the original PR and in #150602, which was also meant to clean
up leftover code.
Note, this removes the code that computes "write-vector-sizes"; these
are now fully inferred from previous ops.
|
|
(#151158)
This fixes an issue where multi-dimensional TensorArm dense elements
could not be serialized.
Signed-off-by: Mohammadreza Ameri Mahabadian <mohammadreza.amerimahabadian@arm.com>
|
|
UpdateVCEPass currently deduces the required version based on the
version requirements of ops. This fix adds a check to update the
minimum required version based on capabilities as well.
---------
Signed-off-by: Mohammadreza Ameri Mahabadian <mohammadreza.amerimahabadian@arm.com>
|
|
This patch extends the operation that rewrites elementwise operations
whose inputs are all broadcast from the same shape to handle
mixed types, such as when the result and input types don't match, or
when the inputs have multiple types.
PR #150867 failed to check for the possibility of type mismatches when
rewriting splat constants. In order to fix that issue, we add support
for mixed-type operations more generally.
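A sketch of the mixed-type case now handled (ops illustrative):
```mlir
// Before: an elementwise op on a broadcast value, with a type change.
%b = vector.broadcast %s : f16 to vector<4xf16>
%r0 = arith.extf %b : vector<4xf16> to vector<4xf32>
// After: apply the op to the scalar, then broadcast the result.
%e = arith.extf %s : f16 to f32
%r1 = vector.broadcast %e : f32 to vector<4xf32>
```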
|
|
capabilities" (#151358)
Reverts llvm/llvm-project#151108 as it breaks sanitizer builds.
|
|
(#151349)
|
|
|
|
|
|
This PR refactors the xegpu-wg-to-sg.mlir tests to use larger shapes
that more closely resemble workgroup-level programming.
|
|
(#151108)
When deducing capabilities, implied capabilities are not considered,
which causes generation of incorrect SPIR-V modules. This commit fixes
that by pulling in the capability set for all the implied ones.
---------
Signed-off-by: Davide Grohmann <davide.grohmann@arm.com>
|
|
An alternative implementation could use an `ArrayRef` of
`NamedAttribute`s or a `NamedAttrList` to store struct decorations, as
the deserializer uses `NamedAttribute`s for decorations. However, using
a custom struct allows us to store the `spirv::Decoration`s directly
rather than their names as a `StringRef`/`StringAttr`.
|
|
(#151324)
…(… (#151099)
This reverts commit 2780b8f22058b35a8e70045858b87a1966df8df3 and relands
b7bfbc0c4c7b20d6623a5b0b4a7fea8ae08a62da.
Adds the following fixes compared to the original PR
(https://github.com/llvm/llvm-project/pull/150783):
- A bazel fix
- Use `let methods` instead of `list<InterfaceMethod> methods`
The missing forward declaration has been added in the meantime:
https://github.com/llvm/llvm-project/commit/9164d206b33d61c93f5fc4628797485f96d654ca.
|
|
Previously this check assumed that if an operator exists in profile
compliance (TosaProfileComplianceData.h.inc), an entry exists in both
the profiles and extensions sections. However, this is not necessarily
the case.
This commit changes the check such that it doesn't assume the above.
In doing so, it allows more operators to be checked for invalid data
type combinations that were previously skipped.
|
|
The sanitizer bots are reporting a missing declaration:
```
In file included from /home/b/sanitizer-x86_64-linux-bootstrap-ubsan/build/llvm-project/mlir/lib/Dialect/LLVMIR/Transforms/InlinerInterfaceImpl.cpp:17:
In file included from /home/b/sanitizer-x86_64-linux-bootstrap-ubsan/build/llvm-project/mlir/include/mlir/Dialect/LLVMIR/NVVMDialect.h:26:
/home/b/sanitizer-x86_64-linux-bootstrap-ubsan/build/llvm-project/mlir/include/mlir/Target/LLVMIR/ModuleTranslation.h:318:34: error: no type named 'CallBase' in namespace 'llvm'
318 | llvm::CallBase *call,
| ~~~~~~^
1 error generated.
```
https://lab.llvm.org/buildbot/#/builders/94/builds/9340
https://lab.llvm.org/buildbot/#/builders/24/builds/11029
https://lab.llvm.org/buildbot/#/builders/169/builds/13454
https://lab.llvm.org/buildbot/#/builders/25/builds/10250
PR #151302 removed some indirect header #includes which then had to be
included explicitly. I do not know why this particular error only
occurs with the sanitizer buildbots.
Fix by adding a forward declaration.
|
|
(…" (#151316)
Reverts llvm/llvm-project#151125
Broke the gcc-7 build:
include/mlir/Target/LLVMIR/ModuleTranslation.h:318:34: error: no type
named 'CallBase' in namespace 'llvm'
llvm::CallBase *call,
~~~~~~^
|
|
(#151315)
Fixes #151314.
|
|
(#151125)
…… (#151099)
This reverts commit 2780b8f22058b35a8e70045858b87a1966df8df3 to reland
59013d44058ef423a117f95092150e16e16fdb09.
In addition to the original commit, this one includes:
- A bazel fix
- Use of `let methods` instead of `list<InterfaceMethod> methods`
The original commit message was:
This patch extends the LLVM dialect's intrinsic infra to support
argument and result attributes. Initial support is added for the memory
intrinsics llvm.intr.memcpy, llvm.intr.memmove, and llvm.intr.memset.
Additionally, an ArgAndResultAttrsOpInterface is factored out of
CallOpInterface and CallableOpInterface, enabling operations to have
argument and result attributes without requiring them to be a call or a
callable operation.
|
|
|
|
`#include "llvm/Frontend/OpenMP/OMPIRBuilder.h"` can be replaced with
forward-declarations of `OpenMPIRBuilder` and `CanonicalLoopInfo`. This
also avoids a dependency to `omp_gen` of the LLVMFrontendOpenMP
component which is included indirectly in `OMPIRBuilder.h`.
Since its inclusion in #147069, additional indirect dependencies on
headers included by `OMPIRBuilder.h` were introduced as well. These are
now included directly.
Reported-by: fabrizio-indirli
See
https://github.com/llvm/llvm-project/pull/147069#issuecomment-3114034973
|
|
Add XeVM dialect to LLVMIR translation.
Currently no ops are translated; only xevm.DecorationCacheControl is
translated, to metadata for the SPIR-V decoration
!spirv.DecorationCacheControlINTEL.
Co-authored-by: Artem Kroviakov artem.kroviakov@intel.com
|
|
When OpenACC is enabled and Fortran loops are annotated with `acc loop`,
they are lowered to the `acc.loop` operation, and the rest of the
contained loops use the normal FIR lowering path.
However, the OpenACC specification has special provisions related to
contained loops and their induction variables. In order to adhere to
these, we convert all valid contained loops to `acc.loop` in order to
store this information appropriately.
The provisions in the spec that motivated this change (line numbers are
from OpenACC 3.4):
- 1353 Loop variables in Fortran do statements within a compute
construct are predetermined to be private to the thread that executes
the loop.
- 3783 When do concurrent appears without a loop construct in a kernels
construct it is treated as if it is annotated with loop auto. If it
appears in a parallel construct or an accelerator routine then it is
treated as if it is annotated with loop independent.
By valid loops we mean do loops and do concurrent loops that have an
induction variable. Unstructured loops are not handled.
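A rough sketch of the resulting IR for a contained do loop; the exact `acc.loop` syntax, types, and any required parallelism attributes are assumptions:
```mlir
acc.loop control(%i : i32) = (%lb : i32) to (%ub : i32) step (%st : i32) {
  // loop body; %i is private per the OpenACC provisions above
  acc.yield
}
```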
|
|
(#149576)
This PR fixes the computation of padded shapes for convolution-style
affine maps (e.g., d0 + d1) in `PadTilingInterface`. Previously, the
code used the direct sum of loop upper bounds, leading to over-padding.
For example, for the following `conv_2d_nhwc_fhwc` op, when only
padding the c dimension to multiples of 16, it also incorrectly pads
the convolved dimensions and generates the wrong input shape:
```mlir
%padded = tensor.pad %arg0 low[0, 0, 0, 0] high[0, 1, 1, 12] {
^bb0(%arg3: index, %arg4: index, %arg5: index, %arg6: index):
tensor.yield %cst : f32
} : tensor<1x16x16x4xf32> to tensor<1x17x17x16xf32>
%padded_0 = tensor.pad %arg1 low[0, 0, 0, 0] high[0, 0, 0, 12] {
^bb0(%arg3: index, %arg4: index, %arg5: index, %arg6: index):
tensor.yield %cst : f32
} : tensor<16x3x3x4xf32> to tensor<16x3x3x16xf32>
%0 = linalg.conv_2d_nhwc_fhwc {dilations = dense<1> : tensor<2xi64>, strides = dense<1> : tensor<2xi64>} ins(%padded, %padded_0 : tensor<1x17x17x16xf32>, tensor<16x3x3x16xf32>) outs(%arg2 : tensor<1x14x14x16xf32>) -> tensor<1x14x14x16xf32>
return %0 : tensor<1x14x14x16xf32>
```
The new implementation uses the maximum accessed index as the input to
the affine map and then adds 1 after aggregating all the terms to get
the final padded size. This fixes
https://github.com/llvm/llvm-project/issues/148679.
|