Age | Commit message (Collapse) | Author | Files | Lines |
|
|
|
This is the kind of options that downstream consumers of preconfigured MLIR
packages can check to see if the execution engine is available or not.
|
|
ElementwiseOpFusion file (#73850)
Declare `getPreservedProducerResults` function which helps to get the
preserved results of the producer linalg generic operation as a result
of elementwise fusion.
|
|
This MR fix the argument name of GEPOp builder from `basePtrType` to
`elementType` to avoid confusion.
Co-authored-by: Xiaolei Shi <xiaoleis@nvidia.com>
|
|
VectorToGPU.cpp (NFC)
|
|
|
|
|
|
MapMemRefStorageClassPass.cpp (NFC)
|
|
GPUToLLVMConversion.cpp (NFC)
|
|
worklist management code (#74796)
Do not add the previous users of replaced ops to the worklist during
`notifyOperationReplaced`.
The previous users are modified inplace as part of
`PatternRewriter::replaceOp`, which calls
`PatternRewriter::replaceAllUsesWith`. The latter function updates all
users with `updateRootInPlace`, which already puts all previous users of
the replaced op on the worklist. No further worklist management work is
needed in the `notifyOperationReplaced` callback.
|
|
Also unifies some of the test set up parts in other conv tests
|
|
|
|
(#74754)
This patch modifies the CombineTransferReadOpTranspose pattern to handle
extf ops. Also adds a test which shows the transpose getting folded into
the transfer_read.
|
|
|
|
|
|
(#74750)
…se conv kernel.
|
|
|
|
(#67994)"
Test was failing due to a different transform sequence declaration (transform sequence were used, while now it should be named transform sequence). Test is now fixed.
|
|
(NFC)
Fixes #74611
|
|
The implementation of these are modeled after the existing fastmath
flags for floating point arithmetic.
|
|
(#67994)"
This reverts commit c4399130ae403acf4e6325b8b46a51bb6abf222f.
Test fails https://lab.llvm.org/buildbot/#/builders/272/builds/2757
|
|
Fixes https://github.com/llvm/llvm-project/issues/62489.
Some notes for each number:
- 1 `bool-literal` should be reasonably clear from context.
- 2 Fixed.
- 3 This is now fixed. `loc(fused[])` is valid, but `loc(fused["foo",])`
is not.
- 4 This operation uses `assemblyFormat` so the syntax is correct
(assuming ODS is correct).
- 5 This operation uses `assemblyFormat` so the syntax is correct
(assuming ODS is correct).
- 6 Added an example.
- 7 The suggested fix is in line with other `assemblyFormat` examples.
- 8 Added syntax and an example.
- 9 I don't know what this is referring too.
- 10 Added example.
- 11 and 12 suggestion seems wrong as the `ShapedTypeInterface` could be
extended by clients, so is not limited to tensors or vectors.
- 13 is already reasonably clear with the example, I think.
- 14 is already reasonably clear with the example, I think.
- 15 Added an example from the `opaque_locations.mlir` tests.
- 16 The answer to this seems to change over time and depend on the use
case? Suggestions by reviewers are welcome.
|
|
This patchs adds the `filter_operand_types` attribute to transform::MatchOp, allowing to filter ops depending on their operand types.
|
|
(#74346)
Following discussions from this RFC:
https://discourse.llvm.org/t/rfc-integer-overflow-semantics
Adding the overflow semantics into the muli, subi and addi arith
operations.
|
|
Rewrite patterns must return `success` if the IR was modified. This
commit fixes sparse tensor tests such as
`SparseTensor/sparse_fusion.mlir`,
`SparseTensor/CPU/sparse_reduce_custom.mlir`,
`SparseTensor/CPU/sparse_semiring_select.mlir` when running with
`MLIR_ENABLE_EXPENSIVE_PATTERN_API_CHECKS`.
|
|
The op used to support only float element types. This was inconsistent
with `ConstantOp::isBuildableWith`, which allows integer element types.
The complex type allows any float/integer element type.
Note: The other complex dialect ops do not support non-float element
types yet. The main purpose of this change to fix
`Tensor/canonicalize.mlir`, which is currently failing when verifying
the IR after each pattern application (#74270).
```
within split at mlir/test/Dialect/Tensor/canonicalize.mlir:231 offset :8:15: error: 'complex.constant' op result #0 must be complex type with floating-point elements, but got 'complex<i32>'
%complex1 = tensor.extract %c1[] : tensor<complex<i32>>
^
within split at mlir/test/Dialect/Tensor/canonicalize.mlir:231 offset :8:15: note: see current operation: %0 = "complex.constant"() <{value = [1 : i32, 2 : i32]}> : () -> complex<i32>
"func.func"() <{function_type = () -> tensor<3xcomplex<i32>>, sym_name = "extract_from_elements_complex_i"}> ({
%0 = "complex.constant"() <{value = [1 : i32, 2 : i32]}> : () -> complex<i32>
%1 = "arith.constant"() <{value = dense<(3,2)> : tensor<complex<i32>>}> : () -> tensor<complex<i32>>
%2 = "arith.constant"() <{value = dense<(1,2)> : tensor<complex<i32>>}> : () -> tensor<complex<i32>>
%3 = "tensor.extract"(%1) : (tensor<complex<i32>>) -> complex<i32>
%4 = "tensor.from_elements"(%0, %3, %0) : (complex<i32>, complex<i32>, complex<i32>) -> tensor<3xcomplex<i32>>
"func.return"(%4) : (tensor<3xcomplex<i32>>) -> ()
}) : () -> ()
```
|
|
|
|
|
|
The `ForallRewriter` pattern used to generate invalid IR:
```
mlir/test/Dialect/SparseTensor/GPU/gpu_combi.mlir:0:0: error: 'scf.for' op expects region #0 to have 0 or 1 blocks
mlir/test/Dialect/SparseTensor/GPU/gpu_combi.mlir:0:0: note: see current operation:
"scf.for"(%8, %2, %9) ({
^bb0(%arg5: index):
// ...
"scf.yield"() : () -> ()
^bb1(%10: index): // no predecessors
"scf.yield"() : () -> ()
}) : (index, index, index) -> ()
```
This commit fixes tests such as
`mlir/test/Dialect/SparseTensor/GPU/gpu_combi.mlir` when verifying the
IR after each pattern application (#74270).
|
|
`createQuickSort` used to generate invalid IR:
```
"func.func"() <{function_type = (index, index, memref<?xindex>, memref<?xf32>, memref<?xi32>) -> (), sym_name = "_sparse_qsort_0_1_index_coo_1_f32_i32", sym_visibility = "private"}> ({
^bb0(%arg0: index, %arg1: index, %arg2: memref<?xindex>, %arg3: memref<?xf32>, %arg4: memref<?xi32>):
%0:2 = "scf.while"(%arg0, %arg1) ({
^bb0(%arg5: index, %arg6: index):
// ...
"scf.condition"(%3, %arg5, %arg6) : (i1, index, index) -> ()
}, {
^bb0(%arg5: index, %arg6: index):
// ...
%7:2 = "scf.if"(%6) ({
%8 = "arith.cmpi"(%2, %3) <{predicate = 7 : i64}> : (index, index) -> i1
// ...
"scf.yield"(%9#0, %9#1) : (index, index) -> ()
%10 = "arith.constant"() <{value = 0 : index}> : () -> index
}, {
"scf.yield"(%arg5, %arg5) : (index, index) -> ()
}) : (i1) -> (index, index)
"scf.yield"(%7#0, %7#1) : (index, index) -> ()
}) : (index, index) -> (index, index)
"func.return"() : () -> ()
}) : () -> ()
within split at mlir/test/Dialect/SparseTensor/buffer_rewriting.mlir:76 offset :11:1: error: 'scf.yield' op must be the last operation in the parent block
```
This commit fixes tests such as
`mlir/test/Dialect/SparseTensor/buffer_rewriting.mlir` when verifying
the IR after each pattern application (#74270).
|
|
Op verifiers should verify only local properties of an op. The dynamic
sizes of a `tensor.generate` op should not be verified. Dynamic sizes
that have a negative constant value should not prevent the
`tensor.generate` op from verifying.
Also share some code between the `tensor.empty` and `tensor.generate`
"dynamic dim -> static dim" canonicalization patterns.
Remove the `invalid-canonicalize.mlir` file and move the test case to
`canonicalize.mlir`. Canonicalization no longer produces IR that does
not verify (and leaves the op as is).
|
|
patterns (#74551)
The partial bufferization framework has been replaced with One-Shot
Bufferize. SCF-specific canonicalization patterns for
`to_memref`/`to_tensor` are no longer needed.
|
|
Add missing constant propogation folder for LeftShiftLogical,
RightShift[Logical|Arithmetic].
Implement additional folding when Shift value is 0.
This helps for readability of lowered code into SPIR-V.
Part of work for #70704
|
|
`*` is not friendly to the MLIR attribute parser and will fail to be
parsed. Switch the `*` enum representation to `star`.
|
|
Removes at in favor of curr; also makes method delegates consistent
|
|
|
|
The current vectorization of 1D depthwise convolutions in Linalg is
_sub-optimal_ for tensor with a low number of channel dimensions, e.g.:
```mlir
linalg.depthwise_conv_1d_nwc_wc
{dilations = dense<1> : vector<1xi64>,
strides = dense<1> : vector<1xi64>}
ins(%input, %filter : tensor<1x8x3xi8>, tensor<1x3xi8>)
outs(%output : tensor<1x8x3xi8>) -> tensor<1x8x3xi8>
```
That's due to the fact that ultimately (i.e. at LLVM level),
vectorization happens along the trailing dimension (i.e. the channel
dimension). In this case it leads to vectors with 3 elements (or worse,
if there's e.g. only 1 channel dimension). For comparison, a 128 bit
wide vector registers can hold 16 x i8.
Instead, this patch adds an option to flatten/collapse the channel
dimension into the width dimension of the input/filter/output using
`vector.shape_cast` operation:
```mlir
%sc_input = vector.shape_cast %input : vector<1x8x3xi8> to vector<1x24xi8>
%sc_output = vector.shape_cast %output : vector<1x8x3xi8> to vector<1x24xi8>
%b_filter = vector.broadcast %filter : vector<3xi8> to vector<1x8x3xi8>
%sc_filter = vector.shape_cast %b_filter : vector<1x8x3xi8> to vector<1x24xi8>
```
This new vectorization mode is implemented in `depthwiseConv` by
inserting `vector.shape_cast` Ops before and after
`depthwiseConv1dSliceAsMulAcc` is invoked. It can be selected through
e.g. a transform dialect attribute:
```mlir
transform.structured.vectorize_children_and_apply_patterns %conv {flatten_1d_depthwise_conv}
```
A forthcoming patch will implement a strategy to automatically switch
between the two implementations, depending on the shape of the input
tensors.
Co-authored by: Bradley Smith <bradley.smith@arm.com>
|
|
This adds a consistent usage with `at` for everything that refers to the
current loop nesting. This cleans up some redundant legacy code from
when we were still using topSort inside sparsifier code.
|
|
`TileUsingForOp` has an optional Attribute `interchange` which was given
in curly braces like this: `{interchange = [...]}`. The way this was
parsed meant that no `attr-dict` could be attached to the Op.
This patch adds printing / parsing of an `attr-dict` to the Op and
prints/parses the `interchange` Attribute separate from the
discardable Attributes.
|
|
ImportError (#74595)
|
|
Rationale:
We no longer deal with topsort during sparsification, so that LoopId ==
LoopOrd for all methods. This first revision removes the types. A follow
up revision will simplify some other remaining constructs that deal with
loop order (e.g. at and ldx).
|
|
|
|
Since #73253, loops over tiles in SSA form (i.e. loops that take
`iter_args` and yield a new tile) are supported, so this patch updates
ArmSME lowerings to this form. This is a NFC, as it still lowers to the
same intrinsics, but this makes IR less 'surprising' at a higher-level,
and may be recognised by more transforms.
Example:
IR before:
```mlir
scf.for %tile_slice_index = %c0 to %num_tile_slices step %c1
{
arm_sme.move_vector_to_tile_slice
%broadcast_to_1d, %tile, %tile_slice_index :
vector<[4]xi32> into vector<[4]x[4]xi32>
}
// ... later use %tile
```
IR now:
```mlir
%broadcast_to_tile = scf.for %tile_slice_index = %c0 to %num_tile_slices
step %c1 iter_args(%iter_tile = %init_tile) -> (vector<[4]x[4]xi32>)
{
%tile_update = arm_sme.move_vector_to_tile_slice
%broadcast_to_1d, %iter_tile, %tile_slice_index :
vector<[4]xi32> into vector<[4]x[4]xi32>
scf.yield %tile_update : vector<[4]x[4]xi32>
}
// ... later use %broadcast_to_tile
```
|
|
|
|
This PR introduce `fence.mbarrier.init` OP
|
|
`llvm.fcmp` does support fast math attributes therefore so should
`arith.cmpf`.
The heavy churn in flang tests are because flang sets
`fastmath<contract>` by default on all operations that support the fast
math interface. Downstream users of MLIR should not be so effected.
This was requested in https://github.com/llvm/llvm-project/issues/74263
|
|
(#73951)" (#74579)
This reverts commit f42b7615b862bb5f77981f619f92877eb20adf54.
The fold pattern is incorrect, because it does not even look at the
permutation of non-unit dims and is happy to replace a pattern such as
```
%22 = vector.shape_cast %21 : vector<1x256x256xf32> to vector<256x256xf32>
%23 = vector.transpose %22, [1, 0] : vector<256x256xf32> to vector<256x256xf32>
```
with
```
%22 = vector.shape_cast %21 : vector<1x256x256xf32> to vector<256x256xf32>
```
which is obviously incorrect.
|
|
`gpu.dynamic_shared_memory` currently does not get lowered when it is
used with vector dialect. The reason is that vector-to-llvm conversion
is not included in gpu-to-nvvm. This PR includes that and adds a test.
|
|
There's an issue in the translator today where, for a CallsiteLoc, if
the callee does not have a DI scope (perhaps due to compile options or
optimizations), it may get propagated the DI scope of its callsite's
parent function, which will create a non-existent DILocation combining
line & col number from one file, and the filename from another.
The root problem is we cannot propagate the parent scope when
translating the callee location, as it no longer applies to inlined
locations (see code diff and hopefully this will make sense).
To facilitate this, the importer is also changed so that callee scopes
are fused with the callee FileLineCol loc, instead of on the Callsite
loc itself. This comes with the benefit that we now have a symmetric
Callsite loc representation. If we required the callee scope be always
annotated on the Callsite loc, it would be hard for generic inlining
passes to maintain that, since it would have to somehow understand the
semantics of the fused metadata and pull it out while inlining.
|
|
The tracking listener should not report op replacement errors for
payload ops that are not mapped to any live handles. The handle liveless
analysis did not work properly with transform IR that has named
sequences.
A handle is live if it has a user after the transform op that is
currently being applied. With named sequences, we need to maintain a
stack of currently applied transform ops. That stack already exists
(`regionStack`), the only thing that's missing is the current transform
op for each stack frame.
This commit fixes #72931.
|