path: root/mlir/test
Age  Commit message  (Author; files changed, -lines/+lines)
2023-12-05  [mlir][gpu] Add lowering dynamic_shared_memory op for rocdl (#74473)  (Guray Ozen; 1 file, -0/+17)
This PR adds lowering of `gpu.dynamic_shared_memory` to the rocdl target.
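For reference, a minimal use of the op that this lowering handles might look like this (a sketch; shapes illustrative):

```mlir
gpu.module @kernels {
  gpu.func @kernel() kernel {
    // Views the dynamically sized workgroup (LDS) memory as a memref.
    %shmem = gpu.dynamic_shared_memory : memref<?xi8, #gpu.address_space<workgroup>>
    gpu.return
  }
}
```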
2023-12-05  [tosa] Fix crash in shape inference for `tosa.transpose` (#74367)  (Felix Schneider; 1 file, -0/+11)
Fixes a crash in `TransposeOp::inferReturnTypeComponents()` when the supplied permutation tensor is rank-0. Also removes some dead code from the type inference function. Fixes https://github.com/llvm/llvm-project/issues/74237.
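A hypothetical reproducer of the kind that used to crash (generic op syntax; shapes illustrative, not taken from the added test):

```mlir
// A rank-0 permutation tensor is invalid, but shape inference
// previously crashed on it instead of rejecting it.
%perms = "tosa.const"() {value = dense<0> : tensor<i32>} : () -> tensor<i32>
%0 = "tosa.transpose"(%arg0, %perms) : (tensor<2x3xf32>, tensor<i32>) -> tensor<*xf32>
```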
2023-12-05  [mlir][nfc] Add missing comment in a test  (Andrzej Warzynski; 1 file, -0/+3)
2023-12-05  [mlir][llvm] Fix verifier for const int and dense (#74340)  (Rik Huijzer; 1 file, -0/+16)
Continuation of https://github.com/llvm/llvm-project/pull/74247 to fix https://github.com/llvm/llvm-project/issues/56962. Fixes the verifier for (Integer Attr):

```mlir
llvm.mlir.constant(1 : index) : f32
```

and (Dense Attr):

```mlir
llvm.mlir.constant(dense<100.0> : vector<1xf64>) : f32
```

## Integer Attr

The addition that this PR makes to `LLVM::ConstantOp::verify` is meant to exactly verify the code in `mlir::LLVM::detail::getLLVMConstant`:
https://github.com/llvm/llvm-project/blob/9f78edbd20ed922cced9482f7791deb9899a6d82/mlir/lib/Target/LLVMIR/ModuleTranslation.cpp#L350-L353

One failure mode is when the `type` (`llvm.mlir.constant(<value>) : <type>`) is not an `Integer`, because then the `cast` in `getIntegerBitWidth` will crash:
https://github.com/llvm/llvm-project/blob/dca432cb7b1c282f5dc861095813c4f40f109619/llvm/include/llvm/IR/DerivedTypes.h#L97-L99

So that's now caught in the verifier. Apart from that, I don't see anything else we could check for. `sextOrTrunc` means "sign extend or truncate to width" and is quite permissive. For example, the following doesn't have to be caught in the verifier, as it doesn't crash during `mlir-translate -mlir-to-llvmir`:

```mlir
llvm.func @main() -> f32 {
  %cst = llvm.mlir.constant(100 : i64) : f32
  llvm.return %cst : f32
}
```

## Dense Attr

Crashes if the type is not either an MLIR vector type or one of these:
https://github.com/llvm/llvm-project/blob/9f78edbd20ed922cced9482f7791deb9899a6d82/mlir/lib/Target/LLVMIR/ModuleTranslation.cpp#L375-L391
2023-12-05  [mlir][llvm] Add llvm.target_features features attribute (#71510)  (Benjamin Maxwell; 3 files, -0/+39)
This patch adds a target_features (TargetFeaturesAttr) to the LLVM dialect to allow setting and querying the features in use on a function. The motivation for this comes from the Arm SME dialect, where we would like a convenient way to check what variants of an operation are available based on the CPU features.

Intended usage: the target_features attribute is populated manually or by a pass:

```mlir
func.func @example() attributes {
   target_features = #llvm.target_features<["+sme", "+sve", "+sme-f64f64"]>
} {
  // ...
}
```

Then within a later rewrite the attribute can be checked, and used to make lowering decisions:

```c++
// Finds the "target_features" attribute on the parent
// FunctionOpInterface.
auto targetFeatures = LLVM::TargetFeaturesAttr::featuresAt(op);

// Check a feature. Returns false if targetFeatures is null or the
// feature is not in the list.
if (!targetFeatures.contains("+sme-f64f64"))
  return failure();
```

For now, this is rather simple: it just checks whether the exact feature is in the list, though it could be extended with implied features using information from LLVM.
2023-12-05  [MLIR] Add support for frame pointers in MLIR (#72145)  (Radu Salavat; 3 files, -0/+22)
Add support for frame pointers in MLIR.

Co-authored-by: Markus Böck <markus.boeck02@gmail.com>
Co-authored-by: Christian Ulmann <christianulmann@gmail.com>
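A sketch of what this enables; the attribute spelling is an assumption based on the patch description, not verified against the final syntax:

```mlir
// Hypothetical: request frame pointers for non-leaf functions.
llvm.func @with_frame_pointer() attributes {
  frame_pointer = #llvm.framePointerKind<"non-leaf">
} {
  llvm.return
}
```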
2023-12-05  [MLIR][LLVM] Translate Debug EmissionKind (#74376)  (Billy Zhu; 1 file, -0/+21)
Translate debug emission kind into LLVM (the importer already supports this).
2023-12-05  [mlir][Vector] Update patterns for flattening vector.xfer Ops (2/N) (#73523)  (Andrzej Warzyński; 2 files, -0/+56)
Updates patterns for flattening `vector.transfer_read` by relaxing the requirement that the "collapsed" indices are all zero. This enables collapsing cases like this one:

```mlir
%2 = vector.transfer_read %arg4[%c0, %arg0, %arg1, %c0] ... : memref<1x43x4x6xi32>, vector<1x2x6xi32>
```

Previously, only the following case (where all indices are 0) would be considered for collapsing:

```mlir
%2 = vector.transfer_read %arg4[%c0, %c0, %c0, %c0] ... : memref<1x43x4x6xi32>, vector<1x2x6xi32>
```

Also adds some new comments and renames the `firstContiguousInnerDim` parameter as `firstDimToCollapse` (the latter better matches the actual meaning). Similar updates for `vector.transfer_write` will be implemented in a follow-up patch.
2023-12-05  [mlir][python] python binding wrapper for the affine.AffineForOp (#74408)  (Amy Wang; 1 file, -27/+155)
This PR creates the wrapper class AffineForOp and adds a testcase for it. A testcase for the AffineLoadOp is also added.
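A rough sketch of what using the new wrapper might look like (constructor arguments and helper names are assumptions based on the description, not the verified API):

```python
from mlir.ir import Context, InsertionPoint, Location, Module
from mlir.dialects import affine, func

with Context(), Location.unknown():
    module = Module.create()
    with InsertionPoint(module.body):

        @func.FuncOp.from_py_func()
        def loop_example():
            # Hypothetical wrapper call: affine.for %i = 0 to 8 step 1.
            for_op = affine.AffineForOp(0, 8, 1)
            with InsertionPoint(for_op.body):
                affine.AffineYieldOp([])
```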
2023-12-05  [mlir][Complex] Fix bug in `MergeComplexBitcast` (#74271)  (Matthias Springer; 1 file, -1/+1)
When two `complex.bitcast` ops are folded and the resulting bitcast is a non-complex -> non-complex bitcast, an `arith.bitcast` should be generated. Otherwise, the generated `complex.bitcast` op is invalid. Also removes a pattern that converts non-complex -> non-complex `complex.bitcast` ops to `arith.bitcast`. Such `complex.bitcast` ops are invalid and should not appear in the input. Note: this bug can only be triggered by running with `-debug` (which prints intermediate IR that does not verify) or with `MLIR_ENABLE_EXPENSIVE_PATTERN_API_CHECKS` (#74270).
2023-12-04  [MLIR][NVVM]: Add cp.async.mbarrier.arrive Op (#74241)  (Durga; 2 files, -0/+26)
Add:
* an Op for 'cp.async.mbarrier.arrive', targeting the nvvm_cp_async_mbarrier_arrive* family of intrinsics.
* The 'noinc' intrinsic property is modelled as a default-valued attribute of type I1.
* Test cases are added to verify the Op as well as the intrinsic lowering.

Signed-off-by: Durgadoss R <durgadossr@nvidia.com>
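Illustrative forms of the new op (the assembly and attribute spelling are assumed from the description):

```mlir
// Plain form; %barrier is a pointer to the mbarrier object.
nvvm.cp.async.mbarrier.arrive %barrier : !llvm.ptr
// With the 'noinc' property set (modelled as an I1 attribute).
nvvm.cp.async.mbarrier.arrive %barrier {noinc = true} : !llvm.ptr
```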
2023-12-04  [mlir][nvvm] Introduce `nvvm.fence.proxy` (#74057)  (Guray Ozen; 2 files, -0/+31)
This PR introduces the `nvvm.fence.proxy` op for the following cases:

```
nvvm.fence.proxy { kind = #nvvm.proxy_kind<alias>}
nvvm.fence.proxy { kind = #nvvm.proxy_kind<async>}
nvvm.fence.proxy { kind = #nvvm.proxy_kind<async.global>}
nvvm.fence.proxy { kind = #nvvm.proxy_kind<async.shared>, space = #nvvm.shared_space<cta>}
nvvm.fence.proxy { kind = #nvvm.proxy_kind<async.shared>, space = #nvvm.shared_space<cluster>}
```
2023-12-04  [mlir][nvgpu] Add address space attribute converter in nvgpu-to-nvvm pass (#74075)  (Guray Ozen; 1 file, -0/+13)
The GPU dialect has `#gpu.address_space<workgroup>` for the shared memory of NVGPU (address space = 3). However, when the IR combines the NVGPU and GPU dialects, the `nvgpu-to-nvvm` pass fails due to a missing attribute conversion. This PR adds `populateGpuMemorySpaceAttributeConversions` to the `nvgpu-to-nvvm` lowering, so we can use `#gpu.address_space<workgroup>` with the `nvgpu-to-nvvm` pass.
2023-12-04  [mlir] Add support for DIGlobalVariable and DIGlobalVariableExpression (#73367)  (Justin Wilson; 5 files, -8/+85)
This PR introduces DIGlobalVariableAttr and DIGlobalVariableExpressionAttr so that ModuleTranslation can emit the metadata needed for debug information about global variables. The translator implementation for debug metadata needed to be refactored in order to allow translation of nodes based on MDNode (DIGlobalVariableExpressionAttr and DIExpression) in addition to DINode-based nodes. A DIGlobalVariableExpressionAttr can now be passed to the GlobalOp operation directly, and ModuleTranslation will create the respective DIGlobalVariable and DIGlobalVariableExpression nodes. The compile unit that DIGlobalVariable is expected to be configured with will be updated with the created DIGlobalVariableExpression.
2023-12-04  [mlir][Vector] Update patterns for flattening vector.xfer Ops (1/N) (#73522)  (Andrzej Warzyński; 1 file, -14/+82)
Updates "flatten vector" patterns to support more cases, namely ops that read/write vectors with leading unit dims. For example:

```mlir
%0 = vector.transfer_read %arg0[%c0, %c0, %c0, %c0] ... : memref<5x4x3x2xi8, strided<[24, 6, 2, 1], offset: ?>>, vector<1x1x2x2xi8>
```

Currently, the `vector.transfer_read` above would not be flattened. With this change, it will be rewritten as follows:

```mlir
%collapse_shape = memref.collapse_shape %arg0 [[0, 1, 2, 3]] :
  memref<5x4x3x2xi8, strided<[24, 6, 2, 1], offset: ?>>
  into memref<120xi8, strided<[1], offset: ?>>
%0 = vector.transfer_read %collapse_shape[%c0] ... :
  memref<120xi8, strided<[1], offset: ?>>, vector<4xi8>
%1 = vector.shape_cast %0 : vector<4xi8> to vector<1x1x2x2xi8>
```

`hasMatchingInnerContigousShape` is generalised and renamed as `isContiguousSlice` to better match the updated functionality. A few test names are updated to better highlight what case is being exercised.
2023-12-04  [mlir][ArmSME] Move vector.print -> ArmSME lowering to VectorToArmSME (#74063)  (Benjamin Maxwell; 2 files, -22/+22)
This moves the SME tile vector.print lowering from `-convert-arm-sme-to-scf` to `-convert-vector-to-arm-sme`. This seems like a more logical place, as this is lowering a vector op to ArmSME, and it also prevents vector.print from blocking tile allocation.
2023-12-04  [mlir][memref] Fix an invalid dim loop motion crash (#74204)  (Rik Huijzer; 1 file, -0/+48)
Fixes https://github.com/llvm/llvm-project/issues/73382. This PR suggests replacing two assertions that were introduced in https://github.com/llvm/llvm-project/commit/adabce41185910227ca276a1cfd22e76443dd238 (https://reviews.llvm.org/D135748). According to the enum definition of `NotSpeculatable`, an op that invokes undefined behavior is `NotSpeculatable`:
https://github.com/llvm/llvm-project/blob/0c06e8745f131d867c566f4d35a7a04e24b4a075/mlir/include/mlir/Interfaces/SideEffectInterfaces.h#L248-L258

Both `tensor.dim` and `memref.dim` state that "If the dimension index is out of bounds, the behavior is undefined." Therefore, it seems that `DimOp::getSpeculatability()` should return `NotSpeculatable` if the dimension index is out of bounds. The added test is just a simplified version of https://github.com/llvm/llvm-project/issues/73382.
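A minimal illustration (hypothetical IR, simplified well beyond the linked issue) of why such a `dim` must not be speculated:

```mlir
func.func @no_hoist(%m: memref<?xf32>, %idx: index,
                    %lb: index, %ub: index, %step: index) {
  scf.for %i = %lb to %ub step %step {
    // If %idx is out of bounds the behavior is undefined, so this op
    // must not be hoisted out of the (possibly zero-trip) loop.
    %d = memref.dim %m, %idx : memref<?xf32>
  }
  return
}
```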
2023-12-04  [mlir][llvm] Fix verifier for const float (#74247)  (Rik Huijzer; 2 files, -1/+17)
Fixes one of the cases of https://github.com/llvm/llvm-project/issues/56962. This PR basically moves some code from `mlir::LLVM::detail::getLLVMConstant` ([source](https://github.com/llvm/llvm-project/blob/9f78edbd20ed922cced9482f7791deb9899a6d82/mlir/lib/Target/LLVMIR/ModuleTranslation.cpp#L354-L371)) over to the verifier of `LLVM::ConstantOp`. For now, I focused just on the case where the attribute is a float and ignored the integer case of https://github.com/llvm/llvm-project/issues/56962. Note that without this patch, both added tests will crash inside `getLLVMConstant` during `mlir-translate -mlir-to-llvmir`.
2023-12-02  [mlir][tosa] Improve lowering of tosa.conv2d (#74143)  (Spenser Bauman; 1 file, -66/+59)
The existing lowering of tosa.conv2d emits a separate linalg.generic operator to add the bias after computing the convolution. This change eliminates that additional step by using the generated linalg.conv_2d_* operator, with the bias value as the initial value of the linalg.conv_2d output. Rather than:

```mlir
%init = tensor.empty()
%conv = linalg.conv_2d ins(%A, %B) outs(%init)

%init2 = tensor.empty()
%bias = linalg.generic ins(%conv, %bias) outs(%init2) {
  // perform add operation
}
```

the lowering now produces:

```mlir
%init = tensor.empty()
%bias_expanded = linalg.broadcast ins(%bias) outs(%init)
%conv = linalg.conv_2d ins(%A, %B) outs(%bias_expanded)
```

This is the same strategy as https://github.com/llvm/llvm-project/pull/73049, applied to convolutions.
2023-12-01  [MLIR][SCF] Handle more cases in pipelining transform (#74007)  (Thomas Raoux; 1 file, -1/+53)
- Fix the case where an op is scheduled in stage 0 and used with a distance of 1.
- Fix the case where we don't peel the epilogue and a value not part of the last stage is used outside the loop.
2023-12-01  [mlir][Linalg] Fix foldFillPackIntoFillOp to work for general cases (#74148)  (Vivian; 1 file, -0/+19)
2023-12-01  [mlir][tensor] Fix ReifyResultShapes implementation for tensor.concat (#74157)  (Quinn Dawkins; 1 file, -22/+26)
Without folding the result of the initial tensor.dim, the ReifyResultShapes implementation would be incorrect because it would return a dynamic shape for a static result shape.
2023-12-01  [mlir][sparse] fix crash when generate rotated convolution kernels. (#74146)  (Peiming Liu; 1 file, -1/+36)
2023-12-01  [mlir][linalg] Fix weight dimension ordering in 2D grouped conv (#73855)  (Felix Schneider; 1 file, -0/+32)
The `conv_2d_ngchw_fgchw` Op implements 2d grouped convolution with dimensions ordered as given in the name. However, the current implementation orders weights as `gfchw` instead of `fgchw`. This was already pointed out in an old phabricator revision which never landed: https://reviews.llvm.org/D150064

This patch
1) Adds a new op `conv_2d_ngchw_gfchw`
2) Fixes the dimension ordering of the old op `conv_2d_ngchw_fgchw`
3) Adds tests with non-dynamic dimensions so that it's easier to understand.
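For illustration, a use of the new op might look like this (a sketch with hypothetical shapes: N=1, G=2, F=4 filters per group, C=3 channels per group, 3x3 kernel):

```mlir
// Input is ngchw, filter is gfchw, output is ngfhw (10 - 3 + 1 = 8).
%0 = linalg.conv_2d_ngchw_gfchw
    {dilations = dense<1> : tensor<2xi64>, strides = dense<1> : tensor<2xi64>}
    ins(%input, %filter : tensor<1x2x3x10x10xf32>, tensor<2x4x3x3x3xf32>)
    outs(%init : tensor<1x2x4x8x8xf32>) -> tensor<1x2x4x8x8xf32>
```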
2023-12-01  [mlir][tensor] Add a tensor.concat operation (#72779)  (Quinn Dawkins; 4 files, -0/+134)
This adds an operation for concatenating ranked tensors along a static dimension, as well as a decomposition mirroring the existing lowering from TOSA to Tensor. This offers a convergence point for "input" like dialects that include various lowerings for concatenation operations, easing later analysis. In the future, this op can implement the necessary interfaces for tiling, as well as potentially add conversions to some kind of linalg and/or memref counterpart. This patch adds the op, the decomposition, and some basic folding/canonicalization. Replacing lowerings with the op (such as the TOSA lowering) will come as a follow up. See https://discourse.llvm.org/t/rfc-tensor-add-a-tensor-concatenate-operation/74858
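A small example of the new op (shapes illustrative):

```mlir
// Concatenate along static dim 1: 4 + 5 = 9.
%concat = tensor.concat dim(1) %a, %b
    : (tensor<3x4xf32>, tensor<3x5xf32>) -> tensor<3x9xf32>
```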
2023-12-01  [mlir][tensor] Fold padding_value away for pack ops when possible. (#74005)  (Han-Chung Wang; 1 file, -0/+65)
If we can infer statically that there are no incomplete tiles, we can remove the optional padding operand. Fixes https://github.com/openxla/iree/issues/15417
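For example (a sketch with illustrative shapes), a pack whose inner tiles evenly divide the source has no incomplete tiles, so its `padding_value` can be folded away:

```mlir
// 16 % 8 == 0 and 4 % 2 == 0, so no padding is ever read and the
// padding_value operand is removable.
%pack = tensor.pack %src padding_value(%pad : f32)
    inner_dims_pos = [0, 1] inner_tiles = [8, 2] into %dest
    : tensor<16x4xf32> -> tensor<2x2x8x2xf32>
```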
2023-12-01  [mlir][sve][nfc] Merge the integration tests for linalg.matmul (#74059)  (Andrzej Warzyński; 2 files, -96/+69)
At the moment the logic to tile and vectorize `linalg.matmul` is duplicated in multiple test files:
* matmul.mlir
* matmul_mixed_ty.mlir

Instead, this patch uses `transform.foreach` to apply the same sequence to multiple functions within the same test file (e.g. `matmul_f32` and `matmul_mixed_ty` as defined in the original files), as sketched below. This allows us to merge the relevant test files.
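A rough sketch of the approach (interpreter boilerplate elided; the exact syntax varies across transform-dialect versions, so treat this as illustrative only):

```mlir
%funcs = transform.structured.match ops{["func.func"]} in %module
    : (!transform.any_op) -> !transform.any_op
transform.foreach %funcs : !transform.any_op {
^bb0(%func: !transform.any_op):
  // The shared tile-and-vectorize sequence goes here, applied per function.
  transform.yield
}
```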
2023-12-01  [mlir][tosa] Fix lowering of tosa.conv2d (#73240)  (Spenser Bauman; 1 file, -0/+23)
The lowering of tosa.conv2d produces an illegal tensor.empty operation where the number of inputs does not match the number of dynamic dimensions in the output type. The fix is to base the generation of tensor.dim operations off the result type of the conv2d operation, rather than the input type. The problem and fix are very similar to https://github.com/llvm/llvm-project/pull/72724, but for convolution.
2023-12-01  [mlir][tosa] Improve lowering to tosa.fully_connected (#73049)  (Spenser Bauman; 2 files, -43/+77)
The current lowering of tosa.fully_connected produces a linalg.matmul followed by a linalg.generic to add the bias. The IR looks like the following:

```mlir
%init = tensor.empty()
%zero = linalg.fill ins(0 : f32) outs(%init)
%prod = linalg.matmul ins(%A, %B) outs(%zero)

// Add the bias
%initB = tensor.empty()
%result = linalg.generic ins(%prod, %bias) outs(%initB) {
  // add bias and product
}
```

This has two downsides:
1. The tensor.empty operations typically result in additional allocations after bufferization.
2. There is a redundant traversal of the data to add the bias to the matrix product.

This extra work can be avoided by leveraging the out-param of linalg.matmul. The new IR sequence is:

```mlir
%init = tensor.empty()
%broadcast = linalg.broadcast ins(%bias) outs(%init)
%prod = linalg.matmul ins(%A, %B) outs(%broadcast)
```

In my experiments, this eliminates one loop and one allocation (post bufferization) from the generated code.
2023-12-01  [mlir][tosa] Improve tosa-infer-shapes for ops consumed by non-TOSA operators (#72715)  (Spenser Bauman; 1 file, -0/+12)
TOSA operators consumed by non-TOSA ops generally do not have their types inferred, as that would alter the types expected by their consumers. This prevents type refinement on many TOSA operators when the IR contains a mix of dialects. This change modifies tosa-infer-shapes to update the types of all TOSA operators during inference. When a consumer of that TOSA op is not safe to update, a tensor.cast is inserted back to the original type. This behavior is similar to how TOSA ops consumed by func.return are handled. This allows for more type refinement of TOSA ops, and the additional tensor.cast operators may be removed by later canonicalizations.
2023-12-01  [mlir][vector] Fix unit dim dropping pattern for masked writes (#74038)  (Quinn Dawkins; 1 file, -0/+44)
This does the same as #72142 for vector.transfer_write. Previously the pattern would silently drop the mask.
2023-12-01  [mlir][Vector] Add fold transpose(shape_cast) -> shape_cast (#73951)  (Benjamin Maxwell; 1 file, -0/+12)
This folds transpose(shape_cast) into a new shape_cast, when the transpose just permutes a unit dim from the result of the shape_cast. Example:

```mlir
%0 = vector.shape_cast %vec : vector<[4]xf32> to vector<[4]x1xf32>
%1 = vector.transpose %0, [1, 0] : vector<[4]x1xf32> to vector<1x[4]xf32>
```

Folds to:

```mlir
%0 = vector.shape_cast %vec : vector<[4]xf32> to vector<1x[4]xf32>
```

This is an (alternate) fix for lowering matmuls to ArmSME.
2023-12-01  [mlir][ArmSME] Add option to only enable streaming mode/ZA if required (#73931)  (Benjamin Maxwell; 1 file, -0/+16)
This adds an `only-if-required-by-ops` flag to the `enable-arm-streaming` pass. This flag defaults to `false` (which preserves the original behaviour); however, if set to `true`, the pass will only add the selected ZA/streaming mode to functions that contain ops that implement `ArmSMETileOpInterface`. This simplifies enabling these modes, as we can now first try lowering ops to ArmSME, and then, only if we succeed, add the relevant function attributes.
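Usage might look like this (a sketch; the exact pass-option syntax is assumed):

```
mlir-opt --enable-arm-streaming="only-if-required-by-ops=true" input.mlir
```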
2023-12-01  [mlir][sve][nfc] Update a test to use transform-interpreter (#73771)  (Andrzej Warzyński; 1 file, -28/+42)
This is a follow-up of #70040 in which the test updated here was missed. Includes a few additional NFC changes in preparation for extending this test.
2023-12-01  [mlir][nvvm] Introduce `cp.async.bulk.commit.group`  (Guray Ozen; 1 file, -0/+8)
This PR introduces the `cp.async.bulk.commit.group` op.
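The op takes no operands, mirroring the PTX instruction it models; a use is simply:

```mlir
nvvm.cp.async.bulk.commit.group
```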
2023-12-01  [mlir] notify insertion of parent op first when cloning (#73806)  (jeanPerier; 2 files, -4/+27)
When cloning an operation with a region, the builder previously notified about the insertion of the cloned operations inside the region before the cloned operation itself. When using cloning inside a rewrite pass, this could cause issues if a pattern is expected to be applied on a cloned parent operation before trying to apply patterns on the cloned operations it contains (the patterns are attempted in order of notification for the cloned operations).
2023-12-01  [MLIR][Vector] Refactor tests for contract -> OP transforms (4/N) (#73807)  (Andrzej Warzyński; 1 file, -0/+46)
This patch refactors tests for the vector.contract -> vector.outerproduct transform for matvec operations (b += Ax). Summary of changes:
* add 2 missing cases (masked + scalable) when the operation kind is `maxf`.

This is part of a larger effort to add cases with scalable vectors to tests for the Vector dialect. Implements #72834.
2023-11-30  [mlir][sparse] refactoring: using util functions to query the index to load from position array for slice-driven loop. (#73986)  (Peiming Liu; 1 file, -203/+210)
2023-12-01  [mlir] Expose type and attribute names in the MLIRContext and abstract type/attr classes (#72189)  (Fehr Mathieu; 5 files, -2/+14)
This patch exposes the type and attribute names in C++ as methods in the `AbstractType` and `AbstractAttribute` classes, and keeps a map of names to `AbstractType` and `AbstractAttribute` in the `MLIRContext`. Type and attribute names should be unique. It adds support in ODS to generate the `getName` methods in `AbstractType` and `AbstractAttribute`, through the use of two new variables, `typeName` and `attrName`. It also adds names to C++-defined types and attributes.
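A sketch of how the new accessors might be queried from C++ (a minimal example under the assumption that `getName` is exposed as described):

```c++
#include "mlir/IR/Attributes.h"
#include "mlir/IR/Types.h"
#include "llvm/ADT/StringRef.h"

// Query the registered name of a type via its AbstractType.
llvm::StringRef typeNameOf(mlir::Type type) {
  return type.getAbstractType().getName();
}

// Likewise for attributes via AbstractAttribute.
llvm::StringRef attrNameOf(mlir::Attribute attr) {
  return attr.getAbstractAttribute().getName();
}
```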
2023-11-30  [mlir][vector] Add support for vector.maskedstore sub-type emulation. (#73871)  (Han-Chung Wang; 1 file, -0/+72)
The idea is similar to the vector.maskedload + vector.store emulation. What the emulation does is:
1. Get a compressed mask and load the data from the destination.
2. Bitcast the data to the original vector type.
3. Select values between `op.valueToStore` and the data from the load, using the original mask.
4. Bitcast the new value and store it to the destination using the compressed mask.
2023-11-30  [mlir][Vector] Add a rewrite pattern for gather over a strided memref (#72991)  (Andrzej Warzyński; 1 file, -0/+55)
This patch adds a rewrite pattern for `vector.gather` over a strided memref like the following:

```mlir
%subview = memref.subview %arg0[0, 0] [100, 1] [1, 1] :
  memref<100x3xf32> to memref<100xf32, strided<[3]>>
%gather = vector.gather %subview[%c0] [%idxs], %cst_0, %cst :
  memref<100xf32, strided<[3]>>, vector<4xindex>, vector<4xi1>, vector<4xf32>
  into vector<4xf32>
```

After the pattern added in this patch:

```mlir
%collapse_shape = memref.collapse_shape %arg0 [[0, 1]] :
  memref<100x3xf32> into memref<300xf32>
%1 = arith.muli %arg3, %cst : vector<4xindex>
%gather = vector.gather %collapse_shape[%c0] [%1], %cst_1, %cst_0 :
  memref<300xf32>, vector<4xindex>, vector<4xi1>, vector<4xf32>
  into vector<4xf32>
```

Fixes https://github.com/openxla/iree/issues/15364.
2023-11-30  [mlir][vector] Fix a `target-rank=0` unrolling (#73365)  (Rik Huijzer; 1 file, -0/+18)
Fixes https://github.com/llvm/llvm-project/issues/64269. With this patch, calling `mlir-opt "-convert-vector-to-scf=full-unroll target-rank=0"` on

```mlir
func.func @main(%vec : vector<2xi32>) {
  %alloc = memref.alloc() : memref<4xi32>
  %c0 = arith.constant 0 : index
  vector.transfer_write %vec, %alloc[%c0] : vector<2xi32>, memref<4xi32>
  return
}
```

will result in

```mlir
module {
  func.func @main(%arg0: vector<2xi32>) {
    %c0 = arith.constant 0 : index
    %c1 = arith.constant 1 : index
    %alloc = memref.alloc() : memref<4xi32>
    %0 = vector.extract %arg0[0] : i32 from vector<2xi32>
    %1 = vector.broadcast %0 : i32 to vector<i32>
    vector.transfer_write %1, %alloc[%c0] : vector<i32>, memref<4xi32>
    %2 = vector.extract %arg0[1] : i32 from vector<2xi32>
    %3 = vector.broadcast %2 : i32 to vector<i32>
    vector.transfer_write %3, %alloc[%c1] : vector<i32>, memref<4xi32>
    return
  }
}
```

I've also tried to proactively find other `target-rank=0` bugs, but couldn't find any. `options.targetRank` is only used 8 times throughout the `mlir` folder, all inside `VectorToSCF.cpp`. None of the other uses look like they could cause a crash. I've also tried

```mlir
func.func @main(%vec : vector<2xi32>) -> vector<2xi32> {
  %alloc = memref.alloc() : memref<4xindex>
  %c0 = arith.constant 0 : index
  %out = vector.transfer_read %alloc[%c0], %c0 : memref<4xindex>, vector<2xi32>
  return %out : vector<2xi32>
}
```

with `"--convert-vector-to-scf=full-unroll target-rank=0"` and that also didn't crash. (Maybe obvious. I have to admit that I'm not very familiar with these ops.)
2023-11-30  [mlir][ArmSME] Switch to an attribute-based tile allocation scheme (#73253)  (Benjamin Maxwell; 27 files, -1100/+858)
This reworks the ArmSME dialect to use attributes for tile allocation. This has a number of advantages and corrects some issues with the previous approach:

* Tile allocation can now be done ASAP (i.e. immediately after `-convert-vector-to-arm-sme`)
* SSA form for control flow is now supported (e.g. `scf.for` loops that yield tiles)
* ArmSME ops can be converted to intrinsics very late (i.e. after lowering to control flow)
* Tests are simplified by removing constants and casts
* Avoids correctness issues with representing LLVM `immargs` as MLIR values
  - The tile ID on the SME intrinsics is an `immarg` (so is required to be a compile-time constant), and `immargs` should be mapped to MLIR attributes (this is already the case for intrinsics in the LLVM dialect)
  - Using MLIR values for `immargs` can lead to invalid LLVM IR being generated (and passes such as -cse making incorrect optimizations)

As part of this patch we bid farewell to the following operations:

```mlir
arm_sme.get_tile_id : i32
arm_sme.cast_tile_to_vector : i32 to vector<[4]x[4]xi32>
arm_sme.cast_vector_to_tile : vector<[4]x[4]xi32> to i32
```

These are now replaced with:

```mlir
// Allocates a new tile with (indeterminate) state:
arm_sme.get_tile : vector<[4]x[4]xi32>
// A placeholder operation for lowering ArmSME ops to intrinsics:
arm_sme.materialize_ssa_tile : vector<[4]x[4]xi32>
```

The new tile allocation works by operations implementing the `ArmSMETileOpInterface`. This interface says that an operation needs to be assigned a tile ID, and may conditionally allocate a new SME tile. Operations allocate a new tile by implementing...

```c++
std::optional<arm_sme::ArmSMETileType> getAllocatedTileType()
```

...and returning what type of tile the op allocates (ZAB, ZAH, etc). Operations that don't allocate a tile return `std::nullopt` (which is the default behaviour). Currently the following ops are defined as allocating:

```mlir
arm_sme.get_tile
arm_sme.zero
arm_sme.tile_load
arm_sme.outerproduct // (if no accumulator is specified)
```

Allocating operations become the roots for the tile allocation pass, which currently just (naively) assigns all transitive uses of a root operation the same tile ID. However, this is enough to handle current use cases. Once tile IDs have been allocated, subsequent rewrites can forward the tile IDs to any newly created operations.
2023-11-30  [mlir][docgen] Add ops source link (#73657)  (Rik Huijzer; 1 file, -2/+2)
This patch suggests two changes. Firstly, it adds a source link above the generated operations docs (above the `emitOpDoc` calls). This link points directly to the source TableGen file for the group of operations. For example, for the current [`amdgpu`](https://mlir.llvm.org/docs/Dialects/AMDGPU/) page, the link will be added below the "Operation definition" heading, pointing to [`mlir/include/mlir/Dialect/AMDGPU/IR/AMDGPU.td`](https://github.com/llvm/llvm-project/blob/main/mlir/include/mlir/Dialect/AMDGPU/IR/AMDGPU.td). The link is wrapped in an "op-definitions-source-link" class which could allow for custom styling, but it also looks reasonable without custom styling, I think:

![image](https://github.com/llvm/llvm-project/assets/20724914/7c0e59b9-b14b-4f5d-a671-c87e857a7b03)

Secondly, this patch simplifies header names such as "Operation definition" and "Attribute definition" to "Operations" and "Attributes" respectively. This is in line with manually defined subheadings on pages such as the one for the [`vector`](https://mlir.llvm.org/docs/Dialects/Vector/#operations) dialect.
2023-11-30  [mlir] Fix two `CHECK:` typos (#73803)  (Rik Huijzer; 2 files, -2/+2)
Out of curiosity, I ran [typos](https://github.com/crate-ci/typos) against MLIR. It found two `CHECK:` typos (and many minor typos, which I'm not going to work on today).
2023-11-29  [mlir][sve] Add an e2e for linalg.matmul with mixed types (#73773)  (Andrzej Warzyński; 1 file, -0/+83)
Apart from the test itself, this patch also updates a few patterns to fix how new VectorType(s) are created; namely, it makes sure that "scalability" is correctly propagated. Regression tests will be updated separately while auditing Vector dialect tests in the context of scalable vectors:
* https://github.com/orgs/llvm/projects/23
2023-11-29  [mlir][spirv] Add canon patterns for IAddCarry/[S|U]MulExtended (#73340)  (Finn Plummer; 1 file, -0/+182)
Add missing constant propagation folders for IAddCarry and [S|U]MulExtended. Because spirv.struct currently has no constant value, the folding is done using canonicalization patterns. Implement additional folding when the rhs is 0 for all ops, and when the rhs is 1 for UMulExt. This helps the readability of code lowered into SPIR-V. Part of work for #70704.
2023-11-29  Reapply "[mlir] Add mlirTranslateModuleToLLVMIR to MLIR-C (#73627)" (#73749) (#73751)  (Vitaly Buka; 4 files, -4/+83)
Co-authored-by: Edgar <git@edgarluque.com>
2023-11-29  [mlir][spirv] Add folding for [S|U]Mod, [S|U]Div, SRem (#73341)  (Finn Plummer; 1 file, -0/+297)
Add missing constant propagation folders for [S|U]Mod, [S|U]Div, and SRem. Implement additional folding when the rhs is 1 for all ops. This helps the readability of code lowered into SPIR-V. Part of work for #70704.
2023-11-29  [mlir][nvvm] Introduce `setmaxregister.sync.aligned` Op (#73780)  (Guray Ozen; 2 files, -0/+25)
This PR introduces the `setmaxregister.sync.aligned` op to increase or decrease the number of registers owned by a warp. See https://docs.nvidia.com/cuda/parallel-thread-execution/index.html#miscellaneous-instructions-setmaxnreg
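A sketch of the op (spelling assumed from the description; per the PTX spec, the register count must be a compile-time constant in [24, 256] that is a multiple of 8):

```mlir
nvvm.setmaxregister increase 232
nvvm.setmaxregister decrease 40
```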