path: root/mlir
2024-03-26  [mlir][doc] NFC fix incorrect filename in td match interface  (Alex Zinenko, 1 file, -1/+1)
2024-03-26  [ArithToSPIRV] Fix a warning (#86702)  (Kazu Hirata, 1 file, -1/+0)
mlir/lib/Conversion/ArithToSPIRV/ArithToSPIRV.cpp:995:11: error: unused variable 'converter' [-Werror,-Wunused-variable]
2024-03-26  [TosaToTensor] Fix a warning (#86703)  (Kazu Hirata, 1 file, -1/+1)
This patch fixes: mlir/lib/Conversion/TosaToTensor/TosaToTensor.cpp:76:46: error: 'multiplies' may not intend to support class template argument deduction [-Werror,-Wctad-maybe-unsupported]
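For context, a minimal sketch of the kind of change that silences this warning (the helper below is hypothetical, not the actual TosaToTensor code):
```c++
#include <cstdint>
#include <functional>
#include <numeric>
#include <vector>

// Computes the number of elements in a shape. Spelling out the template
// argument on std::multiplies avoids class template argument deduction
// (CTAD), which -Wctad-maybe-unsupported flags for types that do not
// declare deduction guides.
int64_t numElements(const std::vector<int64_t> &shape) {
  return std::accumulate(shape.begin(), shape.end(), int64_t{1},
                         std::multiplies<int64_t>());
}
```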
2024-03-26  [mlir][spirv] Remove `enableFastMathMode` flag from SPIR-V conversion (#86578)  (Ivan Butygin, 4 files, -20/+10)
Most arith/math ops support the fastmath attribute; use it instead of a global flag.
2024-03-26  [mlir][vector] Refactor linearize.mlir (#86648)  (Andrzej Warzyński, 1 file, -67/+61)
This patch refactors the `linearize.mlir` test - currently it contains some duplication and can be tricky to follow. Summary of changes:
* reduce duplication by introducing a shared check prefix (`ALL`) and by introducing `-check-prefixes`,
* make sure that every "check" line is directly above the corresponding line of input MLIR,
* group check lines corresponding to a particular prefix together (so that it's easier to see the expected output for a particular prefix),
* remove `CHECK` from prefix names (with multiple prefixes that's just noise that can be avoided) and use a bit more descriptive prefixes instead (`CHECK0` -> `BW-0`, where `BW` stands for bitwidth),
* unify indentation,
* `nonvec_result` -> `test_tensor_no_linearize` (for consistency with `test_index_no_linearize`).
NOTE: This change only updates the format of the "CHECK" lines and doesn't affect what's being tested. It is intended as preparation for adding support for scalable vectors to `LinearizeConstant` and `LinearizeVectorizable`, i.e. the patterns that `linearize.mlir` is meant to test.
2024-03-26  Fixes in 'tosa.reshape' lowering and folder (#85798)  (Rafael Ubal, 5 files, -195/+495)
- Revamped the lowering conversion pattern for `tosa.reshape` to handle previously unsupported combinations of dynamic dimensions in input and output tensors. The lowering strategy continues to rely on `tensor.collapse_shape` + `tensor.expand_shape` pairs, which allow for downstream fusion with surrounding `linalg.generic` ops.
- Fixed a bug in the canonicalization pattern `ReshapeOp::fold()` in `TosaCanonicalizations.cpp`. The input and result types being equal is not a sufficient condition for folding. If there is more than 1 dynamic dimension in the input and result types, a productive reshape could still occur.
- This work exposed the fact that bufferization does not properly handle a `tensor.collapse_shape` op producing a 0D tensor from a dynamically shaped one, due to a limitation in `memref.collapse_shape`. While the proper way to address this would involve lifting the `memref.collapse_shape` restriction and verifying correct bufferization, this is left as possible future work. For now, this scenario is avoided by casting the `tosa.reshape` input tensor to a static shape if necessary (see `inferReshapeInputType()`).
- An extended set of tests is intended to cover relevant conversion paths. Tests are named using the pattern `test_reshape_<rank>_{up|down|same}_{s2s|s2d|d2s|d2d}_{explicit|auto}[_empty][_identity]`, where:
  - `<rank>` is the input rank (e.g., 3d, 6d)
  - `{up|down|same}` indicates whether the reshape increases, decreases, or retains the input rank.
  - `{s2s|s2d|d2s|d2d}` indicates whether the reshape converts a statically shaped input to a statically shaped result (`s2s`), a statically shaped input to a dynamically shaped result (`s2d`), etc.
  - `{explicit|auto}` indicates whether all values in the `new_shape` attribute are >= 0 (`explicit`) or a -1 placeholder value is used (`auto`).
  - `empty` indicates that `new_shape` includes a component set to 0.
  - `identity` is used when the input and result shapes are the same.
2024-03-26  [MLIR][LLVM] Add `llvm.experimental.constrained.fptrunc` operation (#86260)  (Victor Perez, 12 files, -0/+328)
Add an operation mapping to the LLVM `llvm.experimental.constrained.fptrunc.*` intrinsic. The new operation implements the new `LLVM::FPExceptionBehaviorOpInterface` and `LLVM::RoundingModeOpInterface` interfaces.
Signed-off-by: Victor Perez <victor.perez@codeplay.com>
2024-03-26  [mlir][bazel] Export more headers by a single target only. (#86637)  (Christian Sigg, 1 file, -1/+0)
Ideally, header files should be used by only one target, but this is hard because CMake is less strict with headers (no layering check). But even with bazel, headers should only be exported once in the `hdrs` attribute. Other targets may use them in the `srcs` attribute to avoid circular dependencies.
2024-03-26  [mlir][complex] Canonicalize complex.div by one (#85513)  (Kai Sasaki, 3 files, -0/+84)
We can canonicalize a `complex.div` whose divisor is one (real = 1.0, imag = 0.0) to the dividend itself. Ref: https://www.cuemath.com/numbers/division-of-complex-numbers/
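For intuition, a quick check of the underlying identity in plain C++ (not the pattern code itself; the canonicalization rewrites the IR directly):
```c++
#include <cassert>
#include <complex>

int main() {
  std::complex<float> z(3.0f, -4.0f);
  std::complex<float> one(1.0f, 0.0f); // divisor with real = 1.0, imag = 0.0
  // Division by (1.0 + 0.0i) is the identity, so the pattern can replace
  // the complex.div result with the dividend itself.
  assert(z / one == z);
  return 0;
}
```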
2024-03-25  [mlir][spirv] Add folding for [S|U]GreaterThan[Equal] (#85434)  (Finn Plummer, 3 files, -0/+268)
Add the missing constant propagation folder for [S|U]GreaterThan[Equal]. Implement additional folding when the operands are equal, for all ops. Allows for constant folding in the IndexToSPIRV pass. Part of work on #70704.
2024-03-25  Add missing declarations of explicit template instantiations. (#86591)  (Thomas Köppe, 1 file, -0/+3)
Found with `-Wundefined-func-template`.
2024-03-25  [documentation] [mlir] DataLayout.md: fix broken link to DLTI dialect (#86524)  (Iman Hosseini, 1 file, -1/+1)
The link to DLTI dialect was broken.
2024-03-25  [ODS][NFC] Cast range.size() to int32_t in accumulation (#85629)  (Andrei Golubev, 1 file, -1/+1)
Using range.size() "as is" means we accumulate 'size_t' values into an 'int32_t' variable. This may produce narrowing-conversion warnings (particularly on MSVC). The surrounding code casts <x>.size() to 'int32_t', so following this practice seems safe enough.
Co-authored-by: Ovidiu Pintican <ovidiu.pintican@intel.com>
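An illustration of the pattern being followed (a standalone sketch with hypothetical names, not the actual ODS generator code):
```c++
#include <cstdint>
#include <vector>

// Hypothetical stand-in for the generated accumulation: range.size()
// returns size_t, so accumulating it into an int32_t without a cast
// triggers narrowing-conversion warnings on MSVC.
int32_t totalSize(const std::vector<std::vector<int>> &ranges) {
  int32_t total = 0;
  for (const auto &range : ranges)
    total += static_cast<int32_t>(range.size());
  return total;
}
```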
2024-03-25  [mlir][Vector] Fix an assertion on failing cast in vector-transfer-flatten-patterns (#86030)  (Balaji V. Iyer, 2 files, -2/+15)
When the result is not a VectorType, there is an assert. This patch adds a check and bails out when the result is not a VectorType.
2024-03-25  Fix the condition for peeling the first iteration (#86350)  (Vivian, 2 files, -5/+44)
This PR fixes the condition used in loop peeling of the first iteration. It uses ceilDiv instead of floorDiv when calculating the loop count, so that the first iteration gets peeled as needed.
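To see why ceilDiv is the right choice, consider a loop with lb = 0, ub = 10, step = 4, which executes at 0, 4, and 8 (a self-contained arithmetic sketch, not the pass code):
```c++
#include <cassert>
#include <cstdint>

int64_t floorDiv(int64_t a, int64_t b) { return a / b; } // for a, b > 0
int64_t ceilDiv(int64_t a, int64_t b) { return (a + b - 1) / b; }

int main() {
  int64_t lb = 0, ub = 10, step = 4;
  // Iterations execute at lb, lb+step, ... -> 0, 4, 8: three iterations.
  assert(floorDiv(ub - lb, step) == 2); // under-counts the trip count
  assert(ceilDiv(ub - lb, step) == 3);  // matches the actual trip count
  return 0;
}
```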
2024-03-25  [mlir][SVE] Add e2e for 1D depthwise WC convolution  (Andrzej Warzynski, 1 file, -0/+60)
Follow-up for #81625. Relands #85225 with a minor update to the RUN line to fix buildbot failures:
```diff
-// RUN: %{compile} | %{run} | FileCheck %s
+// RUN: rm -f %t && %{compile} && %{run} | FileCheck %s
```
Failing buildbots after landing #85225:
* https://lab.llvm.org/buildbot/#/builders/184/builds/11363
* https://lab.llvm.org/buildbot/#/builders/176/builds/9331
2024-03-25  [mlir][SVE] Fix memory leaks in integration tests (#86488)  (Andrzej Warzyński, 2 files, -2/+2)
Buffers are no longer deallocated by One-Shot Bufferize - this is now done by a separate buffer deallocation pass. In order to see the leaks in the SVE integration tests, use the following CMake flags (these enable the address sanitizer and the SVE integration tests):
-DLLVM_USE_SANITIZER="Address" -DMLIR_INCLUDE_INTEGRATION_TESTS=On -DMLIR_RUN_ARM_SVE_TESTS=On
Follow-up for #85366.
2024-03-25  [mlir][tensor] fix out-of-bound index in tensor.dim (#85901)  (Jianbang Yang, 2 files, -0/+26)
Fix a crash when folding `tensor.dim` with an out-of-bound index.
Fixes: https://github.com/llvm/llvm-project/issues/70183
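A hedged stand-in for the guard described (names and structure are illustrative, not the actual fold in the tensor dialect):
```c++
#include <cstdint>
#include <optional>
#include <vector>

// Hypothetical stand-in for the fold: return the folded dimension size
// only when the index is a valid dimension of the shape. The fix is the
// bounds check, which replaces a blind (crashing) access.
std::optional<int64_t> foldDimOp(const std::vector<int64_t> &shape,
                                 int64_t index) {
  if (index < 0 || index >= static_cast<int64_t>(shape.size()))
    return std::nullopt; // out-of-bound: do not fold
  return shape[index];
}
```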
2024-03-25  [MLIR][LLVM] Make subprogram flags optional (#86433)  (Tobias Gysi, 5 files, -13/+12)
This revision makes the subprogramFlags field in the DISubprogramAttr optional. This is necessary since the DISubprogram attached to a declaration may have none of the subprogram flags set.
2024-03-25  [mlir][bufferization] Add `BufferOriginAnalysis` (#86461)  (Matthias Springer, 6 files, -56/+321)
This commit adds the `BufferOriginAnalysis`, which can be queried to check if two buffer SSA values originate from the same allocation. This new analysis is used in the buffer deallocation pass to fold away or simplify `bufferization.dealloc` ops more aggressively.

The `BufferOriginAnalysis` is based on the `BufferViewFlowAnalysis`, which collects buffer SSA value "same buffer" dependencies. E.g., given IR such as:
```
%0 = memref.alloc()
%1 = memref.subview %0
%2 = memref.subview %1
```
The `BufferViewFlowAnalysis` will report the following "reverse" dependencies (`resolveReverse`) for `%2`: {`%2`, `%1`, `%0`}. I.e., all buffer SSA values in the reverse use-def chain that originate from the same allocation as `%2`. The `BufferOriginAnalysis` is built on top of that. It handles only simple cases at the moment and may conservatively return "unknown" around certain IR with branches, memref globals and function arguments.

This analysis enables additional simplifications during `-buffer-deallocation-simplification`. In particular, "regular" scf.for loop nests that yield buffers (or reallocations thereof) in the same order as they appear in the iter_args are now handled much more efficiently. Such IR patterns are generated by the sparse compiler.
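A hedged sketch of how such an analysis might be queried (the header path and method signature below follow the commit description but are assumptions, not confirmed API):
```c++
#include "mlir/Dialect/Bufferization/Transforms/BufferViewFlowAnalysis.h"
#include "mlir/IR/Operation.h"
#include "mlir/IR/Value.h"

// Hedged sketch: query whether two buffer SSA values definitely stem from
// the same allocation. The analysis answers with a three-way result and
// returns std::nullopt for "unknown" (e.g. around branches, memref
// globals, or function arguments).
static bool definitelySameAllocation(mlir::Operation *root, mlir::Value a,
                                     mlir::Value b) {
  mlir::BufferOriginAnalysis analysis(root);
  std::optional<bool> same = analysis.isSameAllocation(a, b);
  return same.has_value() && *same;
}
```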
2024-03-25  [mlir][complex] Fastmath flag for the trigonometric ops in complex (#85563)  (Kai Sasaki, 2 files, -21/+75)
Support the fastmath flag when converting trigonometric ops in the complex dialect. See: https://discourse.llvm.org/t/rfc-fastmath-flags-support-in-complex-dialect/71981
2024-03-24  [mlir][spirv] Add folding for [S|U]LessThan[Equal] (#85435)  (Finn Plummer, 3 files, -0/+266)
Add the missing constant propagation folder for [S|U]LessThan[Equal]. Implement additional folding when the operands are equal, for all ops. Allows for constant folding in the IndexToSPIRV pass. Part of work on #70704.
2024-03-24  [mlir][bufferization] Add `BufferViewFlowOpInterface` (#78718)  (Matthias Springer, 14 files, -14/+323)
This commit adds the `BufferViewFlowOpInterface` to the bufferization dialect. This interface can be implemented by ops that operate on buffers to indicate that a buffer op result and/or region entry block argument may be the same buffer as a buffer operand (or a view thereof). This interface is queried by the `BufferViewFlowAnalysis`.

The new interface has two interface methods:
* `populateDependencies`: Implementations use the provided callback to declare dependencies between operands and op results/region entry block arguments. E.g., for `%r = arith.select %c, %m1, %m2 : memref<5xf32>`, the interface implementation should declare two dependencies: %m1 -> %r and %m2 -> %r.
* `mayBeTerminalBuffer`: An SSA value is a terminal buffer if the buffer view flow analysis stops at the specified value, e.g., because the value is a newly allocated buffer or because no further information is available about the origin of the buffer.

Ops that implement the `RegionBranchOpInterface` or `BranchOpInterface` do not have to implement the `BufferViewFlowOpInterface`. The buffer dependencies can be inferred from those two interfaces.

This commit makes the `BufferViewFlowAnalysis` more accurate. For unknown ops, it used to conservatively declare all combinations of operands and op results/region entry block arguments as dependencies (false positives). This is no longer the case. While the analysis is still a "maybe" analysis with false positives (e.g., when analyzing ops such as `arith.select` or `scf.if` where the taken branch is not known at compile time), results and region entry block arguments of unknown ops are now marked as terminal buffers.

This commit addresses a TODO in `BufferViewFlowAnalysis.cpp`:
```
// TODO: We should have an op interface instead of a hard-coded list of
// interfaces/ops.
```
It is no longer needed to hard-code ops.
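A hedged sketch of what declaring the `arith.select` dependencies described above might look like; the callback type and registration mechanics are assumptions based on the description, not the actual interface definition:
```c++
#include "mlir/Dialect/Arith/IR/Arith.h"

using namespace mlir;

// Hypothetical interface implementation: for %r = arith.select %c, %m1, %m2,
// declare the two dependencies %m1 -> %r and %m2 -> %r. The exact callback
// signature is an assumption.
struct SelectOpViewFlowSketch {
  static void populateDependencies(
      arith::SelectOp op,
      function_ref<void(Value source, Value target)> registerDependency) {
    // Either branch of the select may alias the result buffer.
    registerDependency(op.getTrueValue(), op.getResult());
    registerDependency(op.getFalseValue(), op.getResult());
  }
};
```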
2024-03-23  Revert "[mlir][SVE] Add e2e for 1D depthwise WC convolution (#85225)"  (Muhammad Omair Javaid, 1 file, -60/+0)
This reverts commit 01b1b0c1f728e2c2639edc654424f50830295989. Breaks the following AArch64 SVE buildbots:
* https://lab.llvm.org/buildbot/#/builders/184/builds/11363
* https://lab.llvm.org/buildbot/#/builders/176/builds/9331
2024-03-23  [mlir][inliner] Return early if the inliningThreshold is 0U or -1U. (#86287)  (Fabian Tschopp, 1 file, -1/+8)
Computing the inlining profitability can be costly due to walking the graph when counting the number of operations. This PR addresses that by returning early if the threshold is set to never or always inline.
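A minimal sketch of the short-circuit (hypothetical names and a hypothetical cost formula; the actual inliner logic differs):
```c++
#include <cstdint>

// Hypothetical profitability check: 0U means "never inline" and -1U
// means "always inline", so neither case needs the costly walk that
// counts operations in the callee and caller.
bool shouldInline(uint32_t inliningThreshold, uint32_t calleeOps,
                  uint32_t callerOps) {
  if (inliningThreshold == 0U)
    return false; // never inline
  if (inliningThreshold == static_cast<uint32_t>(-1))
    return true; // always inline
  // Hypothetical relative-growth formula, standing in for the real cost model.
  return calleeOps * 100U <= inliningThreshold * callerOps;
}
```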
2024-03-22  [mlir][transform] Emit error message with `emitSilenceableFailure` (#86146)  (srcarroll, 2 files, -6/+41)
The previous implementation inappropriately used `notifyMatchFailure` to emit the failure message and then used `emitDefaultSilenceableFailure`. This patch changes this to use the more appropriate `emitSilenceableFailure` with an error message. Additionally, a failure test has been added.
2024-03-22  [mlir][SVE] Add e2e for 1D depthwise WC convolution (#85225)  (Andrzej Warzyński, 1 file, -0/+60)
Follow-up for https://github.com/llvm/llvm-project/pull/81625
2024-03-22  [mlir][arith] fix wrong floordivsi fold (#83248)  (long.chen, 5 files, -115/+102)
Fixes https://github.com/llvm/llvm-project/issues/83079
2024-03-22  [MLIR][LLVM][SROA] Fix pointer escape through stores bug (#86291)  (Christian Ulmann, 2 files, -0/+17)
This commit resolves an SROA bug caused by not properly checking whether an LLVM store operation writes the pointer itself to memory. Now, stores that use a slot pointer as the value to store are no longer considered fixable.
2024-03-22  [mlir][emitc] Arith to EmitC: Handle addi, subi and muli (#86120)  (Matthias Gehre, 3 files, -0/+118)
It is important to consider that `arith` has wrap-around semantics, while in C++ signed overflow is UB. Unless the operation guarantees that no signed overflow happens, we will perform the arithmetic in an equivalent unsigned type. `bool` also doesn't wrap around in C++ and is not addressed here.
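The standard C++ idiom behind this (a sketch of the technique, not the generated code):
```c++
#include <cstdint>

// Add two signed 32-bit values with guaranteed wrap-around semantics,
// matching arith.addi, by routing the addition through the unsigned type:
// unsigned arithmetic wraps, whereas signed overflow is UB.
int32_t wrappingAdd(int32_t a, int32_t b) {
  uint32_t sum = static_cast<uint32_t>(a) + static_cast<uint32_t>(b);
  return static_cast<int32_t>(sum); // implementation-defined pre-C++20,
                                    // well-defined two's complement since C++20
}
```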
2024-03-22  [MLIR][OpenMP] Refactor bounds offsetting and fix to apply to all directives (#84349)  (agozillon, 3 files, -166/+244)
This PR refactors bounds offsetting by combining the two differing implementations (one applying to the initial derived type member map implementation for descriptors and the other to regular arrays - effectively allocatable arrays vs regular arrays in Fortran) now that it's a little simpler to do.

The PR also moves the use of createAlteredByCaptureMap into genMapInfoOp, where it will be correctly applied to all MapInfoData, appropriately offsetting and altering the pointer data set in the kernel argument structure on the host. This primarily means bounds offsets will now correctly apply to enter/exit/update map clauses, as opposed to just the Target directive as is currently the case. A few Fortran runtime tests have been added to verify this new behavior.

This PR depends on https://github.com/llvm/llvm-project/pull/84328 and is an extraction of the larger derived type member map PR stack (so a requirement for it to land).
2024-03-22  [mlir] Extend split marker tests of `mlir-opt` and `mlir-pdll`. (#85620)  (Ingo Müller, 2 files, -12/+32)
Recently #84765 made the split markers of various tools configurable but did not test *not* using the split markers for two of them. This PR adds those tests.
2024-03-22  [mlir] Remove unused and untested `shouldSplitInputFile`. (#85622)  (Ingo Müller, 1 file, -1/+0)
This was changed by #84765 but turned out to be buggy. Since it isn't used and isn't tested, it is probably best to remove it.
2024-03-22  [mlir][linalg] Emit a warning when tile_using_forall generates non thread-safe code (#80813)  (Pablo Antonio Martinez, 3 files, -3/+180)
**Description**

The documentation of `transform.structured.tile_using_forall` says:

_"It is the user's responsibility to ensure that num_threads/tile_sizes is a valid tiling specification (i.e. that only tiles parallel dimensions, e.g. in the Linalg case)."_

In other words, tiling a non-parallel dimension would generate code with data races, which is not safe to parallelize. For example, consider this example (included in the tests in this PR):

```
func.func @tile_thread_safety2(%arg0: tensor<100x300x8xf32>, %arg1: tensor<300x8xf32>) -> tensor<300x8xf32> {
  %0 = scf.forall (%arg2) in (8) shared_outs(%arg3 = %arg1) -> (tensor<300x8xf32>) {
    %1 = affine.min #map(%arg2)
    %2 = affine.max #map1(%1)
    %3 = affine.apply #map2(%arg2)
    %extracted_slice = tensor.extract_slice %arg0[%3, 0, 0] [%2, 300, 8] [1, 1, 1] : tensor<100x300x8xf32> to tensor<?x300x8xf32>
    %4 = linalg.generic {indexing_maps = [#map3, #map4], iterator_types = ["reduction", "parallel", "parallel"]} ins(%extracted_slice : tensor<?x300x8xf32>) outs(%arg3 : tensor<300x8xf32>) {
    ^bb0(%in: f32, %out: f32):
      %5 = arith.addf %in, %out : f32
      linalg.yield %5 : f32
    } -> tensor<300x8xf32>
    scf.forall.in_parallel {
      tensor.parallel_insert_slice %4 into %arg3[0, 0] [300, 8] [1, 1] : tensor<300x8xf32> into tensor<300x8xf32>
    }
  }
  return %0 : tensor<300x8xf32>
}
```

We can easily see that this is not safe to parallelize, because all threads would be writing to the same position in `%arg3` (in the `scf.forall.in_parallel`). This PR detects whether it's safe to `tile_using_forall` and emits a warning in the case it is not.

**Brief explanation**

It first generates a vector of affine expressions representing the tile values and stores it in `dimExprs`. These affine expressions are compared with the affine expressions coming from the results of the affine map of each output in the linalg op. So going back to the previous example, the original transform is:

```
#map = affine_map<(d0, d1, d2) -> (d0, d1, d2)>
#map1 = affine_map<(d0, d1, d2) -> (d1, d2)>

func.func @tile_thread_safety2(%arg0: tensor<100x300x8xf32>, %arg1: tensor<300x8xf32>) -> tensor<300x8xf32> {
  // expected-warning@+1 {{tiling is not thread safe at axis #0}}
  %0 = linalg.generic {indexing_maps = [#map, #map1], iterator_types = ["reduction", "parallel", "parallel"]} ins(%arg0 : tensor<100x300x8xf32>) outs(%arg1 : tensor<300x8xf32>) {
  ^bb0(%in: f32, %out: f32):
    %1 = arith.addf %in, %out : f32
    linalg.yield %1 : f32
  } -> tensor<300x8xf32>
  return %0 : tensor<300x8xf32>
}

module attributes {transform.with_named_sequence} {
  transform.named_sequence @__transform_main(%arg0: !transform.any_op {transform.readonly}) {
    %0 = transform.structured.match ops{["linalg.generic"]} in %arg0 : (!transform.any_op) -> !transform.any_op
    %forall, %tiled_generic = transform.structured.tile_using_forall %0 num_threads [8] : (!transform.any_op) -> (!transform.any_op, !transform.any_op)
    transform.yield
  }
}
```

The `num_threads` attribute would be represented as `(d0)`. Because the linalg op has only one output (`arg1`), it would only check against the results of `#map1`, which are `(d1, d2)`. The idea is to check that all affine expressions in `dimExprs` are present in the output affine map. In this example, `d0` is not in `(d1, d2)`, so tiling that axis is considered not thread safe.
2024-03-22  [mlir][vector] Propagate scalability in TransferWriteNonPermutationLowering (#85632)  (Crefeda Rodrigues, 2 files, -2/+24)
Updates `extendVectorRank` so that scalability in patterns that use it (in particular, `TransferWriteNonPermutationLowering`) is correctly propagated. Closes related previous PR https://github.com/llvm/llvm-project/pull/85270
Signed-off-by: Crefeda Rodrigues <crefeda.rodrigues@arm.com>
Co-authored-by: Benjamin Maxwell <macdue@dueutil.tech>
2024-03-22  [mlir][vector] Add support for masks in castAwayContractionLeadingOneDim (#81906)  (Andrzej Warzyński, 4 files, -56/+114)
Updates `castAwayContractionLeadingOneDim` to inherit from `MaskableOpRewritePattern` so that this pattern can support masking. Builds on top of #83827.
2024-03-22  [MLIR][LLVM][SROA] Support incorrectly typed memory accesses (#85813)  (Christian Ulmann, 8 files, -263/+204)
This commit relaxes the assumption of type consistency for LLVM dialect load and store operations in SROA. Instead, there is now a check that loads and stores are in the bounds specified by the sub-slot they access.

This commit additionally removes the corresponding patterns from the type consistency pass, as they are no longer necessary.

Note: It will be necessary to extend Mem2Reg with the logic for differently sized accesses as well. This is nonetheless a strict upgrade for productive flows, as the type consistency pass can produce invalid IR for some odd cases.
2024-03-21  [mlir][Affine] Fix unused variable warning (NFC)  (lorenzo chelini, 1 file, -2/+1)
2024-03-21  [mlir][spirv] Improve folding of MemRef to SPIRV Lowering (#85433)  (Finn Plummer, 8 files, -210/+93)
Investigate the lowering of MemRef load/store ops and implement additional folding of the created ops. Aims to improve the readability of the generated lowered SPIR-V code. Part of work on llvm#70704.
2024-03-21  [mlir][Vector] Add utility for computing scalable value bounds (#83876)  (Benjamin Maxwell, 10 files, -28/+555)
This adds a new API built with the `ValueBoundsConstraintSet` to compute the bounds of possibly scalable quantities. It uses knowledge of the range of vscale (which is defined by the target architecture) to solve for the bound as either a constant or an expression in terms of vscale.

The result is an `AffineMap` that will always take at most one parameter, vscale, and return a single result, which is the bound of `value`. The API is defined as follows:

```c++
FailureOr<ConstantOrScalableBound>
vector::ScalableValueBoundsConstraintSet::computeScalableBound(
    Value value, std::optional<int64_t> dim, unsigned vscaleMin,
    unsigned vscaleMax, presburger::BoundType boundType,
    bool closedUB = true, StopConditionFn stopCondition = nullptr);
```

Note: `ConstantOrScalableBound` is a thin wrapper over the `AffineMap` with a utility for converting the bound to a single quantity (i.e. a size and a scalable flag). We believe this API could prove useful downstream in IREE (which uses a similar analysis to hoist allocas, which currently fails for scalable vectors).
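A hedged usage sketch based on the signature quoted above (the header path and surrounding setup are assumptions):
```c++
#include "mlir/Dialect/Vector/IR/ScalableValueBoundsConstraintSet.h"

using namespace mlir;

// Hedged sketch: compute a closed upper bound for `value` that is valid
// for any vscale in [1, 16]. The result is either a constant or an
// AffineMap in terms of vscale.
static FailureOr<vector::ConstantOrScalableBound>
computeScalableUpperBound(Value value) {
  return vector::ScalableValueBoundsConstraintSet::computeScalableBound(
      value, /*dim=*/std::nullopt, /*vscaleMin=*/1, /*vscaleMax=*/16,
      presburger::BoundType::UB);
}
```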
2024-03-21  [mlir][emitc] Fix form-expressions inside expression (#86081)  (Kirill Chibisov, 2 files, -1/+19)
Make form-expressions not create `emitc.expression` ops for operations that are already inside an `emitc.expression`, since such nested expressions are invalid.
2024-03-21  [MLIR] Add initial convert-memref-to-emitc pass (#85389)  (Matthias Gehre, 10 files, -0/+307)
This converts `memref.alloca`, `memref.load` & `memref.store` to `emitc.variable`, `emitc.subscript` and `emitc.assign`.
2024-03-21  [mlir][tosa] Fix assertion failure in tosa-layerwise-constant-fold (#85670)  (Spenser Bauman, 2 files, -34/+56)
The existing implementation of tosa-layerwise-constant-fold only works for constant values backed by a DenseElementsAttr. For constants which hold DenseResourceAttrs, the folder will end up asserting at runtime, as it assumes that the backing data can always be accessed through ElementsAttr::getValues. This change reworks the logic so that the types used to perform folding are based on whether the ElementsAttr can be converted to a range of that particular type.
Co-authored-by: Spenser Bauman <sabauma@mathworks.com>
Co-authored-by: Tina Jung <tinamaria.jung@amd.com>
2024-03-21  [mlir][tensor] NFC: fully qualify verifyEncoding arguments.  (Christian Sigg, 1 file, -2/+2)
2024-03-21  Lower shuffle to single-result form if possible. (#84321)  (Johannes Reifferscheid, 2 files, -10/+49)
We currently always lower shuffle to the struct-returning variant. I saw some cases where this survived all the way through PTX, resulting in increased register usage. The easiest fix is to simply lower to the single-result version when the predicate is unused.
2024-03-21  [MLIR] Add index bitwidth to the DataLayout (#85927)  (Tobias Gysi, 16 files, -38/+193)
When importing from LLVM IR the data layout of all pointer types contains an index bitwidth that should be used for index computations. This revision adds a getter to the DataLayout that provides access to the already stored bitwidth. The function returns an optional since only pointer-like types have an index bitwidth. Querying the bitwidth of a non-pointer type returns std::nullopt. The new function works for the built-in Index type and, using a type interface, for the LLVMPointerType.
2024-03-21  [mlir][transform] Fix failure in flattening already flattened linalg ops (#86037)  (srcarroll, 2 files, -5/+31)
The previous implementation was doing an early successful return on `rank <= 1` without adding the original op to the transform results. This resulted in errors about the number of returns. This patch fixes this by adding the original op to the results. Additionally, we first check whether the op is elementwise and return a silenceable failure early if it is not.
2024-03-21  [mlir][bufferization] Add "bottom-up from terminators" analysis heuristic (#83964)  (Matthias Springer, 17 files, -64/+198)
One-Shot Bufferize currently does not support loops where a yielded value bufferizes to a buffer that is different from the buffer of the region iter_arg. In such a case, the bufferization fails with an error such as:

```
Yield operand #0 is not equivalent to the corresponding iter bbArg
scf.yield %0 : tensor<5xf32>
```

One common reason for non-equivalent buffers is that an op on the path from the region iter_arg to the terminator bufferizes out-of-place. Ops that are analyzed earlier are more likely to bufferize in-place.

This commit adds a new heuristic that gives preference to ops that are reachable on the reverse SSA use-def chain from a region terminator and are within the parent region of the terminator. This is expected to work better than the existing heuristics for loops where an iter_arg is written to multiple times within a loop, but only one write is fed into the terminator.

Current users of One-Shot Bufferize are not affected by this change. "Bottom-up" is still the default heuristic. Users can switch to the new heuristic manually. This commit also turns the "fuzzer" pass option into a heuristic, cleaning up the code a bit.
2024-03-21  [mlir][sparse] Fix typos in comments (#86074)  (Matthias Springer, 2 files, -4/+2)
2024-03-20  [MLIR][XeGPU] Fix shared build. NFC  (Michael Liao, 1 file, -0/+2)