This patch fixes:
mlir/lib/Conversion/ArithToSPIRV/ArithToSPIRV.cpp:995:11: error:
unused variable 'converter' [-Werror,-Wunused-variable]
|
|
This patch fixes:
mlir/lib/Conversion/TosaToTensor/TosaToTensor.cpp:76:46: error:
'multiplies' may not intend to support class template argument
deduction [-Werror,-Wctad-maybe-unsupported]
|
|
Most arith/math ops support the fastmath attribute; use it instead of a
global flag.
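For example (a minimal sketch; the ops and value names here are
illustrative, not taken from the patch):
```
%sum = arith.addf %a, %b fastmath<fast> : f32
%sin = math.sin %sum fastmath<afn> : f32
```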
|
|
This patch refactors the `linearize.mlir` test - currently it contains
some duplication and can be tricky to follow.
Summary of changes:
* reduce duplication by introducing a shared check prefix (`ALL`) and
by using `-check-prefixes`,
* make sure that every "check" line is directly above the
corresponding line of input MLIR,
* group check lines corresponding to a particular prefix together (so
that it's easier to see the expected output for a particular
prefix),
* remove `CHECK` from prefix names (with multiple prefixes that's just
noise that can be avoided) and use a bit more descriptive prefixes
instead (`CHECK0` -> `BW-0`, where `BW` stands for bitwidth),
* unify indentation,
* `nonvec_result` -> `test_tensor_no_linearize` (for consistency with
`test_index_no_linearize`).
NOTE: This change only updates the format of the "CHECK" lines and
doesn't affect what's being tested.
This change is intended as preparation for adding support for scalable
vectors to `LinearizeConstant` and `LinearizeVectorizable` - i.e.
patterns that `linearize.mlir` is meant to test.
|
|
- Revamped the lowering conversion pattern for `tosa.reshape` to handle previously unsupported combinations of dynamic dimensions in input and output tensors. The lowering strategy continues to rely on `tensor.collapse_shape` + `tensor.expand_shape` pairs, which allow for downstream fusion with surrounding `linalg.generic` ops (see the sketch after this list).
- Fixed a bug in the canonicalization pattern `ReshapeOp::fold()` in `TosaCanonicalizations.cpp`. The input and result types being equal is not a sufficient condition for folding: if there is more than one dynamic dimension in the input and result types, a productive reshape could still occur.
- This work exposed the fact that bufferization does not properly handle a `tensor.collapse_shape` op producing a 0D tensor from a dynamically shaped one, due to a limitation in `memref.collapse_shape`. While the proper way to address this would involve relaxing the `memref.collapse_shape` restriction and verifying correct bufferization, this is left as possible future work. For now, this scenario is avoided by casting the `tosa.reshape` input tensor to a static shape if necessary (see `inferReshapeInputType()`).
- An extended set of tests is intended to cover relevant conversion paths. Tests are named using the pattern `test_reshape_<rank>_{up|down|same}_{s2s|s2d|d2s|d2d}_{explicit|auto}[_empty][_identity]`, where:
- `<rank>` is the input rank (e.g., 3d, 6d)
- `{up|down|same}` indicates whether the reshape increases, decreases, or retains the input rank.
- `{s2s|s2d|d2s|d2d}` indicates whether reshape converts a statically shaped input to a statically shaped result (`s2s`), a statically shaped input to a dynamically shaped result (`s2d`), etc.
- `{explicit|auto}` is used to indicate that all values in the `new_shape` attribute are >=0 (`explicit`) or that a -1 placeholder value is used (`auto`).
- `empty` is used to indicate that `new_shape` includes a component set to 0.
- `identity` is used when the input and result shapes are the same.
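As a rough sketch of the lowering strategy (shapes here are illustrative
and not taken from the tests; the exact reassociation indices depend on
the shape inference described above):
```
%r = tosa.reshape %arg0 {new_shape = array<i64: 2, 12>} : (tensor<2x3x4xf32>) -> tensor<2x12xf32>
// lowers to approximately:
%c = tensor.collapse_shape %arg0 [[0, 1, 2]] : tensor<2x3x4xf32> into tensor<24xf32>
%e = tensor.expand_shape %c [[0, 1]] : tensor<24xf32> into tensor<2x12xf32>
```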
|
|
Add an operation mapping to the LLVM
`llvm.experimental.constrained.fptrunc.*` intrinsic.
The new operation implements the
`LLVM::FPExceptionBehaviorOpInterface` and
`LLVM::RoundingModeOpInterface` interfaces.
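A minimal sketch of the new op (the exact assembly, including the
rounding-mode and exception-behavior keywords, is assumed here rather
than taken from the patch):
```
%r = llvm.intr.experimental.constrained.fptrunc %arg0 towardzero ignore : f64 to f32
```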
---------
Signed-off-by: Victor Perez <victor.perez@codeplay.com>
|
|
Ideally, header files should be used by only one target, but this is
hard because CMake is less strict with headers (no layering check). But
even with Bazel, headers should only be exported once in the `hdrs`
attribute. Other targets may use them in the `srcs` attribute to avoid
circular dependencies.
|
|
We can canonicalize `complex.div` to the dividend itself when the
divisor is one (real = 1.0, imag = 0.0).
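A minimal sketch of the fold:
```
%one = complex.constant [1.0 : f32, 0.0 : f32] : complex<f32>
%div = complex.div %z, %one : complex<f32>
// canonicalizes to just %z
```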
Ref: https://www.cuemath.com/numbers/division-of-complex-numbers/
|
|
Add missing constant propagation folder for [S|U]GreaterThan[Equal].
Implement additional folding when the operands are equal for all ops.
Allows for constant folding in the IndexToSPIRV pass.
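A sketch of the equal-operand folds (operand names are illustrative):
```
%f = spirv.SGreaterThan %x, %x : i32        // folds to false
%t = spirv.SGreaterThanEqual %x, %x : i32   // folds to true
```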
Part of work #70704
|
|
Found with `-Wundefined-func-template`.
|
|
The link to DLTI dialect was broken.
|
|
Using range.size() "as is" means we accumulate 'size_t' values into an
'int32_t' variable. This may produce narrowing-conversion warnings
(particularly on MSVC). The surrounding code seems to cast <x>.size()
to 'int32_t', so following this practice seems safe enough.
Co-authored-by: Ovidiu Pintican <ovidiu.pintican@intel.com>
|
|
vector-transfer-flatten-patterns (#86030)
When the result is not a `VectorType`, an assertion is triggered. This
patch adds an explicit check and bails out in that case.
|
|
This PR fixes the condition used in loop peeling of the first iteration:
use ceilDiv instead of floorDiv when calculating the loop counts, so
that the first iteration gets peeled as needed. For example, with
lb = 0, ub = 10, step = 4, the trip count is ceilDiv(10, 4) = 3;
floorDiv(10, 4) = 2 undercounts it and the first iteration would not be
peeled correctly.
|
|
Follow-up for #81625
Relands #85225 with a minor update to the RUN line to fix buildbot
failures:
```diff
-// RUN: %{compile} | %{run} | FileCheck %s
+// RUN: rm -f %t && %{compile} && %{run} | FileCheck %s
```
Failing buildbots after landing #85225:
* https://lab.llvm.org/buildbot/#/builders/184/builds/11363
* https://lab.llvm.org/buildbot/#/builders/176/builds/9331
|
|
Buffers are no longer deallocated by One-Shot Bufferize - this is now
done by a separate buffer deallocation pass.
In order to see the leaks in SVE integration tests, use the following
CMake flags (enables the address sanitizer and SVE integration tests):
-DLLVM_USE_SANITIZER="Address"
-DMLIR_INCLUDE_INTEGRATION_TESTS=On
-DMLIR_RUN_ARM_SVE_TESTS=On
Follow-up for #85366
|
|
Fix a crash when folding tensor.dim with an out-of-bounds index.
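A minimal example of the kind of IR that used to crash the folder
(sketch):
```
// index 2 is out of bounds for the rank-2 tensor; the fold now bails out
%c2 = arith.constant 2 : index
%d = tensor.dim %t, %c2 : tensor<4x?xf32>
```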
Fixes: https://github.com/llvm/llvm-project/issues/70183
|
|
This revision makes the subprogramFlags field in the DISubprogramAttr
optional. This is necessary since the DISubprogram attached to a
declaration may have none of the subprogram flags set.
|
|
This commit adds the `BufferOriginAnalysis`, which can be queried to
check if two buffer SSA values originate from the same allocation. This
new analysis is used in the buffer deallocation pass to fold away or
simplify `bufferization.dealloc` ops more aggressively.
The `BufferOriginAnalysis` is based on the `BufferViewFlowAnalysis`,
which collects buffer SSA value "same buffer" dependencies. E.g., given
IR such as:
```
%0 = memref.alloc()
%1 = memref.subview %0
%2 = memref.subview %1
```
The `BufferViewFlowAnalysis` will report the following "reverse"
dependencies (`resolveReverse`) for `%2`: {`%2`, `%1`, `%0`}. I.e., all
buffer SSA values in the reverse use-def chain that originate from the
same allocation as `%2`. The `BufferOriginAnalysis` is built on top of
that. It handles only simple cases at the moment and may conservatively
return "unknown" around certain IR with branches, memref globals and
function arguments.
This analysis enables additional simplifications during
`-buffer-deallocation-simplification`. In particular, "regular" scf.for
loop nests, that yield buffers (or reallocations thereof) in the same
order as they appear in the iter_args, are now handled much more
efficiently. Such IR patterns are generated by the sparse compiler.
|
|
Support the fastmath flag when converting trigonometric ops in the
complex dialect.
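For example (a sketch; the attribute syntax mirrors the arith fastmath
attribute):
```
%0 = complex.sin %z fastmath<fast> : complex<f32>
```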
See:
https://discourse.llvm.org/t/rfc-fastmath-flags-support-in-complex-dialect/71981
|
|
Add missing constant propagation folder for [S|U]LessThan[Equal].
Implement additional folding when the operands are equal for all ops.
Allows for constant folding in the IndexToSPIRV pass.
Part of work #70704
|
|
This commit adds the `BufferViewFlowOpInterface` to the bufferization
dialect. This interface can be implemented by ops that operate on
buffers to indicate that a buffer op result and/or region entry block
argument may be the same buffer as a buffer operand (or a view thereof).
This interface is queried by the `BufferViewFlowAnalysis`.
The new interface has two interface methods:
* `populateDependencies`: Implementations use the provided callback to
declare dependencies between operands and op results/region entry block
arguments. E.g., for `%r = arith.select %c, %m1, %m2 : memref<5xf32>`,
the interface implementation should declare two dependencies:
`%m1` -> `%r` and `%m2` -> `%r`.
* `mayBeTerminalBuffer`: An SSA value is a terminal buffer if the buffer
view flow analysis stops at the specified value. E.g., because the value
is a newly allocated buffer or because no further information is
available about the origin of the buffer.
Ops that implement the `RegionBranchOpInterface` or `BranchOpInterface`
do not have to implement the `BufferViewFlowOpInterface`. The buffer
dependencies can be inferred from those two interfaces.
This commit makes the `BufferViewFlowAnalysis` more accurate. For
unknown ops, it used to conservatively declare all combinations of
operands and op results/region entry block arguments as dependencies
(false positives). This is no longer the case. While the analysis is
still a "maybe" analysis with false positives (e.g., when analyzing ops
such as `arith.select` or `scf.if` where the taken branch is not known
at compile time), results and region entry block arguments of unknown
ops are now marked as terminal buffers.
This commit addresses a TODO in `BufferViewFlowAnalysis.cpp`:
```
// TODO: We should have an op interface instead of a hard-coded list of
// interfaces/ops.
```
It is no longer needed to hard-code ops.
|
|
This reverts commit 01b1b0c1f728e2c2639edc654424f50830295989.
Breaks following AArch64 SVE buildbots:
https://lab.llvm.org/buildbot/#/builders/184/builds/11363
https://lab.llvm.org/buildbot/#/builders/176/builds/9331
|
|
Computing the inlining profitability can be costly due to walking the
graph when counting the number of operations.
This PR addresses that by returning early if the threshold is set to
never or always inline.
|
|
The previous implementation used `notifyMatchFailure` to emit the
failure message, which is inappropriate here, and then used
`emitDefaultSilenceableFailure`. This patch changes this to use the more
appropriate `emitSilenceableFailure` with an error message. Additionally,
a failure test has been added.
|
|
Follow-up for https://github.com/llvm/llvm-project/pull/81625
|
|
Fixes https://github.com/llvm/llvm-project/issues/83079
|
|
This commit resolves an SROA bug caused by not properly checking whether
an LLVM store operation writes the pointer itself to memory. We no
longer consider stores that use a slot pointer as the value to store
(rather than as the address) to be fixable.
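A minimal sketch of the problematic pattern (names are hypothetical):
```
%c1 = llvm.mlir.constant(1 : i64) : i64
%slot = llvm.alloca %c1 x i64 : (i64) -> !llvm.ptr
// %slot is the *stored value*, not the address: no longer treated as fixable
llvm.store %slot, %other : !llvm.ptr, !llvm.ptr
```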
|
|
It is important to consider that `arith` has wrap-around semantics,
while in C++ signed overflow is UB.
Unless the operation guarantees that no signed overflow happens, we will
perform the arithmetic in an equivalent unsigned type.
`bool` also doesn't wrap around in C++, and is not addressed here.
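As a sketch of the approach (assuming `emitc.cast` and `emitc.add`; the
actual conversion may differ in detail), an `arith.addi` without a
no-overflow guarantee could be emitted as:
```
%lhs_u = emitc.cast %lhs : i32 to ui32
%rhs_u = emitc.cast %rhs : i32 to ui32
// unsigned wrap-around is well defined in C++
%sum_u = emitc.add %lhs_u, %rhs_u : (ui32, ui32) -> ui32
%sum = emitc.cast %sum_u : ui32 to i32
```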
|
|
(#84349)
This PR refactors bounds offsetting by combining the two differing
implementations (one applying to the initial derived-type member map
implementation for descriptors, the other to regular arrays; effectively
allocatable vs. regular arrays in Fortran) now that it's a little
simpler to do.
The PR also moves the utilization of createAlteredByCaptureMap into
genMapInfoOp, where it will be correctly applied to all MapInfoData,
appropriately offsetting and altering Pointer data set in the kernel
argument structure on the host. This primarily means bounds offsets will
now correctly apply to enter/exit/update map clauses, as opposed to just
the target directive, as is currently the case. A few Fortran runtime
tests have been added to verify this new behavior.
This PR depends on: https://github.com/llvm/llvm-project/pull/84328 and
is an extraction of the larger derived type member map PR stack (so a
requirement for it to land).
|
|
Recently #84765 made the split markers of various tools configurable but
did not test *not* using the split markers for two of them. This PR adds
those tests.
|
|
This was changed by #84765 but turned out to be buggy. Since it isn't
used and isn't tested, it is probably best to remove it.
|
|
thread-safe code (#80813)
**Description**
The documentation of `transform.structured.tile_using_forall` says:
_"It is the user’s responsibility to ensure that num_threads/tile_sizes
is a valid tiling specification (i.e. that only tiles parallel
dimensions, e.g. in the Linalg case)."_
In other words, tiling a non-parallel dimension would generate code with
data races, which is not safe to parallelize. For example, consider this
function (included in the tests in this PR):
```
func.func @tile_thread_safety2(%arg0: tensor<100x300x8xf32>, %arg1: tensor<300x8xf32>) -> tensor<300x8xf32> {
%0 = scf.forall (%arg2) in (8) shared_outs(%arg3 = %arg1) -> (tensor<300x8xf32>) {
%1 = affine.min #map(%arg2)
%2 = affine.max #map1(%1)
%3 = affine.apply #map2(%arg2)
%extracted_slice = tensor.extract_slice %arg0[%3, 0, 0] [%2, 300, 8] [1, 1, 1] : tensor<100x300x8xf32> to tensor<?x300x8xf32>
%4 = linalg.generic {indexing_maps = [#map3, #map4], iterator_types = ["reduction", "parallel", "parallel"]} ins(%extracted_slice : tensor<?x300x8xf32>) outs(%arg3 : tensor<300x8xf32>) {
^bb0(%in: f32, %out: f32):
%5 = arith.addf %in, %out : f32
linalg.yield %5 : f32
} -> tensor<300x8xf32>
scf.forall.in_parallel {
tensor.parallel_insert_slice %4 into %arg3[0, 0] [300, 8] [1, 1] : tensor<300x8xf32> into tensor<300x8xf32>
}
}
return %0 : tensor<300x8xf32>
}
```
We can easily see that this is not safe to parallelize, because all
threads would be writing to the same position in `%arg3` (in the
`scf.forall.in_parallel`).
This PR detects whether it's safe to `tile_using_forall` and emits a
warning if it is not.
**Brief explanation**
It first generates a vector of affine expressions representing the tile
values and stores it in `dimExprs`. These affine expressions are
compared with the affine expressions coming from the results of the
affine map of each output in the linalg op. So going back to the
previous example, the original transform is:
```
#map = affine_map<(d0, d1, d2) -> (d0, d1, d2)>
#map1 = affine_map<(d0, d1, d2) -> (d1, d2)>
func.func @tile_thread_safety2(%arg0: tensor<100x300x8xf32>, %arg1: tensor<300x8xf32>) -> tensor<300x8xf32> {
// expected-warning@+1 {{tiling is not thread safe at axis #0}}
%0 = linalg.generic {indexing_maps = [#map, #map1], iterator_types = ["reduction", "parallel", "parallel"]} ins(%arg0 : tensor<100x300x8xf32>) outs(%arg1 : tensor<300x8xf32>) {
^bb0(%in: f32, %out: f32):
%1 = arith.addf %in, %out : f32
linalg.yield %1 : f32
} -> tensor<300x8xf32>
return %0 : tensor<300x8xf32>
}
module attributes {transform.with_named_sequence} {
transform.named_sequence @__transform_main(%arg0: !transform.any_op {transform.readonly}) {
%0 = transform.structured.match ops{["linalg.generic"]} in %arg0 : (!transform.any_op) -> !transform.any_op
%forall, %tiled_generic = transform.structured.tile_using_forall %0 num_threads [8]
: (!transform.any_op) -> (!transform.any_op, !transform.any_op)
transform.yield
}
}
```
The `num_threads` attribute would be represented as `(d0)`. Because the
linalg op has only one output (`arg1`) it would only check against the
results of `#map1`, which are `(d1, d2)`. The idea is to check that all
affine expressions in `dimExprs` are present in the output affine map.
In this example, `d0` is not in `(d1, d2)`, so tiling that axis is
considered not thread safe.
|
|
(#85632)
Updates `extendVectorRank` so that scalability is correctly propagated
in patterns that use it (in particular,
`TransferWriteNonPermutationLowering`).
Related previous PR (now closed):
https://github.com/llvm/llvm-project/pull/85270
---------
Signed-off-by: Crefeda Rodrigues <crefeda.rodrigues@arm.com>
Co-authored-by: Benjamin Maxwell <macdue@dueutil.tech>
|
|
(#81906)
Updates `castAwayContractionLeadingOneDim` to inherit from
`MaskableOpRewritePattern` so that this pattern can support masking.
Builds on top of #83827
|
|
This commit relaxes the assumption of type consistency for LLVM dialect
load and store operations in SROA. Instead, there is now a check that
loads and stores are in the bounds specified by the sub-slot they
access.
This commit additionally removes the corresponding patterns from the
type consistency pass, as they are no longer necessary.
Note: It will be necessary to extend Mem2Reg with the logic for
differently sized accesses as well. This is nonetheless a strict
upgrade for productive flows, as the type consistency pass can produce
invalid IR for some odd cases.
|
|
Investigate the lowering of MemRef Load/Store ops and implement
additional folding of created ops.
This aims to improve the readability of the generated lowered SPIR-V
code.
Part of work llvm#70704
|
|
This adds a new API built with the `ValueBoundsConstraintSet` to compute
the bounds of possibly scalable quantities. It uses knowledge of the
range of vscale (which is defined by the target architecture) to solve
for the bound as either a constant or an expression in terms of vscale.
The result is an `AffineMap` that will always take at most one
parameter, vscale, and returns a single result, which is the bound of
`value`.
The API is defined as follows:
```c++
FailureOr<ConstantOrScalableBound>
vector::ScalableValueBoundsConstraintSet::computeScalableBound(
Value value, std::optional<int64_t> dim,
unsigned vscaleMin, unsigned vscaleMax,
presburger::BoundType boundType,
bool closedUB = true,
StopConditionFn stopCondition = nullptr);
```
Note: `ConstantOrScalableBound` is a thin wrapper over the `AffineMap`
with a utility for converting the bound to a single quantity (i.e. a
size and scalable flag).
We believe this API could prove useful downstream in IREE, which uses a
similar analysis to hoist allocas; that analysis currently fails for
scalable vectors.
|
|
Make form-expressions not create `emitc.expression` ops for operations
that are already inside an `emitc.expression`, since nested expressions
are invalid.
|
|
This converts `memref.alloca`, `memref.load` & `memref.store` to
`emitc.variable`, `emitc.subscript` and `emitc.assign`, respectively.
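For illustration, a minimal sketch of the kind of input handled (the
lowered form uses the three EmitC ops above):
```
%buf = memref.alloca() : memref<4xf32>
%v = memref.load %buf[%i] : memref<4xf32>
memref.store %v, %buf[%i] : memref<4xf32>
```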
|
|
The existing implementation of tosa-layerwise-constant-fold only works
for constant values backed by DenseElementsAttr. For constants which
hold DenseResourceAttrs, the folder will end up asserting at runtime, as
it assumes that the backing data can always be accessed through
ElementsAttr::getValues.
This change reworks the logic so that the types used to perform folding
are based on whether the ElementsAttr can be converted to a range of
that particular type.
---------
Co-authored-by: Spenser Bauman <sabauma@mathworks.com>
Co-authored-by: Tina Jung <tinamaria.jung@amd.com>
|
|
We currently always lower shuffle to the struct-returning variant. I saw
some cases where this survived all the way through PTX, resulting in
increased register usage. The easiest fix is to simply lower to the
single-result version when the predicate is unused.
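For example (a sketch; the `gpu.shuffle` input form is assumed here):
```
%shfl, %pred = gpu.shuffle xor %val, %offset, %width : f32
```
When `%pred` has no uses, we now lower to the single-result
`nvvm.shfl.sync` form instead of the struct-returning one.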
|
|
When importing from LLVM IR the data layout of all pointer types
contains an index bitwidth that should be used for index computations.
This revision adds a getter to the DataLayout that provides access to
the already stored bitwidth. The function returns an optional since only
pointer-like types have an index bitwidth. Querying the bitwidth of a
non-pointer type returns std::nullopt.
The new function works for the built-in Index type and, using a type
interface, for the LLVMPointerType.
|
|
(#86037)
The previous implementation was doing an early successful return on
`rank <= 1` without adding the original op to transform results. This
resulted in errors about the number of returns. This patch fixes this by
adding the original op to the results. Additionally, we first check
whether the op is elementwise and return a silenceable failure early if
not.
|
|
(#83964)
One-Shot Bufferize currently does not support loops where a yielded
value bufferizes to a buffer that is different from the buffer of the
region iter_arg. In such a case, the bufferization fails with an error
such as:
```
Yield operand #0 is not equivalent to the corresponding iter bbArg
scf.yield %0 : tensor<5xf32>
```
One common reason for non-equivalent buffers is that an op on the path
from the region iter_arg to the terminator bufferizes out-of-place. Ops
that are analyzed earlier are more likely to bufferize in-place.
This commit adds a new heuristic that gives preference to ops that are
reachable on the reverse SSA use-def chain from a region terminator and
are within the parent region of the terminator. This is expected to work
better than the existing heuristics for loops where an iter_arg is
written to multiple times within a loop, but only one write is fed into
the terminator.
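A sketch of the kind of loop that benefits (hypothetical IR):
```
%r = scf.for %iv = %lb to %ub step %c1 iter_args(%t = %init) -> (tensor<5xf32>) {
  %w1 = tensor.insert %a into %t[%i] : tensor<5xf32>
  %w2 = tensor.insert %b into %t[%j] : tensor<5xf32>
  // %w2 feeds the terminator, so the new heuristic analyzes it earlier,
  // making it more likely to bufferize in-place into the iter_arg buffer
  scf.yield %w2 : tensor<5xf32>
}
```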
Current users of One-Shot Bufferize are not affected by this change.
"Bottom-up" is still the default heuristic. Users can switch to the new
heuristic manually.
This commit also turns the "fuzzer" pass option into a heuristic,
cleaning up the code a bit.
|
|