Age | Commit message | Author | Files | Lines |
|
This uses the tablegen macros for generating pass constructors, exposing
pass options for fold-unit-extent-dims and linalg-detensorize.
Additionally, this aligns some of the pass names with their textual counterparts.
This includes an API change:
createLinalgGeneralizationPass -> createLinalgGeneralizeNamedOpsPass
|
|
When creating a new block in (conversion) rewrite patterns,
`OpBuilder::createBlock` must be used. Otherwise, no
`notifyBlockInserted` notification is sent to the listener.
Note: The dialect conversion relies on listener notifications to keep
track of IR modifications. Creating blocks without the builder API can
lead to memory leaks during rollback.
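A minimal sketch of why the builder API matters, written as a toy Python model rather than MLIR code. The `Listener`, `Builder`, and `create_block` names are hypothetical stand-ins for the listener, `OpBuilder`, and `OpBuilder::createBlock`; the only point is that a block created behind the builder's back never reaches the listener.

```python
class Listener:
    """Records every block it is notified about (hypothetical stand-in)."""

    def __init__(self):
        self.tracked_blocks = []

    def notify_block_inserted(self, block):
        self.tracked_blocks.append(block)


class Builder:
    """Toy builder: creating a block through it notifies the listener."""

    def __init__(self, listener):
        self.listener = listener

    def create_block(self, region, name):
        region.append(name)
        self.listener.notify_block_inserted(name)
        return name


listener = Listener()
builder = Builder(listener)
region = []

builder.create_block(region, "bb0")  # goes through the builder: tracked
region.append("bb1")                 # bypasses the builder: never tracked

# Blocks the listener does not know about cannot be undone on rollback.
untracked = [block for block in region if block not in listener.tracked_blocks]
print(untracked)
```

In this model, `bb1` is the block that a rollback driven by the listener's records would miss.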
|
|
function pass to finalize, utilised in convertTarget (#78818)
This patch seeks to add a mechanism to raise constant (not ConstantExpr
or runtime/dynamic) sized allocations into the entry block for select
functions that have been inserted into a list for processing. This
processing occurs during the finalize call, after OutlinedInfo regions
have completed. This currently has only been utilised for
createOutlinedFunction, which is triggered for TargetOp generation in
the OpenMP MLIR dialect lowering to LLVM-IR.
This is currently required for Target kernels generated by
createOutlinedFunction, to keep subsequent optimization passes from
performing unintended, malformed optimizations on AMD kernels (unsure if
this occurs for other vendors). If the allocas are generated inside the
kernel but outside the entry block and are subsequently passed to a
function, required instructions can be erased or manipulated in a way
that causes the kernel to run into an HSA access error.
This fix is related to a series of problems found in:
https://github.com/llvm/llvm-project/issues/74603
This problem currently presents itself primarily for Flang's HLFIR
AssignOp, when utilised with a scalar temporary constant on the RHS and
a descriptor type on the LHS. It will generate a call to a runtime
function, wrap the RHS temporary in a newly allocated descriptor (an
llvm struct), and pass both the LHS and RHS descriptors into the runtime
function call. This is currently embedded into the middle of the target
region in the user entry block, which means the allocas are also
embedded in the middle, which seems to pose issues when later passes are
executed. This issue may present itself in other HLFIR operations or
unrelated operations that generate allocas as a by-product, but for the
moment, this one test case is the only scenario in which I've found this
problem.
Perhaps this is not the appropriate fix; I am very open to other
suggestions. I've tried a few others (at varying levels of the
flang/mlir compiler flow), but this one is the smallest and least
intrusive change set. The other two that come to mind (which I've not
fully looked into; I tried the former a little with blocks, but it had a
few issues I'd need to think through):
- Having a proper alloca only block (or region) generated for TargetOps
that we could merge into the entry block that's generated by
convertTarget's createOutlinedFunction.
- Or diverging a little from Clang's current target generation and using
the CodeExtractor to generate the user code as an outlined function
region invoked from the kernel we make, with our kernel arguments passed
into it. Similar to the current parallel generation. I am not sure how
well this would intermingle with the existing parallel generation though
that's layered in.
Both of these methods seem like quite a divergence from the current
status quo, which I am not entirely sure is merited for the small test
this change aims to fix.
|
|
|
|
remove some obsoleted APIs from the library that have been fully
replaced with actual direct IR codegen
|
|
|
|
This is a follow-up to #82333. It is possible that the target block of a
`BlockTypeConversionRewrite` is detached, so the `MLIRContext` cannot be
taken from the block.
|
|
Related discussion:
https://github.com/llvm/llvm-project/pull/73908/files#r1414913030.
This change fixes #73547.
|
|
This commit fixes memory leaks that were introduced by #81759. The way
ops and blocks are erased changed slightly.
The leaks were caused by an incorrect implementation of op builders:
blocks must be created with the supplied builder object. Otherwise, they
are not properly tracked by the dialect conversion and can leak during
rollback.
|
|
Fix Test ARM SME library and build rule.
|
|
These patterns can already be used via
populateMathPolynomialApproximationPatterns, but that includes a number
of other patterns that may not be needed.
There are already similar functions for expansion.
For now only adding tanh and erf since I have a concrete use case for
these two.
|
|
The `%ld` specifier is defined to work on values of type `long`. The parameter given to `fprintf` is of type `intptr_t`, whose underlying integer type is unspecified. On Unix systems it is commonly `long`, but on 64-bit Windows it is defined as `long long`.
The cross-platform way to print an `intptr_t` is to use `PRIdPTR`, which expands to the correct format specifier for `intptr_t`. This avoids any undefined behaviour and compiler warnings.
|
|
`ConversionPatternRewriter` (#82333)
`ConversionPatternRewriterImpl` no longer maintains a reference to the
respective `ConversionPatternRewriter`. An `MLIRContext` is sufficient.
This commit simplifies the internal state of
`ConversionPatternRewriterImpl`.
|
|
`ConversionConfig` (#82250)
This commit adds a new `ConversionConfig` struct that allows users to
customize the dialect conversion. This configuration is similar to
`GreedyRewriteConfig` for the greedy pattern rewrite driver.
A few existing options are moved to this object, simplifying the
dialect conversion API.
|
|
This also generally increases the coverage of scalable vector types in
the math-to-llvm tests.
|
|
The ArmSME compilation pipeline has evolved significantly and is now
complex enough that it warrants a proper lowering pipeline
that encapsulates the various passes and orderings. Currently the
pipeline is loosely defined in our integration tests, but these have
diverged and are not using the same passes or ordering everywhere.
This patch introduces a test-lower-to-arm-sme pipeline mirroring
test-lower-to-llvm that provides some sanity when running e2e examples
and can be used as a reference for targeting ArmSME in MLIR.
All the integration tests are updated to use this pipeline. The
intention is to productize the pipeline once it becomes more mature.
|
|
`ConversionPatternRewriter` objects should not be constructed outside of
dialect conversions. Some IR modifications performed through a
`ConversionPatternRewriter` are reflected in the IR in a delayed fashion
(e.g., only when the dialect conversion is guaranteed to succeed). Using
a `ConversionPatternRewriter` outside of the dialect conversion is
incorrect API usage and can leave the IR in an inconsistent state.
Migration guide: Use `IRRewriter` instead of
`ConversionPatternRewriter`.
|
|
This revision handles the case that the translation of a scope fails due
to cyclic metadata. This mainly affects the import of debug intrinsics
that indirectly take such a scope as metadata argument (e.g. via local
variable or label metadata). This commit ensures we drop intrinsics with
such a dependency on cyclic metadata.
|
|
(#81761)
This commit is a refactoring of the dialect conversion. The dialect
conversion maintains a list of "IR rewrites" that can be committed (upon
success) or rolled back (upon failure).
This commit turns the creation of unresolved materializations
(`unrealized_conversion_cast`) into `IRRewrite` objects. After this
commit, all steps in `applyRewrites` and `discardRewrites` are calls to
`IRRewrite::commit` and `IRRewrite::rollback`.
|
|
This commit is a refactoring of the dialect conversion. The dialect
conversion maintains a list of "IR rewrites" that can be committed (upon
success) or rolled back (upon failure).
Until now, the dialect conversion kept track of "op creation" in
separate internal data structures. This commit turns "op creation" into
an `IRRewrite` that can be committed and rolled back just like any other
rewrite. This commit simplifies the internal state of the dialect
conversion.
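The commit/rollback idea described above can be sketched as a toy Python model. This is not the MLIR implementation: the class and method names below are hypothetical Python stand-ins loosely mirroring `IRRewrite`, `applyRewrites`, and `discardRewrites`, a block is just a list of op names, and undoing rewrites in reverse order is an assumption of the sketch.

```python
class IRRewrite:
    """Base class: one recorded IR modification (toy model)."""

    def commit(self):
        pass

    def rollback(self):
        pass


class CreateOperationRewrite(IRRewrite):
    """Records that `op` was appended to `block` (a plain list here)."""

    def __init__(self, block, op):
        self.block = block
        self.op = op
        block.append(op)

    def rollback(self):
        # Undo the creation by removing the op again.
        self.block.remove(self.op)


class ConversionState:
    """Keeps a single list of rewrites instead of per-kind bookkeeping."""

    def __init__(self):
        self.rewrites = []

    def create_op(self, block, op):
        self.rewrites.append(CreateOperationRewrite(block, op))

    def apply_rewrites(self):
        for rewrite in self.rewrites:
            rewrite.commit()
        self.rewrites.clear()

    def discard_rewrites(self):
        # Assumption of this sketch: undo in reverse order of creation.
        for rewrite in reversed(self.rewrites):
            rewrite.rollback()
        self.rewrites.clear()


block = ["op_a"]
state = ConversionState()
state.create_op(block, "op_b")
state.create_op(block, "op_c")
state.discard_rewrites()  # failure path: block is back to ["op_a"]
print(block)
```

Because op creation is just another entry in `rewrites`, the failure path needs no separate data structure for created ops.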
|
|
(#81757)
This commit is a refactoring of the dialect conversion. The dialect
conversion maintains a list of "IR rewrites" that can be committed (upon
success) or rolled back (upon failure).
Until now, op replacements and block argument replacements were tracked
in separate data structures inside the dialect conversion. This
commit turns them into `IRRewrite`s, so that they can be committed or
rolled back just like any other rewrite. This simplifies the internal
state of the dialect conversion.
Overview of changes:
* Add two new rewrite classes: `ReplaceBlockArgRewrite` and
`ReplaceOperationRewrite`. Remove the `OpReplacement` helper class; it
is now part of `ReplaceOperationRewrite`.
* Simplify `RewriterState`: `numReplacements` and `numArgReplacements`
are no longer needed. (They are now tracked by `numRewrites`.)
* Add `IRRewrite::cleanup`. Operations should not be erased in `commit`
because they may still be referenced in other internal state of the
dialect conversion (`mapping`). Detaching operations is fine.
* `trackedOps` are now updated during the "commit" phase instead of
after applying all rewrites.
|
|
namespace qualified (#82682)
`extraSharedClassDeclaration` of `FunctionOpInterface` can be inherited
by other `OpInterfaces` into foreign namespaces, thus types must be
fully qualified to prevent compiler errors, for example:
def MyFunc : OpInterface<"MyFunc", [FunctionOpInterface]> {
  let cppNamespace = "::MyNamespace";
}
|
|
This test failed after landing #81964 due to a bad merge. I provided a quick fix, and this PR adds the rest of the CHECK rules that were not merged properly.
|
|
- Add Tosa Sin and Cos operators to the MLIR dialect
- Define the new Tosa_FloatTensor type
---------
Signed-off-by: Jerry Ge <jerry.ge@arm.com>
|
|
AnyInteger (#82694)
LLVM IR does not support signed integers; the LLVM dialect was
underspecified (likely unintentionally) and the AnyInteger constraint
was overly lax.
The arithmetic dialect is already consistently using AnySignlessInteger.
|
|
|
|
It looks like the affine map generated to compute the indices of the
collapsed dimensions used the wrong dim size. For indices `[idx0][idx1]`
we computed the collapsed index as `idx0*size0 + idx1` instead of
`idx0*size1 + idx1`. This led to correctness issues in convolution tests
when enabling this transformation internally.
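A small worked example of the index arithmetic, as a sketch in plain Python rather than an affine map. The function names and the 2x3 shape are illustrative assumptions; the formulas are the two quoted above.

```python
def collapse_index(idx0, idx1, size1):
    """Row-major linearization of [idx0][idx1]: the stride of idx0 is size1."""
    return idx0 * size1 + idx1


def buggy_collapse_index(idx0, idx1, size0):
    """The incorrect variant, which multiplies by the outer dimension size."""
    return idx0 * size0 + idx1


size0, size1 = 2, 3  # illustrative 2x3 shape

correct = [collapse_index(i, j, size1)
           for i in range(size0) for j in range(size1)]
buggy = [buggy_collapse_index(i, j, size0)
         for i in range(size0) for j in range(size1)]

print(correct)  # every element gets a distinct linear index
print(buggy)    # two different elements collide on the same linear index
```

With a non-square shape the wrong stride maps distinct elements to the same collapsed index, which is consistent with the correctness issues described.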
|
|
tosa.clamp takes `min`/`max` attributes as i64, so ensure that the
lowering to linalg works for the whole range.
Co-authored-by: Tiago Trevisan Jost <tiago.trevisanjost@amd.com>
|
|
(#82442)
Don't require that `mesh.shard` operations come in pairs. If there is
only a single `mesh.shard` operation we assume that the producer result
and consumer operand have the same sharding.
|
|
Fix for https://lab.llvm.org/buildbot/#/builders/179/builds/9438
|
|
|
|
This commit is a refactoring of the dialect conversion. The dialect
conversion maintains a list of "IR rewrites" that can be committed (upon
success) or rolled back (upon failure).
Until now, the signature conversion of a block was only a "partial" IR
rewrite. Rollbacks were triggered via
`BlockTypeConversionRewrite::rollback`, but there was no
`BlockTypeConversionRewrite::commit` equivalent.
Overview of changes:
* Remove `ArgConverter`, an internal helper class that kept track of all
block type conversions. There is now a separate
`BlockTypeConversionRewrite` for each block type conversion.
* No more special handling for block type conversions. They are now
normal "IR rewrites", just like "block creation" or "block movement". In
particular, trigger "commits" of block type conversion via
`BlockTypeConversionRewrite::commit`.
* Remove `ArgConverter::notifyOpRemoved`. This function was used to
inform the `ArgConverter` that an operation was erased, to prevent a
double-free of operations in certain situations. It would be impractical
to add a `notifyOpRemoved` API to `IRRewrite`. Instead, erasing
ops/blocks should go through a new `SingleEraseRewriter` (that is owned
by the `ConversionPatternRewriterImpl`) if there is a chance of
double-free. This rewriter ignores `eraseOp`/`eraseBlock` if the
op/block was already freed.
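The "erase at most once" behaviour of the last item can be sketched as a toy Python model. The class below only borrows the `SingleEraseRewriter` name; its fields and the boolean return value are assumptions made for illustration, not the MLIR API.

```python
class SingleEraseRewriter:
    """Toy model: remembers what was erased and ignores repeated requests."""

    def __init__(self):
        self.erased = set()
        self.freed = []  # order in which objects were actually freed

    def erase_op(self, op):
        if op in self.erased:
            return False  # already freed: ignore to avoid a double-free
        self.erased.add(op)
        self.freed.append(op)
        return True


rewriter = SingleEraseRewriter()
first = rewriter.erase_op("op0")   # frees the op
second = rewriter.erase_op("op0")  # ignored
print(first, second, rewriter.freed)
```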
|
|
This commit improves the block signature conversion API of the dialect
conversion.
There is the following comment in
`ArgConverter::applySignatureConversion`:
```
// If no arguments are being changed or added, there is nothing to do.
```
However, the implementation actually used to replace a block with a new
block even if the block argument types do not change (i.e., there is
"nothing to do"). This is fixed in this commit. The documentation of the
public `ConversionPatternRewriter` API is updated accordingly.
This commit also removes a check that used to *sometimes* skip a block
signature conversion if the block was already converted. This is not
consistent with the public `ConversionPatternRewriter` API; blocks
should always be converted, regardless of whether they were already
converted or not.
Block signature conversion also used to be silently skipped when the
specified block was detached. Instead of silently skipping, an assertion
is triggered. Attempting to convert a detached block (which is likely an
erased block) is invalid API usage.
|
|
narrow type emulation (#82550)
This PR replaces the generation of `vector.shuffle` with
`vector.interleave` in the i4 conversions in vector narrow type
emulation. The multi-dimensional semantics of `vector.interleave` allow
us to enable these conversion emulations also for multi-dimensional
vectors.
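To illustrate what an interleave does in this context, here is a one-dimensional Python sketch. It is a simplification under stated assumptions: the values are treated as unsigned, each byte is assumed to hold two 4-bit elements with the low nibble first, and the helper names are hypothetical.

```python
def interleave(a, b):
    """Result alternates elements: a[0], b[0], a[1], b[1], ..."""
    out = []
    for x, y in zip(a, b):
        out.extend((x, y))
    return out


def unpack_i4_unsigned(packed_bytes):
    """Split each byte into two 4-bit values and restore element order."""
    low = [byte & 0xF for byte in packed_bytes]   # even-indexed elements
    high = [byte >> 4 for byte in packed_bytes]   # odd-indexed elements
    return interleave(low, high)


unpacked = unpack_i4_unsigned([0x21, 0x43])
print(unpacked)
```

For multi-dimensional vectors the same alternation would apply along the trailing dimension, which is what makes an interleave a natural fit compared with a flat shuffle mask.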
|
|
SerializeNVVMTarget.cpp (NFC)
|
|
(NFC)
|
|
|
|
InterfaceAttachmentTest.cpp (NFC)
|
|
SerializationTest.cpp (NFC)
|
|
The `SerializeToCubin` pass was deprecated in September 2023 in favor of
GPU compilation attributes; see the [GPU
compilation](https://mlir.llvm.org/docs/Dialects/GPU/#gpu-compilation)
section in the `gpu` dialect MLIR docs.
This patch removes `SerializeToCubin` from the repo.
|
|
I believe the semantics should be the same, but this saves 1 op and simplifies the code.
For example, the following two instructions:
```
%2 = cmp sgt %0, %1
%3 = select %2, %0, %1
```
Are equivalent to:
```
%2 = maxsi %0 %1
```
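The claimed equivalence can be checked exhaustively on a small range with a Python sketch. Python integers stand in for signed values here, and `cmp_select_max` is a hypothetical helper that mirrors the compare-and-select pair.

```python
import itertools


def cmp_select_max(lhs, rhs):
    """Mirror of the two-instruction form: signed compare, then select."""
    is_greater = lhs > rhs             # cmp sgt
    return lhs if is_greater else rhs  # select


values = range(-4, 5)
all_equal = all(
    cmp_select_max(a, b) == max(a, b)
    for a, b in itertools.product(values, repeat=2)
)
print(all_equal)
```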
|
|
Summary:
Currently, OpenMP handles the `omp requires` clause by emitting a global
constructor into the runtime for every translation unit that requires
it. However, this is not a great solution because it prevents us from
having a defined order in which the runtime is accessed and used.
This patch changes the approach to no longer use global constructors,
but to instead group the flag with the other offloading entries that we
already handle. This has the effect of still registering each flag per
requires TU, but now we have a single constructor that handles
everything.
This patch removes support for the old `__tgt_register_requires` and
replaces it with a warning message. We just had a recent release, and
the OpenMP policy for the past four releases since we switched to LLVM
is that we do not provide strict backwards compatibility between major
LLVM releases now that the library is versioned. This means that a user
will need to recompile if they have an old binary that relied on
`register_requires` having the old behavior. It is important that we
actively deprecate this, as otherwise it would not solve the problem of
having no defined init and shutdown order for `libomptarget`. The
problem of `libomptarget` not having a defined init and shutdown order
cascades into a lot of other issues so I have a strong incentive to be
rid of it.
It is worth noting that the current `__tgt_offload_entry` only has space
for a 32-bit integer here. I am planning to overhaul these at some point
as well.
|
|
This PR adds an optional bitwidth parameter to the vector xfer op
flattening transformation so that the flattening doesn't happen if the
trailing dimension of the read/written vector is larger than this
bitwidth (i.e., we are already able to fill at least one vector register
with that size).
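The decision described above might look like the following Python sketch. The function and parameter names are hypothetical. Two points are assumptions of this sketch rather than statements from the patch: a trailing dimension that exactly fills the register is treated as "no flattening" (following the "at least one vector register" wording), and omitting the bitwidth means "always flatten".

```python
def should_flatten(trailing_dim_size, element_bitwidth,
                   target_vector_bitwidth=None):
    """Return True if the transfer op should be flattened."""
    if target_vector_bitwidth is None:
        # Assumption: no threshold given, so flattening is always applied.
        return True
    trailing_dim_bitwidth = trailing_dim_size * element_bitwidth
    # Skip flattening when the trailing dimension already fills a register.
    return trailing_dim_bitwidth < target_vector_bitwidth


print(should_flatten(4, 8, 128))    # 32 bits in the trailing dim
print(should_flatten(16, 32, 128))  # 512 bits in the trailing dim
print(should_flatten(16, 32))       # no threshold supplied
```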
|
|
When a `ModifyOperationRewrite` is committed, the operation may already
have been erased, so `OperationName` must be cached in the rewrite
object.
Note: This will no longer be needed with #81757, which adds a "cleanup"
method to `IRRewrite`.
|
|
* When converting a block signature, `ArgConverter` creates a new block
with the new signature and moves all operations from the old block to the
new block. The new block is temporarily inserted into a region that is
stored in `regionMapping`. The old block is not yet deleted, so that the
conversion can be rolled back. `regionMapping` is not needed. Instead of
moving the old block to a temporary region, it can just be unlinked.
Block erasures are handled in the same way in the dialect conversion.
* `regionToConverter` is a mapping from regions to type converter. That
field is never accessed within `ArgConverter`. It should be stored in
`ConversionPatternRewriterImpl` instead.
* `convertedBlocks` is not needed. Old blocks are already stored in
`ConvertedBlockInfo`.
|
|
(#82474)
The dialect conversion rolls back in-place op modifications upon
failure. Rolling back modifications of attributes is already supported,
but there was no support for properties until now.
|
|
This commit simplifies the internal state of the dialect conversion. A
separate field for the previous state of in-place op modifications is no
longer needed.
|
|
Use const reference for loop variable.
|
|
Fix a leak of the root operation not being deleted in the recently
introduced transform_interpreter.c.
|
|
_SubClassValueT is only useful when it has more than one usage in a
signature. This was not true for the signatures produced by tblgen.
For example:
def call(result, callee, operands_, *, loc=None, ip=None) -> _SubClassValueT:
    ...
Here, a type checker does not have enough information to infer a type
argument for _SubClassValueT, and thus effectively treats it as Any.
|