# Using `mlir-opt`

`mlir-opt` is a command-line entry point for running passes and lowerings on MLIR code.
This tutorial will explain how to use `mlir-opt`, show some examples of its usage,
and mention some useful tips for working with it.

Prerequisites:

- [Building MLIR from source](/getting_started/)
- [MLIR Language Reference](/docs/LangRef/)

[TOC]

## `mlir-opt` basics

The `mlir-opt` tool loads textual IR or bytecode into an in-memory structure,
optionally executes a sequence of passes on it,
and then serializes the IR back out (in textual form by default).
It is intended as a testing and debugging utility.

After building the MLIR project,
the `mlir-opt` binary (located in `build/bin`)
is the entry point for running passes and lowerings,
as well as emitting debug and diagnostic data.

Running `mlir-opt` with no flags will consume textual or bytecode IR
from the standard input, parse and run verifiers on it,
and write the textual format back to the standard output.
This is a quick way to check whether an MLIR input is well-formed.
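In a script, the same check can rely on the exit status instead of reading the output. The sketch below assumes that `mlir-opt` exits with a non-zero status when parsing or verification fails. The `is_well_formed` helper and the `MLIR_OPT` variable are illustrative names, not part of MLIR.

```bash
#!/usr/bin/env bash
# Hypothetical wrapper: path to mlir-opt, overridable via the MLIR_OPT variable.
MLIR_OPT="${MLIR_OPT:-build/bin/mlir-opt}"

# is_well_formed FILE: succeeds if mlir-opt parses and verifies FILE.
# The round-tripped IR on stdout is discarded; diagnostics still go to stderr.
is_well_formed() {
  "$MLIR_OPT" "$1" > /dev/null
}
```

For example, `is_well_formed mlir/test/Examples/mlir-opt/ctlz.mlir && echo valid` prints `valid` only if the file parses and verifies.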

`mlir-opt --help` shows a complete list of flags
(there are nearly 1000).
Each pass has its own flag,
though it is recommended to use `--pass-pipeline`
to run passes rather than bare flags.

## Running a pass

Next we run [`convert-math-to-llvm`](/docs/Passes/#-convert-math-to-llvm),
which converts operations from the `math` dialect to the `llvm` dialect,
on the following IR:

```mlir
// mlir/test/Examples/mlir-opt/ctlz.mlir
module {
  func.func @main(%arg0: i32) -> i32 {
    %0 = math.ctlz %arg0 : i32
    func.return %0 : i32
  }
}
```

After building MLIR, and from the `llvm-project` base directory, run

```bash
build/bin/mlir-opt --pass-pipeline="builtin.module(convert-math-to-llvm)" mlir/test/Examples/mlir-opt/ctlz.mlir
```

which produces

```mlir
module {
  func.func @main(%arg0: i32) -> i32 {
    %0 = "llvm.intr.ctlz"(%arg0) <{is_zero_poison = false}> : (i32) -> i32
    return %0 : i32
  }
}
```

Note that `llvm` here refers to MLIR's `llvm` dialect.
Once the remaining `func` operations have also been lowered,
the result still needs to be processed by `mlir-translate`
to generate LLVM IR.
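As a sketch of the full path to LLVM IR, the helper below lowers everything with the broader `convert-to-llvm` pass (so that the `func` operations are converted too) and pipes the result into `mlir-translate --mlir-to-llvmir`. It assumes both binaries live in `build/bin` and that `convert-to-llvm` can lower every operation in the input, which should hold for a small input like `ctlz.mlir` but not for arbitrary IR. The `to_llvm_ir` name is hypothetical.

```bash
#!/usr/bin/env bash
# Hypothetical tool paths, overridable from the environment.
MLIR_OPT="${MLIR_OPT:-build/bin/mlir-opt}"
MLIR_TRANSLATE="${MLIR_TRANSLATE:-build/bin/mlir-translate}"

# to_llvm_ir FILE: lower FILE to the llvm dialect, then translate it to LLVM IR.
# The body runs in a subshell so that pipefail stays local to this function;
# with pipefail, a failure in either stage makes the whole function fail.
to_llvm_ir() (
  set -o pipefail
  "$MLIR_OPT" --pass-pipeline="builtin.module(convert-to-llvm)" "$1" \
    | "$MLIR_TRANSLATE" --mlir-to-llvmir
)
```

Running `to_llvm_ir mlir/test/Examples/mlir-opt/ctlz.mlir` would then print the LLVM IR module on standard output.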

## Running a pass with options

Next we will show how to run a pass that takes configuration options.
Consider the following IR containing loops with poor cache locality.

```mlir
// mlir/test/Examples/mlir-opt/loop_fusion.mlir
module {
  func.func @producer_consumer_fusion(%arg0: memref<10xf32>, %arg1: memref<10xf32>) {
    %0 = memref.alloc() : memref<10xf32>
    %1 = memref.alloc() : memref<10xf32>
    %cst = arith.constant 0.000000e+00 : f32
    affine.for %arg2 = 0 to 10 {
      affine.store %cst, %0[%arg2] : memref<10xf32>
      affine.store %cst, %1[%arg2] : memref<10xf32>
    }
    affine.for %arg2 = 0 to 10 {
      %2 = affine.load %0[%arg2] : memref<10xf32>
      %3 = arith.addf %2, %2 : f32
      affine.store %3, %arg0[%arg2] : memref<10xf32>
    }
    affine.for %arg2 = 0 to 10 {
      %2 = affine.load %1[%arg2] : memref<10xf32>
      %3 = arith.mulf %2, %2 : f32
      affine.store %3, %arg1[%arg2] : memref<10xf32>
    }
    return
  }
}
```

Running this with the [`affine-loop-fusion`](/docs/Passes/#-affine-loop-fusion) pass
produces a fused loop.

```bash
build/bin/mlir-opt --pass-pipeline="builtin.module(affine-loop-fusion)" mlir/test/Examples/mlir-opt/loop_fusion.mlir
```

```mlir
module {
  func.func @producer_consumer_fusion(%arg0: memref<10xf32>, %arg1: memref<10xf32>) {
    %alloc = memref.alloc() : memref<1xf32>
    %alloc_0 = memref.alloc() : memref<1xf32>
    %cst = arith.constant 0.000000e+00 : f32
    affine.for %arg2 = 0 to 10 {
      affine.store %cst, %alloc[0] : memref<1xf32>
      affine.store %cst, %alloc_0[0] : memref<1xf32>
      %0 = affine.load %alloc_0[0] : memref<1xf32>
      %1 = arith.mulf %0, %0 : f32
      affine.store %1, %arg1[%arg2] : memref<10xf32>
      %2 = affine.load %alloc[0] : memref<1xf32>
      %3 = arith.addf %2, %2 : f32
      affine.store %3, %arg0[%arg2] : memref<10xf32>
    }
    return
  }
}
```

This pass has options that allow the user to configure its behavior.
For example, the `fusion-compute-tolerance` option
is described as the "fractional increase in additional computation tolerated while fusing."
If this value is set to zero on the command line,
the pass will not fuse the loops.

```bash
build/bin/mlir-opt --pass-pipeline="builtin.module(affine-loop-fusion{fusion-compute-tolerance=0})" \
mlir/test/Examples/mlir-opt/loop_fusion.mlir
```

```mlir
module {
  func.func @producer_consumer_fusion(%arg0: memref<10xf32>, %arg1: memref<10xf32>) {
    %alloc = memref.alloc() : memref<10xf32>
    %alloc_0 = memref.alloc() : memref<10xf32>
    %cst = arith.constant 0.000000e+00 : f32
    affine.for %arg2 = 0 to 10 {
      affine.store %cst, %alloc[%arg2] : memref<10xf32>
      affine.store %cst, %alloc_0[%arg2] : memref<10xf32>
    }
    affine.for %arg2 = 0 to 10 {
      %0 = affine.load %alloc[%arg2] : memref<10xf32>
      %1 = arith.addf %0, %0 : f32
      affine.store %1, %arg0[%arg2] : memref<10xf32>
    }
    affine.for %arg2 = 0 to 10 {
      %0 = affine.load %alloc_0[%arg2] : memref<10xf32>
      %1 = arith.mulf %0, %0 : f32
      affine.store %1, %arg1[%arg2] : memref<10xf32>
    }
    return
  }
}
```

Options passed to a pass
are specified via the syntax `{option1=value1 option2=value2 ...}`,
i.e., use space-separated `key=value` pairs for each option.
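Because the option list contains braces and spaces, the whole `--pass-pipeline` value must be quoted in the shell. When pipelines are assembled in scripts, a small helper can format the option list. `with_options` below is a hypothetical name, and the sketch only builds the string; it does not check that the options exist.

```bash
#!/usr/bin/env bash
# Hypothetical helper.
# with_options PASS [KEY=VALUE ...]: print PASS{KEY=VALUE KEY=VALUE ...},
# or just PASS when no options are given.
with_options() {
  local pass="$1"
  shift
  if [ "$#" -eq 0 ]; then
    printf '%s' "$pass"
  else
    # With the default IFS, "$*" joins the remaining arguments with a space.
    printf '%s{%s}' "$pass" "$*"
  fi
}
```

For instance, `--pass-pipeline="builtin.module($(with_options affine-loop-fusion fusion-compute-tolerance=0))"` expands to the same pipeline used in the command above.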

## Building a pass pipeline on the command line

The `--pass-pipeline` flag supports combining multiple passes into a pipeline.
So far we have used the trivial pipeline with a single pass
that is "anchored" on the top-level `builtin.module` op.
[Pass anchoring](/docs/PassManagement/#oppassmanager)
is a way for passes to specify
that they only run on particular ops.
While many passes are anchored on `builtin.module`,
if you try to run a pass that is anchored on some other op
inside `--pass-pipeline="builtin.module(pass-name)"`,
it will not run.

Multiple passes can be chained together
by providing the pass names in a comma-separated list
in the `--pass-pipeline` string,
e.g.,
`--pass-pipeline="builtin.module(pass1,pass2)"`.
The passes will be run sequentially.
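The comma-separated, parenthesized syntax composes recursively, so a script can build pipeline strings with a helper like the hypothetical `nest` below. It wraps a list of pipeline elements in an anchor op name. This is only a string-building sketch; `mlir-opt` still does the actual parsing and validation.

```bash
#!/usr/bin/env bash
# Hypothetical helper.
# nest ANCHOR [ELEMENT ...]: print ANCHOR(ELEMENT,ELEMENT,...).
# An ELEMENT can itself be the output of nest, which yields nested pipelines.
nest() {
  local anchor="$1"
  shift
  # With IFS set to a comma, "$*" joins the elements with commas.
  local IFS=,
  printf '%s(%s)' "$anchor" "$*"
}
```

For example, `nest builtin.module pass1 pass2` prints `builtin.module(pass1,pass2)`, and `nest builtin.module "$(nest func.func cse canonicalize)"` prints `builtin.module(func.func(cse,canonicalize))`.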

To use passes that have nontrivial anchoring,
the appropriate level of nesting must be specified
in the pass pipeline.
For example, consider the following IR,
which contains the same redundant code at two different levels of nesting.

```mlir
module {
  module {
    func.func @func1(%arg0: i32) -> i32 {
      %0 = arith.addi %arg0, %arg0 : i32
      %1 = arith.addi %arg0, %arg0 : i32
      %2 = arith.addi %0, %1 : i32
      func.return %2 : i32
    }
  }

  gpu.module @gpu_module {
    gpu.func @func2(%arg0: i32) -> i32 {
      %0 = arith.addi %arg0, %arg0 : i32
      %1 = arith.addi %arg0, %arg0 : i32
      %2 = arith.addi %0, %1 : i32
      gpu.return %2 : i32
    }
  }
}
```

The following pipeline runs `cse` (common subexpression elimination) and `canonicalize`
on each `func.func` nested inside the two `builtin.module` ops,
and then runs `convert-to-llvm` on the inner `builtin.module`.
Here `nested.mlir` is a placeholder for a file containing the IR above.

```bash
build/bin/mlir-opt nested.mlir --pass-pipeline='
    builtin.module(
        builtin.module(
            func.func(cse,canonicalize),
            convert-to-llvm
        )
    )'
```

In the output, `@func1` has been simplified and lowered to the `llvm` dialect,
while the `gpu.module` is left untouched:

```mlir
module {
  module {
    llvm.func @func1(%arg0: i32) -> i32 {
      %0 = llvm.add %arg0, %arg0 : i32
      %1 = llvm.add %0, %0 : i32
      llvm.return %1 : i32
    }
  }
  gpu.module @gpu_module {
    gpu.func @func2(%arg0: i32) -> i32 {
      %0 = arith.addi %arg0, %arg0 : i32
      %1 = arith.addi %arg0, %arg0 : i32
      %2 = arith.addi %0, %1 : i32
      gpu.return %2 : i32
    }
  }
}
```

Specifying a pass pipeline with nested anchoring
is also beneficial for performance:
the pass manager can run a nested pipeline on independent ops
(such as sibling `func.func` ops) in parallel,
which improves multithreaded runtime and cache locality
within each thread.
For example,
even though `cse` and `canonicalize` are not restricted to `func.func`,
running `builtin.module(func.func(cse,canonicalize))`
is more efficient than `builtin.module(cse,canonicalize)`.
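One way to observe the difference is to run both pipelines with `--mlir-timing` and compare the reports. The sketch below discards the transformed IR on standard output and assumes the timing report is written to standard error. On a tiny file like the examples above the difference is likely negligible, so the comparison is only meaningful on inputs with many functions. `compare_anchoring` and `MLIR_OPT` are illustrative names.

```bash
#!/usr/bin/env bash
# Hypothetical wrapper: path to mlir-opt, overridable via MLIR_OPT.
MLIR_OPT="${MLIR_OPT:-build/bin/mlir-opt}"

# compare_anchoring FILE: time cse+canonicalize with and without nesting
# under func.func. Only stderr (timing reports and headers) stays visible.
compare_anchoring() {
  local input="$1"
  local pipeline
  for pipeline in \
      "builtin.module(func.func(cse,canonicalize))" \
      "builtin.module(cse,canonicalize)"; do
    echo "== $pipeline" >&2
    "$MLIR_OPT" --mlir-timing --pass-pipeline="$pipeline" "$input" > /dev/null || return 1
  done
}
```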

For a specification of the textual pass pipeline language,
see [Textual Pass Pipeline Specification](/docs/PassManagement/#textual-pass-pipeline-specification).
For more general information on pass management, see [Pass Infrastructure](/docs/PassManagement/).

## Useful CLI flags

- `--debug` prints all debug information produced by `LLVM_DEBUG` calls.
  Like `--debug-only`, it is only available in builds with assertions enabled.
- `--debug-only="my-tag"` prints only the debug information produced by `LLVM_DEBUG`
  in files that have the macro `#define DEBUG_TYPE "my-tag"`.
  This often allows you to print only debug information associated with a specific pass.
    - `"greedy-rewriter"` only prints debug information
      for patterns applied with the greedy rewriter engine.
    - `"dialect-conversion"` only prints debug information
      for the dialect conversion framework.
- `--emit-bytecode` emits MLIR in the bytecode format.
- `--mlir-pass-statistics` prints statistics about the passes run.
  These are generated via [pass statistics](/docs/PassManagement/#pass-statistics).
- `--mlir-print-ir-after-all` prints the IR after each pass.
    - See also `--mlir-print-ir-after-change` and `--mlir-print-ir-after-failure`,
      as well as `--mlir-print-ir-before-all`, which prints the IR before each pass.
    - When using `print-ir` flags, adding `--mlir-print-ir-tree-dir=<dir>` writes the
      IRs to files in a directory tree, making them easier to inspect than a
      large dump to the terminal.
- `--mlir-timing` displays the execution time of each pass.
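As an example of combining these flags, the wrapper below runs a pipeline, writes the IR after every pass into a directory tree, and reports pass timings. The `dump_pipeline` name and its argument order are hypothetical; the `mlir-opt` flags are the ones listed above.

```bash
#!/usr/bin/env bash
# Hypothetical wrapper: path to mlir-opt, overridable via MLIR_OPT.
MLIR_OPT="${MLIR_OPT:-build/bin/mlir-opt}"

# dump_pipeline FILE PIPELINE OUTDIR: run PIPELINE on FILE, writing the IR
# after each pass under OUTDIR instead of dumping it to the terminal,
# and printing per-pass execution times.
dump_pipeline() {
  local input="$1" pipeline="$2" outdir="$3"
  mkdir -p "$outdir" || return 1
  "$MLIR_OPT" "$input" \
    --pass-pipeline="$pipeline" \
    --mlir-print-ir-after-all \
    --mlir-print-ir-tree-dir="$outdir" \
    --mlir-timing
}
```

For example, `dump_pipeline mlir/test/Examples/mlir-opt/loop_fusion.mlir "builtin.module(affine-loop-fusion)" /tmp/ir` leaves one file per pass under `/tmp/ir`.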

## Further reading

- [List of passes](/docs/Passes/)
- [List of dialects](/docs/Dialects/)