aboutsummaryrefslogtreecommitdiff
path: root/mlir/docs/LangRef.md
blob: 07f804018c2aa2a47f8ea8125e0f700c039a845c (plain)
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
211
212
213
214
215
216
217
218
219
220
221
222
223
224
225
226
227
228
229
230
231
232
233
234
235
236
237
238
239
240
241
242
243
244
245
246
247
248
249
250
251
252
253
254
255
256
257
258
259
260
261
262
263
264
265
266
267
268
269
270
271
272
273
274
275
276
277
278
279
280
281
282
283
284
285
286
287
288
289
290
291
292
293
294
295
296
297
298
299
300
301
302
303
304
305
306
307
308
309
310
311
312
313
314
315
316
317
318
319
320
321
322
323
324
325
326
327
328
329
330
331
332
333
334
335
336
337
338
339
340
341
342
343
344
345
346
347
348
349
350
351
352
353
354
355
356
357
358
359
360
361
362
363
364
365
366
367
368
369
370
371
372
373
374
375
376
377
378
379
380
381
382
383
384
385
386
387
388
389
390
391
392
393
394
395
396
397
398
399
400
401
402
403
404
405
406
407
408
409
410
411
412
413
414
415
416
417
418
419
420
421
422
423
424
425
426
427
428
429
430
431
432
433
434
435
436
437
438
439
440
441
442
443
444
445
446
447
448
449
450
451
452
453
454
455
456
457
458
459
460
461
462
463
464
465
466
467
468
469
470
471
472
473
474
475
476
477
478
479
480
481
482
483
484
485
486
487
488
489
490
491
492
493
494
495
496
497
498
499
500
501
502
503
504
505
506
507
508
509
510
511
512
513
514
515
516
517
518
519
520
521
522
523
524
525
526
527
528
529
530
531
532
533
534
535
536
537
538
539
540
541
542
543
544
545
546
547
548
549
550
551
552
553
554
555
556
557
558
559
560
561
562
563
564
565
566
567
568
569
570
571
572
573
574
575
576
577
578
579
580
581
582
583
584
585
586
587
588
589
590
591
592
593
594
595
596
597
598
599
600
601
602
603
604
605
606
607
608
609
610
611
612
613
614
615
616
617
618
619
620
621
622
623
624
625
626
627
628
629
630
631
632
633
634
635
636
637
638
639
640
641
642
643
644
645
646
647
648
649
650
651
652
653
654
655
656
657
658
659
660
661
662
663
664
665
666
667
668
669
670
671
672
673
674
675
676
677
678
679
680
681
682
683
684
685
686
687
688
689
690
691
692
693
694
695
696
697
698
699
700
701
702
703
704
705
706
707
708
709
710
711
712
713
714
715
716
717
718
719
720
721
722
723
724
725
726
727
728
729
730
731
732
733
734
735
736
737
738
739
740
741
742
743
744
745
746
747
748
749
750
751
752
753
754
755
756
757
758
759
760
761
762
763
764
765
766
767
768
769
770
771
772
773
774
775
776
777
778
779
780
781
782
783
784
785
786
787
788
789
790
791
792
793
794
795
796
797
798
799
800
801
802
803
804
805
806
807
808
809
810
811
812
813
814
815
816
817
818
819
820
821
822
823
824
825
826
827
828
829
830
831
832
833
834
835
836
837
838
839
840
841
842
843
844
845
846
847
# MLIR Language Reference

MLIR (Multi-Level IR) is a compiler intermediate representation with
similarities to traditional three-address SSA representations (like
[LLVM IR](http://llvm.org/docs/LangRef.html) or
[SIL](https://github.com/apple/swift/blob/main/docs/SIL.rst)), but which
introduces notions from polyhedral loop optimization as first-class concepts.
This hybrid design is optimized to represent, analyze, and transform high level
dataflow graphs as well as target-specific code generated for high performance
data parallel systems. Beyond its representational capabilities, its single
continuous design provides a framework to lower from dataflow graphs to
high-performance target-specific code.

This document defines and describes the key concepts in MLIR, and is intended to
be a dry reference document - the
[rationale documentation](Rationale/Rationale.md),
[glossary](../getting_started/Glossary.md), and other content are hosted
elsewhere.

MLIR is designed to be used in three different forms: a human-readable textual
form suitable for debugging, an in-memory form suitable for programmatic
transformations and analysis, and a compact serialized form suitable for storage
and transport. The different forms all describe the same semantic content. This
document describes the human-readable textual form.

[TOC]

## High-Level Structure

MLIR is fundamentally based on a graph-like data structure of nodes, called
*Operations*, and edges, called *Values*. Each Value is the result of exactly
one Operation or Block Argument, and has a *Value Type* defined by the
[type system](#type-system). [Operations](#operations) are contained in
[Blocks](#blocks) and Blocks are contained in [Regions](#regions). Operations
are also ordered within their containing block and Blocks are ordered in their
containing region, although this order may or may not be semantically meaningful
in a given [kind of region](Interfaces.md/#regionkindinterfaces)). Operations
may also contain regions, enabling hierarchical structures to be represented.

Operations can represent many different concepts, from higher-level concepts
like function definitions, function calls, buffer allocations, view or slices of
buffers, and process creation, to lower-level concepts like target-independent
arithmetic, target-specific instructions, configuration registers, and logic
gates. These different concepts are represented by different operations in MLIR
and the set of operations usable in MLIR can be arbitrarily extended.

MLIR also provides an extensible framework for transformations on operations,
using familiar concepts of compiler [Passes](Passes.md). Enabling an arbitrary
set of passes on an arbitrary set of operations results in a significant scaling
challenge, since each transformation must potentially take into account the
semantics of any operation. MLIR addresses this complexity by allowing operation
semantics to be described abstractly using [Traits](Traits.md) and
[Interfaces](Interfaces.md), enabling transformations to operate on operations
more generically. Traits often describe verification constraints on valid IR,
enabling complex invariants to be captured and checked. (see
[Op vs Operation](Tutorials/Toy/Ch-2.md/#op-vs-operation-using-mlir-operations))

One obvious application of MLIR is to represent an
[SSA-based](https://en.wikipedia.org/wiki/Static_single_assignment_form) IR,
like the LLVM core IR, with appropriate choice of operation types to define
Modules, Functions, Branches, Memory Allocation, and verification constraints to
ensure the SSA Dominance property. MLIR includes a collection of dialects which
defines just such structures. However, MLIR is intended to be general enough to
represent other compiler-like data structures, such as Abstract Syntax Trees in
a language frontend, generated instructions in a target-specific backend, or
circuits in a High-Level Synthesis tool.

Here's an example of an MLIR module:

```mlir
// Compute A*B using an implementation of multiply kernel and print the
// result using a TensorFlow op. The dimensions of A and B are partially
// known. The shapes are assumed to match.
func.func @mul(%A: tensor<100x?xf32>, %B: tensor<?x50xf32>) -> (tensor<100x50xf32>) {
  // Compute the inner dimension of %A using the dim operation.
  %n = memref.dim %A, 1 : tensor<100x?xf32>

  // Allocate addressable "buffers" and copy tensors %A and %B into them.
  %A_m = memref.alloc(%n) : memref<100x?xf32>
  memref.tensor_store %A to %A_m : memref<100x?xf32>

  %B_m = memref.alloc(%n) : memref<?x50xf32>
  memref.tensor_store %B to %B_m : memref<?x50xf32>

  // Call function @multiply passing memrefs as arguments,
  // and getting returned the result of the multiplication.
  %C_m = call @multiply(%A_m, %B_m)
          : (memref<100x?xf32>, memref<?x50xf32>) -> (memref<100x50xf32>)

  memref.dealloc %A_m : memref<100x?xf32>
  memref.dealloc %B_m : memref<?x50xf32>

  // Load the buffer data into a higher level "tensor" value.
  %C = memref.tensor_load %C_m : memref<100x50xf32>
  memref.dealloc %C_m : memref<100x50xf32>

  // Call TensorFlow built-in function to print the result tensor.
  "tf.Print"(%C){message: "mul result"} : (tensor<100x50xf32>) -> (tensor<100x50xf32>)

  return %C : tensor<100x50xf32>
}

// A function that multiplies two memrefs and returns the result.
func.func @multiply(%A: memref<100x?xf32>, %B: memref<?x50xf32>)
          -> (memref<100x50xf32>)  {
  // Compute the inner dimension of %A.
  %n = memref.dim %A, 1 : memref<100x?xf32>

  // Allocate memory for the multiplication result.
  %C = memref.alloc() : memref<100x50xf32>

  // Multiplication loop nest.
  affine.for %i = 0 to 100 {
     affine.for %j = 0 to 50 {
        memref.store 0 to %C[%i, %j] : memref<100x50xf32>
        affine.for %k = 0 to %n {
           %a_v  = memref.load %A[%i, %k] : memref<100x?xf32>
           %b_v  = memref.load %B[%k, %j] : memref<?x50xf32>
           %prod = arith.mulf %a_v, %b_v : f32
           %c_v  = memref.load %C[%i, %j] : memref<100x50xf32>
           %sum  = arith.addf %c_v, %prod : f32
           memref.store %sum, %C[%i, %j] : memref<100x50xf32>
        }
     }
  }
  return %C : memref<100x50xf32>
}
```

## Notation

MLIR has a simple and unambiguous grammar, allowing it to reliably round-trip
through a textual form. This is important for development of the compiler - e.g.
for understanding the state of code as it is being transformed and writing test
cases.

This document describes the grammar using
[Extended Backus-Naur Form (EBNF)](https://en.wikipedia.org/wiki/Extended_Backus%E2%80%93Naur_form).

This is the EBNF grammar used in this document, presented in yellow boxes.

```
alternation ::= expr0 | expr1 | expr2  // Either expr0 or expr1 or expr2.
sequence    ::= expr0 expr1 expr2      // Sequence of expr0 expr1 expr2.
repetition0 ::= expr*  // 0 or more occurrences.
repetition1 ::= expr+  // 1 or more occurrences.
optionality ::= expr?  // 0 or 1 occurrence.
grouping    ::= (expr) // Everything inside parens is grouped together.
literal     ::= `abcd` // Matches the literal `abcd`.
```

Code examples are presented in blue boxes.

```
// This is an example use of the grammar above:
// This matches things like: ba, bana, boma, banana, banoma, bomana...
example ::= `b` (`an` | `om`)* `a`
```

### Common syntax

The following core grammar productions are used in this document:

```
// TODO: Clarify the split between lexing (tokens) and parsing (grammar).
digit     ::= [0-9]
hex_digit ::= [0-9a-fA-F]
letter    ::= [a-zA-Z]
id-punct  ::= [$._-]

integer-literal ::= decimal-literal | hexadecimal-literal
decimal-literal ::= digit+
hexadecimal-literal ::= `0x` hex_digit+
float-literal ::= [-+]?[0-9]+[.][0-9]*([eE][-+]?[0-9]+)?
string-literal  ::= `"` [^"\n\f\v\r]* `"`   TODO: define escaping rules
```

Not listed here, but MLIR does support comments. They use standard BCPL syntax,
starting with a `//` and going until the end of the line.


### Top level Productions

```
// Top level production
toplevel := (operation | attribute-alias-def | type-alias-def)*
```

The production `toplevel` is the top level production that is parsed by any parsing
consuming the MLIR syntax. [Operations](#operations),
[Attribute alises](#attribute-value-aliases), and [Type aliases](#type-aliases)
can be declared on the toplevel.

### Identifiers and keywords

Syntax:

```
// Identifiers
bare-id ::= (letter|[_]) (letter|digit|[_$.])*
bare-id-list ::= bare-id (`,` bare-id)*
value-id ::= `%` suffix-id
alias-name :: = bare-id
suffix-id ::= (digit+ | ((letter|id-punct) (letter|id-punct|digit)*))

symbol-ref-id ::= `@` (suffix-id | string-literal) (`::` symbol-ref-id)?
value-id-list ::= value-id (`,` value-id)*

// Uses of value, e.g. in an operand list to an operation.
value-use ::= value-id
value-use-list ::= value-use (`,` value-use)*
```

Identifiers name entities such as values, types and functions, and are chosen by
the writer of MLIR code. Identifiers may be descriptive (e.g. `%batch_size`,
`@matmul`), or may be non-descriptive when they are auto-generated (e.g. `%23`,
`@func42`). Identifier names for values may be used in an MLIR text file but are
not persisted as part of the IR - the printer will give them anonymous names
like `%42`.

MLIR guarantees identifiers never collide with keywords by prefixing identifiers
with a sigil (e.g. `%`, `#`, `@`, `^`, `!`). In certain unambiguous contexts
(e.g. affine expressions), identifiers are not prefixed, for brevity. New
keywords may be added to future versions of MLIR without danger of collision
with existing identifiers.

Value identifiers are only [in scope](#value-scoping) for the (nested) region in
which they are defined and cannot be accessed or referenced outside of that
region. Argument identifiers in mapping functions are in scope for the mapping
body. Particular operations may further limit which identifiers are in scope in
their regions. For instance, the scope of values in a region with
[SSA control flow semantics](#control-flow-and-ssacfg-regions) is constrained
according to the standard definition of
[SSA dominance](https://en.wikipedia.org/wiki/Dominator_\(graph_theory\)).
Another example is the [IsolatedFromAbove trait](Traits.md/#isolatedfromabove),
which restricts directly accessing values defined in containing regions.

Function identifiers and mapping identifiers are associated with
[Symbols](SymbolsAndSymbolTables.md) and have scoping rules dependent on symbol
attributes.

## Dialects

Dialects are the mechanism by which to engage with and extend the MLIR
ecosystem. They allow for defining new [operations](#operations), as well as
[attributes](#attributes) and [types](#type-system). Each dialect is given a
unique `namespace` that is prefixed to each defined attribute/operation/type.
For example, the [Affine dialect](Dialects/Affine.md) defines the namespace:
`affine`.

MLIR allows for multiple dialects, even those outside of the main tree, to
co-exist together within one module. Dialects are produced and consumed by
certain passes. MLIR provides a [framework](DialectConversion.md) to convert
between, and within, different dialects.

A few of the dialects supported by MLIR:

*   [Affine dialect](Dialects/Affine.md)
*   [Func dialect](Dialects/Func.md)
*   [GPU dialect](Dialects/GPU.md)
*   [LLVM dialect](Dialects/LLVM.md)
*   [SPIR-V dialect](Dialects/SPIR-V.md)
*   [Vector dialect](Dialects/Vector.md)

### Target specific operations

Dialects provide a modular way in which targets can expose target-specific
operations directly through to MLIR. As an example, some targets go through
LLVM. LLVM has a rich set of intrinsics for certain target-independent
operations (e.g. addition with overflow check) as well as providing access to
target-specific operations for the targets it supports (e.g. vector permutation
operations). LLVM intrinsics in MLIR are represented via operations that start
with an "llvm." name.

Example:

```mlir
// LLVM: %x = call {i16, i1} @llvm.sadd.with.overflow.i16(i16 %a, i16 %b)
%x:2 = "llvm.sadd.with.overflow.i16"(%a, %b) : (i16, i16) -> (i16, i1)
```

These operations only work when targeting LLVM as a backend (e.g. for CPUs and
GPUs), and are required to align with the LLVM definition of these intrinsics.

## Operations

Syntax:

```
operation            ::= op-result-list? (generic-operation | custom-operation)
                         trailing-location?
generic-operation    ::= string-literal `(` value-use-list? `)`  successor-list?
                         region-list? dictionary-attribute? `:` function-type
custom-operation     ::= bare-id custom-operation-format
op-result-list       ::= op-result (`,` op-result)* `=`
op-result            ::= value-id (`:` integer-literal)
successor-list       ::= `[` successor (`,` successor)* `]`
successor            ::= caret-id (`:` block-arg-list)?
region-list          ::= `(` region (`,` region)* `)`
dictionary-attribute ::= `{` (attribute-entry (`,` attribute-entry)*)? `}`
trailing-location    ::= (`loc` `(` location `)`)?
```

MLIR introduces a uniform concept called *operations* to enable describing many
different levels of abstractions and computations. Operations in MLIR are fully
extensible (there is no fixed list of operations) and have application-specific
semantics. For example, MLIR supports
[target-independent operations](Dialects/MemRef.md),
[affine operations](Dialects/Affine.md), and
[target-specific machine operations](#target-specific-operations).

The internal representation of an operation is simple: an operation is
identified by a unique string (e.g. `dim`, `tf.Conv2d`, `x86.repmovsb`,
`ppc.eieio`, etc), can return zero or more results, take zero or more operands,
has a dictionary of [attributes](#attributes), has zero or more successors, and
zero or more enclosed [regions](#regions). The generic printing form includes
all these elements literally, with a function type to indicate the types of the
results and operands.

Example:

```mlir
// An operation that produces two results.
// The results of %result can be accessed via the <name> `#` <opNo> syntax.
%result:2 = "foo_div"() : () -> (f32, i32)

// Pretty form that defines a unique name for each result.
%foo, %bar = "foo_div"() : () -> (f32, i32)

// Invoke a TensorFlow function called tf.scramble with two inputs
// and an attribute "fruit".
%2 = "tf.scramble"(%result#0, %bar) {fruit = "banana"} : (f32, i32) -> f32
```

In addition to the basic syntax above, dialects may register known operations.
This allows those dialects to support *custom assembly form* for parsing and
printing operations. In the operation sets listed below, we show both forms.

### Builtin Operations

The [builtin dialect](Dialects/Builtin.md) defines a select few operations that
are widely applicable by MLIR dialects, such as a universal conversion cast
operation that simplifies inter/intra dialect conversion. This dialect also
defines a top-level `module` operation, that represents a useful IR container.

## Blocks

Syntax:

```
block           ::= block-label operation+
block-label     ::= block-id block-arg-list? `:`
block-id        ::= caret-id
caret-id        ::= `^` suffix-id
value-id-and-type ::= value-id `:` type

// Non-empty list of names and types.
value-id-and-type-list ::= value-id-and-type (`,` value-id-and-type)*

block-arg-list ::= `(` value-id-and-type-list? `)`
```

A *Block* is a list of operations. In
[SSACFG regions](#control-flow-and-ssacfg-regions), each block represents a
compiler [basic block](https://en.wikipedia.org/wiki/Basic_block) where
instructions inside the block are executed in order and terminator operations
implement control flow branches between basic blocks.

The last operation in a block must be a
[terminator operation](#control-flow-and-ssacfg-regions). A region with a single
block may opt out of this requirement by attaching the `NoTerminator` on the
enclosing op. The top-level `ModuleOp` is an example of such an operation which
defines this trait and whose block body does not have a terminator.

Blocks in MLIR take a list of block arguments, notated in a function-like way.
Block arguments are bound to values specified by the semantics of individual
operations. Block arguments of the entry block of a region are also arguments to
the region and the values bound to these arguments are determined by the
semantics of the containing operation. Block arguments of other blocks are
determined by the semantics of terminator operations, e.g. Branches, which have
the block as a successor. In regions with
[control flow](#control-flow-and-ssacfg-regions), MLIR leverages this structure
to implicitly represent the passage of control-flow dependent values without the
complex nuances of PHI nodes in traditional SSA representations. Note that
values which are not control-flow dependent can be referenced directly and do
not need to be passed through block arguments.

Here is a simple example function showing branches, returns, and block
arguments:

```mlir
func.func @simple(i64, i1) -> i64 {
^bb0(%a: i64, %cond: i1): // Code dominated by ^bb0 may refer to %a
  cf.cond_br %cond, ^bb1, ^bb2

^bb1:
  cf.br ^bb3(%a: i64)    // Branch passes %a as the argument

^bb2:
  %b = arith.addi %a, %a : i64
  cf.br ^bb3(%b: i64)    // Branch passes %b as the argument

// ^bb3 receives an argument, named %c, from predecessors
// and passes it on to bb4 along with %a. %a is referenced
// directly from its defining operation and is not passed through
// an argument of ^bb3.
^bb3(%c: i64):
  cf.br ^bb4(%c, %a : i64, i64)

^bb4(%d : i64, %e : i64):
  %0 = arith.addi %d, %e : i64
  return %0 : i64   // Return is also a terminator.
}
```

**Context:** The "block argument" representation eliminates a number of special
cases from the IR compared to traditional "PHI nodes are operations" SSA IRs
(like LLVM). For example, the
[parallel copy semantics](http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.524.5461&rep=rep1&type=pdf)
of SSA is immediately apparent, and function arguments are no longer a special
case: they become arguments to the entry block
[[more rationale](Rationale/Rationale.md/#block-arguments-vs-phi-nodes)]. Blocks
are also a fundamental concept that cannot be represented by operations because
values defined in an operation cannot be accessed outside the operation.

## Regions

### Definition

A region is an ordered list of MLIR [Blocks](#blocks). The semantics within a
region is not imposed by the IR. Instead, the containing operation defines the
semantics of the regions it contains. MLIR currently defines two kinds of
regions: [SSACFG regions](#control-flow-and-ssacfg-regions), which describe
control flow between blocks, and [Graph regions](#graph-regions), which do not
require control flow between block. The kinds of regions within an operation are
described using the [RegionKindInterface](Interfaces.md/#regionkindinterfaces).

Regions do not have a name or an address, only the blocks contained in a region
do. Regions must be contained within operations and have no type or attributes.
The first block in the region is a special block called the 'entry block'. The
arguments to the entry block are also the arguments of the region itself. The
entry block cannot be listed as a successor of any other block. The syntax for a
region is as follows:

```
region      ::= `{` entry-block? block* `}`
entry-block ::= operation+
```

A function body is an example of a region: it consists of a CFG of blocks and
has additional semantic restrictions that other types of regions may not have.
For example, in a function body, block terminators must either branch to a
different block, or return from a function where the types of the `return`
arguments must match the result types of the function signature. Similarly, the
function arguments must match the types and count of the region arguments. In
general, operations with regions can define these correspondences arbitrarily.

An *entry block* is a block with no label and no arguments that may occur at
the beginning of a region. It enables a common pattern of using a region to
open a new scope.


### Value Scoping

Regions provide hierarchical encapsulation of programs: it is impossible to
reference, i.e. branch to, a block which is not in the same region as the source
of the reference, i.e. a terminator operation. Similarly, regions provides a
natural scoping for value visibility: values defined in a region don't escape to
the enclosing region, if any. By default, operations inside a region can
reference values defined outside of the region whenever it would have been legal
for operands of the enclosing operation to reference those values, but this can
be restricted using traits, such as
[OpTrait::IsolatedFromAbove](Traits.md/#isolatedfromabove), or a custom
verifier.

Example:

```mlir
  "any_op"(%a) ({ // if %a is in-scope in the containing region...
     // then %a is in-scope here too.
    %new_value = "another_op"(%a) : (i64) -> (i64)
  }) : (i64) -> (i64)
```

MLIR defines a generalized 'hierarchical dominance' concept that operates across
hierarchy and defines whether a value is 'in scope' and can be used by a
particular operation. Whether a value can be used by another operation in the
same region is defined by the kind of region. A value defined in a region can be
used by an operation which has a parent in the same region, if and only if the
parent could use the value. A value defined by an argument to a region can
always be used by any operation deeply contained in the region. A value defined
in a region can never be used outside of the region.

### Control Flow and SSACFG Regions

In MLIR, control flow semantics of a region is indicated by
[RegionKind::SSACFG](Interfaces.md/#regionkindinterfaces). Informally, these
regions support semantics where operations in a region 'execute sequentially'.
Before an operation executes, its operands have well-defined values. After an
operation executes, the operands have the same values and results also have
well-defined values. After an operation executes, the next operation in the
block executes until the operation is the terminator operation at the end of a
block, in which case some other operation will execute. The determination of the
next instruction to execute is the 'passing of control flow'.

In general, when control flow is passed to an operation, MLIR does not restrict
when control flow enters or exits the regions contained in that operation.
However, when control flow enters a region, it always begins in the first block
of the region, called the *entry* block. Terminator operations ending each block
represent control flow by explicitly specifying the successor blocks of the
block. Control flow can only pass to one of the specified successor blocks as in
a `branch` operation, or back to the containing operation as in a `return`
operation. Terminator operations without successors can only pass control back
to the containing operation. Within these restrictions, the particular semantics
of terminator operations is determined by the specific dialect operations
involved. Blocks (other than the entry block) that are not listed as a successor
of a terminator operation are defined to be unreachable and can be removed
without affecting the semantics of the containing operation.

Although control flow always enters a region through the entry block, control
flow may exit a region through any block with an appropriate terminator. The
standard dialect leverages this capability to define operations with
Single-Entry-Multiple-Exit (SEME) regions, possibly flowing through different
blocks in the region and exiting through any block with a `return` operation.
This behavior is similar to that of a function body in most programming
languages. In addition, control flow may also not reach the end of a block or
region, for example if a function call does not return.

Example:

```mlir
func.func @accelerator_compute(i64, i1) -> i64 { // An SSACFG region
^bb0(%a: i64, %cond: i1): // Code dominated by ^bb0 may refer to %a
  cf.cond_br %cond, ^bb1, ^bb2

^bb1:
  // This def for %value does not dominate ^bb2
  %value = "op.convert"(%a) : (i64) -> i64
  cf.br ^bb3(%a: i64)    // Branch passes %a as the argument

^bb2:
  accelerator.launch() { // An SSACFG region
    ^bb0:
      // Region of code nested under "accelerator.launch", it can reference %a but
      // not %value.
      %new_value = "accelerator.do_something"(%a) : (i64) -> ()
  }
  // %new_value cannot be referenced outside of the region

^bb3:
  ...
}
```

#### Operations with Multiple Regions

An operation containing multiple regions also completely determines the
semantics of those regions. In particular, when control flow is passed to an
operation, it may transfer control flow to any contained region. When control
flow exits a region and is returned to the containing operation, the containing
operation may pass control flow to any region in the same operation. An
operation may also pass control flow to multiple contained regions concurrently.
An operation may also pass control flow into regions that were specified in
other operations, in particular those that defined the values or symbols the
given operation uses as in a call operation. This passage of control is
generally independent of passage of control flow through the basic blocks of the
containing region.

#### Closure

Regions allow defining an operation that creates a closure, for example by
“boxing” the body of the region into a value they produce. It remains up to the
operation to define its semantics. Note that if an operation triggers
asynchronous execution of the region, it is under the responsibility of the
operation caller to wait for the region to be executed guaranteeing that any
directly used values remain live.

### Graph Regions

In MLIR, graph-like semantics in a region is indicated by
[RegionKind::Graph](Interfaces.md/#regionkindinterfaces). Graph regions are
appropriate for concurrent semantics without control flow, or for modeling
generic directed graph data structures. Graph regions are appropriate for
representing cyclic relationships between coupled values where there is no
fundamental order to the relationships. For instance, operations in a graph
region may represent independent threads of control with values representing
streams of data. As usual in MLIR, the particular semantics of a region is
completely determined by its containing operation. Graph regions may only
contain a single basic block (the entry block).

**Rationale:** Currently graph regions are arbitrarily limited to a single basic
block, although there is no particular semantic reason for this limitation. This
limitation has been added to make it easier to stabilize the pass infrastructure
and commonly used passes for processing graph regions to properly handle
feedback loops. Multi-block regions may be allowed in the future if use cases
that require it arise.

In graph regions, MLIR operations naturally represent nodes, while each MLIR
value represents a multi-edge connecting a single source node and multiple
destination nodes. All values defined in the region as results of operations are
in scope within the region and can be accessed by any other operation in the
region. In graph regions, the order of operations within a block and the order
of blocks in a region is not semantically meaningful and non-terminator
operations may be freely reordered, for instance, by canonicalization. Other
kinds of graphs, such as graphs with multiple source nodes and multiple
destination nodes, can also be represented by representing graph edges as MLIR
operations.

Note that cycles can occur within a single block in a graph region, or between
basic blocks.

```mlir
"test.graph_region"() ({ // A Graph region
  %1 = "op1"(%1, %3) : (i32, i32) -> (i32)  // OK: %1, %3 allowed here
  %2 = "test.ssacfg_region"() ({
     %5 = "op2"(%1, %2, %3, %4) : (i32, i32, i32, i32) -> (i32) // OK: %1, %2, %3, %4 all defined in the containing region
  }) : () -> (i32)
  %3 = "op2"(%1, %4) : (i32, i32) -> (i32)  // OK: %4 allowed here
  %4 = "op3"(%1) : (i32) -> (i32)
}) : () -> ()
```

### Arguments and Results

The arguments of the first block of a region are treated as arguments of the
region. The source of these arguments is defined by the semantics of the parent
operation. They may correspond to some of the values the operation itself uses.

Regions produce a (possibly empty) list of values. The operation semantics
defines the relation between the region results and the operation results.

## Type System

Each value in MLIR has a type defined by the type system. MLIR has an open type
system (i.e. there is no fixed list of types), and types may have
application-specific semantics. MLIR dialects may define any number of types
with no restrictions on the abstractions they represent.

```
type ::= type-alias | dialect-type | builtin-type

type-list-no-parens ::=  type (`,` type)*
type-list-parens ::= `(` `)`
                   | `(` type-list-no-parens `)`

// This is a common way to refer to a value with a specified type.
ssa-use-and-type ::= ssa-use `:` type
ssa-use ::= value-use

// Non-empty list of names and types.
ssa-use-and-type-list ::= ssa-use-and-type (`,` ssa-use-and-type)*

function-type ::= (type | type-list-parens) `->` (type | type-list-parens)
```

### Type Aliases

```
type-alias-def ::= '!' alias-name '=' type
type-alias ::= '!' alias-name
```

MLIR supports defining named aliases for types. A type alias is an identifier
that can be used in the place of the type that it defines. These aliases *must*
be defined before their uses. Alias names may not contain a '.', since those
names are reserved for [dialect types](#dialect-types).

Example:

```mlir
!avx_m128 = vector<4 x f32>

// Using the original type.
"foo"(%x) : vector<4 x f32> -> ()

// Using the type alias.
"foo"(%x) : !avx_m128 -> ()
```

### Dialect Types

Similarly to operations, dialects may define custom extensions to the type
system.

```
dialect-namespace ::= bare-id

dialect-type ::= '!' (opaque-dialect-type | pretty-dialect-type)
opaque-dialect-type ::= dialect-namespace dialect-type-body
pretty-dialect-type ::= dialect-namespace '.' pretty-dialect-type-lead-ident
                                              dialect-type-body?
pretty-dialect-type-lead-ident ::= '[A-Za-z][A-Za-z0-9._]*'

dialect-type-body ::= '<' dialect-type-contents+ '>'
dialect-type-contents ::= dialect-type-body
                            | '(' dialect-type-contents+ ')'
                            | '[' dialect-type-contents+ ']'
                            | '{' dialect-type-contents+ '}'
                            | '[^\[<({\]>)}\0]+'
```

Dialect types are generally specified in an opaque form, where the contents
of the type are defined within a body wrapped with the dialect namespace
and `<>`. Consider the following examples:

```mlir
// A tensorflow string type.
!tf<string>

// A type with complex components.
!foo<something<abcd>>

// An even more complex type.
!foo<"a123^^^" + bar>
```

Dialect types that are simple enough may use a prettier format, which unwraps
part of the syntax into an equivalent, but lighter weight form:

```mlir
// A tensorflow string type.
!tf.string

// A type with complex components.
!foo.something<abcd>
```

See [here](DefiningDialects/AttributesAndTypes.md) to learn how to define dialect types.

### Builtin Types

The [builtin dialect](Dialects/Builtin.md) defines a set of types that are
directly usable by any other dialect in MLIR. These types cover a range from
primitive integer and floating-point types, function types, and more.

## Attributes

Syntax:

```
attribute-entry ::= (bare-id | string-literal) `=` attribute-value
attribute-value ::= attribute-alias | dialect-attribute | builtin-attribute
```

Attributes are the mechanism for specifying constant data on operations in
places where a variable is never allowed - e.g. the comparison predicate of a
[`cmpi` operation](Dialects/ArithOps.md#arithcmpi-mlirarithcmpiop). Each operation has an
attribute dictionary, which associates a set of attribute names to attribute
values. MLIR's builtin dialect provides a rich set of
[builtin attribute values](#builtin-attribute-values) out of the box (such as
arrays, dictionaries, strings, etc.). Additionally, dialects can define their
own [dialect attribute values](#dialect-attribute-values).

The top-level attribute dictionary attached to an operation has special
semantics. The attribute entries are considered to be of two different kinds
based on whether their dictionary key has a dialect prefix:

-   *inherent attributes* are inherent to the definition of an operation's
    semantics. The operation itself is expected to verify the consistency of
    these attributes. An example is the `predicate` attribute of the
    `arith.cmpi` op. These attributes must have names that do not start with a
    dialect prefix.

-   *discardable attributes* have semantics defined externally to the operation
    itself, but must be compatible with the operations's semantics. These
    attributes must have names that start with a dialect prefix. The dialect
    indicated by the dialect prefix is expected to verify these attributes. An
    example is the `gpu.container_module` attribute.

Note that attribute values are allowed to themselves be dictionary attributes,
but only the top-level dictionary attribute attached to the operation is subject
to the classification above.

### Attribute Value Aliases

```
attribute-alias-def ::= '#' alias-name '=' attribute-value
attribute-alias ::= '#' alias-name
```

MLIR supports defining named aliases for attribute values. An attribute alias is
an identifier that can be used in the place of the attribute that it defines.
These aliases *must* be defined before their uses. Alias names may not contain a
'.', since those names are reserved for
[dialect attributes](#dialect-attribute-values).

Example:

```mlir
#map = affine_map<(d0) -> (d0 + 10)>

// Using the original attribute.
%b = affine.apply affine_map<(d0) -> (d0 + 10)> (%a)

// Using the attribute alias.
%b = affine.apply #map(%a)
```

### Dialect Attribute Values

Similarly to operations, dialects may define custom attribute values.

```
dialect-namespace ::= bare-id

dialect-attribute ::= '#' (opaque-dialect-attribute | pretty-dialect-attribute)
opaque-dialect-attribute ::= dialect-namespace dialect-attribute-body
pretty-dialect-attribute ::= dialect-namespace '.' pretty-dialect-attribute-lead-ident
                                              dialect-attribute-body?
pretty-dialect-attribute-lead-ident ::= '[A-Za-z][A-Za-z0-9._]*'

dialect-attribute-body ::= '<' dialect-attribute-contents+ '>'
dialect-attribute-contents ::= dialect-attribute-body
                            | '(' dialect-attribute-contents+ ')'
                            | '[' dialect-attribute-contents+ ']'
                            | '{' dialect-attribute-contents+ '}'
                            | '[^\[<({\]>)}\0]+'
```

Dialect attributes are generally specified in an opaque form, where the contents
of the attribute are defined within a body wrapped with the dialect namespace
and `<>`. Consider the following examples:

```mlir
// A string attribute.
#foo<string<"">>

// A complex attribute.
#foo<"a123^^^" + bar>
```

Dialect attributes that are simple enough may use a prettier format, which unwraps
part of the syntax into an equivalent, but lighter weight form:

```mlir
// A string attribute.
#foo.string<"">
```

See [here](DefiningDialects/AttributesAndTypes.md) on how to define dialect attribute values.

### Builtin Attribute Values

The [builtin dialect](Dialects/Builtin.md) defines a set of attribute values
that are directly usable by any other dialect in MLIR. These types cover a range
from primitive integer and floating-point values, attribute dictionaries, dense
multi-dimensional arrays, and more.