Diffstat (limited to 'mlir/docs/Dialects')
-rw-r--r-- mlir/docs/Dialects/Mesh.md      | 74
-rw-r--r-- mlir/docs/Dialects/Shard.md     | 92
-rw-r--r-- mlir/docs/Dialects/Transform.md | 12
-rw-r--r-- mlir/docs/Dialects/Vector.md    |  2
-rw-r--r-- mlir/docs/Dialects/emitc.md     |  2
5 files changed, 107 insertions, 75 deletions
diff --git a/mlir/docs/Dialects/Mesh.md b/mlir/docs/Dialects/Mesh.md
deleted file mode 100644
index 5eb6569..0000000
--- a/mlir/docs/Dialects/Mesh.md
+++ /dev/null
@@ -1,74 +0,0 @@
-# 'mesh' Dialect
-
-The `mesh` dialect contains a set of attributes, operations and interfaces that
-are useful for representing sharding and communication on a device mesh
-cluster.
-
-[TOC]
-
-## Collective Communication Operations
-There are a number of operations in the Mesh dialect to facilitate
-communication between devices in a mesh.
-It is assumed that the user is familiar with collective operations.
-[Wikipedia](https://en.wikipedia.org/wiki/Collective_operation) has a good
-explanation.
-The main addition is that the collectives in this dialect have mesh
-semantics.
-
-### Device groups
-The operation attributes `mesh` and `mesh_axes` specifies a list of device mesh
-axes that partition the devices into disjoint groups.
-The collective operation is performed between devices in the same group.
-Devices that have the same coordinates outside of axes `mesh_axes` are in the
-same group.
-A group is described by its multi-index along the axes outside of `mesh_axes`.
-For example if we have a device mesh of size `2x3x4x5` and the partition mesh
-axes list is `[0, 1]` then devices are partitioned into the groups
-`{ { (i, j, k, m) | 0<=i<2, 0<=j<3 } | 0<=k<4, 0<=m<5 }`.
-The device groups would be `{ (k, m) | 0<=k<4, 0<=m<5 }`.
-Devices (1, 0, 2, 3) and (1, 1, 2, 3) will be in the same group.
-Device (1, 0, 2, 4) will be in another group.
-Some collective operations like all-to-all and all-gather care about the
-order of devices.
-The order of device in a device group is induced by the order of axes in
-`mesh_axes`.
-The axes are ordered from outer to inner.
-If we have an axis list `[3, 1]` then device `(i, 1, k, 0)` will precede
-both devices `(i, 0, k, 1)` and `(i, 2, k, 0)`.
-
-### In-group Device
-Some operations like `broadcast`, `scatter` and `send` specify devices in each
-device-group.
-These devices are represented with their multi-index over the mesh axes that
-are not constant within a device group.
-These are the axes specified by `mesh_axes` attribute.
-
-For Example on a 3D mesh an operation with `mesh_axes = [0, 2]` would specify
-an in-group device with `(i, j)`. Then for each group with index `g` on the
-second axis, the in-group device would be `(i, g, j)`.
-### Purity
-Collectives that involve the whole device group to perform a single operation
-are pure. The exceptions are `send` and `recv`.
-
-There is an assumption that the execution is SPMD.
-Not only that each process runs the same program, but that at the point of
-execution of a collective operation, all processes are in a coherent state.
-All compiler transformations must be consistent.
-Collective operations in the IR that may correspond to the same runtime
-collective operation must be transformed in a consistent manner.
-For example if a collective operation is optimized out, than it must also
-not appear in any path of execution on any process.
-
-Having the operations as `Pure` implies that if an interpreter is to execute
-the IR containing the `mesh` collectives, all processes would execute the same
-line when they reach a pure collective operation.
-This requirement stems from the need to be compatible with general optimization
-passes like dead code and common sub-expression elimination.
-
-## Operations
-
-[include "Dialects/MeshOps.md"]
-
-## Attributes
-
-[include "Dialects/MeshAttrs.md"]
diff --git a/mlir/docs/Dialects/Shard.md b/mlir/docs/Dialects/Shard.md
new file mode 100644
index 0000000..eb6ff61
--- /dev/null
+++ b/mlir/docs/Dialects/Shard.md
@@ -0,0 +1,92 @@
+# 'shard' Dialect
+
+The 'shard' dialect defines a set of attributes, operations, and interfaces for
+working with tensor sharding and device communication.
+
+It’s inspired by [GSPMD](*General and Scalable Parallelization for ML Computation Graphs*).
+
+Originally, the dialect was called `mesh`, but it was renamed to better reflect
+what it actually does.
+
+[TOC]
+
+## Collective Communication Operations
+
+The 'shard' dialect includes several collective operations that help coordinate
+communication between devices arranged in a grid.
+
+If you’re not already familiar with collective operations, [this Wikipedia
+article](https://en.wikipedia.org/wiki/Collective_operation) is a good starting
+point.
+
+Unlike traditional collectives that are defined in terms of message-passing
+between explicit buffers on each process, the collectives in this dialect work
+at a higher level. They’re defined in terms of how data moves across the
+dimensions of a tensor, and the participating processes are inferred from how
+the tensor is sharded - not specified manually.
+
+### Device Groups
+
+Each collective operation runs within a group of devices. You define groups
+using the `grid` and `grid_axes` attributes, which describe how to slice the
+full device grid into smaller groups.
+
+Devices that have the same coordinates *outside* the listed `grid_axes` belong
+to the same group.
+
+Example: Say your device grid is shaped `2×3×4×5`, and you set
+`grid_axes = [0, 1]`. This splits the grid into groups by fixing axes 2 and 3. You’d get groups like:
+
+```
+{ { (i, j, k, m) | 0 ≤ i < 2, 0 ≤ j < 3 } | 0 ≤ k < 4, 0 ≤ m < 5 }
+```
+
+So the groups are identified by the coordinates `(k, m)`, and devices like
+`(1, 0, 2, 3)` and `(1, 1, 2, 3)` are in the same group. But `(1, 0, 2, 4)`
+is in a different group.
+
+For some collectives (like `all-to-all`), the order of devices in the group
+matters. The device order is based on the order of axes in `grid_axes`, from
+outermost to innermost.
+
+Example: If `grid_axes = [3, 1]`, then device `(i, 1, k, 0)` comes before
+`(i, 0, k, 1)` and `(i, 2, k, 0)`.
+
+### In-group Devices
+
+Some operations (like `broadcast`, `scatter`, and `send`) refer to a specific
+device within each group. These in-group devices are identified using their
+coordinates over the axes listed in `grid_axes`.
+
+Example: In a 3D grid with `grid_axes = [0, 2]`, an in-group device is specified
+as `(i, j)`. If a group is fixed at coordinate `g` on axis 1, then the full
+device index would be `(i, g, j)`.
+
+### Purity and Execution Model
+
+Collective operations involve all devices in a group (e.g. `all-gather`,
+`all-to-all`) and are considered pure. Operations like `send` and `recv` are not
+collective and are not pure.
+
+The execution model assumes SPMD (Single Program, Multiple Data):
+
+* Every process runs the same program.
+* At any collective operation, all processes are in sync.
+
+This means compiler optimizations must treat collective ops carefully. For
+example, if a collective is removed during optimization, it must be removed from
+*every* path and *every* process that would have participated - otherwise, you’ll
+get undefined behavior at runtime.
+
+Marking these ops as pure also helps with standard compiler passes like dead
+code elimination and common subexpression elimination. It ensures that when the
+program is executed, all devices hit the same line of code at the same time
+during collectives and so avoid dead-locks.
+
+## Operations
+
+[include "Dialects/ShardOps.md"]
+
+## Attributes
+
+[include "Dialects/ShardAttrs.md"]
diff --git a/mlir/docs/Dialects/Transform.md b/mlir/docs/Dialects/Transform.md
index 5f79116..7164cb7 100644
--- a/mlir/docs/Dialects/Transform.md
+++ b/mlir/docs/Dialects/Transform.md
@@ -415,10 +415,22 @@ ops rather than having the methods directly act on the payload IR.
 
 [include "Dialects/TransformOps.md"]
 
+## Tuning Extension Operaiton
+
+[include "Dialects/TuneExtensionOps.md"]
+
 ## Affine Transform Operations
 
 [include "Dialects/AffineLoopTransformOps.md"]
 
+## ARM Neon Transform Operations
+
+[include "Dialects/ArmNeonVectorTransformOps.md"]
+
+## ARM SVE Transform Operations
+
+[include "Dialects/ArmSVEVectorTransformOps.md"]
+
 ## Bufferization Transform Operations
 
 [include "Dialects/BufferizationTransformOps.md"]
diff --git a/mlir/docs/Dialects/Vector.md b/mlir/docs/Dialects/Vector.md
index ebeb0a2..6c8949d 100644
--- a/mlir/docs/Dialects/Vector.md
+++ b/mlir/docs/Dialects/Vector.md
@@ -294,7 +294,7 @@ LLVM instructions are prefixed by the `llvm.` dialect prefix (e.g.
 `llvm.insertvalue`). Such ops operate exclusively on 1-D vectors and
 aggregates following the [LLVM LangRef](https://llvm.org/docs/LangRef.html).
 MLIR operations are prefixed by the `vector.` dialect prefix (e.g.
-`vector.insertelement`). Such ops operate exclusively on MLIR `n-D` `vector`
+`vector.insert`). Such ops operate exclusively on MLIR `n-D` `vector`
 types.
 
 ### Alternatives For Lowering an n-D Vector Type to LLVM
diff --git a/mlir/docs/Dialects/emitc.md b/mlir/docs/Dialects/emitc.md
index e2288f5..6d09e93 100644
--- a/mlir/docs/Dialects/emitc.md
+++ b/mlir/docs/Dialects/emitc.md
@@ -18,6 +18,8 @@ The following convention is followed:
   GCC or Clang.
 * If `emitc.array` with a dimension of size zero is used, then the code
   requires [a GCC extension](https://gcc.gnu.org/onlinedocs/gcc/Zero-Length.html).
+* If `aligned_alloc` is passed to an `emitc.call_opaque` operation, then C++17
+  or C11 is required.
 * Else the generated code is compatible with C99.
 
 These restrictions are neither inherent to the EmitC dialect itself nor to the
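The grouping rules in the new Shard.md text ("Device Groups" and "In-group Devices") can be made concrete with a small model. What follows is a minimal Python sketch, not MLIR code. It assumes devices are plain coordinate tuples and that in-group order is lexicographic over the `grid_axes` coordinates, with the first listed axis outermost. The helper names (`group_key`, `in_group_index`, `device_groups`, `full_index`) are hypothetical and are not part of the `shard` dialect API.

```python
from itertools import product
from typing import Dict, List, Sequence, Tuple

Index = Tuple[int, ...]


def group_key(device: Index, grid_axes: Sequence[int]) -> Index:
    # Coordinates on the axes NOT listed in grid_axes; devices that share
    # this key belong to the same group (hypothetical helper).
    return tuple(c for axis, c in enumerate(device) if axis not in grid_axes)


def in_group_index(device: Index, grid_axes: Sequence[int]) -> Index:
    # Coordinates over grid_axes in the listed order, so the first listed
    # axis is the outermost one when indices are compared.
    return tuple(device[axis] for axis in grid_axes)


def device_groups(grid_shape: Sequence[int],
                  grid_axes: Sequence[int]) -> Dict[Index, List[Index]]:
    # Enumerate every device in the grid and bucket it by its group key.
    groups: Dict[Index, List[Index]] = {}
    for device in product(*(range(n) for n in grid_shape)):
        groups.setdefault(group_key(device, grid_axes), []).append(device)
    # Order each group lexicographically by its in-group index.
    for members in groups.values():
        members.sort(key=lambda d: in_group_index(d, grid_axes))
    return groups


def full_index(in_group: Index, key: Index,
               grid_axes: Sequence[int], ndim: int) -> Index:
    # Rebuild a full device index from an in-group device (coordinates over
    # grid_axes) and a group key (coordinates over the remaining axes).
    coords: List[int] = [-1] * ndim
    for axis, c in zip(grid_axes, in_group):
        coords[axis] = c
    outside = iter(key)
    for axis in range(ndim):
        if axis not in grid_axes:
            coords[axis] = next(outside)
    return tuple(coords)


if __name__ == "__main__":
    # 2x3x4x5 grid with grid_axes = [0, 1]: groups are keyed by (k, m).
    groups = device_groups((2, 3, 4, 5), [0, 1])
    print(len(groups))                      # 20
    print(group_key((1, 0, 2, 3), [0, 1]))  # (2, 3)
    print(group_key((1, 0, 2, 4), [0, 1]))  # (2, 4)
    # grid_axes = [3, 1]: order by (axis 3, axis 1), axis 3 outermost.
    ordered = device_groups((2, 3, 4, 5), [3, 1])[(0, 0)]
    print(ordered[:4])  # [(0, 0, 0, 0), (0, 1, 0, 0), (0, 2, 0, 0), (0, 0, 0, 1)]
    # 3D grid with grid_axes = [0, 2]; the group is fixed at g = 1 on axis 1.
    print(full_index((0, 2), (1,), [0, 2], 3))  # (0, 1, 2)
```

The sort key mirrors the "outermost to innermost" wording. With `grid_axes = [3, 1]`, device `(i, 1, k, 0)` has in-group index `(0, 1)`, which sorts before `(1, 0)` for `(i, 0, k, 1)` and before `(0, 2)` for `(i, 2, k, 0)`. This matches the example in the hunk.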