5 files changed, 107 insertions, 75 deletions
diff --git a/mlir/docs/Dialects/Mesh.md b/mlir/docs/Dialects/Mesh.md
deleted file mode 100644
index 5eb6569..0000000
--- a/mlir/docs/Dialects/Mesh.md
+++ /dev/null
@@ -1,74 +0,0 @@
-# 'mesh' Dialect
-
-The `mesh` dialect contains a set of attributes, operations and interfaces that
-are useful for representing sharding and communication on a device mesh
-cluster.
-
-[TOC]
-
-## Collective Communication Operations
-There are a number of operations in the Mesh dialect to facilitate
-communication between devices in a mesh.
-It is assumed that the user is familiar with collective operations.
-[Wikipedia](https://en.wikipedia.org/wiki/Collective_operation) has a good
-explanation.
-The main addition is that the collectives in this dialect have mesh
-semantics.
-
-### Device groups
-The operation attributes `mesh` and `mesh_axes` specifies a list of device mesh
-axes that partition the devices into disjoint groups.
-The collective operation is performed between devices in the same group.
-Devices that have the same coordinates outside of axes `mesh_axes` are in the
-same group.
-A group is described by its multi-index along the axes outside of `mesh_axes`.
-For example if we have a device mesh of size `2x3x4x5` and the partition mesh
-axes list is `[0, 1]` then devices are partitioned into the groups
-`{ { (i, j, k, m) | 0<=i<2, 0<=j<3 } | 0<=k<4, 0<=m<5 }`.
-The device groups would be `{ (k, m) | 0<=k<4, 0<=m<5 }`.
-Devices (1, 0, 2, 3) and (1, 1, 2, 3) will be in the same group.
-Device (1, 0, 2, 4) will be in another group.
-Some collective operations like all-to-all and all-gather care about the
-order of devices.
-The order of device in a device group is induced by the order of axes in
-`mesh_axes`.
-The axes are ordered from outer to inner.
-If we have an axis list `[3, 1]` then device `(i, 1, k, 0)` will precede
-both devices `(i, 0, k, 1)` and `(i, 2, k, 0)`.
-
-### In-group Device
-Some operations like `broadcast`, `scatter` and `send` specify devices in each
-device-group.
-These devices are represented with their multi-index over the mesh axes that
-are not constant within a device group.
-These are the axes specified by `mesh_axes` attribute.
-
-For Example on a 3D mesh an operation with `mesh_axes = [0, 2]` would specify
-an in-group device with `(i, j)`. Then for each group with index `g` on the
-second axis, the in-group device would be `(i, g, j)`.
-### Purity
-Collectives that involve the whole device group to perform a single operation
-are pure. The exceptions are `send` and `recv`.
-
-There is an assumption that the execution is SPMD.
-Not only that each process runs the same program, but that at the point of
-execution of a collective operation, all processes are in a coherent state.
-All compiler transformations must be consistent.
-Collective operations in the IR that may correspond to the same runtime
-collective operation must be transformed in a consistent manner.
-For example if a collective operation is optimized out, than it must also
-not appear in any path of execution on any process.
-
-Having the operations as `Pure` implies that if an interpreter is to execute
-the IR containing the `mesh` collectives, all processes would execute the same
-line when they reach a pure collective operation.
-This requirement stems from the need to be compatible with general optimization
-passes like dead code and common sub-expression elimination.
-
-## Operations
-
-[include "Dialects/MeshOps.md"]
-
-## Attributes
-
-[include "Dialects/MeshAttrs.md"]
diff --git a/mlir/docs/Dialects/Shard.md b/mlir/docs/Dialects/Shard.md
new file mode 100644
index 0000000..eb6ff61
--- /dev/null
+++ b/mlir/docs/Dialects/Shard.md
@@ -0,0 +1,92 @@
+# 'shard' Dialect
+
+The 'shard' dialect defines a set of attributes, operations, and interfaces for
+working with tensor sharding and device communication.
+
+It’s inspired by GSPMD (*General and Scalable Parallelization for ML Computation Graphs*).
+
+Originally, the dialect was called `mesh`, but it was renamed to better reflect
+what it actually does.
+
+[TOC]
+
+## Collective Communication Operations
+
+The 'shard' dialect includes several collective operations that help coordinate
+communication between devices arranged in a grid.
+
+If you’re not already familiar with collective operations, [this Wikipedia
+article](https://en.wikipedia.org/wiki/Collective_operation) is a good starting
+point.
+
+Unlike traditional collectives that are defined in terms of message-passing
+between explicit buffers on each process, the collectives in this dialect work
+at a higher level. They’re defined in terms of how data moves across the
+dimensions of a tensor, and the participating processes are inferred from how
+the tensor is sharded - not specified manually.
+
+### Device Groups
+
+Each collective operation runs within a group of devices. You define groups
+using the `grid` and `grid_axes` attributes, which describe how to slice the
+full device grid into smaller groups.
+
+Devices that have the same coordinates *outside* the listed `grid_axes` belong
+to the same group.
+
+Example: Say your device grid is shaped `2×3×4×5`, and you set
+`grid_axes = [0, 1]`. This splits the grid into groups by fixing axes 2 and 3. You’d get groups like:
+
+```
+{ { (i, j, k, m) | 0 ≤ i < 2, 0 ≤ j < 3 } | 0 ≤ k < 4, 0 ≤ m < 5 }
+```
+
+So the groups are identified by the coordinates `(k, m)`, and devices like
+`(1, 0, 2, 3)` and `(1, 1, 2, 3)` are in the same group. But `(1, 0, 2, 4)`
+is in a different group.
+
+For some collectives (like `all-to-all`), the order of devices in the group
+matters. The device order is based on the order of axes in `grid_axes`, from
+outermost to innermost.
+
+Example: If `grid_axes = [3, 1]`, then device `(i, 1, k, 0)` comes before
+`(i, 0, k, 1)` and `(i, 2, k, 0)`.
+
+### In-group Devices
+
+Some operations (like `broadcast`, `scatter`, and `send`) refer to a specific
+device within each group. These in-group devices are identified using their
+coordinates over the axes listed in `grid_axes`.
+
+Example: In a 3D grid with `grid_axes = [0, 2]`, an in-group device is specified
+as `(i, j)`. If a group is fixed at coordinate `g` on axis 1, then the full
+device index would be `(i, g, j)`.
+
+### Purity and Execution Model
+
+Collective operations involve all devices in a group (e.g. `all-gather`,
+`all-to-all`) and are considered pure. Operations like `send` and `recv` are not
+collective and are not pure.
+
+The execution model assumes SPMD (Single Program, Multiple Data):
+
+* Every process runs the same program.
+* At any collective operation, all processes are in sync.
+
+This means compiler optimizations must treat collective ops carefully. For
+example, if a collective is removed during optimization, it must be removed from
+*every* path and *every* process that would have participated - otherwise, you’ll
+get undefined behavior at runtime.
+
+Marking these ops as pure also helps with standard compiler passes like dead
+code elimination and common subexpression elimination. It ensures that when the
+program is executed, all devices hit the same line of code at the same time
+during collectives and so avoid deadlocks.
+
+## Operations
+
+[include "Dialects/ShardOps.md"]
+
+## Attributes
+
+[include "Dialects/ShardAttrs.md"]
diff --git a/mlir/docs/Dialects/Transform.md b/mlir/docs/Dialects/Transform.md
index 5f79116..7164cb7 100644
--- a/mlir/docs/Dialects/Transform.md
+++ b/mlir/docs/Dialects/Transform.md
@@ -415,10 +415,22 @@ ops rather than having the methods directly act on the payload IR.
 
 [include "Dialects/TransformOps.md"]
 
+## Tuning Extension Operations
+
+[include "Dialects/TuneExtensionOps.md"]
+
 ## Affine Transform Operations
 
 [include "Dialects/AffineLoopTransformOps.md"]
 
+## ARM Neon Transform Operations
+
+[include "Dialects/ArmNeonVectorTransformOps.md"]
+
+## ARM SVE Transform Operations
+
+[include "Dialects/ArmSVEVectorTransformOps.md"]
+
 ## Bufferization Transform Operations
 
 [include "Dialects/BufferizationTransformOps.md"]
diff --git a/mlir/docs/Dialects/Vector.md b/mlir/docs/Dialects/Vector.md
index ebeb0a2..6c8949d 100644
--- a/mlir/docs/Dialects/Vector.md
+++ b/mlir/docs/Dialects/Vector.md
@@ -294,7 +294,7 @@ LLVM instructions are prefixed by the `llvm.` dialect prefix (e.g.
 `llvm.insertvalue`). Such ops operate exclusively on 1-D vectors and aggregates
 following the [LLVM LangRef](https://llvm.org/docs/LangRef.html). MLIR
 operations are prefixed by the `vector.` dialect prefix (e.g.
-`vector.insertelement`). Such ops operate exclusively on MLIR `n-D` `vector`
+`vector.insert`). Such ops operate exclusively on MLIR `n-D` `vector`
 types.
 
 ### Alternatives For Lowering an n-D Vector Type to LLVM
diff --git a/mlir/docs/Dialects/emitc.md b/mlir/docs/Dialects/emitc.md
index e2288f5..6d09e93 100644
--- a/mlir/docs/Dialects/emitc.md
+++ b/mlir/docs/Dialects/emitc.md
@@ -18,6 +18,8 @@ The following convention is followed:
     GCC or Clang.
 *   If `emitc.array` with a dimension of size zero is used, then the code
     requires [a GCC extension](https://gcc.gnu.org/onlinedocs/gcc/Zero-Length.html).
+*   If `aligned_alloc` is passed to an `emitc.call_opaque` operation, then C++17
+    or C11 is required.
 *   Else the generated code is compatible with C99.
 
 These restrictions are neither inherent to the EmitC dialect itself nor to the