5 files changed, 147 insertions, 8 deletions
diff --git a/mlir/docs/Bindings/Python.md b/mlir/docs/Bindings/Python.md
index 98ac635..6f778b0 100644
--- a/mlir/docs/Bindings/Python.md
+++ b/mlir/docs/Bindings/Python.md
@@ -1188,7 +1188,6 @@ which can be `import`ed  from the main dialect file, i.e.
 `python/mlir/dialects/<dialect-namespace>/passes.py` if it is undesirable to
 make the passes available along with the dialect.
 
-
 ### Other functionality
 
 Dialect functionality other than IR objects or passes, such as helper functions,
@@ -1201,6 +1200,82 @@ utilities to connect to the rest of Python API. The bindings can be located in a
 separate module or in the same module as attributes and types, and
 loaded along with the dialect.
 
+
+## Extending MLIR in Python
+
+The MLIR Python bindings provide support for defining custom components in Python,
+mainly including dialects, passes, and rewrite patterns.
+The following sections outline how each of these can be implemented.
+
+### Dialects
+
+Dialects can be defined through the IRDL dialect bindings in Python.
+The IRDL bindings offer a `load_dialects` function that
+converts an MLIR module containing `irdl.dialect` ops into MLIR dialects.
+For further details, see the documentation of [the IRDL dialect](../Dialects/IRDL.md).
+
+### Passes
+
+Passes can be defined as Python callables via the `PassManager.add` API.
+In such case, the callable is wrapped as an `mlir::Pass` internally and
+executed as part of the pass pipeline when `PassManager.run` is invoked.
+In the callable, the `op` parameter represents the current operation being transformed,
+while the `pass_` parameter provides access to the current `Pass` object,
+allowing actions such as `signalPassFailure()`.
+The lifetime of the callable is extended at least until the `PassManager` is destroyed.
+The following example code demonstrates how to define Python passes.
+
+```python
+def demo_pass(op, pass_):
+    # do something with the given op
+    pass
+
+pm = PassManager('any')
+pm.add(demo_pass)
+pm.add('some-cpp-defined-passes')
+...
+pm.run(some_op)
+```
+
+### Rewrite Patterns
+
+Rewrite patterns can be registered via the `add` method
+of `mlir.rewrite.RewritePatternSet` in Python.
+This method takes the operation type to be rewritten
+and a Python callable that defines the *match and rewrite* logic.
+Note that the Python callable should be defined so that
+the rewrite is applied if and only if the match succeeds,
+which corresponds to the return value being castable to `False`.
+
+The `RewritePatternSet` can be converted into
+a `FrozenRewritePatternSet` using the `freeze` method,
+which can be applied to an operation through
+the greedy pattern driver using `apply_patterns_and_fold_greedily`.
+The following example demonstrates the typical usage:
+
+```python
+def to_muli(op, rewriter):
+    with rewriter.ip:
+        new_op = arith.muli(op.lhs, op.rhs, loc=op.location)
+    rewriter.replace_op(op, new_op)
+
+patterns = RewritePatternSet()
+patterns.add(arith.AddIOp, to_muli)  # Rewrite arith.addi into arith.muli
+patterns.add(...)
+frozen = patterns.freeze()
+
+module = ...
+apply_patterns_and_fold_greedily(module, frozen)
+```
+
+The PDL dialect bindings also enable defining and generating rewrite patterns in Python.
+The `mlir.rewrite.PDLModule` class accepts a module containing `pdl.pattern` ops,
+which can be transformed into a `FrozenRewritePatternSet` using the `freeze` method.
+This frozen set can then be applied to an operation
+using the greedy rewrite pattern driver via `apply_patterns_and_fold_greedily`.
+For further information, see [the PDL dialect documentation](/docs/Dialects/PDLOps/).
+
+
 ## Free-threading (No-GIL) support
 
 Free-threading or no-GIL support refers to CPython interpreter (>=3.13) with Global Interpreter Lock made optional. For details on the topic, please check [PEP-703](https://peps.python.org/pep-0703/) and this [Python free-threading guide](https://py-free-threading.github.io/).
diff --git a/mlir/docs/Canonicalization.md b/mlir/docs/Canonicalization.md
index 686e500..2622c08 100644
--- a/mlir/docs/Canonicalization.md
+++ b/mlir/docs/Canonicalization.md
@@ -55,7 +55,7 @@ Some important things to think about w.r.t. canonicalization patterns:
 *   It is always good to eliminate operations entirely when possible, e.g. by
     folding known identities (like "x + 0 = x").
 
-*   Pattens with expensive running time (i.e. have O(n) complexity) or
+*   Patterns with expensive running time (i.e. have O(n) complexity) or
     complicated cost models don't belong to canonicalization: since the
     algorithm is executed iteratively until fixed-point we want patterns that
     execute quickly (in particular their matching phase).
diff --git a/mlir/docs/Dialects/Shard.md b/mlir/docs/Dialects/Shard.md
index eb6ff61..573e888 100644
--- a/mlir/docs/Dialects/Shard.md
+++ b/mlir/docs/Dialects/Shard.md
@@ -27,9 +27,9 @@ the tensor is sharded - not specified manually.
 
 ### Device Groups
 
-Each collective operation runs within a group of devices. You define groups
-using the `grid` and `grid_axes` attributes, which describe how to slice the
-full device grid into smaller groups.
+Collective operations run within groups of devices, which are defined
+using the `grid` and `grid_axes` attributes. These describe
+how the full device grid is sliced into smaller groups.
 
 Devices that have the same coordinates *outside* the listed `grid_axes` belong
 to the same group.
diff --git a/mlir/docs/Dialects/Vector.md b/mlir/docs/Dialects/Vector.md
index 6c8949d..839dc75 100644
--- a/mlir/docs/Dialects/Vector.md
+++ b/mlir/docs/Dialects/Vector.md
@@ -125,7 +125,7 @@ Some existing Arith and Vector Dialect on `n-D` `vector` types comprise:
 // Produces a vector<3x7x8xf32>
 %b = arith.mulf %0, %1 : vector<3x7x8xf32>
 // Produces a vector<3x7x8xf32>
-%c = vector.splat %1 : vector<3x7x8xf32>
+%c = vector.broadcast %1 : f32 to vector<3x7x8xf32>
 
 %d = vector.extract %0[1]: vector<7x8xf32> from vector<3x7x8xf32>
 %e = vector.extract %0[1, 5]: vector<8xf32> from vector<3x7x8xf32>
@@ -176,8 +176,6 @@ infrastructure can apply iteratively.
 ### Virtual Vector to Hardware Vector Lowering
 
 For now, `VV -> HWV` are specified in C++ (see for instance the
-[SplatOpLowering for n-D vectors](https://github.com/tensorflow/mlir/commit/0a0c4867c6a6fcb0a2f17ef26a791c1d551fe33d)
-or the
 [VectorOuterProductOp lowering](https://github.com/tensorflow/mlir/commit/957b1ca9680b4aacabb3a480fbc4ebd2506334b8)).
 
 Simple
diff --git a/mlir/docs/Rationale/RationaleLinalgDialect.md b/mlir/docs/Rationale/RationaleLinalgDialect.md
index 8975b0a..fbe2217 100644
--- a/mlir/docs/Rationale/RationaleLinalgDialect.md
+++ b/mlir/docs/Rationale/RationaleLinalgDialect.md
@@ -506,6 +506,72 @@ potential by introducing lower-level IR ops and *smaller* Linalg ops.
 This gradually reduces the potential, all the way to Loops + VectorOps
 and LLVMIR.
 
+### Interchangeability of Forms<a name="forms"></a>
+
+#### The Linalg Forms
+
+The core Linalg operation set has four forms:
+* **Generic:** Represented by `linalg.generic` and can encode all perfectly-nested
+loop operations.
+* **Category:** For example, `linalg.contract` and `linalg.elementwise`, that encode
+higher-level semantics of a `linalg.generic` while still representing multiple _named_
+operations via attributes and syntax. In the future, other category operations are
+planned (e.g.: `linalg.convolution` and `linalg.pooling`).
+* **Named:** For example, `linalg.matmul`, `linalg.add`, etc. All _named_ forms that
+can be converted to either a single _category_ or _generic_ forms, ie. are _perfectly nested_.
+* **Composite:** For example `linalg.softmax` and the `winograd` variations. These
+operations are not perfectly nested, and are converted to a list of other operations
+(of various dialects).
+
+The forms correlate in the following manner:
+```
++ generic
+ \__ + category
+      \__ + named
++ composite
+```
+
+The `category` and `named` forms are derived from `linalg.generic` and are *equivalent*.
+It should always be possible to convert a `named` operation into a `category` and that
+into a `generic` and back to `named`. However, it may not be possible to convert a
+`generic` into a `named` if there is no such `named` form.
+
+`Composite` operations cannot be converted to the other three classes and forms a
+sub-set on its own. But they can use other Linalg forms when expanding. There can be
+a pattern-matching transform to detect a graph of operations and convert into a
+`composite` operation.
+
+The various forms in the Linalg dialect are meant to facilitate
+pattern matching (single operations or DAGs) and to be able to consider
+different forms as *canonical* for different transforms.
+
+Linalg's various forms also carry information, and that
+information should be preserved as much as possible during the progressive
+lowering. A `matmul` operation is a special case of a `contract` operation,
+which in turn is a special case of a `generic` operation. Transformations on
+Linalg operations (in any form) should avoid breaking down into
+loops + arithmetic if they can still be represented as a Linalg operation,
+preferably in their original form.
+
+#### Canonical Forms<a name="canonical_forms"></a>
+
+With multiple (often exchangeable) forms, and with transformation simplicity
+in mind, compilers should aim for reducing matching and replacing complexity
+as much as possible. When matching a single operation with a complex pattern,
+having all the information in a `generic` Op is useful to iteratively match
+different patterns in turn. However, when assembling a DAG of operations to
+form a pattern, it's much simpler to match against named operations (like
+`max` + `div` + `reduce` + `broadcast`) than their generic counterparts.
+
+This is where the interchangeability of forms comes in handy. Linalg has the
+ability to specialize and generalize in order to convert the IR to a form that
+is easier for a particular type of transform. With forms being semantically
+equivalent, one can convert back-and-forth throughout the various transforms
+to match the needs of each transform. For that particular transform, such
+form can be considered _canonical_ and therefore "expected" for the pattern
+to _match_. This reduces complexity of pattern matchers and simplifies compiler
+pipelines.
+
 ### Composable and Declarative Transformations<a name="declarative_transformations"></a>
 Complex and impactful transformations need not be hard to manipulate, write or
 maintain. Mixing XLA-style high-level op semantics knowledge with generic