1 files changed, 145 insertions, 38 deletions
diff --git a/llvm/docs/CommandGuide/llvm-ir2vec.rst b/llvm/docs/CommandGuide/llvm-ir2vec.rst
index fc590a6..f51da06 100644
--- a/llvm/docs/CommandGuide/llvm-ir2vec.rst
+++ b/llvm/docs/CommandGuide/llvm-ir2vec.rst
@@ -1,5 +1,5 @@
-llvm-ir2vec - IR2Vec Embedding Generation Tool
-==============================================
+llvm-ir2vec - IR2Vec and MIR2Vec Embedding Generation Tool
+===========================================================
 
 .. program:: llvm-ir2vec
 
@@ -11,9 +11,9 @@ SYNOPSIS
 DESCRIPTION
 -----------
 
-:program:`llvm-ir2vec` is a standalone command-line tool for IR2Vec. It
-generates IR2Vec embeddings for LLVM IR and supports triplet generation 
-for vocabulary training. 
+:program:`llvm-ir2vec` is a standalone command-line tool for IR2Vec and MIR2Vec.
+It generates embeddings for both LLVM IR and Machine IR (MIR) and supports 
+triplet generation for vocabulary training. 
 
 The tool provides three main subcommands:
 
@@ -23,23 +23,33 @@ The tool provides three main subcommands:
 2. **entities**: Generates entity mapping files (entity2id.txt) for vocabulary 
    training.
 
-3. **embeddings**: Generates IR2Vec embeddings using a trained vocabulary
+3. **embeddings**: Generates IR2Vec or MIR2Vec embeddings using a trained vocabulary
    at different granularity levels (instruction, basic block, or function).
 
+The tool supports two operation modes:
+
+* **LLVM IR mode** (``--mode=llvm``): Process LLVM IR bitcode files and generate
+  IR2Vec embeddings
+* **Machine IR mode** (``--mode=mir``): Process Machine IR (.mir) files and generate
+  MIR2Vec embeddings
+
 The tool is designed to facilitate machine learning applications that work with
-LLVM IR by converting the IR into numerical representations that can be used by
-ML models. The `triplets` subcommand generates numeric IDs directly instead of string 
-triplets, streamlining the training data preparation workflow.
+LLVM IR or Machine IR by converting them into numerical representations that can 
+be used by ML models. The `triplets` subcommand generates numeric IDs directly 
+instead of string triplets, streamlining the training data preparation workflow.
 
 .. note::
 
-   For information about using IR2Vec programmatically within LLVM passes and 
-   the C++ API, see the `IR2Vec Embeddings <https://llvm.org/docs/MLGO.html#ir2vec-embeddings>`_ 
+   For information about using IR2Vec and MIR2Vec programmatically within LLVM 
+   passes and the C++ API, see the `IR2Vec Embeddings <https://llvm.org/docs/MLGO.html#ir2vec-embeddings>`_ 
    section in the MLGO documentation.
 
 OPERATION MODES
 ---------------
 
+The tool operates in two modes: **LLVM IR mode** and **Machine IR mode**. The mode
+is selected using the ``--mode`` option (default: ``llvm``).
+
 Triplet Generation and Entity Mapping Modes are used for preparing
 vocabulary and training data for knowledge graph embeddings. The Embedding Mode
 is used for generating embeddings from LLVM IR using a pre-trained vocabulary.
@@ -58,49 +68,82 @@ these two modes are used to generate the triplets and entity mappings.
 Triplet Generation
 ~~~~~~~~~~~~~~~~~~
 
-With the `triplets` subcommand, :program:`llvm-ir2vec` analyzes LLVM IR and extracts
-numeric triplets consisting of opcode IDs, type IDs, and operand IDs. These triplets
+With the `triplets` subcommand, :program:`llvm-ir2vec` analyzes LLVM IR or Machine IR
+and extracts numeric triplets consisting of opcode IDs and operand IDs. These triplets
 are generated in the standard format used for knowledge graph embedding training.
-The tool outputs numeric IDs directly using the ir2vec::Vocabulary mapping
-infrastructure, eliminating the need for string-to-ID preprocessing.
+The tool outputs numeric IDs directly using the vocabulary mapping infrastructure,
+eliminating the need for string-to-ID preprocessing.
+
+Usage for LLVM IR:
+
+.. code-block:: bash
 
-Usage:
+   llvm-ir2vec triplets --mode=llvm input.bc -o triplets_train2id.txt
+
+Usage for Machine IR:
 
 .. code-block:: bash
 
-   llvm-ir2vec triplets input.bc -o triplets_train2id.txt
+   llvm-ir2vec triplets --mode=mir input.mir -o triplets_train2id.txt
 
 Entity Mapping Generation
 ~~~~~~~~~~~~~~~~~~~~~~~~~
 
 With the `entities` subcommand, :program:`llvm-ir2vec` generates the entity mappings
-supported by IR2Vec in the standard format used for knowledge graph embedding
-training. This subcommand outputs all supported entities (opcodes, types, and
-operands) with their corresponding numeric IDs, and is not specific for an
-LLVM IR file.
+supported by IR2Vec or MIR2Vec in the standard format used for knowledge graph embedding
+training. This subcommand outputs all supported entities with their corresponding numeric IDs.
+
+For LLVM IR, entities include opcodes, types, and operands. For Machine IR, entities include
+machine opcodes, common operands, and register classes (both physical and virtual).
+
+Usage for LLVM IR:
+
+.. code-block:: bash
+
+   llvm-ir2vec entities --mode=llvm -o entity2id.txt
 
-Usage:
+Usage for Machine IR:
 
 .. code-block:: bash
 
-   llvm-ir2vec entities -o entity2id.txt
+   llvm-ir2vec entities --mode=mir input.mir -o entity2id.txt
+
+.. note::
+
+   For LLVM IR mode, the entity mapping is target-independent and does not require an input file.
+   For Machine IR mode, an input .mir file is required to determine the target architecture,
+   as entity mappings vary by target (different architectures have different instruction sets
+   and register classes).
 
 Embedding Generation
 ~~~~~~~~~~~~~~~~~~~~
 
 With the `embeddings` subcommand, :program:`llvm-ir2vec` uses a pre-trained vocabulary to
-generate numerical embeddings for LLVM IR at different levels of granularity.
+generate numerical embeddings for LLVM IR or Machine IR at different levels of granularity.
+
+Example Usage for LLVM IR:
+
+.. code-block:: bash
 
-Example Usage:
+   llvm-ir2vec embeddings --mode=llvm --ir2vec-vocab-path=vocab.json --ir2vec-kind=symbolic --level=func input.bc -o embeddings.txt
+
+Example Usage for Machine IR:
 
 .. code-block:: bash
 
-   llvm-ir2vec embeddings --ir2vec-vocab-path=vocab.json --ir2vec-kind=symbolic --level=func input.bc -o embeddings.txt
+   llvm-ir2vec embeddings --mode=mir --mir2vec-vocab-path=vocab.json --level=func input.mir -o embeddings.txt
 
 OPTIONS
 -------
 
-Global options:
+Common options (applicable to both LLVM IR and Machine IR modes):
+
+.. option:: --mode=<mode>
+
+   Specify the operation mode. Valid values are:
+
+   * ``llvm`` - Process LLVM IR bitcode files (default)
+   * ``mir`` - Process Machine IR (.mir) files
 
 .. option:: -o <filename>
 
@@ -116,8 +159,8 @@ Subcommand-specific options:
 
 .. option:: <input-file>
 
-   The input LLVM IR or bitcode file to process. This positional argument is
-   required for the `embeddings` subcommand.
+   The input LLVM IR/bitcode file (.ll/.bc) or Machine IR file (.mir) to process. 
+   This positional argument is required for the `embeddings` subcommand.
 
 .. option:: --level=<level>
 
@@ -131,6 +174,8 @@ Subcommand-specific options:
 
    Process only the specified function instead of all functions in the module.
 
+**IR2Vec-specific options** (for ``--mode=llvm``):
+
 .. option:: --ir2vec-kind=<kind>
 
    Specify the kind of IR2Vec embeddings to generate. Valid values are:
@@ -143,8 +188,8 @@ Subcommand-specific options:
 
 .. option:: --ir2vec-vocab-path=<path>
 
-   Specify the path to the vocabulary file (required for embedding generation).
-   The vocabulary file should be in JSON format and contain the trained
+   Specify the path to the IR2Vec vocabulary file (required for LLVM IR embedding 
+   generation). The vocabulary file should be in JSON format and contain the trained
    vocabulary for embedding generation. See `llvm/lib/Analysis/models`
    for pre-trained vocabulary files.
 
@@ -163,17 +208,51 @@ Subcommand-specific options:
    Specify the weight for argument embeddings (default: 0.2). This controls
    the relative importance of operand information in the final embedding.
 
+**MIR2Vec-specific options** (for ``--mode=mir``):
+
+.. option:: --mir2vec-vocab-path=<path>
+
+   Specify the path to the MIR2Vec vocabulary file (required for Machine IR 
+   embedding generation). The vocabulary file should be in JSON format and 
+   contain the trained vocabulary for embedding generation.
+
+.. option:: --mir2vec-kind=<kind>
+
+   Specify the kind of MIR2Vec embeddings to generate. Valid values are:
+
+   * ``symbolic`` - Generate symbolic embeddings (default)
+
+.. option:: --mir2vec-opc-weight=<weight>
+
+   Specify the weight for machine opcode embeddings (default: 1.0). This controls
+   the relative importance of machine instruction opcodes in the final embedding.
+
+.. option:: --mir2vec-common-operand-weight=<weight>
+
+   Specify the weight for common operand embeddings (default: 1.0). This controls
+   the relative importance of common operand types in the final embedding.
+
+.. option:: --mir2vec-reg-operand-weight=<weight>
+
+   Specify the weight for register operand embeddings (default: 1.0). This controls
+   the relative importance of register operands in the final embedding.
+
 
 **triplets** subcommand:
 
 .. option:: <input-file>
 
-   The input LLVM IR or bitcode file to process. This positional argument is
-   required for the `triplets` subcommand.
+   The input LLVM IR/bitcode file (.ll/.bc) or Machine IR file (.mir) to process. 
+   This positional argument is required for the `triplets` subcommand.
 
 **entities** subcommand:
 
-   No subcommand-specific options.
+.. option:: <input-file>
+
+   The input Machine IR file (.mir) to process. This positional argument is required
+   for the `entities` subcommand when using ``--mode=mir``, as the entity mappings
+   are target-specific. For ``--mode=llvm``, no input file is required as IR2Vec
+   entity mappings are target-independent.
 
 OUTPUT FORMAT
 -------------
@@ -186,19 +265,37 @@ metadata headers. The format includes:
 
 .. code-block:: text
 
-   MAX_RELATIONS=<max_relations_count>
+   MAX_RELATION=<max_relation_count>
    <head_entity_id> <tail_entity_id> <relation_id>
    <head_entity_id> <tail_entity_id> <relation_id>
    ...
 
 Each line after the metadata header represents one instruction relationship,
-with numeric IDs for head entity, relation, and tail entity. The metadata 
-header (MAX_RELATIONS) provides counts for post-processing and training setup.
+with numeric IDs for head entity, tail entity, and relation type. The metadata 
+header (MAX_RELATION) indicates the maximum relation ID used.
+
+**Relation Types:**
+
+For LLVM IR (IR2Vec):
+  * **0** = Type relationship (instruction to its type)
+  * **1** = Next relationship (sequential instructions)
+  * **2+** = Argument relationships (Arg0, Arg1, Arg2, ...)
+
+For Machine IR (MIR2Vec):
+  * **0** = Next relationship (sequential instructions)
+  * **1+** = Argument relationships (Arg0, Arg1, Arg2, ...)
+
+**Entity IDs:**
+
+For LLVM IR: Entity IDs represent opcodes, types, and operands as defined by the IR2Vec vocabulary.
+
+For Machine IR: Entity IDs represent machine opcodes, common operands (immediate, frame index, etc.),
+physical register classes, and virtual register classes as defined by the MIR2Vec vocabulary. The entity layout is target-specific.
 
 Entity Mode Output
 ~~~~~~~~~~~~~~~~~~
 
-In entity mode, the output consists of entity mapping in the format:
+In entity mode, the output consists of entity mappings in the format:
 
 .. code-block:: text
 
@@ -210,6 +307,13 @@ In entity mode, the output consists of entity mapping in the format:
 The first line contains the total number of entities, followed by one entity
 mapping per line with tab-separated entity string and numeric ID.
 
+For LLVM IR, entities include instruction opcodes (e.g., "Add", "Ret"), types 
+(e.g., "INT", "PTR"), and operand kinds.
+
+For Machine IR, entities include machine opcodes (e.g., "COPY", "ADD"), 
+common operands (e.g., "Immediate", "FrameIndex"), physical register classes 
+(e.g., "PhyReg_GR32"), and virtual register classes (e.g., "VirtReg_GR32").
+
 Embedding Mode Output
 ~~~~~~~~~~~~~~~~~~~~~
 
@@ -240,3 +344,6 @@ SEE ALSO
 
 For more information about the IR2Vec algorithm and approach, see:
 `IR2Vec: LLVM IR Based Scalable Program Embeddings <https://doi.org/10.1145/3418463>`_.
+
+For more information about the MIR2Vec algorithm and approach, see:
+`RL4ReAl: Reinforcement Learning for Register Allocation <https://doi.org/10.1145/3578360.3580273>`_.