11 files changed, 137 insertions, 72 deletions
diff --git a/llvm/docs/CommandGuide/lit.rst b/llvm/docs/CommandGuide/lit.rst
index 938b7f9..eb90e95 100644
--- a/llvm/docs/CommandGuide/lit.rst
+++ b/llvm/docs/CommandGuide/lit.rst
@@ -356,6 +356,11 @@ The timing data is stored in the `test_exec_root` in a file named
   primary purpose is to suppress an ``XPASS`` result without modifying a test
   case that uses the ``XFAIL`` directive.
 
+.. option:: --exclude-xfail
+
+  ``XFAIL`` tests are not run unless they are listed in the ``--xfail-not``
+  (or ``LIT_XFAIL_NOT``) lists.
+
 .. option:: --num-shards M
 
  Divide the set of selected tests into ``M`` equal-sized subsets or
diff --git a/llvm/docs/CommandGuide/llvm-bcanalyzer.rst b/llvm/docs/CommandGuide/llvm-bcanalyzer.rst
index 8f15e03..1e0b581 100644
--- a/llvm/docs/CommandGuide/llvm-bcanalyzer.rst
+++ b/llvm/docs/CommandGuide/llvm-bcanalyzer.rst
@@ -14,7 +14,7 @@ DESCRIPTION
 The :program:`llvm-bcanalyzer` command is a small utility for analyzing bitcode
 files.  The tool reads a bitcode file (such as generated with the
 :program:`llvm-as` tool) and produces a statistical report on the contents of
-the bitcode file.  The tool can also dump a low level but human readable
+the bitcode file.  The tool can also dump a low level but human-readable
 version of the bitcode file.  This tool is probably not of much interest or
 utility except for those working directly with the bitcode file format.  Most
 LLVM users can just ignore this tool.
@@ -30,7 +30,7 @@ OPTIONS
 
 .. option:: --dump
 
- Causes :program:`llvm-bcanalyzer` to dump the bitcode in a human readable
+ Causes :program:`llvm-bcanalyzer` to dump the bitcode in a human-readable
  format.  This format is significantly different from LLVM assembly and
  provides details about the encoding of the bitcode file.
 
diff --git a/llvm/docs/CommandGuide/llvm-debuginfo-analyzer.rst b/llvm/docs/CommandGuide/llvm-debuginfo-analyzer.rst
index 1264f80..6a4e348 100644
--- a/llvm/docs/CommandGuide/llvm-debuginfo-analyzer.rst
+++ b/llvm/docs/CommandGuide/llvm-debuginfo-analyzer.rst
@@ -14,7 +14,7 @@ DESCRIPTION
 -----------
 :program:`llvm-debuginfo-analyzer` parses debug and text sections in
 binary object files and prints their contents in a logical view, which
-is a human readable representation that closely matches the structure
+is a human-readable representation that closely matches the structure
 of the original user source code. Supported object file formats include
 ELF, Mach-O, WebAssembly, PDB and COFF.
 
diff --git a/llvm/docs/CommandGuide/llvm-exegesis.rst b/llvm/docs/CommandGuide/llvm-exegesis.rst
index 25e8969..5996026 100644
--- a/llvm/docs/CommandGuide/llvm-exegesis.rst
+++ b/llvm/docs/CommandGuide/llvm-exegesis.rst
@@ -106,7 +106,7 @@ properly.
   using the loop repetition mode. :program:`llvm-exegesis` needs to keep track
   of the current loop iteration within the loop repetition mode in a performant
   manner (i.e., no memory accesses), and uses a register to do this. This register
-  has an architecture specific default (e.g., `R8` on X86), but this might conflict
+  has an architecture-specific default (e.g., `R8` on X86), but this might conflict
   with some snippets. This annotation allows changing the register to prevent
   interference between the loop index register and the snippet.
 
diff --git a/llvm/docs/CommandGuide/llvm-ifs.rst b/llvm/docs/CommandGuide/llvm-ifs.rst
index 1fe81c2..e3582b3 100644
--- a/llvm/docs/CommandGuide/llvm-ifs.rst
+++ b/llvm/docs/CommandGuide/llvm-ifs.rst
@@ -11,7 +11,7 @@ SYNOPSIS
 DESCRIPTION
 -----------
 
-:program:`llvm-ifs` is a tool that jointly produces human readable text-based
+:program:`llvm-ifs` is a tool that jointly produces human-readable text-based
 stubs (.ifs files) for shared objects and linkable shared object stubs
 (.so files) from either ELF shared objects or text-based stubs. The text-based
 stubs is useful for monitoring ABI changes of the shared object. The linkable
diff --git a/llvm/docs/CommandGuide/llvm-ir2vec.rst b/llvm/docs/CommandGuide/llvm-ir2vec.rst
index 13fe4996..0c9fb6e 100644
--- a/llvm/docs/CommandGuide/llvm-ir2vec.rst
+++ b/llvm/docs/CommandGuide/llvm-ir2vec.rst
@@ -6,24 +6,28 @@ llvm-ir2vec - IR2Vec Embedding Generation Tool
 SYNOPSIS
 --------
 
-:program:`llvm-ir2vec` [*options*] *input-file*
+:program:`llvm-ir2vec` [*subcommand*] [*options*]
 
 DESCRIPTION
 -----------
 
 :program:`llvm-ir2vec` is a standalone command-line tool for IR2Vec. It
 generates IR2Vec embeddings for LLVM IR and supports triplet generation 
-for vocabulary training. It provides two main operation modes:
+for vocabulary training. The tool provides three main subcommands:
 
-1. **Triplet Mode**: Generates triplets (opcode, type, operands) for vocabulary
+1. **triplets**: Generates numeric triplets in train2id format for vocabulary
    training from LLVM IR.
 
-2. **Embedding Mode**: Generates IR2Vec embeddings using a trained vocabulary
+2. **entities**: Generates entity mapping files (entity2id.txt) for vocabulary 
+   training.
+
+3. **embeddings**: Generates IR2Vec embeddings using a trained vocabulary
    at different granularity levels (instruction, basic block, or function).
 
 The tool is designed to facilitate machine learning applications that work with
 LLVM IR by converting the IR into numerical representations that can be used by
-ML models.
+ML models. The `triplets` subcommand generates numeric IDs directly instead of string 
+triplets, streamlining the training data preparation workflow.
 
 .. note::
 
@@ -34,94 +38,130 @@ ML models.
 OPERATION MODES
 ---------------
 
-Triplet Generation Mode
-~~~~~~~~~~~~~~~~~~~~~~~
+Triplet Generation and Entity Mapping Modes are used for preparing
+vocabulary and training data for knowledge graph embeddings. The Embedding Mode
+is used for generating embeddings from LLVM IR using a pre-trained vocabulary.
+
+The Seed Embedding Vocabulary of IR2Vec is trained on a large corpus of LLVM IR
+by modeling the relationships between opcodes, types, and operands as a knowledge
+graph. For this purpose, Triplet Generation and Entity Mapping Modes generate
+triplets and entity mappings in the standard format used for knowledge graph
+embedding training (see 
+<https://github.com/thunlp/OpenKE/tree/OpenKE-PyTorch?tab=readme-ov-file#data-format> 
+for details).
+
+See `llvm/utils/mlgo-utils/IR2Vec/generateTriplets.py` for more details on how
+these two modes are used to generate the triplets and entity mappings.
+
+Triplet Generation
+~~~~~~~~~~~~~~~~~~
 
-In triplet mode, :program:`llvm-ir2vec` analyzes LLVM IR and extracts triplets
-consisting of opcodes, types, and operands. These triplets can be used to train
-vocabularies for embedding generation.
+With the `triplets` subcommand, :program:`llvm-ir2vec` analyzes LLVM IR and extracts
+numeric triplets consisting of opcode IDs, type IDs, and operand IDs. These triplets
+are generated in the standard format used for knowledge graph embedding training.
+The tool outputs numeric IDs directly using the ir2vec::Vocabulary mapping
+infrastructure, eliminating the need for string-to-ID preprocessing.
 
 Usage:
 
 .. code-block:: bash
 
-   llvm-ir2vec --mode=triplets input.bc -o triplets.txt
+   llvm-ir2vec triplets input.bc -o triplets_train2id.txt
 
-Embedding Generation Mode
-~~~~~~~~~~~~~~~~~~~~~~~~~~
+Entity Mapping Generation
+~~~~~~~~~~~~~~~~~~~~~~~~~
 
-In embedding mode, :program:`llvm-ir2vec` uses a pre-trained vocabulary to
+With the `entities` subcommand, :program:`llvm-ir2vec` generates the entity mappings
+supported by IR2Vec in the standard format used for knowledge graph embedding
+training. This subcommand outputs all supported entities (opcodes, types, and
+operands) with their corresponding numeric IDs and is not specific to any
+LLVM IR file.
+
+Usage:
+
+.. code-block:: bash
+
+   llvm-ir2vec entities -o entity2id.txt
+
+Embedding Generation
+~~~~~~~~~~~~~~~~~~~~
+
+With the `embeddings` subcommand, :program:`llvm-ir2vec` uses a pre-trained vocabulary to
 generate numerical embeddings for LLVM IR at different levels of granularity.
 
 Example Usage:
 
 .. code-block:: bash
 
-   llvm-ir2vec --mode=embeddings --ir2vec-vocab-path=vocab.json --level=func input.bc -o embeddings.txt
+   llvm-ir2vec embeddings --ir2vec-vocab-path=vocab.json --level=func input.bc -o embeddings.txt
 
 OPTIONS
 -------
 
-.. option:: --mode=<mode>
+Global options:
+
+.. option:: -o <filename>
+
+   Specify the output filename. Use ``-`` to write to standard output (default).
+
+.. option:: --help
+
+   Print a summary of command line options.
+
+Subcommand-specific options:
+
+**embeddings** subcommand:
 
- Specify the operation mode. Valid values are:
+.. option:: <input-file>
 
- * ``triplets`` - Generate triplets for vocabulary training
- * ``embeddings`` - Generate embeddings using trained vocabulary (default)
+   The input LLVM IR or bitcode file to process. This positional argument is
+   required for the `embeddings` subcommand.
 
 .. option:: --level=<level>
 
- Specify the embedding generation level. Valid values are:
+   Specify the embedding generation level. Valid values are:
 
- * ``inst`` - Generate instruction-level embeddings
- * ``bb`` - Generate basic block-level embeddings  
- * ``func`` - Generate function-level embeddings (default)
+   * ``inst`` - Generate instruction-level embeddings
+   * ``bb`` - Generate basic block-level embeddings  
+   * ``func`` - Generate function-level embeddings (default)
 
 .. option:: --function=<name>
 
- Process only the specified function instead of all functions in the module.
+   Process only the specified function instead of all functions in the module.
 
 .. option:: --ir2vec-vocab-path=<path>
 
- Specify the path to the vocabulary file (required for embedding mode).
- The vocabulary file should be in JSON format and contain the trained
- vocabulary for embedding generation. See `llvm/lib/Analysis/models`
- for pre-trained vocabulary files.
+   Specify the path to the vocabulary file (required for embedding generation).
+   The vocabulary file should be in JSON format and contain the trained
+   vocabulary for embedding generation. See `llvm/lib/Analysis/models`
+   for pre-trained vocabulary files.
 
 .. option:: --ir2vec-opc-weight=<weight>
 
- Specify the weight for opcode embeddings (default: 1.0). This controls
- the relative importance of instruction opcodes in the final embedding.
+   Specify the weight for opcode embeddings (default: 1.0). This controls
+   the relative importance of instruction opcodes in the final embedding.
 
 .. option:: --ir2vec-type-weight=<weight>
 
- Specify the weight for type embeddings (default: 0.5). This controls
- the relative importance of type information in the final embedding.
+   Specify the weight for type embeddings (default: 0.5). This controls
+   the relative importance of type information in the final embedding.
 
 .. option:: --ir2vec-arg-weight=<weight>
 
- Specify the weight for argument embeddings (default: 0.2). This controls
- the relative importance of operand information in the final embedding.
+   Specify the weight for argument embeddings (default: 0.2). This controls
+   the relative importance of operand information in the final embedding.
 
-.. option:: -o <filename>
-
- Specify the output filename. Use ``-`` to write to standard output (default).
 
-.. option:: --help
+**triplets** subcommand:
 
- Print a summary of command line options.
+.. option:: <input-file>
 
-.. note::
+   The input LLVM IR or bitcode file to process. This positional argument is
+   required for the `triplets` subcommand.
 
-   ``--level``, ``--function``, ``--ir2vec-vocab-path``, ``--ir2vec-opc-weight``, 
-   ``--ir2vec-type-weight``, and ``--ir2vec-arg-weight`` are only used in embedding 
-   mode. These options are ignored in triplet mode.
+**entities** subcommand:
 
-INPUT FILE FORMAT
------------------
-
-:program:`llvm-ir2vec` accepts LLVM bitcode files (``.bc``) and LLVM IR files 
-(``.ll``) as input. The input file should contain valid LLVM IR.
+   No subcommand-specific options.
 
 OUTPUT FORMAT
 -------------
@@ -129,14 +169,34 @@ OUTPUT FORMAT
 Triplet Mode Output
 ~~~~~~~~~~~~~~~~~~~
 
-In triplet mode, the output consists of lines containing space-separated triplets:
+In triplet mode, the output consists of numeric triplets in train2id format,
+preceded by a metadata header. The format is:
+
+.. code-block:: text
+
+   MAX_RELATIONS=<max_relations_count>
+   <head_entity_id> <tail_entity_id> <relation_id>
+   <head_entity_id> <tail_entity_id> <relation_id>
+   ...
+
+Each line after the metadata header represents one instruction relationship,
+with numeric IDs for head entity, tail entity, and relation. The metadata
+header (MAX_RELATIONS) provides counts for post-processing and training setup.
+
+Entity Mode Output
+~~~~~~~~~~~~~~~~~~
+
+In entity mode, the output consists of entity mappings in the following format:
 
 .. code-block:: text
 
-   <opcode> <type> <operand1> <operand2> ...
+   <total_entities>
+   <entity_string>	<numeric_id>
+   <entity_string>	<numeric_id>
+   ...
 
-Each line represents the information of one instruction, with the opcode, type,
-and operands.
+The first line contains the total number of entities, followed by one entity
+mapping per line with tab-separated entity string and numeric ID.
 
 Embedding Mode Output
 ~~~~~~~~~~~~~~~~~~~~~
diff --git a/llvm/docs/CommandGuide/llvm-locstats.rst b/llvm/docs/CommandGuide/llvm-locstats.rst
index 3186566..7f436c1 100644
--- a/llvm/docs/CommandGuide/llvm-locstats.rst
+++ b/llvm/docs/CommandGuide/llvm-locstats.rst
@@ -13,7 +13,7 @@ DESCRIPTION
 
 :program:`llvm-locstats` works like a wrapper around :program:`llvm-dwarfdump`.
 It parses :program:`llvm-dwarfdump` statistics regarding debug location by
-pretty printing it in a more human readable way.
+pretty printing it in a more human-readable way.
 
 The line 0% shows the number and the percentage of DIEs with no location
 information, but the line 100% shows the information for DIEs where there is
diff --git a/llvm/docs/CommandGuide/llvm-mca.rst b/llvm/docs/CommandGuide/llvm-mca.rst
index bea1931..1daae5d 100644
--- a/llvm/docs/CommandGuide/llvm-mca.rst
+++ b/llvm/docs/CommandGuide/llvm-mca.rst
@@ -241,7 +241,7 @@ option specifies "``-``", then the output will also be sent to standard output.
 .. option:: -disable-cb
 
   Force usage of the generic CustomBehaviour and InstrPostProcess classes rather
-  than using the target specific implementation. The generic classes never
+  than using the target-specific implementation. The generic classes never
   detect any custom hazards or make any post processing modifications to
   instructions.
 
@@ -1125,9 +1125,9 @@ CustomBehaviour class can be used in these cases to enforce proper
 instruction modeling (often by customizing data dependencies and detecting
 hazards that :program:`llvm-mca` has no way of knowing about).
 
-:program:`llvm-mca` comes with one generic and multiple target specific
+:program:`llvm-mca` comes with one generic and multiple target-specific
 CustomBehaviour classes. The generic class will be used if the ``-disable-cb``
-flag is used or if a target specific CustomBehaviour class doesn't exist for
+flag is used or if a target-specific CustomBehaviour class doesn't exist for
 that target. (The generic class does nothing.) Currently, the CustomBehaviour
 class is only a part of the in-order pipeline, but there are plans to add it
 to the out-of-order pipeline in the future.
@@ -1141,7 +1141,7 @@ if you don't know the exact number and a value of 0 represents no stall).
 
 If you'd like to add a CustomBehaviour class for a target that doesn't
 already have one, refer to an existing implementation to see how to set it
-up. The classes are implemented within the target specific backend (for
+up. The classes are implemented within the target-specific backend (for
 example `/llvm/lib/Target/AMDGPU/MCA/`) so that they can access backend symbols.
 
 Instrument Manager
@@ -1177,12 +1177,12 @@ classes (MCSubtargetInfo, MCInstrInfo, etc.), please add it to the
 AND requires unexposed backend symbols or functionality, you can define it in
 the `/lib/Target/<TargetName>/MCA/` directory.
 
-To enable this target specific View, you will have to use this target's
+To enable this target-specific View, you will have to use this target's
 CustomBehaviour class to override the `CustomBehaviour::getViews()` methods.
 There are 3 variations of these methods based on where you want your View to
 appear in the output: `getStartViews()`, `getPostInstrInfoViews()`, and
 `getEndViews()`. These methods returns a vector of Views so you will want to
-return a vector containing all of the target specific Views for the target in
+return a vector containing all of the target-specific Views for the target in
 question.
 
 Because these target specific (and backend dependent) Views require the
diff --git a/llvm/docs/CommandGuide/llvm-profdata.rst b/llvm/docs/CommandGuide/llvm-profdata.rst
index b2c0457..0b1cd02 100644
--- a/llvm/docs/CommandGuide/llvm-profdata.rst
+++ b/llvm/docs/CommandGuide/llvm-profdata.rst
@@ -338,7 +338,7 @@ OPTIONS
 
  Instruct the profile dumper to show profile counts in the text format of the
  instrumentation-based profile data representation. By default, the profile
- information is dumped in a more human readable form (also in text) with
+ information is dumped in a more human-readable form (also in text) with
  annotations.
 
 .. option:: --topn=<n>
diff --git a/llvm/docs/CommandGuide/llvm-symbolizer.rst b/llvm/docs/CommandGuide/llvm-symbolizer.rst
index 2da1b24..fb86a69 100644
--- a/llvm/docs/CommandGuide/llvm-symbolizer.rst
+++ b/llvm/docs/CommandGuide/llvm-symbolizer.rst
@@ -371,7 +371,7 @@ OPTIONS
   * Prints an address's debug-data discriminator when it is non-zero. One way to
     produce discriminators is to compile with clang's -fdebug-info-for-profiling.
 
-  ``JSON`` style provides a machine readable output in JSON. If addresses are
+  ``JSON`` style provides a machine-readable output in JSON. If addresses are
     supplied via stdin, the output JSON will be a series of individual objects.
     Otherwise, all results will be contained in a single array.
 
@@ -444,7 +444,7 @@ OPTIONS
 
 .. option:: --pretty-print, -p
 
-  Print human readable output. If :option:`--inlining` is specified, the
+  Print human-readable output. If :option:`--inlining` is specified, the
   enclosing scope is prefixed by (inlined by).
   For JSON output, the option will cause JSON to be indented and split over
   new lines. Otherwise, the JSON output will be printed in a compact form.
diff --git a/llvm/docs/CommandGuide/opt.rst b/llvm/docs/CommandGuide/opt.rst
index f067f62..da93b8e 100644
--- a/llvm/docs/CommandGuide/opt.rst
+++ b/llvm/docs/CommandGuide/opt.rst
@@ -46,12 +46,12 @@ OPTIONS
 
  Write output in LLVM intermediate language (instead of bitcode).
 
-.. option:: -{passname}
+.. option:: -passes=<string>
 
- :program:`opt` provides the ability to run any of LLVM's optimization or
- analysis passes in any order.  The :option:`-help` option lists all the passes
- available.  The order in which the options occur on the command line are the
- order in which they are executed (within pass constraints).
+ A textual (comma-separated) description of the pass pipeline,
+ e.g., ``-passes="sroa,instcombine"``. See
+ `invoking opt <../NewPassManager.html#invoking-opt>`_ for more details on the
+ pass pipeline syntax.
 
 .. option:: -strip-debug