Diffstat (limited to 'llvm/docs/MLGO.rst')
-rw-r--r-- llvm/docs/MLGO.rst 144
1 files changed, 140 insertions, 4 deletions
diff --git a/llvm/docs/MLGO.rst b/llvm/docs/MLGO.rst
index bf3de11..2443835 100644
--- a/llvm/docs/MLGO.rst
+++ b/llvm/docs/MLGO.rst
@@ -434,8 +434,27 @@ The latter is also used in tests.
There is no C++ implementation of a log reader. We do not have a scenario
motivating one.
-IR2Vec Embeddings
-=================
+Embeddings
+==========
+
+LLVM provides embedding frameworks to generate vector representations of code
+at different abstraction levels. These embeddings capture syntactic, semantic,
+and structural properties of the code and can be used as features for machine
+learning models in various compiler optimization tasks.
+
+Two embedding frameworks are available:
+
+- **IR2Vec**: Generates embeddings for LLVM IR
+- **MIR2Vec**: Generates embeddings for Machine IR
+
+Both frameworks follow a similar architecture based on vocabulary-driven
+embedding generation, where a vocabulary maps code entities to n-dimensional
+floating-point vectors. Embeddings can be computed at multiple levels of
+granularity (instruction, basic block, and function) and used for ML-guided
+compiler optimizations.
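
To make the vocabulary idea concrete, the following is a minimal, self-contained
sketch and not the LLVM implementation. It assumes that a block embedding is the
element-wise sum of the vocabulary vectors of its entities, and that a function
embedding is the sum of its block embeddings. That aggregation rule, and the
names ``ToyEmbedding``, ``ToyVocabulary``, ``accumulate``, ``embedBlock``, and
``embedFunction``, are hypothetical and introduced only for illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <string>
#include <vector>

// Hypothetical toy types; these are NOT the LLVM IR2Vec/MIR2Vec classes.
using ToyEmbedding = std::vector<double>;
using ToyVocabulary = std::map<std::string, ToyEmbedding>;

// Element-wise accumulation: Acc += V. Both vectors must share a dimension.
void accumulate(ToyEmbedding &Acc, const ToyEmbedding &V) {
  assert(Acc.size() == V.size() && "dimension mismatch");
  for (std::size_t I = 0; I < Acc.size(); ++I)
    Acc[I] += V[I];
}

// Assumption of this sketch: a block embedding is the sum of the vocabulary
// vectors of its entities. Entities missing from the vocabulary are skipped,
// which is equivalent to adding a zero vector.
ToyEmbedding embedBlock(const ToyVocabulary &Vocab,
                        const std::vector<std::string> &Entities,
                        std::size_t Dim) {
  ToyEmbedding Result(Dim, 0.0);
  for (const std::string &Entity : Entities) {
    auto It = Vocab.find(Entity);
    if (It != Vocab.end())
      accumulate(Result, It->second);
  }
  return Result;
}

// Assumption of this sketch: a function embedding is the sum of the
// embeddings of its blocks.
ToyEmbedding embedFunction(const ToyVocabulary &Vocab,
                           const std::vector<std::vector<std::string>> &Blocks,
                           std::size_t Dim) {
  ToyEmbedding Result(Dim, 0.0);
  for (const std::vector<std::string> &Block : Blocks)
    accumulate(Result, embedBlock(Vocab, Block, Dim));
  return Result;
}
```

The sketch keys the vocabulary by string for readability. A real vocabulary
would more plausibly index entities by numeric identifiers.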
+
+IR2Vec
+------
IR2Vec is a program embedding approach designed specifically for LLVM IR. It
is implemented as a function analysis pass in LLVM. The IR2Vec embeddings
@@ -466,7 +485,7 @@ The core components are:
compute embeddings for instructions, basic blocks, and functions.
Using IR2Vec
-------------
+^^^^^^^^^^^^
.. note::
@@ -526,7 +545,7 @@ embeddings can be computed and accessed via an ``ir2vec::Embedder`` instance.
between different code snippets, or perform other analyses as needed.
Further Details
----------------
+^^^^^^^^^^^^^^^
For more detailed information about the IR2Vec algorithm, its parameters, and
advanced usage, please refer to the original paper:
@@ -538,6 +557,123 @@ triplets from LLVM IR, see :doc:`CommandGuide/llvm-ir2vec`.
The LLVM source code for ``IR2Vec`` can also be explored to understand the
implementation details.
+MIR2Vec
+-------
+
+MIR2Vec is an extension of IR2Vec designed specifically for LLVM Machine IR
+(MIR). It generates embeddings for machine-level instructions, basic blocks,
+and functions. Because MIR2Vec operates on the target-specific machine
+representation, its embeddings capture machine instruction semantics,
+including opcodes, operands, and register information.
+
+MIR2Vec extends the vocabulary to include:
+
+- **Machine Opcodes**: Target-specific instruction opcodes derived from the
+ TargetInstrInfo, grouped by instruction semantics.
+
+- **Common Operands**: All common operand types (excluding register operands),
+ defined by the ``MachineOperand::MachineOperandType`` enum.
+
+- **Physical Register Classes**: Register classes defined by the target,
+ specialized for physical registers.
+
+- **Virtual Register Classes**: Register classes defined by the target,
+ specialized for virtual registers.
+
+The core components are:
+
+- **Vocabulary**: A mapping from machine IR entities (opcodes, operands, register
+ classes) to their vector representations. This is managed by
+ ``MIR2VecVocabLegacyAnalysis`` for the legacy pass manager, with a
+ ``MIR2VecVocabProvider`` that can be used standalone or wrapped by pass
+ managers. The vocabulary (.json file) contains sections for opcodes, common
+ operands, physical register classes, and virtual register classes.
+
+ .. note::
+
+ The vocabulary file must contain all of these sections to be valid.
+
+- **Embedder**: A class (``mir2vec::MIREmbedder``) that uses the vocabulary to
+ compute embeddings for machine instructions, machine basic blocks, and
+ machine functions. Currently, ``SymbolicMIREmbedder`` is the available
+ implementation.
+
+Using MIR2Vec
+^^^^^^^^^^^^^
+
+.. note::
+
+ This section describes how to use MIR2Vec within LLVM passes. The
+ ``llvm-ir2vec`` tool (see :doc:`CommandGuide/llvm-ir2vec`) can be used to
+ generate MIR2Vec embeddings from Machine IR files (.mir), which is useful
+ when embeddings are needed outside of compiler passes.
+
+To generate MIR2Vec embeddings in a compiler pass, first obtain the vocabulary,
+then create an embedder instance to compute and access embeddings.
+
+1. **Get the Vocabulary**:
+ In a MachineFunctionPass, get the vocabulary from the analysis:
+
+ .. code-block:: c++
+
+ auto &VocabAnalysis = getAnalysis<MIR2VecVocabLegacyAnalysis>();
+ auto VocabOrErr = VocabAnalysis.getMIR2VecVocabulary(*MF.getFunction().getParent());
+ if (!VocabOrErr) {
+ // Handle error: vocabulary is not available or invalid
+ return;
+ }
+ const mir2vec::MIRVocabulary &Vocabulary = *VocabOrErr;
+
+ Note that ``MIR2VecVocabLegacyAnalysis`` is an immutable pass.
+
+2. **Create Embedder instance**:
+ With the vocabulary, create an embedder for a specific machine function:
+
+ .. code-block:: c++
+
+ // Assuming MF is a MachineFunction&
+ // For example, using MIR2VecKind::Symbolic:
+ std::unique_ptr<mir2vec::MIREmbedder> Emb =
+ mir2vec::MIREmbedder::create(MIR2VecKind::Symbolic, MF, Vocabulary);
+
+
+3. **Compute and Access Embeddings**:
+ Call ``getMFunctionVector()`` to get the embedding for the machine function.
+
+ .. code-block:: c++
+
+ mir2vec::Embedding FuncVector = Emb->getMFunctionVector();
+
+ Currently, ``MIREmbedder`` can generate embeddings at three levels: Machine
+ Instructions, Machine Basic Blocks, and Machine Functions. Appropriate
+ getters are provided to access the embeddings at these levels.
+
+ .. note::
+
+ The validity of the ``MIREmbedder`` instance (and the embeddings it
+ generates) is tied to the machine function it is associated with. If the
+ machine function is modified, the embeddings may become stale and should
+ be recomputed accordingly.
+
+4. **Working with Embeddings**:
+ Embeddings are represented as ``std::vector<double>``. These vectors can be
+ used as features for machine learning models, to compute similarity scores
+ between different code snippets, or for other analyses as needed.
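
As an illustration of the similarity use case mentioned in the last step, the
following is a minimal sketch that computes the cosine similarity of two
embeddings. It assumes the embeddings have been copied into plain
``std::vector<double>`` values of equal dimension. The helper name
``cosineSimilarity`` is hypothetical and is not part of the MIR2Vec API, and
returning ``0.0`` for a zero vector is a convention chosen for this sketch.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Hypothetical helper, not part of LLVM: cosine similarity of two embeddings
// stored as plain vectors of equal dimension.
double cosineSimilarity(const std::vector<double> &A,
                        const std::vector<double> &B) {
  assert(A.size() == B.size() && "embeddings must have the same dimension");
  double Dot = 0.0, NormA = 0.0, NormB = 0.0;
  for (std::size_t I = 0; I < A.size(); ++I) {
    Dot += A[I] * B[I];
    NormA += A[I] * A[I];
    NormB += B[I] * B[I];
  }
  // Convention of this sketch: similarity with a zero vector is 0.
  if (NormA == 0.0 || NormB == 0.0)
    return 0.0;
  return Dot / (std::sqrt(NormA) * std::sqrt(NormB));
}
```

A value near 1 indicates that the two embeddings point in nearly the same
direction, while a value near 0 indicates that they are close to orthogonal.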
+
+Further Details
+^^^^^^^^^^^^^^^
+
+For more detailed information about the MIR2Vec algorithm, its parameters, and
+advanced usage, please refer to the original paper:
+`RL4ReAl: Reinforcement Learning for Register Allocation <https://doi.org/10.1145/3578360.3580273>`_.
+
+For information about using the ``llvm-ir2vec`` tool to generate MIR2Vec
+embeddings from Machine IR, see :doc:`CommandGuide/llvm-ir2vec`.
+
+The LLVM source code for ``MIR2Vec`` can be explored to understand the
+implementation details. See ``llvm/include/llvm/CodeGen/MIR2Vec.h`` and
+``llvm/lib/CodeGen/MIR2Vec.cpp``.
+
Building with ML support
========================