Diffstat (limited to 'llvm/docs')
-rw-r--r-- llvm/docs/CodingStandards.rst 36
-rw-r--r-- llvm/docs/CommandGuide/llc.rst 7
-rw-r--r-- llvm/docs/CommandGuide/lli.rst 5
-rw-r--r-- llvm/docs/CommandGuide/llvm-ir2vec.rst 95
-rw-r--r-- llvm/docs/Extensions.rst 29
-rw-r--r-- llvm/docs/GlobalISel/GenericOpcode.rst 2
-rw-r--r-- llvm/docs/MLGO.rst 144
-rw-r--r-- llvm/docs/SourceLevelDebugging.rst 2
8 files changed, 265 insertions, 55 deletions
diff --git a/llvm/docs/CodingStandards.rst b/llvm/docs/CodingStandards.rst
index 8677d89..63f6663 100644
--- a/llvm/docs/CodingStandards.rst
+++ b/llvm/docs/CodingStandards.rst
@@ -1692,29 +1692,29 @@ faraway places in the file to tell that the function is local:
Don't Use Braces on Simple Single-Statement Bodies of if/else/loop Statements
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
-When writing the body of an ``if``, ``else``, or for/while loop statement, we
-prefer to omit the braces to avoid unnecessary line noise. However, braces
-should be used in cases where the omission of braces harms the readability and
-maintainability of the code.
+When writing the body of an ``if``, ``else``, or ``for``/``while`` loop
+statement, we aim to reduce unnecessary line noise.
-We consider that readability is harmed when omitting the brace in the presence
-of a single statement that is accompanied by a comment (assuming the comment
-can't be hoisted above the ``if`` or loop statement, see below).
+**Omit braces when:**
-Similarly, braces should be used when a single-statement body is complex enough
-that it becomes difficult to see where the block containing the following
-statement began. An ``if``/``else`` chain or a loop is considered a single
-statement for this rule, and this rule applies recursively.
+* The body consists of a single **simple** statement.
+* The single statement is not preceded by a comment.
+ (Hoist comments above the control statement if you can.)
+* An ``else`` clause, if present, also meets the above criteria (single
+ simple statement, no associated comments).
-This list is not exhaustive. For example, readability is also harmed if an
-``if``/``else`` chain does not use braced bodies for either all or none of its
-members, or has complex conditionals, deep nesting, etc. The examples below
-intend to provide some guidelines.
+**Use braces in all other cases, including:**
-Maintainability is harmed if the body of an ``if`` ends with a (directly or
-indirectly) nested ``if`` statement with no ``else``. Braces on the outer ``if``
-would help to avoid running into a "dangling else" situation.
+* Multi-statement bodies
+* Single-statement bodies with non-hoistable comments
+* Complex single-statement bodies (e.g., deep nesting, complex nested
+ loops)
+* Every block of an ``if``/``else if``/``else`` chain when any one block
+ requires braces (bracing within a chain must be consistent)
+* ``if`` statements ending with a nested ``if`` lacking an ``else`` (to
+ prevent "dangling else")
+The examples below provide guidelines for these cases:
.. code-block:: c++
diff --git a/llvm/docs/CommandGuide/llc.rst b/llvm/docs/CommandGuide/llc.rst
index 900649f..cc670f6 100644
--- a/llvm/docs/CommandGuide/llc.rst
+++ b/llvm/docs/CommandGuide/llc.rst
@@ -125,13 +125,6 @@ End-user Options
Enable setting the FP exceptions build attribute not to use exceptions.
-.. option:: --enable-unsafe-fp-math
-
- Enable optimizations that make unsafe assumptions about IEEE math (e.g. that
- addition is associative) or may not work for all input ranges. These
- optimizations allow the code generator to make use of some instructions which
- would otherwise not be usable (such as ``fsin`` on X86).
-
.. option:: --stats
Print statistics recorded by code-generation passes.
diff --git a/llvm/docs/CommandGuide/lli.rst b/llvm/docs/CommandGuide/lli.rst
index 94c0013..8afe10d 100644
--- a/llvm/docs/CommandGuide/lli.rst
+++ b/llvm/docs/CommandGuide/lli.rst
@@ -107,11 +107,6 @@ FLOATING POINT OPTIONS
Enable optimizations that assume no NAN values.
-.. option:: -enable-unsafe-fp-math
-
- Causes :program:`lli` to enable optimizations that may decrease floating point
- precision.
-
.. option:: -soft-float
Causes :program:`lli` to generate software floating point library calls instead of
diff --git a/llvm/docs/CommandGuide/llvm-ir2vec.rst b/llvm/docs/CommandGuide/llvm-ir2vec.rst
index fc590a6..55fe75d 100644
--- a/llvm/docs/CommandGuide/llvm-ir2vec.rst
+++ b/llvm/docs/CommandGuide/llvm-ir2vec.rst
@@ -1,5 +1,5 @@
-llvm-ir2vec - IR2Vec Embedding Generation Tool
-==============================================
+llvm-ir2vec - IR2Vec and MIR2Vec Embedding Generation Tool
+===========================================================
.. program:: llvm-ir2vec
@@ -11,9 +11,9 @@ SYNOPSIS
DESCRIPTION
-----------
-:program:`llvm-ir2vec` is a standalone command-line tool for IR2Vec. It
-generates IR2Vec embeddings for LLVM IR and supports triplet generation
-for vocabulary training.
+:program:`llvm-ir2vec` is a standalone command-line tool for IR2Vec and MIR2Vec.
+It generates embeddings for both LLVM IR and Machine IR (MIR) and supports
+triplet generation for vocabulary training.
The tool provides three main subcommands:
@@ -23,23 +23,33 @@ The tool provides three main subcommands:
2. **entities**: Generates entity mapping files (entity2id.txt) for vocabulary
training.
-3. **embeddings**: Generates IR2Vec embeddings using a trained vocabulary
+3. **embeddings**: Generates IR2Vec or MIR2Vec embeddings using a trained vocabulary
at different granularity levels (instruction, basic block, or function).
+The tool supports two operation modes:
+
+* **LLVM IR mode** (``--mode=llvm``): Process LLVM IR bitcode files and generate
+ IR2Vec embeddings
+* **Machine IR mode** (``--mode=mir``): Process Machine IR (.mir) files and generate
+ MIR2Vec embeddings
+
The tool is designed to facilitate machine learning applications that work with
-LLVM IR by converting the IR into numerical representations that can be used by
-ML models. The `triplets` subcommand generates numeric IDs directly instead of string
-triplets, streamlining the training data preparation workflow.
+LLVM IR or Machine IR by converting them into numerical representations that can
+be used by ML models. The `triplets` subcommand generates numeric IDs directly
+instead of string triplets, streamlining the training data preparation workflow.
.. note::
- For information about using IR2Vec programmatically within LLVM passes and
- the C++ API, see the `IR2Vec Embeddings <https://llvm.org/docs/MLGO.html#ir2vec-embeddings>`_
+ For information about using IR2Vec and MIR2Vec programmatically within LLVM
+ passes and the C++ API, see the `IR2Vec Embeddings <https://llvm.org/docs/MLGO.html#ir2vec-embeddings>`_
section in the MLGO documentation.
OPERATION MODES
---------------
+The tool operates in two modes: **LLVM IR mode** and **Machine IR mode**. The mode
+is selected using the ``--mode`` option (default: ``llvm``).
+
Triplet Generation and Entity Mapping Modes are used for preparing
vocabulary and training data for knowledge graph embeddings. The Embedding Mode
is used for generating embeddings from LLVM IR using a pre-trained vocabulary.
@@ -89,18 +99,31 @@ Embedding Generation
~~~~~~~~~~~~~~~~~~~~
With the `embeddings` subcommand, :program:`llvm-ir2vec` uses a pre-trained vocabulary to
-generate numerical embeddings for LLVM IR at different levels of granularity.
+generate numerical embeddings for LLVM IR or Machine IR at different levels of granularity.
+
+Example Usage for LLVM IR:
+
+.. code-block:: bash
+
+ llvm-ir2vec embeddings --mode=llvm --ir2vec-vocab-path=vocab.json --ir2vec-kind=symbolic --level=func input.bc -o embeddings.txt
-Example Usage:
+Example Usage for Machine IR:
.. code-block:: bash
- llvm-ir2vec embeddings --ir2vec-vocab-path=vocab.json --ir2vec-kind=symbolic --level=func input.bc -o embeddings.txt
+ llvm-ir2vec embeddings --mode=mir --mir2vec-vocab-path=vocab.json --level=func input.mir -o embeddings.txt
OPTIONS
-------
-Global options:
+Common options (applicable to both LLVM IR and Machine IR modes):
+
+.. option:: --mode=<mode>
+
+ Specify the operation mode. Valid values are:
+
+ * ``llvm`` - Process LLVM IR bitcode files (default)
+ * ``mir`` - Process Machine IR (.mir) files
.. option:: -o <filename>
@@ -116,8 +139,8 @@ Subcommand-specific options:
.. option:: <input-file>
- The input LLVM IR or bitcode file to process. This positional argument is
- required for the `embeddings` subcommand.
+ The input LLVM IR/bitcode file (.ll/.bc) or Machine IR file (.mir) to process.
+ This positional argument is required for the `embeddings` subcommand.
.. option:: --level=<level>
@@ -131,6 +154,8 @@ Subcommand-specific options:
Process only the specified function instead of all functions in the module.
+**IR2Vec-specific options** (for ``--mode=llvm``):
+
.. option:: --ir2vec-kind=<kind>
Specify the kind of IR2Vec embeddings to generate. Valid values are:
@@ -143,8 +168,8 @@ Subcommand-specific options:
.. option:: --ir2vec-vocab-path=<path>
- Specify the path to the vocabulary file (required for embedding generation).
- The vocabulary file should be in JSON format and contain the trained
+ Specify the path to the IR2Vec vocabulary file (required for LLVM IR embedding
+ generation). The vocabulary file should be in JSON format and contain the trained
vocabulary for embedding generation. See `llvm/lib/Analysis/models`
for pre-trained vocabulary files.
@@ -163,6 +188,35 @@ Subcommand-specific options:
Specify the weight for argument embeddings (default: 0.2). This controls
the relative importance of operand information in the final embedding.
+**MIR2Vec-specific options** (for ``--mode=mir``):
+
+.. option:: --mir2vec-vocab-path=<path>
+
+ Specify the path to the MIR2Vec vocabulary file (required for Machine IR
+ embedding generation). The vocabulary file should be in JSON format and
+ contain the trained vocabulary for embedding generation.
+
+.. option:: --mir2vec-kind=<kind>
+
+ Specify the kind of MIR2Vec embeddings to generate. Valid values are:
+
+ * ``symbolic`` - Generate symbolic embeddings (default)
+
+.. option:: --mir2vec-opc-weight=<weight>
+
+ Specify the weight for machine opcode embeddings (default: 1.0). This controls
+ the relative importance of machine instruction opcodes in the final embedding.
+
+.. option:: --mir2vec-common-operand-weight=<weight>
+
+ Specify the weight for common operand embeddings (default: 1.0). This controls
+ the relative importance of common operand types in the final embedding.
+
+.. option:: --mir2vec-reg-operand-weight=<weight>
+
+ Specify the weight for register operand embeddings (default: 1.0). This controls
+ the relative importance of register operands in the final embedding.
+
**triplets** subcommand:
@@ -240,3 +294,6 @@ SEE ALSO
For more information about the IR2Vec algorithm and approach, see:
`IR2Vec: LLVM IR Based Scalable Program Embeddings <https://doi.org/10.1145/3418463>`_.
+
+For more information about the MIR2Vec algorithm and approach, see:
+`RL4ReAl: Reinforcement Learning for Register Allocation <https://doi.org/10.1145/3578360.3580273>`_.
diff --git a/llvm/docs/Extensions.rst b/llvm/docs/Extensions.rst
index 89a0e80..214323e 100644
--- a/llvm/docs/Extensions.rst
+++ b/llvm/docs/Extensions.rst
@@ -601,6 +601,35 @@ sees fit (generally the section that would provide the best locality).
.. _CFI jump table: https://clang.llvm.org/docs/ControlFlowIntegrityDesign.html#forward-edge-cfi-for-indirect-function-calls
+``SHT_LLVM_CALL_GRAPH`` Section (Call Graph)
+^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
+
+This section is used to store the call graph. It has a type of
+``SHT_LLVM_CALL_GRAPH`` (0x6fff4c0f). Details of call graph section layout
+are described in :doc:`CallGraphSection`.
+
+For example:
+
+.. code-block:: gas
+
+ .section ".llvm.callgraph","",@llvm_call_graph
+ .byte 0
+ .byte 7
+ .quad .Lball
+ .quad 0
+ .byte 3
+ .quad foo
+ .quad bar
+ .quad baz
+ .byte 3
+ .quad 4524972987496481828
+ .quad 3498816979441845844
+ .quad 8646233951371320954
+
+This indicates that ``ball`` calls ``foo``, ``bar`` and ``baz`` directly;
+``ball`` indirectly calls functions whose types are ``4524972987496481828``,
+``3498816979441845844`` and ``8646233951371320954``.
+
CodeView-Dependent
------------------
diff --git a/llvm/docs/GlobalISel/GenericOpcode.rst b/llvm/docs/GlobalISel/GenericOpcode.rst
index b055327..661a115 100644
--- a/llvm/docs/GlobalISel/GenericOpcode.rst
+++ b/llvm/docs/GlobalISel/GenericOpcode.rst
@@ -504,7 +504,7 @@ undefined.
G_ABDS, G_ABDU
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
-Compute the absolute difference (signed and unsigned), e.g. abs(x-y).
+Compute the absolute difference (signed and unsigned), e.g. trunc(abs(ext(x)-ext(y))).
.. code-block:: none
diff --git a/llvm/docs/MLGO.rst b/llvm/docs/MLGO.rst
index bf3de11..2443835 100644
--- a/llvm/docs/MLGO.rst
+++ b/llvm/docs/MLGO.rst
@@ -434,8 +434,27 @@ The latter is also used in tests.
There is no C++ implementation of a log reader. We do not have a scenario
motivating one.
-IR2Vec Embeddings
-=================
+Embeddings
+==========
+
+LLVM provides embedding frameworks to generate vector representations of code
+at different abstraction levels. These embeddings capture syntactic, semantic,
+and structural properties of the code and can be used as features for machine
+learning models in various compiler optimization tasks.
+
+Two embedding frameworks are available:
+
+- **IR2Vec**: Generates embeddings for LLVM IR
+- **MIR2Vec**: Generates embeddings for Machine IR
+
+Both frameworks follow a similar architecture with vocabulary-based embedding
+generation, where a vocabulary maps code entities to n-dimensional floating
+point vectors. These embeddings can be computed at multiple granularity levels
+(instruction, basic block, and function) and used for ML-guided compiler
+optimizations.
+
+IR2Vec
+------
IR2Vec is a program embedding approach designed specifically for LLVM IR. It
is implemented as a function analysis pass in LLVM. The IR2Vec embeddings
@@ -466,7 +485,7 @@ The core components are:
compute embeddings for instructions, basic blocks, and functions.
Using IR2Vec
-------------
+^^^^^^^^^^^^
.. note::
@@ -526,7 +545,7 @@ embeddings can be computed and accessed via an ``ir2vec::Embedder`` instance.
between different code snippets, or perform other analyses as needed.
Further Details
----------------
+^^^^^^^^^^^^^^^
For more detailed information about the IR2Vec algorithm, its parameters, and
advanced usage, please refer to the original paper:
@@ -538,6 +557,123 @@ triplets from LLVM IR, see :doc:`CommandGuide/llvm-ir2vec`.
The LLVM source code for ``IR2Vec`` can also be explored to understand the
implementation details.
+MIR2Vec
+-------
+
+MIR2Vec is an extension of IR2Vec designed specifically for LLVM Machine IR
+(MIR). It generates embeddings for machine-level instructions, basic blocks,
+and functions. MIR2Vec operates on the target-specific machine representation,
+capturing machine instruction semantics including opcodes, operands, and
+register information at the machine level.
+
+MIR2Vec extends the vocabulary to include:
+
+- **Machine Opcodes**: Target-specific instruction opcodes derived from the
+ TargetInstrInfo, grouped by instruction semantics.
+
+- **Common Operands**: All common operand types (excluding register operands),
+ defined by the ``MachineOperand::MachineOperandType`` enum.
+
+- **Physical Register Classes**: Register classes defined by the target,
+ specialized for physical registers.
+
+- **Virtual Register Classes**: Register classes defined by the target,
+ specialized for virtual registers.
+
+The core components are:
+
+- **Vocabulary**: A mapping from machine IR entities (opcodes, operands, register
+ classes) to their vector representations. This is managed by
+ ``MIR2VecVocabLegacyAnalysis`` for the legacy pass manager, with a
+ ``MIR2VecVocabProvider`` that can be used standalone or wrapped by pass
+ managers. The vocabulary (.json file) contains sections for opcodes, common
+ operands, physical register classes, and virtual register classes.
+
+ .. note::
+
+ The vocabulary file should contain these sections for it to be valid.
+
+- **Embedder**: A class (``mir2vec::MIREmbedder``) that uses the vocabulary to
+ compute embeddings for machine instructions, machine basic blocks, and
+ machine functions. Currently, ``SymbolicMIREmbedder`` is the available
+ implementation.
+
+Using MIR2Vec
+^^^^^^^^^^^^^
+
+.. note::
+
+ This section describes how to use MIR2Vec within LLVM passes. The
+ `llvm-ir2vec` tool (:doc:`CommandGuide/llvm-ir2vec`) can be used to generate
+ MIR2Vec embeddings from Machine IR files (.mir), which is useful for
+ generating embeddings outside of compiler passes.
+
+To generate MIR2Vec embeddings in a compiler pass, first obtain the vocabulary,
+then create an embedder instance to compute and access embeddings.
+
+1. **Get the Vocabulary**:
+ In a MachineFunctionPass, get the vocabulary from the analysis:
+
+ .. code-block:: c++
+
+ auto &VocabAnalysis = getAnalysis<MIR2VecVocabLegacyAnalysis>();
+ auto VocabOrErr = VocabAnalysis.getMIR2VecVocabulary(*MF.getFunction().getParent());
+ if (!VocabOrErr) {
+ // Handle error: vocabulary is not available or invalid
+ return;
+ }
+ const mir2vec::MIRVocabulary &Vocabulary = *VocabOrErr;
+
+ Note that ``MIR2VecVocabLegacyAnalysis`` is an immutable pass.
+
+2. **Create Embedder instance**:
+ With the vocabulary, create an embedder for a specific machine function:
+
+ .. code-block:: c++
+
+ // Assuming MF is a MachineFunction&
+ // For example, using MIR2VecKind::Symbolic:
+ std::unique_ptr<mir2vec::MIREmbedder> Emb =
+ mir2vec::MIREmbedder::create(MIR2VecKind::Symbolic, MF, Vocabulary);
+
+
+3. **Compute and Access Embeddings**:
+ Call ``getMFunctionVector()`` to get the embedding for the machine function.
+
+ .. code-block:: c++
+
+ mir2vec::Embedding FuncVector = Emb->getMFunctionVector();
+
+ Currently, ``MIREmbedder`` can generate embeddings at three levels: Machine
+ Instructions, Machine Basic Blocks, and Machine Functions. Appropriate
+ getters are provided to access the embeddings at these levels.
+
+ .. note::
+
+ The validity of the ``MIREmbedder`` instance (and the embeddings it
+ generates) is tied to the machine function it is associated with. If the
+ machine function is modified, the embeddings may become stale and should
+ be recomputed accordingly.
+
+4. **Working with Embeddings:**
+ Embeddings are represented as ``std::vector<double>``. These vectors can be
+ used as features for machine learning models, to compute similarity scores
+ between different code snippets, or to perform other analyses as needed.
+
+Further Details
+^^^^^^^^^^^^^^^
+
+For more detailed information about the MIR2Vec algorithm, its parameters, and
+advanced usage, please refer to the original paper:
+`RL4ReAl: Reinforcement Learning for Register Allocation <https://doi.org/10.1145/3578360.3580273>`_.
+
+For information about using the `llvm-ir2vec` tool to generate MIR2Vec
+embeddings from Machine IR, see :doc:`CommandGuide/llvm-ir2vec`.
+
+The LLVM source code for ``MIR2Vec`` can be explored to understand the
+implementation details. See ``llvm/include/llvm/CodeGen/MIR2Vec.h`` and
+``llvm/lib/CodeGen/MIR2Vec.cpp``.
+
Building with ML support
========================
diff --git a/llvm/docs/SourceLevelDebugging.rst b/llvm/docs/SourceLevelDebugging.rst
index f057b2d..12b5e3e 100644
--- a/llvm/docs/SourceLevelDebugging.rst
+++ b/llvm/docs/SourceLevelDebugging.rst
@@ -674,7 +674,7 @@ Compiled to LLVM, this function would be represented like this:
ret void, !dbg !24
}
- attributes #0 = { nounwind ssp uwtable "less-precise-fpmad"="false" "frame-pointer"="all" "no-infs-fp-math"="false" "no-nans-fp-math"="false" "stack-protector-buffer-size"="8" "unsafe-fp-math"="false" "use-soft-float"="false" }
+ attributes #0 = { nounwind ssp uwtable "less-precise-fpmad"="false" "frame-pointer"="all" "no-infs-fp-math"="false" "no-nans-fp-math"="false" "stack-protector-buffer-size"="8" "use-soft-float"="false" }
attributes #1 = { nounwind readnone }
!llvm.dbg.cu = !{!0}