Diffstat (limited to 'llvm/docs')
-rw-r--r-- | llvm/docs/CodingStandards.rst | 36
-rw-r--r-- | llvm/docs/CommandGuide/llc.rst | 7
-rw-r--r-- | llvm/docs/CommandGuide/lli.rst | 5
-rw-r--r-- | llvm/docs/CommandGuide/llvm-ir2vec.rst | 95
-rw-r--r-- | llvm/docs/Extensions.rst | 29
-rw-r--r-- | llvm/docs/GlobalISel/GenericOpcode.rst | 2
-rw-r--r-- | llvm/docs/MLGO.rst | 144
-rw-r--r-- | llvm/docs/SourceLevelDebugging.rst | 2
8 files changed, 265 insertions, 55 deletions
diff --git a/llvm/docs/CodingStandards.rst b/llvm/docs/CodingStandards.rst index 8677d89..63f6663 100644 --- a/llvm/docs/CodingStandards.rst +++ b/llvm/docs/CodingStandards.rst @@ -1692,29 +1692,29 @@ faraway places in the file to tell that the function is local: Don't Use Braces on Simple Single-Statement Bodies of if/else/loop Statements ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ -When writing the body of an ``if``, ``else``, or for/while loop statement, we -prefer to omit the braces to avoid unnecessary line noise. However, braces -should be used in cases where the omission of braces harms the readability and -maintainability of the code. +When writing the body of an ``if``, ``else``, or ``for``/``while`` loop +statement, we aim to reduce unnecessary line noise. -We consider that readability is harmed when omitting the brace in the presence -of a single statement that is accompanied by a comment (assuming the comment -can't be hoisted above the ``if`` or loop statement, see below). +**Omit braces when:** -Similarly, braces should be used when a single-statement body is complex enough -that it becomes difficult to see where the block containing the following -statement began. An ``if``/``else`` chain or a loop is considered a single -statement for this rule, and this rule applies recursively. +* The body consists of a single **simple** statement. +* The single statement is not preceded by a comment. + (Hoist comments above the control statement if you can.) +* An ``else`` clause, if present, also meets the above criteria (single + simple statement, no associated comments). -This list is not exhaustive. For example, readability is also harmed if an -``if``/``else`` chain does not use braced bodies for either all or none of its -members, or has complex conditionals, deep nesting, etc. The examples below -intend to provide some guidelines. 
+**Use braces in all other cases, including:** -Maintainability is harmed if the body of an ``if`` ends with a (directly or -indirectly) nested ``if`` statement with no ``else``. Braces on the outer ``if`` -would help to avoid running into a "dangling else" situation. +* Multi-statement bodies +* Single-statement bodies with non-hoistable comments +* Complex single-statement bodies (e.g., deep nesting, complex nested + loops) +* Inconsistent bracing within ``if``/``else if``/``else`` chains (if one + block requires braces, all must) +* ``if`` statements ending with a nested ``if`` lacking an ``else`` (to + prevent "dangling else") +The examples below provide guidelines for these cases: .. code-block:: c++ diff --git a/llvm/docs/CommandGuide/llc.rst b/llvm/docs/CommandGuide/llc.rst index 900649f..cc670f6 100644 --- a/llvm/docs/CommandGuide/llc.rst +++ b/llvm/docs/CommandGuide/llc.rst @@ -125,13 +125,6 @@ End-user Options Enable setting the FP exceptions build attribute not to use exceptions. -.. option:: --enable-unsafe-fp-math - - Enable optimizations that make unsafe assumptions about IEEE math (e.g. that - addition is associative) or may not work for all input ranges. These - optimizations allow the code generator to make use of some instructions which - would otherwise not be usable (such as ``fsin`` on X86). - .. option:: --stats Print statistics recorded by code-generation passes. diff --git a/llvm/docs/CommandGuide/lli.rst b/llvm/docs/CommandGuide/lli.rst index 94c0013..8afe10d 100644 --- a/llvm/docs/CommandGuide/lli.rst +++ b/llvm/docs/CommandGuide/lli.rst @@ -107,11 +107,6 @@ FLOATING POINT OPTIONS Enable optimizations that assume no NAN values. -.. option:: -enable-unsafe-fp-math - - Causes :program:`lli` to enable optimizations that may decrease floating point - precision. - .. 
option:: -soft-float Causes :program:`lli` to generate software floating point library calls instead of diff --git a/llvm/docs/CommandGuide/llvm-ir2vec.rst b/llvm/docs/CommandGuide/llvm-ir2vec.rst index fc590a6..55fe75d 100644 --- a/llvm/docs/CommandGuide/llvm-ir2vec.rst +++ b/llvm/docs/CommandGuide/llvm-ir2vec.rst @@ -1,5 +1,5 @@ -llvm-ir2vec - IR2Vec Embedding Generation Tool -============================================== +llvm-ir2vec - IR2Vec and MIR2Vec Embedding Generation Tool +=========================================================== .. program:: llvm-ir2vec @@ -11,9 +11,9 @@ SYNOPSIS DESCRIPTION ----------- -:program:`llvm-ir2vec` is a standalone command-line tool for IR2Vec. It -generates IR2Vec embeddings for LLVM IR and supports triplet generation -for vocabulary training. +:program:`llvm-ir2vec` is a standalone command-line tool for IR2Vec and MIR2Vec. +It generates embeddings for both LLVM IR and Machine IR (MIR) and supports +triplet generation for vocabulary training. The tool provides three main subcommands: @@ -23,23 +23,33 @@ The tool provides three main subcommands: 2. **entities**: Generates entity mapping files (entity2id.txt) for vocabulary training. -3. **embeddings**: Generates IR2Vec embeddings using a trained vocabulary +3. **embeddings**: Generates IR2Vec or MIR2Vec embeddings using a trained vocabulary at different granularity levels (instruction, basic block, or function). +The tool supports two operation modes: + +* **LLVM IR mode** (``--mode=llvm``): Process LLVM IR bitcode files and generate + IR2Vec embeddings +* **Machine IR mode** (``--mode=mir``): Process Machine IR (.mir) files and generate + MIR2Vec embeddings + The tool is designed to facilitate machine learning applications that work with -LLVM IR by converting the IR into numerical representations that can be used by -ML models. The `triplets` subcommand generates numeric IDs directly instead of string -triplets, streamlining the training data preparation workflow. 
+LLVM IR or Machine IR by converting them into numerical representations that can +be used by ML models. The `triplets` subcommand generates numeric IDs directly +instead of string triplets, streamlining the training data preparation workflow. .. note:: - For information about using IR2Vec programmatically within LLVM passes and - the C++ API, see the `IR2Vec Embeddings <https://llvm.org/docs/MLGO.html#ir2vec-embeddings>`_ + For information about using IR2Vec and MIR2Vec programmatically within LLVM + passes and the C++ API, see the `IR2Vec Embeddings <https://llvm.org/docs/MLGO.html#ir2vec-embeddings>`_ section in the MLGO documentation. OPERATION MODES --------------- +The tool operates in two modes: **LLVM IR mode** and **Machine IR mode**. The mode +is selected using the ``--mode`` option (default: ``llvm``). + Triplet Generation and Entity Mapping Modes are used for preparing vocabulary and training data for knowledge graph embeddings. The Embedding Mode is used for generating embeddings from LLVM IR using a pre-trained vocabulary. @@ -89,18 +99,31 @@ Embedding Generation ~~~~~~~~~~~~~~~~~~~~ With the `embeddings` subcommand, :program:`llvm-ir2vec` uses a pre-trained vocabulary to -generate numerical embeddings for LLVM IR at different levels of granularity. +generate numerical embeddings for LLVM IR or Machine IR at different levels of granularity. + +Example Usage for LLVM IR: + +.. code-block:: bash + + llvm-ir2vec embeddings --mode=llvm --ir2vec-vocab-path=vocab.json --ir2vec-kind=symbolic --level=func input.bc -o embeddings.txt -Example Usage: +Example Usage for Machine IR: .. code-block:: bash - llvm-ir2vec embeddings --ir2vec-vocab-path=vocab.json --ir2vec-kind=symbolic --level=func input.bc -o embeddings.txt + llvm-ir2vec embeddings --mode=mir --mir2vec-vocab-path=vocab.json --level=func input.mir -o embeddings.txt OPTIONS ------- -Global options: +Common options (applicable to both LLVM IR and Machine IR modes): + +.. 
option:: --mode=<mode> + + Specify the operation mode. Valid values are: + + * ``llvm`` - Process LLVM IR bitcode files (default) + * ``mir`` - Process Machine IR (.mir) files .. option:: -o <filename> @@ -116,8 +139,8 @@ Subcommand-specific options: .. option:: <input-file> - The input LLVM IR or bitcode file to process. This positional argument is - required for the `embeddings` subcommand. + The input LLVM IR/bitcode file (.ll/.bc) or Machine IR file (.mir) to process. + This positional argument is required for the `embeddings` subcommand. .. option:: --level=<level> @@ -131,6 +154,8 @@ Subcommand-specific options: Process only the specified function instead of all functions in the module. +**IR2Vec-specific options** (for ``--mode=llvm``): + .. option:: --ir2vec-kind=<kind> Specify the kind of IR2Vec embeddings to generate. Valid values are: @@ -143,8 +168,8 @@ Subcommand-specific options: .. option:: --ir2vec-vocab-path=<path> - Specify the path to the vocabulary file (required for embedding generation). - The vocabulary file should be in JSON format and contain the trained + Specify the path to the IR2Vec vocabulary file (required for LLVM IR embedding + generation). The vocabulary file should be in JSON format and contain the trained vocabulary for embedding generation. See `llvm/lib/Analysis/models` for pre-trained vocabulary files. @@ -163,6 +188,35 @@ Subcommand-specific options: Specify the weight for argument embeddings (default: 0.2). This controls the relative importance of operand information in the final embedding. +**MIR2Vec-specific options** (for ``--mode=mir``): + +.. option:: --mir2vec-vocab-path=<path> + + Specify the path to the MIR2Vec vocabulary file (required for Machine IR + embedding generation). The vocabulary file should be in JSON format and + contain the trained vocabulary for embedding generation. + +.. option:: --mir2vec-kind=<kind> + + Specify the kind of MIR2Vec embeddings to generate. 
Valid values are: + + * ``symbolic`` - Generate symbolic embeddings (default) + +.. option:: --mir2vec-opc-weight=<weight> + + Specify the weight for machine opcode embeddings (default: 1.0). This controls + the relative importance of machine instruction opcodes in the final embedding. + +.. option:: --mir2vec-common-operand-weight=<weight> + + Specify the weight for common operand embeddings (default: 1.0). This controls + the relative importance of common operand types in the final embedding. + +.. option:: --mir2vec-reg-operand-weight=<weight> + + Specify the weight for register operand embeddings (default: 1.0). This controls + the relative importance of register operands in the final embedding. + **triplets** subcommand: @@ -240,3 +294,6 @@ SEE ALSO For more information about the IR2Vec algorithm and approach, see: `IR2Vec: LLVM IR Based Scalable Program Embeddings <https://doi.org/10.1145/3418463>`_. + +For more information about the MIR2Vec algorithm and approach, see: +`RL4ReAl: Reinforcement Learning for Register Allocation <https://doi.org/10.1145/3578360.3580273>`_. diff --git a/llvm/docs/Extensions.rst b/llvm/docs/Extensions.rst index 89a0e80..214323e 100644 --- a/llvm/docs/Extensions.rst +++ b/llvm/docs/Extensions.rst @@ -601,6 +601,35 @@ sees fit (generally the section that would provide the best locality). .. _CFI jump table: https://clang.llvm.org/docs/ControlFlowIntegrityDesign.html#forward-edge-cfi-for-indirect-function-calls +``SHT_LLVM_CALL_GRAPH`` Section (Call Graph) +^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ + +This section is used to store the call graph. It has a type of +``SHT_LLVM_CALL_GRAPH`` (0x6fff4c0f). Details of call graph section layout +are described in :doc:`CallGraphSection`. + +For example: + +.. 
code-block:: gas + + .section ".llvm.callgraph","",@llvm_call_graph + .byte 0 + .byte 7 + .quad .Lball + .quad 0 + .byte 3 + .quad foo + .quad bar + .quad baz + .byte 3 + .quad 4524972987496481828 + .quad 3498816979441845844 + .quad 8646233951371320954 + +This indicates that ``ball`` calls ``foo``, ``bar`` and ``baz`` directly; +``ball`` indirectly calls functions whose types are ``4524972987496481828``, +``3498816979441845844`` and ``8646233951371320954``. + CodeView-Dependent ------------------ diff --git a/llvm/docs/GlobalISel/GenericOpcode.rst b/llvm/docs/GlobalISel/GenericOpcode.rst index b055327..661a115 100644 --- a/llvm/docs/GlobalISel/GenericOpcode.rst +++ b/llvm/docs/GlobalISel/GenericOpcode.rst @@ -504,7 +504,7 @@ undefined. G_ABDS, G_ABDU ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ -Compute the absolute difference (signed and unsigned), e.g. abs(x-y). +Compute the absolute difference (signed and unsigned), e.g. trunc(abs(ext(x)-ext(y)). .. code-block:: none diff --git a/llvm/docs/MLGO.rst b/llvm/docs/MLGO.rst index bf3de11..2443835 100644 --- a/llvm/docs/MLGO.rst +++ b/llvm/docs/MLGO.rst @@ -434,8 +434,27 @@ The latter is also used in tests. There is no C++ implementation of a log reader. We do not have a scenario motivating one. -IR2Vec Embeddings -================= +Embeddings +========== + +LLVM provides embedding frameworks to generate vector representations of code +at different abstraction levels. These embeddings capture syntactic, semantic, +and structural properties of the code and can be used as features for machine +learning models in various compiler optimization tasks. + +Two embedding frameworks are available: + +- **IR2Vec**: Generates embeddings for LLVM IR +- **MIR2Vec**: Generates embeddings for Machine IR + +Both frameworks follow a similar architecture with vocabulary-based embedding +generation, where a vocabulary maps code entities to n-dimensional floating +point vectors. 
These embeddings can be computed at multiple granularity levels +(instruction, basic block, and function) and used for ML-guided compiler +optimizations. + +IR2Vec +------ IR2Vec is a program embedding approach designed specifically for LLVM IR. It is implemented as a function analysis pass in LLVM. The IR2Vec embeddings @@ -466,7 +485,7 @@ The core components are: compute embeddings for instructions, basic blocks, and functions. Using IR2Vec ------------- +^^^^^^^^^^^^ .. note:: @@ -526,7 +545,7 @@ embeddings can be computed and accessed via an ``ir2vec::Embedder`` instance. between different code snippets, or perform other analyses as needed. Further Details ---------------- +^^^^^^^^^^^^^^^ For more detailed information about the IR2Vec algorithm, its parameters, and advanced usage, please refer to the original paper: @@ -538,6 +557,123 @@ triplets from LLVM IR, see :doc:`CommandGuide/llvm-ir2vec`. The LLVM source code for ``IR2Vec`` can also be explored to understand the implementation details. +MIR2Vec +------- + +MIR2Vec is an extension of IR2Vec designed specifically for LLVM Machine IR +(MIR). It generates embeddings for machine-level instructions, basic blocks, +and functions. MIR2Vec operates on the target-specific machine representation, +capturing machine instruction semantics including opcodes, operands, and +register information at the machine level. + +MIR2Vec extends the vocabulary to include: + +- **Machine Opcodes**: Target-specific instruction opcodes derived from the + TargetInstrInfo, grouped by instruction semantics. + +- **Common Operands**: All common operand types (excluding register operands), + defined by the ``MachineOperand::MachineOperandType`` enum. + +- **Physical Register Classes**: Register classes defined by the target, + specialized for physical registers. + +- **Virtual Register Classes**: Register classes defined by the target, + specialized for virtual registers. 
+ +The core components are: + +- **Vocabulary**: A mapping from machine IR entities (opcodes, operands, register + classes) to their vector representations. This is managed by + ``MIR2VecVocabLegacyAnalysis`` for the legacy pass manager, with a + ``MIR2VecVocabProvider`` that can be used standalone or wrapped by pass + managers. The vocabulary (.json file) contains sections for opcodes, common + operands, physical register classes, and virtual register classes. + + .. note:: + + The vocabulary file should contain these sections for it to be valid. + +- **Embedder**: A class (``mir2vec::MIREmbedder``) that uses the vocabulary to + compute embeddings for machine instructions, machine basic blocks, and + machine functions. Currently, ``SymbolicMIREmbedder`` is the available + implementation. + +Using MIR2Vec +^^^^^^^^^^^^^ + +.. note:: + + This section describes how to use MIR2Vec within LLVM passes. `llvm-ir2vec` + tool ` :doc:`CommandGuide/llvm-ir2vec` can be used for generating MIR2Vec + embeddings from Machine IR files (.mir), which can be useful for generating + embeddings outside of compiler passes. + +To generate MIR2Vec embeddings in a compiler pass, first obtain the vocabulary, +then create an embedder instance to compute and access embeddings. + +1. **Get the Vocabulary**: + In a MachineFunctionPass, get the vocabulary from the analysis: + + .. code-block:: c++ + + auto &VocabAnalysis = getAnalysis<MIR2VecVocabLegacyAnalysis>(); + auto VocabOrErr = VocabAnalysis.getMIR2VecVocabulary(*MF.getFunction().getParent()); + if (!VocabOrErr) { + // Handle error: vocabulary is not available or invalid + return; + } + const mir2vec::MIRVocabulary &Vocabulary = *VocabOrErr; + + Note that ``MIR2VecVocabLegacyAnalysis`` is an immutable pass. + +2. **Create Embedder instance**: + With the vocabulary, create an embedder for a specific machine function: + + .. 
code-block:: c++ + + // Assuming MF is a MachineFunction& + // For example, using MIR2VecKind::Symbolic: + std::unique_ptr<mir2vec::MIREmbedder> Emb = + mir2vec::MIREmbedder::create(MIR2VecKind::Symbolic, MF, Vocabulary); + + +3. **Compute and Access Embeddings**: + Call ``getMFunctionVector()`` to get the embedding for the machine function. + + .. code-block:: c++ + + mir2vec::Embedding FuncVector = Emb->getMFunctionVector(); + + Currently, ``MIREmbedder`` can generate embeddings at three levels: Machine + Instructions, Machine Basic Blocks, and Machine Functions. Appropriate + getters are provided to access the embeddings at these levels. + + .. note:: + + The validity of the ``MIREmbedder`` instance (and the embeddings it + generates) is tied to the machine function it is associated with. If the + machine function is modified, the embeddings may become stale and should + be recomputed accordingly. + +4. **Working with Embeddings:** + Embeddings are represented as ``std::vector<double>``. These vectors can be + used as features for machine learning models, compute similarity scores + between different code snippets, or perform other analyses as needed. + +Further Details +^^^^^^^^^^^^^^^ + +For more detailed information about the MIR2Vec algorithm, its parameters, and +advanced usage, please refer to the original paper: +`RL4ReAl: Reinforcement Learning for Register Allocation <https://doi.org/10.1145/3578360.3580273>`_. + +For information about using MIR2Vec tool for generating embeddings from +Machine IR, see :doc:`CommandGuide/llvm-ir2vec`. + +The LLVM source code for ``MIR2Vec`` can be explored to understand the +implementation details. See ``llvm/include/llvm/CodeGen/MIR2Vec.h`` and +``llvm/lib/CodeGen/MIR2Vec.cpp``. 
+ Building with ML support ======================== diff --git a/llvm/docs/SourceLevelDebugging.rst b/llvm/docs/SourceLevelDebugging.rst index f057b2d..12b5e3e 100644 --- a/llvm/docs/SourceLevelDebugging.rst +++ b/llvm/docs/SourceLevelDebugging.rst @@ -674,7 +674,7 @@ Compiled to LLVM, this function would be represented like this: ret void, !dbg !24 } - attributes #0 = { nounwind ssp uwtable "less-precise-fpmad"="false" "frame-pointer"="all" "no-infs-fp-math"="false" "no-nans-fp-math"="false" "stack-protector-buffer-size"="8" "unsafe-fp-math"="false" "use-soft-float"="false" } + attributes #0 = { nounwind ssp uwtable "less-precise-fpmad"="false" "frame-pointer"="all" "no-infs-fp-math"="false" "no-nans-fp-math"="false" "stack-protector-buffer-size"="8" "use-soft-float"="false" } attributes #1 = { nounwind readnone } !llvm.dbg.cu = !{!0} |
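Note on the ``GenericOpcode.rst`` hunk: the description of ``G_ABDS``/``G_ABDU`` changes from ``abs(x-y)`` to a form that extends both operands, subtracts, takes the absolute value, and truncates. The Python sketch below illustrates why the two can differ. It rests on assumptions the diff does not state: that ``ext`` means sign extension for ``G_ABDS`` and zero extension for ``G_ABDU``, and that the naive form performs the subtraction in the narrow type with wraparound. All helper names are hypothetical and exist only for this illustration.

```python
def sext(value, bits):
    """Interpret the low `bits` bits of `value` as a signed two's-complement integer."""
    value &= (1 << bits) - 1
    return value - (1 << bits) if value & (1 << (bits - 1)) else value


def abds_naive(x, y, bits):
    """abs(x - y) with the subtraction wrapped to `bits` bits (assumed naive reading)."""
    diff = sext(x - y, bits)          # the subtraction overflows in the narrow type
    return abs(diff) & ((1 << bits) - 1)


def abds_widened(x, y, bits):
    """trunc(abs(sext(x) - sext(y))): subtract in a wider type, then truncate."""
    # Python integers are unbounded, so plain subtraction models the widened arithmetic.
    return abs(sext(x, bits) - sext(y, bits)) & ((1 << bits) - 1)


def abdu_widened(x, y, bits):
    """trunc(abs(zext(x) - zext(y))): the unsigned counterpart."""
    mask = (1 << bits) - 1
    return abs((x & mask) - (y & mask)) & mask


if __name__ == "__main__":
    # 8-bit operands 127 and -128 are 255 apart, which does not fit in a signed byte.
    print(abds_naive(127, -128, 8))
    print(abds_widened(127, -128, 8))
```

For operands whose difference fits in the narrow signed type, both forms agree; they diverge only when the narrow subtraction overflows, which is the case the widened wording rules out.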