Diffstat (limited to 'llvm/docs')
-rw-r--r-- | llvm/docs/AMDGPUUsage.rst              |  9
-rw-r--r-- | llvm/docs/CodingStandards.rst          | 36
-rw-r--r-- | llvm/docs/CommandGuide/llvm-ir2vec.rst | 79
-rw-r--r-- | llvm/docs/GettingStarted.rst           |  2
-rw-r--r-- | llvm/docs/LangRef.rst                  | 14
-rw-r--r-- | llvm/docs/TableGen/ProgRef.rst         | 39
6 files changed, 133 insertions, 46 deletions
diff --git a/llvm/docs/AMDGPUUsage.rst b/llvm/docs/AMDGPUUsage.rst
index d13f95b..c3d4833 100644
--- a/llvm/docs/AMDGPUUsage.rst
+++ b/llvm/docs/AMDGPUUsage.rst
@@ -768,6 +768,9 @@ For example:
                                                   performant than code generated for XNACK replay
                                                   disabled.

+     cu-stores       TODO                         On GFX12.5, controls whether ``scope:SCOPE_CU`` stores may be used.
+                                                  If disabled, all stores will be done at ``scope:SCOPE_SE`` or greater.
+
      =============== ============================ ==================================================

 .. _amdgpu-target-id:
@@ -5107,7 +5110,9 @@ The fields used by CP for code objects before V3 also match those specified in
                                                      and must be 0,
      >454    1 bit   ENABLE_SGPR_PRIVATE_SEGMENT
                      _SIZE
-     457:455 3 bits                                  Reserved, must be 0.
+     455     1 bit   USES_CU_STORES                  GFX12.5: Whether the ``cu-stores`` target attribute is enabled.
+                                                     If 0, then all stores are ``SCOPE_SE`` or higher.
+     457:456 2 bits                                  Reserved, must be 0.
      458     1 bit   ENABLE_WAVEFRONT_SIZE32         GFX6-GFX9
                                                        Reserved, must be 0.
                                                      GFX10-GFX11
@@ -18188,6 +18193,8 @@ terminated by an ``.end_amdhsa_kernel`` directive.
                                                                                   GFX942)
      ``.amdhsa_user_sgpr_private_segment_size``               0                   GFX6-GFX12   Controls ENABLE_SGPR_PRIVATE_SEGMENT_SIZE in
                                                                                                :ref:`amdgpu-amdhsa-kernel-descriptor-v3-table`.
+     ``.amdhsa_uses_cu_stores``                               0                   GFX12.5      Controls USES_CU_STORES in
+                                                                                               :ref:`amdgpu-amdhsa-kernel-descriptor-v3-table`.
      ``.amdhsa_wavefront_size32``                             Target              GFX10-GFX12  Controls ENABLE_WAVEFRONT_SIZE32 in
                                                               Feature                          :ref:`amdgpu-amdhsa-kernel-descriptor-v3-table`.
                                                               Specific
diff --git a/llvm/docs/CodingStandards.rst b/llvm/docs/CodingStandards.rst
index 732227b..2dc3d77 100644
--- a/llvm/docs/CodingStandards.rst
+++ b/llvm/docs/CodingStandards.rst
@@ -1594,20 +1594,25 @@ Restrict Visibility
 ^^^^^^^^^^^^^^^^^^^

 Functions and variables should have the most restricted visibility possible.
+
 For class members, that means using appropriate ``private``, ``protected``, or
-``public`` keyword to restrict their access. For non-member functions, variables,
-and classes, that means restricting visibility to a single ``.cpp`` file if it's
-not referenced outside that file.
+``public`` keyword to restrict their access.
+
+For non-member functions, variables, and classes, that means restricting
+visibility to a single ``.cpp`` file if it is not referenced outside that file.

 Visibility of file-scope non-member variables and functions can be restricted to
 the current translation unit by using either the ``static`` keyword or an anonymous
-namespace. Anonymous namespaces are a great language feature that tells the C++
+namespace.
+
+Anonymous namespaces are a great language feature that tells the C++
 compiler that the contents of the namespace are only visible within the current
 translation unit, allowing more aggressive optimization and eliminating the
-possibility of symbol name collisions. Anonymous namespaces are to C++ as
-``static`` is to C functions and global variables. While ``static`` is available
-in C++, anonymous namespaces are more general: they can make entire classes
-private to a file.
+possibility of symbol name collisions.
+
+Anonymous namespaces are to C++ as ``static`` is to C functions and global
+variables. While ``static`` is available in C++, anonymous namespaces are more
+general: they can make entire classes private to a file.

 The problem with anonymous namespaces is that they naturally want to encourage
 indentation of their body, and they reduce locality of reference: if you see a
@@ -1653,10 +1658,17 @@ Avoid putting declarations other than classes into anonymous namespaces:

   } // namespace

-When you are looking at "``runHelper``" in the middle of a large C++ file,
-you have no immediate way to tell if this function is local to the file. In
-contrast, when the function is marked static, you don't need to cross-reference
-faraway places in the file to tell that the function is local.
+When you are looking at ``runHelper`` in the middle of a large C++ file,
+you have no immediate way to tell if this function is local to the file.
+
+In contrast, when the function is marked static, you don't need to cross-reference
+faraway places in the file to tell that the function is local:
+
+.. code-block:: c++
+
+  static void runHelper() {
+    ...
+  }

 Don't Use Braces on Simple Single-Statement Bodies of if/else/loop Statements
 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
diff --git a/llvm/docs/CommandGuide/llvm-ir2vec.rst b/llvm/docs/CommandGuide/llvm-ir2vec.rst
index 13fe4996..d90e0e4 100644
--- a/llvm/docs/CommandGuide/llvm-ir2vec.rst
+++ b/llvm/docs/CommandGuide/llvm-ir2vec.rst
@@ -13,17 +13,21 @@ DESCRIPTION

 :program:`llvm-ir2vec` is a standalone command-line tool for IR2Vec. It
 generates IR2Vec embeddings for LLVM IR and supports triplet generation
-for vocabulary training. It provides two main operation modes:
+for vocabulary training. It provides three main operation modes:

-1. **Triplet Mode**: Generates triplets (opcode, type, operands) for vocabulary
+1. **Triplet Mode**: Generates numeric triplets in train2id format for vocabulary
    training from LLVM IR.

-2. **Embedding Mode**: Generates IR2Vec embeddings using a trained vocabulary
+2. **Entity Mode**: Generates entity mapping files (entity2id.txt) for vocabulary
+   training.
+
+3. **Embedding Mode**: Generates IR2Vec embeddings using a trained vocabulary
    at different granularity levels (instruction, basic block, or function).

 The tool is designed to facilitate machine learning applications that work with
 LLVM IR by converting the IR into numerical representations that can be used by
-ML models.
+ML models. The triplet mode generates numeric IDs directly instead of string
+triplets, streamlining the training data preparation workflow.

 .. note::

@@ -34,18 +38,46 @@ ML models.
 OPERATION MODES
 ---------------

+Triplet Generation and Entity Mapping Modes are used for preparing
+vocabulary and training data for knowledge graph embeddings. The Embedding Mode
+is used for generating embeddings from LLVM IR using a pre-trained vocabulary.
+
+The Seed Embedding Vocabulary of IR2Vec is trained on a large corpus of LLVM IR
+by modeling the relationships between opcodes, types, and operands as a knowledge
+graph. For this purpose, Triplet Generation and Entity Mapping Modes generate
+triplets and entity mappings in the standard format used for knowledge graph
+embedding training (see
+<https://github.com/thunlp/OpenKE/tree/OpenKE-PyTorch?tab=readme-ov-file#data-format>
+for details).
+
 Triplet Generation Mode
 ~~~~~~~~~~~~~~~~~~~~~~~

-In triplet mode, :program:`llvm-ir2vec` analyzes LLVM IR and extracts triplets
-consisting of opcodes, types, and operands. These triplets can be used to train
-vocabularies for embedding generation.
+In triplet mode, :program:`llvm-ir2vec` analyzes LLVM IR and extracts numeric
+triplets consisting of opcode IDs, type IDs, and operand IDs. These triplets
+are generated in the standard format used for knowledge graph embedding training.
+The tool outputs numeric IDs directly using the ir2vec::Vocabulary mapping
+infrastructure, eliminating the need for string-to-ID preprocessing.
+
+Usage:
+
+.. code-block:: bash
+
+   llvm-ir2vec --mode=triplets input.bc -o triplets_train2id.txt
+
+Entity Mapping Generation Mode
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+
+In entity mode, :program:`llvm-ir2vec` generates the entity mappings supported by
+IR2Vec in the standard format used for knowledge graph embedding training. This
+mode outputs all supported entities (opcodes, types, and operands) with their
+corresponding numeric IDs, and is not specific for an LLVM IR file.

 Usage:

 .. code-block:: bash

-   llvm-ir2vec --mode=triplets input.bc -o triplets.txt
+   llvm-ir2vec --mode=entities -o entity2id.txt

 Embedding Generation Mode
 ~~~~~~~~~~~~~~~~~~~~~~~~~~
@@ -67,6 +99,7 @@ OPTIONS
  Specify the operation mode. Valid values are:

  * ``triplets`` - Generate triplets for vocabulary training
+ * ``entities`` - Generate entity mappings for vocabulary training
  * ``embeddings`` - Generate embeddings using trained vocabulary (default)

 .. option:: --level=<level>
@@ -115,7 +148,7 @@ OPTIONS

    ``--level``, ``--function``, ``--ir2vec-vocab-path``, ``--ir2vec-opc-weight``,
    ``--ir2vec-type-weight``, and ``--ir2vec-arg-weight`` are only used in embedding
-   mode. These options are ignored in triplet mode.
+   mode. These options are ignored in triplet and entity modes.

 INPUT FILE FORMAT
 -----------------
@@ -129,14 +162,34 @@ OUTPUT FORMAT
 Triplet Mode Output
 ~~~~~~~~~~~~~~~~~~~

-In triplet mode, the output consists of lines containing space-separated triplets:
+In triplet mode, the output consists of numeric triplets in train2id format with
+metadata headers. The format includes:
+
+.. code-block:: text
+
+   MAX_RELATIONS=<max_relations_count>
+   <head_entity_id> <tail_entity_id> <relation_id>
+   <head_entity_id> <tail_entity_id> <relation_id>
+   ...
+
+Each line after the metadata header represents one instruction relationship,
+with numeric IDs for head entity, relation, and tail entity. The metadata
+header (MAX_RELATIONS) provides counts for post-processing and training setup.
+
+Entity Mode Output
+~~~~~~~~~~~~~~~~~~
+
+In entity mode, the output consists of entity mapping in the format:

 .. code-block:: text

-   <opcode> <type> <operand1> <operand2> ...
+   <total_entities>
+   <entity_string> <numeric_id>
+   <entity_string> <numeric_id>
+   ...

-Each line represents the information of one instruction, with the opcode, type,
-and operands.
+The first line contains the total number of entities, followed by one entity
+mapping per line with tab-separated entity string and numeric ID.

 Embedding Mode Output
 ~~~~~~~~~~~~~~~~~~~~~
diff --git a/llvm/docs/GettingStarted.rst b/llvm/docs/GettingStarted.rst
index 3036dae..e4dbb64b 100644
--- a/llvm/docs/GettingStarted.rst
+++ b/llvm/docs/GettingStarted.rst
@@ -240,8 +240,10 @@ Linux x86\ :sup:`1` GCC, Clang
 Linux              amd64                 GCC, Clang
 Linux              ARM                   GCC, Clang
 Linux              AArch64               GCC, Clang
+Linux              LoongArch             GCC, Clang
 Linux              Mips                  GCC, Clang
 Linux              PowerPC               GCC, Clang
+Linux              RISC-V                GCC, Clang
 Linux              SystemZ               GCC, Clang
 Solaris            V9 (Ultrasparc)       GCC
 DragonFlyBSD       amd64                 GCC, Clang
diff --git a/llvm/docs/LangRef.rst b/llvm/docs/LangRef.rst
index bac13cc..eb2ef6b 100644
--- a/llvm/docs/LangRef.rst
+++ b/llvm/docs/LangRef.rst
@@ -26653,9 +26653,10 @@ object's lifetime.
 Arguments:
 """"""""""

-The first argument is a constant integer representing the size of the
-object, or -1 if it is variable sized. The second argument is a pointer
-to an ``alloca`` instruction.
+The first argument is a constant integer, which is ignored and will be removed
+in the future.
+
+The second argument is a pointer to an ``alloca`` instruction.

 Semantics:
 """"""""""
@@ -26693,9 +26694,10 @@ The '``llvm.lifetime.end``' intrinsic specifies the end of a
 Arguments:
 """"""""""

-The first argument is a constant integer representing the size of the
-object, or -1 if it is variable sized. The second argument is a pointer
-to an ``alloca`` instruction.
+The first argument is a constant integer, which is ignored and will be removed
+in the future.
+
+The second argument is a pointer to an ``alloca`` instruction.

 Semantics:
 """"""""""
diff --git a/llvm/docs/TableGen/ProgRef.rst b/llvm/docs/TableGen/ProgRef.rst
index 7b30698..2b1af05 100644
--- a/llvm/docs/TableGen/ProgRef.rst
+++ b/llvm/docs/TableGen/ProgRef.rst
@@ -219,17 +219,17 @@ TableGen provides "bang operators" that have a wide variety of uses:

 .. productionlist::
    BangOperator: one of
-               : !add         !and         !cast        !con         !dag
-               : !div         !empty       !eq          !exists      !filter
-               : !find        !foldl       !foreach     !ge          !getdagarg
-               : !getdagname  !getdagop    !gt          !head        !if
-               : !initialized !instances   !interleave  !isa         !le
-               : !listconcat  !listflatten !listremove  !listsplat   !logtwo
-               : !lt          !match       !mul         !ne          !not
-               : !or          !range       !repr        !setdagarg   !setdagname
-               : !setdagop    !shl         !size        !sra         !srl
-               : !strconcat   !sub         !subst       !substr      !tail
-               : !tolower     !toupper     !xor
+               : !add          !and          !cast         !con          !dag
+               : !div          !empty        !eq           !exists       !filter
+               : !find         !foldl        !foreach      !ge           !getdagarg
+               : !getdagname   !getdagop     !getdagopname !gt           !head
+               : !if           !initialized  !instances    !interleave   !isa
+               : !le           !listconcat   !listflatten  !listremove   !listsplat
+               : !logtwo       !lt           !match        !mul          !ne
+               : !not          !or           !range        !repr         !setdagarg
+               : !setdagname   !setdagop     !setdagopname !shl          !size
+               : !sra          !srl          !strconcat    !sub          !subst
+               : !substr       !tail         !tolower      !toupper      !xor

 The ``!cond`` operator has a slightly different syntax compared to other bang
 operators, so it is defined separately:
@@ -1443,7 +1443,8 @@ DAG.

 The following bang operators are useful for working with DAGs:
 ``!con``, ``!dag``, ``!empty``, ``!foreach``, ``!getdagarg``, ``!getdagname``,
-``!getdagop``, ``!setdagarg``, ``!setdagname``, ``!setdagop``, ``!size``.
+``!getdagop``, ``!getdagopname``, ``!setdagarg``, ``!setdagname``, ``!setdagop``,
+``!setdagopname``, ``!size``.

 Defvar in a record body
 -----------------------
@@ -1695,9 +1696,11 @@ and non-0 as true.
     This operator concatenates the DAG nodes *a*, *b*, etc. Their operations
     must equal.

-    ``!con((op a1:$name1, a2:$name2), (op b1:$name3))``
+    ``!con((op:$lhs a1:$name1, a2:$name2), (op:$rhs b1:$name3))``

-    results in the DAG node ``(op a1:$name1, a2:$name2, b1:$name3)``.
+    results in the DAG node ``(op:$lhs a1:$name1, a2:$name2, b1:$name3)``.
+    The name of the dag operator is derived from the LHS DAG node if it is
+    set, otherwise from the RHS DAG node.
 ``!cond(``\ *cond1* ``:`` *val1*\ ``,`` *cond2* ``:`` *val2*\ ``, ...,`` *condn* ``:`` *valn*\ ``)``
     This operator tests *cond1* and returns *val1* if the result is true.
@@ -1819,6 +1822,10 @@ and non-0 as true.

       dag d = !dag(!getdagop(someDag), args, names);

+``!getdagopname(``\ *dag*\ ``)``
+    This operator retrieves the name of the given *dag* operator. If the operator
+    has no name associated, ``?`` is returned.
+
 ``!gt(``\ *a*\ `,` *b*\ ``)``
     This operator produces 1 if *a* is greater than *b*; 0 otherwise.
     The arguments must be ``bit``, ``bits``, ``int``, or ``string`` values.
@@ -1949,6 +1956,10 @@ and non-0 as true.

     Example: ``!setdagop((foo 1, 2), bar)`` results in ``(bar 1, 2)``.

+``!setdagopname(``\ *dag*\ ``,``\ *name*\ ``)``
+    This operator produces a DAG node with the same operator and arguments as
+    *dag*, but replacing the name of the operator with *name*.
+
 ``!shl(``\ *a*\ ``,`` *count*\ ``)``
     This operator shifts *a* left logically by *count* bits and produces the resulting
     value. The operation is performed on a 64-bit integer; the result