Diffstat (limited to 'llvm/docs/CommandGuide')
-rw-r--r-- | llvm/docs/CommandGuide/lit.rst | 5
-rw-r--r-- | llvm/docs/CommandGuide/llvm-bcanalyzer.rst | 4
-rw-r--r-- | llvm/docs/CommandGuide/llvm-debuginfo-analyzer.rst | 2
-rw-r--r-- | llvm/docs/CommandGuide/llvm-exegesis.rst | 2
-rw-r--r-- | llvm/docs/CommandGuide/llvm-ifs.rst | 2
-rw-r--r-- | llvm/docs/CommandGuide/llvm-ir2vec.rst | 164
-rw-r--r-- | llvm/docs/CommandGuide/llvm-locstats.rst | 2
-rw-r--r-- | llvm/docs/CommandGuide/llvm-mca.rst | 12
-rw-r--r-- | llvm/docs/CommandGuide/llvm-profdata.rst | 2
-rw-r--r-- | llvm/docs/CommandGuide/llvm-symbolizer.rst | 4
-rw-r--r-- | llvm/docs/CommandGuide/opt.rst | 10
11 files changed, 137 insertions, 72 deletions
diff --git a/llvm/docs/CommandGuide/lit.rst b/llvm/docs/CommandGuide/lit.rst
index 938b7f9..eb90e95 100644
--- a/llvm/docs/CommandGuide/lit.rst
+++ b/llvm/docs/CommandGuide/lit.rst
@@ -356,6 +356,11 @@ The timing data is stored in the `test_exec_root` in a file named
  primary purpose is to suppress an ``XPASS`` result without modifying a test
  case that uses the ``XFAIL`` directive.

+.. option:: --exclude-xfail
+
+ ``XFAIL`` tests won't be run, unless they are listed in the ``--xfail-not``
+ (or ``LIT_XFAIL_NOT``) lists.
+
 .. option:: --num-shards M

  Divide the set of selected tests into ``M`` equal-sized subsets or
diff --git a/llvm/docs/CommandGuide/llvm-bcanalyzer.rst b/llvm/docs/CommandGuide/llvm-bcanalyzer.rst
index 8f15e03..1e0b581 100644
--- a/llvm/docs/CommandGuide/llvm-bcanalyzer.rst
+++ b/llvm/docs/CommandGuide/llvm-bcanalyzer.rst
@@ -14,7 +14,7 @@ DESCRIPTION
 The :program:`llvm-bcanalyzer` command is a small utility for analyzing bitcode
 files. The tool reads a bitcode file (such as generated with the
 :program:`llvm-as` tool) and produces a statistical report on the contents of
-the bitcode file. The tool can also dump a low level but human readable
+the bitcode file. The tool can also dump a low level but human-readable
 version of the bitcode file. This tool is probably not of much interest or
 utility except for those working directly with the bitcode file format. Most
 LLVM users can just ignore this tool.
@@ -30,7 +30,7 @@ OPTIONS

 .. option:: --dump

- Causes :program:`llvm-bcanalyzer` to dump the bitcode in a human readable
+ Causes :program:`llvm-bcanalyzer` to dump the bitcode in a human-readable
  format. This format is significantly different from LLVM assembly and
  provides details about the encoding of the bitcode file.
diff --git a/llvm/docs/CommandGuide/llvm-debuginfo-analyzer.rst b/llvm/docs/CommandGuide/llvm-debuginfo-analyzer.rst
index 1264f80..6a4e348 100644
--- a/llvm/docs/CommandGuide/llvm-debuginfo-analyzer.rst
+++ b/llvm/docs/CommandGuide/llvm-debuginfo-analyzer.rst
@@ -14,7 +14,7 @@ DESCRIPTION
 -----------
 :program:`llvm-debuginfo-analyzer` parses debug and text sections in
 binary object files and prints their contents in a logical view, which
-is a human readable representation that closely matches the structure
+is a human-readable representation that closely matches the structure
 of the original user source code. Supported object file formats
 include ELF, Mach-O, WebAssembly, PDB and COFF.

diff --git a/llvm/docs/CommandGuide/llvm-exegesis.rst b/llvm/docs/CommandGuide/llvm-exegesis.rst
index 25e8969..5996026 100644
--- a/llvm/docs/CommandGuide/llvm-exegesis.rst
+++ b/llvm/docs/CommandGuide/llvm-exegesis.rst
@@ -106,7 +106,7 @@ properly.
  using the loop repetition mode. :program:`llvm-exegesis` needs to keep track of
  the current loop iteration within the loop repetition mode in a performant
  manner (i.e., no memory accesses), and uses a register to do this. This register
- has an architecture specific default (e.g., `R8` on X86), but this might conflict
+ has an architecture-specific default (e.g., `R8` on X86), but this might conflict
  with some snippets. This annotation allows changing the register to prevent
  interference between the loop index register and the snippet.
diff --git a/llvm/docs/CommandGuide/llvm-ifs.rst b/llvm/docs/CommandGuide/llvm-ifs.rst
index 1fe81c2..e3582b3 100644
--- a/llvm/docs/CommandGuide/llvm-ifs.rst
+++ b/llvm/docs/CommandGuide/llvm-ifs.rst
@@ -11,7 +11,7 @@ SYNOPSIS
 DESCRIPTION
 -----------

-:program:`llvm-ifs` is a tool that jointly produces human readable text-based
+:program:`llvm-ifs` is a tool that jointly produces human-readable text-based
 stubs (.ifs files) for shared objects and linkable shared object stubs
 (.so files) from either ELF shared objects or text-based stubs. The text-based
 stubs is useful for monitoring ABI changes of the shared object. The linkable
diff --git a/llvm/docs/CommandGuide/llvm-ir2vec.rst b/llvm/docs/CommandGuide/llvm-ir2vec.rst
index 13fe4996..0c9fb6e 100644
--- a/llvm/docs/CommandGuide/llvm-ir2vec.rst
+++ b/llvm/docs/CommandGuide/llvm-ir2vec.rst
@@ -6,24 +6,28 @@ llvm-ir2vec - IR2Vec Embedding Generation Tool
 SYNOPSIS
 --------

-:program:`llvm-ir2vec` [*options*] *input-file*
+:program:`llvm-ir2vec` [*subcommand*] [*options*]

 DESCRIPTION
 -----------

 :program:`llvm-ir2vec` is a standalone command-line tool for IR2Vec. It
 generates IR2Vec embeddings for LLVM IR and supports triplet generation
-for vocabulary training. It provides two main operation modes:
+for vocabulary training. The tool provides three main subcommands:

-1. **Triplet Mode**: Generates triplets (opcode, type, operands) for vocabulary
+1. **triplets**: Generates numeric triplets in train2id format for vocabulary
  training from LLVM IR.

-2. **Embedding Mode**: Generates IR2Vec embeddings using a trained vocabulary
+2. **entities**: Generates entity mapping files (entity2id.txt) for vocabulary
+ training.
+
+3. **embeddings**: Generates IR2Vec embeddings using a trained vocabulary
  at different granularity levels (instruction, basic block, or function).

 The tool is designed to facilitate machine learning applications that work with
 LLVM IR by converting the IR into numerical representations that can be used by
-ML models.
+ML models. The `triplets` subcommand generates numeric IDs directly instead of string
+triplets, streamlining the training data preparation workflow.

 .. note::

@@ -34,94 +38,130 @@ ML models.
 OPERATION MODES
 ---------------

-Triplet Generation Mode
-~~~~~~~~~~~~~~~~~~~~~~~
+Triplet Generation and Entity Mapping Modes are used for preparing
+vocabulary and training data for knowledge graph embeddings. The Embedding Mode
+is used for generating embeddings from LLVM IR using a pre-trained vocabulary.
+
+The Seed Embedding Vocabulary of IR2Vec is trained on a large corpus of LLVM IR
+by modeling the relationships between opcodes, types, and operands as a knowledge
+graph. For this purpose, Triplet Generation and Entity Mapping Modes generate
+triplets and entity mappings in the standard format used for knowledge graph
+embedding training (see
+<https://github.com/thunlp/OpenKE/tree/OpenKE-PyTorch?tab=readme-ov-file#data-format>
+for details).
+
+See `llvm/utils/mlgo-utils/IR2Vec/generateTriplets.py` for more details on how
+these two modes are used to generate the triplets and entity mappings.
+
+Triplet Generation
+~~~~~~~~~~~~~~~~~~

-In triplet mode, :program:`llvm-ir2vec` analyzes LLVM IR and extracts triplets
-consisting of opcodes, types, and operands. These triplets can be used to train
-vocabularies for embedding generation.
+With the `triplets` subcommand, :program:`llvm-ir2vec` analyzes LLVM IR and extracts
+numeric triplets consisting of opcode IDs, type IDs, and operand IDs. These triplets
+are generated in the standard format used for knowledge graph embedding training.
+The tool outputs numeric IDs directly using the ir2vec::Vocabulary mapping
+infrastructure, eliminating the need for string-to-ID preprocessing.

 Usage:

 .. code-block:: bash

- llvm-ir2vec --mode=triplets input.bc -o triplets.txt
+ llvm-ir2vec triplets input.bc -o triplets_train2id.txt

-Embedding Generation Mode
-~~~~~~~~~~~~~~~~~~~~~~~~~~
+Entity Mapping Generation
+~~~~~~~~~~~~~~~~~~~~~~~~~

-In embedding mode, :program:`llvm-ir2vec` uses a pre-trained vocabulary to
+With the `entities` subcommand, :program:`llvm-ir2vec` generates the entity mappings
+supported by IR2Vec in the standard format used for knowledge graph embedding
+training. This subcommand outputs all supported entities (opcodes, types, and
+operands) with their corresponding numeric IDs, and is not specific for an
+LLVM IR file.
+
+Usage:
+
+.. code-block:: bash
+
+ llvm-ir2vec entities -o entity2id.txt
+
+Embedding Generation
+~~~~~~~~~~~~~~~~~~~~
+
+With the `embeddings` subcommand, :program:`llvm-ir2vec` uses a pre-trained vocabulary to
 generate numerical embeddings for LLVM IR at different levels of granularity.

 Example Usage:

 .. code-block:: bash

- llvm-ir2vec --mode=embeddings --ir2vec-vocab-path=vocab.json --level=func input.bc -o embeddings.txt
+ llvm-ir2vec embeddings --ir2vec-vocab-path=vocab.json --level=func input.bc -o embeddings.txt

 OPTIONS
 -------

-.. option:: --mode=<mode>
+Global options:
+
+.. option:: -o <filename>
+
+ Specify the output filename. Use ``-`` to write to standard output (default).
+
+.. option:: --help
+
+ Print a summary of command line options.
+
+Subcommand-specific options:
+
+**embeddings** subcommand:

- Specify the operation mode. Valid values are:
+.. option:: <input-file>

- * ``triplets`` - Generate triplets for vocabulary training
- * ``embeddings`` - Generate embeddings using trained vocabulary (default)
+ The input LLVM IR or bitcode file to process. This positional argument is
+ required for the `embeddings` subcommand.

 .. option:: --level=<level>

- Specify the embedding generation level. Valid values are:
+ Specify the embedding generation level. Valid values are:

- * ``inst`` - Generate instruction-level embeddings
- * ``bb`` - Generate basic block-level embeddings
- * ``func`` - Generate function-level embeddings (default)
+ * ``inst`` - Generate instruction-level embeddings
+ * ``bb`` - Generate basic block-level embeddings
+ * ``func`` - Generate function-level embeddings (default)

 .. option:: --function=<name>

- Process only the specified function instead of all functions in the module.
+ Process only the specified function instead of all functions in the module.

 .. option:: --ir2vec-vocab-path=<path>

- Specify the path to the vocabulary file (required for embedding mode).
- The vocabulary file should be in JSON format and contain the trained
- vocabulary for embedding generation. See `llvm/lib/Analysis/models`
- for pre-trained vocabulary files.
+ Specify the path to the vocabulary file (required for embedding generation).
+ The vocabulary file should be in JSON format and contain the trained
+ vocabulary for embedding generation. See `llvm/lib/Analysis/models`
+ for pre-trained vocabulary files.

 .. option:: --ir2vec-opc-weight=<weight>

- Specify the weight for opcode embeddings (default: 1.0). This controls
- the relative importance of instruction opcodes in the final embedding.
+ Specify the weight for opcode embeddings (default: 1.0). This controls
+ the relative importance of instruction opcodes in the final embedding.

 .. option:: --ir2vec-type-weight=<weight>

- Specify the weight for type embeddings (default: 0.5). This controls
- the relative importance of type information in the final embedding.
+ Specify the weight for type embeddings (default: 0.5). This controls
+ the relative importance of type information in the final embedding.

 .. option:: --ir2vec-arg-weight=<weight>

- Specify the weight for argument embeddings (default: 0.2). This controls
- the relative importance of operand information in the final embedding.
+ Specify the weight for argument embeddings (default: 0.2). This controls
+ the relative importance of operand information in the final embedding.

-.. option:: -o <filename>
-
- Specify the output filename. Use ``-`` to write to standard output (default).

-.. option:: --help
+**triplets** subcommand:

- Print a summary of command line options.
+.. option:: <input-file>

-.. note::
+ The input LLVM IR or bitcode file to process. This positional argument is
+ required for the `triplets` subcommand.

- ``--level``, ``--function``, ``--ir2vec-vocab-path``, ``--ir2vec-opc-weight``,
- ``--ir2vec-type-weight``, and ``--ir2vec-arg-weight`` are only used in embedding
- mode. These options are ignored in triplet mode.
+**entities** subcommand:

-INPUT FILE FORMAT
------------------
-
-:program:`llvm-ir2vec` accepts LLVM bitcode files (``.bc``) and LLVM IR files
-(``.ll``) as input. The input file should contain valid LLVM IR.
+ No subcommand-specific options.

 OUTPUT FORMAT
 -------------
@@ -129,14 +169,34 @@ OUTPUT FORMAT
 Triplet Mode Output
 ~~~~~~~~~~~~~~~~~~~

-In triplet mode, the output consists of lines containing space-separated triplets:
+In triplet mode, the output consists of numeric triplets in train2id format with
+metadata headers. The format includes:
+
+.. code-block:: text
+
+ MAX_RELATIONS=<max_relations_count>
+ <head_entity_id> <tail_entity_id> <relation_id>
+ <head_entity_id> <tail_entity_id> <relation_id>
+ ...
+
+Each line after the metadata header represents one instruction relationship,
+with numeric IDs for head entity, relation, and tail entity. The metadata
+header (MAX_RELATIONS) provides counts for post-processing and training setup.
+
+Entity Mode Output
+~~~~~~~~~~~~~~~~~~
+
+In entity mode, the output consists of entity mapping in the format:

 .. code-block:: text

- <opcode> <type> <operand1> <operand2> ...
+ <total_entities>
+ <entity_string> <numeric_id>
+ <entity_string> <numeric_id>
+ ...

-Each line represents the information of one instruction, with the opcode, type,
-and operands.
+The first line contains the total number of entities, followed by one entity
+mapping per line with tab-separated entity string and numeric ID.

 Embedding Mode Output
 ~~~~~~~~~~~~~~~~~~~~~
diff --git a/llvm/docs/CommandGuide/llvm-locstats.rst b/llvm/docs/CommandGuide/llvm-locstats.rst
index 3186566..7f436c1 100644
--- a/llvm/docs/CommandGuide/llvm-locstats.rst
+++ b/llvm/docs/CommandGuide/llvm-locstats.rst
@@ -13,7 +13,7 @@ DESCRIPTION

 :program:`llvm-locstats` works like a wrapper around :program:`llvm-dwarfdump`.
 It parses :program:`llvm-dwarfdump` statistics regarding debug location by
-pretty printing it in a more human readable way.
+pretty printing it in a more human-readable way.

 The line 0% shows the number and the percentage of DIEs with no location
 information, but the line 100% shows the information for DIEs where there is
diff --git a/llvm/docs/CommandGuide/llvm-mca.rst b/llvm/docs/CommandGuide/llvm-mca.rst
index bea1931..1daae5d 100644
--- a/llvm/docs/CommandGuide/llvm-mca.rst
+++ b/llvm/docs/CommandGuide/llvm-mca.rst
@@ -241,7 +241,7 @@ option specifies "``-``", then the output will also be sent to standard output.
 .. option:: -disable-cb

  Force usage of the generic CustomBehaviour and InstrPostProcess classes rather
- than using the target specific implementation. The generic classes never
+ than using the target-specific implementation. The generic classes never
  detect any custom hazards or make any post processing modifications to
  instructions.

@@ -1125,9 +1125,9 @@ CustomBehaviour class can be used in these cases to enforce proper instruction
 modeling (often by customizing data dependencies and detecting hazards that
 :program:`llvm-mca` has no way of knowing about).

-:program:`llvm-mca` comes with one generic and multiple target specific
+:program:`llvm-mca` comes with one generic and multiple target-specific
 CustomBehaviour classes. The generic class will be used if the ``-disable-cb``
-flag is used or if a target specific CustomBehaviour class doesn't exist for
+flag is used or if a target-specific CustomBehaviour class doesn't exist for
 that target. (The generic class does nothing.) Currently, the CustomBehaviour
 class is only a part of the in-order pipeline, but there are plans to add it
 to the out-of-order pipeline in the future.
@@ -1141,7 +1141,7 @@ if you don't know the exact number and a value of 0 represents no stall).

 If you'd like to add a CustomBehaviour class for a target that doesn't
 already have one, refer to an existing implementation to see how to set it
-up. The classes are implemented within the target specific backend (for
+up. The classes are implemented within the target-specific backend (for
 example `/llvm/lib/Target/AMDGPU/MCA/`) so that they can access backend symbols.

 Instrument Manager
@@ -1177,12 +1177,12 @@ classes (MCSubtargetInfo, MCInstrInfo, etc.), please add it to the
 AND requires unexposed backend symbols or functionality, you can define it in
 the `/lib/Target/<TargetName>/MCA/` directory.

-To enable this target specific View, you will have to use this target's
+To enable this target-specific View, you will have to use this target's
 CustomBehaviour class to override the `CustomBehaviour::getViews()` methods.
 There are 3 variations of these methods based on where you want your View to
 appear in the output: `getStartViews()`, `getPostInstrInfoViews()`, and
 `getEndViews()`. These methods returns a vector of Views so you will want to
-return a vector containing all of the target specific Views for the target in
+return a vector containing all of the target-specific Views for the target in
 question.

 Because these target specific (and backend dependent) Views require the
diff --git a/llvm/docs/CommandGuide/llvm-profdata.rst b/llvm/docs/CommandGuide/llvm-profdata.rst
index b2c0457..0b1cd02 100644
--- a/llvm/docs/CommandGuide/llvm-profdata.rst
+++ b/llvm/docs/CommandGuide/llvm-profdata.rst
@@ -338,7 +338,7 @@ OPTIONS

  Instruct the profile dumper to show profile counts in the text format of the
  instrumentation-based profile data representation. By default, the profile
- information is dumped in a more human readable form (also in text) with
+ information is dumped in a more human-readable form (also in text) with
  annotations.

 .. option:: --topn=<n>
diff --git a/llvm/docs/CommandGuide/llvm-symbolizer.rst b/llvm/docs/CommandGuide/llvm-symbolizer.rst
index 2da1b24..fb86a69 100644
--- a/llvm/docs/CommandGuide/llvm-symbolizer.rst
+++ b/llvm/docs/CommandGuide/llvm-symbolizer.rst
@@ -371,7 +371,7 @@ OPTIONS
  * Prints an address's debug-data discriminator when it is non-zero. One way to
  produce discriminators is to compile with clang's -fdebug-info-for-profiling.

- ``JSON`` style provides a machine readable output in JSON. If addresses are
+ ``JSON`` style provides a machine-readable output in JSON. If addresses are
  supplied via stdin, the output JSON will be a series of individual objects.
  Otherwise, all results will be contained in a single array.

@@ -444,7 +444,7 @@ OPTIONS

 .. option:: --pretty-print, -p

- Print human readable output. If :option:`--inlining` is specified, the
+ Print human-readable output. If :option:`--inlining` is specified, the
  enclosing scope is prefixed by (inlined by). For JSON output, the option will
  cause JSON to be indented and split over new lines. Otherwise, the JSON output
  will be printed in a compact form.
diff --git a/llvm/docs/CommandGuide/opt.rst b/llvm/docs/CommandGuide/opt.rst
index f067f62..da93b8e 100644
--- a/llvm/docs/CommandGuide/opt.rst
+++ b/llvm/docs/CommandGuide/opt.rst
@@ -46,12 +46,12 @@ OPTIONS

  Write output in LLVM intermediate language (instead of bitcode).

-.. option:: -{passname}
+.. option:: -passes=<string>

- :program:`opt` provides the ability to run any of LLVM's optimization or
- analysis passes in any order. The :option:`-help` option lists all the passes
- available. The order in which the options occur on the command line are the
- order in which they are executed (within pass constraints).
+ A textual (comma-separated) description of the pass pipeline,
+ e.g., ``-passes="sroa,instcombine"``. See
+ `invoking opt <../NewPassManager.html#invoking-opt>`_ for more details on the
+ pass pipeline syntax.

 .. option:: -strip-debug
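
Note on the opt.rst hunk: the new ``-passes=<string>`` text describes the pipeline as a comma-separated list. The sketch below is one way a script might assemble such an invocation. It is not part of this change: `build_opt_command` is a hypothetical helper, and `input.ll` / `output.ll` are placeholder file names. It only builds the argument list and does not run :program:`opt`.

```python
import shlex


def build_opt_command(passes, input_path, output_path, emit_text=True):
    """Build an argv list for opt using the -passes=<string> option.

    Hypothetical helper for illustration only. `passes` is a list of
    pipeline elements that are joined with commas, matching the
    "comma-separated description" wording in the option text.
    """
    if not passes or any(not p.strip() for p in passes):
        raise ValueError("at least one non-empty pass name is required")
    pipeline = ",".join(p.strip() for p in passes)
    cmd = ["opt", f"-passes={pipeline}"]
    if emit_text:
        # -S writes LLVM intermediate language instead of bitcode.
        cmd.append("-S")
    cmd += [input_path, "-o", output_path]
    return cmd


if __name__ == "__main__":
    cmd = build_opt_command(["sroa", "instcombine"], "input.ll", "output.ll")
    # Print a shell-quoted form of the command for copy and paste.
    print(shlex.join(cmd))
```

Returning a list rather than a single string lets a caller pass it to `subprocess.run` without shell quoting concerns. Nested pipeline syntax is out of scope for this sketch; see the NewPassManager documentation referenced in the hunk.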