Diffstat (limited to 'llvm/docs')
-rw-r--r-- | llvm/docs/CommandGuide/index.rst | 1
-rw-r--r-- | llvm/docs/CommandGuide/llvm-ir2vec.rst | 170
-rw-r--r-- | llvm/docs/GettingInvolved.rst | 5
-rw-r--r-- | llvm/docs/HowToUpdateDebugInfo.rst | 2
-rw-r--r-- | llvm/docs/LangRef.rst | 2
-rw-r--r-- | llvm/docs/MLGO.rst | 12
-rw-r--r-- | llvm/docs/ReleaseNotes.md | 2
-rw-r--r-- | llvm/docs/Security.rst | 76
8 files changed, 237 insertions, 33 deletions
diff --git a/llvm/docs/CommandGuide/index.rst b/llvm/docs/CommandGuide/index.rst
index 88fc1fd..f85f32a 100644
--- a/llvm/docs/CommandGuide/index.rst
+++ b/llvm/docs/CommandGuide/index.rst
@@ -27,6 +27,7 @@ Basic Commands
    llvm-dis
    llvm-dwarfdump
    llvm-dwarfutil
+   llvm-ir2vec
    llvm-lib
    llvm-libtool-darwin
    llvm-link
diff --git a/llvm/docs/CommandGuide/llvm-ir2vec.rst b/llvm/docs/CommandGuide/llvm-ir2vec.rst
new file mode 100644
index 0000000..13fe4996
--- /dev/null
+++ b/llvm/docs/CommandGuide/llvm-ir2vec.rst
@@ -0,0 +1,170 @@
+llvm-ir2vec - IR2Vec Embedding Generation Tool
+==============================================
+
+.. program:: llvm-ir2vec
+
+SYNOPSIS
+--------
+
+:program:`llvm-ir2vec` [*options*] *input-file*
+
+DESCRIPTION
+-----------
+
+:program:`llvm-ir2vec` is a standalone command-line tool for IR2Vec. It
+generates IR2Vec embeddings for LLVM IR and supports triplet generation
+for vocabulary training. It provides two main operation modes:
+
+1. **Triplet Mode**: Generates triplets (opcode, type, operands) for vocabulary
+   training from LLVM IR.
+
+2. **Embedding Mode**: Generates IR2Vec embeddings using a trained vocabulary
+   at different granularity levels (instruction, basic block, or function).
+
+The tool is designed to facilitate machine learning applications that work with
+LLVM IR by converting the IR into numerical representations that can be used by
+ML models.
+
+.. note::
+
+   For information about using IR2Vec programmatically within LLVM passes and
+   the C++ API, see the `IR2Vec Embeddings <https://llvm.org/docs/MLGO.html#ir2vec-embeddings>`_
+   section in the MLGO documentation.
+
+OPERATION MODES
+---------------
+
+Triplet Generation Mode
+~~~~~~~~~~~~~~~~~~~~~~~
+
+In triplet mode, :program:`llvm-ir2vec` analyzes LLVM IR and extracts triplets
+consisting of opcodes, types, and operands. These triplets can be used to train
+vocabularies for embedding generation.
+
+Usage:
+
+.. code-block:: bash
+
+   llvm-ir2vec --mode=triplets input.bc -o triplets.txt
+
+Embedding Generation Mode
+~~~~~~~~~~~~~~~~~~~~~~~~~~
+
+In embedding mode, :program:`llvm-ir2vec` uses a pre-trained vocabulary to
+generate numerical embeddings for LLVM IR at different levels of granularity.
+
+Example Usage:
+
+.. code-block:: bash
+
+   llvm-ir2vec --mode=embeddings --ir2vec-vocab-path=vocab.json --level=func input.bc -o embeddings.txt
+
+OPTIONS
+-------
+
+.. option:: --mode=<mode>
+
+ Specify the operation mode. Valid values are:
+
+ * ``triplets`` - Generate triplets for vocabulary training
+ * ``embeddings`` - Generate embeddings using trained vocabulary (default)
+
+.. option:: --level=<level>
+
+ Specify the embedding generation level. Valid values are:
+
+ * ``inst`` - Generate instruction-level embeddings
+ * ``bb`` - Generate basic block-level embeddings
+ * ``func`` - Generate function-level embeddings (default)
+
+.. option:: --function=<name>
+
+ Process only the specified function instead of all functions in the module.
+
+.. option:: --ir2vec-vocab-path=<path>
+
+ Specify the path to the vocabulary file (required for embedding mode).
+ The vocabulary file should be in JSON format and contain the trained
+ vocabulary for embedding generation. See `llvm/lib/Analysis/models`
+ for pre-trained vocabulary files.
+
+.. option:: --ir2vec-opc-weight=<weight>
+
+ Specify the weight for opcode embeddings (default: 1.0). This controls
+ the relative importance of instruction opcodes in the final embedding.
+
+.. option:: --ir2vec-type-weight=<weight>
+
+ Specify the weight for type embeddings (default: 0.5). This controls
+ the relative importance of type information in the final embedding.
+
+.. option:: --ir2vec-arg-weight=<weight>
+
+ Specify the weight for argument embeddings (default: 0.2). This controls
+ the relative importance of operand information in the final embedding.
+
+.. option:: -o <filename>
+
+ Specify the output filename. Use ``-`` to write to standard output (default).
+
+.. option:: --help
+
+ Print a summary of command line options.
+
+.. note::
+
+   ``--level``, ``--function``, ``--ir2vec-vocab-path``, ``--ir2vec-opc-weight``,
+   ``--ir2vec-type-weight``, and ``--ir2vec-arg-weight`` are only used in embedding
+   mode. These options are ignored in triplet mode.
+
+INPUT FILE FORMAT
+-----------------
+
+:program:`llvm-ir2vec` accepts LLVM bitcode files (``.bc``) and LLVM IR files
+(``.ll``) as input. The input file should contain valid LLVM IR.
+
+OUTPUT FORMAT
+-------------
+
+Triplet Mode Output
+~~~~~~~~~~~~~~~~~~~
+
+In triplet mode, the output consists of lines containing space-separated triplets:
+
+.. code-block:: text
+
+   <opcode> <type> <operand1> <operand2> ...
+
+Each line represents the information of one instruction, with the opcode, type,
+and operands.
+
+Embedding Mode Output
+~~~~~~~~~~~~~~~~~~~~~
+
+In embedding mode, the output format depends on the specified level:
+
+* **Function Level**: One embedding vector per function
+* **Basic Block Level**: One embedding vector per basic block, grouped by function
+* **Instruction Level**: One embedding vector per instruction, grouped by basic block and function
+
+Each embedding is represented as a floating point vector.
+
+EXIT STATUS
+-----------
+
+:program:`llvm-ir2vec` returns 0 on success, and a non-zero value on failure.
+
+Common failure cases include:
+
+* Invalid or missing input file
+* Missing or invalid vocabulary file (in embedding mode)
+* Specified function not found in the module
+* Invalid command line options
+
+SEE ALSO
+--------
+
+:doc:`../MLGO`
+
+For more information about the IR2Vec algorithm and approach, see:
+`IR2Vec: LLVM IR Based Scalable Program Embeddings <https://doi.org/10.1145/3418463>`_.
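The new `llvm-ir2vec.rst` page describes triplet lines of the form `<opcode> <type> <operand1> ...` and three weights whose defaults are 1.0 (opcode), 0.5 (type) and 0.2 (argument). As a rough, non-authoritative illustration of how such weights could combine, the Python sketch below parses lines in that documented shape and forms an instruction vector as a weighted sum of opcode, type and operand vectors, then sums instructions into a basic block and blocks into a function. That composition follows the symbolic encoding described in the IR2Vec paper; it is an assumption, not a transcription of the tool's source. The vocabulary keys (`IntegerTy`, `Variable`, and so on) and the 3-dimensional vectors are hypothetical toy values.

```python
"""Toy sketch of IR2Vec-style symbolic embedding composition.

Assumptions (hypothetical, for illustration only):
  * triplet lines follow the documented "<opcode> <type> <operand1> ..." shape;
  * the token spellings and 3-d vectors in TOY_VOCAB are made up;
  * instruction = weighted sum of opcode, type and operand vectors, and
    block/function embeddings are plain sums (IR2Vec paper formulation).
"""

OPC_WEIGHT = 1.0   # documented default of --ir2vec-opc-weight
TYPE_WEIGHT = 0.5  # documented default of --ir2vec-type-weight
ARG_WEIGHT = 0.2   # documented default of --ir2vec-arg-weight

# Hypothetical vocabulary; real ones are JSON files given via
# --ir2vec-vocab-path and use far more dimensions.
TOY_VOCAB = {
    "add": [1.0, 0.0, 0.0],
    "ret": [0.0, 1.0, 0.0],
    "IntegerTy": [0.0, 0.0, 1.0],
    "VoidTy": [0.0, 0.0, 0.0],
    "Variable": [0.5, 0.5, 0.0],
    "Constant": [0.0, 0.5, 0.5],
}


def parse_triplet_line(line):
    """Split one triplet-mode line into (opcode, type, [operands])."""
    opcode, ty, *operands = line.split()
    return opcode, ty, operands


def axpy(weight, vec, acc):
    """Return acc + weight * vec, element-wise."""
    return [a + weight * v for a, v in zip(acc, vec)]


def instruction_embedding(opcode, ty, operands, vocab=TOY_VOCAB):
    """Weighted sum of the opcode, type and every operand vector."""
    dim = len(vocab[opcode])
    emb = axpy(OPC_WEIGHT, vocab[opcode], [0.0] * dim)
    emb = axpy(TYPE_WEIGHT, vocab[ty], emb)
    for operand in operands:
        emb = axpy(ARG_WEIGHT, vocab[operand], emb)
    return emb


def sum_embeddings(embeddings, dim):
    """Aggregate instructions into a block, or blocks into a function."""
    total = [0.0] * dim
    for emb in embeddings:
        total = axpy(1.0, emb, total)
    return total


if __name__ == "__main__":
    block = ["add IntegerTy Variable Constant", "ret VoidTy Variable"]
    insts = [instruction_embedding(*parse_triplet_line(l)) for l in block]
    bb = sum_embeddings(insts, dim=3)
    func = sum_embeddings([bb], dim=3)  # a function with a single block
    print("instructions:", insts)
    print("basic block:", bb)
    print("function:", func)
```

Raising `ARG_WEIGHT` in this toy model makes operands dominate the result, which is the intuition behind the "relative importance" wording of the weight options.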
diff --git a/llvm/docs/GettingInvolved.rst b/llvm/docs/GettingInvolved.rst
index dc53072..d87a8bd 100644
--- a/llvm/docs/GettingInvolved.rst
+++ b/llvm/docs/GettingInvolved.rst
@@ -354,11 +354,6 @@ The :doc:`CodeOfConduct` applies to all office hours.
     - Every first Friday of the month, 14:00 UK time, for 60 minutes.
     - `Google meet <https://meet.google.com/jps-twgq-ivz>`__
     - English, Portuguese
-  * - Rotating hosts
-    - Getting Started, beginner questions, new contributors.
-    - Every Tuesday at 2 PM ET (11 AM PT), for 30 minutes.
-    - `Google meet <https://meet.google.com/nga-uhpf-bbb>`__
-    - English
 
 For event owners, our Discord bot also supports sending automated announcements
 of upcoming office hours. Please see the :ref:`discord-bot-event-pings` section
diff --git a/llvm/docs/HowToUpdateDebugInfo.rst b/llvm/docs/HowToUpdateDebugInfo.rst
index abe21c6..915e289 100644
--- a/llvm/docs/HowToUpdateDebugInfo.rst
+++ b/llvm/docs/HowToUpdateDebugInfo.rst
@@ -504,7 +504,7 @@ as follows:
 
 .. code-block:: bash
 
-  $ llvm-original-di-preservation.py sample.json sample.html
+  $ llvm-original-di-preservation.py sample.json --report-file sample.html
 
 Testing of original debug info preservation can be invoked from front-end level
 as follows:
diff --git a/llvm/docs/LangRef.rst b/llvm/docs/LangRef.rst
index 2759e18..371f356 100644
--- a/llvm/docs/LangRef.rst
+++ b/llvm/docs/LangRef.rst
@@ -4867,7 +4867,7 @@ to be eliminated. This is because '``poison``' is stronger than '``undef``'.
 
       %D = undef
       %E = icmp slt %D, 4
-      %F = icmp gte %D, 4
+      %F = icmp sge %D, 4
 
     Safe:
       %A = undef
diff --git a/llvm/docs/MLGO.rst b/llvm/docs/MLGO.rst
index ed0769b..965a21b 100644
--- a/llvm/docs/MLGO.rst
+++ b/llvm/docs/MLGO.rst
@@ -468,6 +468,13 @@ The core components are:
 Using IR2Vec
 ------------
 
+.. note::
+
+   This section describes how to use IR2Vec within LLVM passes. A standalone
+   tool :doc:`CommandGuide/llvm-ir2vec` is available for generating the
+   embeddings and triplets from LLVM IR files, which can be useful for
+   training vocabularies and generating embeddings outside of compiler passes.
+
 For generating embeddings, first the vocabulary should be obtained. Then, the
 embeddings can be computed and accessed via an ``ir2vec::Embedder`` instance.
 
@@ -524,6 +531,10 @@ Further Details
 For more detailed information about the IR2Vec algorithm, its parameters, and
 advanced usage, please refer to the original paper:
 `IR2Vec: LLVM IR Based Scalable Program Embeddings <https://doi.org/10.1145/3418463>`_.
+
+For information about using IR2Vec tool for generating embeddings and
+triplets from LLVM IR, see :doc:`CommandGuide/llvm-ir2vec`.
+
 The LLVM source code for ``IR2Vec`` can also be explored to understand the
 implementation details.
 
@@ -595,4 +606,3 @@ optimizations that are currently MLGO-enabled, it may be used as follows:
 where the ``name`` is a path fragment. We will expect to find 2 files,
 ``<name>.in`` (readable, data incoming from the managing process) and
 ``<name>.out`` (writable, the model runner sends data to the managing process)
-
diff --git a/llvm/docs/ReleaseNotes.md b/llvm/docs/ReleaseNotes.md
index 68d653b..5591ac6 100644
--- a/llvm/docs/ReleaseNotes.md
+++ b/llvm/docs/ReleaseNotes.md
@@ -233,6 +233,8 @@ Changes to the X86 Backend
 --------------------------
 
 * `fp128` will now use `*f128` libcalls on 32-bit GNU targets as well.
+* On x86-32, `fp128` and `i128` are now passed with the expected 16-byte stack
+  alignment.
 
 Changes to the OCaml bindings
 -----------------------------
diff --git a/llvm/docs/Security.rst b/llvm/docs/Security.rst
index 8f04b65..5cb8d04 100644
--- a/llvm/docs/Security.rst
+++ b/llvm/docs/Security.rst
@@ -157,6 +157,7 @@ Members of the LLVM Security Response Group are expected to:
 * Help write and review patches to address security issues.
 * Participate in the member nomination and removal processes.
 
+.. _security-group-discussion-medium:
 Discussion Medium
 =================
 
@@ -204,6 +205,10 @@ The LLVM Security Policy may be changed by majority vote of the LLVM Security Re
 What is considered a security issue?
 ====================================
 
+We define "security-sensitive" to mean that a discovered bug or vulnerability
+may require coordinated disclosure, and therefore should be reported to the LLVM
+Security Response group rather than publishing in the public bug tracker.
+
 The LLVM Project has a significant amount of code, and not all of it is
 considered security-sensitive. This is particularly true because LLVM is used in
 a wide variety of circumstances: there are different threat models, untrusted
@@ -217,31 +222,52 @@ security-sensitive). This requires a rationale, and buy-in from the LLVM
 community as for any RFC. In some cases, parts of the codebase could be handled
 as security-sensitive but need significant work to get to the stage where that's
 manageable. The LLVM community will need to decide whether it wants to invest in
-making these parts of the code securable, and maintain these security
-properties over time. In all cases the LLVM Security Response Group should be consulted,
-since they'll be responding to security issues filed against these parts of the
-codebase.
-
-If you're not sure whether an issue is in-scope for this security process or
-not, err towards assuming that it is. The Security Response Group might agree or disagree
-and will explain its rationale in the report, as well as update this document
-through the above process.
-
-The security-sensitive parts of the LLVM Project currently are the following.
-Note that this list can change over time.
-
-* None are currently defined. Please don't let this stop you from reporting
-  issues to the LLVM Security Response Group that you believe are security-sensitive.
-
-The parts of the LLVM Project which are currently treated as non-security
-sensitive are the following. Note that this list can change over time.
-
-* Language front-ends, such as clang, for which a malicious input file can cause
-  undesirable behavior. For example, a maliciously crafted C or Rust source file
-  can cause arbitrary code to execute in LLVM. These parts of LLVM haven't been
-  hardened, and compiling untrusted code usually also includes running utilities
-  such as `make` which can more readily perform malicious things.
-
+making these parts of the code securable, and maintain these security properties
+over time. In all cases the LLVM Security Response Group
+`should be consulted <security-group-discussion-medium_>`__, since they'll be
+responding to security issues filed against these parts of the codebase.
+
+The security-sensitive parts of the LLVM Project currently are the following:
+
+* Code generation: most miscompilations are not security sensitive. However, a
+  miscompilation where there are clear indications that it can result in the
+  produced binary becoming significantly easier to exploit could be considered
+  security sensitive, and should be reported to the security response group.
+* Run-time libraries: only parts of the run-time libraries are considered
+  security-sensitive. The parts that are not considered security-sensitive are
+  documented below.
+
+The following parts of the LLVM Project are currently treated as non-security
+sensitive:
+
+* LLVM's language frontends, analyzers, optimizers, and code generators for
+  which a malicious input can cause undesirable behavior. For example, a
+  maliciously crafted C, Rust or bitcode input file can cause arbitrary code to
+  execute in LLVM. These parts of LLVM haven't been hardened, and handling
+  untrusted code usually also includes running utilities such as make which can
+  more readily perform malicious things. For example, vulnerabilities in clang,
+  clangd, or the LLVM optimizer in a JIT caused by untrusted inputs are not
+  security-sensitive.
+* The following parts of the run-time libraries are explicitly not considered
+  security-sensitive:
+
+  * parts of the run-time libraries that are not meant to be included in
+    production binaries. For example, most sanitizers are not considered
+    security-sensitive as they are meant to be used during development only, not
+    in production.
+  * for libc and libc++: if a user calls library functionality in an undefined
+    or otherwise incorrect way, this will most likely not be considered a
+    security issue, unless the libc/libc++ documentation explicitly promises to
+    harden or catch that specific undefined behaviour or incorrect usage.
+  * unwinding and exception handling: the implementations are not hardened
+    against malformed or malicious unwind or exception handling data. This is
+    not considered security sensitive.
+
+Note that both the explicit security-sensitive and explicit non-security
+sensitive lists can change over time. If you're not sure whether an issue is
+in-scope for this security process or not, err towards assuming that it is. The
+Security Response Group might agree or disagree and will explain its rationale
+in the report, as well as update this document through the above process.
 
 .. _CVE process: https://cve.mitre.org
 .. _report a vulnerability: https://github.com/llvm/llvm-security-repo/security/advisories/new
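The LangRef hunk in this diff replaces `icmp gte`, which is not a valid `icmp` condition code, with `sge` (signed greater-or-equal). The surrounding LangRef example relies on the rule that each use of `undef` may observe a different value. The Python sketch below is a minimal illustration of that reasoning, not LLVM code: it assumes `%D` is an `i8` (an arbitrary choice to keep enumeration small) and models each use of `undef` as independently choosing any value. For a single concrete `%D`, exactly one of `slt 4` and `sge 4` holds. When the two uses may differ, every combination of results, including both true, is reachable.

```python
"""Why `icmp slt %D, 4` and `icmp sge %D, 4` can both be true for undef %D.

Sketch assumption: %D is modeled as an i8 (values -128..127), and each use
of undef is modeled as independently picking any value in that range.
"""
from itertools import product

I8_VALUES = range(-128, 128)


def icmp_slt(a, b):
    """Signed less-than, as in `icmp slt`."""
    return a < b


def icmp_sge(a, b):
    """Signed greater-or-equal, as in `icmp sge`."""
    return a >= b


def outcomes_for_concrete_value():
    """(%E, %F) pairs reachable when both uses see the same concrete %D."""
    return {(icmp_slt(d, 4), icmp_sge(d, 4)) for d in I8_VALUES}


def outcomes_for_undef():
    """(%E, %F) pairs reachable when each use of undef picks its own value."""
    return {
        (icmp_slt(d1, 4), icmp_sge(d2, 4))
        for d1, d2 in product(I8_VALUES, repeat=2)
    }


if __name__ == "__main__":
    # Prints [(False, True), (True, False)]
    print(sorted(outcomes_for_concrete_value()))
    # Prints [(False, False), (False, True), (True, False), (True, True)]
    print(sorted(outcomes_for_undef()))
```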