Diffstat (limited to 'llvm/docs')
-rw-r--r-- llvm/docs/CommandGuide/index.rst       |   1
-rw-r--r-- llvm/docs/CommandGuide/llvm-ir2vec.rst | 170
-rw-r--r-- llvm/docs/GettingInvolved.rst          |   5
-rw-r--r-- llvm/docs/HowToUpdateDebugInfo.rst     |   2
-rw-r--r-- llvm/docs/LangRef.rst                  |   2
-rw-r--r-- llvm/docs/MLGO.rst                     |  12
-rw-r--r-- llvm/docs/ReleaseNotes.md              |   2
-rw-r--r-- llvm/docs/Security.rst                 |  76
8 files changed, 237 insertions, 33 deletions
diff --git a/llvm/docs/CommandGuide/index.rst b/llvm/docs/CommandGuide/index.rst
index 88fc1fd..f85f32a 100644
--- a/llvm/docs/CommandGuide/index.rst
+++ b/llvm/docs/CommandGuide/index.rst
@@ -27,6 +27,7 @@ Basic Commands
llvm-dis
llvm-dwarfdump
llvm-dwarfutil
+ llvm-ir2vec
llvm-lib
llvm-libtool-darwin
llvm-link
diff --git a/llvm/docs/CommandGuide/llvm-ir2vec.rst b/llvm/docs/CommandGuide/llvm-ir2vec.rst
new file mode 100644
index 0000000..13fe4996
--- /dev/null
+++ b/llvm/docs/CommandGuide/llvm-ir2vec.rst
@@ -0,0 +1,170 @@
+llvm-ir2vec - IR2Vec Embedding Generation Tool
+==============================================
+
+.. program:: llvm-ir2vec
+
+SYNOPSIS
+--------
+
+:program:`llvm-ir2vec` [*options*] *input-file*
+
+DESCRIPTION
+-----------
+
+:program:`llvm-ir2vec` is a standalone command-line tool for IR2Vec. It
+generates IR2Vec embeddings for LLVM IR and supports triplet generation
+for vocabulary training. It provides two main operation modes:
+
+1. **Triplet Mode**: Generates triplets (opcode, type, operands) for vocabulary
+ training from LLVM IR.
+
+2. **Embedding Mode**: Generates IR2Vec embeddings using a trained vocabulary
+ at different granularity levels (instruction, basic block, or function).
+
+The tool is designed to facilitate machine learning applications that work with
+LLVM IR by converting the IR into numerical representations that can be used by
+ML models.
+
+.. note::
+
+ For information about using IR2Vec programmatically within LLVM passes and
+ the C++ API, see the `IR2Vec Embeddings <https://llvm.org/docs/MLGO.html#ir2vec-embeddings>`_
+ section in the MLGO documentation.
+
+OPERATION MODES
+---------------
+
+Triplet Generation Mode
+~~~~~~~~~~~~~~~~~~~~~~~
+
+In triplet mode, :program:`llvm-ir2vec` analyzes LLVM IR and extracts triplets
+consisting of opcodes, types, and operands. These triplets can be used to train
+vocabularies for embedding generation.
+
+Usage:
+
+.. code-block:: bash
+
+ llvm-ir2vec --mode=triplets input.bc -o triplets.txt
+
+Embedding Generation Mode
+~~~~~~~~~~~~~~~~~~~~~~~~~~
+
+In embedding mode, :program:`llvm-ir2vec` uses a pre-trained vocabulary to
+generate numerical embeddings for LLVM IR at different levels of granularity.
+
+Example Usage:
+
+.. code-block:: bash
+
+ llvm-ir2vec --mode=embeddings --ir2vec-vocab-path=vocab.json --level=func input.bc -o embeddings.txt
+
+OPTIONS
+-------
+
+.. option:: --mode=<mode>
+
+ Specify the operation mode. Valid values are:
+
+ * ``triplets`` - Generate triplets for vocabulary training
+ * ``embeddings`` - Generate embeddings using trained vocabulary (default)
+
+.. option:: --level=<level>
+
+ Specify the embedding generation level. Valid values are:
+
+ * ``inst`` - Generate instruction-level embeddings
+ * ``bb`` - Generate basic block-level embeddings
+ * ``func`` - Generate function-level embeddings (default)
+
+.. option:: --function=<name>
+
+ Process only the specified function instead of all functions in the module.
+
+.. option:: --ir2vec-vocab-path=<path>
+
+ Specify the path to the vocabulary file (required for embedding mode).
+ The vocabulary file should be in JSON format and contain the trained
+ vocabulary for embedding generation. See `llvm/lib/Analysis/models`
+ for pre-trained vocabulary files.
+
+.. option:: --ir2vec-opc-weight=<weight>
+
+ Specify the weight for opcode embeddings (default: 1.0). This controls
+ the relative importance of instruction opcodes in the final embedding.
+
+.. option:: --ir2vec-type-weight=<weight>
+
+ Specify the weight for type embeddings (default: 0.5). This controls
+ the relative importance of type information in the final embedding.
+
+.. option:: --ir2vec-arg-weight=<weight>
+
+ Specify the weight for argument embeddings (default: 0.2). This controls
+ the relative importance of operand information in the final embedding.
+
+.. option:: -o <filename>
+
+ Specify the output filename. Use ``-`` to write to standard output (default).
+
+.. option:: --help
+
+ Print a summary of command line options.
+
+.. note::
+
+ ``--level``, ``--function``, ``--ir2vec-vocab-path``, ``--ir2vec-opc-weight``,
+ ``--ir2vec-type-weight``, and ``--ir2vec-arg-weight`` are only used in embedding
+ mode. These options are ignored in triplet mode.
+
+INPUT FILE FORMAT
+-----------------
+
+:program:`llvm-ir2vec` accepts LLVM bitcode files (``.bc``) and LLVM IR files
+(``.ll``) as input. The input file should contain valid LLVM IR.
+
+OUTPUT FORMAT
+-------------
+
+Triplet Mode Output
+~~~~~~~~~~~~~~~~~~~
+
+In triplet mode, the output consists of lines containing space-separated triplets:
+
+.. code-block:: text
+
+ <opcode> <type> <operand1> <operand2> ...
+
+Each line describes one instruction: its opcode, followed by its type and then
+its operands.
+
+Embedding Mode Output
+~~~~~~~~~~~~~~~~~~~~~
+
+In embedding mode, the output format depends on the specified level:
+
+* **Function Level**: One embedding vector per function
+* **Basic Block Level**: One embedding vector per basic block, grouped by function
+* **Instruction Level**: One embedding vector per instruction, grouped by basic block and function
+
+Each embedding is represented as a floating-point vector.
+
+EXIT STATUS
+-----------
+
+:program:`llvm-ir2vec` returns 0 on success, and a non-zero value on failure.
+
+Common failure cases include:
+
+* Invalid or missing input file
+* Missing or invalid vocabulary file (in embedding mode)
+* Specified function not found in the module
+* Invalid command line options
+
+SEE ALSO
+--------
+
+:doc:`../MLGO`
+
+For more information about the IR2Vec algorithm and approach, see:
+`IR2Vec: LLVM IR Based Scalable Program Embeddings <https://doi.org/10.1145/3418463>`_.
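The options and the triplet output format documented in the new ``llvm-ir2vec.rst`` above can be sketched from a driver script's point of view. The following is a minimal Python sketch, not part of the patch: the helper names ``build_ir2vec_command`` and ``parse_triplet_line`` are hypothetical, only flags named in the documentation are used, and the tokens in the sample triplet line are made-up placeholders, since the documentation does not show real output.

```python
from typing import List, Optional, Tuple


def build_ir2vec_command(
    input_file: str,
    mode: str = "embeddings",
    output: str = "-",
    vocab_path: Optional[str] = None,
    level: str = "func",
    function: Optional[str] = None,
) -> List[str]:
    """Assemble an llvm-ir2vec argument list from the documented options.

    Hypothetical helper: it only builds the argv list and does not run the tool.
    """
    if mode not in ("triplets", "embeddings"):
        raise ValueError(f"unknown mode: {mode}")
    cmd = ["llvm-ir2vec", f"--mode={mode}"]
    if mode == "embeddings":
        # The docs state that these options are only used in embedding mode.
        if level not in ("inst", "bb", "func"):
            raise ValueError(f"unknown level: {level}")
        if vocab_path is None:
            raise ValueError("embedding mode requires a vocabulary path")
        cmd += [f"--ir2vec-vocab-path={vocab_path}", f"--level={level}"]
        if function is not None:
            cmd.append(f"--function={function}")
    cmd += [input_file, "-o", output]
    return cmd


def parse_triplet_line(line: str) -> Tuple[str, str, List[str]]:
    """Split one '<opcode> <type> <operand1> ...' line into its parts."""
    fields = line.split()
    if len(fields) < 2:
        raise ValueError(f"expected at least an opcode and a type: {line!r}")
    return fields[0], fields[1], fields[2:]


# Mirrors the two usage examples from the documentation.
triplet_cmd = build_ir2vec_command("input.bc", mode="triplets", output="triplets.txt")
embed_cmd = build_ir2vec_command(
    "input.bc", vocab_path="vocab.json", output="embeddings.txt"
)

# Placeholder tokens; the real token spelling is not specified in the docs.
opcode, type_name, operands = parse_triplet_line("opc ty arg1 arg2")
```

The argument list could then be passed to ``subprocess.run(cmd, check=True)`` on a machine where ``llvm-ir2vec`` is on the ``PATH``; the non-zero exit status described under EXIT STATUS would surface as a ``CalledProcessError``.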
diff --git a/llvm/docs/GettingInvolved.rst b/llvm/docs/GettingInvolved.rst
index dc53072..d87a8bd 100644
--- a/llvm/docs/GettingInvolved.rst
+++ b/llvm/docs/GettingInvolved.rst
@@ -354,11 +354,6 @@ The :doc:`CodeOfConduct` applies to all office hours.
- Every first Friday of the month, 14:00 UK time, for 60 minutes.
- `Google meet <https://meet.google.com/jps-twgq-ivz>`__
- English, Portuguese
- * - Rotating hosts
- - Getting Started, beginner questions, new contributors.
- - Every Tuesday at 2 PM ET (11 AM PT), for 30 minutes.
- - `Google meet <https://meet.google.com/nga-uhpf-bbb>`__
- - English
For event owners, our Discord bot also supports sending automated announcements
of upcoming office hours. Please see the :ref:`discord-bot-event-pings` section
diff --git a/llvm/docs/HowToUpdateDebugInfo.rst b/llvm/docs/HowToUpdateDebugInfo.rst
index abe21c6..915e289 100644
--- a/llvm/docs/HowToUpdateDebugInfo.rst
+++ b/llvm/docs/HowToUpdateDebugInfo.rst
@@ -504,7 +504,7 @@ as follows:
.. code-block:: bash
- $ llvm-original-di-preservation.py sample.json sample.html
+ $ llvm-original-di-preservation.py sample.json --report-file sample.html
Testing of original debug info preservation can be invoked from front-end level
as follows:
diff --git a/llvm/docs/LangRef.rst b/llvm/docs/LangRef.rst
index 2759e18..371f356 100644
--- a/llvm/docs/LangRef.rst
+++ b/llvm/docs/LangRef.rst
@@ -4867,7 +4867,7 @@ to be eliminated. This is because '``poison``' is stronger than '``undef``'.
%D = undef
%E = icmp slt %D, 4
- %F = icmp gte %D, 4
+ %F = icmp sge %D, 4
Safe:
%A = undef
diff --git a/llvm/docs/MLGO.rst b/llvm/docs/MLGO.rst
index ed0769b..965a21b 100644
--- a/llvm/docs/MLGO.rst
+++ b/llvm/docs/MLGO.rst
@@ -468,6 +468,13 @@ The core components are:
Using IR2Vec
------------
+.. note::
+
+ This section describes how to use IR2Vec within LLVM passes. A standalone
+ tool :doc:`CommandGuide/llvm-ir2vec` is available for generating the
+ embeddings and triplets from LLVM IR files, which can be useful for
+ training vocabularies and generating embeddings outside of compiler passes.
+
For generating embeddings, first the vocabulary should be obtained. Then, the
embeddings can be computed and accessed via an ``ir2vec::Embedder`` instance.
@@ -524,6 +531,10 @@ Further Details
For more detailed information about the IR2Vec algorithm, its parameters, and
advanced usage, please refer to the original paper:
`IR2Vec: LLVM IR Based Scalable Program Embeddings <https://doi.org/10.1145/3418463>`_.
+
+For information about using the IR2Vec tool to generate embeddings and
+triplets from LLVM IR, see :doc:`CommandGuide/llvm-ir2vec`.
+
The LLVM source code for ``IR2Vec`` can also be explored to understand the
implementation details.
@@ -595,4 +606,3 @@ optimizations that are currently MLGO-enabled, it may be used as follows:
where the ``name`` is a path fragment. We will expect to find 2 files,
``<name>.in`` (readable, data incoming from the managing process) and
``<name>.out`` (writable, the model runner sends data to the managing process)
-
diff --git a/llvm/docs/ReleaseNotes.md b/llvm/docs/ReleaseNotes.md
index 68d653b..5591ac6 100644
--- a/llvm/docs/ReleaseNotes.md
+++ b/llvm/docs/ReleaseNotes.md
@@ -233,6 +233,8 @@ Changes to the X86 Backend
--------------------------
* `fp128` will now use `*f128` libcalls on 32-bit GNU targets as well.
+* On x86-32, `fp128` and `i128` are now passed with the expected 16-byte stack
+ alignment.
Changes to the OCaml bindings
-----------------------------
diff --git a/llvm/docs/Security.rst b/llvm/docs/Security.rst
index 8f04b65..5cb8d04 100644
--- a/llvm/docs/Security.rst
+++ b/llvm/docs/Security.rst
@@ -157,6 +157,7 @@ Members of the LLVM Security Response Group are expected to:
* Help write and review patches to address security issues.
* Participate in the member nomination and removal processes.
+.. _security-group-discussion-medium:
Discussion Medium
=================
@@ -204,6 +205,10 @@ The LLVM Security Policy may be changed by majority vote of the LLVM Security Re
What is considered a security issue?
====================================
+We define "security-sensitive" to mean that a discovered bug or vulnerability
+may require coordinated disclosure, and should therefore be reported to the LLVM
+Security Response Group rather than published in the public bug tracker.
+
The LLVM Project has a significant amount of code, and not all of it is
considered security-sensitive. This is particularly true because LLVM is used in
a wide variety of circumstances: there are different threat models, untrusted
@@ -217,31 +222,52 @@ security-sensitive). This requires a rationale, and buy-in from the LLVM
community as for any RFC. In some cases, parts of the codebase could be handled
as security-sensitive but need significant work to get to the stage where that's
manageable. The LLVM community will need to decide whether it wants to invest in
-making these parts of the code securable, and maintain these security
-properties over time. In all cases the LLVM Security Response Group should be consulted,
-since they'll be responding to security issues filed against these parts of the
-codebase.
-
-If you're not sure whether an issue is in-scope for this security process or
-not, err towards assuming that it is. The Security Response Group might agree or disagree
-and will explain its rationale in the report, as well as update this document
-through the above process.
-
-The security-sensitive parts of the LLVM Project currently are the following.
-Note that this list can change over time.
-
-* None are currently defined. Please don't let this stop you from reporting
- issues to the LLVM Security Response Group that you believe are security-sensitive.
-
-The parts of the LLVM Project which are currently treated as non-security
-sensitive are the following. Note that this list can change over time.
-
-* Language front-ends, such as clang, for which a malicious input file can cause
- undesirable behavior. For example, a maliciously crafted C or Rust source file
- can cause arbitrary code to execute in LLVM. These parts of LLVM haven't been
- hardened, and compiling untrusted code usually also includes running utilities
- such as `make` which can more readily perform malicious things.
-
+making these parts of the code securable, and maintain these security properties
+over time. In all cases the LLVM Security Response Group
+`should be consulted <security-group-discussion-medium_>`__, since they'll be
+responding to security issues filed against these parts of the codebase.
+
+The security-sensitive parts of the LLVM Project currently are the following:
+
+* Code generation: most miscompilations are not security-sensitive. However, a
+ miscompilation where there are clear indications that it can result in the
+ produced binary becoming significantly easier to exploit could be considered
+ security-sensitive, and should be reported to the Security Response Group.
+* Run-time libraries: only parts of the run-time libraries are considered
+ security-sensitive. The parts that are not considered security-sensitive are
+ documented below.
+
+The following parts of the LLVM Project are currently treated as
+non-security-sensitive:
+
+* LLVM's language frontends, analyzers, optimizers, and code generators for
+ which a malicious input can cause undesirable behavior. For example, a
+ maliciously crafted C, Rust or bitcode input file can cause arbitrary code to
+ execute in LLVM. These parts of LLVM haven't been hardened, and handling
+ untrusted code usually also includes running utilities such as make which can
+ more readily perform malicious things. For example, vulnerabilities in clang,
+ clangd, or the LLVM optimizer in a JIT caused by untrusted inputs are not
+ security-sensitive.
+* The following parts of the run-time libraries are explicitly not considered
+ security-sensitive:
+
+ * parts of the run-time libraries that are not meant to be included in
+ production binaries. For example, most sanitizers are not considered
+ security-sensitive as they are meant to be used during development only, not
+ in production.
+ * for libc and libc++: if a user calls library functionality in an undefined
+ or otherwise incorrect way, this will most likely not be considered a
+ security issue, unless the libc/libc++ documentation explicitly promises to
+ harden or catch that specific undefined behaviour or incorrect usage.
+ * unwinding and exception handling: the implementations are not hardened
+ against malformed or malicious unwind or exception handling data. This is
+ not considered security-sensitive.
+
+Note that both the explicit security-sensitive and explicit
+non-security-sensitive lists can change over time. If you're not sure whether
+an issue is in scope for this security process, err towards assuming that it
+is. The Security Response Group might agree or disagree and will explain its
+rationale in the report, as well as update this document through the above process.
.. _CVE process: https://cve.mitre.org
.. _report a vulnerability: https://github.com/llvm/llvm-security-repo/security/advisories/new