aboutsummaryrefslogtreecommitdiff
path: root/llvm/docs
diff options
context:
space:
mode:
Diffstat (limited to 'llvm/docs')
-rw-r--r--llvm/docs/AMDGPUUsage.rst59
-rw-r--r--llvm/docs/CIBestPractices.rst28
-rw-r--r--llvm/docs/CommandGuide/llvm-objdump.rst26
-rw-r--r--llvm/docs/DirectX/RootSignatures.rst245
-rw-r--r--llvm/docs/DirectXUsage.rst1
-rw-r--r--llvm/docs/GettingStarted.rst42
-rw-r--r--llvm/docs/LangRef.rst106
-rw-r--r--llvm/docs/NVPTXUsage.rst161
-rw-r--r--llvm/docs/ProgrammersManual.rst38
-rw-r--r--llvm/docs/ReleaseNotes.md198
-rw-r--r--llvm/docs/TestingGuide.rst4
-rw-r--r--llvm/docs/tutorial/MyFirstLanguageFrontend/LangImpl04.rst2
12 files changed, 575 insertions, 335 deletions
diff --git a/llvm/docs/AMDGPUUsage.rst b/llvm/docs/AMDGPUUsage.rst
index c5b9bd9..d13f95b 100644
--- a/llvm/docs/AMDGPUUsage.rst
+++ b/llvm/docs/AMDGPUUsage.rst
@@ -677,7 +677,7 @@ the device used to execute the code match the features enabled when
generating the code. A mismatch of features may result in incorrect
execution, or a reduction in performance.
-The target features supported by each processor is listed in
+The target features supported by each processor are listed in
:ref:`amdgpu-processors`.
Target features are controlled by exactly one of the following Clang
@@ -783,7 +783,7 @@ description. The AMDGPU target specific information is:
Is an AMDGPU processor or alternative processor name specified in
:ref:`amdgpu-processor-table`. The non-canonical form target ID allows both
the primary processor and alternative processor names. The canonical form
- target ID only allow the primary processor name.
+ target ID only allows the primary processor name.
**target-feature**
Is a target feature name specified in :ref:`amdgpu-target-features-table` that
@@ -793,7 +793,7 @@ description. The AMDGPU target specific information is:
``--offload-arch``. Each target feature must appear at most once in a target
ID. The non-canonical form target ID allows the target features to be
specified in any order. The canonical form target ID requires the target
- features to be specified in alphabetic order.
+ features to be specified in alphabetical order.
.. _amdgpu-target-id-v2-v3:
@@ -886,7 +886,7 @@ supported for the ``amdgcn`` target.
setup (see :ref:`amdgpu-amdhsa-kernel-prolog-m0`).
To convert between a private or group address space address (termed a segment
- address) and a flat address the base address of the corresponding aperture
+ address) and a flat address, the base address of the corresponding aperture
can be used. For GFX7-GFX8 these are available in the
:ref:`amdgpu-amdhsa-hsa-aql-queue` the address of which can be obtained with
Queue Ptr SGPR (see :ref:`amdgpu-amdhsa-initial-kernel-execution-state`). For
@@ -1186,7 +1186,7 @@ The AMDGPU backend implements the following LLVM IR intrinsics.
:ref:`llvm.stackrestore.p5 <int_stackrestore>` Implemented, must use the alloca address space.
:ref:`llvm.get.fpmode.i32 <int_get_fpmode>` The natural floating-point mode type is i32. This
- implemented by extracting relevant bits out of the MODE
+ is implemented by extracting relevant bits out of the MODE
register with s_getreg_b32. The first 10 bits are the
core floating-point mode. Bits 12:18 are the exception
mask. On gfx9+, bit 23 is FP16_OVFL. Bitfields not
@@ -1266,14 +1266,14 @@ The AMDGPU backend implements the following LLVM IR intrinsics.
llvm.amdgcn.permlane16 Provides direct access to v_permlane16_b32. Performs arbitrary gather-style
operation within a row (16 contiguous lanes) of the second input operand.
- The third and fourth inputs must be scalar values. these are combined into
+ The third and fourth inputs must be scalar values. These are combined into
a single 64-bit value representing lane selects used to swizzle within each
row. Currently implemented for i16, i32, float, half, bfloat, <2 x i16>,
<2 x half>, <2 x bfloat>, i64, double, pointers, multiples of the 32-bit vectors.
llvm.amdgcn.permlanex16 Provides direct access to v_permlanex16_b32. Performs arbitrary gather-style
operation across two rows of the second input operand (each row is 16 contiguous
- lanes). The third and fourth inputs must be scalar values. these are combined
+ lanes). The third and fourth inputs must be scalar values. These are combined
into a single 64-bit value representing lane selects used to swizzle within each
row. Currently implemented for i16, i32, float, half, bfloat, <2 x i16>, <2 x half>,
<2 x bfloat>, i64, double, pointers, multiples of the 32-bit vectors.
@@ -1285,31 +1285,31 @@ The AMDGPU backend implements the following LLVM IR intrinsics.
32-bit vectors.
llvm.amdgcn.udot2 Provides direct access to v_dot2_u32_u16 across targets which
- support such instructions. This performs unsigned dot product
+ support such instructions. This performs an unsigned dot product
with two v2i16 operands, summed with the third i32 operand. The
i1 fourth operand is used to clamp the output.
llvm.amdgcn.udot4 Provides direct access to v_dot4_u32_u8 across targets which
- support such instructions. This performs unsigned dot product
+ support such instructions. This performs an unsigned dot product
with two i32 operands (holding a vector of 4 8bit values), summed
with the third i32 operand. The i1 fourth operand is used to clamp
the output.
llvm.amdgcn.udot8 Provides direct access to v_dot8_u32_u4 across targets which
- support such instructions. This performs unsigned dot product
+ support such instructions. This performs an unsigned dot product
with two i32 operands (holding a vector of 8 4bit values), summed
with the third i32 operand. The i1 fourth operand is used to clamp
the output.
llvm.amdgcn.sdot2 Provides direct access to v_dot2_i32_i16 across targets which
- support such instructions. This performs signed dot product
+ support such instructions. This performs a signed dot product
with two v2i16 operands, summed with the third i32 operand. The
i1 fourth operand is used to clamp the output.
When applicable (e.g. no clamping), this is lowered into
v_dot2c_i32_i16 for targets which support it.
llvm.amdgcn.sdot4 Provides direct access to v_dot4_i32_i8 across targets which
- support such instructions. This performs signed dot product
+ support such instructions. This performs a signed dot product
with two i32 operands (holding a vector of 4 8bit values), summed
with the third i32 operand. The i1 fourth operand is used to clamp
the output.
@@ -1321,7 +1321,7 @@ The AMDGPU backend implements the following LLVM IR intrinsics.
of this instruction for gfx11 targets.
llvm.amdgcn.sdot8 Provides direct access to v_dot8_u32_u4 across targets which
- support such instructions. This performs signed dot product
+ support such instructions. This performs a signed dot product
with two i32 operands (holding a vector of 8 4bit values), summed
with the third i32 operand. The i1 fourth operand is used to clamp
the output.
@@ -1401,7 +1401,7 @@ The AMDGPU backend implements the following LLVM IR intrinsics.
llvm.amdgcn.atomic.cond.sub.u32 Provides direct access to flat_atomic_cond_sub_u32, global_atomic_cond_sub_u32
and ds_cond_sub_u32 based on address space on gfx12 targets. This
- performs subtraction only if the memory value is greater than or
+ performs a subtraction only if the memory value is greater than or
equal to the data value.
llvm.amdgcn.s.barrier.signal.isfirst Provides access to the s_barrier_signal_first instruction;
@@ -1646,7 +1646,7 @@ The AMDGPU backend supports the following LLVM IR attributes.
llvm.amdgcn.queue.ptr intrinsic. Note that unlike the other ABI hint
attributes, the queue pointer may be required in situations where the
intrinsic call does not directly appear in the program. Some subtargets
- require the queue pointer for to handle some addrspacecasts, as well
+ require the queue pointer to handle some addrspacecasts, as well
as the llvm.amdgcn.is.shared, llvm.amdgcn.is.private, llvm.trap, and
llvm.debug intrinsics.
@@ -1844,6 +1844,20 @@ The AMDGPU backend supports the following calling conventions:
..TODO::
Describe.
+ ``amdgpu_gfx_whole_wave`` Used for AMD graphics targets. Functions with this calling convention
+ cannot be used as entry points. They must have an i1 as the first argument,
+ which will be mapped to the value of EXEC on entry into the function. Other
+ arguments will contain poison in their inactive lanes. Similarly, the return
+ value for the inactive lanes is poison.
+
+ The function will run with all lanes enabled, i.e. EXEC will be set to -1 in the
+ prologue and restored to its original value in the epilogue. The inactive lanes
+ will be preserved for all the registers used by the function. Active lanes only
+ will only be preserved for the callee saved registers.
+
+ In all other respects, functions with this calling convention behave like
+ ``amdgpu_gfx`` functions.
+
``amdgpu_gs`` Used for Mesa/AMDPAL geometry shaders.
..TODO::
Describe.
@@ -1933,7 +1947,7 @@ The following describes all emitted function resource usage symbols:
callees, contains an indirect call
===================================== ========= ========================================= ===============================================================================
-Futhermore, three symbols are additionally emitted describing the compilation
+Furthermore, three symbols are additionally emitted describing the compilation
unit's worst case (i.e, maxima) ``num_vgpr``, ``num_agpr``, and
``numbered_sgpr`` which may be referenced and used by the aforementioned
symbolic expressions. These three symbols are ``amdgcn.max_num_vgpr``,
@@ -6344,10 +6358,13 @@ also have to wait on all global memory operations, which is unnecessary.
:doc:`Memory Model Relaxation Annotations <MemoryModelRelaxationAnnotations>` can
be used as an optimization hint for fences to solve this problem.
-The AMDGPU backend recognizes the following tags on fences:
+The AMDGPU backend recognizes the following tags on fences to control which address
+space a fence can synchronize:
+
+- ``amdgpu-synchronize-as:local`` - for the local address space
+- ``amdgpu-synchronize-as:global``- for the global address space
-- ``amdgpu-as:local`` - fence only the local address space
-- ``amdgpu-as:global``- fence only the global address space
+Multiple tags can be used at the same time to synchronize with more than one address space.
.. note::
@@ -17934,7 +17951,7 @@ set architecture (ISA) version of the assembly program.
"AMD" and *arch* should always be equal to "AMDGPU".
By default, the assembler will derive the ISA version, *vendor*, and *arch*
-from the value of the -mcpu option that is passed to the assembler.
+from the value of the ``-mcpu`` option that is passed to the assembler.
.. _amdgpu-amdhsa-assembler-directive-amdgpu_hsa_kernel:
@@ -17958,7 +17975,7 @@ default value for all keys is 0, with the following exceptions:
- *amd_kernel_code_version_minor* defaults to 2.
- *amd_machine_kind* defaults to 1.
- *amd_machine_version_major*, *machine_version_minor*, and
- *amd_machine_version_stepping* are derived from the value of the -mcpu option
+ *amd_machine_version_stepping* are derived from the value of the ``-mcpu`` option
that is passed to the assembler.
- *kernel_code_entry_byte_offset* defaults to 256.
- *wavefront_size* defaults 6 for all targets before GFX10. For GFX10 onwards
diff --git a/llvm/docs/CIBestPractices.rst b/llvm/docs/CIBestPractices.rst
index 71fdd12..8301b95 100644
--- a/llvm/docs/CIBestPractices.rst
+++ b/llvm/docs/CIBestPractices.rst
@@ -108,3 +108,31 @@ If specific jobs within the workflow need additional permissions, those
permissions should be added within the specific job. This practice locks down
all permissions by default and only enables them when needed, better enforcing
the principle of least privilege.
+
+Ensuring Workflows Run on the Correct Events
+--------------------------------------------
+
+Github allows workflows to run on a multitude of events and it is important to
+configure a workflow such that it triggers on the correct events. There are
+two main best practices around events that trigger workflows:
+
+1. Workflows that are designed to run on pull requests should not be
+restricted by target branch. Restricting the target branch unnecessarily
+will prevent any stacked PRs from being tested. ``pull_request`` events should
+not contain a branches key.
+
+2. Workflows that are designed to also trigger on push events (e.g., for
+testing on ``main`` or one of the release branches) need to be restricted by
+branch. While pushes to a fork will not trigger a workflow run due to the
+``push`` event if the workflow already has its jobs disabled in forks
+(described above), stacked PRs will end up running jobs twice if the ``push``
+event does not have any branch restrictions. ``push`` events should have
+their branches restricted at the very least to ``main`` and the release
+branches as follows:
+
+.. code-block:: yaml
+
+ push:
+ branches:
+ - main
+ - releases/*
diff --git a/llvm/docs/CommandGuide/llvm-objdump.rst b/llvm/docs/CommandGuide/llvm-objdump.rst
index c9f0379..aaf38f8 100644
--- a/llvm/docs/CommandGuide/llvm-objdump.rst
+++ b/llvm/docs/CommandGuide/llvm-objdump.rst
@@ -140,23 +140,29 @@ OPTIONS
debug information for stripped binaries. Multiple instances of this argument
are searched in the order given.
-.. option:: --debuginfod, --no-debuginfod
+.. option:: --debug-indent=<width>
- Whether or not to try debuginfod lookups for debug binaries. Unless specified,
- debuginfod is only enabled if libcurl was compiled in (``LLVM_ENABLE_CURL``)
- and at least one server URL was provided by the environment variable
- ``DEBUGINFOD_URLS``.
+ Distance to indent the source-level variable or inlined function display,
+ relative to the start of the disassembly. Defaults to 52 characters.
+
+.. option:: --debug-inlined-funcs[=<format>]
-.. option:: --debug-vars=<format>
+ Print the locations of inlined functions alongside disassembly.
+ ``format`` may be ``ascii``, ``limits-only``, or ``unicode``, defaulting to
+ ``unicode`` if omitted.
+
+.. option:: --debug-vars[=<format>]
Print the locations (in registers or memory) of source-level variables
- alongside disassembly. ``format`` may be ``unicode`` or ``ascii``, defaulting
+ alongside disassembly. ``format`` may be ``ascii`` or ``unicode``, defaulting
to ``unicode`` if omitted.
-.. option:: --debug-vars-indent=<width>
+.. option:: --debuginfod, --no-debuginfod
- Distance to indent the source-level variable display, relative to the start
- of the disassembly. Defaults to 52 characters.
+ Whether or not to try debuginfod lookups for debug binaries. Unless specified,
+ debuginfod is only enabled if libcurl was compiled in (``LLVM_ENABLE_CURL``)
+ and at least one server URL was provided by the environment variable
+ ``DEBUGINFOD_URLS``.
.. option:: -j, --section=<section1[,section2,...]>
diff --git a/llvm/docs/DirectX/RootSignatures.rst b/llvm/docs/DirectX/RootSignatures.rst
new file mode 100644
index 0000000..e328b4a
--- /dev/null
+++ b/llvm/docs/DirectX/RootSignatures.rst
@@ -0,0 +1,245 @@
+===============
+Root Signatures
+===============
+
+.. contents::
+ :local:
+
+.. toctree::
+ :hidden:
+
+Overview
+========
+
+A root signature is used to describe what resources a shader needs access to
+and how they're organized and bound in the pipeline. The DirectX Container
+(DXContainer) contains a root signature part (RTS0), which stores this
+information in a binary format. To assist with the construction of, and
+interaction with, a root signature is represented as metadata
+(``dx.rootsignatures`` ) in the LLVM IR. The metadata can then be converted to
+its binary form, as defined in
+`llvm/include/llvm/llvm/Frontend/HLSL/RootSignatureMetadata.h
+<https://github.com/llvm/llvm-project/blob/main/llvm/include/llvm/Frontend/HLSL/RootSignatureMetadata.h>`_.
+This document serves as a reference for the metadata representation of a root
+signature for users to interface with.
+
+Metadata Representation
+=======================
+
+Consider the reference root signature, then the following sections describe the
+metadata representation of this root signature and the corresponding operands.
+
+.. code-block:: HLSL
+
+ RootFlags(ALLOW_INPUT_ASSEMBLER_INPUT_LAYOUT),
+ RootConstants(b0, space = 1, num32Constants = 3),
+ CBV(b1, flags = 0),
+ StaticSampler(
+ filter = FILTER_MIN_MAG_POINT_MIP_LINEAR,
+ addressU = TEXTURE_ADDRESS_BORDER,
+ ),
+ DescriptorTable(
+ visibility = VISIBILITY_ALL,
+ SRV(t0, flags = DATA_STATIC_WHILE_SET_AT_EXECUTE),
+ UAV(
+ numDescriptors = 5, u1, space = 10, offset = 5,
+ flags = DATA_VOLATILE
+ )
+ )
+
+.. note::
+
+ A root signature does not necessarily have a unique metadata representation.
+ Futher, a malformed root signature can be represented in the metadata format,
+ (eg. mixing Sampler and non-Sampler descriptor ranges), and so it is the
+ user's responsibility to verify that it is a well-formed root signature.
+
+Named Root Signature Table
+==========================
+
+.. code-block:: LLVM
+
+ !dx.rootsignatures = !{!0}
+
+A named metadata node, ``dx.rootsignatures``` is used to identify the root
+signature table. The table itself is a list of references to function/root
+signature pairs.
+
+Function/Root Signature Pair
+============================
+
+.. code-block:: LLVM
+
+ !1 = !{ptr @main, !2, i32 2 }
+
+The function/root signature associates a function (the first operand) with a
+reference to a root signature (the second operand). The root signature version
+(the third operand) used for validation logic and binary format follows.
+
+Root Signature
+==============
+
+.. code-block:: LLVM
+
+ !2 = !{ !3, !4, !5, !6, !7 }
+
+The root signature itself simply consists of a list of references to its root
+signature elements.
+
+Root Signature Element
+======================
+
+A root signature element is identified by the first operand, which is a string.
+The following root signature elements are defined:
+
+================= ======================
+Identifier String Root Signature Element
+================= ======================
+"RootFlags" Root Flags
+"RootConstants" Root Constants
+"RootCBV" Root Descriptor
+"RootSRV" Root Descriptor
+"RootUAV" Root Descriptor
+"StaticSampler" Static Sampler
+"DescriptorTable" Descriptor Table
+================= ======================
+
+Below is listed the representation for each type of root signature element.
+
+Root Flags
+==========
+
+.. code-block:: LLVM
+
+ !3 = { !"RootFlags", i32 1 }
+
+======================= ====
+Description Type
+======================= ====
+`Root Signature Flags`_ i32
+======================= ====
+
+.. _Root Signature Flags: https://learn.microsoft.com/en-us/windows/win32/api/d3d12/ne-d3d12-d3d12_root_signature_flags
+
+Root Constants
+==============
+
+.. code-block:: LLVM
+
+ !4 = { !"RootConstants", i32 0, i32 1, i32 2, i32 3 }
+
+==================== ====
+Description Type
+==================== ====
+`Shader Visibility`_ i32
+Shader Register i32
+Register Space i32
+Number 32-bit Values i32
+==================== ====
+
+.. _Shader Visibility: https://learn.microsoft.com/en-us/windows/win32/api/d3d12/ne-d3d12-d3d12_shader_visibility
+
+Root Descriptor
+===============
+
+As noted in the table above, the first operand will denote the type of
+root descriptor.
+
+.. code-block:: LLVM
+
+ !5 = { !"RootCBV", i32 0, i32 1, i32 0, i32 0 }
+
+======================== ====
+Description Type
+======================== ====
+`Shader Visibility`_ i32
+Shader Register i32
+Register Space i32
+`Root Descriptor Flags`_ i32
+======================== ====
+
+.. _Root Descriptor Flags: https://learn.microsoft.com/en-us/windows/win32/api/d3d12/ne-d3d12-d3d12_root_descriptor_flags
+
+Static Sampler
+==============
+
+.. code-block:: LLVM
+
+ !6 = !{ !"StaticSampler", i32 1, i32 4, ... }; remaining operands omitted for space
+
+==================== =====
+Description Type
+==================== =====
+`Filter`_ i32
+`AddressU`_ i32
+`AddressV`_ i32
+`AddressW`_ i32
+MipLODBias float
+MaxAnisotropy i32
+`ComparisonFunc`_ i32
+`BorderColor`_ i32
+MinLOD float
+MaxLOD float
+ShaderRegister i32
+RegisterSpace i32
+`Shader Visibility`_ i32
+==================== =====
+
+.. _Filter: https://learn.microsoft.com/en-us/windows/win32/api/d3d12/ne-d3d12-d3d12_filter
+.. _AddressU: https://learn.microsoft.com/en-us/windows/win32/api/d3d12/ne-d3d12-d3d12_texture_address_mode
+.. _AddressV: https://learn.microsoft.com/en-us/windows/win32/api/d3d12/ne-d3d12-d3d12_texture_address_mode
+.. _AddressW: https://learn.microsoft.com/en-us/windows/win32/api/d3d12/ne-d3d12-d3d12_texture_address_mode
+.. _ComparisonFunc: https://learn.microsoft.com/en-us/windows/win32/api/d3d12/ne-d3d12-d3d12_comparison_func>
+.. _BorderColor: https://learn.microsoft.com/en-us/windows/win32/api/d3d12/ne-d3d12-d3d12_static_border_color>
+
+Descriptor Table
+================
+
+A descriptor table consists of a visibility and the remaining operands are a
+list of references to its descriptor ranges.
+
+.. note::
+
+ The term Descriptor Table Clause is synonymous with Descriptor Range when
+ referencing the implementation details.
+
+.. code-block:: LLVM
+
+ !7 = { !"DescriptorTable", i32 0, !8, !9 }
+
+========================= ================
+Description Type
+========================= ================
+`Shader Visibility`_ i32
+Descriptor Range Elements Descriptor Range
+========================= ================
+
+
+Descriptor Range
+================
+
+Similar to a root descriptor, the first operand will denote the type of
+descriptor range. It is one of the following types:
+
+- "CBV"
+- "SRV"
+- "UAV"
+- "Sampler"
+
+.. code-block:: LLVM
+
+ !8 = !{ !"SRV", i32 1, i32 0, i32 0, i32 -1, i32 4 }
+ !9 = !{ !"UAV", i32 5, i32 1, i32 10, i32 5, i32 2 }
+
+============================== ====
+Description Type
+============================== ====
+Number of Descriptors in Range i32
+Shader Register i32
+Register Space i32
+`Offset`_ i32
+`Descriptor Range Flags`_ i32
+============================== ====
+
+.. _Offset: https://learn.microsoft.com/en-us/windows/win32/api/d3d12/ns-d3d12-d3d12_descriptor_range
+.. _Descriptor Range Flags: https://learn.microsoft.com/en-us/windows/win32/api/d3d12/ne-d3d12-d3d12_descriptor_range_flags
diff --git a/llvm/docs/DirectXUsage.rst b/llvm/docs/DirectXUsage.rst
index 4d8f49b..1d964e6 100644
--- a/llvm/docs/DirectXUsage.rst
+++ b/llvm/docs/DirectXUsage.rst
@@ -17,6 +17,7 @@ User Guide for the DirectX Target
DirectX/DXILArchitecture
DirectX/DXILOpTableGenDesign
DirectX/DXILResources
+ DirectX/RootSignatures
Introduction
============
diff --git a/llvm/docs/GettingStarted.rst b/llvm/docs/GettingStarted.rst
index 372fd40..3036dae 100644
--- a/llvm/docs/GettingStarted.rst
+++ b/llvm/docs/GettingStarted.rst
@@ -12,7 +12,7 @@ Welcome to the LLVM project!
The LLVM project has multiple components. The core of the project is
itself called "LLVM". This contains all of the tools, libraries, and header
-files needed to process intermediate representations and converts it into
+files needed to process intermediate representations and convert them into
object files. Tools include an assembler, disassembler, bitcode analyzer, and
bitcode optimizer. It also contains basic regression tests.
@@ -32,11 +32,11 @@ Getting the Source Code and Building LLVM
#. Check out LLVM (including subprojects like Clang):
* ``git clone https://github.com/llvm/llvm-project.git``
- * Or, on windows:
+ * Or, on Windows:
``git clone --config core.autocrlf=false
https://github.com/llvm/llvm-project.git``
- * To save storage and speed-up the checkout time, you may want to do a
+ * To save storage and speed up the checkout time, you may want to do a
`shallow clone <https://git-scm.com/docs/git-clone#Documentation/git-clone.txt---depthltdepthgt>`_.
For example, to get the latest revision of the LLVM project, use
@@ -71,7 +71,7 @@ Getting the Source Code and Building LLVM
Some common options:
- * ``-DLLVM_ENABLE_PROJECTS='...'`` --- semicolon-separated list of the LLVM
+ * ``-DLLVM_ENABLE_PROJECTS='...'`` --- A semicolon-separated list of the LLVM
subprojects you'd like to additionally build. Can include any of: clang,
clang-tools-extra, lldb, lld, polly, or cross-project-tests.
@@ -82,10 +82,10 @@ Getting the Source Code and Building LLVM
pathname of where you want the LLVM tools and libraries to be installed
(default ``/usr/local``).
- * ``-DCMAKE_BUILD_TYPE=type`` --- Controls optimization level and debug
+ * ``-DCMAKE_BUILD_TYPE=type`` --- Controls the optimization level and debug
information of the build. Valid options for *type* are ``Debug``,
``Release``, ``RelWithDebInfo``, and ``MinSizeRel``. For more detailed
- information see :ref:`CMAKE_BUILD_TYPE <cmake_build_type>`.
+ information, see :ref:`CMAKE_BUILD_TYPE <cmake_build_type>`.
* ``-DLLVM_ENABLE_ASSERTIONS=ON`` --- Compile with assertion checks enabled
(default is ON for Debug builds, OFF for all other build types).
@@ -124,7 +124,7 @@ Getting the Source Code and Building LLVM
``ninja -C build check-llvm``
- This will setup an LLVM build with debugging info, then compile LLVM and
+ This will set up an LLVM build with debugging info, then compile LLVM and
run LLVM tests.
* For more detailed information on CMake options, see `CMake <CMake.html>`__
@@ -150,7 +150,7 @@ page.
For stand-alone builds, you must have an llvm install that is configured
properly to be consumable by stand-alone builds of the other projects.
-This could be a distro provided LLVM install, or you can build it yourself,
+This could be a distro-provided LLVM install, or you can build it yourself,
like this:
.. code-block:: console
@@ -195,7 +195,7 @@ clang clang, cmake CLANG_INCLUDE_TESTS=ON (Required for check
lld lld, cmake
============ ======================== ======================
-Example for building stand-alone `clang`:
+Example of building stand-alone `clang`:
.. code-block:: console
@@ -224,7 +224,7 @@ Example for building stand-alone `clang`:
Requirements
============
-Before you begin to use the LLVM system, review the requirements given below.
+Before you begin to use the LLVM system, review the requirements below.
This may save you some trouble by knowing ahead of time what hardware and
software you will need.
@@ -265,7 +265,7 @@ Windows on Arm ARM64 Visual Studio, Clang\ :sup:`4`
#. Code generation supported for Pentium processors and up
#. Code generation supported for 32-bit ABI only
- #. To use LLVM modules on Win32-based system, you may configure LLVM
+ #. To use LLVM modules on a Win32-based system, you may configure LLVM
with ``-DBUILD_SHARED_LIBS=On``.
#. Visual Studio alone can compile LLVM. When using Clang, you
must also have Visual Studio installed.
@@ -309,7 +309,7 @@ Package Version Notes
#. Only needed if you want to run the automated test suite in the
``llvm/test`` directory, or if you plan to utilize any Python libraries,
utilities, or bindings.
- #. Optional, adds compression / uncompression capabilities to selected LLVM
+ #. Optional, adds compression/uncompression capabilities to selected LLVM
tools.
#. Optional, you can use any other build tool supported by CMake.
#. Only needed when building libc with New Headergen. Mainly used by libc.
@@ -401,11 +401,11 @@ Studio 2019 (or later), or a recent version of mingw64. FreeBSD 10.0 and newer
have a modern Clang as the system compiler.
However, some Linux distributions and some other or older BSDs sometimes have
-extremely old versions of GCC. These steps attempt to help you upgrade you
+extremely old versions of GCC. These steps attempt to help you upgrade your
compiler even on such a system. However, if at all possible, we encourage you
to use a recent version of a distribution with a modern system compiler that
meets these requirements. Note that it is tempting to install a prior
-version of Clang and libc++ to be the host compiler, however libc++ was not
+version of Clang and libc++ to be the host compiler; however, libc++ was not
well tested or set up to build on Linux until relatively recently. As
a consequence, this guide suggests just using libstdc++ and a modern GCC as the
initial host in a bootstrap, and then using Clang (and potentially libc++).
@@ -514,11 +514,11 @@ appropriate pathname on your local system. All these paths are absolute:
``SRC_ROOT``
- This is the top level directory of the LLVM source tree.
+ This is the top-level directory of the LLVM source tree.
``OBJ_ROOT``
- This is the top level directory of the LLVM object tree (i.e. the tree where
+ This is the top-level directory of the LLVM object tree (i.e. the tree where
object files and compiled programs will be placed. It can be the same as
SRC_ROOT).
@@ -666,7 +666,7 @@ cross-compiling CMake provides a variable ``CMAKE_TOOLCHAIN_FILE`` which can
define compiler flags and variables used during the CMake test operations.
The result of such a build is executables that are not runnable on the build
-host but can be executed on the target. As an example the following CMake
+host but can be executed on the target. As an example, the following CMake
invocation can generate build files targeting iOS. This will work on macOS
with the latest Xcode:
@@ -770,7 +770,7 @@ Generates system build files.
- Some simple examples showing how to use LLVM as a compiler for a custom
language - including lowering, optimization, and code generation.
-- Kaleidoscope Tutorial: Kaleidoscope language tutorial run through the
+- Kaleidoscope Tutorial: Kaleidoscope language tutorial runs through the
implementation of a nice little compiler for a non-trivial language
including a hand-written lexer, parser, AST, as well as code generation
support using LLVM- both static (ahead of time) and various approaches to
@@ -858,7 +858,7 @@ share code among the `tools`_.
``llvm/lib/Support/``
- Source code that corresponding to the header files in ``llvm/include/ADT/``
+ Source code that corresponds to the header files in ``llvm/include/ADT/``
and ``llvm/include/Support/``.
``llvm/bindings``
@@ -1051,7 +1051,7 @@ Example with clang
% lli hello.bc
- The second examples shows how to invoke the LLVM JIT, :doc:`lli
+ The second example shows how to invoke the LLVM JIT, :doc:`lli
<CommandGuide/lli>`.
#. Use the ``llvm-dis`` utility to take a look at the LLVM assembly code:
@@ -1163,7 +1163,7 @@ following options with cmake:
Consider setting this to ``ON`` if you require a debug build, as this will ease
memory pressure on the linker. This will make linking much faster, as the
- binaries will not contain any of the debug information. Instead the debug
+ binaries will not contain any of the debug information. Instead, the debug
information is in a separate DWARF object file (with the extension ``.dwo``).
This only applies to host platforms using ELF, such as Linux.
diff --git a/llvm/docs/LangRef.rst b/llvm/docs/LangRef.rst
index 9a32f0c..bac13cc 100644
--- a/llvm/docs/LangRef.rst
+++ b/llvm/docs/LangRef.rst
@@ -280,9 +280,9 @@ linkage:
linkage are linked together, the two global arrays are appended
together. This is the LLVM, typesafe, equivalent of having the
system linker append together "sections" with identical names when
- .o files are linked.
+ ``.o`` files are linked.
- Unfortunately this doesn't correspond to any feature in .o files, so it
+ Unfortunately this doesn't correspond to any feature in ``.o`` files, so it
can only be used for variables like ``llvm.global_ctors`` which llvm
interprets specially.
@@ -371,7 +371,7 @@ added in the future:
This calling convention supports `tail call
optimization <CodeGenerator.html#tail-call-optimization>`_ but requires
- both the caller and callee are using it.
+ both the caller and callee to use it.
"``cc 11``" - The HiPE calling convention
This calling convention has been implemented specifically for use by
the `High-Performance Erlang
@@ -447,7 +447,7 @@ added in the future:
R11. R11 can be used as a scratch register. Furthermore it also preserves
all floating-point registers (XMMs/YMMs).
- - On AArch64 the callee preserve all general purpose registers, except
+ - On AArch64 the callee preserves all general purpose registers, except
X0-X8 and X16-X18. Furthermore it also preserves lower 128 bits of V8-V31
SIMD floating point registers. Not allowed with ``nest``.
@@ -890,7 +890,7 @@ Syntax::
[gc] [prefix Constant] [prologue Constant] [personality Constant]
(!name !N)* { ... }
-The argument list is a comma separated sequence of arguments where each
+The argument list is a comma-separated sequence of arguments where each
argument is of the following form:
Syntax::
@@ -1011,7 +1011,7 @@ some can only be checked when producing an object file:
IFuncs
-------
-IFuncs, like as aliases, don't create any new data or func. They are just a new
+IFuncs, like aliases, don't create any new data or func. They are just a new
symbol that is resolved at runtime by calling a resolver function.
On ELF platforms, IFuncs are resolved by the dynamic linker at load time. On
@@ -1211,7 +1211,7 @@ Currently, only the following parameter attributes are defined:
the callee (for a return value).
``noext``
This indicates to the code generator that the parameter or return
- value has the high bits undefined, as for a struct in register, and
+ value has the high bits undefined, as for a struct in a register, and
therefore does not need to be sign or zero extended. This is the same
as default behavior and is only actually used (by some targets) to
validate that one of the attributes is always present.
@@ -1252,7 +1252,7 @@ Currently, only the following parameter attributes are defined:
on the stack. This implies the pointer is dereferenceable up to
the storage size of the type.
- It is not generally permissible to introduce a write to an
+ It is not generally permissible to introduce a write to a
``byref`` pointer. The pointer may have any address space and may
be read only.
@@ -1393,7 +1393,7 @@ Currently, only the following parameter attributes are defined:
storage for any other object accessible to the caller.
``captures(...)``
- This attributes restrict the ways in which the callee may capture the
+ This attribute restricts the ways in which the callee may capture the
pointer. This is not a valid attribute for return values. This attribute
applies only to the particular copy of the pointer passed in this argument.
@@ -1615,7 +1615,7 @@ Currently, only the following parameter attributes are defined:
assigning this parameter or return value to a stack slot during calling
convention lowering. The enforcement of the specified alignment is
target-dependent, as target-specific calling convention rules may override
- this value. This attribute serves the purpose of carrying language specific
+ this value. This attribute serves the purpose of carrying language-specific
alignment information that is not mapped to base types in the backend (for
example, over-alignment specification through language attributes).
@@ -1993,7 +1993,7 @@ For example:
``cold``
This attribute indicates that this function is rarely called. When
computing edge weights, basic blocks post-dominated by a cold
- function call are also considered to be cold; and, thus, given low
+ function call are also considered to be cold and, thus, given a low
weight.
.. _attr_convergent:
@@ -2892,7 +2892,7 @@ site, these bundles may contain any values that are needed by the
generated code. For more details, see :ref:`GC Transitions
<gc_transition_args>`.
-The bundle contain an arbitrary list of Values which need to be passed
+The bundle contains an arbitrary list of Values which need to be passed
to GC transition code. They will be lowered and passed as operands to
the appropriate GC_TRANSITION nodes in the selection DAG. It is assumed
that these arguments must be available before and after (but not
@@ -2903,7 +2903,7 @@ necessarily during) the execution of the callee.
Assume Operand Bundles
^^^^^^^^^^^^^^^^^^^^^^
-Operand bundles on an :ref:`llvm.assume <int_assume>` allows representing
+Operand bundles on an :ref:`llvm.assume <int_assume>` allow representing
assumptions, such as that a :ref:`parameter attribute <paramattrs>` or a
:ref:`function attribute <fnattrs>` holds for a certain value at a certain
location. Operand bundles enable assumptions that are either hard or impossible
@@ -2922,11 +2922,11 @@ restricted form:
"<tag>"([ <holds for value> [, <attribute argument>] ])
-* The tag of the operand bundle is usually the name of attribute that can be
- assumed to hold. It can also be `ignore`, this tag doesn't contain any
+* The tag of the operand bundle is usually the name of the attribute that can be
+ assumed to hold. It can also be `ignore`; this tag doesn't contain any
information and should be ignored.
-* The first argument if present is the value for which the attribute hold.
-* The second argument if present is an argument of the attribute.
+* The first argument, if present, is the value for which the attribute holds.
+* The second argument, if present, is an argument of the attribute.
If there are no arguments the attribute is a property of the call location.
@@ -2968,7 +2968,7 @@ the behavior is undefined, unless one of the following exceptions applies:
dereferenceable at later pointers, e.g. because it could have been freed.
In addition to allowing operand bundles encoding function and parameter
-attributes, an assume operand bundle my also encode a ``separate_storage``
+attributes, an assume operand bundle may also encode a ``separate_storage``
operand bundle. This has the form:
.. code-block:: llvm
@@ -3115,7 +3115,7 @@ Note that the assembly string *must* be parseable by LLVM's integrated assembler
Data Layout
-----------
-A module may specify a target specific data layout string that specifies
+A module may specify a target-specific data layout string that specifies
how data is to be laid out in memory. The syntax for the data layout is
simply:
@@ -3356,6 +3356,19 @@ behavior is undefined:
- the size of all allocated objects must be non-negative and not exceed the
largest signed integer that fits into the index type.
+Allocated objects that are created with operations recognized by LLVM (such as
+:ref:`alloca <i_alloca>`, heap allocation functions marked as such, and global
+variables) may *not* change their size. (``realloc``-style operations do not
+change the size of an existing allocated object; instead, they create a new
+allocated object. Even if the object is at the same location as the old one, old
+pointers cannot be used to access this new object.) However, allocated objects
+can also be created by means not recognized by LLVM, e.g. by directly calling
+``mmap``. Those allocated objects are allowed to grow to the right (i.e.,
+keeping the same base address, but increasing their size) while maintaining the
+validity of existing pointers, as long as they always satisfy the properties
+described above. Currently, allocated objects are not permitted to grow to the
+left or to shrink, nor can they have holes.
+
.. _objectlifetime:
Object Lifetime
@@ -3611,7 +3624,7 @@ operation may modify the memory at that address. A volatile operation
may not modify any other memory accessible by the module being compiled.
A volatile operation may not call any code in the current module.
-In general (without target specific context), the address space of a
+In general (without target-specific context), the address space of a
volatile operation may not be changed. Different address spaces may
have different trapping behavior when dereferencing an invalid
pointer.
@@ -3794,7 +3807,7 @@ If an atomic operation is marked ``syncscope("singlethread")``, it only
other operations running in the same thread (for example, in signal handlers).
If an atomic operation is marked ``syncscope("<target-scope>")``, where
-``<target-scope>`` is a target specific synchronization scope, then it is target
+``<target-scope>`` is a target-specific synchronization scope, then it is target
dependent if it *synchronizes with* and participates in the seq\_cst total
orderings of other operations.
@@ -3896,10 +3909,10 @@ Floating-Point Semantics
------------------------
This section defines the semantics for core floating-point operations on types
-that use a format specified by IEEE-745. These types are: ``half``, ``float``,
+that use a format specified by IEEE-754. These types are: ``half``, ``float``,
``double``, and ``fp128``, which correspond to the binary16, binary32, binary64,
and binary128 formats, respectively. The "core" operations are those defined in
-section 5 of IEEE-745, which all have corresponding LLVM operations.
+section 5 of IEEE-754, which all have corresponding LLVM operations.
The value returned by those operations matches that of the corresponding
IEEE-754 operation executed in the :ref:`default LLVM floating-point environment
@@ -8746,11 +8759,11 @@ framework::
The metadata encoding as lists of lists of options, as opposed to a collapsed
list of options, is chosen so that the IR encoding can use multiple option
strings to specify e.g., a single library, while still having that specifier be
-preserved as an atomic element that can be recognized by a target specific
+preserved as an atomic element that can be recognized by a target-specific
assembly writer or object file emitter.
Each individual option is required to be either a valid option for the target's
-linker, or an option that is reserved by the target specific assembly writer or
+linker, or an option that is reserved by the target-specific assembly writer or
object file emitter. No other aspect of these options is defined by the IR.
Dependent Libs Named Metadata
@@ -11928,6 +11941,9 @@ if the ``getelementptr`` has any non-zero indices, the following rules apply:
:ref:`based <pointeraliasing>` on. This means that it points into that
allocated object, or to its end. Note that the object does not have to be
live anymore; being in-bounds of a deallocated object is sufficient.
+ If the allocated object can grow, then the relevant size for being *in
+ bounds* is the maximal size the object could have while satisfying the
+ allocated object rules, not its current size.
* During the successive addition of offsets to the address, the resulting
pointer must remain *in bounds* of the allocated object at each step.
@@ -19508,7 +19524,7 @@ Semantics:
The '``llvm.set.loop.iterations.*``' intrinsics do not perform any arithmetic
on their operand. It's a hint to the backend that can use this to set up the
-hardware-loop count with a target specific instruction, usually a move of this
+hardware-loop count with a target-specific instruction, usually a move of this
value to a special register or a hardware-loop instruction.
@@ -19547,7 +19563,7 @@ Semantics:
The '``llvm.start.loop.iterations.*``' intrinsics do not perform any arithmetic
on their operand. It's a hint to the backend that can use this to set up the
-hardware-loop count with a target specific instruction, usually a move of this
+hardware-loop count with a target-specific instruction, usually a move of this
value to a special register or a hardware-loop instruction.
'``llvm.test.set.loop.iterations.*``' Intrinsic
@@ -19583,7 +19599,7 @@ Semantics:
The '``llvm.test.set.loop.iterations.*``' intrinsics do not perform any
arithmetic on their operand. It's a hint to the backend that can use this to
-set up the hardware-loop count with a target specific instruction, usually a
+set up the hardware-loop count with a target-specific instruction, usually a
move of this value to a special register or a hardware-loop instruction.
The result is the conditional value of whether the given count is not zero.
@@ -19621,7 +19637,7 @@ Semantics:
The '``llvm.test.start.loop.iterations.*``' intrinsics do not perform any
arithmetic on their operand. It's a hint to the backend that can use this to
-set up the hardware-loop count with a target specific instruction, usually a
+set up the hardware-loop count with a target-specific instruction, usually a
move of this value to a special register or a hardware-loop instruction.
The result is a pair of the input and a conditional value of whether the
given count is not zero.
@@ -26639,19 +26655,14 @@ Arguments:
The first argument is a constant integer representing the size of the
object, or -1 if it is variable sized. The second argument is a pointer
-to the object.
+to an ``alloca`` instruction.
Semantics:
""""""""""
-If ``ptr`` is a stack-allocated object and it points to the first byte of
-the object, the object is initially marked as dead.
-``ptr`` is conservatively considered as a non-stack-allocated object if
-the stack coloring algorithm that is used in the optimization pipeline cannot
-conclude that ``ptr`` is a stack-allocated object.
-
-After '``llvm.lifetime.start``', the stack object that ``ptr`` points is marked
-as alive and has an uninitialized value.
+The stack-allocated object that ``ptr`` points to is initially marked as dead.
+After '``llvm.lifetime.start``', the stack object is marked as alive and has an
+uninitialized value.
The stack object is marked as dead when either
:ref:`llvm.lifetime.end <int_lifeend>` to the alloca is executed or the
function returns.
@@ -26661,11 +26672,6 @@ After :ref:`llvm.lifetime.end <int_lifeend>` is called,
The second '``llvm.lifetime.start``' call marks the object as alive, but it
does not change the address of the object.
-If ``ptr`` is a non-stack-allocated object, it does not point to the first
-byte of the object or it is a stack object that is already alive, it simply
-fills all bytes of the object with ``poison``.
-
-
.. _int_lifeend:
'``llvm.lifetime.end``' Intrinsic
@@ -26689,24 +26695,16 @@ Arguments:
The first argument is a constant integer representing the size of the
object, or -1 if it is variable sized. The second argument is a pointer
-to the object.
+to an ``alloca`` instruction.
Semantics:
""""""""""
-If ``ptr`` is a stack-allocated object and it points to the first byte of the
-object, the object is dead.
-``ptr`` is conservatively considered as a non-stack-allocated object if
-the stack coloring algorithm that is used in the optimization pipeline cannot
-conclude that ``ptr`` is a stack-allocated object.
+The stack-allocated object that ``ptr`` points to becomes dead after the call
+to this intrinsic.
Calling ``llvm.lifetime.end`` on an already dead alloca is no-op.
-If ``ptr`` is a non-stack-allocated object or it does not point to the first
-byte of the object, it is equivalent to simply filling all bytes of the object
-with ``poison``.
-
-
'``llvm.invariant.start``' Intrinsic
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
diff --git a/llvm/docs/NVPTXUsage.rst b/llvm/docs/NVPTXUsage.rst
index 11017fe..d28eb68 100644
--- a/llvm/docs/NVPTXUsage.rst
+++ b/llvm/docs/NVPTXUsage.rst
@@ -1072,6 +1072,8 @@ Syntax:
declare void @llvm.nvvm.cp.async.bulk.tensor.g2s.tile.4d(..., i32 %d0, i32 %d1, i32 %d2, i32 %d3, ...)
declare void @llvm.nvvm.cp.async.bulk.tensor.g2s.tile.5d(..., i32 %d0, i32 %d1, i32 %d2, i32 %d3, i32 %d4, ...)
+ declare void @llvm.nvvm.cp.async.bulk.tensor.g2s.tile.gather4.2d(ptr addrspace(7) %dst, ptr addrspace(3) %bar, ptr %tensor_map, i32 %x0, i32 %y0, i32 %y1, i32 %y2, i32 %y3, i16 %mc, i64 %ch, i1 %flag_mc, i1 %flag_ch, i32 %flag_cta_group)
+
Overview:
"""""""""
@@ -1082,7 +1084,13 @@ global memory to shared::cluster memory (indicated by the ``g2s`` prefix)
in ``tile`` mode. In tile mode, the multi-dimensional layout of the
source tensor is preserved at the destination. The dimension of the
tensor data ranges from 1d to 5d with the coordinates specified
-by the ``i32 %d0 ... i32 %d4`` arguments.
+by the ``i32 %d0 ... i32 %d4`` arguments. In ``tile.gather4`` mode,
+four rows in a 2D tensor are combined to form a single 2D destination
+tensor. The first coordinate ``i32 %x0`` denotes the column index
+followed by four coordinates indicating the four row-indices.
+So, this mode takes a total of 5 coordinates as input arguments.
+For more information on ``gather4`` mode, refer PTX ISA
+`<https://docs.nvidia.com/cuda/parallel-thread-execution/#tensor-tiled-scatter4-gather4-modes>`_.
* The last three arguments to these intrinsics are flags
indicating support for multicast, cache_hint and cta_group::1/2
@@ -1116,10 +1124,18 @@ Syntax:
.. code-block:: llvm
- declare void @llvm.nvvm.cp.async.bulk.tensor.g2s.im2col.3d(ptr addrspace(3) %dst, ptr addrspace(3) %bar, ptr %tensor_map, i32 %d0, i32 %d1, i32 %d2, i16 %im2col0, i16 %mc, i64 %ch, i1 %flag_mc, i1 %flag_ch, i32 %flag_cta_group)
+ declare void @llvm.nvvm.cp.async.bulk.tensor.g2s.im2col.3d(ptr addrspace(7) %dst, ptr addrspace(3) %bar, ptr %tensor_map, i32 %d0, i32 %d1, i32 %d2, i16 %im2col0, i16 %mc, i64 %ch, i1 %flag_mc, i1 %flag_ch, i32 %flag_cta_group)
declare void @llvm.nvvm.cp.async.bulk.tensor.g2s.im2col.4d(..., i32 %d0, i32 %d1, i32 %d2, i32 %d3, i16 %im2col0, i16 %im2col1, ...)
declare void @llvm.nvvm.cp.async.bulk.tensor.g2s.im2col.5d(..., i32 %d0, i32 %d1, i32 %d2, i32 %d3, i32 %d4, i16 %im2col0, i16 %im2col1, i16 %im2col2, ...)
+ declare void @llvm.nvvm.cp.async.bulk.tensor.g2s.im2col.w.3d(ptr addrspace(7) %dst, ptr addrspace(3) %bar, ptr %tensor_map, i32 %d0, i32 %d1, i32 %d2, i16 %wHalo, i16 %wOffset, i16 %mc, i64 %ch, i1 %flag_mc, i1 %flag_ch, i32 %flag_cta_group)
+ declare void @llvm.nvvm.cp.async.bulk.tensor.g2s.im2col.w.4d(..., i32 %d0, i32 %d1, i32 %d2, i32 %d3, ...)
+ declare void @llvm.nvvm.cp.async.bulk.tensor.g2s.im2col.w.5d(..., i32 %d0, i32 %d1, i32 %d2, i32 %d3, i32 %d4, ...)
+
+ declare void @llvm.nvvm.cp.async.bulk.tensor.g2s.im2col.w.128.3d(ptr addrspace(7) %dst, ptr addrspace(3) %bar, ptr %tensor_map, i32 %d0, i32 %d1, i32 %d2, i16 %wHalo, i16 %wOffset, i16 %mc, i64 %ch, i1 %flag_mc, i1 %flag_ch, i32 %flag_cta_group)
+ declare void @llvm.nvvm.cp.async.bulk.tensor.g2s.im2col.w.128.4d(..., i32 %d0, i32 %d1, i32 %d2, i32 %d3, ...)
+ declare void @llvm.nvvm.cp.async.bulk.tensor.g2s.im2col.w.128.5d(..., i32 %d0, i32 %d1, i32 %d2, i32 %d3, i32 %d4, ...)
+
Overview:
"""""""""
@@ -1131,10 +1147,105 @@ in ``im2col`` mode. In im2col mode, some dimensions of the source tensor
are unrolled into a single dimensional column at the destination. In this
mode, the tensor has to be at least three-dimensional. Along with the tensor
coordinates, im2col offsets are also specified (denoted by
-``i16 im2col0...i16 %im2col2``). The number of im2col offsets is two less
-than the number of dimensions of the tensor operation. The last three arguments
-to these intrinsics are flags, with the same functionality as described
-in the ``tile`` mode intrinsics above.
+``i16 im2col0...i16 %im2col2``). For the ``im2col`` mode, the number of offsets
+is two less than the number of dimensions of the tensor operation. For the
+``im2col.w`` and ``im2col.w.128`` mode, the number of offsets is always 2,
+denoted by ``i16 %wHalo`` and ``i16 %wOffset`` arguments. For more information
+on ``im2col.w`` and ``im2col.w.128`` modes, refer PTX ISA
+`<https://docs.nvidia.com/cuda/parallel-thread-execution/#tensor-im2col-w-w128-modes>`_.
+
+The last three arguments to these intrinsics are flags, with the same functionality
+as described in the ``tile`` mode intrinsics above.
+
+For more information, refer PTX ISA
+`<https://docs.nvidia.com/cuda/parallel-thread-execution/index.html#data-movement-and-conversion-instructions-cp-async-bulk-tensor>`_.
+
+'``llvm.nvvm.cp.async.bulk.tensor.g2s.cta.tile.[1-5]d``'
+^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
+
+Syntax:
+"""""""
+
+.. code-block:: llvm
+
+ declare void @llvm.nvvm.cp.async.bulk.tensor.g2s.cta.tile.1d(ptr addrspace(3) %dst, ptr addrspace(3) %bar, ptr %tensor_map, i32 %d0, i64 %ch, i1 %flag_ch)
+ declare void @llvm.nvvm.cp.async.bulk.tensor.g2s.cta.tile.2d(..., i32 %d0, i32 %d1, ...)
+ declare void @llvm.nvvm.cp.async.bulk.tensor.g2s.cta.tile.3d(..., i32 %d0, i32 %d1, i32 %d2, ...)
+ declare void @llvm.nvvm.cp.async.bulk.tensor.g2s.cta.tile.4d(..., i32 %d0, i32 %d1, i32 %d2, i32 %d3, ...)
+ declare void @llvm.nvvm.cp.async.bulk.tensor.g2s.cta.tile.5d(..., i32 %d0, i32 %d1, i32 %d2, i32 %d3, i32 %d4, ...)
+
+ declare void @llvm.nvvm.cp.async.bulk.tensor.g2s.cta.tile.gather4.2d(ptr addrspace(3) %dst, ptr addrspace(3) %bar, ptr %tensor_map, i32 %x0, i32 %y0, i32 %y1, i32 %y2, i32 %y3, i64 %ch, i1 %flag_ch)
+
+Overview:
+"""""""""
+
+The '``@llvm.nvvm.cp.async.bulk.tensor.g2s.cta.tile.[1-5]d``' intrinsics
+correspond to the ``cp.async.bulk.tensor.[1-5]d.shared::cta.global.*``
+set of PTX instructions. These instructions initiate an asynchronous
+copy of tensor data from global memory to shared::cta memory in
+``tile`` mode. In tile mode, the multi-dimensional layout of the
+source tensor is preserved at the destination. The dimension of the
+tensor data ranges from 1d to 5d with the coordinates specified
+by the ``i32 %d0 ... i32 %d4`` arguments. In ``tile.gather4`` mode,
+four rows in a 2D tensor are combined to form a single 2D destination
+tensor. The first coordinate ``i32 %x0`` denotes the column index
+followed by four coordinates indicating the four row-indices.
+So, this mode takes a total of 5 coordinates as input arguments.
+For more information on ``gather4`` mode, refer PTX ISA
+`<https://docs.nvidia.com/cuda/parallel-thread-execution/#tensor-tiled-scatter4-gather4-modes>`_.
+
+* The last argument to these intrinsics is a boolean flag
+ indicating support for cache_hint. This flag argument must
+ be a compile-time constant. When set, it indicates a valid
+ cache_hint (``i64 %ch``) and generates the ``.L2::cache_hint``
+ variant of the PTX instruction.
+
+For more information, refer PTX ISA
+`<https://docs.nvidia.com/cuda/parallel-thread-execution/index.html#data-movement-and-conversion-instructions-cp-async-bulk-tensor>`_.
+
+'``llvm.nvvm.cp.async.bulk.tensor.g2s.cta.im2col.[3-5]d``'
+^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
+
+Syntax:
+"""""""
+
+.. code-block:: llvm
+
+ declare void @llvm.nvvm.cp.async.bulk.tensor.g2s.cta.im2col.3d(ptr addrspace(3) %dst, ptr addrspace(3) %bar, ptr %tensor_map, i32 %d0, i32 %d1, i32 %d2, i16 %im2col0, i64 %ch, i1 %flag_ch)
+ declare void @llvm.nvvm.cp.async.bulk.tensor.g2s.cta.im2col.4d(..., i32 %d0, i32 %d1, i32 %d2, i32 %d3, i16 %im2col0, i16 %im2col1, ...)
+ declare void @llvm.nvvm.cp.async.bulk.tensor.g2s.cta.im2col.5d(..., i32 %d0, i32 %d1, i32 %d2, i32 %d3, i32 %d4, i16 %im2col0, i16 %im2col1, i16 %im2col2, ...)
+
+ declare void @llvm.nvvm.cp.async.bulk.tensor.g2s.cta.im2col.w.3d(ptr addrspace(3) %dst, ptr addrspace(3) %bar, ptr %tensor_map, i32 %d0, i32 %d1, i32 %d2, i16 %wHalo, i16 %wOffset, i64 %ch, i1 %flag_ch)
+ declare void @llvm.nvvm.cp.async.bulk.tensor.g2s.cta.im2col.w.4d(..., i32 %d0, i32 %d1, i32 %d2, i32 %d3, ...)
+ declare void @llvm.nvvm.cp.async.bulk.tensor.g2s.cta.im2col.w.5d(..., i32 %d0, i32 %d1, i32 %d2, i32 %d3, i32 %d4, ...)
+
+ declare void @llvm.nvvm.cp.async.bulk.tensor.g2s.cta.im2col.w.128.3d(ptr addrspace(3) %dst, ptr addrspace(3) %bar, ptr %tensor_map, i32 %d0, i32 %d1, i32 %d2, i16 %wHalo, i16 %wOffset, i64 %ch, i1 %flag_ch)
+ declare void @llvm.nvvm.cp.async.bulk.tensor.g2s.cta.im2col.w.128.4d(..., i32 %d0, i32 %d1, i32 %d2, i32 %d3, ...)
+ declare void @llvm.nvvm.cp.async.bulk.tensor.g2s.cta.im2col.w.128.5d(..., i32 %d0, i32 %d1, i32 %d2, i32 %d3, i32 %d4, ...)
+
+Overview:
+"""""""""
+
+The '``@llvm.nvvm.cp.async.bulk.tensor.g2s.cta.im2col.[3-5]d``' intrinsics
+correspond to the ``cp.async.bulk.tensor.[1-5]d.shared::cta.global.*``
+set of PTX instructions. These instructions initiate an asynchronous copy
+of tensor data from global memory to shared::cta memory in ``im2col`` mode.
+In im2col mode, some dimensions of the source tensor are unrolled into a
+single dimensional column at the destination. In this mode, the tensor has
+to be at least three-dimensional. Along with the tensor coordinates, im2col
+offsets are also specified (denoted by ``i16 im2col0...i16 %im2col2``).
+For the ``im2col`` mode, the number of offsets is two less than the number
+of dimensions of the tensor operation. For the ``im2col.w`` and ``im2col.w.128``
+mode, the number of offsets is always 2, denoted by ``i16 %wHalo`` and
+``i16 %wOffset`` arguments. For more information on ``im2col.w`` and
+``im2col.w.128`` modes, refer PTX ISA
+`<https://docs.nvidia.com/cuda/parallel-thread-execution/#tensor-im2col-w-w128-modes>`_.
+
+* The last argument to these intrinsics is a boolean flag
+ indicating support for cache_hint. This flag argument must
+ be a compile-time constant. When set, it indicates a valid
+ cache_hint (``i64 %ch``) and generates the ``.L2::cache_hint``
+ variant of the PTX instruction.
For more information, refer PTX ISA
`<https://docs.nvidia.com/cuda/parallel-thread-execution/index.html#data-movement-and-conversion-instructions-cp-async-bulk-tensor>`_.
@@ -1153,6 +1264,8 @@ Syntax:
declare void @llvm.nvvm.cp.async.bulk.tensor.s2g.tile.4d(..., i32 %d0, i32 %d1, i32 %d2, i32 %d3, ...)
declare void @llvm.nvvm.cp.async.bulk.tensor.s2g.tile.5d(..., i32 %d0, i32 %d1, i32 %d2, i32 %d3, i32 %d4, ...)
+ declare void @llvm.nvvm.cp.async.bulk.tensor.s2g.tile.scatter4.2d(ptr addrspace(3) %src, ptr %tensor_map, i32 %x0, i32 %y0, i32 %y1, i32 %y2, i32 %y3, i64 %ch, i1 %flag_ch)
+
Overview:
"""""""""
@@ -1162,6 +1275,12 @@ These instructions initiate an asynchronous copy of tensor data from
shared::cta to global memory (indicated by the ``s2g`` prefix)
in ``tile`` mode. The dimension of the tensor data ranges from 1d to 5d
with the coordinates specified by the ``i32 %d0 ... i32 %d4`` arguments.
+In ``tile.scatter4`` mode, a single 2D source tensor is divided into
+four rows in the 2D destination tensor. The first coordinate ``i32 %x0``
+denotes the column index followed by four coordinates indicating the
+four row-indices. So, this mode takes a total of 5 coordinates as input arguments.
+For more information on ``scatter4`` mode, refer PTX ISA
+`<https://docs.nvidia.com/cuda/parallel-thread-execution/#tensor-tiled-scatter4-gather4-modes>`_.
* The last argument to these intrinsics is a boolean flag
indicating support for cache_hint. This flag argument must
@@ -1214,6 +1333,8 @@ Syntax:
declare void @llvm.nvvm.cp.async.bulk.tensor.prefetch.tile.4d(..., i32 %d0, i32 %d1, i32 %d2, i32 %d3, ...)
declare void @llvm.nvvm.cp.async.bulk.tensor.prefetch.tile.5d(..., i32 %d0, i32 %d1, i32 %d2, i32 %d3, i32 %d4, ...)
+ declare void @llvm.nvvm.cp.async.bulk.tensor.prefetch.tile.gather4.2d(ptr %tensor_map, i32 %x0, i32 %y0, i32 %y1, i32 %y2, i32 %y3, i64 %ch, i1 %flag_ch)
+
Overview:
"""""""""
@@ -1225,6 +1346,13 @@ multi-dimensional layout of the source tensor is preserved at the destination.
The dimension of the tensor data ranges from 1d to 5d with the coordinates
specified by the ``i32 %d0 ... i32 %d4`` arguments.
+In ``tile.gather4`` mode, four rows in the 2-dimnesional source tensor are
+fetched to the L2 cache. The first coordinate ``i32 %x0`` denotes the column index
+followed by four coordinates indicating the four row-indices. So, this mode takes
+a total of 5 coordinates as input arguments.
+For more information on ``gather4`` mode, refer PTX ISA
+`<https://docs.nvidia.com/cuda/parallel-thread-execution/#tensor-tiled-scatter4-gather4-modes>`_.
+
* The last argument to these intrinsics is a boolean flag
indicating support for cache_hint. This flag argument must
be a compile-time constant. When set, it indicates a valid
@@ -1246,6 +1374,14 @@ Syntax:
declare void @llvm.nvvm.cp.async.bulk.tensor.prefetch.im2col.4d(..., i32 %d0, i32 %d1, i32 %d2, i32 %d3, i16 %im2col0, i16 %im2col1, ...)
declare void @llvm.nvvm.cp.async.bulk.tensor.prefetch.im2col.5d(..., i32 %d0, i32 %d1, i32 %d2, i32 %d3, i32 %d4, i16 %im2col0, i16 %im2col1, i16 %im2col2, ...)
+ declare void @llvm.nvvm.cp.async.bulk.tensor.prefetch.im2col.w.3d(ptr %tensor_map, i32 %d0, i32 %d1, i32 %d2, i16 %wHalo, i16 %wOffset, i64 %ch, i1 %flag_ch)
+ declare void @llvm.nvvm.cp.async.bulk.tensor.prefetch.im2col.w.4d(..., i32 %d0, i32 %d1, i32 %d2, i32 %d3, ...)
+ declare void @llvm.nvvm.cp.async.bulk.tensor.prefetch.im2col.w.5d(..., i32 %d0, i32 %d1, i32 %d2, i32 %d3, i32 %d4, ...)
+
+ declare void @llvm.nvvm.cp.async.bulk.tensor.prefetch.im2col.w.128.3d(ptr %tensor_map, i32 %d0, i32 %d1, i32 %d2, i16 %wHalo, i16 %wOffset, i64 %ch, i1 %flag_ch)
+ declare void @llvm.nvvm.cp.async.bulk.tensor.prefetch.im2col.w.128.4d(..., i32 %d0, i32 %d1, i32 %d2, i32 %d3, ...)
+ declare void @llvm.nvvm.cp.async.bulk.tensor.prefetch.im2col.w.128.5d(..., i32 %d0, i32 %d1, i32 %d2, i32 %d3, i32 %d4, ...)
+
Overview:
"""""""""
@@ -1256,9 +1392,16 @@ of tensor data from global memory to the L2 cache. In im2col mode, some
dimensions of the source tensor are unrolled into a single dimensional
column at the destination. In this mode, the tensor has to be at least
three-dimensional. Along with the tensor coordinates, im2col offsets are
-also specified (denoted by ``i16 im2col0...i16 %im2col2``). The number
-of im2col offsets is two less than the number of dimensions of the tensor
-operation. The last argument to these intrinsics is a boolean flag, with
+also specified (denoted by ``i16 im2col0...i16 %im2col2``). For ``im2col``
+mode, the number of offsets is two less than the number of dimensions of
+the tensor operation. For the ``im2col.w`` and ``im2col.w.128`` modes,
+the number of offsets is always 2, denoted by ``i16 %wHalo`` and
+``i16 %wOffset`` arguments. For more information on ``im2col.w`` and
+``im2col.w.128`` modes, refer PTX ISA
+`<https://docs.nvidia.com/cuda/parallel-thread-execution/#tensor-im2col-w-w128-modes>`_.
+
+
+The last argument to these intrinsics is a boolean flag, with
the same functionality as described in the ``tile`` mode intrinsics above.
For more information, refer PTX ISA
diff --git a/llvm/docs/ProgrammersManual.rst b/llvm/docs/ProgrammersManual.rst
index d417de7..68490c8 100644
--- a/llvm/docs/ProgrammersManual.rst
+++ b/llvm/docs/ProgrammersManual.rst
@@ -135,7 +135,7 @@ rarely have to include this file directly).
return !L->contains(cast<Instruction>(V)->getParent());
}
- Note that you should **not** use an ``isa<>`` test followed by a ``cast<>``,
+ Note that you should **not** use an ``isa<>`` test followed by a ``cast<>``;
for that use the ``dyn_cast<>`` operator.
``dyn_cast<>``:
@@ -234,8 +234,8 @@ the ``str`` member function. See ``llvm/ADT/StringRef.h`` (`doxygen
<https://llvm.org/doxygen/StringRef_8h_source.html>`__) for more
information.
-You should rarely use the ``StringRef`` class directly, because it contains
-pointers to external memory it is not generally safe to store an instance of the
+You should rarely use the ``StringRef`` class directly. Because it contains
+pointers to external memory, it is not generally safe to store an instance of the
class (unless you know that the external storage will not be freed).
``StringRef`` is small and pervasive enough in LLVM that it should always be
passed by value.
@@ -416,14 +416,14 @@ to abort quickly at the point of failure (providing some basic diagnostic) when
invariants are broken at runtime.
The fundamental tools for handling programmatic errors are assertions and the
-llvm_unreachable function. Assertions are used to express invariant conditions,
+``llvm_unreachable`` function. Assertions are used to express invariant conditions,
and should include a message describing the invariant:
.. code-block:: c++
assert(isPhysReg(R) && "All virt regs should have been allocated already.");
-The llvm_unreachable function can be used to document areas of control flow
+The ``llvm_unreachable`` function can be used to document areas of control flow
that should never be entered if the program invariants hold:
.. code-block:: c++
@@ -598,7 +598,7 @@ semantics. For example:
}
This third form works with any type that can be assigned to from ``T&&``. This
-can be useful if the ``Expected<T>`` value needs to be stored an already-declared
+can be useful if the ``Expected<T>`` value needs to be stored in an already-declared
``std::optional<T>``. For example:
.. code-block:: c++
@@ -619,7 +619,7 @@ can be useful if the ``Expected<T>`` value needs to be stored an already-declare
All ``Error`` instances, whether success or failure, must be either checked or
moved from (via ``std::move`` or a return) before they are destructed.
-Accidentally discarding an unchecked error will cause a program abort at the
+Accidentally discarding an unchecked error will cause a program to abort at the
point where the unchecked value's destructor is run, making it easy to identify
and fix violations of this rule.
@@ -661,7 +661,7 @@ a variadic list of "handlers", each of which must be a callable type (a
function, lambda, or class with a call operator) with one argument. The
``handleErrors`` function will visit each handler in the sequence and check its
argument type against the dynamic type of the error, running the first handler
-that matches. This is the same decision process that is used decide which catch
+that matches. This is the same decision process that is used to decide which catch
clause to run for a C++ exception.
Since the list of handlers passed to ``handleErrors`` may not cover every error
@@ -869,10 +869,10 @@ T value:
}
Like the ExitOnError utility, cantFail simplifies control flow. Their treatment
-of error cases is very different however: Where ExitOnError is guaranteed to
+of error cases is very different, however: Where ExitOnError is guaranteed to
terminate the program on an error input, cantFail simply asserts that the result
is success. In debug builds this will result in an assertion failure if an error
-is encountered. In release builds the behavior of cantFail for failure values is
+is encountered. In release builds, the behavior of cantFail for failure values is
undefined. As such, care must be taken in the use of cantFail: clients must be
certain that a cantFail wrapped call really can not fail with the given
arguments.
@@ -928,7 +928,7 @@ well-formed Foo or an Error, never an object in an invalid state.
Propagating and consuming errors based on types
"""""""""""""""""""""""""""""""""""""""""""""""
-In some contexts, certain types of error are known to be benign. For example,
+In some contexts, certain types of errors are known to be benign. For example,
when walking an archive, some clients may be happy to skip over badly formatted
object files rather than terminating the walk immediately. Skipping badly
formatted objects could be achieved using an elaborate handler method, but the
@@ -956,7 +956,7 @@ type inspection method, ``isA``, and the ``consumeError`` function:
Concatenating Errors with joinErrors
""""""""""""""""""""""""""""""""""""
-In the archive walking example above ``BadFileFormat`` errors are simply
+In the archive walking example above, ``BadFileFormat`` errors are simply
consumed and ignored. If the client had wanted report these errors after
completing the walk over the archive they could use the ``joinErrors`` utility:
@@ -982,13 +982,13 @@ The ``joinErrors`` routine builds a special error type called ``ErrorList``,
which holds a list of user defined errors. The ``handleErrors`` routine
recognizes this type and will attempt to handle each of the contained errors in
order. If all contained errors can be handled, ``handleErrors`` will return
-``Error::success()``, otherwise ``handleErrors`` will concatenate the remaining
+``Error::success()``; otherwise, ``handleErrors`` will concatenate the remaining
errors and return the resulting ``ErrorList``.
Building fallible iterators and iterator ranges
"""""""""""""""""""""""""""""""""""""""""""""""
-The archive walking examples above retrieve archive members by index, however
+The archive walking examples above retrieve archive members by index; however,
this requires considerable boiler-plate for iteration and error checking. We can
clean this up by using the "fallible iterator" pattern, which supports the
following natural iteration idiom for fallible containers like Archive:
@@ -1039,7 +1039,7 @@ fallible_iterator utility which provides ``operator++`` and ``operator--``,
returning any errors via a reference passed in to the wrapper at construction
time. The fallible_iterator wrapper takes care of (a) jumping to the end of the
range on error, and (b) marking the error as checked whenever an iterator is
-compared to ``end`` and found to be inequal (in particular: this marks the
+compared to ``end`` and found to be inequal (in particular, this marks the
error as checked throughout the body of a range-based for loop), enabling early
exit from the loop without redundant error checking.
@@ -1068,7 +1068,7 @@ functions. E.g.:
Using the fallible_iterator utility allows for both natural construction of
fallible iterators (using failing ``inc`` and ``dec`` operations) and
-relatively natural use of c++ iterator/loop idioms.
+relatively natural use of C++ iterator/loop idioms.
.. _function_apis:
@@ -1175,7 +1175,7 @@ Then you can run your pass like this:
I am here!
Using the ``LLVM_DEBUG()`` macro instead of a home-brewed solution allows you to not
-have to create "yet another" command line option for the debug output for your
+have to create "yet another" command-line option for the debug output for your
pass. Note that ``LLVM_DEBUG()`` macros are disabled for non-asserts builds, so they
do not cause a performance impact at all (for the same reason, they should also
not contain side-effects!).
@@ -1349,7 +1349,7 @@ certain number of times.
The ``llvm/Support/DebugCounter.h`` (`doxygen
<https://llvm.org/doxygen/DebugCounter_8h_source.html>`__) file
provides a class named ``DebugCounter`` that can be used to create
-command line counter options that control execution of parts of your code.
+command-line counter options that control execution of parts of your code.
Define your DebugCounter like this:
@@ -1364,7 +1364,7 @@ is specified by the first argument. The name of the counter
argument, and the description used in the help is specified by the
third argument.
-Whatever code you want that control, use ``DebugCounter::shouldExecute`` to control it.
+Whatever code you want to control, use ``DebugCounter::shouldExecute`` to control it.
.. code-block:: c++
diff --git a/llvm/docs/ReleaseNotes.md b/llvm/docs/ReleaseNotes.md
index 5591ac6..48d2ef1 100644
--- a/llvm/docs/ReleaseNotes.md
+++ b/llvm/docs/ReleaseNotes.md
@@ -56,37 +56,9 @@ Makes programs 10x faster by doing Special New Thing.
Changes to the LLVM IR
----------------------
-* It is no longer permitted to inspect the uses of ConstantData. Use
- count APIs will behave as if they have no uses (i.e. use_empty() is
- always true).
-
-* The `nocapture` attribute has been replaced by `captures(none)`.
-* The constant expression variants of the following instructions have been
- removed:
-
- * `mul`
-
-* Updated semantics of `llvm.type.checked.load.relative` to match that of
- `llvm.load.relative`.
-* Inline asm calls no longer accept ``label`` arguments. Use ``callbr`` instead.
-
-* Updated semantics of the `callbr` instruction to clarify that its
- 'indirect labels' are not expected to be reached by indirect (as in
- register-controlled) branch instructions, and therefore are not
- guaranteed to start with a `bti` or `endbr64` instruction, where
- those exist.
-
Changes to LLVM infrastructure
------------------------------
-* Removed support for target intrinsics being defined in the target directories
- themselves (i.e., the `TargetIntrinsicInfo` class).
-* Fix Microsoft demangling of string literals to be stricter
- (#GH129970))
-* Added the support for ``fmaximum`` and ``fminimum`` in ``atomicrmw`` instruction. The
- comparison is expected to match the behavior of ``llvm.maximum.*`` and
- ``llvm.minimum.*`` respectively.
-
Changes to building LLVM
------------------------
@@ -99,31 +71,9 @@ Changes to Interprocedural Optimizations
Changes to the AArch64 Backend
------------------------------
-* Added the `execute-only` target feature, which indicates that the generated
- program code doesn't contain any inline data, and there are no data accesses
- to code sections. On ELF targets this property is indicated by the
- `SHF_AARCH64_PURECODE` section flag.
- ([#125687](https://github.com/llvm/llvm-project/pull/125687),
- [#132196](https://github.com/llvm/llvm-project/pull/132196),
- [#133084](https://github.com/llvm/llvm-project/pull/133084))
-
Changes to the AMDGPU Backend
-----------------------------
-* Enabled the
- [FWD_PROGRESS bit](https://llvm.org/docs/AMDGPUUsage.html#code-object-v3-kernel-descriptor)
- for all GFX ISAs greater or equal to 10, for the AMDHSA OS.
-
-* Bump the default `.amdhsa_code_object_version` to 6. ROCm 6.3 is required to run any program compiled with COV6.
-
-* Add a new `amdgcn.load.to.lds` intrinsic that wraps the existing global.load.lds
-intrinsic and has the same semantics. This intrinsic allows using buffer fat pointers
-(`ptr addrspace(7)`) as arguments, allowing loads to LDS from these pointers to be
-represented in the IR without needing to use buffer resource intrinsics directly.
-This intrinsic is exposed to Clang as `__builtin_amdgcn_load_to_lds`, though
-buffer fat pointers are not yet enabled in Clang. Migration to this intrinsic is
-optional, and there are no plans to deprecate `amdgcn.global.load.lds`.
-
Changes to the ARM Backend
--------------------------
@@ -136,106 +86,27 @@ Changes to the DirectX Backend
Changes to the Hexagon Backend
------------------------------
-* The default Hexagon architecture version in ELF object files produced by
- the tools such as llvm-mc is changed to v68. This version will be set if
- the user does not provide the CPU version in the command line.
-
Changes to the LoongArch Backend
--------------------------------
-* Changing the default code model from `small` to `medium` for 64-bit.
-* Added inline asm support for the `q` constraint.
-* Added the `32s` target feature for LA32S ISA extensions.
-* Added codegen support for atomic-ops (`cmpxchg`, `max`, `min`, `umax`, `umin`) on LA32.
-* Added codegen support for the ILP32D calling convention.
-* Added several codegen and vectorization optimizations.
-
Changes to the MIPS Backend
---------------------------
-* `-mcpu=i6400` and `-mcpu=i6500` were added.
-
Changes to the PowerPC Backend
------------------------------
Changes to the RISC-V Backend
-----------------------------
-* Adds experimental assembler support for the Qualcomm uC 'Xqcilb` (Long Branch)
- extension.
-* Adds experimental assembler support for the Qualcomm uC 'Xqcili` (Load Large Immediate)
- extension.
-* Adds experimental assembler support for the Qualcomm uC 'Xqcilia` (Large Immediate Arithmetic)
- extension.
-* Adds experimental assembler support for the Qualcomm uC 'Xqcibm` (Bit Manipulation)
- extension.
-* Adds experimental assembler support for the Qualcomm uC 'Xqcibi` (Branch Immediate)
- extension.
-* Adds experimental assembler and code generation support for the Qualcomm
- 'Xqccmp' extension, which is a frame-pointer convention compatible version of
- Zcmp.
-* Added non-quadratic ``log-vrgather`` cost model for ``vrgather.vv`` instruction
-* Adds experimental assembler support for the Qualcomm uC 'Xqcisim` (Simulation Hint)
- extension.
-* Adds experimental assembler support for the Qualcomm uC 'Xqcisync` (Sync Delay)
- extension.
-* Adds experimental assembler support for the Qualcomm uC 'Xqciio` (External Input Output)
- extension.
-* Adds assembler support for the 'Zilsd` (Load/Store Pair Instructions)
- extension.
-* Adds assembler support for the 'Zclsd` (Compressed Load/Store Pair Instructions)
- extension.
-* Adds experimental assembler support for Zvqdotq.
-* Adds Support for Qualcomm's `qci-nest` and `qci-nonest` interrupt types, which
- use instructions from `Xqciint` to save and restore some GPRs during interrupt
- handlers.
-* When the experimental extension `Xqcili` is enabled, `qc.e.li` and `qc.li` may
- now be used to materialize immediates.
-* Adds assembler support for ``.option exact``, which disables automatic compression,
- and branch and linker relaxation. This can be disabled with ``.option noexact``,
- which is also the default.
-* `-mcpu=xiangshan-kunminghu` was added.
-* `-mcpu=andes-n45` and `-mcpu=andes-nx45` were added.
-* `-mcpu=andes-a45` and `-mcpu=andes-ax45` were added.
-* Adds support for the 'Ziccamoc` (Main Memory Supports Atomics in Zacas) extension, which was introduced as an optional extension of the RISC-V Profiles specification.
-* Adds experimental assembler support for SiFive CLIC CSRs, under the names
- `Zsfmclic` for the M-mode registers and `Zsfsclic` for the S-mode registers.
-* Adds Support for SiFive CLIC interrupt attributes, which automate writing CLIC
- interrupt handlers without using inline assembly.
-* Adds assembler support for the Andes `XAndesperf` (Andes Performance extension).
-* `-mcpu=sifive-p870` was added.
-* Adds assembler support for the Andes `XAndesvpackfph` (Andes Vector Packed FP16 extension).
-* Adds assembler support for the Andes `XAndesvdot` (Andes Vector Dot Product extension).
-* Adds assembler support for the standard `Q` (Quad-Precision Floating Point)
- extension.
-* Adds experimental assembler support for the SiFive Xsfmm* Attached Matrix
- Extensions.
-* `-mcpu=andes-a25` and `-mcpu=andes-ax25` were added.
-* The `Shlcofideleg` extension was added.
-* `-mcpu=sifive-x390` was added.
-* `-mtune=andes-45-series` was added.
-* Adds assembler support for the Andes `XAndesvbfhcvt` (Andes Vector BFLOAT16 Conversion extension).
-* `-mcpu=andes-ax45mpv` was added.
-* Removed -mattr=+no-rvc-hints that could be used to disable parsing and generation of RVC hints.
-* Adds assembler support for the Andes `XAndesvsintload` (Andes Vector INT4 Load extension).
-* Adds assembler support for the Andes `XAndesbfhcvt` (Andes Scalar BFLOAT16 Conversion extension).
-
Changes to the WebAssembly Backend
----------------------------------
Changes to the Windows Target
-----------------------------
-* `fp128` is now passed indirectly, meaning it uses the same calling convention
- as `i128`.
-
Changes to the X86 Backend
--------------------------
-* `fp128` will now use `*f128` libcalls on 32-bit GNU targets as well.
-* On x86-32, `fp128` and `i128` are now passed with the expected 16-byte stack
- alignment.
-
Changes to the OCaml bindings
-----------------------------
@@ -245,25 +116,6 @@ Changes to the Python bindings
Changes to the C API
--------------------
-* The following functions for creating constant expressions have been removed,
- because the underlying constant expressions are no longer supported. Instead,
- an instruction should be created using the `LLVMBuildXYZ` APIs, which will
- constant fold the operands if possible and create an instruction otherwise:
-
- * `LLVMConstMul`
- * `LLVMConstNUWMul`
- * `LLVMConstNSWMul`
-
-* Added `LLVMConstDataArray` and `LLVMGetRawDataValues` to allow creating and
- reading `ConstantDataArray` values without needing extra `LLVMValueRef`s for
- individual elements.
-
-* Added ``LLVMDIBuilderCreateEnumeratorOfArbitraryPrecision`` for creating
- debugging metadata of enumerators larger than 64 bits.
-
-* Added ``LLVMGetICmpSameSign`` and ``LLVMSetICmpSameSign`` for the `samesign`
- flag on `icmp` instructions.
-
Changes to the CodeGen infrastructure
-------------------------------------
@@ -276,59 +128,9 @@ Changes to the Debug Info
Changes to the LLVM tools
---------------------------------
-* llvm-objcopy now supports the `--update-section` flag for intermediate Mach-O object files.
-* llvm-strip now supports continuing to process files on encountering an error.
-* In llvm-objcopy/llvm-strip's ELF port, `--discard-locals` and `--discard-all` now allow and preserve symbols referenced by relocations.
- ([#47468](https://github.com/llvm/llvm-project/issues/47468))
-* llvm-addr2line now supports a `+` prefix when specifying an address.
-* Support for `SHT_LLVM_BB_ADDR_MAP` versions 0 and 1 has been dropped.
-
Changes to LLDB
---------------------------------
-* When building LLDB with Python support, the minimum version of Python is now
- 3.8.
-* LLDB now supports hardware watchpoints for AArch64 Windows targets. Windows
- does not provide API to query the number of supported hardware watchpoints.
- Therefore current implementation allows only 1 watchpoint, as tested with
- Windows 11 on the Microsoft SQ2 and Snapdragon Elite X platforms.
-* LLDB now steps through C++ thunks. This fixes an issue where previously, it
- wouldn't step into multiple inheritance virtual functions.
-* A statusline was added to command-line LLDB to show progress events and
- information about the current state of the debugger at the bottom of the
- terminal. This is on by default and can be configured using the
- `show-statusline` and `statusline-format` settings. It is not currently
- supported on Windows.
-* The `min-gdbserver-port` and `max-gdbserver-port` options have been removed
- from `lldb-server`'s platform mode. Since the changes to `lldb-server`'s port
- handling in LLDB 20, these options have had no effect.
-* LLDB now supports `process continue --reverse` when used with debug servers
- supporting reverse execution, such as [rr](https://rr-project.org).
- When using reverse execution, `process continue --forward` returns to the
- forward execution.
-* LLDB now supports RISC-V 32-bit ELF core files.
-* LLDB now supports siginfo descriptions for Linux user-space signals. User space
- signals will now have descriptions describing the method and sender.
- ```
- stop reason = SIGSEGV: sent by tkill system call (sender pid=649752, uid=2667987)
- ```
-* ELF Cores can now have their siginfo structures inspected using `thread siginfo`.
-* LLDB now uses
- [DIL](https://discourse.llvm.org/t/rfc-data-inspection-language/69893) as the
- default implementation for 'frame variable'. This should not change the
- behavior of 'frame variable' at all, at this time. To revert to using the
- old implementation use: `settings set target.experimental.use-DIL false`.
-* Disassembly of unknown instructions now produces `<unknown>` instead of
- nothing at all
-* Changed the format of opcode bytes to match llvm-objdump when disassembling
- RISC-V code with `disassemble`'s `--byte` option.
-
-
-### Changes to lldb-dap
-
-* Breakpoints can now be set for specific columns within a line.
-* Function return value is now displayed on step-out.
-
Changes to BOLT
---------------------------------
diff --git a/llvm/docs/TestingGuide.rst b/llvm/docs/TestingGuide.rst
index b6dda6a..76b6b4e 100644
--- a/llvm/docs/TestingGuide.rst
+++ b/llvm/docs/TestingGuide.rst
@@ -152,12 +152,12 @@ can run the LLVM and Clang tests simultaneously using:
% make check-all
-To run the tests with Valgrind (Memcheck by default), use the ``LIT_ARGS`` make
+To run the tests with Valgrind (Memcheck by default), use the ``LIT_OPTS`` make
variable to pass the required options to lit. For example, you can use:
.. code-block:: bash
- % make check LIT_ARGS="-v --vg --vg-leak"
+ % make check LIT_OPTS="-v --vg --vg-leak"
to enable testing with valgrind and with leak checking enabled.
diff --git a/llvm/docs/tutorial/MyFirstLanguageFrontend/LangImpl04.rst b/llvm/docs/tutorial/MyFirstLanguageFrontend/LangImpl04.rst
index 5660802..5ebff3b 100644
--- a/llvm/docs/tutorial/MyFirstLanguageFrontend/LangImpl04.rst
+++ b/llvm/docs/tutorial/MyFirstLanguageFrontend/LangImpl04.rst
@@ -86,7 +86,7 @@ instead of computing "``x+3``" twice.
Unfortunately, no amount of local analysis will be able to detect and
correct this. This requires two transformations: reassociation of
-expressions (to make the add's lexically identical) and Common
+expressions (to make the adds lexically identical) and Common
Subexpression Elimination (CSE) to delete the redundant add instruction.
Fortunately, LLVM provides a broad range of optimizations that you can
use, in the form of "passes".