7 files changed, 292 insertions, 153 deletions
diff --git a/llvm/docs/AMDGPUUsage.rst b/llvm/docs/AMDGPUUsage.rst
index d13f95b..c3d4833 100644
--- a/llvm/docs/AMDGPUUsage.rst
+++ b/llvm/docs/AMDGPUUsage.rst
@@ -768,6 +768,9 @@ For example:
                                                   performant than code generated for XNACK replay
                                                   disabled.
 
+     cu-stores       TODO                         On GFX12.5, controls whether ``scope:SCOPE_CU`` stores may be used.
+                                                  If disabled, all stores will be done at ``scope:SCOPE_SE`` or greater.
+
      =============== ============================ ==================================================
 
 .. _amdgpu-target-id:
@@ -5107,7 +5110,9 @@ The fields used by CP for code objects before V3 also match those specified in
                                                      and must be 0,
      >454    1 bit   ENABLE_SGPR_PRIVATE_SEGMENT
                      _SIZE
-     457:455 3 bits                                  Reserved, must be 0.
+     455     1 bit   USES_CU_STORES                  GFX12.5: Whether the ``cu-stores`` target attribute is enabled.
+                                                     If 0, then all stores are ``SCOPE_SE`` or higher.
+     457:456 2 bits                                  Reserved, must be 0.
      458     1 bit   ENABLE_WAVEFRONT_SIZE32         GFX6-GFX9
                                                        Reserved, must be 0.
                                                      GFX10-GFX11
@@ -18188,6 +18193,8 @@ terminated by an ``.end_amdhsa_kernel`` directive.
                                                                                   GFX942)
      ``.amdhsa_user_sgpr_private_segment_size``               0                   GFX6-GFX12   Controls ENABLE_SGPR_PRIVATE_SEGMENT_SIZE in
                                                                                                :ref:`amdgpu-amdhsa-kernel-descriptor-v3-table`.
+     ``.amdhsa_uses_cu_stores``                               0                   GFX12.5      Controls USES_CU_STORES in
+                                                                                               :ref:`amdgpu-amdhsa-kernel-descriptor-v3-table`.
      ``.amdhsa_wavefront_size32``                             Target              GFX10-GFX12  Controls ENABLE_WAVEFRONT_SIZE32 in
                                                               Feature                          :ref:`amdgpu-amdhsa-kernel-descriptor-v3-table`.
                                                               Specific
diff --git a/llvm/docs/CommandGuide/llvm-ir2vec.rst b/llvm/docs/CommandGuide/llvm-ir2vec.rst
index 13fe4996..2f00c9f 100644
--- a/llvm/docs/CommandGuide/llvm-ir2vec.rst
+++ b/llvm/docs/CommandGuide/llvm-ir2vec.rst
@@ -13,17 +13,21 @@ DESCRIPTION
 
 :program:`llvm-ir2vec` is a standalone command-line tool for IR2Vec. It
 generates IR2Vec embeddings for LLVM IR and supports triplet generation 
-for vocabulary training. It provides two main operation modes:
+for vocabulary training. It provides three main operation modes:
 
-1. **Triplet Mode**: Generates triplets (opcode, type, operands) for vocabulary
+1. **Triplet Mode**: Generates numeric triplets in train2id format for vocabulary
    training from LLVM IR.
 
-2. **Embedding Mode**: Generates IR2Vec embeddings using a trained vocabulary
+2. **Entity Mode**: Generates entity mapping files (entity2id.txt) for vocabulary 
+   training.
+
+3. **Embedding Mode**: Generates IR2Vec embeddings using a trained vocabulary
    at different granularity levels (instruction, basic block, or function).
 
 The tool is designed to facilitate machine learning applications that work with
 LLVM IR by converting the IR into numerical representations that can be used by
-ML models.
+ML models. The triplet mode generates numeric IDs directly instead of string 
+triplets, streamlining the training data preparation workflow.
 
 .. note::
 
@@ -34,18 +38,49 @@ ML models.
 OPERATION MODES
 ---------------
 
+Triplet Generation and Entity Mapping Modes are used for preparing
+vocabulary and training data for knowledge graph embeddings. The Embedding Mode
+is used for generating embeddings from LLVM IR using a pre-trained vocabulary.
+
+The Seed Embedding Vocabulary of IR2Vec is trained on a large corpus of LLVM IR
+by modeling the relationships between opcodes, types, and operands as a knowledge
+graph. For this purpose, Triplet Generation and Entity Mapping Modes generate
+triplets and entity mappings in the standard format used for knowledge graph
+embedding training (see 
+<https://github.com/thunlp/OpenKE/tree/OpenKE-PyTorch?tab=readme-ov-file#data-format> 
+for details).
+
+See ``llvm/utils/mlgo-utils/IR2Vec/generateTriplets.py`` for more details on how
+these two modes are used to generate the triplets and entity mappings.
+
 Triplet Generation Mode
 ~~~~~~~~~~~~~~~~~~~~~~~
 
-In triplet mode, :program:`llvm-ir2vec` analyzes LLVM IR and extracts triplets
-consisting of opcodes, types, and operands. These triplets can be used to train
-vocabularies for embedding generation.
+In triplet mode, :program:`llvm-ir2vec` analyzes LLVM IR and extracts numeric
+triplets consisting of opcode IDs, type IDs, and operand IDs. These triplets 
+are generated in the standard format used for knowledge graph embedding training. 
+The tool outputs numeric IDs directly using the ``ir2vec::Vocabulary`` mapping
+infrastructure, eliminating the need for string-to-ID preprocessing.
 
 Usage:
 
 .. code-block:: bash
 
-   llvm-ir2vec --mode=triplets input.bc -o triplets.txt
+   llvm-ir2vec --mode=triplets input.bc -o triplets_train2id.txt
+
+Entity Mapping Generation Mode
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+
+In entity mode, :program:`llvm-ir2vec` generates the entity mappings supported by
+IR2Vec in the standard format used for knowledge graph embedding training. This
+mode outputs all supported entities (opcodes, types, and operands) with their
+corresponding numeric IDs, and is not specific to any LLVM IR file.
+
+Usage:
+
+.. code-block:: bash
+
+   llvm-ir2vec --mode=entities -o entity2id.txt
 
 Embedding Generation Mode
 ~~~~~~~~~~~~~~~~~~~~~~~~~~
@@ -67,6 +102,7 @@ OPTIONS
  Specify the operation mode. Valid values are:
 
  * ``triplets`` - Generate triplets for vocabulary training
+ * ``entities`` - Generate entity mappings for vocabulary training
  * ``embeddings`` - Generate embeddings using trained vocabulary (default)
 
 .. option:: --level=<level>
@@ -115,7 +151,7 @@ OPTIONS
 
    ``--level``, ``--function``, ``--ir2vec-vocab-path``, ``--ir2vec-opc-weight``, 
    ``--ir2vec-type-weight``, and ``--ir2vec-arg-weight`` are only used in embedding 
-   mode. These options are ignored in triplet mode.
+   mode. These options are ignored in triplet and entity modes.
 
 INPUT FILE FORMAT
 -----------------
@@ -129,14 +165,34 @@ OUTPUT FORMAT
 Triplet Mode Output
 ~~~~~~~~~~~~~~~~~~~
 
-In triplet mode, the output consists of lines containing space-separated triplets:
+In triplet mode, the output consists of numeric triplets in train2id format with
+metadata headers. The format includes:
+
+.. code-block:: text
+
+   MAX_RELATIONS=<max_relations_count>
+   <head_entity_id> <tail_entity_id> <relation_id>
+   <head_entity_id> <tail_entity_id> <relation_id>
+   ...
+
+Each line after the metadata header represents one instruction relationship,
+with numeric IDs for head entity, tail entity, and relation. The metadata
+header (MAX_RELATIONS) provides counts for post-processing and training setup.
+
+Entity Mode Output
+~~~~~~~~~~~~~~~~~~
+
+In entity mode, the output consists of entity mappings in the format:
 
 .. code-block:: text
 
-   <opcode> <type> <operand1> <operand2> ...
+   <total_entities>
+   <entity_string>	<numeric_id>
+   <entity_string>	<numeric_id>
+   ...
 
-Each line represents the information of one instruction, with the opcode, type,
-and operands.
+The first line contains the total number of entities, followed by one entity
+mapping per line with tab-separated entity string and numeric ID.
 
 Embedding Mode Output
 ~~~~~~~~~~~~~~~~~~~~~
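The triplet and entity output formats documented in the llvm-ir2vec.rst changes
above can be read back with a few lines of Python. This is a minimal sketch and
not part of the patch: the function names ``parse_entity2id`` and
``parse_train2id`` are hypothetical, and it assumes the files look exactly as the
documentation describes (a count line followed by tab-separated entity string and
numeric ID, and a ``MAX_RELATIONS=`` header followed by space-separated head,
tail, and relation IDs).

```python
import io


def parse_entity2id(stream):
    """Read entity-mode output: a count line, then '<entity>\\t<id>' lines."""
    total = int(stream.readline().strip())
    mapping = {}
    for line in stream:
        line = line.rstrip("\n")
        if not line:
            continue  # tolerate blank lines
        # Split on the last tab so the numeric ID is always the final field.
        name, sep, num = line.rpartition("\t")
        if not sep:
            raise ValueError(f"expected a tab-separated pair, got {line!r}")
        mapping[name] = int(num)
    if len(mapping) != total:
        raise ValueError(f"header says {total} entities, found {len(mapping)}")
    return mapping


def parse_train2id(stream):
    """Read triplet-mode output: MAX_RELATIONS header, then 'head tail relation'."""
    key, sep, value = stream.readline().strip().partition("=")
    if key != "MAX_RELATIONS" or not sep:
        raise ValueError("missing MAX_RELATIONS header")
    max_relations = int(value)
    triplets = []
    for line in stream:
        fields = line.split()
        if not fields:
            continue  # tolerate blank lines
        head, tail, relation = (int(field) for field in fields)
        triplets.append((head, tail, relation))
    return max_relations, triplets
```

Both functions take any text stream, so they work on an opened
``entity2id.txt`` or ``triplets_train2id.txt`` as well as on an ``io.StringIO``
used for testing. The entity names in any sample input are up to the caller;
none are assumed here.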
diff --git a/llvm/docs/HowToCrossCompileBuiltinsOnArm.rst b/llvm/docs/HowToCrossCompileBuiltinsOnArm.rst
index 2e199a0..31ead45 100644
--- a/llvm/docs/HowToCrossCompileBuiltinsOnArm.rst
+++ b/llvm/docs/HowToCrossCompileBuiltinsOnArm.rst
@@ -14,117 +14,113 @@ targets are welcome.
 
 The instructions in this document depend on libraries and programs external to
 LLVM, there are many ways to install and configure these dependencies so you
-may need to adapt the instructions here to fit your own local situation.
+may need to adapt the instructions here to fit your own situation.
 
 Prerequisites
 =============
 
-In this use case we'll be using cmake on a Debian-based Linux system,
-cross-compiling from an x86_64 host to a hard-float Armv7-A target. We'll be
+In this use case we will be using cmake on a Debian-based Linux system,
+cross-compiling from an x86_64 host to a hard-float Armv7-A target. We will be
 using as many of the LLVM tools as we can, but it is possible to use GNU
 equivalents.
 
- * ``A build of LLVM/clang for the llvm-tools and llvm-config``
- * ``A clang executable with support for the ARM target``
- * ``compiler-rt sources``
- * ``The qemu-arm user mode emulator``
- * ``An arm-linux-gnueabihf sysroot``
+You will need:
+ * A build of LLVM for the llvm-tools and ``llvm-config``.
+ * A clang executable with support for the ``ARM`` target.
+ * compiler-rt sources.
+ * The ``qemu-arm`` user mode emulator.
+ * An ``arm-linux-gnueabihf`` sysroot.
 
-In this example we will be using ninja.
+In this example we will be using ``ninja`` as the build tool.
 
-See https://compiler-rt.llvm.org/ for more information about the dependencies
+See https://compiler-rt.llvm.org/ for information about the dependencies
 on clang and LLVM.
 
 See https://llvm.org/docs/GettingStarted.html for information about obtaining
-the source for LLVM and compiler-rt. Note that the getting started guide
-places compiler-rt in the projects subdirectory, but this is not essential and
-if you are using the BaremetalARM.cmake cache for v6-M, v7-M and v7-EM then
-compiler-rt must be placed in the runtimes directory.
+the source for LLVM and compiler-rt.
 
 ``qemu-arm`` should be available as a package for your Linux distribution.
 
-The most complicated of the prerequisites to satisfy is the arm-linux-gnueabihf
+The most complicated of the prerequisites to satisfy is the ``arm-linux-gnueabihf``
 sysroot. In theory it is possible to use the Linux distributions multiarch
 support to fulfill the dependencies for building but unfortunately due to
-/usr/local/include being added some host includes are selected. The easiest way
-to supply a sysroot is to download the arm-linux-gnueabihf toolchain. This can
-be found at:
-* https://developer.arm.com/open-source/gnu-toolchain/gnu-a/downloads for gcc 8 and above
-* https://releases.linaro.org/components/toolchain/binaries/ for gcc 4.9 to 7.3
+``/usr/local/include`` being added some host includes are selected.
+
+The easiest way to supply a sysroot is to download an ``arm-linux-gnueabihf``
+toolchain from https://developer.arm.com/open-source/gnu-toolchain/gnu-a/downloads.
 
 Building compiler-rt builtins for Arm
 =====================================
+
 We will be doing a standalone build of compiler-rt using the following cmake
-options.
-
-* ``path/to/compiler-rt``
-* ``-G Ninja``
-* ``-DCMAKE_AR=/path/to/llvm-ar``
-* ``-DCMAKE_ASM_COMPILER_TARGET="arm-linux-gnueabihf"``
-* ``-DCMAKE_ASM_FLAGS="build-c-flags"``
-* ``-DCMAKE_C_COMPILER=/path/to/clang``
-* ``-DCMAKE_C_COMPILER_TARGET="arm-linux-gnueabihf"``
-* ``-DCMAKE_C_FLAGS="build-c-flags"``
-* ``-DCMAKE_EXE_LINKER_FLAGS="-fuse-ld=lld"``
-* ``-DCMAKE_NM=/path/to/llvm-nm``
-* ``-DCMAKE_RANLIB=/path/to/llvm-ranlib``
-* ``-DCOMPILER_RT_BUILD_BUILTINS=ON``
-* ``-DCOMPILER_RT_BUILD_LIBFUZZER=OFF``
-* ``-DCOMPILER_RT_BUILD_MEMPROF=OFF``
-* ``-DCOMPILER_RT_BUILD_PROFILE=OFF``
-* ``-DCOMPILER_RT_BUILD_SANITIZERS=OFF``
-* ``-DCOMPILER_RT_BUILD_XRAY=OFF``
-* ``-DCOMPILER_RT_DEFAULT_TARGET_ONLY=ON``
-* ``-DLLVM_CONFIG_PATH=/path/to/llvm-config``
+options::
+
+  cmake path/to/compiler-rt \
+    -G Ninja \
+    -DCMAKE_AR=/path/to/llvm-ar \
+    -DCMAKE_ASM_COMPILER_TARGET="arm-linux-gnueabihf" \
+    -DCMAKE_ASM_FLAGS="build-c-flags" \
+    -DCMAKE_C_COMPILER=/path/to/clang \
+    -DCMAKE_C_COMPILER_TARGET="arm-linux-gnueabihf" \
+    -DCMAKE_C_FLAGS="build-c-flags" \
+    -DCMAKE_EXE_LINKER_FLAGS="-fuse-ld=lld" \
+    -DCMAKE_NM=/path/to/llvm-nm \
+    -DCMAKE_RANLIB=/path/to/llvm-ranlib \
+    -DCOMPILER_RT_BUILD_BUILTINS=ON \
+    -DCOMPILER_RT_BUILD_LIBFUZZER=OFF \
+    -DCOMPILER_RT_BUILD_MEMPROF=OFF \
+    -DCOMPILER_RT_BUILD_PROFILE=OFF \
+    -DCOMPILER_RT_BUILD_SANITIZERS=OFF \
+    -DCOMPILER_RT_BUILD_XRAY=OFF \
+    -DCOMPILER_RT_DEFAULT_TARGET_ONLY=ON \
+    -DLLVM_CONFIG_PATH=/path/to/llvm-config
 
 The ``build-c-flags`` need to be sufficient to pass the C-make compiler check,
 compile compiler-rt, and if you are running the tests, compile and link the
 tests. When cross-compiling with clang we will need to pass sufficient
-information to generate code for the Arm architecture we are targeting. We will
-need to select the Arm target, select the Armv7-A architecture and choose
-between using Arm or Thumb.
-instructions. For example:
+information to generate code for the Arm architecture we are targeting.
 
-* ``--target=arm-linux-gnueabihf``
-* ``-march=armv7a``
-* ``-mthumb``
+We will need to select:
+ * The Arm target and Armv7-A architecture with ``--target=arm-linux-gnueabihf -march=armv7a``.
+ * Whether to generate Arm (the default) or Thumb instructions (``-mthumb``).
 
-When using a GCC arm-linux-gnueabihf toolchain the following flags are
+When using a GCC ``arm-linux-gnueabihf`` toolchain the following flags are
 needed to pick up the includes and libraries:
 
-* ``--gcc-toolchain=/path/to/dir/toolchain``
-* ``--sysroot=/path/to/toolchain/arm-linux-gnueabihf/libc``
+ * ``--gcc-toolchain=/path/to/dir/toolchain``
+ * ``--sysroot=/path/to/toolchain/arm-linux-gnueabihf/libc``
 
 In this example we will be adding all of the command line options to both
 ``CMAKE_C_FLAGS`` and ``CMAKE_ASM_FLAGS``. There are cmake flags to pass some of
-these options individually which can be used to simplify the ``build-c-flags``:
+these options individually which can be used to simplify the ``build-c-flags``::
 
-* ``-DCMAKE_C_COMPILER_TARGET="arm-linux-gnueabihf"``
-* ``-DCMAKE_ASM_COMPILER_TARGET="arm-linux-gnueabihf"``
-* ``-DCMAKE_C_COMPILER_EXTERNAL_TOOLCHAIN=/path/to/dir/toolchain``
-* ``-DCMAKE_SYSROOT=/path/to/dir/toolchain/arm-linux-gnueabihf/libc``
+ -DCMAKE_C_COMPILER_TARGET="arm-linux-gnueabihf"
+ -DCMAKE_ASM_COMPILER_TARGET="arm-linux-gnueabihf"
+ -DCMAKE_C_COMPILER_EXTERNAL_TOOLCHAIN=/path/to/dir/toolchain
+ -DCMAKE_SYSROOT=/path/to/dir/toolchain/arm-linux-gnueabihf/libc
 
 Once cmake has completed the builtins can be built with ``ninja builtins``
 
 Testing compiler-rt builtins using qemu-arm
 ===========================================
+
 To test the builtins library we need to add a few more cmake flags to enable
 testing and set up the compiler and flags for test case. We must also tell
-cmake that we wish to run the tests on ``qemu-arm``.
+cmake that we wish to run the tests on ``qemu-arm``::
 
-* ``-DCOMPILER_RT_EMULATOR="qemu-arm -L /path/to/armhf/sysroot``
-* ``-DCOMPILER_RT_INCLUDE_TESTS=ON``
-* ``-DCOMPILER_RT_TEST_COMPILER="/path/to/clang"``
-* ``-DCOMPILER_RT_TEST_COMPILER_CFLAGS="test-c-flags"``
+ -DCOMPILER_RT_EMULATOR="qemu-arm -L /path/to/armhf/sysroot"
+ -DCOMPILER_RT_INCLUDE_TESTS=ON
+ -DCOMPILER_RT_TEST_COMPILER="/path/to/clang"
+ -DCOMPILER_RT_TEST_COMPILER_CFLAGS="test-c-flags"
 
 The ``/path/to/armhf/sysroot`` should be the same as the one passed to
-``--sysroot`` in the "build-c-flags".
+``--sysroot`` in the ``build-c-flags``.
 
-The "test-c-flags" need to include the target, architecture, gcc-toolchain,
-sysroot and arm/thumb state. The additional cmake defines such as
+The ``test-c-flags`` need to include the target, architecture, gcc-toolchain,
+sysroot and Arm/Thumb state. The additional cmake defines such as
 ``CMAKE_C_COMPILER_EXTERNAL_TOOLCHAIN`` do not apply when building the tests. If
-you have put all of these in "build-c-flags" then these can be repeated. If you
-wish to use lld to link the tests then add ``"-fuse-ld=lld``.
+you have put all of these in ``build-c-flags`` then these can be repeated. If you
+wish to use lld to link the tests then add ``-fuse-ld=lld``.
 
 Once cmake has completed the tests can be built and run using
 ``ninja check-builtins``
@@ -142,19 +138,21 @@ This stage can often fail at link time if the ``--sysroot=`` and
 ``CMAKE_C_FLAGS`` and ``CMAKE_C_COMPILER_TARGET`` flags.
 
 It can be useful to build a simple example outside of cmake with your toolchain
-to make sure it is working. For example: ``clang --target=arm-linux-gnueabi -march=armv7a --gcc-toolchain=/path/to/gcc-toolchain --sysroot=/path/to/gcc-toolchain/arm-linux-gnueabihf/libc helloworld.c``
+to make sure it is working. For example::
+
+  clang --target=arm-linux-gnueabihf -march=armv7a --gcc-toolchain=/path/to/gcc-toolchain --sysroot=/path/to/gcc-toolchain/arm-linux-gnueabihf/libc helloworld.c
 
 Clang uses the host header files
 --------------------------------
 On debian based systems it is possible to install multiarch support for
-arm-linux-gnueabi and arm-linux-gnueabihf. In many cases clang can successfully
+``arm-linux-gnueabi`` and ``arm-linux-gnueabihf``. In many cases clang can successfully
 use this multiarch support when ``--gcc-toolchain=`` and ``--sysroot=`` are not supplied.
 Unfortunately clang adds ``/usr/local/include`` before
 ``/usr/include/arm-linux-gnueabihf`` leading to errors when compiling the hosts
 header files.
 
 The multiarch support is not sufficient to build the builtins you will need to
-use a separate arm-linux-gnueabihf toolchain.
+use a separate ``arm-linux-gnueabihf`` toolchain.
 
 No target passed to clang
 -------------------------
@@ -164,12 +162,13 @@ as ``error: unknown directive .syntax unified``.
 
 You can check the clang invocation in the error message to see if there is no
 ``--target`` or if it is set incorrectly. The cause is usually
-``CMAKE_ASM_FLAGS`` not containing ``--target`` or ``CMAKE_ASM_COMPILER_TARGET`` not being present.
+``CMAKE_ASM_FLAGS`` not containing ``--target`` or ``CMAKE_ASM_COMPILER_TARGET``
+not being present.
 
 Arm architecture not given
 --------------------------
-The ``--target=arm-linux-gnueabihf`` will default to arm architecture v4t which
-cannot assemble the barrier instructions used in the synch_and_fetch source
+The ``--target=arm-linux-gnueabihf`` will default to Arm architecture v4t which
+cannot assemble the barrier instructions used in the ``synch_and_fetch`` source
 files.
 
 The cause is usually a missing ``-march=armv7a`` from the ``CMAKE_ASM_FLAGS``.
@@ -202,7 +201,7 @@ may need extra c-flags such as ``-mfloat-abi=softfp`` for use of floating-point
 instructions, and ``-mfloat-abi=soft -mfpu=none`` for software floating-point
 emulation.
 
-You will need to use an arm-linux-gnueabi GNU toolchain for soft-float.
+You will need to use an ``arm-linux-gnueabi`` GNU toolchain for soft-float.
 
 AArch64 Target
 --------------
@@ -220,8 +219,12 @@ Armv6-m, Armv7-m and Armv7E-M targets
 To build and test the libraries using a similar method to Armv7-A is possible
 but more difficult. The main problems are:
 
-* There isn't a ``qemu-arm`` user-mode emulator for bare-metal systems. The ``qemu-system-arm`` can be used but this is significantly more difficult to setup.
-* The targets to compile compiler-rt have the suffix -none-eabi. This uses the BareMetal driver in clang and by default won't find the libraries needed to pass the cmake compiler check.
+* There is no ``qemu-arm`` user-mode emulator for bare-metal systems.
+  ``qemu-system-arm`` can be used but this is significantly more difficult
+  to set up.
+* The targets to compile compiler-rt have the suffix ``-none-eabi``. This uses
+  the BareMetal driver in clang and by default will not find the libraries
+  needed to pass the cmake compiler check.
 
 As the Armv6-M, Armv7-M and Armv7E-M builds of compiler-rt only use instructions
 that are supported on Armv7-A we can still get most of the value of running the
@@ -233,32 +236,30 @@ builtins use instructions that are supported on Armv7-A but not Armv6-M,
 Armv7-M and Armv7E-M.
 
 To get the cmake compile test to pass you will need to pass the libraries
-needed to successfully link the cmake test via ``CMAKE_CFLAGS``. It is
-strongly recommended that you use version 3.6 or above of cmake so you can use
-``CMAKE_TRY_COMPILE_TARGET=STATIC_LIBRARY`` to skip the link step.
-
-* ``-DCMAKE_TRY_COMPILE_TARGET_TYPE=STATIC_LIBRARY``
-* ``-DCOMPILER_RT_OS_DIR="baremetal"``
-* ``-DCOMPILER_RT_BUILD_BUILTINS=ON``
-* ``-DCOMPILER_RT_BUILD_SANITIZERS=OFF``
-* ``-DCOMPILER_RT_BUILD_XRAY=OFF``
-* ``-DCOMPILER_RT_BUILD_LIBFUZZER=OFF``
-* ``-DCOMPILER_RT_BUILD_PROFILE=OFF``
-* ``-DCMAKE_C_COMPILER=${host_install_dir}/bin/clang``
-* ``-DCMAKE_C_COMPILER_TARGET="your *-none-eabi target"``
-* ``-DCMAKE_ASM_COMPILER_TARGET="your *-none-eabi target"``
-* ``-DCMAKE_AR=/path/to/llvm-ar``
-* ``-DCMAKE_NM=/path/to/llvm-nm``
-* ``-DCMAKE_RANLIB=/path/to/llvm-ranlib``
-* ``-DCOMPILER_RT_BAREMETAL_BUILD=ON``
-* ``-DCOMPILER_RT_DEFAULT_TARGET_ONLY=ON``
-* ``-DLLVM_CONFIG_PATH=/path/to/llvm-config``
-* ``-DCMAKE_C_FLAGS="build-c-flags"``
-* ``-DCMAKE_ASM_FLAGS="build-c-flags"``
-* ``-DCOMPILER_RT_EMULATOR="qemu-arm -L /path/to/armv7-A/sysroot"``
-* ``-DCOMPILER_RT_INCLUDE_TESTS=ON``
-* ``-DCOMPILER_RT_TEST_COMPILER="/path/to/clang"``
-* ``-DCOMPILER_RT_TEST_COMPILER_CFLAGS="test-c-flags"``
+needed to successfully link the cmake test via ``CMAKE_C_FLAGS``::
+
+ -DCMAKE_TRY_COMPILE_TARGET_TYPE=STATIC_LIBRARY \
+ -DCOMPILER_RT_OS_DIR="baremetal" \
+ -DCOMPILER_RT_BUILD_BUILTINS=ON \
+ -DCOMPILER_RT_BUILD_SANITIZERS=OFF \
+ -DCOMPILER_RT_BUILD_XRAY=OFF \
+ -DCOMPILER_RT_BUILD_LIBFUZZER=OFF \
+ -DCOMPILER_RT_BUILD_PROFILE=OFF \
+ -DCMAKE_C_COMPILER=${host_install_dir}/bin/clang \
+ -DCMAKE_C_COMPILER_TARGET="your *-none-eabi target" \
+ -DCMAKE_ASM_COMPILER_TARGET="your *-none-eabi target" \
+ -DCMAKE_AR=/path/to/llvm-ar \
+ -DCMAKE_NM=/path/to/llvm-nm \
+ -DCMAKE_RANLIB=/path/to/llvm-ranlib \
+ -DCOMPILER_RT_BAREMETAL_BUILD=ON \
+ -DCOMPILER_RT_DEFAULT_TARGET_ONLY=ON \
+ -DLLVM_CONFIG_PATH=/path/to/llvm-config \
+ -DCMAKE_C_FLAGS="build-c-flags" \
+ -DCMAKE_ASM_FLAGS="build-c-flags" \
+ -DCOMPILER_RT_EMULATOR="qemu-arm -L /path/to/armv7-A/sysroot" \
+ -DCOMPILER_RT_INCLUDE_TESTS=ON \
+ -DCOMPILER_RT_TEST_COMPILER="/path/to/clang" \
+ -DCOMPILER_RT_TEST_COMPILER_CFLAGS="test-c-flags"
 
 The Armv6-M builtins will use the soft-float ABI. When compiling the tests for
 Armv7-A we must include ``"-mthumb -mfloat-abi=soft -mfpu=none"`` in the
@@ -267,25 +268,21 @@ test-c-flags. We must use an Armv7-A soft-float abi sysroot for ``qemu-arm``.
 Depending on the linker used for the test cases you may encounter BuildAttribute
 mismatches between the M-profile objects from compiler-rt and the A-profile
 objects from the test. The lld linker does not check the profile
-BuildAttribute so it can be used to link the tests by adding -fuse-ld=lld to the
+BuildAttribute so it can be used to link the tests by adding ``-fuse-ld=lld`` to the
 ``COMPILER_RT_TEST_COMPILER_CFLAGS``.
 
 Alternative using a cmake cache
 -------------------------------
 If you wish to build, but not test compiler-rt for Armv6-M, Armv7-M or Armv7E-M
-the easiest way is to use the BaremetalARM.cmake recipe in clang/cmake/caches.
-
-You will need a bare metal sysroot such as that provided by the GNU ARM
-Embedded toolchain.
-
-The libraries can be built with the cmake options:
+the easiest way is to use the ``BaremetalARM.cmake`` recipe in ``clang/cmake/caches``.
 
-* ``-DBAREMETAL_ARMV6M_SYSROOT=/path/to/bare/metal/toolchain/arm-none-eabi``
-* ``-DBAREMETAL_ARMV7M_SYSROOT=/path/to/bare/metal/toolchain/arm-none-eabi``
-* ``-DBAREMETAL_ARMV7EM_SYSROOT=/path/to/bare/metal/toolchain/arm-none-eabi``
-* ``-C /path/to/llvm/source/tools/clang/cmake/caches/BaremetalARM.cmake``
-* ``/path/to/llvm``
+You will need a bare metal sysroot such as that provided by the GNU ARM Embedded
+toolchain.
 
-**Note** that for the recipe to work the compiler-rt source must be checked out
-into the directory llvm/runtimes. You will also need clang and lld checked out.
+The libraries can be built with the cmake options::
 
+ -DBAREMETAL_ARMV6M_SYSROOT=/path/to/bare/metal/toolchain/arm-none-eabi \
+ -DBAREMETAL_ARMV7M_SYSROOT=/path/to/bare/metal/toolchain/arm-none-eabi \
+ -DBAREMETAL_ARMV7EM_SYSROOT=/path/to/bare/metal/toolchain/arm-none-eabi \
+ -C /path/to/llvm/source/tools/clang/cmake/caches/BaremetalARM.cmake \
+ /path/to/llvm
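The Armv7-A builtins configuration described in the HowToCrossCompileBuiltinsOnArm.rst
changes above passes the same ``build-c-flags`` string to both ``CMAKE_C_FLAGS``
and ``CMAKE_ASM_FLAGS``. The following is a minimal sketch of assembling that
command line in Python, not part of the patch. The helper name
``builtins_cmake_command`` and its parameters are hypothetical, and it assumes
that the LLVM tools and ``llvm-config`` live in one ``bin`` directory and that
the sysroot sits at ``<toolchain>/arm-linux-gnueabihf/libc`` as in the document's
example paths.

```python
def builtins_cmake_command(compiler_rt_src, llvm_bin, toolchain, thumb=False):
    """Return the cmake argument list for a standalone Armv7-A builtins build."""
    target = "arm-linux-gnueabihf"
    c_flags = [
        f"--target={target}",
        "-march=armv7a",
        f"--gcc-toolchain={toolchain}",
        f"--sysroot={toolchain}/{target}/libc",
    ]
    if thumb:
        c_flags.append("-mthumb")  # Arm instructions are the default
    # The same flag string is used for C and assembly sources.
    build_c_flags = " ".join(c_flags)
    return [
        "cmake", compiler_rt_src,
        "-G", "Ninja",
        f"-DCMAKE_AR={llvm_bin}/llvm-ar",
        f"-DCMAKE_ASM_COMPILER_TARGET={target}",
        f"-DCMAKE_ASM_FLAGS={build_c_flags}",
        f"-DCMAKE_C_COMPILER={llvm_bin}/clang",
        f"-DCMAKE_C_COMPILER_TARGET={target}",
        f"-DCMAKE_C_FLAGS={build_c_flags}",
        "-DCMAKE_EXE_LINKER_FLAGS=-fuse-ld=lld",
        f"-DCMAKE_NM={llvm_bin}/llvm-nm",
        f"-DCMAKE_RANLIB={llvm_bin}/llvm-ranlib",
        "-DCOMPILER_RT_BUILD_BUILTINS=ON",
        "-DCOMPILER_RT_BUILD_LIBFUZZER=OFF",
        "-DCOMPILER_RT_BUILD_MEMPROF=OFF",
        "-DCOMPILER_RT_BUILD_PROFILE=OFF",
        "-DCOMPILER_RT_BUILD_SANITIZERS=OFF",
        "-DCOMPILER_RT_BUILD_XRAY=OFF",
        "-DCOMPILER_RT_DEFAULT_TARGET_ONLY=ON",
        f"-DLLVM_CONFIG_PATH={llvm_bin}/llvm-config",
    ]
```

Returning a list rather than a single string means the result can be handed to
``subprocess.run`` without shell quoting, which matters because the flag values
contain spaces.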
diff --git a/llvm/docs/LangRef.rst b/llvm/docs/LangRef.rst
index bac13cc..527abc4 100644
--- a/llvm/docs/LangRef.rst
+++ b/llvm/docs/LangRef.rst
@@ -413,6 +413,8 @@ added in the future:
     - On AArch64 the callee preserves all general purpose registers, except
       X0-X8 and X16-X18. Not allowed with ``nest``.
 
+    - On RISC-V the callee preserves x5-x31, except x6, x7 and x28.
+
     The idea behind this convention is to support calls to runtime functions
     that have a hot path and a cold path. The hot path is usually a small piece
     of code that doesn't use many registers. The cold path might need to call out to
@@ -7958,6 +7960,67 @@ The attributes in this metadata is added to all followup loops of the
 loop distribution pass. See
 :ref:`Transformation Metadata <transformation-metadata>` for details.
 
+'``llvm.loop.estimated_trip_count``' Metadata
+^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
+
+This metadata records an estimated trip count for the loop.  The first operand
+is the string ``llvm.loop.estimated_trip_count``.  The second operand is an
+integer constant of type ``i32`` or smaller specifying the count, which might be
+omitted for the reasons described below.  For example:
+
+.. code-block:: llvm
+
+   !0 = !{!"llvm.loop.estimated_trip_count", i32 8}
+   !1 = !{!"llvm.loop.estimated_trip_count"}
+
+Purpose
+"""""""
+
+A loop's estimated trip count is an estimate of the average number of loop
+iterations (specifically, the number of times the loop's header executes) each
+time execution reaches the loop.  It is usually only an estimate based on, for
+example, profile data.  The actual number of iterations might vary widely.
+
+The estimated trip count serves as a parameter for various loop transformations
+and typically helps estimate transformation cost.  For example, it can help
+determine how many iterations to peel or how aggressively to unroll.
+
+Initialization and Maintenance
+""""""""""""""""""""""""""""""
+
+The ``pgo-estimate-trip-counts`` pass typically runs immediately after profile
+ingestion to add this metadata to all loops.  It estimates each loop's trip
+count from the loop's ``branch_weights`` metadata.  This way of initially
+estimating trip counts appears to be useful for the passes that consume them.
+
+As passes transform existing loops and create new loops, they must be free to
+update and create ``branch_weights`` metadata to maintain accurate block
+frequencies.  Trip counts estimated from this new ``branch_weights`` metadata
+are not necessarily useful to the passes that consume them.  In general, when
+passes transform and create loops, they should separately estimate new trip
+counts from previously estimated trip counts, and they should record them by
+creating or updating this metadata.  For this or any other work involving
+estimated trip counts, passes should always call
+``llvm::getLoopEstimatedTripCount`` and ``llvm::setLoopEstimatedTripCount``.
+
+Missing Metadata and Values
+"""""""""""""""""""""""""""
+
+If the current implementation of ``pgo-estimate-trip-counts`` cannot estimate a
+trip count from the loop's ``branch_weights`` metadata due to the loop's form or
+due to missing profile data, it creates this metadata for the loop but omits the
+value.  This situation is currently common (e.g., the LLVM IR loop that Clang
+emits for a simple C ``for`` loop).  A later pass (e.g., ``loop-rotate``) might
+modify the loop's form in a way that enables estimating its trip count even if
+those modifications provably never impact the actual number of loop iterations.
+That later pass should then add an appropriate value to the metadata.
+
+However, not all such passes currently do so.  Thus, if this metadata has no
+value, ``llvm::getLoopEstimatedTripCount`` will disregard it and estimate the
+trip count from the loop's ``branch_weights`` metadata.  It does the same when
+the metadata is missing altogether, perhaps because ``pgo-estimate-trip-counts``
+was not specified in a minimal pass list to a tool like ``opt``.
+
 '``llvm.licm.disable``' Metadata
 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
 
@@ -26653,9 +26716,10 @@ object's lifetime.
 Arguments:
 """"""""""
 
-The first argument is a constant integer representing the size of the
-object, or -1 if it is variable sized. The second argument is a pointer
-to an ``alloca`` instruction.
+The first argument is a constant integer, which is ignored and will be removed
+in the future.
+
+The second argument is a pointer to an ``alloca`` instruction.
 
 Semantics:
 """"""""""
@@ -26693,9 +26757,10 @@ The '``llvm.lifetime.end``' intrinsic specifies the end of a
 Arguments:
 """"""""""
 
-The first argument is a constant integer representing the size of the
-object, or -1 if it is variable sized. The second argument is a pointer
-to an ``alloca`` instruction.
+The first argument is a constant integer, which is ignored and will be removed
+in the future.
+
+The second argument is a pointer to an ``alloca`` instruction.
 
 Semantics:
 """"""""""
diff --git a/llvm/docs/ReleaseNotes.md b/llvm/docs/ReleaseNotes.md
index 021f321..0c49fc8 100644
--- a/llvm/docs/ReleaseNotes.md
+++ b/llvm/docs/ReleaseNotes.md
@@ -137,6 +137,9 @@ Changes to the LLVM tools
 Changes to LLDB
 ---------------------------------
 
+* LLDB can now set breakpoints, show backtraces, and display variables when
+  debugging Wasm with supported runtimes (WAMR and V8).
+
 Changes to BOLT
 ---------------------------------
 
diff --git a/llvm/docs/TableGen/BackGuide.rst b/llvm/docs/TableGen/BackGuide.rst
index 4828f9b..83f8f470 100644
--- a/llvm/docs/TableGen/BackGuide.rst
+++ b/llvm/docs/TableGen/BackGuide.rst
@@ -191,7 +191,7 @@ Some of these classes have additional members that
 are described in the following subsections.
 
 *All* of the classes derived from ``RecTy`` provide the ``get()`` function.
-It returns an instance of ``Recty`` corresponding to the derived class.
+It returns an instance of ``RecTy`` corresponding to the derived class.
 Some of the ``get()`` functions require an argument to
 specify which particular variant of the type is desired. These arguments are
 described in the following subsections.
@@ -354,12 +354,12 @@ The class provides many additional functions:
 * Functions to determine whether there are any operands and to get the
   number of operands.
 
-* Functions to the get the operands, both individually and together.
+* Functions to get the operands, both individually and together.
 
 * Functions to determine whether there are any names and to
   get the number of names
 
-* Functions to the get the names, both individually and together.
+* Functions to get the names, both individually and together.
 
 * Functions to get the operand iterator ``begin()`` and ``end()`` values.
 
@@ -605,7 +605,7 @@ null if the field does not exist.
 
 The field is assumed to have another record as its value. That record is returned
 as a pointer to a ``Record``. If the field does not exist or is unset, the
-functions returns null.
+function returns null.
 
 Getting Record Superclasses
 ===========================
diff --git a/llvm/docs/TableGen/ProgRef.rst b/llvm/docs/TableGen/ProgRef.rst
index 7b30698..2b1af05 100644
--- a/llvm/docs/TableGen/ProgRef.rst
+++ b/llvm/docs/TableGen/ProgRef.rst
@@ -219,17 +219,17 @@ TableGen provides "bang operators" that have a wide variety of uses:
 
 .. productionlist::
    BangOperator: one of
-               : !add         !and         !cast        !con         !dag
-               : !div         !empty       !eq          !exists      !filter
-               : !find        !foldl       !foreach     !ge          !getdagarg
-               : !getdagname  !getdagop    !gt          !head        !if
-               : !initialized !instances   !interleave  !isa         !le
-               : !listconcat  !listflatten !listremove  !listsplat   !logtwo
-               : !lt          !match       !mul         !ne          !not
-               : !or          !range       !repr        !setdagarg   !setdagname
-               : !setdagop    !shl         !size        !sra         !srl
-               : !strconcat   !sub         !subst       !substr      !tail
-               : !tolower     !toupper     !xor
+               : !add         !and         !cast         !con         !dag
+               : !div         !empty       !eq           !exists      !filter
+               : !find        !foldl       !foreach      !ge          !getdagarg
+               : !getdagname  !getdagop    !getdagopname !gt          !head
+               : !if          !initialized !instances    !interleave  !isa
+               : !le          !listconcat  !listflatten  !listremove  !listsplat
+               : !logtwo      !lt          !match        !mul         !ne
+               : !not         !or          !range        !repr        !setdagarg
+               : !setdagname  !setdagop    !setdagopname !shl         !size
+               : !sra         !srl         !strconcat    !sub         !subst
+               : !substr      !tail        !tolower      !toupper     !xor
 
 The ``!cond`` operator has a slightly different
 syntax compared to other bang operators, so it is defined separately:
@@ -1443,7 +1443,8 @@ DAG.
 
 The following bang operators are useful for working with DAGs:
 ``!con``, ``!dag``, ``!empty``, ``!foreach``, ``!getdagarg``, ``!getdagname``,
-``!getdagop``, ``!setdagarg``, ``!setdagname``, ``!setdagop``, ``!size``.
+``!getdagop``, ``!getdagopname``, ``!setdagarg``, ``!setdagname``, ``!setdagop``,
+``!setdagopname``, ``!size``.
 
 Defvar in a record body
 -----------------------
@@ -1695,9 +1696,11 @@ and non-0 as true.
     This operator concatenates the DAG nodes *a*, *b*, etc. Their operations
     must equal.
 
-    ``!con((op a1:$name1, a2:$name2), (op b1:$name3))``
+    ``!con((op:$lhs a1:$name1, a2:$name2), (op:$rhs b1:$name3))``
 
-    results in the DAG node ``(op a1:$name1, a2:$name2, b1:$name3)``.
+    results in the DAG node ``(op:$lhs a1:$name1, a2:$name2, b1:$name3)``.
+    The name of the DAG operator is taken from the LHS DAG node if it is
+    set; otherwise it is taken from the RHS DAG node.
 
 ``!cond(``\ *cond1* ``:`` *val1*\ ``,`` *cond2* ``:`` *val2*\ ``, ...,`` *condn* ``:`` *valn*\ ``)``
     This operator tests *cond1* and returns *val1* if the result is true.
@@ -1819,6 +1822,10 @@ and non-0 as true.
 
       dag d = !dag(!getdagop(someDag), args, names);
 
+``!getdagopname(``\ *dag*\ ``)``
+    This operator retrieves the name of the given *dag* operator. If the operator
+    has no associated name, ``?`` is returned.
+
 ``!gt(``\ *a*\ `,` *b*\ ``)``
     This operator produces 1 if *a* is greater than *b*; 0 otherwise.
     The arguments must be ``bit``, ``bits``, ``int``, or ``string`` values.
@@ -1949,6 +1956,10 @@ and non-0 as true.
 
     Example: ``!setdagop((foo 1, 2), bar)`` results in ``(bar 1, 2)``.
 
+``!setdagopname(``\ *dag*\ ``,`` *name*\ ``)``
+    This operator produces a DAG node with the same operator and arguments as
+    *dag*, but with the name of the operator replaced by *name*.
+
 ``!shl(``\ *a*\ ``,`` *count*\ ``)``
     This operator shifts *a* left logically by *count* bits and produces the resulting
     value. The operation is performed on a 64-bit integer; the result