Diffstat (limited to 'llvm/docs')
-rw-r--r--  llvm/docs/AMDGPUUsage.rst | 55
-rw-r--r--  llvm/docs/AddingConstrainedIntrinsics.rst | 2
-rw-r--r--  llvm/docs/Atomics.rst | 2
-rw-r--r--  llvm/docs/BranchWeightMetadata.rst | 4
-rw-r--r--  llvm/docs/CIBestPractices.rst | 2
-rw-r--r--  llvm/docs/CommandGuide/llc.rst | 6
-rw-r--r--  llvm/docs/CompileCudaWithLLVM.rst | 4
-rw-r--r--  llvm/docs/Coroutines.rst | 12
-rw-r--r--  llvm/docs/Docker.rst | 4
-rw-r--r--  llvm/docs/Extensions.rst | 4
-rw-r--r--  llvm/docs/FatLTO.rst | 2
-rw-r--r--  llvm/docs/FaultMaps.rst | 4
-rw-r--r--  llvm/docs/GarbageCollection.rst | 2
-rw-r--r--  llvm/docs/GetElementPtr.rst | 4
-rw-r--r--  llvm/docs/GettingInvolved.rst | 2
-rw-r--r--  llvm/docs/GettingStartedVS.rst | 2
-rw-r--r--  llvm/docs/GwpAsan.rst | 2
-rw-r--r--  llvm/docs/HowToBuildWindowsItaniumPrograms.rst | 2
-rw-r--r--  llvm/docs/LangRef.rst | 115
-rw-r--r--  llvm/docs/ReleaseNotes.md | 1
20 files changed, 155 insertions, 76 deletions
diff --git a/llvm/docs/AMDGPUUsage.rst b/llvm/docs/AMDGPUUsage.rst
index 30b22a4..ba0e53b 100644
--- a/llvm/docs/AMDGPUUsage.rst
+++ b/llvm/docs/AMDGPUUsage.rst
@@ -1180,6 +1180,51 @@ is conservatively correct for OpenCL.
other operations within the same address space.
======================= ===================================================
+Target Types
+------------
+
+The AMDGPU backend implements some target extension types.
+
+.. _amdgpu-types-named-barriers:
+
+Named Barriers
+~~~~~~~~~~~~~~
+
+Named barriers are fixed-function hardware barrier objects that are available
+in gfx12.5+ in addition to the traditional default barriers.
+
+In LLVM IR, named barriers are represented by global variables of type
+``target("amdgcn.named.barrier", 0)`` in the LDS address space. Named barrier
+global variables do not occupy actual LDS memory, but their lifetime and
+allocation scope match those of global variables in LDS. Programs in LLVM IR
+refer to named barriers using pointers.
+
+The following named barrier types are supported in global variables, defined
+recursively:
+
+* a single, standalone ``target("amdgcn.named.barrier", 0)``
+* an array of supported types
+* a struct containing a single element of a supported type
+
+.. code-block:: llvm
+
+ @bar = addrspace(3) global target("amdgcn.named.barrier", 0) undef
+ @foo = addrspace(3) global [2 x target("amdgcn.named.barrier", 0)] undef
+ @baz = addrspace(3) global { target("amdgcn.named.barrier", 0) } undef
+
+ ...
+
+ %foo.i = getelementptr [2 x target("amdgcn.named.barrier", 0)], ptr addrspace(3) @foo, i32 0, i32 %i
+ call void @llvm.amdgcn.s.barrier.signal.var(ptr addrspace(3) %foo.i, i32 0)
+
+Named barrier types may not be used in ``alloca``.
+
+Named barriers do not have an underlying byte representation.
+It is undefined behavior to use a pointer to any part of a named barrier object
+as the pointer operand of a regular memory access instruction or intrinsic.
+Pointers to named barrier objects are intended to be used with dedicated
+intrinsics. Reading from or writing to such pointers is undefined behavior.
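+
+For example, a minimal sketch of the distinction (assuming the ``@bar``
+declaration from the example above):
+
+.. code-block:: llvm
+
+   ; UB: named barriers have no byte representation, so ordinary memory
+   ; accesses through a pointer to one are undefined behavior.
+   %v = load i32, ptr addrspace(3) @bar
+
+   ; OK: the pointer may be passed to a dedicated barrier intrinsic.
+   call void @llvm.amdgcn.s.barrier.signal.var(ptr addrspace(3) @bar, i32 0)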
+
LLVM IR Intrinsics
------------------
@@ -2645,7 +2690,7 @@ are deprecated and should not be used.
``vendor_name_size`` and ``architecture_name_size`` are the length of the
vendor and architecture names respectively, including the NUL character.
- ``vendor_and_architecture_name`` contains the NUL terminates string for the
+ ``vendor_and_architecture_name`` contains the NUL terminated string for the
vendor, immediately followed by the NUL terminated string for the
architecture.
@@ -3337,7 +3382,7 @@ location.
If the lane is inactive, but was active on entry to the subprogram, then this is
the program location in the subprogram at which execution of the lane is
-conceptual positioned.
+conceptually positioned.
If the lane was not active on entry to the subprogram, then this will be the
undefined location. A client debugger can check if the lane is part of a valid
@@ -4709,7 +4754,7 @@ same *vendor-name*.
"image", or "pipe". This may be
more restrictive than indicated
by ".access" to reflect what the
- kernel actual does. If not
+ kernel actually does. If not
present then the runtime must
assume what is implied by
".access" and ".is_const" . Values
@@ -5088,7 +5133,7 @@ supported except by flat and scratch instructions in GFX9-GFX11.
The generic address space uses the hardware flat address support available in
GFX7-GFX11. This uses two fixed ranges of virtual addresses (the private and
-local apertures), that are outside the range of addressible global memory, to
+local apertures), that are outside the range of addressable global memory, to
map from a flat address to a private or local address.
FLAT instructions can take a flat address and access global, private (scratch)
@@ -6541,7 +6586,7 @@ Acquire memory ordering is not meaningful on store atomic instructions and is
treated as non-atomic.
Release memory ordering is not meaningful on load atomic instructions and is
-treated a non-atomic.
+treated as non-atomic.
Acquire-release memory ordering is not meaningful on load or store atomic
instructions and is treated as acquire and release respectively.
diff --git a/llvm/docs/AddingConstrainedIntrinsics.rst b/llvm/docs/AddingConstrainedIntrinsics.rst
index bd14f12..41e7dec 100644
--- a/llvm/docs/AddingConstrainedIntrinsics.rst
+++ b/llvm/docs/AddingConstrainedIntrinsics.rst
@@ -31,7 +31,7 @@ node ``FADD`` must be ``STRICT_FADD``.
Update mappings
===============
-Add new record to the mapping of instructions to constrained intrinsic and
+Add new record to the mapping of instructions to constrained intrinsics and
DAG nodes::
include/llvm/IR/ConstrainedOps.def
diff --git a/llvm/docs/Atomics.rst b/llvm/docs/Atomics.rst
index 522aed1..1bcd864 100644
--- a/llvm/docs/Atomics.rst
+++ b/llvm/docs/Atomics.rst
@@ -408,7 +408,7 @@ operations:
MemoryDependencyAnalysis (which is also used by other passes like GVN).
* Folding a load: Any atomic load from a constant global can be constant-folded,
- because it cannot be observed. Similar reasoning allows sroa with
+ because it cannot be observed. Similar reasoning allows SROA with
atomic loads and stores.
Atomics and Codegen
diff --git a/llvm/docs/BranchWeightMetadata.rst b/llvm/docs/BranchWeightMetadata.rst
index 3fa2172..71d7a7d 100644
--- a/llvm/docs/BranchWeightMetadata.rst
+++ b/llvm/docs/BranchWeightMetadata.rst
@@ -92,7 +92,7 @@ The second weight is optional and corresponds to the unwind branch.
If only one weight is set, then it contains the execution count of the call
and used in SamplePGO mode only as described for the call instruction. If both
weights are specified then the second weight contains the count of unwind branch
-taken and the first weights contains the execution count of the call minus
+taken and the first weight contains the execution count of the call minus
the count of unwind branch taken. Both weights specified are used to calculate
BranchProbability as for BranchInst and for SamplePGO the sum of both weights
is used.
@@ -223,7 +223,7 @@ indicates that it was called 2,590 times at runtime.
!1 = !{!"function_entry_count", i64 2590}
If "function_entry_count" has more than 2 operands, the subsequent operands are
-the GUID of the functions that needs to be imported by ThinLTO. This is only
+the GUID of the functions that need to be imported by ThinLTO. This is only
set by sampling-based profile. It is needed because the sampling-based profile
was collected on a binary that had already imported and inlined these functions,
and we need to ensure the IR matches in the ThinLTO backends for profile
diff --git a/llvm/docs/CIBestPractices.rst b/llvm/docs/CIBestPractices.rst
index 855e2cc..a2270da 100644
--- a/llvm/docs/CIBestPractices.rst
+++ b/llvm/docs/CIBestPractices.rst
@@ -146,7 +146,7 @@ for LLVM infrastructure.
Using Fully Qualified Container Names
-------------------------------------
-When referencing container images from a registry, such as in Github Actions
+When referencing container images from a registry, such as in GitHub Actions
workflows, or in ``Dockerfile`` files used for building images, prefer fully
qualified names (i.e., including the registry domain) over just the image.
For example, prefer ``docker.io/ubuntu:24.04`` over ``ubuntu:24.04``. This
diff --git a/llvm/docs/CommandGuide/llc.rst b/llvm/docs/CommandGuide/llc.rst
index cc670f6..ffcccfb 100644
--- a/llvm/docs/CommandGuide/llc.rst
+++ b/llvm/docs/CommandGuide/llc.rst
@@ -129,6 +129,12 @@ End-user Options
Print statistics recorded by code-generation passes.
+.. option:: --save-stats, --save-stats=cwd, --save-stats=obj
+
+ Save LLVM statistics to a file in JSON format, either in the current
+ directory (:option:`--save-stats`/"--save-stats=cwd") or in the directory
+ of the output file ("--save-stats=obj").
+
.. option:: --time-passes
Record the amount of time needed for each pass and print a report to standard
diff --git a/llvm/docs/CompileCudaWithLLVM.rst b/llvm/docs/CompileCudaWithLLVM.rst
index 376d8ee..0bd121a 100644
--- a/llvm/docs/CompileCudaWithLLVM.rst
+++ b/llvm/docs/CompileCudaWithLLVM.rst
@@ -36,7 +36,7 @@ CUDA installation on a handful of common Linux distributions, but in general the
most reliable way to make it work is to install CUDA in a single directory from
NVIDIA's `.run` package and specify its location via `--cuda-path=...` argument.
-CUDA compilation is supported on Linux. Compilation on MacOS and Windows may or
+CUDA compilation is supported on Linux. Compilation on macOS and Windows may or
may not work and currently have no maintainers.
Invoking clang
@@ -64,7 +64,7 @@ brackets as described below:
y[2] = 6
y[3] = 8
-On MacOS, replace `-lcudart_static` with `-lcudart`; otherwise, you may get
+On macOS, replace `-lcudart_static` with `-lcudart`; otherwise, you may get
"CUDA driver version is insufficient for CUDA runtime version" errors when you
run your program.
diff --git a/llvm/docs/Coroutines.rst b/llvm/docs/Coroutines.rst
index 13d2da4..0e6b49c 100644
--- a/llvm/docs/Coroutines.rst
+++ b/llvm/docs/Coroutines.rst
@@ -193,7 +193,7 @@ Values live across a suspend point need to be stored in the coroutine frame to
be available in the continuation function. This frame is stored as a tail to the
`async context`.
-Every suspend point takes an `context projection function` argument which
+Every suspend point takes a `context projection function` argument which
describes how-to obtain the continuations `async context` and every suspend
point has an associated `resume function` denoted by the
`llvm.coro.async.resume` intrinsic. The coroutine is resumed by calling this
@@ -221,7 +221,7 @@ a parameter to the `llvm.coro.suspend.async` intrinsic.
ptr %resume_func_ptr,
ptr %context_projection_function
-The frontend should provide a `async function pointer` struct associated with
+The frontend should provide an `async function pointer` struct associated with
each async coroutine by `llvm.coro.id.async`'s argument. The initial size and
alignment of the `async context` must be provided as arguments to the
`llvm.coro.id.async` intrinsic. Lowering will update the size entry with the
@@ -314,7 +314,7 @@ coroutine handle. The second parameter of `coro.begin` is given a block of memor
to be used if the coroutine frame needs to be allocated dynamically.
The `coro.id`_ intrinsic serves as coroutine identity useful in cases when the
-`coro.begin`_ intrinsic get duplicated by optimization passes such as
+`coro.begin`_ intrinsic gets duplicated by optimization passes such as
jump-threading.
The `cleanup` block destroys the coroutine frame. The `coro.free`_ intrinsic,
@@ -2149,7 +2149,7 @@ CoroEarly
The CoroEarly pass ensures later middle end passes correctly interpret coroutine
semantics and lowers coroutine intrinsics that not needed to be preserved to
help later coroutine passes. This pass lowers `coro.promise`_, `coro.frame`_ and
-`coro.done`_ intrinsics. Afterwards, it replace uses of promise alloca with
+`coro.done`_ intrinsics. Afterwards, it replaces uses of promise alloca with
`coro.promise`_ intrinsic.
.. _CoroSplit:
@@ -2188,7 +2188,7 @@ Attributes
coro_only_destroy_when_complete
-------------------------------
-When the coroutine are marked with coro_only_destroy_when_complete, it indicates
+When the coroutine is marked with coro_only_destroy_when_complete, it indicates
the coroutine must reach the final suspend point when it get destroyed.
This attribute only works for switched-resume coroutines now.
@@ -2199,7 +2199,7 @@ coro_elide_safe
When a Call or Invoke instruction to switch ABI coroutine `f` is marked with
`coro_elide_safe`, CoroSplitPass generates a `f.noalloc` ramp function.
`f.noalloc` has one more argument than its original ramp function `f`, which is
-the pointer to the allocated frame. `f.noalloc` also suppressed any allocations
+the pointer to the allocated frame. `f.noalloc` also suppresses any allocations
or deallocations that may be guarded by `@llvm.coro.alloc` and `@llvm.coro.free`.
CoroAnnotationElidePass performs the heap elision when possible. Note that for
diff --git a/llvm/docs/Docker.rst b/llvm/docs/Docker.rst
index 5f8e619..29078d1f 100644
--- a/llvm/docs/Docker.rst
+++ b/llvm/docs/Docker.rst
@@ -16,7 +16,7 @@ to fill out in order to produce Dockerfiles for a new docker image.
Why?
----
Docker images provide a way to produce binary distributions of
-software inside a controlled environment. Having Dockerfiles to builds docker images
+software inside a controlled environment. Having Dockerfiles to build docker images
inside LLVM repo makes them much more discoverable than putting them into any other
place.
@@ -35,7 +35,7 @@ A snapshot of a docker container filesystem is called a *docker image*.
One can start a container from a prebuilt docker image.
Docker images are built from a so-called *Dockerfile*, a source file written in
-a specialized language that defines instructions to be used when build
+a specialized language that defines instructions to be used when building
the docker image (see `official
documentation <https://docs.docker.com/engine/reference/builder/>`_ for more
details). A minimal Dockerfile typically contains a base image and a number
diff --git a/llvm/docs/Extensions.rst b/llvm/docs/Extensions.rst
index 4bff111..0d7f599 100644
--- a/llvm/docs/Extensions.rst
+++ b/llvm/docs/Extensions.rst
@@ -792,7 +792,7 @@ emission of Variable Length Arrays (VLAs).
The Windows ARM Itanium ABI extends the base ABI by adding support for emitting
a dynamic stack allocation. When emitting a variable stack allocation, a call
to ``__chkstk`` is emitted unconditionally to ensure that guard pages are setup
-properly. The emission of this stack probe emission is handled similar to the
+properly. The emission of this stack probe is handled similarly to the
standard stack probe emission.
The MSVC environment does not emit code for VLAs currently.
@@ -813,7 +813,7 @@ in the following fashion:
sub sp, sp, x15, lsl #4
However, this has the limitation of 256 MiB (±128MiB). In order to accommodate
-larger binaries, LLVM supports the use of ``-mcmodel=large`` to allow a 8GiB
+larger binaries, LLVM supports the use of ``-mcmodel=large`` to allow an 8GiB
(±4GiB) range via a slight deviation. It will generate an indirect jump as
follows:
diff --git a/llvm/docs/FatLTO.rst b/llvm/docs/FatLTO.rst
index 5864944..c883513 100644
--- a/llvm/docs/FatLTO.rst
+++ b/llvm/docs/FatLTO.rst
@@ -38,7 +38,7 @@ This pipeline will:
Previously, we conservatively ran independent pipelines on separate copies
of the LLVM module to generate the bitcode section and the object code,
- which happen to be identical to those used outside of FatLTO. While that
+ which happened to be identical to those used outside of FatLTO. While that
resulted in compiled artifacts that were identical to those produced by the
default and (Thin)LTO pipelines, module cloning led to some cases of
miscompilation, and we have moved away from trying to keep bitcode
diff --git a/llvm/docs/FaultMaps.rst b/llvm/docs/FaultMaps.rst
index a089a38..5dc5e57 100644
--- a/llvm/docs/FaultMaps.rst
+++ b/llvm/docs/FaultMaps.rst
@@ -9,7 +9,7 @@ FaultMaps and implicit checks
Motivation
==========
-Code generated by managed language runtimes tend to have checks that
+Code generated by managed language runtimes tends to have checks that
are required for safety but never fail in practice. In such cases, it
is profitable to make the non-failing case cheaper even if it makes
the failing case significantly more expensive. This asymmetry can be
@@ -28,7 +28,7 @@ the same memory location.
The Fault Map Section
=====================
-Information about implicit checks generated by LLVM are put in a
+Information about implicit checks generated by LLVM is put in a
special "fault map" section. On Darwin this section is named
``__llvm_faultmaps``.
diff --git a/llvm/docs/GarbageCollection.rst b/llvm/docs/GarbageCollection.rst
index 67be080..d5fdfbb 100644
--- a/llvm/docs/GarbageCollection.rst
+++ b/llvm/docs/GarbageCollection.rst
@@ -487,7 +487,7 @@ The 'Erlang' and 'OCaml' GCs
LLVM ships with two example collectors which leverage the ``gcroot``
mechanisms. To our knowledge, these are not actually used by any language
runtime, but they do provide a reasonable starting point for someone interested
-in writing an ``gcroot`` compatible GC plugin. In particular, these are the
+in writing a ``gcroot`` compatible GC plugin. In particular, these are the
only in-tree examples of how to produce a custom binary stack map format using
a ``gcroot`` strategy.
diff --git a/llvm/docs/GetElementPtr.rst b/llvm/docs/GetElementPtr.rst
index 6831a8e..09389a0 100644
--- a/llvm/docs/GetElementPtr.rst
+++ b/llvm/docs/GetElementPtr.rst
@@ -496,10 +496,10 @@ primitive integer expressions, which allows them to be combined with other
integer expressions and/or split into multiple separate integer expressions. If
they've made non-trivial changes, translating back into LLVM IR can involve
reverse-engineering the structure of the addressing in order to fit it into the
-static type of the original first operand. It isn't always possibly to fully
+static type of the original first operand. It isn't always possible to fully
reconstruct this structure; sometimes the underlying addressing doesn't
correspond with the static type at all. In such cases the optimizer instead will
-emit a GEP with the base pointer casted to a simple address-unit pointer, using
+emit a GEP with the base pointer cast to a simple address-unit pointer, using
the name "uglygep". This isn't pretty, but it's just as valid, and it's
sufficient to preserve the pointer aliasing guarantees that GEP provides.
diff --git a/llvm/docs/GettingInvolved.rst b/llvm/docs/GettingInvolved.rst
index 0dba941..ad54434 100644
--- a/llvm/docs/GettingInvolved.rst
+++ b/llvm/docs/GettingInvolved.rst
@@ -562,7 +562,7 @@ An example invite looks as follows
.. code-block:: none
This event is a meetup for all developers of LLDB. Meeting agendas are posted
- on discourse before the event.
+ on Discourse before the event.
Attendees must adhere to the LLVM Code of Conduct
(https://llvm.org/docs/CodeOfConduct.html). For any Code of Conduct reports,
diff --git a/llvm/docs/GettingStartedVS.rst b/llvm/docs/GettingStartedVS.rst
index e65fd8f..b82a4a0 100644
--- a/llvm/docs/GettingStartedVS.rst
+++ b/llvm/docs/GettingStartedVS.rst
@@ -244,7 +244,7 @@ Build the LLVM Suite:
* The Fibonacci project is a sample program that uses the JIT. Modify the
project's debugging properties to provide a numeric command-line argument
or run it from the command line. The program will print the
- corresponding fibonacci value.
+ corresponding Fibonacci value.
Links
diff --git a/llvm/docs/GwpAsan.rst b/llvm/docs/GwpAsan.rst
index 675a61d..937956f 100644
--- a/llvm/docs/GwpAsan.rst
+++ b/llvm/docs/GwpAsan.rst
@@ -31,7 +31,7 @@ Unlike `AddressSanitizer <https://clang.llvm.org/docs/AddressSanitizer.html>`_,
GWP-ASan does not induce a significant performance overhead. ASan often requires
the use of dedicated canaries to be viable in production environments, and as
such is often impractical. Moreover, ASan's runtime is not developed with
-security consideration in mind, making compiled binaries more vulnerable to
+security considerations in mind, making compiled binaries more vulnerable to
exploits.
However, GWP-ASan is only capable of finding a subset of the memory issues
diff --git a/llvm/docs/HowToBuildWindowsItaniumPrograms.rst b/llvm/docs/HowToBuildWindowsItaniumPrograms.rst
index 48ca7b2..d932d9d 100644
--- a/llvm/docs/HowToBuildWindowsItaniumPrograms.rst
+++ b/llvm/docs/HowToBuildWindowsItaniumPrograms.rst
@@ -8,7 +8,7 @@ Introduction
This document contains information describing how to create a Windows Itanium toolchain.
Windows Itanium allows you to deploy Itanium C++ ABI applications on top of the MS VS CRT.
-This environment can use the Windows SDK headers directly and does not required additional
+This environment can use the Windows SDK headers directly and does not require additional
headers or additional runtime machinery (such as is used by mingw).
Windows Itanium Stack:
diff --git a/llvm/docs/LangRef.rst b/llvm/docs/LangRef.rst
index bd0337f..ab085ca 100644
--- a/llvm/docs/LangRef.rst
+++ b/llvm/docs/LangRef.rst
@@ -20368,6 +20368,77 @@ Arguments:
""""""""""
The argument to this intrinsic must be a vector of floating-point values.
+Vector Partial Reduction Intrinsics
+-----------------------------------
+
+Partial reductions of vectors can be expressed using the intrinsics described in
+this section. Each one reduces the concatenation of the two vector arguments
+down to the number of elements of the result vector type.
+
+Other than the reduction operator (e.g., add, fadd), the way in which the
+concatenated arguments are reduced is entirely unspecified. By their nature
+these intrinsics are not expected to be useful in isolation but can instead be
+used to implement the first phase of an overall reduction operation.
+
+The typical use case is loop vectorization, where reductions are split into an
+in-loop phase, in which maintaining an unordered vector result is important for
+performance, and an out-of-loop phase that calculates the final scalar result.
+
+By avoiding the introduction of new ordering constraints, these intrinsics
+enhance the ability to leverage a target's accumulation instructions.
+
+'``llvm.vector.partial.reduce.add.*``' Intrinsic
+^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
+
+Syntax:
+"""""""
+This is an overloaded intrinsic.
+
+::
+
+ declare <4 x i32> @llvm.vector.partial.reduce.add.v4i32.v4i32.v8i32(<4 x i32> %a, <8 x i32> %b)
+ declare <4 x i32> @llvm.vector.partial.reduce.add.v4i32.v4i32.v16i32(<4 x i32> %a, <16 x i32> %b)
+ declare <vscale x 4 x i32> @llvm.vector.partial.reduce.add.nxv4i32.nxv4i32.nxv8i32(<vscale x 4 x i32> %a, <vscale x 8 x i32> %b)
+ declare <vscale x 4 x i32> @llvm.vector.partial.reduce.add.nxv4i32.nxv4i32.nxv16i32(<vscale x 4 x i32> %a, <vscale x 16 x i32> %b)
+
+Arguments:
+""""""""""
+
+The first argument is an integer vector with the same type as the result.
+
+The second argument is a vector whose length is a known integer multiple of
+the result type's length, with the same element type.
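+
+For example, a vectorized sum loop might keep a narrow partial accumulator in
+the loop and reduce it to a scalar afterwards (a sketch; value names are
+illustrative):
+
+.. code-block:: llvm
+
+   loop:
+     %acc = phi <4 x i32> [ zeroinitializer, %entry ], [ %acc.next, %loop ]
+     ; ... load 16 elements into %vec ...
+     %acc.next = call <4 x i32> @llvm.vector.partial.reduce.add.v4i32.v4i32.v16i32(<4 x i32> %acc, <16 x i32> %vec)
+     br i1 %done, label %exit, label %loop
+
+   exit:
+     ; out-of-loop phase: full reduction to the final scalar result
+     %sum = call i32 @llvm.vector.reduce.add.v4i32(<4 x i32> %acc.next)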
+
+'``llvm.vector.partial.reduce.fadd.*``' Intrinsic
+^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
+
+Syntax:
+"""""""
+This is an overloaded intrinsic.
+
+::
+
+ declare <4 x float> @llvm.vector.partial.reduce.fadd.v4f32.v8f32(<4 x float> %a, <8 x float> %b)
+ declare <vscale x 4 x float> @llvm.vector.partial.reduce.fadd.nxv4f32.nxv8f32(<vscale x 4 x float> %a, <vscale x 8 x float> %b)
+
+Arguments:
+""""""""""
+
+The first argument is a floating-point vector with the same type as the result.
+
+The second argument is a vector whose length is a known integer multiple of
+the result type's length, with the same element type.
+
+Semantics:
+""""""""""
+
+As the way in which the arguments to this floating-point intrinsic are reduced
+is unspecified, the reduction may be implemented assuming floating-point
+reassociation and contraction, which may result in variations in the result
+due to reordering or to lowering to different instructions (including
+combining multiple instructions into a single one).
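+
+For example, the in-loop phase might use the partial reduction, with the final
+scalar produced by an unordered full reduction after the loop (a sketch; value
+names are illustrative):
+
+.. code-block:: llvm
+
+   ; in-loop phase: unordered accumulation of 8 elements into 4 partial sums
+   %acc.next = call <4 x float> @llvm.vector.partial.reduce.fadd.v4f32.v8f32(<4 x float> %acc, <8 x float> %vec)
+
+   ; out-of-loop phase: reassociation permits an unordered final reduction
+   %sum = call reassoc float @llvm.vector.reduce.fadd.v4f32(float 0.0, <4 x float> %acc.next)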
+
'``llvm.vector.insert``' Intrinsic
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
@@ -20741,50 +20812,6 @@ Note that it has the following implications:
- If ``%cnt`` is non-zero, the return value is non-zero as well.
- If ``%cnt`` is less than or equal to ``%max_lanes``, the return value is equal to ``%cnt``.
-'``llvm.vector.partial.reduce.add.*``' Intrinsic
-^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
-
-Syntax:
-"""""""
-This is an overloaded intrinsic.
-
-::
-
- declare <4 x i32> @llvm.vector.partial.reduce.add.v4i32.v4i32.v8i32(<4 x i32> %a, <8 x i32> %b)
- declare <4 x i32> @llvm.vector.partial.reduce.add.v4i32.v4i32.v16i32(<4 x i32> %a, <16 x i32> %b)
- declare <vscale x 4 x i32> @llvm.vector.partial.reduce.add.nxv4i32.nxv4i32.nxv8i32(<vscale x 4 x i32> %a, <vscale x 8 x i32> %b)
- declare <vscale x 4 x i32> @llvm.vector.partial.reduce.add.nxv4i32.nxv4i32.nxv16i32(<vscale x 4 x i32> %a, <vscale x 16 x i32> %b)
-
-Overview:
-"""""""""
-
-The '``llvm.vector.partial.reduce.add.*``' intrinsics reduce the
-concatenation of the two vector arguments down to the number of elements of the
-result vector type.
-
-Arguments:
-""""""""""
-
-The first argument is an integer vector with the same type as the result.
-
-The second argument is a vector with a length that is a known integer multiple
-of the result's type, while maintaining the same element type.
-
-Semantics:
-""""""""""
-
-Other than the reduction operator (e.g., add) the way in which the concatenated
-arguments is reduced is entirely unspecified. By their nature these intrinsics
-are not expected to be useful in isolation but instead implement the first phase
-of an overall reduction operation.
-
-The typical use case is loop vectorization where reductions are split into an
-in-loop phase, where maintaining an unordered vector result is important for
-performance, and an out-of-loop phase to calculate the final scalar result.
-
-By avoiding the introduction of new ordering constraints, these intrinsics
-enhance the ability to leverage a target's accumulation instructions.
-
'``llvm.experimental.vector.histogram.*``' Intrinsic
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
diff --git a/llvm/docs/ReleaseNotes.md b/llvm/docs/ReleaseNotes.md
index 23bba99..fd78c97 100644
--- a/llvm/docs/ReleaseNotes.md
+++ b/llvm/docs/ReleaseNotes.md
@@ -182,6 +182,7 @@ Changes to the LLVM tools
* `llvm-readelf` now dumps all hex format values in lower-case mode.
* Some code paths for supporting Python 2.7 in `llvm-lit` have been removed.
* Support for `%T` in lit has been removed.
+* Added a `--save-stats` option to `llc` to save LLVM statistics to a file, compatible with the Clang option.
* `llvm-config` gained a new flag `--quote-paths` which quotes and escapes paths
emitted on stdout, to account for spaces or other special characters in path.