43 files changed, 621 insertions, 429 deletions
diff --git a/llvm/docs/AMDGPUDwarfExtensionsForHeterogeneousDebugging.rst b/llvm/docs/AMDGPUDwarfExtensionsForHeterogeneousDebugging.rst
index 95ae4f7..ba670d3 100644
--- a/llvm/docs/AMDGPUDwarfExtensionsForHeterogeneousDebugging.rst
+++ b/llvm/docs/AMDGPUDwarfExtensionsForHeterogeneousDebugging.rst
@@ -1187,7 +1187,7 @@ There are five kinds of location storage:
   operations. It would specify the debugger information entry and byte offset
   provided by the operations.
 
-*Location descriptions are a language independent representation of addressing
+*Location descriptions are a language-independent representation of addressing
 rules.*
 
 * *They can be the result of evaluating a debugger information entry attribute
@@ -1523,8 +1523,8 @@ expression.
       states that relocation of references from one executable or shared object
       file to another must be performed by the consumer. But given that DR is
       defined as an offset in a ``.debug_info`` section this seems impossible.
-      If DR was defined as an implementation defined value, then the consumer
-      could choose to interpret the value in an implementation defined manner to
+      If DR was defined as an implementation-defined value, then the consumer
+      could choose to interpret the value in an implementation-defined manner to
       reference a debug information in another executable or shared object.
 
       In ELF the ``.debug_info`` section is in a non-\ ``PT_LOAD`` segment so
@@ -4188,7 +4188,7 @@ The register rules are:
     conversion as the bit contents of the register is simply interpreted as a
     value of the address.
 
-    GDB has a per register hook that allows a target specific conversion on a
+    GDB has a per-register hook that allows a target-specific conversion on a
     register by register basis. It defaults to truncation of bigger registers,
     and to actually reading bytes from the next register (or reads out of bounds
     for the last register) for smaller registers. There are no GDB tests that
diff --git a/llvm/docs/AMDGPUUsage.rst b/llvm/docs/AMDGPUUsage.rst
index d13f95b..5343d66 100644
--- a/llvm/docs/AMDGPUUsage.rst
+++ b/llvm/docs/AMDGPUUsage.rst
@@ -768,6 +768,9 @@ For example:
                                                   performant than code generated for XNACK replay
                                                   disabled.
 
+     cu-stores       TODO                         On GFX12.5, controls whether ``scope:SCOPE_CU`` stores may be used.
+                                                  If disabled, all stores will be done at ``scope:SCOPE_SE`` or greater.
+
      =============== ============================ ==================================================
 
 .. _amdgpu-target-id:
@@ -1887,7 +1890,7 @@ The AMDGPU backend supports the following calling conventions:
 AMDGPU MCExpr
 -------------
 
-As part of the AMDGPU MC layer, AMDGPU provides the following target specific
+As part of the AMDGPU MC layer, AMDGPU provides the following target-specific
 ``MCExpr``\s.
 
   .. table:: AMDGPU MCExpr types:
@@ -5107,7 +5110,9 @@ The fields used by CP for code objects before V3 also match those specified in
                                                      and must be 0,
      >454    1 bit   ENABLE_SGPR_PRIVATE_SEGMENT
                      _SIZE
-     457:455 3 bits                                  Reserved, must be 0.
+     455     1 bit   USES_CU_STORES                  GFX12.5: Whether the ``cu-stores`` target attribute is enabled.
+                                                     If 0, then all stores are ``SCOPE_SE`` or higher.
+     457:456 2 bits                                  Reserved, must be 0.
      458     1 bit   ENABLE_WAVEFRONT_SIZE32         GFX6-GFX9
                                                        Reserved, must be 0.
                                                      GFX10-GFX11
@@ -18188,6 +18193,8 @@ terminated by an ``.end_amdhsa_kernel`` directive.
                                                                                   GFX942)
      ``.amdhsa_user_sgpr_private_segment_size``               0                   GFX6-GFX12   Controls ENABLE_SGPR_PRIVATE_SEGMENT_SIZE in
                                                                                                :ref:`amdgpu-amdhsa-kernel-descriptor-v3-table`.
+     ``.amdhsa_uses_cu_stores``                               0                   GFX12.5      Controls USES_CU_STORES in
+                                                                                               :ref:`amdgpu-amdhsa-kernel-descriptor-v3-table`.
      ``.amdhsa_wavefront_size32``                             Target              GFX10-GFX12  Controls ENABLE_WAVEFRONT_SIZE32 in
                                                               Feature                          :ref:`amdgpu-amdhsa-kernel-descriptor-v3-table`.
                                                               Specific
diff --git a/llvm/docs/CMake.rst b/llvm/docs/CMake.rst
index 17be41b..365365c 100644
--- a/llvm/docs/CMake.rst
+++ b/llvm/docs/CMake.rst
@@ -615,11 +615,11 @@ enabled sub-projects. Nearly all of these variable names begin with
   .. note::
     The list should not have duplicates with ``LLVM_ENABLE_PROJECTS``.
 
-  The full list is:
-
-  ``libc;libunwind;libcxxabi;libcxx;compiler-rt;openmp;llvm-libgcc;offload``
+  To list all possible runtimes, include an invalid name, for example
+  ``-DLLVM_ENABLE_RUNTIMES=notaruntime``. The resulting CMake error will list
+  the possible runtime names.
 
-  To enable all of them, use:
+  To enable all of the runtimes, use:
 
   ``LLVM_ENABLE_RUNTIMES=all``
 
diff --git a/llvm/docs/CodeGenerator.rst b/llvm/docs/CodeGenerator.rst
index 020eb09..8260b5c 100644
--- a/llvm/docs/CodeGenerator.rst
+++ b/llvm/docs/CodeGenerator.rst
@@ -323,7 +323,7 @@ provide one of these objects through the ``getJITInfo`` method.
 Machine code description classes
 ================================
 
-At the high-level, LLVM code is translated to a machine specific representation
+At the high-level, LLVM code is translated to a machine-specific representation
 formed out of :raw-html:`<tt>` `MachineFunction`_ :raw-html:`</tt>`,
 :raw-html:`<tt>` `MachineBasicBlock`_ :raw-html:`</tt>`, and :raw-html:`<tt>`
 `MachineInstr`_ :raw-html:`</tt>` instances (defined in
@@ -462,7 +462,7 @@ code:
   ret
 
 This approach is extremely general (if it can handle the X86 architecture, it
-can handle anything!) and allows all of the target specific knowledge about the
+can handle anything!) and allows all of the target-specific knowledge about the
 instruction stream to be isolated in the instruction selector.  Note that
 physical registers should have a short lifetime for good code generation, and
 all physical registers are assumed dead on entry to and exit from basic blocks
@@ -634,7 +634,7 @@ file (MCObjectStreamer).  MCAsmStreamer is a straightforward implementation
 that prints out a directive for each method (e.g. ``EmitValue -> .byte``), but
 MCObjectStreamer implements a full assembler.
 
-For target specific directives, the MCStreamer has a MCTargetStreamer instance.
+For target-specific directives, the MCStreamer has a MCTargetStreamer instance.
 Each target that needs it defines a class that inherits from it and is a lot
 like MCStreamer itself: It has one method per directive and two classes that
 inherit from it, a target object streamer and a target asm streamer. The target
diff --git a/llvm/docs/CommandGuide/lit.rst b/llvm/docs/CommandGuide/lit.rst
index 938b7f9..eb90e95 100644
--- a/llvm/docs/CommandGuide/lit.rst
+++ b/llvm/docs/CommandGuide/lit.rst
@@ -356,6 +356,11 @@ The timing data is stored in the `test_exec_root` in a file named
   primary purpose is to suppress an ``XPASS`` result without modifying a test
   case that uses the ``XFAIL`` directive.
 
+.. option:: --exclude-xfail
+
+  ``XFAIL`` tests won't be run, unless they are listed in the ``--xfail-not``
+  (or ``LIT_XFAIL_NOT``) lists.
+
 .. option:: --num-shards M
 
  Divide the set of selected tests into ``M`` equal-sized subsets or
diff --git a/llvm/docs/CommandGuide/llvm-bcanalyzer.rst b/llvm/docs/CommandGuide/llvm-bcanalyzer.rst
index 8f15e03..1e0b581 100644
--- a/llvm/docs/CommandGuide/llvm-bcanalyzer.rst
+++ b/llvm/docs/CommandGuide/llvm-bcanalyzer.rst
@@ -14,7 +14,7 @@ DESCRIPTION
 The :program:`llvm-bcanalyzer` command is a small utility for analyzing bitcode
 files.  The tool reads a bitcode file (such as generated with the
 :program:`llvm-as` tool) and produces a statistical report on the contents of
-the bitcode file.  The tool can also dump a low level but human readable
+the bitcode file.  The tool can also dump a low-level but human-readable
 version of the bitcode file.  This tool is probably not of much interest or
 utility except for those working directly with the bitcode file format.  Most
 LLVM users can just ignore this tool.
@@ -30,7 +30,7 @@ OPTIONS
 
 .. option:: --dump
 
- Causes :program:`llvm-bcanalyzer` to dump the bitcode in a human readable
+ Causes :program:`llvm-bcanalyzer` to dump the bitcode in a human-readable
  format.  This format is significantly different from LLVM assembly and
  provides details about the encoding of the bitcode file.
 
diff --git a/llvm/docs/CommandGuide/llvm-debuginfo-analyzer.rst b/llvm/docs/CommandGuide/llvm-debuginfo-analyzer.rst
index 1264f80..6a4e348 100644
--- a/llvm/docs/CommandGuide/llvm-debuginfo-analyzer.rst
+++ b/llvm/docs/CommandGuide/llvm-debuginfo-analyzer.rst
@@ -14,7 +14,7 @@ DESCRIPTION
 -----------
 :program:`llvm-debuginfo-analyzer` parses debug and text sections in
 binary object files and prints their contents in a logical view, which
-is a human readable representation that closely matches the structure
+is a human-readable representation that closely matches the structure
 of the original user source code. Supported object file formats include
 ELF, Mach-O, WebAssembly, PDB and COFF.
 
diff --git a/llvm/docs/CommandGuide/llvm-exegesis.rst b/llvm/docs/CommandGuide/llvm-exegesis.rst
index 25e8969..5996026 100644
--- a/llvm/docs/CommandGuide/llvm-exegesis.rst
+++ b/llvm/docs/CommandGuide/llvm-exegesis.rst
@@ -106,7 +106,7 @@ properly.
   using the loop repetition mode. :program:`llvm-exegesis` needs to keep track
   of the current loop iteration within the loop repetition mode in a performant
   manner (i.e., no memory accesses), and uses a register to do this. This register
-  has an architecture specific default (e.g., `R8` on X86), but this might conflict
+  has an architecture-specific default (e.g., `R8` on X86), but this might conflict
   with some snippets. This annotation allows changing the register to prevent
   interference between the loop index register and the snippet.
 
diff --git a/llvm/docs/CommandGuide/llvm-ifs.rst b/llvm/docs/CommandGuide/llvm-ifs.rst
index 1fe81c2..e3582b3 100644
--- a/llvm/docs/CommandGuide/llvm-ifs.rst
+++ b/llvm/docs/CommandGuide/llvm-ifs.rst
@@ -11,7 +11,7 @@ SYNOPSIS
 DESCRIPTION
 -----------
 
-:program:`llvm-ifs` is a tool that jointly produces human readable text-based
+:program:`llvm-ifs` is a tool that jointly produces human-readable text-based
 stubs (.ifs files) for shared objects and linkable shared object stubs
 (.so files) from either ELF shared objects or text-based stubs. The text-based
 stubs is useful for monitoring ABI changes of the shared object. The linkable
diff --git a/llvm/docs/CommandGuide/llvm-ir2vec.rst b/llvm/docs/CommandGuide/llvm-ir2vec.rst
index 13fe4996..0c9fb6e 100644
--- a/llvm/docs/CommandGuide/llvm-ir2vec.rst
+++ b/llvm/docs/CommandGuide/llvm-ir2vec.rst
@@ -6,24 +6,28 @@ llvm-ir2vec - IR2Vec Embedding Generation Tool
 SYNOPSIS
 --------
 
-:program:`llvm-ir2vec` [*options*] *input-file*
+:program:`llvm-ir2vec` [*subcommand*] [*options*]
 
 DESCRIPTION
 -----------
 
 :program:`llvm-ir2vec` is a standalone command-line tool for IR2Vec. It
 generates IR2Vec embeddings for LLVM IR and supports triplet generation 
-for vocabulary training. It provides two main operation modes:
+for vocabulary training. The tool provides three main subcommands:
 
-1. **Triplet Mode**: Generates triplets (opcode, type, operands) for vocabulary
+1. **triplets**: Generates numeric triplets in train2id format for vocabulary
    training from LLVM IR.
 
-2. **Embedding Mode**: Generates IR2Vec embeddings using a trained vocabulary
+2. **entities**: Generates entity mapping files (entity2id.txt) for vocabulary 
+   training.
+
+3. **embeddings**: Generates IR2Vec embeddings using a trained vocabulary
    at different granularity levels (instruction, basic block, or function).
 
 The tool is designed to facilitate machine learning applications that work with
 LLVM IR by converting the IR into numerical representations that can be used by
-ML models.
+ML models. The `triplets` subcommand generates numeric IDs directly instead of string 
+triplets, streamlining the training data preparation workflow.
 
 .. note::
 
@@ -34,94 +38,130 @@ ML models.
 OPERATION MODES
 ---------------
 
-Triplet Generation Mode
-~~~~~~~~~~~~~~~~~~~~~~~
+Triplet Generation and Entity Mapping Modes are used for preparing
+vocabulary and training data for knowledge graph embeddings. The Embedding Mode
+is used for generating embeddings from LLVM IR using a pre-trained vocabulary.
+
+The Seed Embedding Vocabulary of IR2Vec is trained on a large corpus of LLVM IR
+by modeling the relationships between opcodes, types, and operands as a knowledge
+graph. For this purpose, Triplet Generation and Entity Mapping Modes generate
+triplets and entity mappings in the standard format used for knowledge graph
+embedding training (see 
+<https://github.com/thunlp/OpenKE/tree/OpenKE-PyTorch?tab=readme-ov-file#data-format> 
+for details).
+
+See `llvm/utils/mlgo-utils/IR2Vec/generateTriplets.py` for more details on how
+these two modes are used to generate the triplets and entity mappings.
+
+Triplet Generation
+~~~~~~~~~~~~~~~~~~
 
-In triplet mode, :program:`llvm-ir2vec` analyzes LLVM IR and extracts triplets
-consisting of opcodes, types, and operands. These triplets can be used to train
-vocabularies for embedding generation.
+With the `triplets` subcommand, :program:`llvm-ir2vec` analyzes LLVM IR and extracts
+numeric triplets consisting of opcode IDs, type IDs, and operand IDs. These triplets
+are generated in the standard format used for knowledge graph embedding training.
+The tool outputs numeric IDs directly using the ir2vec::Vocabulary mapping
+infrastructure, eliminating the need for string-to-ID preprocessing.
 
 Usage:
 
 .. code-block:: bash
 
-   llvm-ir2vec --mode=triplets input.bc -o triplets.txt
+   llvm-ir2vec triplets input.bc -o triplets_train2id.txt
 
-Embedding Generation Mode
-~~~~~~~~~~~~~~~~~~~~~~~~~~
+Entity Mapping Generation
+~~~~~~~~~~~~~~~~~~~~~~~~~
 
-In embedding mode, :program:`llvm-ir2vec` uses a pre-trained vocabulary to
+With the `entities` subcommand, :program:`llvm-ir2vec` generates the entity mappings
+supported by IR2Vec in the standard format used for knowledge graph embedding
+training. This subcommand outputs all supported entities (opcodes, types, and
+operands) with their corresponding numeric IDs, and is not specific to an
+LLVM IR file.
+
+Usage:
+
+.. code-block:: bash
+
+   llvm-ir2vec entities -o entity2id.txt
+
+Embedding Generation
+~~~~~~~~~~~~~~~~~~~~
+
+With the `embeddings` subcommand, :program:`llvm-ir2vec` uses a pre-trained vocabulary to
 generate numerical embeddings for LLVM IR at different levels of granularity.
 
 Example Usage:
 
 .. code-block:: bash
 
-   llvm-ir2vec --mode=embeddings --ir2vec-vocab-path=vocab.json --level=func input.bc -o embeddings.txt
+   llvm-ir2vec embeddings --ir2vec-vocab-path=vocab.json --level=func input.bc -o embeddings.txt
 
 OPTIONS
 -------
 
-.. option:: --mode=<mode>
+Global options:
+
+.. option:: -o <filename>
+
+   Specify the output filename. Use ``-`` to write to standard output (default).
+
+.. option:: --help
+
+   Print a summary of command line options.
+
+Subcommand-specific options:
+
+**embeddings** subcommand:
 
- Specify the operation mode. Valid values are:
+.. option:: <input-file>
 
- * ``triplets`` - Generate triplets for vocabulary training
- * ``embeddings`` - Generate embeddings using trained vocabulary (default)
+   The input LLVM IR or bitcode file to process. This positional argument is
+   required for the `embeddings` subcommand.
 
 .. option:: --level=<level>
 
- Specify the embedding generation level. Valid values are:
+   Specify the embedding generation level. Valid values are:
 
- * ``inst`` - Generate instruction-level embeddings
- * ``bb`` - Generate basic block-level embeddings  
- * ``func`` - Generate function-level embeddings (default)
+   * ``inst`` - Generate instruction-level embeddings
+   * ``bb`` - Generate basic block-level embeddings  
+   * ``func`` - Generate function-level embeddings (default)
 
 .. option:: --function=<name>
 
- Process only the specified function instead of all functions in the module.
+   Process only the specified function instead of all functions in the module.
 
 .. option:: --ir2vec-vocab-path=<path>
 
- Specify the path to the vocabulary file (required for embedding mode).
- The vocabulary file should be in JSON format and contain the trained
- vocabulary for embedding generation. See `llvm/lib/Analysis/models`
- for pre-trained vocabulary files.
+   Specify the path to the vocabulary file (required for embedding generation).
+   The vocabulary file should be in JSON format and contain the trained
+   vocabulary for embedding generation. See `llvm/lib/Analysis/models`
+   for pre-trained vocabulary files.
 
 .. option:: --ir2vec-opc-weight=<weight>
 
- Specify the weight for opcode embeddings (default: 1.0). This controls
- the relative importance of instruction opcodes in the final embedding.
+   Specify the weight for opcode embeddings (default: 1.0). This controls
+   the relative importance of instruction opcodes in the final embedding.
 
 .. option:: --ir2vec-type-weight=<weight>
 
- Specify the weight for type embeddings (default: 0.5). This controls
- the relative importance of type information in the final embedding.
+   Specify the weight for type embeddings (default: 0.5). This controls
+   the relative importance of type information in the final embedding.
 
 .. option:: --ir2vec-arg-weight=<weight>
 
- Specify the weight for argument embeddings (default: 0.2). This controls
- the relative importance of operand information in the final embedding.
+   Specify the weight for argument embeddings (default: 0.2). This controls
+   the relative importance of operand information in the final embedding.
 
-.. option:: -o <filename>
-
- Specify the output filename. Use ``-`` to write to standard output (default).
 
-.. option:: --help
+**triplets** subcommand:
 
- Print a summary of command line options.
+.. option:: <input-file>
 
-.. note::
+   The input LLVM IR or bitcode file to process. This positional argument is
+   required for the `triplets` subcommand.
 
-   ``--level``, ``--function``, ``--ir2vec-vocab-path``, ``--ir2vec-opc-weight``, 
-   ``--ir2vec-type-weight``, and ``--ir2vec-arg-weight`` are only used in embedding 
-   mode. These options are ignored in triplet mode.
+**entities** subcommand:
 
-INPUT FILE FORMAT
------------------
-
-:program:`llvm-ir2vec` accepts LLVM bitcode files (``.bc``) and LLVM IR files 
-(``.ll``) as input. The input file should contain valid LLVM IR.
+   No subcommand-specific options.
 
 OUTPUT FORMAT
 -------------
@@ -129,14 +169,34 @@ OUTPUT FORMAT
 Triplet Mode Output
 ~~~~~~~~~~~~~~~~~~~
 
-In triplet mode, the output consists of lines containing space-separated triplets:
+In triplet mode, the output consists of numeric triplets in train2id format with
+metadata headers. The format includes:
+
+.. code-block:: text
+
+   MAX_RELATIONS=<max_relations_count>
+   <head_entity_id> <tail_entity_id> <relation_id>
+   <head_entity_id> <tail_entity_id> <relation_id>
+   ...
+
+Each line after the metadata header represents one instruction relationship,
+with numeric IDs for head entity, tail entity, and relation. The metadata
+header (MAX_RELATIONS) provides counts for post-processing and training setup.
+
+Entity Mode Output
+~~~~~~~~~~~~~~~~~~
+
+In entity mode, the output consists of entity mappings in the format:
 
 .. code-block:: text
 
-   <opcode> <type> <operand1> <operand2> ...
+   <total_entities>
+   <entity_string>	<numeric_id>
+   <entity_string>	<numeric_id>
+   ...
 
-Each line represents the information of one instruction, with the opcode, type,
-and operands.
+The first line contains the total number of entities, followed by one entity
+mapping per line with tab-separated entity string and numeric ID.
 
 Embedding Mode Output
 ~~~~~~~~~~~~~~~~~~~~~
diff --git a/llvm/docs/CommandGuide/llvm-locstats.rst b/llvm/docs/CommandGuide/llvm-locstats.rst
index 3186566..7f436c1 100644
--- a/llvm/docs/CommandGuide/llvm-locstats.rst
+++ b/llvm/docs/CommandGuide/llvm-locstats.rst
@@ -13,7 +13,7 @@ DESCRIPTION
 
 :program:`llvm-locstats` works like a wrapper around :program:`llvm-dwarfdump`.
 It parses :program:`llvm-dwarfdump` statistics regarding debug location by
-pretty printing it in a more human readable way.
+pretty printing it in a more human-readable way.
 
 The line 0% shows the number and the percentage of DIEs with no location
 information, but the line 100% shows the information for DIEs where there is
diff --git a/llvm/docs/CommandGuide/llvm-mca.rst b/llvm/docs/CommandGuide/llvm-mca.rst
index bea1931..1daae5d 100644
--- a/llvm/docs/CommandGuide/llvm-mca.rst
+++ b/llvm/docs/CommandGuide/llvm-mca.rst
@@ -241,7 +241,7 @@ option specifies "``-``", then the output will also be sent to standard output.
 .. option:: -disable-cb
 
   Force usage of the generic CustomBehaviour and InstrPostProcess classes rather
-  than using the target specific implementation. The generic classes never
+  than using the target-specific implementation. The generic classes never
   detect any custom hazards or make any post processing modifications to
   instructions.
 
@@ -1125,9 +1125,9 @@ CustomBehaviour class can be used in these cases to enforce proper
 instruction modeling (often by customizing data dependencies and detecting
 hazards that :program:`llvm-mca` has no way of knowing about).
 
-:program:`llvm-mca` comes with one generic and multiple target specific
+:program:`llvm-mca` comes with one generic and multiple target-specific
 CustomBehaviour classes. The generic class will be used if the ``-disable-cb``
-flag is used or if a target specific CustomBehaviour class doesn't exist for
+flag is used or if a target-specific CustomBehaviour class doesn't exist for
 that target. (The generic class does nothing.) Currently, the CustomBehaviour
 class is only a part of the in-order pipeline, but there are plans to add it
 to the out-of-order pipeline in the future.
@@ -1141,7 +1141,7 @@ if you don't know the exact number and a value of 0 represents no stall).
 
 If you'd like to add a CustomBehaviour class for a target that doesn't
 already have one, refer to an existing implementation to see how to set it
-up. The classes are implemented within the target specific backend (for
+up. The classes are implemented within the target-specific backend (for
 example `/llvm/lib/Target/AMDGPU/MCA/`) so that they can access backend symbols.
 
 Instrument Manager
@@ -1177,12 +1177,12 @@ classes (MCSubtargetInfo, MCInstrInfo, etc.), please add it to the
 AND requires unexposed backend symbols or functionality, you can define it in
 the `/lib/Target/<TargetName>/MCA/` directory.
 
-To enable this target specific View, you will have to use this target's
+To enable this target-specific View, you will have to use this target's
 CustomBehaviour class to override the `CustomBehaviour::getViews()` methods.
 There are 3 variations of these methods based on where you want your View to
 appear in the output: `getStartViews()`, `getPostInstrInfoViews()`, and
 `getEndViews()`. These methods returns a vector of Views so you will want to
-return a vector containing all of the target specific Views for the target in
+return a vector containing all of the target-specific Views for the target in
 question.
 
 Because these target specific (and backend dependent) Views require the
diff --git a/llvm/docs/CommandGuide/llvm-profdata.rst b/llvm/docs/CommandGuide/llvm-profdata.rst
index b2c0457..0b1cd02 100644
--- a/llvm/docs/CommandGuide/llvm-profdata.rst
+++ b/llvm/docs/CommandGuide/llvm-profdata.rst
@@ -338,7 +338,7 @@ OPTIONS
 
  Instruct the profile dumper to show profile counts in the text format of the
  instrumentation-based profile data representation. By default, the profile
- information is dumped in a more human readable form (also in text) with
+ information is dumped in a more human-readable form (also in text) with
  annotations.
 
 .. option:: --topn=<n>
diff --git a/llvm/docs/CommandGuide/llvm-symbolizer.rst b/llvm/docs/CommandGuide/llvm-symbolizer.rst
index 2da1b24..fb86a69 100644
--- a/llvm/docs/CommandGuide/llvm-symbolizer.rst
+++ b/llvm/docs/CommandGuide/llvm-symbolizer.rst
@@ -371,7 +371,7 @@ OPTIONS
   * Prints an address's debug-data discriminator when it is non-zero. One way to
     produce discriminators is to compile with clang's -fdebug-info-for-profiling.
 
-  ``JSON`` style provides a machine readable output in JSON. If addresses are
+  ``JSON`` style provides a machine-readable output in JSON. If addresses are
     supplied via stdin, the output JSON will be a series of individual objects.
     Otherwise, all results will be contained in a single array.
 
@@ -444,7 +444,7 @@ OPTIONS
 
 .. option:: --pretty-print, -p
 
-  Print human readable output. If :option:`--inlining` is specified, the
+  Print human-readable output. If :option:`--inlining` is specified, the
   enclosing scope is prefixed by (inlined by).
   For JSON output, the option will cause JSON to be indented and split over
   new lines. Otherwise, the JSON output will be printed in a compact form.
diff --git a/llvm/docs/CommandGuide/opt.rst b/llvm/docs/CommandGuide/opt.rst
index f067f62..da93b8e 100644
--- a/llvm/docs/CommandGuide/opt.rst
+++ b/llvm/docs/CommandGuide/opt.rst
@@ -46,12 +46,12 @@ OPTIONS
 
  Write output in LLVM intermediate language (instead of bitcode).
 
-.. option:: -{passname}
+.. option:: -passes=<string>
 
- :program:`opt` provides the ability to run any of LLVM's optimization or
- analysis passes in any order.  The :option:`-help` option lists all the passes
- available.  The order in which the options occur on the command line are the
- order in which they are executed (within pass constraints).
+ A textual (comma-separated) description of the pass pipeline,
+ e.g., ``-passes="sroa,instcombine"``. See
+ `invoking opt <../NewPassManager.html#invoking-opt>`_ for more details on the
+ pass pipeline syntax.
 
 .. option:: -strip-debug
 
diff --git a/llvm/docs/Coroutines.rst b/llvm/docs/Coroutines.rst
index 7472c68..dde73c9 100644
--- a/llvm/docs/Coroutines.rst
+++ b/llvm/docs/Coroutines.rst
@@ -37,7 +37,7 @@ then destroy it:
 
 .. _coroutine frame:
 
-In addition to the function stack frame which exists when a coroutine is
+In addition to the function stack frame, which exists when a coroutine is
 executing, there is an additional region of storage that contains objects that
 keep the coroutine state when a coroutine is suspended. This region of storage
 is called the **coroutine frame**. It is created when a coroutine is called
@@ -145,7 +145,7 @@ lowerings:
   yielded values.
 
   The coroutine indicates that it has run to completion by returning
-  a null continuation pointer. Any yielded values will be `undef`
+  a null continuation pointer. Any yielded values will be `undef` and
   should be ignored.
 
 - In yield-once returned-continuation lowering, the coroutine must
@@ -159,7 +159,7 @@ passed to the `coro.id` intrinsic, which guarantees a certain size
 and alignment statically. The same buffer must be passed to the
 continuation function(s). The coroutine will allocate memory if the
 buffer is insufficient, in which case it will need to store at
-least that pointer in the buffer; therefore the buffer must always
+least that pointer in the buffer; therefore, the buffer must always
 be at least pointer-sized. How the coroutine uses the buffer may
 vary between suspend points.
 
@@ -182,7 +182,7 @@ handling of control-flow must be handled explicitly by the frontend.
 In this lowering, a coroutine is assumed to take the current `async context` as
 one of its arguments (the argument position is determined by
 `llvm.coro.id.async`). It is used to marshal arguments and return values of the
-coroutine. Therefore an async coroutine returns `void`.
+coroutine. Therefore, an async coroutine returns `void`.
 
 .. code-block:: llvm
 
@@ -321,7 +321,7 @@ The `cleanup` block destroys the coroutine frame. The `coro.free`_ intrinsic,
 given the coroutine handle, returns a pointer of the memory block to be freed or
 `null` if the coroutine frame was not allocated dynamically. The `cleanup`
 block is entered when coroutine runs to completion by itself or destroyed via
-call to the `coro.destroy`_ intrinsic.
+a call to the `coro.destroy`_ intrinsic.
 
 The `suspend` block contains code to be executed when coroutine runs to
 completion or suspended. The `coro.end`_ intrinsic marks the point where
@@ -337,7 +337,7 @@ Coroutine Transformation
 ------------------------
 
 One of the steps of coroutine lowering is building the coroutine frame. The
-def-use chains are analyzed to determine which objects need be kept alive across
+def-use chains are analyzed to determine which objects need to be kept alive across
 suspend points. In the coroutine shown in the previous section, use of virtual register
 `%inc` is separated from the definition by a suspend point, therefore, it
 cannot reside on the stack frame since the latter goes away once the coroutine
@@ -532,7 +532,7 @@ as follows:
     ret void
   }
 
-If different cleanup code needs to get executed for different suspend points,
+If different cleanup code needs to be executed for different suspend points,
 a similar switch will be in the `f.destroy` function.
 
 .. note ::
@@ -740,7 +740,7 @@ looks like this:
     <SUSPEND final=true> // injected final suspend point
   }
 
-and python iterator `__next__` would look like:
+and Python iterator `__next__` would look like:
 
 .. code-block:: c++
 
@@ -829,7 +829,7 @@ A swifterror alloca or parameter can only be loaded, stored, or passed as a swif
 These rules, not coincidentally, mean that you can always perfectly model the data flow in the alloca, and LLVM CodeGen actually has to do that in order to emit code.
 
 For coroutine lowering the default treatment of allocas breaks those rules — splitting will try to replace the alloca with an entry in the coro frame, which can lead to trying to pass that as a swifterror argument.
-To pass a swifterror argument in a split function, we need to still have the alloca around; but we also potentially need the coro frame slot, since useful data can (in theory) be stored in the swifterror alloca slot across suspensions in the presplit coroutine. 
+To pass a swifterror argument in a split function, we need to still have the alloca around, but we also potentially need the coro frame slot, since useful data can (in theory) be stored in the swifterror alloca slot across suspensions in the presplit coroutine.
 When split a coroutine it is consequently necessary to keep both the frame slot as well as the alloca itself and then keep them in sync.
 
 Intrinsics
@@ -965,7 +965,7 @@ Semantics:
 Using this intrinsic on a coroutine that does not have a coroutine promise
 leads to undefined behavior. It is possible to read and modify coroutine
 promise of the coroutine which is currently executing. The coroutine author and
-a coroutine user are responsible to makes sure there is no data races.
+a coroutine user are responsible for ensuring there are no data races.
 
 Example:
 """"""""
@@ -1181,7 +1181,7 @@ Overview:
 """""""""
 
 The '``llvm.coro.alloc``' intrinsic returns `true` if dynamic allocation is
-required to obtain a memory for the coroutine frame and `false` otherwise.
+required to obtain memory for the coroutine frame and `false` otherwise.
 This is not supported for returned-continuation coroutines.
 
 Arguments:
@@ -1628,7 +1628,7 @@ The second argument should be `true` if this coro.end is in the block that is
 part of the unwind sequence leaving the coroutine body due to an exception and
 `false` otherwise.
 
-The third argument if present should specify a function to be called.
+The third argument, if present, should specify a function to be called.
 
 If the third argument is present, the remaining arguments are the arguments to
 the function call.
@@ -1700,7 +1700,7 @@ Semantics:
 If a coroutine that was suspended at the suspend point marked by this intrinsic
 is resumed via `coro.resume`_ the control will transfer to the basic block
 of the 0-case. If it is resumed via `coro.destroy`_, it will proceed to the
-basic block indicated by the 1-case. To suspend, coroutine proceed to the
+basic block indicated by the 1-case. To suspend, the coroutine proceeds to the
 default label.
 
 If suspend intrinsic is marked as final, it can consider the `true` branch
@@ -1717,7 +1717,7 @@ unreachable and can perform optimizations that can take advantage of that fact.
 Overview:
 """""""""
 
-The '``llvm.coro.save``' marks the point where a coroutine need to update its
+The '``llvm.coro.save``' marks the point where a coroutine needs to update its
 state to prepare for resumption to be considered suspended (and thus eligible
 for resumption). It is illegal to merge two '``llvm.coro.save``' calls unless their
 '``llvm.coro.suspend``' users are also merged. So '``llvm.coro.save``' is currently
@@ -1793,7 +1793,7 @@ The fourth to six argument are the arguments for the third argument.
 Semantics:
 """"""""""
 
-The result of the intrinsic are mapped to the arguments of the resume function.
+The results of the intrinsic are mapped to the arguments of the resume function.
 Execution is suspended at this intrinsic and resumed when the resume function is
 called.
 
diff --git a/llvm/docs/DirectX/DXContainer.rst b/llvm/docs/DirectX/DXContainer.rst
index 4ace8a1..17452d9 100644
--- a/llvm/docs/DirectX/DXContainer.rst
+++ b/llvm/docs/DirectX/DXContainer.rst
@@ -280,7 +280,7 @@ elements are:
    This represents ``f5`` in the source.
 
 The LLVM ``obj2yaml`` tool can parse this data out of the PSV and present it in
-human readable YAML. For the example above it produces the output:
+human-readable YAML. For the example above it produces the output:
 
 .. code-block:: YAML
 
diff --git a/llvm/docs/Frontend/PerformanceTips.rst b/llvm/docs/Frontend/PerformanceTips.rst
index 4baf127..b81df70 100644
--- a/llvm/docs/Frontend/PerformanceTips.rst
+++ b/llvm/docs/Frontend/PerformanceTips.rst
@@ -35,7 +35,7 @@ The Basics
 ^^^^^^^^^^^
 
 #. Make sure that your Modules contain both a data layout specification and
-   target triple. Without these pieces, non of the target specific optimization
+   target triple. Without these pieces, none of the target-specific optimizations
    will be enabled.  This can have a major effect on the generated code quality.
 
 #. For each function or global emitted, use the most private linkage type
diff --git a/llvm/docs/FuzzingLLVM.rst b/llvm/docs/FuzzingLLVM.rst
index 6b32eea..a0355d7 100644
--- a/llvm/docs/FuzzingLLVM.rst
+++ b/llvm/docs/FuzzingLLVM.rst
@@ -128,7 +128,7 @@ llvm-mc-assemble-fuzzer
 -----------------------
 
 A |generic fuzzer| that fuzzes the MC layer's assemblers by treating inputs as
-target specific assembly.
+target-specific assembly.
 
 Note that this fuzzer has an unusual command line interface which is not fully
 compatible with all of libFuzzer's features. Fuzzer arguments must be passed
diff --git a/llvm/docs/GettingStarted.rst b/llvm/docs/GettingStarted.rst
index e4dbb64b..8d0adf3 100644
--- a/llvm/docs/GettingStarted.rst
+++ b/llvm/docs/GettingStarted.rst
@@ -919,11 +919,11 @@ the `Command Guide <CommandGuide/index.html>`_.
 
 ``llvm-as``
 
-  The assembler transforms the human readable LLVM assembly to LLVM bitcode.
+  The assembler transforms the human-readable LLVM assembly to LLVM bitcode.
 
 ``llvm-dis``
 
-  The disassembler transforms the LLVM bitcode to human readable LLVM assembly.
+  The disassembler transforms the LLVM bitcode to human-readable LLVM assembly.
 
 ``llvm-link``
 
diff --git a/llvm/docs/GlobalISel/GMIR.rst b/llvm/docs/GlobalISel/GMIR.rst
index 633dfb8..be7e677 100644
--- a/llvm/docs/GlobalISel/GMIR.rst
+++ b/llvm/docs/GlobalISel/GMIR.rst
@@ -26,7 +26,7 @@ Generic Machine Instructions
   Reference.
 
 Whereas MIR deals largely in Target Instructions and only has a small set of
-target independent opcodes such as ``COPY``, ``PHI``, and ``REG_SEQUENCE``,
+target-independent opcodes such as ``COPY``, ``PHI``, and ``REG_SEQUENCE``,
 gMIR defines a rich collection of ``Generic Opcodes`` which are target
 independent and describe operations which are typically supported by targets.
 One example is ``G_ADD`` which is the generic opcode for an integer addition.
diff --git a/llvm/docs/GlobalISel/GenericOpcode.rst b/llvm/docs/GlobalISel/GenericOpcode.rst
index 4816094..eefd76d 100644
--- a/llvm/docs/GlobalISel/GenericOpcode.rst
+++ b/llvm/docs/GlobalISel/GenericOpcode.rst
@@ -1105,7 +1105,7 @@ G_TRAP, G_DEBUGTRAP, G_UBSANTRAP
 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
 
 Represents :ref:`llvm.trap <llvm.trap>`, :ref:`llvm.debugtrap <llvm.debugtrap>`
-and :ref:`llvm.ubsantrap <llvm.ubsantrap>` that generate a target dependent
+and :ref:`llvm.ubsantrap <llvm.ubsantrap>` that generate target-dependent
 trap instructions.
 
 .. code-block:: none
diff --git a/llvm/docs/GlobalISel/Pipeline.rst b/llvm/docs/GlobalISel/Pipeline.rst
index 01bd4df..b9085e8 100644
--- a/llvm/docs/GlobalISel/Pipeline.rst
+++ b/llvm/docs/GlobalISel/Pipeline.rst
@@ -80,7 +80,7 @@ Combiner
   alternatives but Combiners can also focus on code size or other metrics.
 
 Additional passes such as these can be inserted to support higher optimization
-levels or target specific needs. A likely pipeline is:
+levels or target-specific needs. A likely pipeline is:
 
 .. image:: pipeline-overview-with-combiners.png
 
diff --git a/llvm/docs/HowToCrossCompileBuiltinsOnArm.rst b/llvm/docs/HowToCrossCompileBuiltinsOnArm.rst
index 2e199a0..31ead45 100644
--- a/llvm/docs/HowToCrossCompileBuiltinsOnArm.rst
+++ b/llvm/docs/HowToCrossCompileBuiltinsOnArm.rst
@@ -14,117 +14,113 @@ targets are welcome.
 
 The instructions in this document depend on libraries and programs external to
 LLVM, there are many ways to install and configure these dependencies so you
-may need to adapt the instructions here to fit your own local situation.
+may need to adapt the instructions here to fit your own situation.
 
 Prerequisites
 =============
 
-In this use case we'll be using cmake on a Debian-based Linux system,
-cross-compiling from an x86_64 host to a hard-float Armv7-A target. We'll be
+In this use case we will be using cmake on a Debian-based Linux system,
+cross-compiling from an x86_64 host to a hard-float Armv7-A target. We will be
 using as many of the LLVM tools as we can, but it is possible to use GNU
 equivalents.
 
- * ``A build of LLVM/clang for the llvm-tools and llvm-config``
- * ``A clang executable with support for the ARM target``
- * ``compiler-rt sources``
- * ``The qemu-arm user mode emulator``
- * ``An arm-linux-gnueabihf sysroot``
+You will need:
+ * A build of LLVM for the llvm-tools and ``llvm-config``.
+ * A clang executable with support for the ``ARM`` target.
+ * compiler-rt sources.
+ * The ``qemu-arm`` user mode emulator.
+ * An ``arm-linux-gnueabihf`` sysroot.
 
-In this example we will be using ninja.
+In this example we will be using ``ninja`` as the build tool.
 
-See https://compiler-rt.llvm.org/ for more information about the dependencies
+See https://compiler-rt.llvm.org/ for information about the dependencies
 on clang and LLVM.
 
 See https://llvm.org/docs/GettingStarted.html for information about obtaining
-the source for LLVM and compiler-rt. Note that the getting started guide
-places compiler-rt in the projects subdirectory, but this is not essential and
-if you are using the BaremetalARM.cmake cache for v6-M, v7-M and v7-EM then
-compiler-rt must be placed in the runtimes directory.
+the source for LLVM and compiler-rt.
 
 ``qemu-arm`` should be available as a package for your Linux distribution.
 
-The most complicated of the prerequisites to satisfy is the arm-linux-gnueabihf
+The most complicated of the prerequisites to satisfy is the ``arm-linux-gnueabihf``
 sysroot. In theory it is possible to use the Linux distributions multiarch
 support to fulfill the dependencies for building but unfortunately due to
-/usr/local/include being added some host includes are selected. The easiest way
-to supply a sysroot is to download the arm-linux-gnueabihf toolchain. This can
-be found at:
-* https://developer.arm.com/open-source/gnu-toolchain/gnu-a/downloads for gcc 8 and above
-* https://releases.linaro.org/components/toolchain/binaries/ for gcc 4.9 to 7.3
+``/usr/local/include`` being added some host includes are selected.
+
+The easiest way to supply a sysroot is to download an ``arm-linux-gnueabihf``
+toolchain from https://developer.arm.com/open-source/gnu-toolchain/gnu-a/downloads.
 
 Building compiler-rt builtins for Arm
 =====================================
+
 We will be doing a standalone build of compiler-rt using the following cmake
-options.
-
-* ``path/to/compiler-rt``
-* ``-G Ninja``
-* ``-DCMAKE_AR=/path/to/llvm-ar``
-* ``-DCMAKE_ASM_COMPILER_TARGET="arm-linux-gnueabihf"``
-* ``-DCMAKE_ASM_FLAGS="build-c-flags"``
-* ``-DCMAKE_C_COMPILER=/path/to/clang``
-* ``-DCMAKE_C_COMPILER_TARGET="arm-linux-gnueabihf"``
-* ``-DCMAKE_C_FLAGS="build-c-flags"``
-* ``-DCMAKE_EXE_LINKER_FLAGS="-fuse-ld=lld"``
-* ``-DCMAKE_NM=/path/to/llvm-nm``
-* ``-DCMAKE_RANLIB=/path/to/llvm-ranlib``
-* ``-DCOMPILER_RT_BUILD_BUILTINS=ON``
-* ``-DCOMPILER_RT_BUILD_LIBFUZZER=OFF``
-* ``-DCOMPILER_RT_BUILD_MEMPROF=OFF``
-* ``-DCOMPILER_RT_BUILD_PROFILE=OFF``
-* ``-DCOMPILER_RT_BUILD_SANITIZERS=OFF``
-* ``-DCOMPILER_RT_BUILD_XRAY=OFF``
-* ``-DCOMPILER_RT_DEFAULT_TARGET_ONLY=ON``
-* ``-DLLVM_CONFIG_PATH=/path/to/llvm-config``
+options::
+
+  cmake path/to/compiler-rt \
+    -G Ninja \
+    -DCMAKE_AR=/path/to/llvm-ar \
+    -DCMAKE_ASM_COMPILER_TARGET="arm-linux-gnueabihf" \
+    -DCMAKE_ASM_FLAGS="build-c-flags" \
+    -DCMAKE_C_COMPILER=/path/to/clang \
+    -DCMAKE_C_COMPILER_TARGET="arm-linux-gnueabihf" \
+    -DCMAKE_C_FLAGS="build-c-flags" \
+    -DCMAKE_EXE_LINKER_FLAGS="-fuse-ld=lld" \
+    -DCMAKE_NM=/path/to/llvm-nm \
+    -DCMAKE_RANLIB=/path/to/llvm-ranlib \
+    -DCOMPILER_RT_BUILD_BUILTINS=ON \
+    -DCOMPILER_RT_BUILD_LIBFUZZER=OFF \
+    -DCOMPILER_RT_BUILD_MEMPROF=OFF \
+    -DCOMPILER_RT_BUILD_PROFILE=OFF \
+    -DCOMPILER_RT_BUILD_SANITIZERS=OFF \
+    -DCOMPILER_RT_BUILD_XRAY=OFF \
+    -DCOMPILER_RT_DEFAULT_TARGET_ONLY=ON \
+    -DLLVM_CONFIG_PATH=/path/to/llvm-config
 
 The ``build-c-flags`` need to be sufficient to pass the C-make compiler check,
 compile compiler-rt, and if you are running the tests, compile and link the
 tests. When cross-compiling with clang we will need to pass sufficient
-information to generate code for the Arm architecture we are targeting. We will
-need to select the Arm target, select the Armv7-A architecture and choose
-between using Arm or Thumb.
-instructions. For example:
+information to generate code for the Arm architecture we are targeting.
 
-* ``--target=arm-linux-gnueabihf``
-* ``-march=armv7a``
-* ``-mthumb``
+We will need to select:
+ * The Arm target and Armv7-A architecture with ``--target=arm-linux-gnueabihf -march=armv7a``.
+ * Whether to generate Arm (the default) or Thumb instructions (``-mthumb``).
 
-When using a GCC arm-linux-gnueabihf toolchain the following flags are
+When using a GCC ``arm-linux-gnueabihf`` toolchain the following flags are
 needed to pick up the includes and libraries:
 
-* ``--gcc-toolchain=/path/to/dir/toolchain``
-* ``--sysroot=/path/to/toolchain/arm-linux-gnueabihf/libc``
+ * ``--gcc-toolchain=/path/to/dir/toolchain``
+ * ``--sysroot=/path/to/toolchain/arm-linux-gnueabihf/libc``
 
 In this example we will be adding all of the command line options to both
 ``CMAKE_C_FLAGS`` and ``CMAKE_ASM_FLAGS``. There are cmake flags to pass some of
-these options individually which can be used to simplify the ``build-c-flags``:
+these options individually which can be used to simplify the ``build-c-flags``::
 
-* ``-DCMAKE_C_COMPILER_TARGET="arm-linux-gnueabihf"``
-* ``-DCMAKE_ASM_COMPILER_TARGET="arm-linux-gnueabihf"``
-* ``-DCMAKE_C_COMPILER_EXTERNAL_TOOLCHAIN=/path/to/dir/toolchain``
-* ``-DCMAKE_SYSROOT=/path/to/dir/toolchain/arm-linux-gnueabihf/libc``
+ -DCMAKE_C_COMPILER_TARGET="arm-linux-gnueabihf"
+ -DCMAKE_ASM_COMPILER_TARGET="arm-linux-gnueabihf"
+ -DCMAKE_C_COMPILER_EXTERNAL_TOOLCHAIN=/path/to/dir/toolchain
+ -DCMAKE_SYSROOT=/path/to/dir/toolchain/arm-linux-gnueabihf/libc
 
 Once cmake has completed the builtins can be built with ``ninja builtins``
 
 Testing compiler-rt builtins using qemu-arm
 ===========================================
+
 To test the builtins library we need to add a few more cmake flags to enable
 testing and set up the compiler and flags for test case. We must also tell
-cmake that we wish to run the tests on ``qemu-arm``.
+cmake that we wish to run the tests on ``qemu-arm``::
 
-* ``-DCOMPILER_RT_EMULATOR="qemu-arm -L /path/to/armhf/sysroot``
-* ``-DCOMPILER_RT_INCLUDE_TESTS=ON``
-* ``-DCOMPILER_RT_TEST_COMPILER="/path/to/clang"``
-* ``-DCOMPILER_RT_TEST_COMPILER_CFLAGS="test-c-flags"``
+ -DCOMPILER_RT_EMULATOR="qemu-arm -L /path/to/armhf/sysroot"
+ -DCOMPILER_RT_INCLUDE_TESTS=ON
+ -DCOMPILER_RT_TEST_COMPILER="/path/to/clang"
+ -DCOMPILER_RT_TEST_COMPILER_CFLAGS="test-c-flags"
 
 The ``/path/to/armhf/sysroot`` should be the same as the one passed to
-``--sysroot`` in the "build-c-flags".
+``--sysroot`` in the ``build-c-flags``.
 
-The "test-c-flags" need to include the target, architecture, gcc-toolchain,
-sysroot and arm/thumb state. The additional cmake defines such as
+The ``test-c-flags`` need to include the target, architecture, gcc-toolchain,
+sysroot and Arm/Thumb state. The additional cmake defines such as
 ``CMAKE_C_COMPILER_EXTERNAL_TOOLCHAIN`` do not apply when building the tests. If
-you have put all of these in "build-c-flags" then these can be repeated. If you
-wish to use lld to link the tests then add ``"-fuse-ld=lld``.
+you have put all of these in ``build-c-flags`` then these can be repeated. If you
+wish to use lld to link the tests then add ``-fuse-ld=lld``.
 
 Once cmake has completed the tests can be built and run using
 ``ninja check-builtins``
@@ -142,19 +138,21 @@ This stage can often fail at link time if the ``--sysroot=`` and
 ``CMAKE_C_FLAGS`` and ``CMAKE_C_COMPILER_TARGET`` flags.
 
 It can be useful to build a simple example outside of cmake with your toolchain
-to make sure it is working. For example: ``clang --target=arm-linux-gnueabi -march=armv7a --gcc-toolchain=/path/to/gcc-toolchain --sysroot=/path/to/gcc-toolchain/arm-linux-gnueabihf/libc helloworld.c``
+to make sure it is working. For example::
+
+  clang --target=arm-linux-gnueabihf -march=armv7a --gcc-toolchain=/path/to/gcc-toolchain --sysroot=/path/to/gcc-toolchain/arm-linux-gnueabihf/libc helloworld.c
 
 Clang uses the host header files
 --------------------------------
 On debian based systems it is possible to install multiarch support for
-arm-linux-gnueabi and arm-linux-gnueabihf. In many cases clang can successfully
+``arm-linux-gnueabi`` and ``arm-linux-gnueabihf``. In many cases clang can successfully
 use this multiarch support when ``--gcc-toolchain=`` and ``--sysroot=`` are not supplied.
 Unfortunately clang adds ``/usr/local/include`` before
 ``/usr/include/arm-linux-gnueabihf`` leading to errors when compiling the hosts
 header files.
 
 The multiarch support is not sufficient to build the builtins you will need to
-use a separate arm-linux-gnueabihf toolchain.
+use a separate ``arm-linux-gnueabihf`` toolchain.
 
 No target passed to clang
 -------------------------
@@ -164,12 +162,13 @@ as ``error: unknown directive .syntax unified``.
 
 You can check the clang invocation in the error message to see if there is no
 ``--target`` or if it is set incorrectly. The cause is usually
-``CMAKE_ASM_FLAGS`` not containing ``--target`` or ``CMAKE_ASM_COMPILER_TARGET`` not being present.
+``CMAKE_ASM_FLAGS`` not containing ``--target`` or ``CMAKE_ASM_COMPILER_TARGET``
+not being present.
 
 Arm architecture not given
 --------------------------
-The ``--target=arm-linux-gnueabihf`` will default to arm architecture v4t which
-cannot assemble the barrier instructions used in the synch_and_fetch source
+The ``--target=arm-linux-gnueabihf`` will default to Arm architecture v4t which
+cannot assemble the barrier instructions used in the ``synch_and_fetch`` source
 files.
 
 The cause is usually a missing ``-march=armv7a`` from the ``CMAKE_ASM_FLAGS``.
@@ -202,7 +201,7 @@ may need extra c-flags such as ``-mfloat-abi=softfp`` for use of floating-point
 instructions, and ``-mfloat-abi=soft -mfpu=none`` for software floating-point
 emulation.
 
-You will need to use an arm-linux-gnueabi GNU toolchain for soft-float.
+You will need to use an ``arm-linux-gnueabi`` GNU toolchain for soft-float.
 
 AArch64 Target
 --------------
@@ -220,8 +219,12 @@ Armv6-m, Armv7-m and Armv7E-M targets
 To build and test the libraries using a similar method to Armv7-A is possible
 but more difficult. The main problems are:
 
-* There isn't a ``qemu-arm`` user-mode emulator for bare-metal systems. The ``qemu-system-arm`` can be used but this is significantly more difficult to setup.
-* The targets to compile compiler-rt have the suffix -none-eabi. This uses the BareMetal driver in clang and by default won't find the libraries needed to pass the cmake compiler check.
+* There is no ``qemu-arm`` user-mode emulator for bare-metal systems.
+  ``qemu-system-arm`` can be used but this is significantly more difficult
+  to set up.
+* The targets to compile compiler-rt have the suffix ``-none-eabi``. This uses
+  the BareMetal driver in clang and by default will not find the libraries
+  needed to pass the cmake compiler check.
 
 As the Armv6-M, Armv7-M and Armv7E-M builds of compiler-rt only use instructions
 that are supported on Armv7-A we can still get most of the value of running the
@@ -233,32 +236,30 @@ builtins use instructions that are supported on Armv7-A but not Armv6-M,
 Armv7-M and Armv7E-M.
 
 To get the cmake compile test to pass you will need to pass the libraries
-needed to successfully link the cmake test via ``CMAKE_CFLAGS``. It is
-strongly recommended that you use version 3.6 or above of cmake so you can use
-``CMAKE_TRY_COMPILE_TARGET=STATIC_LIBRARY`` to skip the link step.
-
-* ``-DCMAKE_TRY_COMPILE_TARGET_TYPE=STATIC_LIBRARY``
-* ``-DCOMPILER_RT_OS_DIR="baremetal"``
-* ``-DCOMPILER_RT_BUILD_BUILTINS=ON``
-* ``-DCOMPILER_RT_BUILD_SANITIZERS=OFF``
-* ``-DCOMPILER_RT_BUILD_XRAY=OFF``
-* ``-DCOMPILER_RT_BUILD_LIBFUZZER=OFF``
-* ``-DCOMPILER_RT_BUILD_PROFILE=OFF``
-* ``-DCMAKE_C_COMPILER=${host_install_dir}/bin/clang``
-* ``-DCMAKE_C_COMPILER_TARGET="your *-none-eabi target"``
-* ``-DCMAKE_ASM_COMPILER_TARGET="your *-none-eabi target"``
-* ``-DCMAKE_AR=/path/to/llvm-ar``
-* ``-DCMAKE_NM=/path/to/llvm-nm``
-* ``-DCMAKE_RANLIB=/path/to/llvm-ranlib``
-* ``-DCOMPILER_RT_BAREMETAL_BUILD=ON``
-* ``-DCOMPILER_RT_DEFAULT_TARGET_ONLY=ON``
-* ``-DLLVM_CONFIG_PATH=/path/to/llvm-config``
-* ``-DCMAKE_C_FLAGS="build-c-flags"``
-* ``-DCMAKE_ASM_FLAGS="build-c-flags"``
-* ``-DCOMPILER_RT_EMULATOR="qemu-arm -L /path/to/armv7-A/sysroot"``
-* ``-DCOMPILER_RT_INCLUDE_TESTS=ON``
-* ``-DCOMPILER_RT_TEST_COMPILER="/path/to/clang"``
-* ``-DCOMPILER_RT_TEST_COMPILER_CFLAGS="test-c-flags"``
+needed to successfully link the cmake test via ``CMAKE_C_FLAGS``::
+
+ -DCMAKE_TRY_COMPILE_TARGET_TYPE=STATIC_LIBRARY \
+ -DCOMPILER_RT_OS_DIR="baremetal" \
+ -DCOMPILER_RT_BUILD_BUILTINS=ON \
+ -DCOMPILER_RT_BUILD_SANITIZERS=OFF \
+ -DCOMPILER_RT_BUILD_XRAY=OFF \
+ -DCOMPILER_RT_BUILD_LIBFUZZER=OFF \
+ -DCOMPILER_RT_BUILD_PROFILE=OFF \
+ -DCMAKE_C_COMPILER=${host_install_dir}/bin/clang \
+ -DCMAKE_C_COMPILER_TARGET="your *-none-eabi target" \
+ -DCMAKE_ASM_COMPILER_TARGET="your *-none-eabi target" \
+ -DCMAKE_AR=/path/to/llvm-ar \
+ -DCMAKE_NM=/path/to/llvm-nm \
+ -DCMAKE_RANLIB=/path/to/llvm-ranlib \
+ -DCOMPILER_RT_BAREMETAL_BUILD=ON \
+ -DCOMPILER_RT_DEFAULT_TARGET_ONLY=ON \
+ -DLLVM_CONFIG_PATH=/path/to/llvm-config \
+ -DCMAKE_C_FLAGS="build-c-flags" \
+ -DCMAKE_ASM_FLAGS="build-c-flags" \
+ -DCOMPILER_RT_EMULATOR="qemu-arm -L /path/to/armv7-A/sysroot" \
+ -DCOMPILER_RT_INCLUDE_TESTS=ON \
+ -DCOMPILER_RT_TEST_COMPILER="/path/to/clang" \
+ -DCOMPILER_RT_TEST_COMPILER_CFLAGS="test-c-flags"
 
 The Armv6-M builtins will use the soft-float ABI. When compiling the tests for
 Armv7-A we must include ``"-mthumb -mfloat-abi=soft -mfpu=none"`` in the
@@ -267,25 +268,21 @@ test-c-flags. We must use an Armv7-A soft-float abi sysroot for ``qemu-arm``.
 Depending on the linker used for the test cases you may encounter BuildAttribute
 mismatches between the M-profile objects from compiler-rt and the A-profile
 objects from the test. The lld linker does not check the profile
-BuildAttribute so it can be used to link the tests by adding -fuse-ld=lld to the
+BuildAttribute so it can be used to link the tests by adding ``-fuse-ld=lld`` to the
 ``COMPILER_RT_TEST_COMPILER_CFLAGS``.
 
 Alternative using a cmake cache
 -------------------------------
 If you wish to build, but not test compiler-rt for Armv6-M, Armv7-M or Armv7E-M
-the easiest way is to use the BaremetalARM.cmake recipe in clang/cmake/caches.
-
-You will need a bare metal sysroot such as that provided by the GNU ARM
-Embedded toolchain.
-
-The libraries can be built with the cmake options:
+the easiest way is to use the ``BaremetalARM.cmake`` recipe in ``clang/cmake/caches``.
 
-* ``-DBAREMETAL_ARMV6M_SYSROOT=/path/to/bare/metal/toolchain/arm-none-eabi``
-* ``-DBAREMETAL_ARMV7M_SYSROOT=/path/to/bare/metal/toolchain/arm-none-eabi``
-* ``-DBAREMETAL_ARMV7EM_SYSROOT=/path/to/bare/metal/toolchain/arm-none-eabi``
-* ``-C /path/to/llvm/source/tools/clang/cmake/caches/BaremetalARM.cmake``
-* ``/path/to/llvm``
+You will need a bare metal sysroot such as that provided by the GNU ARM Embedded
+toolchain.
 
-**Note** that for the recipe to work the compiler-rt source must be checked out
-into the directory llvm/runtimes. You will also need clang and lld checked out.
+The libraries can be built with the cmake options::
 
+ -DBAREMETAL_ARMV6M_SYSROOT=/path/to/bare/metal/toolchain/arm-none-eabi \
+ -DBAREMETAL_ARMV7M_SYSROOT=/path/to/bare/metal/toolchain/arm-none-eabi \
+ -DBAREMETAL_ARMV7EM_SYSROOT=/path/to/bare/metal/toolchain/arm-none-eabi \
+ -C /path/to/llvm/source/tools/clang/cmake/caches/BaremetalARM.cmake \
+ /path/to/llvm
diff --git a/llvm/docs/HowToUpdateDebugInfo.rst b/llvm/docs/HowToUpdateDebugInfo.rst
index 915e289..ca420e7 100644
--- a/llvm/docs/HowToUpdateDebugInfo.rst
+++ b/llvm/docs/HowToUpdateDebugInfo.rst
@@ -499,7 +499,7 @@ a JSON file as follows:
   $ opt -verify-debuginfo-preserve -verify-di-preserve-export=sample.json -pass-to-test sample.ll
 
 and then use the ``llvm/utils/llvm-original-di-preservation.py`` script
-to generate an HTML page with the issues reported in a more human readable form
+to generate an HTML page with the issues reported in a more human-readable form
 as follows:
 
 .. code-block:: bash
diff --git a/llvm/docs/JITLink.rst b/llvm/docs/JITLink.rst
index 8902712..370281b 100644
--- a/llvm/docs/JITLink.rst
+++ b/llvm/docs/JITLink.rst
@@ -1072,7 +1072,7 @@ Major outstanding projects include:
 
 * Refactor architecture support to maximize sharing across formats.
 
-  All formats should be able to share the bulk of the architecture specific
+  All formats should be able to share the bulk of the architecture-specific
   code (especially relocations) for each supported architecture.
 
 * Refactor ELF link graph construction.
diff --git a/llvm/docs/LangRef.rst b/llvm/docs/LangRef.rst
index 5279e69..2259799 100644
--- a/llvm/docs/LangRef.rst
+++ b/llvm/docs/LangRef.rst
@@ -26,7 +26,7 @@ readable assembly language representation. This allows LLVM to provide a
 powerful intermediate representation for efficient compiler
 transformations and analysis, while providing a natural means to debug
 and visualize the transformations. The three different forms of LLVM are
-all equivalent. This document describes the human readable
+all equivalent. This document describes the human-readable
 representation and notation.
 
 The LLVM representation aims to be light-weight and low-level while
@@ -413,6 +413,8 @@ added in the future:
     - On AArch64 the callee preserves all general purpose registers, except
       X0-X8 and X16-X18. Not allowed with ``nest``.
 
+    - On RISC-V the callee preserves x5-x31, except the x6, x7 and x28 registers.
+
     The idea behind this convention is to support calls to runtime functions
     that have a hot path and a cold path. The hot path is usually a small piece
     of code that doesn't use many registers. The cold path might need to call out to
@@ -5173,6 +5175,8 @@ The following is the syntax for constant expressions:
     Perform the :ref:`trunc operation <i_trunc>` on constants.
 ``ptrtoint (CST to TYPE)``
     Perform the :ref:`ptrtoint operation <i_ptrtoint>` on constants.
+``ptrtoaddr (CST to TYPE)``
+    Perform the :ref:`ptrtoaddr operation <i_ptrtoaddr>` on constants.
 ``inttoptr (CST to TYPE)``
     Perform the :ref:`inttoptr operation <i_inttoptr>` on constants.
     This one is *really* dangerous!
@@ -12537,19 +12541,19 @@ Overview:
 """""""""
 
 The '``ptrtoaddr``' instruction converts the pointer or a vector of
-pointers ``value`` to the underlying integer address (or vector of integers) of
+pointers ``value`` to the underlying integer address (or vector of addresses) of
 type ``ty2``. This is different from :ref:`ptrtoint <i_ptrtoint>` in that it
-only operates on the index bits of the pointer and ignores all other bits.
-``ty2`` must be the integer type (or vector of integers) matching the pointer
-index width of the address space of ``ty``.
+only operates on the index bits of the pointer and ignores all other bits, and
+does not capture the provenance of the pointer.
 
 Arguments:
 """"""""""
 
 The '``ptrtoaddr``' instruction takes a ``value`` to cast, which must be
 a value of type :ref:`pointer <t_pointer>` or a vector of pointers, and a
-type to cast it to ``ty2``, which must be an :ref:`integer <t_integer>` or
-a vector of integers type.
+type to cast it to ``ty2``, which must be the :ref:`integer <t_integer>`
+type (or vector of integers) matching the pointer index width of the address
+space of ``ty``.
 
 Semantics:
 """"""""""
@@ -12569,9 +12573,8 @@ This example assumes pointers in address space 1 are 64 bits in size with an
 address width of 32 bits (``p1:64:64:64:32`` :ref:`datalayout string<langref_datalayout>`)
 .. code-block:: llvm
 
-      %X = ptrtoaddr ptr addrspace(1) %P to i8  ; extracts low 32 bits and truncates
-      %Y = ptrtoaddr ptr addrspace(1) %P to i64 ; extracts low 32 bits and zero extends
-      %Z = ptrtoaddr <4 x ptr addrspace(1)> %P to <4 x i64>; yields vector zero extension of low 32 bits for each pointer
+      %X = ptrtoaddr ptr addrspace(1) %P to i32  ; extracts low 32 bits of pointer
+      %Y = ptrtoaddr <4 x ptr addrspace(1)> %P to <4 x i32>; yields vector of low 32 bits for each pointer
 
 
 .. _i_inttoptr:
@@ -21340,7 +21343,7 @@ Semantics:
 On some architectures the address of the code to be executed needs to be
 different than the address where the trampoline is actually stored. This
 intrinsic returns the executable address corresponding to ``tramp``
-after performing the required machine specific adjustments. The pointer
+after performing the required machine-specific adjustments. The pointer
 returned can then be :ref:`bitcast and executed <int_trampoline>`.
 
 
@@ -24294,6 +24297,92 @@ Examples:
      %also.r = call <8 x i8> @llvm.masked.load.v8i8.p0(ptr %ptr, i32 2, <8 x i1> %mask, <8 x i8> poison)
 
 
+.. _int_vp_load_ff:
+
+'``llvm.vp.load.ff``' Intrinsic
+^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
+
+Syntax:
+"""""""
+This is an overloaded intrinsic.
+
+::
+
+    declare {<4 x float>, i32} @llvm.vp.load.ff.v4f32.p0(ptr %ptr, <4 x i1> %mask, i32 %evl)
+    declare {<vscale x 2 x i16>, i32} @llvm.vp.load.ff.nxv2i16.p0(ptr %ptr, <vscale x 2 x i1> %mask, i32 %evl)
+    declare {<8 x float>, i32} @llvm.vp.load.ff.v8f32.p1(ptr addrspace(1) %ptr, <8 x i1> %mask, i32 %evl)
+    declare {<vscale x 1 x i64>, i32} @llvm.vp.load.ff.nxv1i64.p6(ptr addrspace(6) %ptr, <vscale x 1 x i1> %mask, i32 %evl)
+
+Overview:
+"""""""""
+
+The '``llvm.vp.load.ff.*``' intrinsic is similar to
+'``llvm.vp.load.*``', but will not trap if fewer than ``evl`` lanes are readable
+at the pointer. '``ff``' stands for fault-first or fault-only-first.
+
+Arguments:
+""""""""""
+
+The first argument is the base pointer for the load. The second argument is a
+vector of boolean values with the same number of elements as the first return
+type.  The third is the explicit vector length of the operation. The first
+return type and underlying type of the base pointer are the same vector types.
+
+The :ref:`align <attr_align>` parameter attribute can be provided for the first
+argument.
+
+Semantics:
+""""""""""
+
+The '``llvm.vp.load.ff``' is designed for reading vector lanes in a single
+IR operation where the number of lanes that can be read is not known and can
+only be determined by looking at the data. This is useful for vectorizing
+``strcmp``- or ``strlen``-like loops where the data contains a null terminator. Some
+targets have a fault-only-first load instruction that this intrinsic can be
+lowered to. Other targets may support this intrinsic differently, for example by
+lowering to a single scalar load guarded by ``evl!=0`` and ``mask[0]==1`` and
+indicating only 1 lane could be read.
+
+Like '``llvm.vp.load``', this intrinsic reads memory based on a ``mask`` and an
+``evl``. If ``evl`` is non-zero and the first lane is masked-on, then the
+first lane of the vector needs to be inbounds of an allocation. The remaining
+masked-on lanes with index less than ``evl`` do not need to be inbounds of
+the same allocation or any allocation.
+
+The second return value from the intrinsic indicates the index of the first
+lane that could not be read for some reason or ``evl`` if all lanes could
+be read. Lanes at this index or higher in the first return value are
+:ref:`poison values <poisonvalues>`. If ``evl`` is non-zero, the result in the
+second return value must be at least 1, even if the first lane is masked-off.
+
+The second result is usually less than ``evl`` when an exception would occur
+for reading that lane, but it can be reduced for any reason. This facilitates
+emulating this intrinsic when the hardware only supports narrower vector
+types natively or when the hardware does not support fault-only-first loads.
+
+Masked-on lanes that are not inbounds of the allocation that contains the first
+lane are :ref:`poison values <poisonvalues>`. There should be a marker in the
+allocation that indicates where valid data stops such as a null terminator. The
+terminator should be checked for after calling this intrinsic to prevent using
+any lanes past the terminator. Even if the second return value is less than
+``evl``, the terminator value may not have been read.
+
+This intrinsic will typically be called in a loop until a terminator is
+found. The second result should be used to indicate how many elements are
+valid to look for the null terminator. If the terminator is not found, the
+pointer should be advanced by the number of elements in the second result and
+the intrinsic called again.
+
+The default alignment is taken as the ABI alignment of the first return
+type as specified by the :ref:`datalayout string<langref_datalayout>`.
+
+Examples:
+"""""""""
+
+.. code-block:: text
+
+     %r = call {<8 x i8>, i32} @llvm.vp.load.ff.v8i8.p0(ptr align 2 %ptr, <8 x i1> %mask, i32 %evl)
+
 .. _int_vp_store:
 
 '``llvm.vp.store``' Intrinsic
@@ -26706,16 +26795,20 @@ object's lifetime.
 Arguments:
 """"""""""
 
-The first argument is a constant integer representing the size of the
-object, or -1 if it is variable sized. The second argument is a pointer
-to an ``alloca`` instruction.
+The first argument is a constant integer, which is ignored and will be removed
+in the future.
+
+The second argument is either a pointer to an ``alloca`` instruction or
+a ``poison`` value.
 
 Semantics:
 """"""""""
 
-The stack-allocated object that ``ptr`` points to is initially marked as dead.
-After '``llvm.lifetime.start``', the stack object is marked as alive and has an
-uninitialized value.
+If ``ptr`` is a ``poison`` value, the intrinsic has no effect.
+
+Otherwise, the stack-allocated object that ``ptr`` points to is initially
+marked as dead. After '``llvm.lifetime.start``', the stack object is marked as
+alive and has an uninitialized value.
 The stack object is marked as dead when either
 :ref:`llvm.lifetime.end <int_lifeend>` to the alloca is executed or the
 function returns.
@@ -26746,15 +26839,19 @@ The '``llvm.lifetime.end``' intrinsic specifies the end of a
 Arguments:
 """"""""""
 
-The first argument is a constant integer representing the size of the
-object, or -1 if it is variable sized. The second argument is a pointer
-to an ``alloca`` instruction.
+The first argument is a constant integer, which is ignored and will be removed
+in the future.
+
+The second argument is either a pointer to an ``alloca`` instruction or
+a ``poison`` value.
 
 Semantics:
 """"""""""
 
-The stack-allocated object that ``ptr`` points to becomes dead after the call
-to this intrinsic.
+If ``ptr`` is a ``poison`` value, the intrinsic has no effect.
+
+Otherwise, the stack-allocated object that ``ptr`` points to becomes dead after
+the call to this intrinsic.
 
 Calling ``llvm.lifetime.end`` on an already dead alloca is no-op.
 
@@ -29431,7 +29528,7 @@ None.
 Semantics:
 """"""""""
 
-This intrinsic is lowered to the target dependent trap instruction. If
+This intrinsic is lowered to the target-dependent trap instruction. If
 the target does not have a trap instruction, this intrinsic will be
 lowered to a call of the ``abort()`` function.
 
diff --git a/llvm/docs/Lexicon.rst b/llvm/docs/Lexicon.rst
index 1d4894f..05315a8 100644
--- a/llvm/docs/Lexicon.rst
+++ b/llvm/docs/Lexicon.rst
@@ -192,7 +192,7 @@ L
 **LSDA**
     Language Specific Data Area.  C++ "zero cost" unwinding is built on top a
     generic unwinding mechanism.  As the unwinder walks each frame, it calls
-    a "personality" function to do language specific analysis.  Each function's
+    a "personality" function to do language-specific analysis.  Each function's
     FDE points to an optional LSDA which is passed to the personality function.
     For C++, the LSDA contain info about the type and location of catch
     statements in that function.
diff --git a/llvm/docs/MIRLangRef.rst b/llvm/docs/MIRLangRef.rst
index b4b59db..3f4c3cd 100644
--- a/llvm/docs/MIRLangRef.rst
+++ b/llvm/docs/MIRLangRef.rst
@@ -12,7 +12,7 @@ Introduction
 ============
 
 This document is a reference manual for the Machine IR (MIR) serialization
-format. MIR is a human readable serialization format that is used to represent
+format. MIR is a human-readable serialization format that is used to represent
 LLVM's :ref:`machine specific intermediate representation
 <machine code representation>`.
 
@@ -27,7 +27,7 @@ data serialization language, and the full YAML language spec can be read at
 `yaml.org
 <http://www.yaml.org/spec/1.2/spec.html#Introduction>`_.
 
-A MIR file is split up into a series of `YAML documents`_. The first document
+A MIR file is split into a series of `YAML documents`_. The first document
 can contain an optional embedded LLVM IR module, and the rest of the documents
 contain the serialized machine functions.
 
@@ -65,22 +65,22 @@ after the name with a comma.
 
    ``llc -stop-after=dead-mi-elimination,1 bug-trigger.ll -o test.mir``
 
-After generating the input MIR file, you'll have to add a run line that uses
+After generating the input MIR file, you'll have to add a ``RUN`` line that uses
 the ``-run-pass`` option to it. In order to test the post register allocation
 pseudo instruction expansion pass on X86-64, a run line like the one shown
 below can be used:
 
     ``# RUN: llc -o - %s -mtriple=x86_64-- -run-pass=postrapseudos | FileCheck %s``
 
-The MIR files are target dependent, so they have to be placed in the target
-specific test directories (``lib/CodeGen/TARGETNAME``). They also need to
-specify a target triple or a target architecture either in the run line or in
+The MIR files are target dependent, so they have to be placed in the
+target-specific test directories (``lib/CodeGen/TARGETNAME``). They also need to
+specify a target triple or a target architecture either in the ``RUN`` line or in
 the embedded LLVM IR module.
 
 Simplifying MIR files
 ^^^^^^^^^^^^^^^^^^^^^
 
-The MIR code coming out of ``-stop-after``/``-stop-before`` is very verbose;
+The MIR code coming out of ``-stop-after``/``-stop-before`` is very verbose.
 Tests are more accessible and future proof when simplified:
 
 - Use the ``-simplify-mir`` option with llc.
@@ -113,12 +113,12 @@ Tests are more accessible and future proof when simplified:
   If the test doesn't depend on (good) alias analysis the references can be
   dropped: `:: (load 8)`
 
-- MIR blocks can reference IR blocks for debug printing, profile information
+- MIR blocks can reference IR blocks for debug printing, profile information,
   or debug locations. Example: `bb.42.myblock` in MIR references the IR block
   `myblock`. It is usually possible to drop the `.myblock` reference and simply
   use `bb.42`.
 
-- If there are no memory operands or blocks referencing the IR then the
+- If there are no memory operands or blocks referencing the IR, then the
   IR function can be replaced by a parameterless dummy function like
   `define @func() { ret void }`.
 
@@ -143,7 +143,7 @@ can serialize:
 - The ``MCSymbol`` machine operands don't support temporary or local symbols.
 
 - A lot of the state in ``MachineModuleInfo`` isn't serialized - only the CFI
-  instructions and the variable debug information from MMI is serialized right
+  instructions and the variable debug information from MMI are serialized right
   now.
 
 These limitations impose restrictions on what you can test with the MIR format.
@@ -182,7 +182,7 @@ Machine Functions
 -----------------
 
 The remaining YAML documents contain the machine functions. This is an example
-of such YAML document:
+of such a YAML document:
 
 .. code-block:: text
 
@@ -221,7 +221,7 @@ Machine Instructions Format Reference
 =====================================
 
 The machine basic blocks and their instructions are represented using a custom,
-human readable serialization language. This language is used in the
+human-readable serialization language. This language is used in the
 `YAML block literal string`_ that corresponds to the machine function's body.
 
 A source string that uses this language contains a list of machine basic
@@ -299,7 +299,7 @@ instructions:
     bb.2.else:
       <instructions>
 
-The branch weights can be specified in brackets after the successor blocks.
+The branch weights can be specified in parentheses after the successor blocks.
 The example below defines a block that has two successors with branch weights
 of 32 and 16:
 
@@ -314,7 +314,7 @@ Live In Registers
 ^^^^^^^^^^^^^^^^^
 
 The machine basic block's live in registers have to be specified before any of
-the instructions:
+its instructions:
 
 .. code-block:: text
 
@@ -322,14 +322,14 @@ the instructions:
       liveins: $edi, $esi
 
 The list of live in registers and successors can be empty. The language also
-allows multiple live in register and successor lists - they are combined into
+allows multiple live in register and successor lists; they are combined into
 one list by the parser.
 
 Miscellaneous Attributes
 ^^^^^^^^^^^^^^^^^^^^^^^^
 
 The attributes ``IsAddressTaken``, ``IsLandingPad``,
-``IsInlineAsmBrIndirectTarget`` and ``Alignment`` can be specified in brackets
+``IsInlineAsmBrIndirectTarget`` and ``Alignment`` can be specified in parentheses
 after the block's definition:
 
 .. code-block:: text
@@ -417,7 +417,7 @@ and ``}`` are bundled with the first instruction.
 Registers
 ---------
 
-Registers are one of the key primitives in the machine instructions
+Registers are one of the key primitives in the machine instruction
 serialization language. They are primarily used in the
 :ref:`register machine operands <register-operands>`,
 but they can also be used in a number of other places, like the
@@ -503,9 +503,9 @@ will be printed as ``%subreg.sub_32``:
 
     %1:gpr64 = SUBREG_TO_REG 0, %0, %subreg.sub_32
 
-For integers > 64bit, we use a special machine operand, ``MO_CImmediate``,
+For integers wider than 64 bits, we use a special machine operand, ``MO_CImmediate``,
 which stores the immediate in a ``ConstantInt`` using an ``APInt`` (LLVM's
-arbitrary precision integers).
+arbitrary-precision integers).
 
 .. TODO: Describe the FPIMM immediate operands.
 
@@ -626,7 +626,7 @@ For a CPI with the index 0 and offset -12:
     %1:gr64 = MOV64ri %const.0 - 12
 
 A constant pool entry is bound to a LLVM IR ``Constant`` or a target-specific
-``MachineConstantPoolValue``. When serializing all the function's constants the
+``MachineConstantPoolValue``. When serializing all the function's constants, the
 following format is used:
 
 .. code-block:: text
@@ -695,7 +695,7 @@ and the offset 8:
 Jump-table Index Operands
 ^^^^^^^^^^^^^^^^^^^^^^^^^
 
-A jump-table index operand with the index 0 is printed as following:
+A jump-table index operand with the index 0 is printed as follows:
 
 .. code-block:: text
 
@@ -711,7 +711,7 @@ A machine jump-table entry contains a list of ``MachineBasicBlocks``. When seria
         - id:             <index>
           blocks:         [ <bbreference>, <bbreference>, ... ]
 
-where ``<kind>`` is describing how the jump table is represented and emitted (plain address, relocations, PIC, etc.), and each ``<index>`` is a 32-bit unsigned integer and ``blocks`` contains a list of :ref:`machine basic block references <block-references>`.
+where ``<kind>`` describes how the jump table is represented and emitted (plain address, relocations, PIC, etc.), each ``<index>`` is a 32-bit unsigned integer, and ``blocks`` contains a list of :ref:`machine basic block references <block-references>`.
 
 Example:
 
@@ -741,7 +741,7 @@ Example:
 MCSymbol Operands
 ^^^^^^^^^^^^^^^^^
 
-A MCSymbol operand is holding a pointer to a ``MCSymbol``. For the limitations
+An MCSymbol operand holds a pointer to an ``MCSymbol``. For the limitations
 of this operand in MIR, see :ref:`limitations <limitations>`.
 
 The syntax is:
@@ -754,7 +754,7 @@ Debug Instruction Reference Operands
 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
 
 A debug instruction reference operand is a pair of indices, referring to an
-instruction and an operand within that instruction respectively; see
+instruction and an operand within that instruction, respectively; see
 :ref:`Instruction referencing locations <instruction-referencing-locations>`.
 
 The example below uses a reference to Instruction 1, Operand 0:
@@ -766,7 +766,7 @@ The example below uses a reference to Instruction 1, Operand 0:
 CFIIndex Operands
 ^^^^^^^^^^^^^^^^^
 
-A CFI Index operand is holding an index into a per-function side-table,
+A CFI Index operand holds an index into a per-function side-table,
 ``MachineFunction::getFrameInstructions()``, which references all the frame
 instructions in a ``MachineFunction``. A ``CFI_INSTRUCTION`` may look like it
 contains multiple operands, but the only operand it contains is the CFI Index.
@@ -842,7 +842,7 @@ Comments can be added or customized by overriding InstrInfo's hook
 Debug-Info constructs
 ---------------------
 
-Most of the debugging information in a MIR file is to be found in the metadata
+Most of the debugging information in a MIR file is found in the metadata
 of the embedded module. Within a machine function, that metadata is referred to
 by various constructs to describe source locations and variable locations.
 
diff --git a/llvm/docs/MergeFunctions.rst b/llvm/docs/MergeFunctions.rst
index 02344bc..c27f603 100644
--- a/llvm/docs/MergeFunctions.rst
+++ b/llvm/docs/MergeFunctions.rst
@@ -7,7 +7,7 @@ MergeFunctions pass, how it works
 
 Introduction
 ============
-Sometimes code contains equal functions, or functions that does exactly the same
+Sometimes code contains equal functions, or functions that do exactly the same
 thing even though they are non-equal on the IR level (e.g.: multiplication on 2
 and 'shl 1'). It could happen due to several reasons: mainly, the usage of
 templates and automatic code generators. Though, sometimes the user itself could
@@ -16,7 +16,7 @@ write the same thing twice :-)
 The main purpose of this pass is to recognize such functions and merge them.
 
 This document is the extension to pass comments and describes the pass logic. It
-describes the algorithm that is used in order to compare functions and
+describes the algorithm used to compare functions and
 explains how we could combine equal functions correctly to keep the module
 valid.
 
@@ -58,7 +58,7 @@ It's especially important to understand chapter 3 of tutorial:
 
 :doc:`tutorial/LangImpl03`
 
-The reader should also know how passes work in LLVM. They could use this
+The reader should also know how passes work in LLVM. They can use this
 article as a reference and start point here:
 
 :doc:`WritingAnLLVMPass`
@@ -68,7 +68,7 @@ debugging and bug-fixing.
 
 Narrative structure
 -------------------
-The article consists of three parts. The first part explains pass functionality
+This article consists of three parts. The first part explains pass functionality
 on the top-level. The second part describes the comparison procedure itself.
 The third part describes the merging process.
 
@@ -130,7 +130,7 @@ access lookup? The answer is: "yes".
 
 Random-access
 """""""""""""
-How it could this be done? Just convert each function to a number, and gather
+How can this be done? Just convert each function to a number, and gather
 all of them in a special hash-table. Functions with equal hashes are equal.
 Good hashing means, that every function part must be taken into account. That
 means we have to convert every function part into some number, and then add it
@@ -190,17 +190,17 @@ The algorithm is pretty simple:
 
 1. Put all module's functions into the *worklist*.
 
-2. Scan *worklist*'s functions twice: first enumerate only strong functions and
+2. Scan *worklist*'s functions twice: first, enumerate only strong functions and
 then only weak ones:
 
    2.1. Loop body: take a function from *worklist*  (call it *FCur*) and try to
    insert it into *FnTree*: check whether *FCur* is equal to one of functions
    in *FnTree*. If there *is* an equal function in *FnTree*
-   (call it *FExists*): merge function *FCur* with *FExists*. Otherwise add
+   (call it *FExists*): merge function *FCur* with *FExists*. Otherwise, add
    the function from the *worklist* to *FnTree*.
 
 3. Once the *worklist* scanning and merging operations are complete, check the
-*Deferred* list. If it is not empty: refill the *worklist* contents with
+*Deferred* list. If it is not empty, refill the *worklist* contents with
 *Deferred* list and redo step 2, if the *Deferred* list is empty, then exit
 from method.
 
@@ -249,14 +249,14 @@ Below, we will use the following operations:
 
 The rest of the article is based on *MergeFunctions.cpp* source code
 (found in *<llvm_dir>/lib/Transforms/IPO/MergeFunctions.cpp*). We would like
-to ask reader to keep this file open, so we could use it as a reference
+to ask the reader to keep this file open, so we can use it as a reference
 for further explanations.
 
 Now, we're ready to proceed to the next chapter and see how it works.
 
 Functions comparison
 ====================
-At first, let's define how exactly we compare complex objects.
+First, let's define exactly how we compare complex objects.
 
 Complex object comparison (function, basic-block, etc) is mostly based on its
 sub-object comparison results. It is similar to the next "tree" objects
@@ -307,7 +307,7 @@ to those we met later in function body (value we met first would be *less*).
 This is done by “``FunctionComparator::cmpValues(const Value*, const Value*)``”
 method (will be described a bit later).
 
-4. Function body comparison. As it written in method comments:
+4. Function body comparison. As written in the method comments:
 
 “We do a CFG-ordered walk since the actual ordering of the blocks in the linked
 list is immaterial. Our walk starts at the entry block for both functions, then
@@ -477,7 +477,7 @@ Of course, we can combine insertion and comparison:
     = sn_mapR.insert(std::make_pair(Right, sn_mapR.size()));
   return cmpNumbers(LeftRes.first->second, RightRes.first->second);
 
-Let's look, how whole method could be implemented.
+Let's look at how the whole method could be implemented.
 
 1. We have to start with the bad news. Consider function self and
 cross-referencing cases:
@@ -519,7 +519,7 @@ the result of numbers comparison:
    if (LeftRes.first->second < RightRes.first->second) return -1;
    return 1;
 
-Now when *cmpValues* returns 0, we can proceed the comparison procedure.
+Now, when *cmpValues* returns 0, we can proceed with the comparison procedure.
 Otherwise, if we get (-1 or 1), we need to pass this result to the top level,
 and finish comparison procedure.
 
@@ -549,7 +549,7 @@ losslessly bitcasted to each other. The further explanation is modification of
    2.1.3.1. If types are vectors, compare their bitwidth using the
    *cmpNumbers*. If result is not 0, return it.
 
-   2.1.3.2. Different types, but not a vectors:
+   2.1.3.2. Different types, but not vectors:
 
    * if both of them are pointers, good for us, we can proceed to step 3.
    * if one of types is pointer, return result of *isPointer* flags
@@ -654,7 +654,7 @@ O(N*N) to O(log(N)).
 
 Merging process, mergeTwoFunctions
 ==================================
-Once *MergeFunctions* detected that current function (*G*) is equal to one that
+Once *MergeFunctions* detects that the current function (*G*) is equal to one that
 were analyzed before (function *F*) it calls ``mergeTwoFunctions(Function*,
 Function*)``.
 
@@ -664,7 +664,7 @@ Operation affects ``FnTree`` contents with next way: *F* will stay in
 functions that calls *G* would be put into ``Deferred`` set and removed from
 ``FnTree``, and analyzed again.
 
-The approach is next:
+The approach is as follows:
 
 1. Most wished case: when we can use alias and both of *F* and *G* are weak. We
 make both of them with aliases to the third strong function *H*. Actually *H*
@@ -691,12 +691,12 @@ ok: we can use alias to *F* instead of *G* or change call instructions itself.
 
 HasGlobalAliases, removeUsers
 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
-First consider the case when we have global aliases of one function name to
+First, consider the case when we have global aliases of one function name to
 another. Our purpose is  make both of them with aliases to the third strong
 function. Though if we keep *F* alive and without major changes we can leave it
 in ``FnTree``. Try to combine these two goals.
 
-Do stub replacement of *F* itself with an alias to *F*.
+Do a stub replacement of *F* itself with an alias to *F*.
 
 1. Create stub function *H*, with the same name and attributes like function
 *F*. It takes maximum alignment of *F* and *G*.
@@ -725,7 +725,7 @@ also have alias to *F*.
 
 No global aliases, replaceDirectCallers
 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
-If global aliases are not supported. We call ``replaceDirectCallers``. Just
+If global aliases are not supported, we call ``replaceDirectCallers``. Just
 go through all calls of *G* and replace it with calls of *F*. If you look into
 the method you will see that it scans all uses of *G* too, and if use is callee
 (if user is call instruction and *G* is used as what to be called), we replace
diff --git a/llvm/docs/NVPTXUsage.rst b/llvm/docs/NVPTXUsage.rst
index d28eb68..2dc8f9f 100644
--- a/llvm/docs/NVPTXUsage.rst
+++ b/llvm/docs/NVPTXUsage.rst
@@ -971,6 +971,10 @@ Syntax:
   declare void  @llvm.nvvm.prefetch.L1(ptr %ptr)
   declare void  @llvm.nvvm.prefetch.L2(ptr %ptr)
   
+  declare void  @llvm.nvvm.prefetch.tensormap.p0(ptr %ptr)
+  declare void  @llvm.nvvm.prefetch.tensormap.p4(ptr addrspace(4) %const_ptr)
+  declare void  @llvm.nvvm.prefetch.tensormap.p101(ptr addrspace(101) %param_ptr)  
+  
   declare void  @llvm.nvvm.prefetch.global.L2.evict.normal(ptr addrspace(1) %global_ptr)
   declare void  @llvm.nvvm.prefetch.global.L2.evict.last(ptr addrspace(1) %global_ptr)
 
@@ -983,7 +987,10 @@ The '``@llvm.nvvm.prefetch.*``' and '``@llvm.nvvm.prefetchu.*``' intrinsic
 correspond to the '``prefetch.*``;' and '``prefetchu.*``' family of PTX instructions. 
 The '``prefetch.*``' instructions bring the cache line containing the
 specified address in global or local memory address space into the 
-specified cache level (L1 or L2). The '`prefetchu.*``' instruction brings the cache line 
+specified cache level (L1 or L2). If the '``.tensormap``' qualifier is specified, then the
+prefetch instruction brings the cache line containing the specified address in the
+'``.const``' or '``.param``' memory state space for subsequent use by the '``cp.async.bulk.tensor``'
+instruction. The '``prefetchu.*``' instruction brings the cache line
 containing the specified generic address into the specified uniform cache level.
 If no address space is specified, it is assumed to be generic address. The intrinsic 
 uses and eviction priority which can be accessed by the '``.level::eviction_priority``' modifier.
diff --git a/llvm/docs/PDB/CodeViewTypes.rst b/llvm/docs/PDB/CodeViewTypes.rst
index 7a93ebe..996d8f9 100644
--- a/llvm/docs/PDB/CodeViewTypes.rst
+++ b/llvm/docs/PDB/CodeViewTypes.rst
@@ -123,7 +123,7 @@ The ``Size`` field of the Attributes bitmask is a 1-byte value indicating the
 pointer size.  For example, a `void*` would have a size of either 4 or 8 depending
 on the target architecture.  On the other hand, if ``Mode`` indicates that this is
 a pointer to member function or pointer to data member, then the size can be any
-implementation defined number.
+implementation-defined number.
 
 The ``Member Ptr Info`` field of the ``LF_POINTER`` record is only present if the
 attributes indicate that this is a pointer to member.
diff --git a/llvm/docs/ProgrammersManual.rst b/llvm/docs/ProgrammersManual.rst
index 9ddeebd..1e1e5b3 100644
--- a/llvm/docs/ProgrammersManual.rst
+++ b/llvm/docs/ProgrammersManual.rst
@@ -486,7 +486,7 @@ Success values are very cheap to construct and return - they have minimal
 impact on program performance.
 
 Failure values are constructed using ``make_error<T>``, where ``T`` is any class
-that inherits from the ErrorInfo utility, E.g.:
+that inherits from the ``ErrorInfo`` utility, e.g.:
 
 .. code-block:: c++
 
@@ -1351,7 +1351,7 @@ The ``llvm/Support/DebugCounter.h`` (`doxygen
 provides a class named ``DebugCounter`` that can be used to create
 command-line counter options that control execution of parts of your code.
 
-Define your DebugCounter like this:
+Define your ``DebugCounter`` like this:
 
 .. code-block:: c++
 
@@ -1677,7 +1677,7 @@ page and one extra indirection when accessing elements with their positional
 index.
 
 In order to minimise the memory footprint of this container, it's important to
-balance the PageSize so that it's not too small (otherwise the overhead of the
+balance the ``PageSize`` so that it's not too small (otherwise the overhead of the
 pointer per page might become too high) and not too big (otherwise the memory
 is wasted if the page is not fully used).
 
@@ -2203,17 +2203,17 @@ inserting elements into both a set-like container and the sequential container,
 using the set-like container for uniquing and the sequential container for
 iteration.
 
-The difference between SetVector and other sets is that the order of iteration
-is guaranteed to match the order of insertion into the SetVector.  This property
+The difference between ``SetVector`` and other sets is that the order of iteration
+is guaranteed to match the order of insertion into the ``SetVector``.  This property
 is really important for things like sets of pointers.  Because pointer values
 are non-deterministic (e.g. vary across runs of the program on different
 machines), iterating over the pointers in the set will not be in a well-defined
 order.
 
-The drawback of SetVector is that it requires twice as much space as a normal
+The drawback of ``SetVector`` is that it requires twice as much space as a normal
 set and has the sum of constant factors from the set-like container and the
 sequential container that it uses.  Use it **only** if you need to iterate over
-the elements in a deterministic order.  SetVector is also expensive to delete
+the elements in a deterministic order.  ``SetVector`` is also expensive to delete
 elements out of (linear time), unless you use its "pop_back" method, which is
 faster.
 
@@ -2369,22 +2369,22 @@ llvm/IR/ValueMap.h
 
 ValueMap is a wrapper around a :ref:`DenseMap <dss_densemap>` mapping
 ``Value*``\ s (or subclasses) to another type.  When a Value is deleted or
-RAUW'ed, ValueMap will update itself so the new version of the key is mapped to
+RAUW'ed, ``ValueMap`` will update itself so the new version of the key is mapped to
 the same value, just as if the key were a WeakVH.  You can configure exactly how
 this happens, and what else happens on these two events, by passing a ``Config``
-parameter to the ValueMap template.
+parameter to the ``ValueMap`` template.
 
 .. _dss_intervalmap:
 
 llvm/ADT/IntervalMap.h
 ^^^^^^^^^^^^^^^^^^^^^^
 
-IntervalMap is a compact map for small keys and values.  It maps key intervals
+``IntervalMap`` is a compact map for small keys and values.  It maps key intervals
 instead of single keys, and it will automatically coalesce adjacent intervals.
 When the map only contains a few intervals, they are stored in the map object
 itself to avoid allocations.
 
-The IntervalMap iterators are quite big, so they should not be passed around as
+The ``IntervalMap`` iterators are quite big, so they should not be passed around as
 STL iterators.  The heavyweight iterators allow a smaller data structure.
 
 .. _dss_intervaltree:
@@ -2396,7 +2396,7 @@ llvm/ADT/IntervalTree.h
 allows finding all intervals that overlap with any given point. At this time,
 it does not support any deletion or rebalancing operations.
 
-The IntervalTree is designed to be set up once, and then queried without any
+The ``IntervalTree`` is designed to be set up once, and then queried without any
 further additions.
 
 .. _dss_map:
@@ -2435,10 +2435,10 @@ necessary to remove elements, it's best to remove them in bulk using
 llvm/ADT/IntEqClasses.h
 ^^^^^^^^^^^^^^^^^^^^^^^
 
-IntEqClasses provides a compact representation of equivalence classes of small
+``IntEqClasses`` provides a compact representation of equivalence classes of small
 integers.  Initially, each integer in the range 0..n-1 has its own equivalence
 class.  Classes can be joined by passing two class representatives to the
-join(a, b) method.  Two integers are in the same class when findLeader() returns
+``join(a, b)`` method.  Two integers are in the same class when ``findLeader()`` returns
 the same representative.
 
 Once all equivalence classes are formed, the map can be compressed so each
@@ -2451,11 +2451,11 @@ it can be edited again.
 llvm/ADT/ImmutableMap.h
 ^^^^^^^^^^^^^^^^^^^^^^^
 
-ImmutableMap is an immutable (functional) map implementation based on an AVL
+``ImmutableMap`` is an immutable (functional) map implementation based on an AVL
 tree.  Adding or removing elements is done through a Factory object and results
-in the creation of a new ImmutableMap object.  If an ImmutableMap already exists
+in the creation of a new ``ImmutableMap`` object.  If an ``ImmutableMap`` already exists
 with the given key set, then the existing one is returned; equality is compared
-with a FoldingSetNodeID.  The time and space complexity of add or remove
+with a ``FoldingSetNodeID``.  The time and space complexity of add or remove
 operations is logarithmic in the size of the original map.
 
 .. _dss_othermap:
@@ -2490,11 +2490,11 @@ somehow.  In any case, please don't use it.
 BitVector
 ^^^^^^^^^
 
-The BitVector container provides a dynamic size set of bits for manipulation.
+The ``BitVector`` container provides a dynamic size set of bits for manipulation.
 It supports individual bit setting/testing, as well as set operations.  The set
 operations take time O(size of bitvector), but operations are performed one word
-at a time, instead of one bit at a time.  This makes the BitVector very fast for
-set operations compared to other containers.  Use the BitVector when you expect
+at a time, instead of one bit at a time.  This makes the ``BitVector`` very fast for
+set operations compared to other containers.  Use the ``BitVector`` when you expect
 the number of set bits to be high (i.e. a dense set).
 
 .. _dss_smallbitvector:
@@ -2516,29 +2516,29 @@ its operator[] does not provide an assignable lvalue.
 SparseBitVector
 ^^^^^^^^^^^^^^^
 
-The SparseBitVector container is much like BitVector, with one major difference:
-Only the bits that are set, are stored.  This makes the SparseBitVector much
-more space efficient than BitVector when the set is sparse, as well as making
+The ``SparseBitVector`` container is much like ``BitVector``, with one major difference:
+only the bits that are set are stored.  This makes the ``SparseBitVector`` much
+more space efficient than ``BitVector`` when the set is sparse, as well as making
 set operations O(number of set bits) instead of O(size of universe).  The
-downside to the SparseBitVector is that setting and testing of random bits is
-O(N), and on large SparseBitVectors, this can be slower than BitVector.  In our
+downside to the ``SparseBitVector`` is that setting and testing of random bits is
+O(N), and on large ``SparseBitVector``\ s, this can be slower than ``BitVector``.  In our
 implementation, setting or testing bits in sorted order (either forwards or
 reverse) is O(1) worst case.  Testing and setting bits within 128 bits (depends
 on size) of the current bit is also O(1).  As a general statement,
-testing/setting bits in a SparseBitVector is O(distance away from last set bit).
+testing/setting bits in a ``SparseBitVector`` is O(distance away from last set bit).
 
 .. _dss_coalescingbitvector:
 
 CoalescingBitVector
 ^^^^^^^^^^^^^^^^^^^
 
-The CoalescingBitVector container is similar in principle to a SparseBitVector,
+The ``CoalescingBitVector`` container is similar in principle to a ``SparseBitVector``,
 but is optimized to represent large contiguous ranges of set bits compactly. It
 does this by coalescing contiguous ranges of set bits into intervals. Searching
-for a bit in a CoalescingBitVector is O(log(gaps between contiguous ranges)).
+for a bit in a ``CoalescingBitVector`` is O(log(gaps between contiguous ranges)).
 
-CoalescingBitVector is a better choice than BitVector when gaps between ranges
-of set bits are large. It's a better choice than SparseBitVector when find()
+``CoalescingBitVector`` is a better choice than ``BitVector`` when gaps between ranges
+of set bits are large. It's a better choice than ``SparseBitVector`` when ``find()``
 operations must have fast, predictable performance. However, it's not a good
 choice for representing sets which have lots of very short ranges. E.g. the set
 `{2*x : x \in [0, n)}` would be a pathological input.
@@ -2773,7 +2773,7 @@ Turning an iterator into a class pointer (and vice-versa)
 
 Sometimes, it'll be useful to grab a reference (or pointer) to a class instance
 when all you've got at hand is an iterator.  Well, extracting a reference or a
-pointer from an iterator is very straight-forward.  Assuming that ``i`` is a
+pointer from an iterator is very straightforward.  Assuming that ``i`` is a
 ``BasicBlock::iterator`` and ``j`` is a ``BasicBlock::const_iterator``:
 
 .. code-block:: c++
@@ -2805,7 +2805,7 @@ Say that you're writing a FunctionPass and would like to count all the locations
 in the entire module (that is, across every ``Function``) where a certain
 function (i.e., some ``Function *``) is already in scope.  As you'll learn
 later, you may want to use an ``InstVisitor`` to accomplish this in a much more
-straight-forward manner, but this example will allow us to explore how you'd do
+straightforward manner, but this example will allow us to explore how you'd do
 it if you didn't have ``InstVisitor`` around.  In pseudo-code, this is what we
 want to do:
 
@@ -2932,7 +2932,7 @@ Creating and inserting new ``Instruction``\ s
 
 *Instantiating Instructions*
 
-Creation of ``Instruction``\ s is straight-forward: simply call the constructor
+Creation of ``Instruction``\ s is straightforward: simply call the constructor
 for the kind of instruction to instantiate and provide the necessary parameters.
 For example, an ``AllocaInst`` only *requires* a (const-ptr-to) ``Type``.  Thus:
 
@@ -3050,7 +3050,7 @@ Deleting Instructions
 ^^^^^^^^^^^^^^^^^^^^^
 
 Deleting an instruction from an existing sequence of instructions that form a
-BasicBlock_ is very straight-forward: just call the instruction's
+``BasicBlock`` is very straightforward: just call the instruction's
 ``eraseFromParent()`` method.  For example:
 
 .. code-block:: c++
@@ -3850,16 +3850,16 @@ Important Subclasses of Constant
   any width.
 
   * ``const APInt& getValue() const``: Returns the underlying
-    value of this constant, an APInt value.
+    value of this constant, an ``APInt`` value.
 
   * ``int64_t getSExtValue() const``: Converts the underlying APInt value to an
-    int64_t via sign extension.  If the value (not the bit width) of the APInt
-    is too large to fit in an int64_t, an assertion will result.  For this
+    ``int64_t`` via sign extension.  If the value (not the bit width) of the ``APInt``
+    is too large to fit in an ``int64_t``, an assertion will result.  For this
     reason, use of this method is discouraged.
 
-  * ``uint64_t getZExtValue() const``: Converts the underlying APInt value
-    to a uint64_t via zero extension.  IF the value (not the bit width) of the
-    APInt is too large to fit in a uint64_t, an assertion will result.  For this
+  * ``uint64_t getZExtValue() const``: Converts the underlying ``APInt`` value
+    to a ``uint64_t`` via zero extension.  If the value (not the bit width) of the
+    ``APInt`` is too large to fit in a ``uint64_t``, an assertion will result.  For this
     reason, use of this method is discouraged.
 
   * ``static ConstantInt* get(const APInt& Val)``: Returns the ConstantInt
@@ -4148,7 +4148,7 @@ Important Public Members of the ``BasicBlock`` class
   new block, and a :ref:`Function <c_Function>` to insert it into.  If the
   ``Parent`` parameter is specified, the new ``BasicBlock`` is automatically
   inserted at the end of the specified :ref:`Function <c_Function>`, if not
-  specified, the BasicBlock must be manually inserted into the :ref:`Function
+  specified, the ``BasicBlock`` must be manually inserted into the :ref:`Function
   <c_Function>`.
 
 * | ``BasicBlock::iterator`` - Typedef for instruction list iterator
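To illustrate the ``CoalescingBitVector`` guidance earlier in this file's changes (contiguous ranges are stored compactly, while many very short ranges are pathological), here is a minimal, self-contained C++ sketch. It does not use LLVM's actual class; ``countCoalescedIntervals`` and ``makeStridedSet`` are hypothetical helpers written only for this illustration. It counts how many intervals a coalescing representation would need for a given sorted set of bits.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical helper, not part of LLVM: counts the maximal runs of
// consecutive integers in a strictly increasing list of set bits. Each run
// corresponds to one interval that a coalescing representation would store.
std::size_t countCoalescedIntervals(const std::vector<std::uint64_t> &SortedBits) {
  std::size_t Intervals = 0;
  for (std::size_t I = 0; I < SortedBits.size(); ++I) {
    assert((I == 0 || SortedBits[I - 1] < SortedBits[I]) &&
           "input must be strictly increasing");
    // A new interval starts at the first bit and after every gap.
    if (I == 0 || SortedBits[I] != SortedBits[I - 1] + 1)
      ++Intervals;
  }
  return Intervals;
}

// Hypothetical helper: builds {Start, Start + Stride, ...} with N elements.
// Stride is assumed to be at least 1 so the result is strictly increasing.
std::vector<std::uint64_t> makeStridedSet(std::uint64_t Start,
                                          std::uint64_t Stride,
                                          std::size_t N) {
  std::vector<std::uint64_t> Bits;
  Bits.reserve(N);
  for (std::size_t I = 0; I < N; ++I)
    Bits.push_back(Start + Stride * I);
  return Bits;
}
```

With a stride of 1, the 1000-element set collapses into a single interval. With a stride of 2 (the `{2*x : x \in [0, n)}` case from the text), every element becomes its own interval, so coalescing saves nothing.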
diff --git a/llvm/docs/RISCVUsage.rst b/llvm/docs/RISCVUsage.rst
index 9f6ac55..a29e06c 100644
--- a/llvm/docs/RISCVUsage.rst
+++ b/llvm/docs/RISCVUsage.rst
@@ -136,7 +136,7 @@ on support follow.
      ``Smepmp``        Supported
      ``Smmpm``         Supported
      ``Smnpm``         Supported
-     ``Smrnmi``        Assembly Support
+     ``Smrnmi``        Supported
      ``Smstateen``     Assembly Support
      ``Ssaia``         Supported
      ``Ssccfg``        Supported
diff --git a/llvm/docs/ReleaseNotes.md b/llvm/docs/ReleaseNotes.md
index c15d148..6e85675 100644
--- a/llvm/docs/ReleaseNotes.md
+++ b/llvm/docs/ReleaseNotes.md
@@ -77,6 +77,7 @@ Changes to Vectorizers
 
 * Added initial support for copyable elements in SLP, which models copyable
   elements as add <element>, 0, i.e. uses identity constants for missing lanes.
+* SLP vectorizer supports initial recognition of FMA/FMAD patterns.
 
 Changes to the AArch64 Backend
 ------------------------------
@@ -141,6 +142,9 @@ Changes to the LLVM tools
 Changes to LLDB
 ---------------------------------
 
+* LLDB can now set breakpoints, show backtraces, and display variables when
+  debugging Wasm with supported runtimes (WAMR and V8).
+
 Changes to BOLT
 ---------------------------------
 
diff --git a/llvm/docs/SPIRVUsage.rst b/llvm/docs/SPIRVUsage.rst
index 1f563fb..fdefc53 100644
--- a/llvm/docs/SPIRVUsage.rst
+++ b/llvm/docs/SPIRVUsage.rst
@@ -131,9 +131,23 @@ Extensions
 
 The SPIR-V backend supports a variety of `extensions <https://github.com/KhronosGroup/SPIRV-Registry/tree/main/extensions>`_
 that enable or enhance features beyond the core SPIR-V specification.
-These extensions can be enabled using the ``-spirv-extensions`` option
-followed by the name of the extension(s) you wish to enable. Below is a
-list of supported SPIR-V extensions, sorted alphabetically by their extension names:
+The enabled extensions can be controlled using the ``-spirv-ext`` option followed by a list of
+extensions to enable or disable, each prefixed with ``+`` or ``-``, respectively.
+
+To enable multiple extensions, list them separated by commas. For example, to enable support for atomic operations on floating-point numbers and arbitrary precision integers, use:
+
+``-spirv-ext=+SPV_EXT_shader_atomic_float_add,+SPV_INTEL_arbitrary_precision_integers``
+
+To enable all extensions, use the following option:
+``-spirv-ext=all``
+
+To enable all KHR extensions, use the following option:
+``-spirv-ext=khr``
+
+To enable all extensions except those specified, specify ``all`` followed by a list of disallowed extensions. For example:
+``-spirv-ext=all,-SPV_INTEL_arbitrary_precision_integers``
+
+Below is a list of supported SPIR-V extensions, sorted alphabetically by their extension names:
 
 .. list-table:: Supported SPIR-V Extensions
    :widths: 50 150
@@ -220,16 +234,6 @@ list of supported SPIR-V extensions, sorted alphabetically by their extension na
    * - ``SPV_KHR_float_controls2``
      - Adds ability to specify the floating-point environment in shaders. It can be used on whole modules and individual instructions.
 
-To enable multiple extensions, list them separated by comma. For example, to enable support for atomic operations on floating-point numbers and arbitrary precision integers, use:
-
-``-spirv-ext=+SPV_EXT_shader_atomic_float_add,+SPV_INTEL_arbitrary_precision_integers``
-
-To enable all extensions, use the following option:
-``-spirv-ext=all``
-
-To enable all extensions except specified, specify ``all`` followed by a list of disallowed extensions. For example:
-``-spirv-ext=all,-SPV_INTEL_arbitrary_precision_integers``
-
 SPIR-V representation in LLVM IR
 ================================
 
diff --git a/llvm/docs/SymbolizerMarkupFormat.rst b/llvm/docs/SymbolizerMarkupFormat.rst
index d5b17d7..75ead44 100644
--- a/llvm/docs/SymbolizerMarkupFormat.rst
+++ b/llvm/docs/SymbolizerMarkupFormat.rst
@@ -315,7 +315,7 @@ Trigger elements
 ================
 
 These elements cause an external action and will be presented to the user in a
-human readable form. Generally they trigger an external action to occur that
+human-readable form. Generally they trigger an external action to occur that
 results in a linkable page. The link or some other informative information about
 the external action can then be presented to the user.
 
diff --git a/llvm/docs/TableGen/BackGuide.rst b/llvm/docs/TableGen/BackGuide.rst
index 4828f9b..83f8f470 100644
--- a/llvm/docs/TableGen/BackGuide.rst
+++ b/llvm/docs/TableGen/BackGuide.rst
@@ -191,7 +191,7 @@ Some of these classes have additional members that
 are described in the following subsections.
 
 *All* of the classes derived from ``RecTy`` provide the ``get()`` function.
-It returns an instance of ``Recty`` corresponding to the derived class.
+It returns an instance of ``RecTy`` corresponding to the derived class.
 Some of the ``get()`` functions require an argument to
 specify which particular variant of the type is desired. These arguments are
 described in the following subsections.
@@ -354,12 +354,12 @@ The class provides many additional functions:
 * Functions to determine whether there are any operands and to get the
   number of operands.
 
-* Functions to the get the operands, both individually and together.
+* Functions to get the operands, both individually and together.
 
 * Functions to determine whether there are any names and to
   get the number of names
 
-* Functions to the get the names, both individually and together.
+* Functions to get the names, both individually and together.
 
 * Functions to get the operand iterator ``begin()`` and ``end()`` values.
 
@@ -605,7 +605,7 @@ null if the field does not exist.
 
 The field is assumed to have another record as its value. That record is returned
 as a pointer to a ``Record``. If the field does not exist or is unset, the
-functions returns null.
+function returns null.
 
 Getting Record Superclasses
 ===========================
diff --git a/llvm/docs/TableGen/ProgRef.rst b/llvm/docs/TableGen/ProgRef.rst
index 7b30698..2b1af05 100644
--- a/llvm/docs/TableGen/ProgRef.rst
+++ b/llvm/docs/TableGen/ProgRef.rst
@@ -219,17 +219,17 @@ TableGen provides "bang operators" that have a wide variety of uses:
 
 .. productionlist::
    BangOperator: one of
-               : !add         !and         !cast        !con         !dag
-               : !div         !empty       !eq          !exists      !filter
-               : !find        !foldl       !foreach     !ge          !getdagarg
-               : !getdagname  !getdagop    !gt          !head        !if
-               : !initialized !instances   !interleave  !isa         !le
-               : !listconcat  !listflatten !listremove  !listsplat   !logtwo
-               : !lt          !match       !mul         !ne          !not
-               : !or          !range       !repr        !setdagarg   !setdagname
-               : !setdagop    !shl         !size        !sra         !srl
-               : !strconcat   !sub         !subst       !substr      !tail
-               : !tolower     !toupper     !xor
+               : !add         !and         !cast         !con         !dag
+               : !div         !empty       !eq           !exists      !filter
+               : !find        !foldl       !foreach      !ge          !getdagarg
+               : !getdagname  !getdagop    !getdagopname !gt          !head
+               : !if          !initialized !instances    !interleave  !isa
+               : !le          !listconcat  !listflatten  !listremove  !listsplat
+               : !logtwo      !lt          !match        !mul         !ne
+               : !not         !or          !range        !repr        !setdagarg
+               : !setdagname  !setdagop    !setdagopname !shl         !size
+               : !sra         !srl         !strconcat    !sub         !subst
+               : !substr      !tail        !tolower      !toupper     !xor
 
 The ``!cond`` operator has a slightly different
 syntax compared to other bang operators, so it is defined separately:
@@ -1443,7 +1443,8 @@ DAG.
 
 The following bang operators are useful for working with DAGs:
 ``!con``, ``!dag``, ``!empty``, ``!foreach``, ``!getdagarg``, ``!getdagname``,
-``!getdagop``, ``!setdagarg``, ``!setdagname``, ``!setdagop``, ``!size``.
+``!getdagop``, ``!getdagopname``, ``!setdagarg``, ``!setdagname``, ``!setdagop``,
+``!setdagopname``, ``!size``.
 
 Defvar in a record body
 -----------------------
@@ -1695,9 +1696,11 @@ and non-0 as true.
     This operator concatenates the DAG nodes *a*, *b*, etc. Their operations
     must equal.
 
-    ``!con((op a1:$name1, a2:$name2), (op b1:$name3))``
+    ``!con((op:$lhs a1:$name1, a2:$name2), (op:$rhs b1:$name3))``
 
-    results in the DAG node ``(op a1:$name1, a2:$name2, b1:$name3)``.
+    results in the DAG node ``(op:$lhs a1:$name1, a2:$name2, b1:$name3)``.
+    The name of the DAG operator is derived from the LHS DAG node if it is
+    set, otherwise from the RHS DAG node.
 
 ``!cond(``\ *cond1* ``:`` *val1*\ ``,`` *cond2* ``:`` *val2*\ ``, ...,`` *condn* ``:`` *valn*\ ``)``
     This operator tests *cond1* and returns *val1* if the result is true.
@@ -1819,6 +1822,10 @@ and non-0 as true.
 
       dag d = !dag(!getdagop(someDag), args, names);
 
+``!getdagopname(``\ *dag*\ ``)``
+    This operator retrieves the name of the given *dag* operator. If the operator
+    has no associated name, ``?`` is returned.
+
 ``!gt(``\ *a*\ `,` *b*\ ``)``
     This operator produces 1 if *a* is greater than *b*; 0 otherwise.
     The arguments must be ``bit``, ``bits``, ``int``, or ``string`` values.
@@ -1949,6 +1956,10 @@ and non-0 as true.
 
     Example: ``!setdagop((foo 1, 2), bar)`` results in ``(bar 1, 2)``.
 
+``!setdagopname(``\ *dag*\ ``,``\ *name*\ ``)``
+    This operator produces a DAG node with the same operator and arguments as
+    *dag*, but replacing the name of the operator with *name*.
+
 ``!shl(``\ *a*\ ``,`` *count*\ ``)``
     This operator shifts *a* left logically by *count* bits and produces the resulting
     value. The operation is performed on a 64-bit integer; the result
diff --git a/llvm/docs/TestingGuide.rst b/llvm/docs/TestingGuide.rst
index 76b6b4e..b1819c7 100644
--- a/llvm/docs/TestingGuide.rst
+++ b/llvm/docs/TestingGuide.rst
@@ -30,9 +30,9 @@ LLVM Testing Infrastructure Organization
 ========================================
 
 The LLVM testing infrastructure contains three major categories of tests:
-unit tests, regression tests and whole programs. The unit tests and regression
+unit tests, regression tests, and whole programs. The unit tests and regression
 tests are contained inside the LLVM repository itself under ``llvm/unittests``
-and ``llvm/test`` respectively and are expected to always pass -- they should be
+and ``llvm/test`` respectively and are expected to always pass. They should be
 run before every commit.
 
 The whole programs tests are referred to as the "LLVM test suite" (or
@@ -48,7 +48,7 @@ Unit tests
 Unit tests are written using `Google Test <https://github.com/google/googletest/blob/master/docs/primer.md>`_
 and `Google Mock <https://github.com/google/googletest/blob/master/docs/gmock_for_dummies.md>`_
 and are located in the ``llvm/unittests`` directory.
-In general unit tests are reserved for targeting the support library and other
+In general, unit tests are reserved for targeting the support library and other
 generic data structure, we prefer relying on regression tests for testing
 transformations and analysis on the IR.
 
@@ -61,7 +61,7 @@ written in depends on the part of LLVM being tested. These tests are driven by
 the :doc:`Lit <CommandGuide/lit>` testing tool (which is part of LLVM), and
 are located in the ``llvm/test`` directory.
 
-Typically when a bug is found in LLVM, a regression test containing just
+Typically, when a bug is found in LLVM, a regression test containing just
 enough code to reproduce the problem should be written and placed
 somewhere underneath this directory. For example, it can be a small
 piece of LLVM IR distilled from an actual application or benchmark.
@@ -82,10 +82,10 @@ for an example of such test.
 
 The test suite contains whole programs, which are pieces of code which
 can be compiled and linked into a stand-alone program that can be
-executed. These programs are generally written in high level languages
-such as C or C++.
+executed. These programs are generally written in high-level languages,
+such as C and C++.
 
-These programs are compiled using a user specified compiler and set of
+These programs are compiled using a user-specified compiler and set of
 flags, and then executed to capture the program output and timing
 information. The output of these programs is compared to a reference
 output to ensure that the program is being compiled correctly.
@@ -103,11 +103,11 @@ See the :doc:`TestSuiteGuide` for details.
 Debugging Information tests
 ---------------------------
 
-The test suite contains tests to check quality of debugging information.
-The test are written in C based languages or in LLVM assembly language.
+The test suite contains tests to check the quality of debugging information.
+The tests are written in C-based languages or in LLVM assembly language.
 
 These tests are compiled and run under a debugger. The debugger output
-is checked to validate of debugging information. See README.txt in the
+is checked to validate the debugging information. See ``README.txt`` in the
 test suite for more information. This test suite is located in the
 ``cross-project-tests/debuginfo-tests`` directory.
 
@@ -126,13 +126,13 @@ and C++ programs. See the :doc:`TestSuiteGuide` for details.
 Unit and Regression tests
 -------------------------
 
-To run all of the LLVM unit tests use the check-llvm-unit target:
+To run all of the LLVM unit tests, use the ``check-llvm-unit`` target:
 
 .. code-block:: bash
 
     % make check-llvm-unit
 
-To run all of the LLVM regression tests use the check-llvm target:
+To run all of the LLVM regression tests, use the ``check-llvm`` target:
 
 .. code-block:: bash
 
@@ -163,7 +163,7 @@ to enable testing with valgrind and with leak checking enabled.
 
 To run individual tests or subsets of tests, you can use the ``llvm-lit``
 script which is built as part of LLVM. For example, to run the
-``Integer/BitPacked.ll`` test by itself you can run:
+``Integer/BitPacked.ll`` test by itself, you can run:
 
 .. code-block:: bash
 
@@ -224,35 +224,35 @@ only directories does not need the ``lit.local.cfg`` file. Read the :doc:`Lit
 documentation <CommandGuide/lit>` for more information.
 
 Each test file must contain lines starting with "RUN:" that tell :program:`lit`
-how to run it. If there are no RUN lines, :program:`lit` will issue an error
+how to run it. If there are no ``RUN`` lines, :program:`lit` will issue an error
 while running a test.
 
-RUN lines are specified in the comments of the test program using the
+``RUN`` lines are specified in the comments of the test program using the
 keyword ``RUN`` followed by a colon, and lastly the command (pipeline)
 to execute. Together, these lines form the "script" that :program:`lit`
-executes to run the test case. The syntax of the RUN lines is similar to a
+executes to run the test case. The syntax of the ``RUN`` lines is similar to a
 shell's syntax for pipelines including I/O redirection and variable
 substitution. However, even though these lines may *look* like a shell
-script, they are not. RUN lines are interpreted by :program:`lit`.
+script, they are not. ``RUN`` lines are interpreted by :program:`lit`.
 Consequently, the syntax differs from shell in a few ways. You can specify
-as many RUN lines as needed.
+as many ``RUN`` lines as needed.
 
-:program:`lit` performs substitution on each RUN line to replace LLVM tool names
+:program:`lit` performs substitution on each ``RUN`` line to replace LLVM tool names
 with the full paths to the executable built for each tool (in
 ``$(LLVM_OBJ_ROOT)/bin``). This ensures that :program:`lit` does
 not invoke any stray LLVM tools in the user's path during testing.
 
-Each RUN line is executed on its own, distinct from other lines unless
-its last character is ``\``. This continuation character causes the RUN
-line to be concatenated with the next one. In this way you can build up
+Each ``RUN`` line is executed on its own, distinct from other lines unless
+its last character is ``\``. This continuation character causes the ``RUN``
+line to be concatenated with the next one. In this way, you can build up
 long pipelines of commands without making huge line lengths. The lines
-ending in ``\`` are concatenated until a RUN line that doesn't end in
-``\`` is found. This concatenated set of RUN lines then constitutes one
+ending in ``\`` are concatenated until a ``RUN`` line that doesn't end in
+``\`` is found. This concatenated set of ``RUN`` lines then constitutes one
 execution. :program:`lit` will substitute variables and arrange for the pipeline
 to be executed. If any process in the pipeline fails, the entire line (and
 test case) fails too.
 
-Below is an example of legal RUN lines in a ``.ll`` file:
+Below is an example of legal ``RUN`` lines in a ``.ll`` file:
 
 .. code-block:: llvm
 
@@ -260,19 +260,19 @@ Below is an example of legal RUN lines in a ``.ll`` file:
     ; RUN: llvm-dis < %s.bc-13 > %t2
     ; RUN: diff %t1 %t2
 
-As with a Unix shell, the RUN lines permit pipelines and I/O
+As with a Unix shell, the ``RUN`` lines permit pipelines and I/O
 redirection to be used.
 
 There are some quoting rules that you must pay attention to when writing
-your RUN lines. In general nothing needs to be quoted. :program:`lit` won't
-strip off any quote characters so they will get passed to the invoked program.
+your ``RUN`` lines. In general, nothing needs to be quoted. :program:`lit` won't
+strip off any quote characters, so they will get passed to the invoked program.
 To avoid this use curly braces to tell :program:`lit` that it should treat
 everything enclosed as one value.
 
-In general, you should strive to keep your RUN lines as simple as possible,
+In general, you should strive to keep your ``RUN`` lines as simple as possible,
 using them only to run tools that generate textual output you can then examine.
 The recommended way to examine output to figure out if the test passes is using
-the :doc:`FileCheck tool <CommandGuide/FileCheck>`. *[The usage of grep in RUN
+the :doc:`FileCheck tool <CommandGuide/FileCheck>`. *[The usage of grep in ``RUN``
 lines is deprecated - please do not send or commit patches that use it.]*
 
 Put related tests into a single file rather than having a separate file per
@@ -283,11 +283,11 @@ Generating assertions in regression tests
 -----------------------------------------
 
 Some regression test cases are very large and complex to write/update by hand.
-In that case to reduce the human work we can use the scripts available in
-llvm/utils/ to generate the assertions.
+In that case, to reduce the manual work, we can use the scripts available in
+``llvm/utils/`` to generate the assertions.
 
-For example to generate assertions in an :program:`llc`-based test, after
-adding one or more RUN lines use:
+For example, to generate assertions in an :program:`llc`-based test, after
+adding one or more ``RUN`` lines, use:
 
  .. code-block:: bash
 
@@ -368,7 +368,7 @@ Best practices for regression tests
 Extra files
 -----------
 
-If your test requires extra files besides the file containing the ``RUN:`` lines
+If your test requires extra files besides the file containing the ``RUN:`` lines,
 and the extra files are small, consider specifying them in the same file and
 using ``split-file`` to extract them. For example,
 
@@ -442,7 +442,7 @@ Elaborated tests
 
 Generally, IR and assembly test files benefit from being cleaned to remove
 unnecessary details. However, for tests requiring elaborate IR or assembly
-files where cleanup is less practical (e.g., large amount of debug information
+files where cleanup is less practical (e.g., a large amount of debug information
 output from Clang), you can include generation instructions within
 ``split-file`` part called ``gen``. Then, run
 ``llvm/utils/update_test_body.py`` on the test file to generate the needed
@@ -472,7 +472,7 @@ then rewrite the part after ``gen`` with its stdout.
 
 For convenience, if the test needs one single assembly file, you can also wrap
 ``gen`` and its required files with ``.ifdef`` and ``.endif``. Then you can
-skip ``split-file`` in RUN lines.
+skip ``split-file`` in ``RUN`` lines.
 
 .. code-block:: none
 
@@ -521,7 +521,7 @@ utilize ``split-file`` in ``RUN`` lines.
 Fragile tests
 -------------
 
-It is easy to write a fragile test that would fail spuriously if the tool being
+It is easy to write a fragile test that could fail spuriously if the tool being
 tested outputs a full path to the input file.  For example, :program:`opt` by
 default outputs a ``ModuleID``:
 
@@ -552,7 +552,7 @@ default outputs a ``ModuleID``:
 
 This test will fail if placed into a ``download`` directory.
 
-To make your tests robust, always use ``opt ... < %s`` in the RUN line.
+To make your tests robust, always use ``opt ... < %s`` in the ``RUN`` line.
 :program:`opt` does not output a ``ModuleID`` when input comes from stdin.
 
 Platform-Specific Tests
@@ -560,21 +560,21 @@ Platform-Specific Tests
 
 Whenever adding tests that require the knowledge of a specific platform,
 either related to code generated, specific output or back-end features,
-you must make sure to isolate the features, so that buildbots that
+you must isolate the features, so that buildbots that
 run on different architectures (and don't even compile all back-ends),
 don't fail.
 
 The first problem is to check for target-specific output, for example sizes
 of structures, paths and architecture names, for example:
 
-* Tests containing Windows paths will fail on Linux and vice-versa.
+* Tests containing Windows paths will fail on Linux and vice versa.
 * Tests that check for ``x86_64`` somewhere in the text will fail anywhere else.
 * Tests where the debug information calculates the size of types and structures.
 
-Also, if the test rely on any behaviour that is coded in any back-end, it must
+Also, if the test relies on any behaviour that is coded in any back-end, it must
 go in its own directory. So, for instance, code generator tests for ARM go
 into ``test/CodeGen/ARM`` and so on. Those directories contain a special
-``lit`` configuration file that ensure all tests in that directory will
+``lit`` configuration file that ensures all tests in that directory will
 only run if a specific back-end is compiled and available.
 
 For instance, on ``test/CodeGen/ARM``, the ``lit.local.cfg`` is:
@@ -622,7 +622,7 @@ with debug builds or on particular platforms. Use ``REQUIRES``
 and ``UNSUPPORTED`` to control when the test is enabled.
 
 Some tests are expected to fail. For example, there may be a known bug
-that the test detect. Use ``XFAIL`` to mark a test as an expected failure.
+that the test detects. Use ``XFAIL`` to mark a test as an expected failure.
 An ``XFAIL`` test will be successful if its execution fails, and
 will be a failure if its execution succeeds.
 
@@ -645,7 +645,7 @@ list of boolean expressions. The values in each expression may be:
   expressions can appear inside an identifier, so for example ``he{{l+}}o`` would match
   ``helo``, ``hello``, ``helllo``, and so on.
 - The default target triple, preceded by the string ``target=`` (for example,
-  ``target=x86_64-pc-windows-msvc``). Typically regular expressions are used
+  ``target=x86_64-pc-windows-msvc``). Typically, regular expressions are used
   to match parts of the triple (for example, ``target={{.*}}-windows{{.*}}``
   to match any Windows target triple).
 
@@ -684,7 +684,7 @@ have different effects. ``UNSUPPORTED`` causes the test to be skipped;
 this saves execution time, but then you'll never know whether the test
 actually would start working. Conversely, ``XFAIL`` actually runs the test
 but expects a failure output, taking extra execution time but alerting you
-if/when the test begins to behave correctly (an XPASS test result). You
+if/when the test begins to behave correctly (an ``XPASS`` test result). You
 need to decide which is more appropriate in each case.
 
 **Using ``target=...``**
@@ -698,7 +698,7 @@ and it's generally a good idea to use a trailing wildcard to allow for
 unexpected suffixes.
 
 Also, it's generally better to write regular expressions that use entire
-triple components, than to do something clever to shorten them. For
+triple components than to do something clever to shorten them. For
 example, to match both freebsd and netbsd in an expression, you could write
 ``target={{.*(free|net)bsd.*}}`` and that would work. However, it would
 prevent a ``grep freebsd`` from finding this test. Better to use:
@@ -708,8 +708,8 @@ prevent a ``grep freebsd`` from finding this test. Better to use:
 Substitutions
 -------------
 
-Besides replacing LLVM tool names the following substitutions are performed in
-RUN lines:
+Besides replacing LLVM tool names, the following substitutions are performed in
+``RUN`` lines:
 
 ``%%``
    Replaced by a single ``%``. This allows escaping other substitutions.
@@ -726,7 +726,7 @@ RUN lines:
    Example: ``/home/user/llvm/test/MC/ELF``
 
 ``%t``
-   File path to a temporary file name that could be used for this test case.
+   File path to a temporary file name that can be used for this test case.
    The file name won't conflict with other test cases. You can append to it
    if you need multiple temporaries. This is useful as the destination of
    some redirected output.
@@ -811,7 +811,7 @@ RUN lines:
   optional integer offset.  These expand only if they appear
   immediately in ``RUN:``, ``DEFINE:``, and ``REDEFINE:`` directives.
   Occurrences in substitutions defined elsewhere are never expanded.
-  For example, this can be used in tests with multiple RUN lines,
+  For example, this can be used in tests with multiple ``RUN`` lines,
   which reference the test file's line numbers.
 
 **LLVM-specific substitutions:**
@@ -988,7 +988,7 @@ directives:
 - **Substitution value**: The value includes all text from the first
   non-whitespace character after ``=`` to the last non-whitespace character.  If
   there is no non-whitespace character after ``=``, the value is the empty
-  string.  Escape sequences that can appear in python ``re.sub`` replacement
+  string.  Escape sequences that can appear in Python ``re.sub`` replacement
   strings are treated as plain text in the value.
 - **Line continuations**: If the last non-whitespace character on the line after
   ``:`` is ``\``, then the next directive must use the same directive keyword
@@ -1057,7 +1057,7 @@ producing incorrect output.
 Options
 -------
 
-The llvm lit configuration allows to customize some things with user options:
+The LLVM lit configuration allows some things to be customized with user options:
 
 ``llc``, ``opt``, ...
     Substitute the respective llvm tool name with a custom command line. This
@@ -1076,8 +1076,8 @@ The llvm lit configuration allows to customize some things with user options:
 Other Features
 --------------
 
-To make RUN line writing easier, there are several helper programs. These
-helpers are in the PATH when running tests, so you can just call them using
+To make ``RUN`` line writing easier, several helper programs are available. These
+helpers are in the ``PATH`` when running tests, so you can just call them using
 their name. For example:
 
 ``not``
diff --git a/llvm/docs/WritingAnLLVMBackend.rst b/llvm/docs/WritingAnLLVMBackend.rst
index 3c5d594..cab6471 100644
--- a/llvm/docs/WritingAnLLVMBackend.rst
+++ b/llvm/docs/WritingAnLLVMBackend.rst
@@ -150,7 +150,7 @@ any other naming scheme will confuse ``llvm-config`` and produce a lot of
 To make your target actually do something, you need to implement a subclass of
 ``TargetMachine``.  This implementation should typically be in the file
 ``lib/Target/DummyTargetMachine.cpp``, but any file in the ``lib/Target``
-directory will be built and should work.  To use LLVM's target independent code
+directory will be built and should work.  To use LLVM's target-independent code
 generator, you should do what all current machine backends do: create a
 subclass of ``CodeGenTargetMachineImpl``.  (To create a target from scratch, create a
 subclass of ``TargetMachine``.)
@@ -1671,7 +1671,7 @@ For example in ``SparcTargetAsmInfo.cpp``:
   }
 
 The X86 assembly printer implementation (``X86TargetAsmInfo``) is an example
-where the target specific ``TargetAsmInfo`` class uses an overridden methods:
+where the target-specific ``TargetAsmInfo`` class uses an overridden method:
 ``ExpandInlineAsm``.
 
 A target-specific implementation of ``AsmPrinter`` is written in
diff --git a/llvm/docs/WritingAnLLVMPass.rst b/llvm/docs/WritingAnLLVMPass.rst
index 9c2c383..eec9887 100644
--- a/llvm/docs/WritingAnLLVMPass.rst
+++ b/llvm/docs/WritingAnLLVMPass.rst
@@ -431,7 +431,7 @@ The ``print`` method
   virtual void print(llvm::raw_ostream &O, const Module *M) const;
 
 The ``print`` method must be implemented by "analyses" in order to print a
-human readable version of the analysis results.  This is useful for debugging
+human-readable version of the analysis results.  This is useful for debugging
 an analysis itself, as well as for other people to figure out how an analysis
 works.  Use the opt ``-analyze`` argument to invoke this method.
 
diff --git a/llvm/docs/tutorial/MyFirstLanguageFrontend/LangImpl10.rst b/llvm/docs/tutorial/MyFirstLanguageFrontend/LangImpl10.rst
index 7b9105b..a739936 100644
--- a/llvm/docs/tutorial/MyFirstLanguageFrontend/LangImpl10.rst
+++ b/llvm/docs/tutorial/MyFirstLanguageFrontend/LangImpl10.rst
@@ -129,7 +129,7 @@ course, C source code is not actually portable in general either - ever
 port a really old application from 32- to 64-bits?).
 
 The problem with C (again, in its full generality) is that it is heavily
-laden with target specific assumptions. As one simple example, the
+laden with target-specific assumptions. As one simple example, the
 preprocessor often destructively removes target-independence from the
 code when it processes the input text: