Updates to the memory consistency model spec

This giant patch is the result of months of work from a lot of different people in the memory model TG.
author: Daniel Lustig <dlustig@nvidia.com> 2018-05-02 16:31:03 -0700
committer: Daniel Lustig <dlustig@nvidia.com> 2018-05-02 16:31:03 -0700
commit: 03a5e722fc0fe7b94dd0a49f550ff7b41a63f612 (patch)
tree: f6db80e1e442798654d12bc5e9bc151930d49570 /src/rvwmo.tex
parent: 3559c11db55e96e1220c6b032d9d920b1808f151 (diff)
download: riscv-isa-manual-03a5e722fc0fe7b94dd0a49f550ff7b41a63f612.zip
riscv-isa-manual-03a5e722fc0fe7b94dd0a49f550ff7b41a63f612.tar.gz
riscv-isa-manual-03a5e722fc0fe7b94dd0a49f550ff7b41a63f612.tar.bz2
1 files changed, 206 insertions, 0 deletions
diff --git a/src/rvwmo.tex b/src/rvwmo.tex
new file mode 100644
index 0000000..59867e8
--- /dev/null
+++ b/src/rvwmo.tex
@@ -0,0 +1,206 @@
+\chapter{RVWMO Memory Consistency Model, Version 0.1}
+\label{ch:memorymodel}
+
+This chapter defines the RISC-V memory consistency model.
+A memory consistency model is a set of rules specifying the values that can be returned by loads of memory.
+RISC-V uses a memory model called ``RVWMO'' (RISC-V Weak Memory Ordering) which is designed to provide flexibility for architects to build high-performance scalable designs while simultaneously supporting a tractable programming model.
+
+Under RVWMO, code running on a single hart appears to execute in order from the perspective of other memory instructions in the same hart, but memory instructions from another hart may observe the memory instructions from the first hart being executed in a different order.
+Therefore, multithreaded code may require explicit synchronization to guarantee ordering between memory instructions from different harts.
+The base RISC-V ISA provides a FENCE instruction for this purpose, described in Section~\ref{sec:fence}, while the atomics extension ``A'' additionally defines load-reserved/store-conditional and atomic read-modify-write instructions.
+
+The standard ISA extension for misaligned atomics ``Zam'' (Chapter~\ref{sec:zam}) and the standard ISA extension for total store ordering ``Ztso'' (Chapter~\ref{sec:ztso}) augment RVWMO with additional rules specific to those extensions.
+
+The appendices to this specification provide both axiomatic and operational formalizations of the memory consistency model as well as additional explanatory material.
+
+\begin{commentary}
+  This chapter defines the memory model for regular main memory operations.  The interaction of the memory model with I/O memory, instruction fetches, FENCE.I, page table walks, and SFENCE.VMA is not (yet) formalized.  Some or all of the above may be formalized in a future revision of this specification.  Future ISA extensions such as the ``V'' vector, ``T'' transactional memory, and ``J'' JIT extensions will need to be incorporated into a future revision as well.
+
+  Memory consistency models supporting overlapping memory accesses of different widths simultaneously remain an active area of academic research and are not yet fully understood.  The specifics of how memory accesses of different sizes interact under RVWMO are specified to the best of our current abilities, but they are subject to revision should new issues be uncovered.
+\end{commentary}
+
+\section{Definition of the RVWMO Memory Model}
+\label{sec:rvwmo}
+
+The RVWMO memory model is defined in terms of the {\em global memory order}, a total ordering of the memory operations produced by all harts.
+In general, a multithreaded program has many different possible executions, with each execution having its own corresponding global memory order.
+
+The global memory order is defined over the primitive load and store operations generated by memory instructions.
+It is then subject to the constraints defined in the rest of this chapter.
+Any execution satisfying all of the memory model constraints is a legal execution (as far as the memory model is concerned).
+
+\subsection*{Memory Model Primitives}
+\label{sec:rvwmo:primitives}
+The {\em program order} over memory operations reflects the order in which the instructions that generate each load and store are logically laid out in that hart's dynamic instruction stream; i.e., the order in which a simple in-order processor would execute the instructions of that hart.
+
+Memory-accessing instructions give rise to {\em memory operations}.
+A memory operation can be either a {\em load operation}, a {\em store operation}, or both simultaneously.
+All memory operations are single-copy atomic: they can never be observed in a partially-complete state.
+
+Each aligned memory instruction other than an unsuccessful SC instruction gives rise to exactly one memory operation.
+An aligned AMO gives rise to a single memory operation that is both a load operation and a store operation simultaneously.
+Unsuccessful SC instructions do not generate architecturally-visible memory operations.
+
+A misaligned load or store instruction may be decomposed into a set of component memory operations of any granularity.
+The memory operations generated by a misaligned instruction are not ordered with respect to each other in program order, but they are ordered normally with respect to the memory operations generated by preceding and subsequent instructions in program order.
+The atomics extension ``A'' does not require execution environments to support misaligned atomic instructions at all; however, if misaligned atomics are supported via the ``Zam'' extension, LRs, SCs, and AMOs may be decomposed subject to the constraints of the atomicity axiom for misaligned atomics, which is defined in Chapter~\ref{sec:zam}.
+
+\begin{commentary}
+  The decomposition of misaligned memory operations down to byte granularity facilitates emulation on implementations that do not natively support misaligned accesses.
+  Such implementations might, for example, simply iterate over the bytes of a misaligned access one by one.
+\end{commentary}
+
+An LR and SC are said to be {\em paired} if the LR precedes the SC in program order and if there are no other LR or SC instructions in between; the corresponding memory operations are said to be paired as well (except in case of a failed SC, where no store operation is generated).
+Both LR and SC instructions, and the memory operations generated by them, may be unpaired if they do not meet the condition above.
+The complete list of conditions determining whether an SC must succeed, may succeed, or must fail is defined in Section~\ref{sec:lrsc}.
+
+Load and store operations may also carry one or more ordering annotations from the following set: ``acquire-RCpc'', ``acquire-RCsc'', ``release-RCpc'', and ``release-RCsc''.
+In the memory model literature, the term ``RCpc'' stands for release consistency with processor-consistent synchronization operations, and the term ``RCsc'' stands for release consistency with sequentially-consistent synchronization operations~\cite{Gharachorloo90memoryconsistency}.
+
+An AMO or LR instruction with {\em aq} set has an ``acquire-RCsc'' annotation.
+An AMO or SC instruction with {\em rl} set has a ``release-RCsc'' annotation.
+An AMO, LR, or SC instruction with both {\em aq} and {\em rl} set has both ``acquire-RCsc'' and ``release-RCsc'' annotations.
+
+\begin{commentary}
+  ``RCpc'' annotations are currently only used when implicitly assigned to every memory access per the standard extension ``Ztso'' (Chapter~\ref{sec:ztso}).  Furthermore, although the ISA does not currently contain native load-acquire or store-relase instructions, nor RCpc variants thereof the current {\em aq} and {\em rl} bits, the RVWMO model itself is designed to be forwards-compatible with the potential addition of any or all of the above into the ISA in a future extension.
+\end{commentary}
+
+For convenience, we use the term ``acquire annotation'' to refer to an acquire-RCpc annotation or an acquire-RCsc annotation.
+Likewise, a ``release annotation'' refers to a release-RCpc annotation or a release-RCsc annotation.
+An ``RCpc annotation'' refers to an acquire-RCpc annotation or a release-RCpc annotation.
+An ``RCsc annotation'' refers to an acquire-RCsc annotation or a release-RCsc annotation.
+
+\subsection*{Syntactic Dependencies}
+\label{sec:memorymodel:dependencies}
+The definition of the RVWMO memory model depends in part on the notion of a syntactic dependency, defined as follows.
+
+In the context of defining dependencies, a ``register'' refers either to an entire general-purpose register, some portion of a CSR, or an entire CSR.  The granularity at which dependencies are tracked through CSRs is specific to each CSR and is defined in Section~\ref{sec:csr-granularity}.
+
+Syntactic dependencies are defined in terms of instructions' {\em source registers}, instructions' {\em destination registers}, and the way instructions {\em carry a dependency} from their source registers to their destination registers.
+This section provides a general definition of all of these terms; however, Section~\ref{sec:source-dest-regs} provides a complete listing of the specifics for each instruction.
+
+In general, a register $r$ other than {\tt x0} is a {\em source register} for an instruction $i$ if any of the following hold:
+\begin{itemize}
+  \item In the opcode of $i$, {\em rs1}, {\em rs2}, or {\em rs3} is set to $r$
+  \item $i$ is a CSR instruction, and in the opcode of $i$, {\em csr} is set to $r$, unless $i$ is CSRRW or CSRRWI and {\em rd} is set to {\tt x0}
+  \item $r$ is a CSR and an implicit source register for $i$, as defined in Section~\ref{sec:source-dest-regs}
+  \item $r$ is a CSR that aliases with another source register for $i$
+\end{itemize}
+Memory instructions also further specify which source registers are {\em address source registers} and which are {\em data source registers}.
+
+In general, a register $r$ other than {\tt x0} is a {\em destination register} for an instruction $i$ if any of the following hold:
+\begin{itemize}
+  \item In the opcode of $i$, {\em rd} is set to $r$
+  \item $i$ is a CSR instruction, and in the opcode of $i$, {\em csr} is set to $r$, unless $i$ is CSRRS or CSRRC and {\em rs1} is set to {\tt x0} or $i$ is CSRRSI or CSRRCI and uimm[4:0] is set to zero.
+  \item $r$ is a CSR and an implicit destination register for $i$, as defined in Section~\ref{sec:source-dest-regs}
+  \item $r$ is a CSR that aliases with another destination register for $i$
+\end{itemize}
+
+Most non-memory instructions {\em carry a dependency} from each of their source registers to each of their destination registers.
+However, there are exceptions to this rule; see Section~\ref{sec:source-dest-regs}
+
+Instruction $j$ has a {\em syntactic dependency} on instruction $i$ via destination register $s$ of $i$ and source register $r$ of $j$ if either of the following hold:
+\begin{itemize}
+  \item $s$ is the same as $r$, and no instruction program-ordered between $i$ and $j$ has $r$ as a destination register
+  \item There is an instruction $m$ program-ordered between $i$ and $j$ such that all of the following hold:
+    \begin{enumerate}
+      \item $j$ has a syntactic dependency on $m$ via destination register $q$ and source register $r$
+      \item $m$ has a syntactic dependency on $i$ via destination register $s$ and source register $p$
+      \item $m$ carries a dependency from $p$ to $q$
+    \end{enumerate}
+\end{itemize}
+
+Finally, in the definitions that follow, let $a$ and $b$ be two memory operations, and let $i$ and $j$ be the instructions that generate $a$ and $b$, respectively.
+
+$b$ has a {\em syntactic address dependency} on $a$ if $r$ is an address source register for $j$ and $j$ has a syntactic dependency on $i$ via source register $r$
+
+$b$ has a {\em syntactic data dependency} on $a$ if $b$ is a store operation, $r$ is a data source register for $j$, and $j$ has a syntactic dependency on $i$ via source register $r$
+
+$b$ has a {\em syntactic control dependency} on $a$ if there is an instruction $m$ program-ordered between $i$ and $j$ such that $m$ is a branch or indirect jump and $m$ has a syntactic dependency on $i$.
+
+\subsection*{Preserved Program Order}
+The global memory order for any given execution of a program respects some but not all of each hart's program order.
+The subset of program order that must be respected by the global memory order is known as {\em preserved program order}.
+
+\newcommand{\ppost}{$b$ is a store, and $a$ and $b$ access overlapping memory addresses}
+\newcommand{\ppofence}{There is a FENCE instruction that orders $a$ before $b$}
+\newcommand{\ppoacquire}{$a$ has an acquire annotation}
+\newcommand{\pporelease}{$b$ has a release annotation}
+\newcommand{\pporcsc}{$a$ and $b$ both have RCsc annotations}
+\newcommand{\ppoamoforward}{$a$ is generated by an AMO or SC instruction, $b$ is a load, and $b$ returns a value written by $a$}
+\newcommand{\ppoaddr}{$b$ has a syntactic address dependency on $a$}
+\newcommand{\ppodata}{$b$ has a syntactic data dependency on $a$}
+\newcommand{\ppoctrl}{$b$ is a store, and $b$ has a syntactic control dependency on $a$}
+\newcommand{\ppopair}{$a$ is paired with $b$}
+\newcommand{\ppordw}{$a$ and $b$ are loads, $x$ is a byte read by both $a$ and $b$, there is no store to $x$ between $a$ and $b$ in program order, and $a$ and $b$ return values for $x$ written by different memory operations}
+\newcommand{\ppoaddrdatarfi}{$b$ is a load, and there exists some store $m$ between $a$ and $b$ in program order such that $m$ has an address or data dependency on $a$, and $b$ returns a value written by $m$}
+\newcommand{\ppoaddrpo}{$b$ is a store, and there exists some instruction $m$ between $a$ and $b$ in program order such that $m$ has an address dependency on $a$}
+%\newcommand{\ppoctrlcfence}{$a$ and $b$ are loads, $b$ has a syntactic control dependency on $a$, and there exists a {\tt fence.i} between the branch used to form the control dependency and $b$ in program order}
+%\newcommand{\ppoaddrpocfence}{$a$ is a load, there exists an instruction $m$ which has a syntactic address dependency on $a$, and there exists a {\tt fence.i} between $m$ and $b$ in program order}
+
+The complete definition of preserved program order is as follows (and note that AMOs are simultaneously both loads and stores):
+memory operation $a$ precedes memory operation $b$ in preserved program order (and hence also in the global memory order) if $a$ precedes $b$ in program order, $a$ and $b$ both access regular main memory (rather than I/O regions), and any of the following hold:
+
+\begin{itemize}
+  \item Overlapping-Address Orderings:
+    \begin{enumerate}
+      \item\label{ppo:->st} \ppost
+      \item\label{ppo:rdw} \ppordw
+      \item\label{ppo:amoforward} \ppoamoforward
+    \end{enumerate}
+  \item Explicit Synchronization
+    \begin{enumerate}[resume]
+      \item\label{ppo:fence} \ppofence
+      \item\label{ppo:acquire} \ppoacquire
+      \item\label{ppo:release} \pporelease
+      \item\label{ppo:rcsc} \pporcsc
+      \item\label{ppo:pair} \ppopair
+    \end{enumerate}
+  \item Syntactic Dependencies
+    \begin{enumerate}[resume]
+      \item\label{ppo:addr} \ppoaddr
+      \item\label{ppo:data} \ppodata
+      \item\label{ppo:ctrl} \ppoctrl
+    \end{enumerate}
+  \item Pipeline Dependencies
+    \begin{enumerate}[resume]
+      \item\label{ppo:addrdatarfi} \ppoaddrdatarfi
+      \item\label{ppo:addrpo} \ppoaddrpo
+      %\item\label{ppo:ctrlcfence} \ppoctrlcfence
+      %\item\label{ppo:addrpocfence} \ppoaddrpocfence
+    \end{enumerate}
+\end{itemize}
+
+\subsection*{Memory Model Axioms}
+
+An execution of a RISC-V program obeys the RVWMO memory consistency model only if there exists a global memory order conforming to preserved program order and satisfying the {\em load value axiom}, the {\em atomicity axiom}, and the {\em progress axiom}.
+
+\newcommand{\loadvalueaxiom}{
+  Each byte of each load $i$ returns the value written to that byte by the store that is the latest in global memory order among the following stores:
+  \begin{enumerate}
+    \item Stores that write that byte and that precede $i$ in the global memory order
+    \item Stores that write that byte and that precede $i$ in program order
+  \end{enumerate}
+}
+
+\newcommand{\atomicityaxiom}{If $r$ and $w$ are paired load and store operations generated by aligned LR and SC instructions in a hart $h$, and if $s$ is any store to byte $x$ from which $r$ returns a value, then $s$ must precede $w$ in the global memory order, and there can be no store from hart other than $h$ to byte $x$ following $s$ and preceding $w$ in the global memory order.}
+
+\newcommand{\progressaxiom}{No memory operation may be preceded in the global memory order by an infinite sequence of other memory operations.}
+
+\paragraph{Load Value Axiom}
+\label{rvwmo:ax:load}
+\loadvalueaxiom
+
+\paragraph{Atomicity Axiom}
+\label{rvwmo:ax:atom}
+\atomicityaxiom
+
+\begin{commentary}
+  The \nameref{rvwmo:ax:atom} theoretically supports LR/SC pairs of different widths and to mismatched addresses, since implementations are permitted to allow SC operations to succeed in such cases.  However, in practice, we expect such patterns to be rare, and their use is discouraged.
+\end{commentary}
+
+\paragraph{Progress Axiom}
+\label{rvwmo:ax:prog}
+\progressaxiom
+
+\input{dep-table}
author	Daniel Lustig <dlustig@nvidia.com>	2018-05-02 16:31:03 -0700
committer	Daniel Lustig <dlustig@nvidia.com>	2018-05-02 16:31:03 -0700
commit	03a5e722fc0fe7b94dd0a49f550ff7b41a63f612 (patch)
tree	f6db80e1e442798654d12bc5e9bc151930d49570 /src/rvwmo.tex
parent	3559c11db55e96e1220c6b032d9d920b1808f151 (diff)
download	riscv-isa-manual-03a5e722fc0fe7b94dd0a49f550ff7b41a63f612.zip riscv-isa-manual-03a5e722fc0fe7b94dd0a49f550ff7b41a63f612.tar.gz riscv-isa-manual-03a5e722fc0fe7b94dd0a49f550ff7b41a63f612.tar.bz2