author: Daniel Lustig <dlustig@nvidia.com> 2021-06-02 09:49:23 -0400
committer: Daniel Lustig <dlustig@nvidia.com> 2021-06-02 09:49:23 -0400
commit: e2f81a1791cf5f0483200c3c37150d0f2eccc485
tree: 12b9364c66ecf3a7d9c5538f0234a5f02343d0da
parent: 883bd183513ccdfed62a31d862d1c843aed3e3d0
parent: 58b9433f58d5b199a7e470f11cb7474368cb7d11
Merge branch 'virtual-memory' into Svnapot
-rw-r--r-- .travis.yml 3
-rw-r--r-- marchid.md 5
-rw-r--r-- src/a.tex 4
-rw-r--r-- src/b.tex 2
-rw-r--r-- src/c.tex 8
-rw-r--r-- src/csr.tex 2
-rw-r--r-- src/extensions.tex 9
-rw-r--r-- src/f.tex 20
-rw-r--r-- src/hypervisor.tex 16
-rw-r--r-- src/instr-table.tex 22
-rw-r--r-- src/intro.tex 2
-rw-r--r-- src/listofitems.sty 4
-rw-r--r-- src/listofitems.tex 397
-rw-r--r-- src/m.tex 20
-rw-r--r-- src/machine.tex 359
-rw-r--r-- src/memory.tex 14
-rw-r--r-- src/naming.tex 3
-rw-r--r-- src/preface.tex 1
-rw-r--r-- src/priv-preface.tex 6
-rw-r--r-- src/riscv-spec.tex 1
-rw-r--r-- src/rv32.tex 60
-rw-r--r-- src/rv64.tex 15
-rw-r--r-- src/rvc-instr-table.tex 6
-rw-r--r-- src/rvwmo.tex 4
-rw-r--r-- src/supervisor.tex 91
-rw-r--r-- src/zihintpause.tex 52
26 files changed, 838 insertions, 288 deletions
diff --git a/.travis.yml b/.travis.yml
index d10a57c..b72e692 100644
--- a/.travis.yml
+++ b/.travis.yml
@@ -1,5 +1,4 @@
-sudo: required
-dist: trusty
+dist: focal
before_install:
- sudo apt-get -qq update && sudo apt-get install -y --no-install-recommends texlive-fonts-recommended texlive-latex-extra texlive-fonts-extra dvipng texlive-latex-recommended
script:
diff --git a/marchid.md b/marchid.md
index ec92c09..655da50 100644
--- a/marchid.md
+++ b/marchid.md
@@ -37,3 +37,8 @@ SweRV EL2 | Western Digital Corporation | [Thomas Wicki](mailto:Thomas.W
SweRV EH2 | Western Digital Corporation | [Thomas Wicki](mailto:Thomas.Wicki@wdc.com) | 17 | https://github.com/chipsalliance/Cores-SweRV-EH2
SERV | Olof Kindgren Enterprises | [Olof Kindgren](mailto:olof.kindgren@gmail.com) | 18 | https://github.com/olofk/serv
NEORV32 | Stephan Nolting | [Stephan Nolting](mailto:stnolting@gmail.com) | 19 | https://github.com/stnolting/neorv32
+CV32E40X | OpenHW Group | [Arjan Bink](mailto:arjan.bink@silabs.com), Silicon Laboratories | 20 | https://github.com/openhwgroup/cv32e40x
+CV32E40S | OpenHW Group | [Arjan Bink](mailto:arjan.bink@silabs.com), Silicon Laboratories | 21 | https://github.com/openhwgroup/cv32e40s
+Ibex | lowRISC | [lowRISC Hardware Team](mailto:hardware@lowrisc.org) | 22 | https://github.com/lowRISC/ibex
+RudolV | Jörg Mische | [Jörg Mische](mailto:bobbl@gmx.de) | 23 | https://github.com/bobbl/rudolv
+Steel Core | Rafael Calcada | [Rafael Calcada](mailto:rafaelcalcada@gmail.com) | 24 | https://github.com/rafaelcalcada/steel-core
diff --git a/src/a.tex b/src/a.tex
index 96e662d..5d64cbc 100644
--- a/src/a.tex
+++ b/src/a.tex
@@ -103,7 +103,7 @@ For RV64, LR.W and SC.W sign-extend the value placed in {\em rd}.
Both compare-and-swap (CAS) and LR/SC can be used to build lock-free
data structures. After extensive discussion, we opted for LR/SC for
several reasons: 1) CAS suffers from the ABA problem, which LR/SC
-avoids because it monitors all accesses to the address rather than
+avoids because it monitors all writes to the address rather than
only checking for changes in the data value; 2) CAS would also require
a new integer instruction format to support three source operands
(address, compare value, swap value) as well as a different memory
@@ -435,7 +435,7 @@ are encoded with an R-type instruction format. These AMO instructions
atomically load a data value from the address in {\em rs1}, place the
value into register {\em rd}, apply a binary operator to the loaded
value and the original value in {\em rs2}, then store the result back
-to the address in {\em rs1}. AMOs can either operate on 64-bit (RV64
+to the original address in {\em rs1}. AMOs can either operate on 64-bit (RV64
only) or 32-bit words in memory. For RV64, 32-bit AMOs always
sign-extend the value placed in {\em rd}, and ignore the upper 32 bits
of the original value of {\em rs2}.
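The AMO behavior described in the second src/a.tex hunk can be sketched in Python. This is a minimal, non-atomic model under stated assumptions: the function name `amoadd_w_rv64`, the dict-based `memory`, and the `regs` list are hypothetical and chosen only for illustration, and real hardware performs the load, the operation, and the store as a single atomic step.

```python
MASK32 = 0xFFFFFFFF
MASK64 = 0xFFFFFFFFFFFFFFFF


def sign_extend_32_to_64(value):
    """Sign-extend a 32-bit value into a 64-bit register image."""
    value &= MASK32
    if value & 0x80000000:
        return value | 0xFFFFFFFF00000000
    return value


def amoadd_w_rv64(memory, regs, rd, rs1, rs2):
    """Illustrative (non-atomic) model of a 32-bit AMOADD on RV64.

    memory: dict mapping a word address to a 32-bit value (assumption).
    regs:   list of 32 integer registers holding 64-bit values (assumption).
    """
    addr = regs[rs1] & MASK64
    src = regs[rs2] & MASK32            # upper 32 bits of rs2 are ignored
    loaded = memory[addr] & MASK32      # load the original memory value
    memory[addr] = (loaded + src) & MASK32  # store back to the original address
    if rd != 0:                         # x0 is hardwired to zero
        regs[rd] = sign_extend_32_to_64(loaded)


regs = [0] * 32
regs[10] = 0x1000
regs[11] = 0xDEADBEEF00000002
memory = {0x1000: 0xFFFFFFFF}
amoadd_w_rv64(memory, regs, rd=12, rs1=10, rs2=11)
```

After the call, the word at `0x1000` holds `0x00000001` and `x12` holds the sign-extended old value `0xFFFFFFFFFFFFFFFF`, which shows both the sign extension of {\em rd} and the truncation of {\em rs2}.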
diff --git a/src/b.tex b/src/b.tex
index 0951df4..0c4e497 100644
--- a/src/b.tex
+++ b/src/b.tex
@@ -9,7 +9,7 @@ shifts, and bit and byte permutations.
\begin{commentary}
Although bit manipulation instructions are very effective in some
application domains, particularly when dealing with externally packed
-data structures, we excluded them from the base ISA as they are not
+data structures, we excluded them from the base ISAs as they are not
useful in all domains and can add additional complexity or instruction
formats to supply all needed operands.
diff --git a/src/c.tex b/src/c.tex
index 8799bc4..fc174da 100644
--- a/src/c.tex
+++ b/src/c.tex
@@ -39,7 +39,7 @@ instructions allows significantly greater code density.
The compressed instruction encodings are mostly common across RV32C,
RV64C, and RV128C, but as shown in Table~\ref{rvcopcodemap}, a few
-opcodes are used for different purposes depending on base ISA width.
+opcodes are used for different purposes depending on base ISA.
For example, the wider address-space RV64C and RV128C variants require
additional opcodes to compress loads and stores of 64-bit integer
values, while RV32C uses the same opcodes to compress loads and stores
@@ -73,9 +73,9 @@ Short-range subroutine calls are more likely in small binaries for
microcontrollers, hence the motivation to include these in RV32C.
Although reusing opcodes for different purposes for different base
-register widths adds some complexity to documentation, the impact on
+ISAs adds some complexity to documentation, the impact on
implementation complexity is small even for designs that support
-multiple base ISA register widths. The compressed floating-point load
+multiple base ISAs. The compressed floating-point load
and store variants use the same instruction format with the same
register specifiers as the wider integer loads and stores.
\end{commentary}
@@ -1255,7 +1255,7 @@ least-significant bits set, corresponds to instructions wider
than 16 bits, including those in the base ISAs. Several instructions
are only valid for certain operands; when invalid, they are marked
either {\em RES} to indicate that the opcode is reserved for future
-standard extensions; {\em NSE} to indicate that the opcode is designated
+standard extensions; {\em Custom} to indicate that the opcode is designated
for custom extensions; or {\em HINT} to indicate that the opcode
is reserved for microarchitectural hints (see Section~\ref{sec:rvc-hints}).
diff --git a/src/csr.tex b/src/csr.tex
index 266e02a..539f42e 100644
--- a/src/csr.tex
+++ b/src/csr.tex
@@ -12,7 +12,7 @@ set of CSR instructions that operate on these CSRs.
The counters and timers are no longer considered mandatory parts of
the standard base ISAs, and so the CSR instructions required to
- access them have been moved out of the base ISA chapter into this
+ access them have been moved out of Chapter~\ref{rv32} into this
separate chapter.
\end{commentary}
diff --git a/src/extensions.tex b/src/extensions.tex
index a9050a1..56cc912 100644
--- a/src/extensions.tex
+++ b/src/extensions.tex
@@ -48,7 +48,7 @@ An instruction encoding space is some number of instruction bits
within which a base ISA or ISA extension is encoded. RISC-V supports
varying instruction lengths, but even within a single instruction
length, there are various sizes of encoding space available. For
-example, the base ISA is defined within a 30-bit encoding space (bits
+example, the base ISAs are defined within a 30-bit encoding space (bits
31--2 of the 32-bit instruction), while the atomic extension ``A''
fits within a 25-bit encoding space (bits 31--7).
@@ -117,7 +117,8 @@ Note that we consider the standard A extension to have a greenfield
encoding as it defines a new previously empty 25-bit encoding space in
the leftmost bits of the full 32-bit base instruction encoding, even
though its standard prefix locates it within the 30-bit encoding space
-of the base ISA. Changing only its single 7-bit prefix could move the
+of its parent base ISA.
+Changing only its single 7-bit prefix could move the
A extension to a different 30-bit encoding space while only worrying
about conflicts at the prefix level, not within the encoding space
itself.
@@ -201,7 +202,7 @@ standard-compatible global encodings can be used in a number of ways.
One use-case is developing highly specialized custom accelerators,
designed to run kernels from important application domains. These
might want to drop all but the base integer ISA and add in only the
-extensions that are required for the task in hand. The base ISA has
+extensions that are required for the task in hand. The base ISAs have
been designed to place minimal requirements on a hardware
implementation, and has been encoded to use only a small fraction of a
32-bit instruction encoding space.
@@ -376,7 +377,7 @@ unaware of the VLIW extension would have both prefix bits set (11) and
thus have the correct semantics, with each instruction at the end of a
group and not predicated.
-The main disadvantage of this approach is that the base ISA lacks the
+The main disadvantage of this approach is that the base ISAs lack the
complex predication support usually required in an aggressive VLIW
system, and it is difficult to add space to specify more predicate
registers in the standard 30-bit encoding space.
diff --git a/src/f.tex b/src/f.tex
index 9f61c88..81545fd 100644
--- a/src/f.tex
+++ b/src/f.tex
@@ -162,8 +162,10 @@ depend on rounding mode when executed with a reserved rounding mode is
{\em reserved}, including both static reserved rounding modes (101--110) and
dynamic reserved rounding modes (101--111). Some instructions, including
widening conversions, have the {\em rm} field but are nevertheless
-unaffected by the rounding mode; software should set their {\em rm}
-field to RNE (000).
+mathematically unaffected by the rounding mode; software should set their
+{\em rm} field to RNE (000) but implementations must treat the {\em rm}
+field as usual (in particular, with regard to decoding legal vs. reserved
+encodings).
\begin{table}[htp]
\begin{small}
@@ -272,7 +274,7 @@ exception flag.
\begin{commentary}
As allowed by the standard, we do not support traps on floating-point
-exceptions in the base ISA, but instead require explicit checks of the flags
+exceptions in the F extension, but instead require explicit checks of the flags
in software. We considered adding branches controlled directly by the
contents of the floating-point accrued exception flags, but ultimately chose
to omit these instructions to keep the ISA simple.
@@ -322,7 +324,7 @@ Detecting tininess after rounding results in fewer spurious underflow signals.
\section{Single-Precision Load and Store Instructions}
Floating-point loads and stores use the same base+offset addressing
-mode as the integer base ISA, with a base address in register {\em
+mode as the integer base ISAs, with a base address in register {\em
rs1} and a 12-bit signed byte offset. The FLW instruction loads a
single-precision floating-point value from memory into floating-point
register {\em rd}. FSW stores a single-precision value from
@@ -594,9 +596,9 @@ instructions round according to the {\em rm} field. A floating-point register
can be initialized to floating-point positive zero using FCVT.S.W {\em rd},
{\tt x0}, which will never set any exception flags.
-All floating-point conversion instructions raise the Inexact exception if the
-result differs from its operand value, yet is representable in the destination
-format.
+All floating-point conversion instructions set the Inexact exception flag if
+the rounded result differs from the operand value and the Invalid exception
+flag is not set.
\vspace{-0.2in}
\begin{center}
@@ -732,7 +734,7 @@ FMV.W.X & S & 0 & src & 000 & dest & OP-FP \\
The base floating-point ISA was defined so as to allow implementations
to employ an internal recoding of the floating-point format in
registers to simplify handling of subnormal values and possibly to
-reduce functional unit latency. To this end, the base ISA avoids
+reduce functional unit latency. To this end, the F extension avoids
representing integer values in the floating-point registers by
defining conversion and comparison operations that read and write the
integer register file directly. This also removes many of the common
@@ -782,7 +784,7 @@ FCMP & S & src2 & src1 & EQ/LT/LE & dest & OP-FP \\
\end{center}
\begin{commentary}
-The F extension provides a $\leq$ comparison, whereas the base ISA provides
+The F extension provides a $\leq$ comparison, whereas the base ISAs provide
a $\geq$ branch comparison. Because $\leq$ can be synthesized from $\geq$ and
vice-versa, there is no performance implication to this inconsistency, but it
is nevertheless an unfortunate incongruity in the ISA.
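The rounding-mode rule in the first src/f.tex hunk (static {\em rm} values 101--110 are reserved, and with {\em rm}=111 the dynamic mode comes from {\tt frm}, where 101--111 are reserved) can be sketched as a small decoder. This is only an illustration: the function name `resolve_rounding_mode` and the choice to return `None` for a reserved encoding are assumptions, since the text does not specify what an implementation does with reserved encodings.

```python
# Rounding-mode encodings named in the F extension chapter.
ROUNDING_MODES = {
    0b000: "RNE",
    0b001: "RTZ",
    0b010: "RDN",
    0b011: "RUP",
    0b100: "RMM",
}
DYN = 0b111  # rm value selecting the dynamic rounding mode held in frm


def resolve_rounding_mode(rm, frm):
    """Return the effective rounding mode name, or None if reserved.

    The same check applies to every instruction that has an rm field,
    including those whose result is mathematically unaffected by it.
    """
    if rm == DYN:
        # Dynamic mode: frm values 101-111 are reserved.
        return ROUNDING_MODES.get(frm)
    # Static mode: rm values 101-110 are reserved.
    return ROUNDING_MODES.get(rm)
```

For example, `resolve_rounding_mode(0b111, 0b001)` yields `"RTZ"`, while `resolve_rounding_mode(0b101, 0b000)` yields `None` because 101 is a reserved static encoding.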
diff --git a/src/hypervisor.tex b/src/hypervisor.tex
index 4626ad3..e93b803 100644
--- a/src/hypervisor.tex
+++ b/src/hypervisor.tex
@@ -75,6 +75,14 @@ possible operating modes of a RISC-V hart with the hypervisor extension.
\label{h-operating-modes}
\end{table*}
+For purposes of interrupt global enables, HS-mode is considered more privileged
+than VS-mode, and VS-mode is considered more privileged than VU-mode.
+VS-mode interrupts are globally disabled when executing in U-mode.
+
+\begin{commentary}
+This description does not consider the possibility of U-mode or VU-mode interrupts and will be revised if the N extension for user-level interrupts is ultimately adopted.
+\end{commentary}
+
\section{Hypervisor and Virtual Supervisor CSRs}
An OS or hypervisor running in HS-mode uses the supervisor CSRs to interact with the exception,
@@ -1866,6 +1874,11 @@ the guest-physical memory management of all virtual machines, or even a global
fence for all memory-management data structures.
\end{commentary}
+If {\tt hgatp}.MODE is changed for a given VMID, an HFENCE.GVMA with
+{\em rs1}={\tt x0} (and {\em rs2} set to either {\tt x0} or the VMID) must
+be executed to order subsequent guest translations with the MODE
+change---even if the old MODE or new MODE is Bare.
+
Attempts to execute HFENCE.VVMA or HFENCE.GVMA when V=1 cause a virtual
instruction trap, while attempts to do the same in U-mode
cause an illegal instruction trap.
@@ -2501,6 +2514,9 @@ or store, not for the original access type.
However, any exception is always reported for the original access type
(instruction, load, or store/AMO).
+The G~bit in all G-stage PTEs is reserved for future standard use, should be cleared
+by software for forward compatibility, and must be ignored by hardware.
+
\begin{commentary}
G-stage address translation uses the identical format for PTEs as
regular address translation, even including the U~bit, due to the
diff --git a/src/instr-table.tex b/src/instr-table.tex
index bb6531c..604d51e 100644
--- a/src/instr-table.tex
+++ b/src/instr-table.tex
@@ -441,6 +441,28 @@
&
+\multicolumn{2}{|c|}{1000} &
+\multicolumn{3}{c|}{0011} &
+\multicolumn{1}{c|}{0011} &
+\multicolumn{1}{c|}{00000} &
+\multicolumn{1}{c|}{000} &
+\multicolumn{1}{c|}{00000} &
+\multicolumn{1}{c|}{0001111} & FENCE.TSO \\
+\cline{2-11}
+
+
+&
+\multicolumn{2}{|c|}{0000} &
+\multicolumn{3}{c|}{0001} &
+\multicolumn{1}{c|}{0000} &
+\multicolumn{1}{c|}{00000} &
+\multicolumn{1}{c|}{000} &
+\multicolumn{1}{c|}{00000} &
+\multicolumn{1}{c|}{0001111} & PAUSE \\
+\cline{2-11}
+
+
+&
\multicolumn{6}{|c|}{000000000000} &
\multicolumn{1}{c|}{00000} &
\multicolumn{1}{c|}{000} &
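The two table rows added in src/instr-table.tex give the bit fields of FENCE.TSO and PAUSE. As a quick cross-check, the fields can be packed into 32-bit instruction words. The helper name `encode_fence` is hypothetical; the field layout (fm in bits 31--28, pred in 27--24, succ in 23--20, rs1 in 19--15, funct3 in 14--12, rd in 11--7, opcode in 6--0) is taken from the column order of the rows above.

```python
MISC_MEM_OPCODE = 0b0001111  # opcode column of both added rows


def encode_fence(fm, pred, succ, rs1=0, rd=0):
    """Pack FENCE-format fields into a 32-bit instruction word."""
    funct3 = 0b000
    return (
        (fm << 28)
        | (pred << 24)
        | (succ << 20)
        | (rs1 << 15)
        | (funct3 << 12)
        | (rd << 7)
        | MISC_MEM_OPCODE
    )


# Field values copied from the two added table rows.
FENCE_TSO = encode_fence(fm=0b1000, pred=0b0011, succ=0b0011)
PAUSE = encode_fence(fm=0b0000, pred=0b0001, succ=0b0000)

print(f"FENCE.TSO = 0x{FENCE_TSO:08X}")  # prints FENCE.TSO = 0x8330000F
print(f"PAUSE     = 0x{PAUSE:08X}")      # prints PAUSE     = 0x0100000F
```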
diff --git a/src/intro.tex b/src/intro.tex
index 25bb89f..112e7b4 100644
--- a/src/intro.tex
+++ b/src/intro.tex
@@ -26,7 +26,7 @@ implementations. Our goals in defining RISC-V include:
specialized variants.
\item Both 32-bit and 64-bit address space variants for
applications, operating system kernels, and hardware implementations.
-\item An ISA with support for highly-parallel multicore
+\item An ISA with support for highly parallel multicore
or manycore implementations, including heterogeneous multiprocessors.
\item Optional {\em variable-length instructions} to both expand available
instruction encoding space and to support an optional {\em dense
diff --git a/src/listofitems.sty b/src/listofitems.sty
new file mode 100644
index 0000000..fb45d1d
--- /dev/null
+++ b/src/listofitems.sty
@@ -0,0 +1,4 @@
+\expandafter\let\csname loi_fromsty\endcsname\relax
+\input listofitems.tex
+\ProvidesPackage\loiname[\loidate\space v\loiver\space Grab items in lists using user-specified sep char (CT)]
+\endinput
\ No newline at end of file
diff --git a/src/listofitems.tex b/src/listofitems.tex
new file mode 100644
index 0000000..5ea3de6
--- /dev/null
+++ b/src/listofitems.tex
@@ -0,0 +1,397 @@
+% This file contains the code of the "listofitems" package
+%
+%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
+% %
+\def\loiname {listofitems} %
+\def\loiver {1.53} %
+% %
+\def\loidate {2018/03/13} %
+% %
+%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
+%
+% Author : Christian Tellechea, Steven B. Segletes
+% Status : Maintained
+% Maintainer : Christian Tellechea
+% Email : unbonpetit@netc.fr
+% steven.b.segletes.civ@mail.mil
+% Package URL: https://www.ctan.org/pkg/listofitems
+% Bug tracker: https://framagit.org/unbonpetit/listofitems/issues
+% Repository : https://framagit.org/unbonpetit/listofitems/tree/master
+% Copyright : Christian Tellechea 2017-2018
+% Licence : Released under the LaTeX Project Public License v1.3c
+% or later, see http://www.latex-project.org/lppl.txt
+% Files : 1) listofitems.tex
+% 2) listofitems.sty
+% 3) listofitems-fr.tex
+% 4) listofitems-fr.pdf
+% 5) listofitems-en.tex
+% 6) listofitems-en.pdf
+% 7) README
+%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
+\ifdefined\ProvidesPackage\else
+ \immediate\write -1 {%
+ Package: \loidate\space v\loiver\space Grab items in lists using user-specified sep char (CT)}%
+\fi
+
+\expandafter\edef\csname loi_restorecatcode\endcsname{\catcode\number`\_=\number\catcode`\_\relax}
+\catcode`\_11
+
+%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
+%%%%%%%%%%%%%%%%%%%%%%%%%%% error handling %%%%%%%%%%%%%%%%%%%%%%%%%%%
+%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
+\ifdefined\PackageError
+ \def\loi_error#1{\PackageError\loiname{#1}{Read the manual}}% for LaTeX
+\else
+ \def\loi_error#1{\errmessage{Package \loiname\space Error: #1^^J}}% for TeX
+\fi
+
+%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
+%%%%%%%%%%%%%%%%% checking for the presence of eTeX %%%%%%%%%%%%%%%%%%
+%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
+\begingroup
+\edef\__tempa{\meaning\eTeXversion}\edef\__tempb{\string\eTeXversion}%
+\ifx\__tempa\__tempb
+ \endgroup
+\else
+ \endgroup
+ \loi_error{You are not using an eTeX engine, listofitems cannot work.}%
+ \expandafter\loi_restorecatcode\expandafter\endinput
+\fi
+
+%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
+%%%%%%%%%%%%%%%%%%%%%%%%%% auxiliary macros %%%%%%%%%%%%%%%%%%%%%%%%%%
+%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
+%---- macros for expansion and argument manipulation
+\chardef\loi_stop=0
+\def\loi_quark{\loi_quark}
+\long\def\loi_identity#1{#1}
+\def\loi_gobarg#1{}
+\long\def\loi_first#1#2{#1}
+\long\def\loi_second#1#2{#2}
+\long\def\loi_firsttonil#1#2\_nil{#1}
+\long\def\loi_antefi#1#2\fi{#2\fi#1}
+\long\def\loi_exparg#1#2{\expandafter\loi_exparg_i\expandafter{#2}{#1}}% \loi_exparg{<a>}{<b>} devient <a>{<*b>}
+\long\def\loi_exparg_i#1#2{#2{#1}}
+\long\def\loi_expafter#1#2{\expandafter\loi_expafter_i\expandafter{#2}{#1}}% \loi_expafter{<a>}{<b>} devient <a><*b>
+\long\def\loi_expafter_i#1#2{#2#1}
+\def\loi_macroname{\loi_ifinrange\escapechar[[0:255]]{\expandafter\loi_gobarg}{}\string}
+\def\loi_argcsname#1#{\loi_argcsname_i{#1}}
+\def\loi_argcsname_i#1#2{\loi_expafter{#1}{\csname#2\endcsname}}
+
+%--- test macros
+\long\def\loi_ifnum#1{\ifnum#1\expandafter\loi_first\else\expandafter\loi_second\fi}
+\long\def\loi_ifx#1{\ifx#1\expandafter\loi_first\else\expandafter\loi_second\fi}
+\long\def\loi_ifempty#1{\loi_exparg\loi_ifx{\expandafter\relax\detokenize{#1}\relax}}
+\def\loi_ifstar#1#2{\def\loi_ifstar_i{\loi_ifx{*\loi_nxttok}{\loi_first{#1}}{#2}}\futurelet\loi_nxttok\loi_ifstar_i}
+\long\def\loi_ifprimitive#1{\edef\loi_tempa{\meaning#1}\edef\loi_tempb{\string#1}\loi_ifx{\loi_tempa\loi_tempb}}
+\long\def\loi_ifcs#1{% is #1 a control sequence (other than a primitive)?
+ \loi_ifempty{#1}
+ \loi_second% if #1 is empty, false
+ {\loi_ifspacefirst{#1}
+ \loi_second% if it starts with a space, false
+ {\loi_exparg\loi_ifempty{\loi_gobarg#1}% a single token?
+ {\begingroup \escapechar`\_
+ \if\expandafter\loi_firsttonil\detokenize{#1}\_nil\expandafter\loi_firsttonil\string\relax\_nil
+ \loi_ifprimitive{#1}
+ {\endgroup\expandafter\loi_second}
+ {\endgroup\expandafter\loi_first}%
+ \else
+ \endgroup\expandafter\loi_second
+ \fi
+ }
+ \loi_second% if several tokens, false
+ }%
+ }%
+}
+\def\loi_ifinrange#1[[#2:#3]]{\loi_ifnum{\numexpr(#1-#2)*(#1-#3)>0 }\loi_second\loi_first}
+
+%--- for-loop macro
+% See codes 150 to 155 here --> http://progtex.fr/wp-content/uploads/2014/09/code.txt
+% and pages 175 to 184 of the book "Apprendre \`a programmer en TeX"
+\def\loi_fornum#1=#2to#3\do{%
+ \edef#1{\number\numexpr#2}\edef\loi_sgncmp{\ifnum#1<\numexpr#3\relax>+\else<-\fi}%
+ \expandafter\loi_fornum_i\csname loi_fornum_\string#1\expandafter\endcsname\expandafter{\number\numexpr#3\expandafter}\loi_sgncmp#1%
+}
+\long\def\loi_fornum_i#1#2#3#4#5#6{\def#1{\unless\ifnum#5#3#2\relax\loi_antefi{#6\edef#5{\number\numexpr#5#41\relax}#1}\fi}#1}
+
+%--- macros removing leading and trailing spaces
+% See codes 320 to 324 here --> http://progtex.fr/wp-content/uploads/2014/09/code.txt
+% and pages 339 to 343 of "Apprendre \`a programmer en TeX"
+\long\def\loi_ifspacefirst#1{\expandafter\loi_ifspacefirst_i\detokenize{#10} \_nil}
+\long\def\loi_ifspacefirst_i#1 #2\_nil{\loi_ifempty{#1}}
+\expandafter\def\expandafter\loi_gobspace\space{}
+\def\loi_removefirstspaces{\romannumeral\loi_removefirstspaces_i}
+\long\def\loi_removefirstspaces_i#1{\loi_ifspacefirst{#1}{\expandafter\loi_removefirstspaces_i\expandafter{\loi_gobspace#1}}{\loi_stop#1}}
+\edef\loi_restorezerocatcode{\catcode0=\number\catcode0 \relax}
+\catcode0 12
+\long\def\loi_removelastspaces#1{\romannumeral\loi_removelastspaces_i#1^^00 ^^00\_nil}
+\long\def\loi_removelastspaces_i#1 ^^00{\loi_removelastspaces_ii#1^^00}
+\long\def\loi_removelastspaces_ii#1^^00#2\_nil{\loi_ifspacefirst{#2}{\loi_removelastspaces_i#1^^00 ^^00\_nil}{\loi_stop#1}}
+\loi_restorezerocatcode
+\long\def\loi_removeextremespaces#1{% #1=texte o\`u les espaces extr\^emes sont retir\'es
+ \romannumeral\expandafter\expandafter\expandafter\loi_removelastspaces\expandafter\expandafter\expandafter
+ {\expandafter\expandafter\expandafter\loi_stop\loi_removefirstspaces{#1}}%
+}
+
+%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
+%%%%%%%%%%%%%%%%%%%%%% public macro \setsepchar %%%%%%%%%%%%%%%%%%%%%%
+%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
+\def\setsepchar{\futurelet\loi_nxttok\setsepchar_i}
+\def\setsepchar_i{\loi_ifx{[\loi_nxttok}\setsepchar_ii{\setsepchar_ii[/]}}
+\long\def\setsepchar_ii[#1]#2{% #1=sepcar de <liste des sepcar> #2=<liste des sepcar>
+ \loi_ifempty{#1}
+ {\loi_error{Empty separator not allowed, separator "/" used}%
+ \setsepchar_ii[/]{#2}%
+ }
+ {\def\loi_currentsep{#1}%
+ \_removeextremespacesfalse
+ \loi_nestcnt1 % r\'einitaliser niveau initial \`a 1
+ \def\nestdepth{1}%
+ \loi_argcsname\let{loi_previndex[\number\loi_nestcnt]}\empty
+ \def\loi_listname{loi_listofsep}%
+ \let\loi_def\def \let\loi_edef\edef \let\loi_let\let
+ \loi_ifempty{#2}
+ {\loi_error{Empty list of separators not allowed, "," used}%
+ \readlist_iv1{,}%
+ }
+ {\readlist_iv1{#2}}%
+ \loi_argcsname\let\nestdepth{loi_listofseplen[0]}%
+ \loi_argcsname\let\loi_currentsep{loi_listofsep[1]}% 1er car de s\'eparation
+ }%
+}
+
+%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
+%%%%%%%%%%%%%%%%%%%%% index normalization macro %%%%%%%%%%%%%%%%%%%%%%
+%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
+\def\loi_normalizeindex#1#2#3{% #1=macroname #2=liste d'index #3=profondeur max --> renvoie {err}{indx norm}
+ \loi_ifempty{#2}
+ {\loi_stop{}{}}
+ {\loi_normalizeindex_i1{}{#3}{#1}#2,\loi_quark,}%
+}%
+\def\loi_normalizeindex_i#1#2#3#4#5,{% #1=compteur de profondeur #2=index pr\'ec\'edents #3=profondeur max #4=macroname #5=index courant
+ \loi_ifx{\loi_quark#5}
+ {\loi_normalizeindex_iii#2\loi_quark}% supprimer la derni\`ere virgule
+ {\loi_ifnum{#1>#3 }
+ {\loi_invalidindex{Too deeply nested index, index [.] retained}{#2}}% si profondeur trop grande
+ {\loi_ifinrange\ifnum\numexpr#5<0 -1*\fi(#5)[[1:\csname #4len[#20]\endcsname]]% si abs(#5) hors de [1,len]
+ {\loi_exparg\loi_normalizeindex_ii{\number\numexpr#5\ifnum\numexpr#5<0 +\csname #4len[#20]\endcsname+1\fi}{#1}{#2}{#3}{#4}}
+ {\loi_invalidindex{#5 is an invalid index, index [.] retained}{#2}}%
+ }%
+ }%
+}
+\def\loi_normalizeindex_ii#1#2#3{\loi_exparg\loi_normalizeindex_i{\number\numexpr#2+1}{#3#1,}}% #1=index \`a rajouter #2=compteur de profondeur #3=index pr\'ec\'edents
+\def\loi_normalizeindex_iii#1,\loi_quark{\loi_stop{}{#1}}
+\def\loi_invalidindex#1#2{\loi_ifempty{#2}{\loi_invalidindex_i{#1},}\loi_invalidindex_i{#1}{#2}}
+\def\loi_invalidindex_i#1#2{\loi_invalidindex_ii#1\loi_quark#2\loi_quark}
+\def\loi_invalidindex_ii#1[.]#2\loi_quark#3,\loi_quark#4\loi_quark,{\loi_stop{#1[#3]#2}{#3}}% #4= index ignor\'es
+
+%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
+%%%%%%%%%%%%%%%%%%%%%%% public macro \readlist %%%%%%%%%%%%%%%%%%%%%%%
+%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
+\newcount\loi_nestcnt
+\def\greadlist{\let\loi_def\gdef\let\loi_edef\xdef\def\loi_let{\global\let}\readlist_o}%
+\def\readlist{\let\loi_def\def\let\loi_edef\edef\let\loi_let\let\readlist_o}
+\def\readlist_o{%
+ \loi_nestcnt1 % initial level = 1
+ \loi_argcsname\let{loi_previndex[\number\loi_nestcnt]}\empty
+ \loi_ifstar{\_removeextremespacestrue\readlist_i}{\_removeextremespacesfalse\readlist_i}%
+}
+\long\def\readlist_i#1#2{% #1=macro stockant les \'el\'ements #2=liste des \'el\'ements
+ \loi_ifcs{#2}
+ {\loi_exparg{\readlist_i#1}{#2}}
+ {\loi_edef\loi_listname{\loi_macroname#1}%
+ \loi_argcsname\loi_let{\loi_listname nest}\nestdepth
+ \loi_argcsname\loi_def{\loi_listname[]}{#2}% la liste enti\`ere
+ \loi_argcsname\loi_def{\loi_listname sep[]}{}% s\'eparateur vide
+ \loi_ifempty{#2}
+ {\loi_def#1[##1]{}%
+ \loi_argcsname\loi_def{\loi_listname len}{0}\loi_argcsname\loi_def{\loi_listname len[0]}{0}%
+ \loi_error{Empty list ignored, nothing to do}%
+ }
+ {\loi_edef#1[##1]{\unexpanded{\romannumeral\expandafter\loi_auxmacrolistitem\romannumeral\loi_normalizeindex}{\loi_listname}{##1}{\csname\loi_listname nest\endcsname}{\loi_listname}}%
+ \loi_argcsname\loi_edef{\loi_listname sep}[##1]{\unexpanded{\romannumeral\expandafter\loi_auxmacrolistitem\romannumeral\loi_normalizeindex}{\loi_listname}{##1}{\csname\loi_listname nest\endcsname}{\loi_listname sep}}%
+ \readlist_ii{#2}%
+ \loi_argcsname\loi_argcsname\loi_let{\loi_listname len}{\loi_listname len[0]}% longueur du niveau 0
+ }%
+ }%
+}
+\def\loi_auxmacrolistitem#1#2#3{%
+ \expandafter\expandafter\expandafter\loi_stop\csname#3[#2]\expandafter\endcsname
+ \romannumeral\loi_ifempty{#1}{\loi_stop}{\loi_stop\loi_error{#1}}%
+}
+\def\readlist_ii{%
+ \loi_argcsname\loi_let\loi_currentsep{loi_listofsep[\number\loi_nestcnt]}%
+ \expandafter\readlist_iii\loi_currentsep||\_nil
+}
+\long\def\readlist_iii#1||#2\_nil#3{\readlist_iv1{#3#1}}% #1=<sep courant simple> #3=liste -> rajoute un \'el\'ement vide pour le test ifempty ci dessous
+\long\def\readlist_iv#1#2{% #1=compteur d'index #2=liste d'\'el\'ements \`a examiner
+ \loi_ifempty{#2}
+ {\loi_argcsname\loi_edef{\loi_listname len[\csname loi_previndex[\number\loi_nestcnt]\endcsname0]}{\number\numexpr#1-1\relax}%
+ \loi_argcsname\loi_let{\loi_listname sep[\csname loi_previndex[\number\loi_nestcnt]\endcsname\number\numexpr#1-1\relax]}\empty% le dernier <sep> est <vide> ##NEW v1.52
+ \advance\loi_nestcnt-1
+ \loi_argcsname\loi_let\loi_currentsep{loi_listofsep[\number\loi_nestcnt]}%
+ }
+ {\loi_expafter{\readlist_vi{#2}{}}\loi_currentsep||\loi_quark||#2\_nil{#1}}% aller isoler le 1er item
+}
+\long\def\readlist_v#1#2{\readlist_vi{#2}{}#1||\loi_quark||#2\_nil}% #1=liste s\'eparateurs (s\'ep=||) #2=chaine de tokens
+\long\def\readlist_vi#1#2#3||{% #1=liste restante #2=dernier <sep utile> #3=<sep courant>
+ \loi_ifx{\loi_quark#3}
+ {\loi_ifempty{#2}% si #1 vide, aucun <sep utile> n'a \'et\'e trouv\'e, il reste \`a lire "<liste compl\`ete>\_nil"
+ {\long\def\readlist_vii##1\_nil##2{\loi_exparg{\readlist_ix{##2}{}}{\loi_gobarg##1}{#2}}}% ##2=compteur d'index
+ {\long\def\readlist_vii##1#2{\loi_exparg\readlist_viii{\loi_gobarg##1}\relax}%
+ \long\def\readlist_viii##1##2\_nil##3{\loi_exparg{\readlist_ix{##3}}{\loi_gobarg##2}{##1}{#2}}% ##3=compteur d'index
+ }%
+ \readlist_vii\relax% le \relax meuble l'argument d\'elimit\'e
+ }
+ {\long\def\readlist_vii##1#3##2\_nil{%
+ \loi_ifempty{##2}% si <liste restante> ne contient pas le <sep courant>
+ {\readlist_vi{#1}{#2}}% recommencer avec le m\^eme <sep utile>
+ {\loi_exparg\readlist_vi{\loi_gobarg##1#3}{#3}}% sinon raccourcir <liste restante> et <sep courant>:=<sep utile>% ##BUGFIX v1.53
+ }%
+ \readlist_vii\relax#1#3\_nil% ##BUGFIX v1.53
+ }%
+}
+\long\def\readlist_ix#1#2#3{% #1=compteur d'index #2=liste restante #3=\'el\'ement courant
+ \loi_ifnum{0\loi_exparg\loi_ifspacefirst{\loi_currentsep}{}1\if_removeextremespaces1\fi=11 }% if leading and trailing spaces must be removed
+ {\loi_exparg{\loi_exparg{\readlist_x{#1}{#2}}}{\loi_removeextremespaces{#3}}}% red\'efinir l'\'el\'ement courant
+ {\readlist_x{#1}{#2}{#3}}%
+}
+\long\def\readlist_x#1#2#3#4{% #1=compteur d'index #2=liste restante #3=\'el\'ement courant #4=sep utilis\'e
+ \loi_ifnum{0\if_ignoreemptyitems1\fi\loi_ifempty{#3}1{}=11 }
+ {\readlist_iv{#1}{#2}}% si l'on n'ignore pas les \'el\'ements vides :
+ {\loi_argcsname\loi_def{\loi_listname[\csname loi_previndex[\number\loi_nestcnt]\endcsname#1]}{#3}% assign the current item to the macro
+ \loi_argcsname\loi_def{\loi_listname sep[\csname loi_previndex[\number\loi_nestcnt]\endcsname#1]}{#4}% assignation du <sep> actuel \`a la macro \<macrolist>sep
+ \loi_ifnum{\loi_nestcnt<\nestdepth\relax}% si imbrication max non atteinte
+ {\advance\loi_nestcnt1
+ \loi_argcsname\edef{loi_previndex[\number\loi_nestcnt]}{\csname loi_previndex[\number\numexpr\loi_nestcnt-1]\endcsname#1,}%
+ \readlist_ii{#3}% recommencer avec l'\'el\'ement courant
+ }
+ {}%
+ \loi_exparg\readlist_iv{\number\numexpr#1+1}{#2}% puis chercher l'\'el\'ement suivant dans la liste restante
+ }%
+}
+
+%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
+%%%%%%%%%%%%%%%%%%%%%%%%%%% macro \listlen %%%%%%%%%%%%%%%%%%%%%%%%%%%
+%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
+\def\listlen#1[#2]{%
+ \romannumeral\loi_ifempty{#2}
+ {\expandafter\expandafter\expandafter\loi_stop\csname\loi_macroname#1len[0]\endcsname}
+ {\loi_exparg\listlen_i{\romannumeral-`\.\loi_macroname#1}{#2}}%
+}
+\def\listlen_i#1#2{% #1=macro name #2=index non normalis\'e prendre <profondeur max-1>
+ \loi_exparg{\expandafter\listlen_ii\romannumeral\loi_normalizeindex{#1}{#2}}{\number\numexpr\csname#1nest\endcsname-1}{#1}%
+}
+\def\listlen_ii#1#2#3{% #1=err #2=index normalis\'e #3=macroname
+ \expandafter\expandafter\expandafter\loi_stop\csname#3len[#2,0]\expandafter\endcsname
+ \romannumeral\loi_ifempty{#1}{\loi_stop}{\loi_stop\loi_error{#1}}%
+}
+
+%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
+%%%%%%%%%%%%%%%%%%%%%%%%% macro \foreachitem %%%%%%%%%%%%%%%%%%%%%%%%%
+%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
+\def\foreachitem#1\in#2{%
+ \edef\foreachitem_i{\noexpand\foreachitem_iii\noexpand#1{\expandafter\noexpand\csname\loi_macroname#1cnt\endcsname}{\loi_macroname#2}}%
+ \futurelet\loi_nxttok\foreachitem_ii
+}
+\def\foreachitem_ii{\loi_ifx{\loi_nxttok[}\foreachitem_i{\foreachitem_i[]}}
+\def\foreachitem_iii#1#2#3[#4]{% prendre <profondeur max-1>
+ \loi_exparg{\expandafter\foreachitem_iv\romannumeral\loi_normalizeindex{#3}{#4}}{\number\numexpr\csname#3nest\endcsname-1}#1{#2}{#3}%
+}
+\def\foreachitem_iv#1#2{\loi_ifempty{#2}{\foreachitem_v{#1}{}}{\foreachitem_v{#1}{#2,}}}% #1=err #2=index norm
+\long\def\foreachitem_v#1#2#3#4#5#6{% #1=err #2=index norm #3=macroiter #4=compteur associ\'e #5=nom de macrolist #6=code
+ \loi_ifnum{\csname#5len[#20]\endcsname>0 }
+ {\loi_ifempty{#1}{}{\loi_error{#1}}%
+ \loi_fornum#4=1to\csname#5len[#20]\endcsname\do{\loi_argcsname\let#3{#5[#2#4]}#6}%
+ }
+ {}%
+}
+
+%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
+%%%%%%%%%%%%%%%%%%%%%%%%%%% macro \showitem %%%%%%%%%%%%%%%%%%%%%%%%%%
+%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
+\def\showitems{\loi_ifstar{\let\showitems_cmd\detokenize\showitems_i}{\let\showitems_cmd\loi_identity\showitems_i}}
+\def\showitems_i#1{\def\showitems_ii{\showitems_iv#1}\futurelet\loi_nxttok\showitems_iii}
+\def\showitems_iii{\loi_ifx{\loi_nxttok[}\showitems_ii{\showitems_ii[]}}
+\def\showitems_iv#1[#2]{\foreachitem\showitems_iter\in#1[#2]{\showitemsmacro{\expandafter\showitems_cmd\expandafter{\showitems_iter}}}}
+\unless\ifdefined\fbox
+ \newdimen\fboxrule \newdimen\fboxsep \fboxrule=.4pt \fboxsep=3pt % r\'eglages identiques \`a LaTeX
+ \def\fbox#1{% imitation de la macro \fbox de LaTeX, voir codes 251 \`a 254 ici --> http://progtex.fr/wp-content/uploads/2014/09/code.txt
+ \hbox{% et pages 271 \`a 274 de "Apprendre \`a programmer en TeX"
+ \vrule width\fboxrule
+ \vtop{%
+ \vbox{\hrule height\fboxrule \kern\fboxsep \hbox{\kern\fboxsep#1\kern\fboxsep}}%
+ \kern\fboxsep \hrule height\fboxrule
+ }\vrule width\fboxrule
+ }%
+ }
+\fi
+\def\showitemsmacro#1{% encadrement par d\'efaut
+ \begingroup\fboxsep=0.25pt \fboxrule=0.5pt \fbox{\strut#1}\endgroup
+ \hskip0.25em\relax
+}
+
+%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
+%%%%%%%%%%%%%%%%%%%%%%%%% macro \itemtomacro %%%%%%%%%%%%%%%%%%%%%%%%%
+%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
+\def\itemtomacro#1[#2]{% #1[#2]=item non encore lu: #3=macro
+ \edef\loi_listname{\loi_macroname#1}%
+ \loi_exparg{\expandafter\itemtomacro_i\romannumeral\expandafter\loi_normalizeindex\expandafter{\loi_listname}{#2}}{\csname\loi_listname nest\endcsname}\let
+}
+\def\gitemtomacro#1[#2]{% #1[#2]=item
+ \xdef\loi_listname{\loi_macroname#1}%
+ \loi_exparg{\expandafter\itemtomacro_i\romannumeral\expandafter\loi_normalizeindex\expandafter{\loi_listname}{#2}}{\csname\loi_listname nest\endcsname}{\global\let}%
+}
+\def\itemtomacro_i#1#2#3#4{%
+ \loi_ifempty{#1}{}{\loi_error{#1}}%
+ \loi_argcsname#3#4{\loi_listname[#2]}%
+}
+
+%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
+%%%%%%%%%%%%%%%%%%%%%%% r\'eglages par d\'efaut %%%%%%%%%%%%%%%%%%%%%%
+%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
+\newif\if_removeextremespaces
+\newif\if_ignoreemptyitems
+\let\ignoreemptyitems\_ignoreemptyitemstrue
+\let\reademptyitems\_ignoreemptyitemsfalse
+\setsepchar{,}
+\reademptyitems
+
+\loi_restorecatcode
+\endinput
+
+######################################################################
+# Historique #
+######################################################################
+
+v1.0 19/8/2016
+ - Premi\`ere version publique
+
+v1.1 01/09/2016
+ - Stockage des s\'eparateurs dans <macrolist>sep
+ - bug corrig\'e dans \loi_restorecatcode
+
+v1.2 22/10/2016
+ - macros \greadlist et \gitemtomacro pour la globalit\'e
+
+v1.3 18/11/2016
+ - bugs corrig\'es dans la gestion de la globalit\'e
+
+v1.4 05/10/2017
+ - test \loi_ifprimitive ajout\'e au test \loi_ifcs
+ - suppression de \loi_expafternil, cr\'eation de \loi_expafter,
+ modification de \loi_argcsname
+ - correction d'un bug : \setsepchar{\par} ne provoque plus
+ d'erreur. \loi_ifnum devient \long
+
+v1.5 06/10/2017
+ - correction d'un bug dans \loi_ifcs
+
+v1.51 24/10/2017
+ - correction d'un bug dans \loi_ifcs
+
+v1.52 13/01/2018
+ - le dernier s\'eparateur est <vide>
+
+v1.53 13/03/2018
+ - correction d'un bug dans \readlist_vii
\ No newline at end of file
diff --git a/src/m.tex b/src/m.tex
index fb3ac6a..5551a9b 100644
--- a/src/m.tex
+++ b/src/m.tex
@@ -14,6 +14,7 @@ attached accelerators.
\end{commentary}
\section{Multiplication Operations}
+\label{multiplication-operations}
\vspace{-0.2in}
\begin{center}
@@ -166,3 +167,22 @@ unsigned divider implementations. Signed division is often
implemented using an unsigned division circuit and specifying the same
overflow result simplifies the hardware.
\end{commentary}
+
+\section{Zmmul Extension, Version 0.1}
+
+The Zmmul extension implements the multiplication subset of the M extension.
+It adds all of the instructions defined in Section~\ref{multiplication-operations},
+namely: MUL, MULH, MULHU, MULHSU, and (for RV64 only) MULW.
+The encodings are identical to those of the corresponding M-extension instructions.
+
+\begin{commentary}
+The Zmmul extension enables low-cost implementations that require
+multiplication operations but not division.
+For many microcontroller applications, division operations are too
+infrequent to justify the cost of divider hardware.
+By contrast, multiplication operations are more frequent, making the cost of
+multiplier hardware more justifiable.
+Simple FPGA soft cores particularly benefit from eliminating division but
+retaining multiplication, since many FPGAs provide hardwired multipliers
+but require dividers be implemented in soft logic.
+\end{commentary}
diff --git a/src/machine.tex b/src/machine.tex
index a6f7fbd..6b82746 100644
--- a/src/machine.tex
+++ b/src/machine.tex
@@ -35,7 +35,7 @@ mechanism.
\instbitrange{25}{0} \\
\hline
\multicolumn{1}{|c|}{MXL[1:0] (\warl)} &
-\multicolumn{1}{c|}{\wlrl} &
+\multicolumn{1}{c|}{0 (\warl)} &
\multicolumn{1}{c|}{Extensions[25:0] (\warl)} \\
\hline
2 & MXLEN-28 & 26 \\
@@ -49,7 +49,7 @@ mechanism.
The MXL (Machine XLEN) field encodes the native base integer ISA width
as shown in Table~\ref{misabase}. The MXL field may be writable in
-implementations that support multiple base ISA widths. The effective
+implementations that support multiple base ISAs. The effective
XLEN in M-mode, {\em MXLEN}, is given by the setting of MXL, or has a
fixed value if {\tt misa} is zero. The MXL field is always set to the
widest supported ISA variant at reset.
@@ -59,7 +59,7 @@ widest supported ISA variant at reset.
\begin{tabular}{|r|r|}
\hline
MXL & XLEN \\
-\hline
+\hline
1 & 32 \\
2 & 64 \\
3 & 128 \\
@@ -114,7 +114,7 @@ The ``X'' bit will be set if there are any non-standard extensions.
\begin{tabular}{|r|r|l|}
\hline
Bit & Character & Description \\
-\hline
+\hline
0 & A & Atomic extension \\
1 & B & {\em Tentatively reserved for Bit-Manipulation extension} \\
2 & C & Compressed extension \\
@@ -363,8 +363,8 @@ of the largest hart ID used in a system.
\subsection{Machine Status Registers ({\tt mstatus} and {\tt mstatush})}
The {\tt mstatus} register is an MXLEN-bit read/write register
-formatted as shown in Figure~\ref{mstatusreg} for RV64 and
-Figure~\ref{mstatusreg-rv32} for RV32. The {\tt mstatus}
+formatted as shown in Figure~\ref{mstatusreg-rv32} for RV32 and
+Figure~\ref{mstatusreg} for RV64. The {\tt mstatus}
register keeps track of and controls the hart's current operating
state. A restricted view of {\tt mstatus} appears as the
{\tt sstatus} register in the S-level ISA.
@@ -373,44 +373,33 @@ state. A restricted view of {\tt mstatus} appears as the
{\footnotesize
\begin{center}
\setlength{\tabcolsep}{4pt}
-\scalebox{0.95}{
-\begin{tabular}{cRccccYcccccc}
+\begin{tabular}{cKccccccc}
\\
-\instbit{MXLEN-1} &
-\instbitrange{MXLEN-2}{38} &
-\instbit{37} &
-\instbit{36} &
-\instbitrange{35}{34} &
-\instbitrange{33}{32} &
-\instbitrange{31}{23} &
+\instbit{31} &
+\instbitrange{30}{23} &
\instbit{22} &
\instbit{21} &
\instbit{20} &
\instbit{19} &
\instbit{18} &
+\instbit{17} &
\\
\hline
\multicolumn{1}{|c|}{SD} &
\multicolumn{1}{c|}{\wpri} &
-\multicolumn{1}{c|}{MBE} &
-\multicolumn{1}{c|}{SBE} &
-\multicolumn{1}{c|}{SXL[1:0]} &
-\multicolumn{1}{c|}{UXL[1:0]} &
-\multicolumn{1}{c|}{\wpri} &
\multicolumn{1}{c|}{TSR} &
\multicolumn{1}{c|}{TW} &
\multicolumn{1}{c|}{TVM} &
\multicolumn{1}{c|}{MXR} &
\multicolumn{1}{c|}{SUM} &
+\multicolumn{1}{c|}{MPRV} &
\\
\hline
-1 & MXLEN-39 & 1 & 1 & 2 & 2 & 9 & 1 & 1 & 1 & 1 & 1 & \\
-\end{tabular}}
-\scalebox{0.95}{
-\begin{tabular}{ccWWcWccccccccc}
+1 & 8 & 1 & 1 & 1 & 1 & 1 & 1 & \\
+\end{tabular}
+\begin{tabular}{cWWcWccccccccc}
\\
&
-\instbit{17} &
\instbitrange{16}{15} &
\instbitrange{14}{13} &
\instbitrange{12}{11} &
@@ -426,8 +415,7 @@ state. A restricted view of {\tt mstatus} appears as the
\instbit{0} \\
\hline
&
-\multicolumn{1}{|c|}{MPRV} &
-\multicolumn{1}{c|}{XS[1:0]} &
+\multicolumn{1}{|c|}{XS[1:0]} &
\multicolumn{1}{c|}{FS[1:0]} &
\multicolumn{1}{c|}{MPP[1:0]} &
\multicolumn{1}{c|}{\wpri} &
@@ -441,46 +429,57 @@ state. A restricted view of {\tt mstatus} appears as the
\multicolumn{1}{c|}{SIE} &
\multicolumn{1}{c|}{\wpri} \\
\hline
- & 1 & 2 & 2 & 2 & 2 & 1 & 1 & 1 & 1 & 1 & 1 & 1 & 1 & 1 \\
-\end{tabular}}
+ & 2 & 2 & 2 & 2 & 1 & 1 & 1 & 1 & 1 & 1 & 1 & 1 & 1 \\
+\end{tabular}
\end{center}
}
\vspace{-0.1in}
-\caption{Machine-mode status register ({\tt mstatus}) for RV64.}
-\label{mstatusreg}
+\caption{Machine-mode status register ({\tt mstatus}) for RV32.}
+\label{mstatusreg-rv32}
\end{figure*}
\begin{figure*}[h!]
{\footnotesize
\begin{center}
\setlength{\tabcolsep}{4pt}
-\begin{tabular}{cKccccccc}
+\scalebox{0.95}{
+\begin{tabular}{cRccccYcccccc}
\\
-\instbit{31} &
-\instbitrange{30}{23} &
+\instbit{63} &
+\instbitrange{62}{38} &
+\instbit{37} &
+\instbit{36} &
+\instbitrange{35}{34} &
+\instbitrange{33}{32} &
+\instbitrange{31}{23} &
\instbit{22} &
\instbit{21} &
\instbit{20} &
\instbit{19} &
\instbit{18} &
-\instbit{17} &
\\
\hline
\multicolumn{1}{|c|}{SD} &
\multicolumn{1}{c|}{\wpri} &
+\multicolumn{1}{c|}{MBE} &
+\multicolumn{1}{c|}{SBE} &
+\multicolumn{1}{c|}{SXL[1:0]} &
+\multicolumn{1}{c|}{UXL[1:0]} &
+\multicolumn{1}{c|}{\wpri} &
\multicolumn{1}{c|}{TSR} &
\multicolumn{1}{c|}{TW} &
\multicolumn{1}{c|}{TVM} &
\multicolumn{1}{c|}{MXR} &
\multicolumn{1}{c|}{SUM} &
-\multicolumn{1}{c|}{MPRV} &
\\
\hline
-1 & 8 & 1 & 1 & 1 & 1 & 1 & 1 & \\
-\end{tabular}
-\begin{tabular}{cWWcWccccccccc}
+1 & 25 & 1 & 1 & 2 & 2 & 9 & 1 & 1 & 1 & 1 & 1 & \\
+\end{tabular}}
+\scalebox{0.95}{
+\begin{tabular}{ccWWcWccccccccc}
\\
&
+\instbit{17} &
\instbitrange{16}{15} &
\instbitrange{14}{13} &
\instbitrange{12}{11} &
@@ -496,7 +495,8 @@ state. A restricted view of {\tt mstatus} appears as the
\instbit{0} \\
\hline
&
-\multicolumn{1}{|c|}{XS[1:0]} &
+\multicolumn{1}{|c|}{MPRV} &
+\multicolumn{1}{c|}{XS[1:0]} &
\multicolumn{1}{c|}{FS[1:0]} &
\multicolumn{1}{c|}{MPP[1:0]} &
\multicolumn{1}{c|}{\wpri} &
@@ -510,13 +510,13 @@ state. A restricted view of {\tt mstatus} appears as the
\multicolumn{1}{c|}{SIE} &
\multicolumn{1}{c|}{\wpri} \\
\hline
- & 2 & 2 & 2 & 2 & 1 & 1 & 1 & 1 & 1 & 1 & 1 & 1 & 1 \\
-\end{tabular}
+ & 1 & 2 & 2 & 2 & 2 & 1 & 1 & 1 & 1 & 1 & 1 & 1 & 1 & 1 \\
+\end{tabular}}
\end{center}
}
\vspace{-0.1in}
-\caption{Machine-mode status register ({\tt mstatus}) for RV32.}
-\label{mstatusreg-rv32}
+\caption{Machine-mode status register ({\tt mstatus}) for RV64.}
+\label{mstatusreg}
\end{figure*}
For RV32 only, {\tt mstatush} is a 32-bit read/write register formatted
@@ -553,7 +553,6 @@ would be hardwired to zero.
\label{mstatushreg}
\end{figure*}
-
\subsubsection{Privilege and Global Interrupt-Enable Stack in {\tt mstatus} register}
\label{privstack}
@@ -613,9 +612,15 @@ a trap in M-mode or S-mode respectively. When
executing an {\em x}\/RET instruction, supposing {\em x}\/PP holds the
value {\em y}, {\em x}\/IE is set to {\em x}\/PIE; the privilege mode
is changed to {\em y}; {\em x}\/PIE is set to 1; and {\em x}\/PP is
-set to U (or M if user-mode is not supported).
+set to the least-privileged supported mode (U if U-mode is implemented, else M).
If {\em x}\/PP$\neq$M, {\em x}\/RET also sets MPRV=0.
+\begin{commentary}
+Setting {\em x}\/PP to the least-privileged supported mode on an {\em x}\/RET
+helps identify software bugs in the management of the two-level privilege-mode
+stack.
+\end{commentary}
+
{\em x}\/PP fields are \warl\ fields that can hold only privilege mode {\em x}
and any implemented privilege mode lower than {\em x}. If privilege mode {\em
x} is not implemented, then {\em x}\/PP must be hardwired to 0.
@@ -714,7 +719,7 @@ in Figure~\ref{sv32pte}) will fault. When SUM=1, these accesses are
permitted. SUM has no effect when page-based virtual memory is not in effect.
Note that, while SUM is ordinarily ignored when not executing in S-mode, it
{\em is} in effect when MPRV=1 and MPP=S. SUM is hardwired to 0 if S-mode is
-not supported.
+not supported or if {\tt satp}.MODE is hardwired to 0.
The MXR and SUM mechanisms only affect the interpretation of permissions
encoded in page-table entries. In particular, they have no impact on whether
@@ -902,7 +907,7 @@ Off, Initial, Clean, and Dirty.
\begin{tabular}{|r|l|l|}
\hline
Status & FS Meaning & XS Meaning\\
-\hline
+\hline
0 & Off & All off \\
1 & Initial & None dirty or clean, some on\\
2 & Clean & None dirty, some clean \\
@@ -1003,6 +1008,10 @@ errant speculation. Some platforms may choose to disallow speculatively
writing FS to close a potential side channel.
\end{commentary}
+If an instruction explicitly or implicitly writes a floating-point register or
+the {\tt fcsr} but does not alter its contents, and FS=Initial or FS=Clean, it
+is implementation-defined whether FS transitions to Dirty.
+
Table~\ref{fsxsstates} shows all the possible state transitions for
the FS or XS status bits. Note that the standard floating-point
extensions do not support user-mode unconfigure or disable/enable
@@ -1017,7 +1026,7 @@ Action & & & &\\
\hline
\hline
\multicolumn{5}{|c|}{At context save in privileged code}\\
-\hline
+\hline
Save state? & No & No & No & Yes \\
Next state & Off & Initial & Clean & Clean \\
\hline
@@ -1034,7 +1043,7 @@ Action? & Exception & Execute & Execute & Execute \\
Next state & Off & Initial & Clean & Dirty \\
\hline
\hline
-\multicolumn{5}{|c|}{Execute instruction to modify state, including configuration}\\
+\multicolumn{5}{|c|}{Execute instruction that possibly modifies state, including configuration}\\
\hline
Action? & Exception & Execute & Execute & Execute \\
Next state & Off & Dirty & Dirty & Dirty \\
@@ -1122,7 +1131,7 @@ vector mode (MODE).
\instbitrange{MXLEN-1}{2} &
\instbitrange{1}{0} \\
\hline
-\multicolumn{1}{|c|}{BASE[MXLEN-1:2] (\warl)} &
+\multicolumn{1}{|c|}{BASE[MXLEN-1:2] (\warl)} &
\multicolumn{1}{c|}{MODE (\warl)} \\
\hline
MXLEN-2 & 2 \\
@@ -1152,7 +1161,7 @@ hand, we wish to allow flexibility for larger systems.
\begin{tabular}{|r|c|l|}
\hline
Value & Name & Description \\
-\hline
+\hline
0 & Direct & All exceptions set {\tt pc} to BASE. \\
1 & Vectored & Asynchronous interrupts set {\tt pc} to BASE+4$\times$cause. \\
$\ge$2 & --- & {\em Reserved} \\
@@ -1522,7 +1531,7 @@ the value of the SEIP bit returned in the {\tt rd} destination
register is the logical-OR of the software-writable bit and the
interrupt signal from the interrupt controller, but the signal from the
interrupt controller is not used to calculate the value written to SEIP.
-Only the software-writeable SEIP bit participates in the
+Only the software-writable SEIP bit participates in the
read-modify-write sequence of a CSRRS or CSRRC instruction.
\begin{commentary}
@@ -1569,7 +1578,7 @@ Synchronous exceptions are of lower priority than all interrupts.
\begin{commentary}
The machine-level interrupt fixed-priority ordering rules were developed
with the following rationale.
-
+
Interrupts for higher privilege modes must be serviced before
interrupts for lower privilege modes to support preemption.
@@ -1605,120 +1614,6 @@ using the {\tt sie} register. Otherwise, the corresponding
bits in {\tt sip} and {\tt sie} appear to be hardwired
to zero.
-\subsection{Machine Timer Registers ({\tt mtime} and {\tt mtimecmp})}
-
-Platforms provide a real-time counter, exposed as a memory-mapped
-machine-mode read-write register, {\tt mtime}. {\tt mtime} must
-increment at constant frequency, and the platform must provide a
-mechanism for determining the timebase of {\tt mtime}. The {\tt
- mtime} register will wrap around if the count overflows.
-
-The {\tt mtime} register has a 64-bit precision on all RV32 and RV64
-systems. Platforms provide a 64-bit memory-mapped machine-mode
-timer compare register ({\tt mtimecmp}).
-A timer interrupt becomes pending whenever {\tt mtime} contains
-a value greater than or equal to {\tt mtimecmp}, treating the values
-as unsigned integers.
-The interrupt remains posted until {\tt mtimecmp} becomes greater than
-{\tt mtime} (typically as a result of writing {\tt mtimecmp}).
-The interrupt will only be taken if interrupts
-are enabled and the MTIE bit is set in the {\tt mie} register.
-
-\begin{figure}[h!]
-{\footnotesize
-\begin{center}
-\begin{tabular}{@{}J}
-\instbitrange{63}{0} \\
-\hline
-\multicolumn{1}{|c|}{\tt mtime} \\
-\hline
-64 \\
-\end{tabular}
-\end{center}
-}
-\vspace{-0.1in}
-\caption{Machine time register (memory-mapped control register).}
-\end{figure}
-
-\begin{figure}[h!]
-{\footnotesize
-\begin{center}
-\begin{tabular}{@{}J}
-\instbitrange{63}{0} \\
-\hline
-\multicolumn{1}{|c|}{\tt mtimecmp} \\
-\hline
-64 \\
-\end{tabular}
-\end{center}
-}
-\vspace{-0.1in}
-\caption{Machine time compare register (memory-mapped control register).}
-\end{figure}
-
-\begin{commentary}
-The timer facility is defined to use wall-clock time rather than a
-cycle counter to support modern processors that run with a highly
-variable clock frequency to save energy through dynamic voltage and
-frequency scaling.
-
-Accurate real-time clocks (RTCs) are relatively expensive to provide
-(requiring a crystal or MEMS oscillator) and have to run even when the
-rest of system is powered down, and so there is usually only one in a
-system located in a different frequency/voltage domain from the
-processors. Hence, the RTC must be shared by all the harts in a
-system and accesses to the RTC will potentially incur the penalty of a
-voltage-level-shifter and clock-domain crossing. It is thus more
-natural to expose {\tt mtime} as a memory-mapped register than as a CSR.
-
-Lower privilege levels do not have their own {\tt timecmp} registers.
-Instead, machine-mode software can implement any number of virtual timers on
-a hart by multiplexing the next timer interrupt into the {\tt mtimecmp}
-register.
-
-Simple fixed-frequency systems can use a single clock for both cycle
-counting and wall-clock time.
-\end{commentary}
-
-Writes to {\tt mtime} and {\tt mtimecmp} are guaranteed to be reflected in
-MTIP eventually, but not necessarily immediately.
-
-\begin{commentary}
-A spurious timer interrupt might occur if an interrupt handler increments {\tt
-mtimecmp} then immediately returns, because MTIP might not yet have fallen in
-the interim. All software should be written to assume this event is possible,
-but most software should assume this event is extremely unlikely. It is
-almost always more performant to incur an occasional spurious timer interrupt
-than to poll MTIP until it falls.
-\end{commentary}
-
-In RV32, memory-mapped writes to {\tt mtimecmp} modify only one 32-bit
-part of the register. The following code sequence sets a 64-bit {\tt
- mtimecmp} value without spuriously generating a timer interrupt due
-to the intermediate value of the comparand:
-
-\begin{figure}[h!]
-\begin{center}
-\begin{verbatim}
- # New comparand is in a1:a0.
- li t0, -1
- la t1, mtimecmp
- sw t0, 0(t1) # No smaller than old value.
- sw a1, 4(t1) # No smaller than new value.
- sw a0, 0(t1) # New value.
-\end{verbatim}
-\end{center}
-\caption{Sample code for setting the 64-bit time comparand in RV32, assuming
- a little-endian memory system and that the registers live in a strongly
- ordered I/O region. Storing -1 to the low-order bits of {\tt mtimecmp}
- prevents {\tt mtimecmp} from temporarily becoming smaller than the lesser
- of the old and new values.}
-\label{mtimecmph}
-\end{figure}
-
-For RV64, naturally aligned 64-bit memory accesses to the {\tt mtime} and {\tt
-mtimecmp} registers are atomic.
-
\subsection{Hardware Performance Monitor}
M-mode includes a basic hardware performance-monitoring facility. The
@@ -2092,7 +1987,7 @@ codes.
\hline
Interrupt & Exception Code & Description \\
- \hline
+ \hline
1 & 0 & {\em Reserved} \\
1 & 1 & Supervisor software interrupt \\
1 & 2 & {\em Reserved} \\
@@ -2109,7 +2004,7 @@ codes.
1 & $\ge$16 & {\em Designated for platform use} \\ \hline
0 & 0 & Instruction address misaligned \\
0 & 1 & Instruction access fault \\
- 0 & 2 & Illegal instruction \\
+ 0 & 2 & Illegal instruction \\
0 & 3 & Breakpoint \\
0 & 4 & Load address misaligned \\
0 & 5 & Load access fault \\
@@ -2192,7 +2087,7 @@ The priority of any custom synchronous exceptions is implementation-defined.
\label{exception-priority}
\end{table*}
-Note that load/store/AMO address-misaligned and page-fault exceptions may have
+Note that load/store/AMO address-misaligned exceptions may have
either higher or lower priority than load/store/AMO page-fault and
access-fault exceptions.
\begin{commentary}
@@ -2330,6 +2225,122 @@ If the feature to return the faulting instruction bits is implemented, {\tt
mtval} must also be able to hold all values less than $2^N$, where $N$ is the
smaller of XLEN and ILEN.
+\section{Machine-Level Memory-Mapped Registers}
+
+\subsection{Machine Timer Registers ({\tt mtime} and {\tt mtimecmp})}
+
+Platforms provide a real-time counter, exposed as a memory-mapped
+machine-mode read-write register, {\tt mtime}. {\tt mtime} must
+increment at constant frequency, and the platform must provide a
+mechanism for determining the timebase of {\tt mtime}. The {\tt
+ mtime} register will wrap around if the count overflows.
+
+The {\tt mtime} register has a 64-bit precision on all RV32 and RV64
+systems. Platforms provide a 64-bit memory-mapped machine-mode
+timer compare register ({\tt mtimecmp}).
+A machine timer interrupt becomes pending whenever {\tt mtime} contains
+a value greater than or equal to {\tt mtimecmp}, treating the values
+as unsigned integers.
+The interrupt remains posted until {\tt mtimecmp} becomes greater than
+{\tt mtime} (typically as a result of writing {\tt mtimecmp}).
+The interrupt will only be taken if interrupts
+are enabled and the MTIE bit is set in the {\tt mie} register.
+
+\begin{figure}[h!]
+ {\footnotesize
+ \begin{center}
+ \begin{tabular}{@{}J}
+ \instbitrange{63}{0} \\
+ \hline
+ \multicolumn{1}{|c|}{\tt mtime} \\
+ \hline
+ 64 \\
+ \end{tabular}
+ \end{center}
+ }
+ \vspace{-0.1in}
+ \caption{Machine time register (memory-mapped control register).}
+\end{figure}
+
+\begin{figure}[h!]
+ {\footnotesize
+ \begin{center}
+ \begin{tabular}{@{}J}
+ \instbitrange{63}{0} \\
+ \hline
+ \multicolumn{1}{|c|}{\tt mtimecmp} \\
+ \hline
+ 64 \\
+ \end{tabular}
+ \end{center}
+ }
+ \vspace{-0.1in}
+ \caption{Machine time compare register (memory-mapped control register).}
+\end{figure}
+
+\begin{commentary}
+ The timer facility is defined to use wall-clock time rather than a
+ cycle counter to support modern processors that run with a highly
+ variable clock frequency to save energy through dynamic voltage and
+ frequency scaling.
+
+ Accurate real-time clocks (RTCs) are relatively expensive to provide
+ (requiring a crystal or MEMS oscillator) and have to run even when the
+ rest of system is powered down, and so there is usually only one in a
+ system located in a different frequency/voltage domain from the
+ processors. Hence, the RTC must be shared by all the harts in a
+ system and accesses to the RTC will potentially incur the penalty of a
+ voltage-level-shifter and clock-domain crossing. It is thus more
+ natural to expose {\tt mtime} as a memory-mapped register than as a CSR.
+
+ Lower privilege levels do not have their own {\tt timecmp} registers.
+ Instead, machine-mode software can implement any number of virtual timers on
+ a hart by multiplexing the next timer interrupt into the {\tt mtimecmp}
+ register.
+
+ Simple fixed-frequency systems can use a single clock for both cycle
+ counting and wall-clock time.
+\end{commentary}
+
+Writes to {\tt mtime} and {\tt mtimecmp} are guaranteed to be reflected in
+MTIP eventually, but not necessarily immediately.
+
+\begin{commentary}
+ A spurious timer interrupt might occur if an interrupt handler increments {\tt
+ mtimecmp} then immediately returns, because MTIP might not yet have fallen in
+ the interim. All software should be written to assume this event is possible,
+ but most software should assume this event is extremely unlikely. It is
+ almost always more performant to incur an occasional spurious timer interrupt
+ than to poll MTIP until it falls.
+\end{commentary}
+
+In RV32, memory-mapped writes to {\tt mtimecmp} modify only one 32-bit
+part of the register. The following code sequence sets a 64-bit {\tt
+ mtimecmp} value without spuriously generating a timer interrupt due
+to the intermediate value of the comparand:
+
+\begin{figure}[h!]
+ \begin{center}
+ \begin{verbatim}
+ # New comparand is in a1:a0.
+ li t0, -1
+ la t1, mtimecmp
+ sw t0, 0(t1) # No smaller than old value.
+ sw a1, 4(t1) # No smaller than new value.
+ sw a0, 0(t1) # New value.
+ \end{verbatim}
+ \end{center}
+ \caption{Sample code for setting the 64-bit time comparand in RV32, assuming
+ a little-endian memory system and that the registers live in a strongly
+ ordered I/O region. Storing -1 to the low-order bits of {\tt mtimecmp}
+ prevents {\tt mtimecmp} from temporarily becoming smaller than the lesser
+ of the old and new values.}
+ \label{mtimecmph}
+\end{figure}
+
+For RV64, naturally aligned 64-bit memory accesses to the {\tt mtime} and {\tt
+ mtimecmp} registers are atomic.
+
\section{Machine-Mode Privileged Instructions}
\subsection{Environment Call and Breakpoint}
@@ -2579,8 +2590,8 @@ Non-maskable interrupts (NMIs) are only used for hardware error
conditions, and cause an immediate jump to an implementation-defined
NMI vector running in M-mode regardless of the state of a hart's
interrupt enable bits. The {\tt mepc} register is written with the
-address of the next instruction to be executed at the time the NMI was
-taken, and {\tt mcause} is set to a value indicating the source of the
+virtual address of the instruction that was interrupted,
+and {\tt mcause} is set to a value indicating the source of the
NMI. The NMI can thus overwrite state in an active machine-mode
interrupt handler.
@@ -3460,5 +3471,5 @@ synchronize the PMP settings with the virtual memory system. This is
accomplished by executing an SFENCE.VMA instruction with {\em rs1}={\tt x0}
and {\em rs2}={\tt x0}, after the PMP CSRs are written.
-If page-based virtual memory is not implemented,
+If page-based virtual memory is not implemented,
memory accesses check the PMP settings synchronously, so no fence is needed.
diff --git a/src/memory.tex b/src/memory.tex
index 08fca50..f35d247 100644
--- a/src/memory.tex
+++ b/src/memory.tex
@@ -619,7 +619,7 @@ Figure~\ref{fig:litmus:address}, even though {\tt a1} XOR {\tt a1} is zero and
hence has no effect on the address accessed by the second load.
The benefit of using dependencies as a lightweight synchronization mechanism is that the ordering enforcement requirement is limited only to the specific two instructions in question.
-Other non-dependent instructions may be freely-reordered by aggressive implementations.
+Other non-dependent instructions may be freely reordered by aggressive implementations.
One alternative would be to use a load-acquire, but this would enforce ordering for the first load with respect to {\em all} subsequent instructions.
Another would be to use a FENCE~R,R, but this would include all previous and all subsequent loads, making this option more expensive.
@@ -826,7 +826,7 @@ memory access $a$ precedes memory access $b$ in global memory order if $a$ prece
\begin{enumerate}
\item $a$ precedes $b$ in preserved program order as defined in Chapter~\ref{ch:memorymodel}, with the exception that acquire and release ordering annotations apply only from one memory operation to another memory operation and from one I/O operation to another I/O operation, but not from a memory operation to an I/O nor vice versa
\item $a$ and $b$ are accesses to overlapping addresses in an I/O region
- \item $a$ and $b$ are accesses to the same strongly-ordered I/O region
+ \item $a$ and $b$ are accesses to the same strongly ordered I/O region
\item $a$ and $b$ are accesses to I/O regions, and the channel associated with the I/O region accessed by either $a$ or $b$ is channel 1
\item $a$ and $b$ are accesses to I/O regions associated with the same channel (except for channel 0)
\end{enumerate}
@@ -859,7 +859,7 @@ Ordering fences simply ensure that memory operations stay in order, while comple
RISC-V does not explicitly distinguish between ordering and completion fences.
Instead, this distinction is simply inferred from different uses of the FENCE bits.
-For implementations that conform to the RISC-V Unix Platform Specification, I/O devices and DMA operations are required to access memory coherently and via strongly-ordered I/O channels.
+For implementations that conform to the RISC-V Unix Platform Specification, I/O devices and DMA operations are required to access memory coherently and via strongly ordered I/O channels.
Therefore, accesses to regular main memory regions that are concurrently accessed by external devices can also use the standard synchronization mechanisms.
Implementations that do not conform to the Unix Platform Specification and/or in which devices do not access memory coherently will need to use mechanisms (which are currently platform-specific or device-specific) to enforce coherency.
@@ -895,7 +895,7 @@ The ordering guarantees in this section may not apply beyond a platform-specific
Table~\ref{tab:tsomappings} provides a mapping from TSO memory operations onto RISC-V memory instructions.
Normal x86 loads and stores are all inherently acquire-RCpc and release-RCpc operations: TSO enforces all load-load, load-store, and store-store ordering by default.
Therefore, under RVWMO, all TSO loads must be mapped onto a load followed by FENCE~R,RW, and all TSO stores must be mapped onto FENCE~RW,W followed by a store.
-TSO atomic read-modify-writes and x86 instructions using the LOCK prefix are fully-ordered and can be implemented either via an AMO with both {\em aq} and {\em rl} set, or via an LR with {\em aq} set, the arithmetic operation in question, an SC with both {\em aq} and {\em rl} set, and a conditional branch checking the success condition.
+TSO atomic read-modify-writes and x86 instructions using the LOCK prefix are fully ordered and can be implemented either via an AMO with both {\em aq} and {\em rl} set, or via an LR with {\em aq} set, the arithmetic operation in question, an SC with both {\em aq} and {\em rl} set, and a conditional branch checking the success condition.
In the latter case, the {\em rl} annotation on the LR turns out (for non-obvious reasons) to be redundant and can be omitted.
Alternatives to Table~\ref{tab:tsomappings} are also possible.
@@ -1044,7 +1044,7 @@ There are a few ways around this problem, including:
\begin{enumerate}
\item Always use FENCE~RW,W/FENCE~R,RW, and never use {\em aq}/{\em rl}. This suffices but is undesirable, as it defeats the purpose of the {\em aq}/{\em rl} modifiers.
\item Always use {\em aq}/{\em rl}, and never use FENCE~RW,W/FENCE~R,RW. This does not currently work due to the lack of load and store opcodes with {\em aq} and {\em rl} modifiers.
- \item Strengthen the mappings of release operations such that they would enforce sufficient orderings in the presence of either type of acquire mapping. This is the currently-recommended solution, and the one shown in Table~\ref{tab:linuxmappings}.
+ \item Strengthen the mappings of release operations such that they would enforce sufficient orderings in the presence of either type of acquire mapping. This is the currently recommended solution, and the one shown in Table~\ref{tab:linuxmappings}.
\end{enumerate}
\begin{figure}[h!]
@@ -1228,7 +1228,7 @@ Note however that the two mappings only interoperate correctly if {\tt atomic\_<
Any AMO can be emulated by an LR/SC pair, but care must be taken to ensure that any PPO orderings that originate from the LR are also made to originate from the SC, and that any PPO orderings that terminate at the SC are also made to terminate at the LR.
For example, the LR must also be made to respect any data dependencies that the AMO has, given that load operations do not otherwise have any notion of a data dependency.
Likewise, the effect of a FENCE~R,R elsewhere in the same hart must also be made to apply to the SC, which would not otherwise respect that fence.
-The emulator may achieve this effect by simply mapping AMOs onto {\tt lr.aq;~<op>;~sc.aqrl}, matching the mapping used elsewhere for fully-ordered atomics.
+The emulator may achieve this effect by simply mapping AMOs onto {\tt lr.aq;~<op>;~sc.aqrl}, matching the mapping used elsewhere for fully ordered atomics.
\section{Implementation Guidelines}
@@ -1272,7 +1272,7 @@ Architectures are free to implement any of the memory model rules as conservativ
\item forbid any forwarding of a value from a store in the store buffer to a subsequent AMO or LR to the same address
\item forbid any forwarding of a value from an AMO or SC in the store buffer to a subsequent load to the same address
\item implement TSO on all memory accesses, and ignore any main memory fences that do not include PW and SR ordering (e.g., as Ztso implementations will do)
- \item implement all atomics to be RCsc or even fully-ordered, regardless of annotation
+ \item implement all atomics to be RCsc or even fully ordered, regardless of annotation
\end{itemize}
Architectures that implement RVTSO can safely do the following:
diff --git a/src/naming.tex b/src/naming.tex
index 2006d80..6aa7d6c 100644
--- a/src/naming.tex
+++ b/src/naming.tex
@@ -83,7 +83,7 @@ Chapter~\ref{chap:zifencei}; ``Zifencei2'' and ``Zifencei2p0'' name version
2.0 of same.
The first letter following the ``Z'' conventionally indicates the most closely
-related alphabetical extension category, IMAFDQLCBJTPVN. For the ``Zam''
+related alphabetical extension category, IMAFDQLCBKJTPVN. For the ``Zam''
extension for misaligned atomics, for example, the letter ``a'' indicates the
extension is related to the ``A'' standard extension. If multiple ``Z''
extensions are named, they should be ordered first by category, then
@@ -167,6 +167,7 @@ Quad-Precision Floating-Point & Q & D\\
Decimal Floating-Point & L & \\
16-bit Compressed Instructions & C & \\
Bit Manipulation & B & \\
+Cryptography Extensions & K & \\
Dynamic Languages & J & \\
Transactional Memory & T & \\
Packed-SIMD Extensions & P & \\
diff --git a/src/preface.tex b/src/preface.tex
index 8e5442e..1e420c6 100644
--- a/src/preface.tex
+++ b/src/preface.tex
@@ -110,6 +110,7 @@ The changes in this version of the document include:
December 2019.
\item Defined big-endian ISA variant.
\item Moved N extension for user-mode interrupts into Volume II.
+\item Defined PAUSE hint instruction.
\end{itemize}
\section*{Preface to Document Version 20190608-Base-Ratified}
diff --git a/src/priv-preface.tex b/src/priv-preface.tex
index 14593a8..0ca835c 100644
--- a/src/priv-preface.tex
+++ b/src/priv-preface.tex
@@ -55,6 +55,7 @@ Additionally, the following compatible changes have been made since version
\begin{itemize}
\parskip 0pt
\itemsep 1pt
+\item Moved N extension into its own chapter.
\item Defined the RV32-only CSR {\tt mstatush}, which contains most of the
same fields as the upper 32 bits of RV64's {\tt mstatus}.
\item Permitted the unconditional delegation of less-privileged interrupts.
@@ -66,12 +67,13 @@ Additionally, the following compatible changes have been made since version
\item An additional 48 optional PMP registers have been defined.
\item Added the Svnapot Standard Extension draft, along with the N bit in
Sv39, Sv48, and Sv57 PTEs.
-\item Added the C bit to Sv39 and Sv48 PTEs to indicate custom encodings.
\item Described the behavior of address-translation caches a little more
explicitly.
\item Slightly relaxed the atomicity requirement for A and D bit updates
performed by the implementation.
\item Added Sv57 and Sv57x4 address translation modes.
+\item Software breakpoint exceptions are permitted to write either 0
+ or the PC to {\em x}\/{\tt tval}.
\end{itemize}
Finally, the hypervisor architecture proposal has been extensively revised.
@@ -119,7 +121,7 @@ Changes from version 1.10 include:
\item SFENCE.VMA semantics have been clarified.
\item Made the {\tt mstatus}.MPP field \warl, rather than \wlrl.
\item Made the unused {\em x}{\tt ip} fields \wpri, rather than \wiri.
-\item Made the unused {\tt misa} fields \wlrl, rather than \wiri.
+\item Made the unused {\tt misa} fields \warl, rather than \wiri.
\item Made the unused {\tt pmpaddr} and {\tt pmpcfg} fields \warl, rather than \wiri.
\item Required all harts in a system to employ the same PTE-update scheme as each other.
\item Rectified an editing error that misdescribed the mechanism by which
diff --git a/src/riscv-spec.tex b/src/riscv-spec.tex
index 7a4200e..8ef9653 100644
--- a/src/riscv-spec.tex
+++ b/src/riscv-spec.tex
@@ -79,6 +79,7 @@ Andrew Waterman and Krste Asanovi\'{c}, RISC-V Foundation, \specmonthyear.
\input{intro}
\input{rv32}
\input{zifencei}
+\input{zihintpause}
\input{rv32e}
\input{rv64}
\input{rv128}
diff --git a/src/rv32.tex b/src/rv32.tex
index 1a1ad06..afc6730 100644
--- a/src/rv32.tex
+++ b/src/rv32.tex
@@ -1,8 +1,7 @@
\chapter{RV32I Base Integer Instruction Set, Version 2.1}
\label{rv32}
-This chapter describes version 2.0 of the RV32I base integer
-instruction set.
+This chapter describes the RV32I base integer instruction set.
\begin{commentary}
RV32I was designed to be sufficient to form a compiler target and to
@@ -131,7 +130,7 @@ high-performance code, where there can be extensive use of loop
unrolling, software pipelining, and cache tiling.
For these reasons, we chose a conventional size of 32 integer
-registers for the base ISA. Dynamic register usage tends to be
+registers for RV32I. Dynamic register usage tends to be
dominated by a few frequently accessed registers, and regfile
implementations can be optimized to reduce access energy for the
frequently accessed registers~\cite{jtseng:sbbci}. The optional
@@ -1129,7 +1128,7 @@ packed-SIMD extension or handling externally packed data structures.
Our rationale for allowing EEIs to choose to support misaligned
accesses via the regular load and store instructions is to simplify
the addition of misaligned hardware support. One option would have
-been to disallow misaligned accesses in the base ISA and then provide
+been to disallow misaligned accesses in the base ISAs and then provide
some separate ISA support for misaligned accesses, either special
instructions to help software handle misaligned accesses or a new
hardware addressing mode for misaligned accesses. Special
@@ -1352,7 +1351,7 @@ supervisor-level operating system or debugger.
Another use of EBREAK is to support ``semihosting'', where the
execution environment includes a debugger that can provide services
over an alternate system call interface built around the EBREAK
- instruction. Because the RISC-V base ISA does not provide more than
+ instruction. Because the RISC-V base ISAs do not provide more than
one EBREAK instruction, RISC-V semihosting uses a special sequence of
instructions to distinguish a semihosting EBREAK from a
debugger-inserted EBREAK.
@@ -1366,7 +1365,7 @@ supervisor-level operating system or debugger.
described in Chapter~\ref{compressed}.
The shift NOP instructions are still considered available for use as
- HINTS.
+ HINTs.
Semihosting is a form of service call and would be more naturally
encoded as an ECALL using an existing ABI, but this would require
@@ -1384,29 +1383,44 @@ supervisor-level operating system or debugger.
RV32I reserves a large encoding space for HINT instructions, which are
usually used to communicate performance hints to the
-microarchitecture. HINTs are encoded as integer computational
-instructions with {\em rd}={\tt x0}. Hence, like the NOP instruction,
-HINTs do not change any architecturally visible state, except for
-advancing the {\tt pc} and any applicable performance counters.
+microarchitecture.
+Like the NOP instruction, HINTs do not change any architecturally visible
+state, except for advancing the {\tt pc} and any applicable performance
+counters.
Implementations are always allowed to ignore the encoded hints.
+Most RV32I HINTs are encoded as integer computational instructions with
+{\em rd}={\tt x0}.
+The other RV32I HINTs are encoded as FENCE instructions with a null
+predecessor or successor set and with {\em fm}=0.
+
\begin{commentary}
-This HINT encoding has been chosen so that simple implementations can ignore
-HINTs altogether, and instead execute a HINT as a regular computational
+These HINT encodings have been chosen so that simple implementations can ignore
+HINTs altogether, and instead execute a HINT as a regular
instruction that happens not to mutate the architectural state. For example, ADD is
a HINT if the destination register is {\tt x0}; the five-bit {\em rs1} and {\em
rs2} fields encode arguments to the HINT. However, a simple implementation can
simply execute the HINT as an ADD of {\em rs1} and {\em rs2} that writes {\tt
x0}, which has no architecturally visible effect.
+
+As another example, a FENCE instruction with a zero {\em pred} field and
+a zero {\em fm} field is a HINT; the {\em succ}, {\em rs1}, and {\em rd}
+fields encode the arguments to the HINT.
+A simple implementation can simply execute the HINT as a FENCE that orders the
+null set of prior memory accesses before whichever subsequent memory accesses
+are encoded in the {\em succ} field.
+Since the intersection of the predecessor and successor sets is null, the
+instruction imposes no memory orderings, and so it has no architecturally
+visible effect.
\end{commentary}
Table~\ref{tab:rv32i-hints} lists all RV32I HINT code points. 91\% of the HINT
-space is reserved for standard HINTs, but none are presently defined. The
-remainder of the HINT space is designated for custom HINTs; no standard HINTs
+space is reserved for standard HINTs. The
+remainder of the HINT space is designated for custom HINTs: no standard HINTs
will ever be defined in this subspace.
\begin{commentary}
-No standard hints are presently defined. We anticipate
+We anticipate
standard hints to eventually include memory-system spatial and
temporal locality hints, branch prediction hints, thread-scheduling
hints, security tags, and instrumentation flags for
@@ -1418,7 +1432,7 @@ simulation/emulation.
\begin{tabular}{|l|l|c|l|}
\hline
Instruction & Constraints & Code Points & Purpose \\ \hline \hline
- LUI & {\em rd}={\tt x0} & $2^{20}$ & \multirow{15}{*}{\em Reserved for future standard use} \\ \cline{1-3}
+ LUI & {\em rd}={\tt x0} & $2^{20}$ & \multirow{25}{*}{\em Reserved for future standard use} \\ \cline{1-3}
AUIPC & {\em rd}={\tt x0} & $2^{20}$ & \\ \cline{1-3}
\multirow{2}{*}{ADDI} & {\em rd}={\tt x0}, and either & \multirow{2}{*}{$2^{17}-1$} & \\
& {\em rs1}$\neq${\tt x0} or {\em imm}$\neq$0 & & \\ \cline{1-3}
@@ -1433,8 +1447,18 @@ simulation/emulation.
SLL & {\em rd}={\tt x0} & $2^{10}$ & \\ \cline{1-3}
SRL & {\em rd}={\tt x0} & $2^{10}$ & \\ \cline{1-3}
SRA & {\em rd}={\tt x0} & $2^{10}$ & \\ \cline{1-3}
- \multirow{2}{*}{FENCE}& {\em fm=0}, and either & \multirow{2}{*}{$2^{5}-1$} & \\
- & {\em pred}=0 or {\em succ}=0 & & \\ \hline \hline
+ \multirow{3}{*}{FENCE}& {\em rd}={\tt x0}, {\em rs1}$\neq${\tt x0}, & \multirow{3}{*}{$2^{10}-63$}& \\
+ & {\em fm}=0, and either & & \\
+ & {\em pred}=0 or {\em succ}=0 & & \\ \cline{1-3}
+ \multirow{3}{*}{FENCE}& {\em rd}$\neq${\tt x0}, {\em rs1}={\tt x0}, & \multirow{3}{*}{$2^{10}-63$}& \\
+ & {\em fm}=0, and either & & \\
+ & {\em pred}=0 or {\em succ}=0 & & \\ \cline{1-3}
+ \multirow{2}{*}{FENCE}& {\em rd}={\em rs1}={\tt x0}, {\em fm}=0, & \multirow{2}{*}{15} & \\
+ & {\em pred}=0, {\em succ}$\neq$0 & & \\ \cline{1-3}
+ \multirow{2}{*}{FENCE}& {\em rd}={\em rs1}={\tt x0}, {\em fm}=0, & \multirow{2}{*}{15} & \\
+ & {\em pred}$\neq$W, {\em succ}=0 & & \\ \hline
+ \multirow{2}{*}{FENCE}& {\em rd}={\em rs1}={\tt x0}, {\em fm}=0, & \multirow{2}{*}{1} & \multirow{2}{*}{PAUSE} \\
+ & {\em pred}=W, {\em succ}=0 & & \\ \hline \hline
SLTI & {\em rd}={\tt x0} & $2^{17}$ & \multirow{7}{*}{\em Designated for custom use} \\ \cline{1-3}
SLTIU & {\em rd}={\tt x0} & $2^{17}$ & \\ \cline{1-3}
SLLI & {\em rd}={\tt x0} & $2^{10}$ & \\ \cline{1-3}
diff --git a/src/rv64.tex b/src/rv64.tex
index 79b1a30..253f2d3 100644
--- a/src/rv64.tex
+++ b/src/rv64.tex
@@ -274,7 +274,7 @@ will ever be defined in this subspace.
\begin{tabular}{|l|l|c|l|}
\hline
Instruction & Constraints & Code Points & Purpose \\ \hline \hline
- LUI & {\em rd}={\tt x0} & $2^{20}$ & \multirow{21}{*}{\em Reserved for future standard use} \\ \cline{1-3}
+ LUI & {\em rd}={\tt x0} & $2^{20}$ & \multirow{32}{*}{\em Reserved for future standard use} \\ \cline{1-3}
AUIPC & {\em rd}={\tt x0} & $2^{20}$ & \\ \cline{1-3}
\multirow{2}{*}{ADDI} & {\em rd}={\tt x0}, and either & \multirow{2}{*}{$2^{17}-1$} & \\
& {\em rs1}$\neq${\tt x0} or {\em imm}$\neq$0 & & \\ \cline{1-3}
@@ -295,7 +295,18 @@ will ever be defined in this subspace.
SLLW & {\em rd}={\tt x0} & $2^{10}$ & \\ \cline{1-3}
SRLW & {\em rd}={\tt x0} & $2^{10}$ & \\ \cline{1-3}
SRAW & {\em rd}={\tt x0} & $2^{10}$ & \\ \cline{1-3}
- FENCE & {\em pred}=0 or {\em succ}=0 & $2^{5}-1$ & \\ \hline \hline
+ \multirow{3}{*}{FENCE}& {\em rd}={\tt x0}, {\em rs1}$\neq${\tt x0}, & \multirow{3}{*}{$2^{10}-63$}& \\
+ & {\em fm}=0, and either & & \\
+ & {\em pred}=0 or {\em succ}=0 & & \\ \cline{1-3}
+ \multirow{3}{*}{FENCE}& {\em rd}$\neq${\tt x0}, {\em rs1}={\tt x0}, & \multirow{3}{*}{$2^{10}-63$}& \\
+ & {\em fm}=0, and either & & \\
+ & {\em pred}=0 or {\em succ}=0 & & \\ \cline{1-3}
+ \multirow{2}{*}{FENCE}& {\em rd}={\em rs1}={\tt x0}, {\em fm}=0, & \multirow{2}{*}{15} & \\
+ & {\em pred}=0, {\em succ}$\neq$0 & & \\ \cline{1-3}
+ \multirow{2}{*}{FENCE}& {\em rd}={\em rs1}={\tt x0}, {\em fm}=0, & \multirow{2}{*}{15} & \\
+ & {\em pred}$\neq$W, {\em succ}=0 & & \\ \hline
+ \multirow{2}{*}{FENCE}& {\em rd}={\em rs1}={\tt x0}, {\em fm}=0, & \multirow{2}{*}{1} & \multirow{2}{*}{PAUSE} \\
+ & {\em pred}=W, {\em succ}=0 & & \\ \hline \hline
SLTI & {\em rd}={\tt x0} & $2^{17}$ & \multirow{10}{*}{\em Designated for custom use} \\ \cline{1-3}
SLTIU & {\em rd}={\tt x0} & $2^{17}$ & \\ \cline{1-3}
SLLI & {\em rd}={\tt x0} & $2^{11}$ & \\ \cline{1-3}
diff --git a/src/rvc-instr-table.tex b/src/rvc-instr-table.tex
index ab365df..68dab32 100644
--- a/src/rvc-instr-table.tex
+++ b/src/rvc-instr-table.tex
@@ -225,7 +225,7 @@
\multicolumn{2}{c|}{00} &
\multicolumn{3}{c|}{\rsoneprime/\rdprime} &
\multicolumn{5}{c|}{nzuimm[4:0]} &
-\multicolumn{2}{c|}{01} & C.SRLI {\em \tiny (RV32 NSE, nzuimm[5]=1)} \\
+\multicolumn{2}{c|}{01} & C.SRLI {\em \tiny (RV32 Custom, nzuimm[5]=1)} \\
\cline{2-17}
&
@@ -243,7 +243,7 @@
\multicolumn{2}{c|}{01} &
\multicolumn{3}{c|}{\rsoneprime/\rdprime} &
\multicolumn{5}{c|}{nzuimm[4:0]} &
-\multicolumn{2}{c|}{01} & C.SRAI {\em \tiny (RV32 NSE, nzuimm[5]=1)} \\
+\multicolumn{2}{c|}{01} & C.SRAI {\em \tiny (RV32 Custom, nzuimm[5]=1)} \\
\cline{2-17}
&
@@ -403,7 +403,7 @@
\multicolumn{1}{c|}{nzuimm[5]} &
\multicolumn{5}{c|}{rs1/rd$\neq$0} &
\multicolumn{5}{c|}{nzuimm[4:0]} &
-\multicolumn{2}{c|}{10} & C.SLLI {\em \tiny (HINT, rd=0; RV32 NSE, nzuimm[5]=1)} \\
+\multicolumn{2}{c|}{10} & C.SLLI {\em \tiny (HINT, rd=0; RV32 Custom, nzuimm[5]=1)} \\
\cline{2-17}
&
diff --git a/src/rvwmo.tex b/src/rvwmo.tex
index edf3f28..228e582 100644
--- a/src/rvwmo.tex
+++ b/src/rvwmo.tex
@@ -35,7 +35,7 @@ The {\em program order} over memory operations reflects the order in which the i
Memory-accessing instructions give rise to {\em memory operations}.
A memory operation can be either a {\em load operation}, a {\em store operation}, or both simultaneously.
-All memory operations are single-copy atomic: they can never be observed in a partially-complete state.
+All memory operations are single-copy atomic: they can never be observed in a partially complete state.
Among instructions in RV32GC and RV64GC, each aligned memory instruction gives rise to exactly one memory operation, with two exceptions.
First, an unsuccessful SC instruction does not give rise to any memory operations.
@@ -70,7 +70,7 @@ An ``RCpc annotation'' refers to an acquire-RCpc annotation or a release-RCpc an
An ``RCsc annotation'' refers to an acquire-RCsc annotation or a release-RCsc annotation.
\begin{commentary}
- In the memory model literature, the term ``RCpc'' stands for release consistency with processor-consistent synchronization operations, and the term ``RCsc'' stands for release consistency with sequentially-consistent synchronization operations~\cite{Gharachorloo90memoryconsistency}.
+ In the memory model literature, the term ``RCpc'' stands for release consistency with processor-consistent synchronization operations, and the term ``RCsc'' stands for release consistency with sequentially consistent synchronization operations~\cite{Gharachorloo90memoryconsistency}.
While there are many different definitions for acquire and release annotations in the literature, in the context of RVWMO these terms are concisely and completely defined by Preserved Program Order rules \ref{ppo:acquire}--\ref{ppo:rcsc}.
diff --git a/src/supervisor.tex b/src/supervisor.tex
index 5279359..d30e619 100644
--- a/src/supervisor.tex
+++ b/src/supervisor.tex
@@ -97,8 +97,8 @@ register keeps track of the processor's current operating state.
\setlength{\tabcolsep}{4pt}
\begin{tabular}{cMFScccc}
\\
-\instbit{SXLEN-1} &
-\instbitrange{SXLEN-2}{34} &
+\instbit{63} &
+\instbitrange{62}{34} &
\instbitrange{33}{32} &
\instbitrange{31}{20} &
\instbit{19} &
@@ -115,7 +115,7 @@ register keeps track of the processor's current operating state.
\multicolumn{1}{c|}{\wpri} &
\\
\hline
-1 & SXLEN-35 & 2 & 12 & 1 & 1 & 1 & \\
+1 & 29 & 2 & 12 & 1 & 1 & 1 & \\
\end{tabular}
\begin{tabular}{cWWFccccWcc}
\\
@@ -216,6 +216,8 @@ SUM has no effect when page-based virtual memory is not in effect, nor when
executing in U-mode. Note that S-mode can never execute instructions from user
pages, regardless of the state of SUM.
+SUM is hardwired to 0 if {\tt satp}.MODE is hardwired to 0.
+
\begin{commentary}
The SUM mechanism prevents supervisor software from inadvertently accessing
user memory. Operating systems can execute the majority of code with SUM clear;
@@ -278,7 +280,7 @@ vector mode (MODE).
\instbitrange{SXLEN-1}{2} &
\instbitrange{1}{0} \\
\hline
-\multicolumn{1}{|c|}{BASE[SXLEN-1:2] (\warl)} &
+\multicolumn{1}{|c|}{BASE[SXLEN-1:2] (\warl)} &
\multicolumn{1}{c|}{MODE (\warl)} \\
\hline
SXLEN-2 & 2 \\
@@ -300,7 +302,7 @@ impose additional alignment constraints on the value in the BASE field.
\begin{tabular}{|r|c|l|}
\hline
Value & Name & Description \\
-\hline
+\hline
0 & Direct & All exceptions set {\tt pc} to BASE. \\
1 & Vectored & Asynchronous interrupts set {\tt pc} to BASE+4$\times$cause. \\
$\ge$2 & --- & {\em Reserved} \\
@@ -679,7 +681,7 @@ it is only guaranteed to hold supported exception codes.
\hline
Interrupt & Exception Code & Description \\
- \hline
+ \hline
1 & 0 & {\em Reserved} \\
1 & 1 & Supervisor software interrupt \\
1 & 2--4 & {\em Reserved} \\
@@ -690,7 +692,7 @@ it is only guaranteed to hold supported exception codes.
1 & $\ge$16 & {\em Designated for platform use} \\ \hline
0 & 0 & Instruction address misaligned \\
0 & 1 & Instruction access fault \\
- 0 & 2 & Illegal instruction \\
+ 0 & 2 & Illegal instruction \\
0 & 3 & Breakpoint \\
0 & 4 & Load address misaligned \\
0 & 5 & Load access fault \\
@@ -915,7 +917,7 @@ Value & Name & Description \\
\multicolumn{3}{|c|}{RV64} \\
\hline
Value & Name & Description \\
-\hline
+\hline
0 & Bare & No translation or protection. \\
1--7 & --- & {\em Reserved for standard use} \\
8 & Sv39 & Page-based 39-bit virtual addressing (see Section~\ref{sec:sv39}). \\
@@ -942,7 +944,7 @@ of ASIDLEN, termed ASIDMAX, is 9 for Sv32 or 16 for Sv39, Sv48, and Sv57.
\begin{commentary}
For many applications, the choice of page size has a substantial
performance impact. A large page size increases TLB reach and loosens
-the associativity constraints on virtually-indexed, physically-tagged
+the associativity constraints on virtually indexed, physically tagged
caches. At the same time, large pages exacerbate internal
fragmentation, wasting physical memory and possibly cache capacity.
@@ -1762,10 +1764,9 @@ quickly distinguish user and supervisor address regions.
\begin{figure*}[h!]
{\footnotesize
\begin{center}
-\begin{tabular}{ccY@{}Y@{}Y@{}Y@{}Fcccccccc}
+\begin{tabular}{cY@{}Y@{}Y@{}Y@{}Fcccccccc}
\instbit{63} &
-\instbit{62} &
-\instbitrange{61}{54} &
+\instbitrange{62}{54} &
\instbitrange{53}{28} &
\instbitrange{27}{19} &
\instbitrange{18}{10} &
@@ -1779,8 +1780,7 @@ quickly distinguish user and supervisor address regions.
\instbit{1} &
\instbit{0} \\
\hline
-\multicolumn{1}{|c|}{C} &
-\multicolumn{1}{c|}{N} &
+\multicolumn{1}{|c|}{N} &
\multicolumn{1}{c|}{\it Reserved} &
\multicolumn{1}{c|}{PPN[2]} &
\multicolumn{1}{c|}{PPN[1]} &
@@ -1795,7 +1795,7 @@ quickly distinguish user and supervisor address regions.
\multicolumn{1}{c|}{R} &
\multicolumn{1}{c|}{V} \\
\hline
-1 & 1 & 8 & 26 & 9 & 9 & 2 & 1 & 1 & 1 & 1 & 1 & 1 & 1 & 1\\
+1 & 9 & 26 & 9 & 9 & 2 & 1 & 1 & 1 & 1 & 1 & 1 & 1 & 1\\
\end{tabular}
\end{center}
}
@@ -1812,32 +1812,13 @@ root page table is stored in the {\tt satp} register's PPN field.
The PTE format for Sv39 is shown in Figure~\ref{sv39pte}. Bits 9--0
have the same meaning as for Sv32.
-\begin{table}[h]
-\begin{center}
-\begin{tabular}{|c||l|}
-\hline
-C & Description \\
-\hline
-0 & The PTE follows standard address-translation rules \\
-1 & {\em Designated for custom use} \\
-\hline
-\end{tabular}
-\end{center}
-\caption{Meaning of PTE C bit in Sv39.}
-\label{tab:pte_cv_bits}
-\end{table}
-
-As shown in Table~\ref{tab:pte_cv_bits}, the C bit is used to indicate that a
-PTE uses a custom implementation-specific encoding in the remaining bits other
-than the V bit.
-
-If the C bit is not set, the N bit indicates that the page represents a
+The N bit indicates that the page represents a
naturally aligned power-of-two range of contiguous translations, as defined in
the Svnapot extension in Chapter~\ref{svnapot}.
-Bits 61--54 are reserved
+Bits 62--54 are reserved
for future standard use and must be zeroed by software for forward compatibility.
-If any of these bits are set, an access-fault exception is raised.
+If any of these bits are set, a page-fault exception is raised.
\begin{commentary}
We reserved several PTE bits for a possible extension that improves
@@ -1942,10 +1923,9 @@ is untranslated.
\begin{figure*}[h!]
{\footnotesize
\begin{center}
-\begin{tabular}{ccF@{}F@{}F@{}F@{}F@{}Fcccccccc}
+\begin{tabular}{cF@{}F@{}F@{}F@{}F@{}Fcccccccc}
\instbit{63} &
-\instbit{62} &
-\instbitrange{61}{54} &
+\instbitrange{62}{54} &
\instbitrange{53}{37} &
\instbitrange{36}{28} &
\instbitrange{27}{19} &
@@ -1960,8 +1940,7 @@ is untranslated.
\instbit{1} &
\instbit{0} \\
\hline
-\multicolumn{1}{|c|}{C} &
-\multicolumn{1}{c|}{N} &
+\multicolumn{1}{|c|}{N} &
\multicolumn{1}{c|}{\it Reserved} &
\multicolumn{1}{c|}{PPN[3]} &
\multicolumn{1}{c|}{PPN[2]} &
@@ -1977,7 +1956,7 @@ is untranslated.
\multicolumn{1}{c|}{R} &
\multicolumn{1}{c|}{V} \\
\hline
-1 & 1 & 8 & 17 & 9 & 9 & 9 & 2 & 1 & 1 & 1 & 1 & 1 & 1 & 1 & 1\\
+1 & 9 & 17 & 9 & 9 & 9 & 2 & 1 & 1 & 1 & 1 & 1 & 1 & 1 & 1\\
\end{tabular}
\end{center}
}
@@ -2058,20 +2037,22 @@ is untranslated.
\begin{figure*}[h!]
{\footnotesize
\begin{center}
-\begin{tabular}{@{}E@{}O@{}O@{}O@{}O}
-\instbitrange{55}{39} &
+\begin{tabular}{@{}R@{}S@{}S@{}S@{}S@{}S}
+\instbitrange{55}{48} &
+\instbitrange{47}{39} &
\instbitrange{38}{30} &
\instbitrange{29}{21} &
\instbitrange{20}{12} &
\instbitrange{11}{0} \\
\hline
-\multicolumn{1}{|c|}{PPN[3]} &
+\multicolumn{1}{|c|}{PPN[4]} &
+\multicolumn{1}{c|}{PPN[3]} &
\multicolumn{1}{c|}{PPN[2]} &
\multicolumn{1}{c|}{PPN[1]} &
\multicolumn{1}{c|}{PPN[0]} &
\multicolumn{1}{c|}{page offset} \\
\hline
-17 & 9 & 9 & 9 & 12 \\
+8 & 9 & 9 & 9 & 9 & 12 \\
\end{tabular}
\end{center}
}
@@ -2083,11 +2064,11 @@ is untranslated.
\begin{figure*}[h!]
{\footnotesize
\begin{center}
-\begin{tabular}{cc@{}F@{}F@{}F@{}F@{}F@{}Fcccccccc}
+\begin{tabular}{c@{}Y@{}F@{}F@{}F@{}F@{}F@{}Wcccccccc}
\instbit{63} &
-\instbit{62} &
-\instbitrange{61}{54} &
-\instbitrange{53}{37} &
+\instbitrange{62}{54} &
+\instbitrange{53}{46} &
+\instbitrange{45}{37} &
\instbitrange{36}{28} &
\instbitrange{27}{19} &
\instbitrange{18}{10} &
@@ -2101,9 +2082,9 @@ is untranslated.
\instbit{1} &
\instbit{0} \\
\hline
-\multicolumn{1}{|c|}{C} &
-\multicolumn{1}{c|}{N} &
+\multicolumn{1}{|c|}{N} &
\multicolumn{1}{c|}{\it Reserved} &
+\multicolumn{1}{c|}{PPN[4]} &
\multicolumn{1}{c|}{PPN[3]} &
\multicolumn{1}{c|}{PPN[2]} &
\multicolumn{1}{c|}{PPN[1]} &
@@ -2118,7 +2099,7 @@ is untranslated.
\multicolumn{1}{c|}{R} &
\multicolumn{1}{c|}{V} \\
\hline
-1 & 1 & 8 & 17 & 9 & 9 & 9 & 2 & 1 & 1 & 1 & 1 & 1 & 1 & 1 & 1\\
+1 & 9 & 8 & 9 & 9 & 9 & 9 & 2 & 1 & 1 & 1 & 1 & 1 & 1 & 1 & 1\\
\end{tabular}
\end{center}
}
@@ -2143,7 +2124,7 @@ equals 8.
\chapter{``Svnapot'' Standard Extension for NAPOT Translation Contiguity, Version 0.1}
\label{svnapot}
-In Sv39, Sv48, and Sv57, when a PTE has C=0 and N=1, the PTE represents a
+In Sv39, Sv48, and Sv57, when a PTE has N=1, the PTE represents a
translation that is part of a range of contiguous virtual-to-physical
translations with the same values for PTE bits 5--0. Such ranges must be of a
naturally aligned power-of-2 (NAPOT) granularity larger than the base page
diff --git a/src/zihintpause.tex b/src/zihintpause.tex
new file mode 100644
index 0000000..fd652a2
--- /dev/null
+++ b/src/zihintpause.tex
@@ -0,0 +1,52 @@
+\chapter{``Zihintpause'' Pause Hint, Version 1.0}
+\label{chap:zihintpause}
+
+The PAUSE instruction is a HINT that indicates the current hart's rate of
+instruction retirement should be temporarily reduced or paused. The duration of its
+effect must be bounded and may be zero. No architectural state is changed.
+
+\begin{commentary}
+Software can use the PAUSE instruction to reduce energy consumption while
+executing spin-wait code sequences. Multithreaded cores might temporarily
+relinquish execution resources to other harts when PAUSE is executed.
+It is recommended that a PAUSE instruction generally be included in the code
+sequence for a spin-wait loop.
+
+A future extension might add primitives similar to the x86 MONITOR/MWAIT
+instructions, which provide a more efficient mechanism to wait on writes to
+a specific memory location.
+However, these instructions would not supplant PAUSE.
+PAUSE is more appropriate when polling for non-memory events, when polling for
+multiple events, or when software does not know precisely what events it is
+polling for.
+
+The duration of a PAUSE instruction's effect may vary significantly within and
+among implementations.
+In typical implementations this duration should be much less than the time to
+perform a context switch, probably more on the rough order of an on-chip cache
+miss latency or a cacheless access to main memory.
+
+A series of PAUSE instructions can be used to create a cumulative delay loosely
+proportional to the number of PAUSE instructions.
+In spin-wait loops in portable code, however, only one PAUSE instruction should
+be used before re-evaluating loop conditions, else the hart might stall longer
+than optimal on some implementations, degrading system performance.
+\end{commentary}
+
+PAUSE is encoded as a FENCE instruction with {\em pred}=W, {\em succ}=0,
+{\em fm}=0, {\em rd}={\tt x0}, and {\em rs1}={\tt x0}.
+
+\begin{commentary}
+PAUSE is encoded as a hint within the FENCE opcode because some
+implementations are expected to deliberately stall the PAUSE instruction until outstanding
+memory transactions have completed.
+Because the successor set is null, however, PAUSE does not {\em mandate} any
+particular memory ordering---hence, it truly is a HINT.
+
+Like other FENCE instructions, PAUSE cannot be used within LR/SC sequences
+without voiding the forward-progress guarantee.
+
+The choice of a predecessor set of W is arbitrary, since the successor set is
+null.
+Other HINTs similar to PAUSE might be encoded with other predecessor sets.
+\end{commentary}
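
As an illustration of the PAUSE encoding described above (FENCE with pred=W, succ=0, fm=0, rd=x0, rs1=x0), the following is a minimal sketch that packs the FENCE fields into an instruction word and tallies the FENCE rows of the HINT tables changed in this commit. It is not part of the patch. The helper names encode_fence and hint_row and the row labels are hypothetical and chosen only for this example; the field positions assume the standard FENCE layout (fm in bits 31:28, pred in 27:24, succ in 23:20, rs1 in 19:15, funct3=000, rd in 11:7, opcode 0001111), with W as the least significant bit of each four-bit I/O/R/W set.

```python
# Illustrative sketch only: helper names and row labels are assumptions,
# not identifiers from the RISC-V specification or from this commit.
from collections import Counter
from itertools import product

OPCODE_MISC_MEM = 0b0001111  # major opcode used by FENCE
SET_W = 0b0001               # W bit of a 4-bit set ordered I, O, R, W


def encode_fence(fm, pred, succ, rs1=0, rd=0):
    """Pack FENCE fields into a 32-bit instruction word (funct3 = 000)."""
    assert 0 <= fm < 16 and 0 <= pred < 16 and 0 <= succ < 16
    assert 0 <= rs1 < 32 and 0 <= rd < 32
    return ((fm << 28) | (pred << 24) | (succ << 20)
            | (rs1 << 15) | (rd << 7) | OPCODE_MISC_MEM)


def hint_row(fm, pred, succ, rs1, rd):
    """Return a label for the matching FENCE row of the HINT table, or None."""
    if fm != 0 or (pred != 0 and succ != 0):
        return None  # an ordinary FENCE, not a HINT code point
    if rd == 0 and rs1 != 0:
        return "rd=x0, rs1!=x0"
    if rd != 0 and rs1 == 0:
        return "rd!=x0, rs1=x0"
    if rd == 0 and rs1 == 0:
        if pred == SET_W and succ == 0:
            return "PAUSE"
        if succ == 0:
            return "pred!=W, succ=0"
        return "pred=0, succ!=0"
    return None  # rd!=x0 with rs1!=x0 has no row in the table


# The single code point assigned to PAUSE.
PAUSE = encode_fence(fm=0, pred=SET_W, succ=0)

# Enumerate every fm=0 FENCE encoding and count the code points per row.
counts = Counter()
for pred, succ, rs1, rd in product(range(16), range(16), range(32), range(32)):
    row = hint_row(0, pred, succ, rs1, rd)
    if row is not None:
        counts[row] += 1

if __name__ == "__main__":
    print(f"PAUSE = {PAUSE:#010x}")
    for row, n in sorted(counts.items()):
        print(f"{row}: {n}")
```

Under these assumptions, the per-row totals can be compared against the Code Points column of the FENCE rows in the rv32.tex and rv64.tex hunks; the entry $2^{10}-63$ corresponds to 31 nonzero register choices times 31 pred/succ combinations in which at least one set is empty.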