-rw-r--r--  src/a.tex                    524
-rw-r--r--  src/b.tex                     19
-rw-r--r--  src/c.tex                   1268
-rw-r--r--  src/counters.tex             252
-rw-r--r--  src/csr.tex                  260
-rw-r--r--  src/d.tex                    442
-rw-r--r--  src/extensions.tex           383
-rw-r--r--  src/f.tex                    851
-rw-r--r--  src/history.tex              403
-rw-r--r--  src/intro.tex                770
-rw-r--r--  src/j.tex                     13
-rw-r--r--  src/m.tex                    188
-rw-r--r--  src/memory-model-alloy.tex   269
-rw-r--r--  src/memory-model-herd.tex    160
-rw-r--r--  src/naming.tex               189
-rw-r--r--  src/p.tex                     14
16 files changed, 0 insertions, 6005 deletions
diff --git a/src/a.tex b/src/a.tex
deleted file mode 100644
index 1600cc6..0000000
--- a/src/a.tex
+++ /dev/null
@@ -1,524 +0,0 @@
-\chapter{``A'' Standard Extension for Atomic Instructions, Version 2.1}
-\label{atomics}
-
-The standard atomic-instruction extension, named ``A'',
-contains instructions that atomically
-read-modify-write memory to support synchronization between multiple
-RISC-V harts running in the same memory space. The two forms of
-atomic instruction provided are load-reserved/store-conditional
-instructions and atomic fetch-and-op memory instructions. Both types
-of atomic instruction support various memory consistency orderings
-including unordered, acquire, release, and sequentially consistent
-semantics. These instructions allow RISC-V to support the RCsc memory
-consistency model~\cite{Gharachorloo90memoryconsistency}.
-
-\begin{commentary}
-After much debate, the language community and architecture community
-appear to have finally settled on release consistency as the standard
-memory consistency model and so the RISC-V atomic support is built
-around this model.
-\end{commentary}
-
-\section{Specifying Ordering of Atomic Instructions}
-
-The base RISC-V ISA has a relaxed memory model, with the FENCE
-instruction used to impose additional ordering constraints. The
-address space is divided by the execution environment into memory and
-I/O domains, and the FENCE instruction provides options to order
-accesses to one or both of these two address domains.
-
-To provide more efficient support for release
-consistency~\cite{Gharachorloo90memoryconsistency}, each atomic
-instruction has two bits, {\em aq} and {\em rl}, used to specify
-additional memory ordering constraints as viewed by other RISC-V
-harts. The bits order accesses to one of the two address domains,
-memory or I/O, depending on which address domain the atomic
-instruction is accessing. No ordering constraint is implied to
-accesses to the other domain, and a FENCE instruction should be used
-to order across both domains.
-
-If both bits are clear, no additional ordering constraints are imposed
-on the atomic memory operation. If only the {\em aq} bit is set, the
-atomic memory operation is treated as an {\em acquire} access, i.e.,
-no following memory operations on this RISC-V hart can be observed
-to take place before the acquire memory operation. If only the {\em
- rl} bit is set, the atomic memory operation is treated as a {\em
- release} access, i.e., the release memory operation cannot be
-observed to take place before any earlier memory operations on this
-RISC-V hart. If both the {\em aq} and {\em rl} bits are set, the
-atomic memory operation is {\em sequentially consistent} and cannot be
-observed to happen before any earlier memory operations or after any
-later memory operations in the same RISC-V hart and to the same
-address domain.
-
-\section{Load-Reserved/Store-Conditional Instructions}
-\label{sec:lrsc}
-
-\vspace{-0.2in}
-\begin{center}
-\begin{tabular}{R@{}W@{}W@{}R@{}R@{}F@{}R@{}O}
-\\
-\instbitrange{31}{27} &
-\instbit{26} &
-\instbit{25} &
-\instbitrange{24}{20} &
-\instbitrange{19}{15} &
-\instbitrange{14}{12} &
-\instbitrange{11}{7} &
-\instbitrange{6}{0} \\
-\hline
-\multicolumn{1}{|c|}{funct5} &
-\multicolumn{1}{c|}{aq} &
-\multicolumn{1}{c|}{rl} &
-\multicolumn{1}{c|}{rs2} &
-\multicolumn{1}{c|}{rs1} &
-\multicolumn{1}{c|}{funct3} &
-\multicolumn{1}{c|}{rd} &
-\multicolumn{1}{c|}{opcode} \\
-\hline
-5 & 1 & 1 & 5 & 5 & 3 & 5 & 7 \\
-LR.W/D & \multicolumn{2}{c}{ordering} & 0 & addr & width & dest & AMO \\
-SC.W/D & \multicolumn{2}{c}{ordering} & src & addr & width & dest & AMO \\
-\end{tabular}
-\end{center}
-
-Complex atomic memory operations on a single memory word or doubleword are performed
-with the load-reserved (LR) and store-conditional (SC) instructions.
-LR.W loads a word from the address in {\em rs1}, places the sign-extended
-value in {\em rd}, and registers a {\em reservation set}---a set of bytes
-that subsumes the bytes in the addressed word.
-SC.W conditionally writes a word in {\em rs2} to the address in {\em rs1}: the
-SC.W succeeds only if the reservation is still valid and the reservation set
-contains the bytes being written.
-If the SC.W succeeds, the instruction writes the word in {\em rs2} to memory,
-and it writes zero to {\em rd}.
-If the SC.W fails, the instruction does not write to memory, and it writes
-a nonzero value to {\em rd}.
-Regardless of success or failure, executing an SC.W instruction invalidates
-any reservation held by this hart.
-LR.D and SC.D act analogously on doublewords and are only available on RV64.
-For RV64, LR.W and SC.W sign-extend the value placed in {\em rd}.
-
-\begin{commentary}
-Both compare-and-swap (CAS) and LR/SC can be used to build lock-free
-data structures. After extensive discussion, we opted for LR/SC for
-several reasons: 1) CAS suffers from the ABA problem, which LR/SC
-avoids because it monitors all writes to the address rather than
-only checking for changes in the data value; 2) CAS would also require
-a new integer instruction format to support three source operands
-(address, compare value, swap value) as well as a different memory
-system message format, which would complicate microarchitectures; 3)
-Furthermore, to avoid the ABA problem, other systems provide a
-double-wide CAS (DW-CAS) to allow a counter to be tested and
-incremented along with a data word. This requires reading five
-registers and writing two in one instruction, and also a new larger
-memory system message type, further complicating implementations; 4)
-LR/SC provides a more efficient implementation of many primitives as
-it only requires one load as opposed to two with CAS (one load before
-the CAS instruction to obtain a value for speculative computation,
-then a second load as part of the CAS instruction to check if the value is
-unchanged before updating).
-
-The main disadvantage of LR/SC over CAS is livelock, which we avoid,
-under certain circumstances,
-with an architected guarantee of eventual forward progress as
-described below. Another concern is whether the influence of the
-current x86 architecture, with its DW-CAS, will complicate porting of
-synchronization libraries and other software that assumes DW-CAS is
-the basic machine primitive. A possible mitigating factor is the
-recent addition of transactional memory instructions to x86, which
-might cause a move away from DW-CAS.
-
-More generally, a multi-word atomic primitive is desirable, but there is
-still considerable debate about what form this should take, and
-guaranteeing forward progress adds complexity to a system.
-\end{commentary}
-
-The failure code with value 1 encodes an unspecified failure.
-Other failure codes are reserved at this time.
-Portable software should only assume the failure code will be non-zero.
-
-\begin{commentary}
-We reserve a failure code of 1 to mean ``unspecified'' so that simple
-implementations may return this value using the existing mux required
-for the SLT/SLTU instructions. More specific failure codes might be
-defined in future versions or extensions to the ISA.
-\end{commentary}
-
-For LR and SC, the A extension requires that the address held in {\em
- rs1} be naturally aligned to the size of the operand (i.e.,
-eight-byte aligned for 64-bit words and four-byte aligned for 32-bit
-words). If the address is not naturally aligned, an address-misaligned
-exception or an access-fault exception will be generated. The access-fault
-exception can be generated for a memory access that would otherwise be
-able to complete except for the misalignment, if the misaligned access
-should not be emulated.
-
-\begin{commentary}
-Emulating misaligned LR/SC sequences is impractical in most systems.
-
-Misaligned LR/SC sequences also raise the possibility of accessing multiple
-reservation sets at once, which present definitions do not provide for.
-\end{commentary}
-
-An implementation can register an arbitrarily large reservation set on each
-LR, provided the reservation set includes all bytes of the addressed data word
-or doubleword.
-An SC can only pair with the most recent LR in program order.
-An SC may succeed only if no store from another hart
-to the reservation set can be observed to have occurred between the LR
-and the SC, and if there is no other SC between the LR and itself in program
-order.
-An SC may succeed only if no write from a device other than a hart
-to the bytes accessed by the LR instruction can be observed to have occurred
-between the LR and SC.
-Note this LR might have had a different effective address and data size, but
-reserved the SC's address as part of the reservation set.
-\begin{commentary}
-Following this model, in systems with memory translation, an SC is allowed to
-succeed if the earlier LR reserved the same location using an alias with
-a different virtual address, but is also allowed to fail if the virtual
-address is different.
-\end{commentary}
-\begin{commentary}
-To accommodate legacy devices and buses, writes from devices other than RISC-V
-harts are only required to invalidate reservations when they overlap the bytes
-accessed by the LR. These writes are not required to invalidate the
-reservation when they access other bytes in the reservation set.
-\end{commentary}
-
-The SC must fail if the address is not within the reservation set of the most
-recent LR in program order.
-The SC must fail if a store to the reservation set from another hart can be
-observed to occur between the LR and SC.
-The SC must fail if a write from some other device to the bytes accessed by
-the LR can be observed to occur between the LR and SC.
-(If such a device writes the reservation set but does not write the bytes
-accessed by the LR, the SC may or may not fail.)
-An SC must fail if there is another SC (to any address) between the LR and the
-SC in program order.
-The precise statement of the atomicity requirements for successful LR/SC
-sequences is defined by the Atomicity Axiom in Section~\ref{sec:rvwmo}.
-
-\begin{commentary}
-The platform should provide a means to determine the size and shape of the
-reservation set.
-
-A platform specification may constrain the size and shape of the reservation
-set.
-\end{commentary}
-
-\begin{commentary}
-A store-conditional instruction to a scratch word of memory should be used
-to forcibly invalidate any existing load reservation:
-\begin{itemize}
-\item during a preemptive context switch, and
-\item if necessary when changing virtual to physical address mappings,
- such as when migrating pages that might contain an active reservation.
-\end{itemize}
-\end{commentary}
-
-\begin{commentary}
-The invalidation of a hart's reservation when it executes an LR or SC
-implies that a hart can only hold one reservation at a time, and that
-an SC can only pair with the most recent LR, and LR with the next
-following SC, in program order. This is a restriction to the
-Atomicity Axiom in Section~\ref{sec:rvwmo} that ensures software runs
-correctly on expected common implementations that operate in this manner.
-\end{commentary}
-
-An SC instruction can never be observed by another RISC-V hart
-before the LR instruction that established the reservation.
-The LR/SC
-sequence can be given acquire semantics by setting the {\em aq} bit on
-the LR instruction. The LR/SC sequence can be given release semantics
-by setting the {\em rl} bit on the SC instruction. Setting the {\em
- aq} bit on the LR instruction, and setting both the {\em aq} and the {\em
- rl} bit on the SC instruction makes the LR/SC sequence sequentially
-consistent, meaning that it cannot be reordered with earlier or
-later memory operations from the same hart.
-
-If both bits are clear on both the LR and the SC, the LR/SC sequence can be
-observed to occur before or after surrounding memory operations from
-the same RISC-V hart. This can be appropriate when the LR/SC
-sequence is used to implement a parallel reduction operation.
-
-Software should not set the {\em rl} bit on an LR instruction unless the {\em
-aq} bit is also set, nor should software set the {\em aq} bit on an SC
-instruction unless the {\em rl} bit is also set. LR.{\em rl} and SC.{\em aq}
-instructions are not guaranteed to provide any stronger ordering than those
-with both bits clear, but may result in lower performance.
-
-\begin{figure}[h!]
-\begin{center}
-\begin{verbatim}
- # a0 holds address of memory location
- # a1 holds expected value
- # a2 holds desired value
- # a0 holds return value, 0 if successful, !0 otherwise
- cas:
- lr.w t0, (a0) # Load original value.
- bne t0, a1, fail # Doesn't match, so fail.
- sc.w t0, a2, (a0) # Try to update.
- bnez t0, cas # Retry if store-conditional failed.
- li a0, 0 # Set return to success.
- jr ra # Return.
- fail:
- li a0, 1 # Set return to failure.
- jr ra # Return.
-\end{verbatim}
-\end{center}
-\caption{Sample code for compare-and-swap function using LR/SC.}
-\label{cas}
-\end{figure}
-
-LR/SC can be used to construct lock-free data structures. An example
-using LR/SC to implement a compare-and-swap function is shown in
-Figure~\ref{cas}. If inlined, compare-and-swap functionality need
-only take four instructions.
-
-\section{Eventual Success of Store-Conditional Instructions}
-\label{sec:lrscseq}
-
-The standard A extension defines {\em constrained LR/SC loops}, which have
-the following properties:
-\vspace{-0.2in}
-\begin{itemize}
-\parskip 0pt
-\itemsep 1pt
-\item The loop comprises only an LR/SC sequence and code to retry the sequence
- in the case of failure, and must comprise at most 16 instructions placed
- sequentially in memory.
-\item An LR/SC sequence begins with an LR instruction and ends with an SC
- instruction. The dynamic code executed between the LR and SC instructions
- can only contain instructions from the base ``I'' instruction set, excluding
- loads, stores, backward jumps, taken backward branches, JALR, FENCE,
- and SYSTEM instructions.
- If the ``C'' extension is supported, then compressed
- forms of the aforementioned ``I'' instructions are also permitted.
-\item The code to retry a failing LR/SC sequence can contain backwards jumps
- and/or branches to repeat the LR/SC sequence, but otherwise has the same
- constraint as the code between the LR and SC.
-\item The LR and SC addresses must lie within a memory region with the
- {\em LR/SC eventuality} property. The execution environment is responsible
- for communicating which regions have this property.
-\item The SC must be to the same effective address and of the same data size as
- the latest LR executed by the same hart.
-\end{itemize}
-
-LR/SC sequences that do not lie within constrained LR/SC loops are {\em
-unconstrained}. Unconstrained LR/SC sequences might succeed on some attempts
-on some implementations, but might never succeed on other implementations.
-
-\begin{commentary}
-We restricted the length of LR/SC loops to fit within 64 contiguous
-instruction bytes in the base ISA to avoid undue restrictions on instruction
-cache and TLB size and associativity.
-Similarly, we disallowed other loads and stores within the loops to avoid
-restrictions on data-cache associativity in simple implementations that track
-the reservation within a private cache.
-The restrictions on branches and jumps limit the time that
-can be spent in the sequence. Floating-point operations and integer
-multiply/divide were disallowed to simplify the operating system's emulation
-of these instructions on implementations lacking appropriate hardware support.
-
-Software is not forbidden from using unconstrained LR/SC sequences, but
-portable software must detect the case that the sequence repeatedly fails,
-then fall back to an alternate code sequence that does not rely on an
-unconstrained LR/SC sequence. Implementations are permitted to
-unconditionally fail any unconstrained LR/SC sequence.
-\end{commentary}
-
-If a hart {\em H} enters a constrained LR/SC loop, the execution environment
-must guarantee that one of the following events eventually occurs:
-\vspace{-0.2in}
-\begin{itemize}
-\parskip 0pt
-\itemsep 1pt
-\item {\em H} or some other hart executes a successful SC to the reservation
-  set of the LR instruction in {\em H}'s constrained LR/SC loop.
-\item Some other hart executes an unconditional store or AMO instruction to
- the reservation set of the LR instruction in {\em H}'s constrained LR/SC
- loop, or some other device in the system writes to that reservation set.
-\item {\em H} executes a branch or jump that exits the constrained LR/SC loop.
-\item {\em H} traps.
-\end{itemize}
-
-\begin{commentary}
-Note that these definitions permit an implementation to fail an SC instruction
-occasionally for any reason, provided the aforementioned guarantee is not
-violated.
-\end{commentary}
-
-\begin{commentary}
-As a consequence of the eventuality guarantee, if some harts in an execution
-environment are executing constrained LR/SC loops, and no other harts or
-devices in the execution environment execute an unconditional store or AMO to
-that reservation set, then at least one hart will eventually exit its
-constrained LR/SC loop.
-By contrast, if other harts or devices continue to write to that reservation
-set, it is not guaranteed that any hart will exit its LR/SC loop.
-
-Loads and load-reserved instructions do not by themselves impede the progress
-of other harts' LR/SC sequences.
-We note this constraint implies, among other things, that loads and
-load-reserved instructions executed by other harts (possibly within the same
-core) cannot impede LR/SC progress indefinitely.
-For example, cache evictions caused by another hart sharing the cache cannot
-impede LR/SC progress indefinitely.
-Typically, this implies reservations are tracked independently of
-evictions from any shared cache.
-Similarly, cache misses caused by speculative execution within a hart cannot
-impede LR/SC progress indefinitely.
-
-These definitions admit the possibility that SC instructions may spuriously
-fail for implementation reasons, provided progress is eventually made.
-\end{commentary}
-
-\begin{commentary}
-One advantage of CAS is that it guarantees that some hart eventually
-makes progress, whereas an LR/SC atomic sequence could livelock
-indefinitely on some systems. To avoid this concern, we added an
-architectural guarantee of livelock freedom for certain LR/SC sequences.
-
-Earlier versions of this specification imposed a stronger starvation-freedom
-guarantee. However, the weaker livelock-freedom guarantee is sufficient to
-implement the C11 and C++11 languages, and is substantially easier to provide
-in some microarchitectural styles.
-\end{commentary}
-
-\section{Atomic Memory Operations}
-\label{sec:amo}
-
-\vspace{-0.2in}
-\begin{center}
-\begin{tabular}{O@{}W@{}W@{}R@{}R@{}F@{}R@{}R}
-\\
-\instbitrange{31}{27} &
-\instbit{26} &
-\instbit{25} &
-\instbitrange{24}{20} &
-\instbitrange{19}{15} &
-\instbitrange{14}{12} &
-\instbitrange{11}{7} &
-\instbitrange{6}{0} \\
-\hline
-\multicolumn{1}{|c|}{funct5} &
-\multicolumn{1}{c|}{aq} &
-\multicolumn{1}{c|}{rl} &
-\multicolumn{1}{c|}{rs2} &
-\multicolumn{1}{c|}{rs1} &
-\multicolumn{1}{c|}{funct3} &
-\multicolumn{1}{c|}{rd} &
-\multicolumn{1}{c|}{opcode} \\
-\hline
-5 & 1 & 1 & 5 & 5 & 3 & 5 & 7 \\
-AMOSWAP.W/D & \multicolumn{2}{c}{ordering} & src & addr & width & dest & AMO \\
-AMOADD.W/D & \multicolumn{2}{c}{ordering} & src & addr & width & dest & AMO \\
-AMOAND.W/D & \multicolumn{2}{c}{ordering} & src & addr & width & dest & AMO \\
-AMOOR.W/D & \multicolumn{2}{c}{ordering} & src & addr & width & dest & AMO \\
-AMOXOR.W/D & \multicolumn{2}{c}{ordering} & src & addr & width & dest & AMO \\
-AMOMAX[U].W/D & \multicolumn{2}{c}{ordering} & src & addr & width & dest & AMO \\
-AMOMIN[U].W/D & \multicolumn{2}{c}{ordering} & src & addr & width & dest & AMO \\
-\end{tabular}
-\end{center}
-
-\vspace{-0.1in} The atomic memory operation (AMO) instructions perform
-read-modify-write operations for multiprocessor synchronization and
-are encoded with an R-type instruction format. These AMO instructions
-atomically load a data value from the address in {\em rs1}, place the
-value into register {\em rd}, apply a binary operator to the loaded
-value and the original value in {\em rs2}, then store the result back
-to the original address in {\em rs1}. AMOs can either operate on 64-bit (RV64
-only) or 32-bit words in memory. For RV64, 32-bit AMOs always
-sign-extend the value placed in {\em rd}, and ignore the upper 32 bits
-of the original value of {\em rs2}.
-
-For AMOs, the A extension requires that the address held in {\em rs1}
-be naturally aligned to the size of the operand (i.e., eight-byte
-aligned for 64-bit words and four-byte aligned for 32-bit words). If
-the address is not naturally aligned, an address-misaligned exception
-or an access-fault exception will be generated. The access-fault exception can be
-generated for a memory access that would otherwise be able to complete
-except for the misalignment, if the misaligned access should not be
-emulated. The ``Zam'' extension, described in Chapter~\ref{sec:zam},
-relaxes this requirement and specifies the semantics of misaligned
-AMOs.
-
-The operations supported are swap, integer add, bitwise AND, bitwise
-OR, bitwise XOR, and signed and unsigned integer maximum and minimum.
-Without ordering constraints, these AMOs can be used to implement
-parallel reduction operations, where typically the return value would
-be discarded by writing to {\tt x0}.
-
-\begin{commentary}
-We provided fetch-and-op style atomic primitives as they scale to
-highly parallel systems better than LR/SC or CAS.
-A simple microarchitecture can implement AMOs using the LR/SC primitives,
-provided the implementation can guarantee the AMO eventually completes.
-More complex implementations might also implement AMOs at memory
-controllers, and can optimize away fetching the original value when
-the destination is {\tt x0}.
-
-The set of AMOs was chosen to support the C11/C++11 atomic memory
-operations efficiently, and also to support parallel reductions in
-memory. Another use of AMOs is to provide atomic updates to
-memory-mapped device registers (e.g., setting, clearing, or toggling
-bits) in the I/O space.
-\end{commentary}
-
-To help implement multiprocessor synchronization, the AMOs optionally
-provide release consistency semantics. If the {\em aq} bit is set,
-then no later memory operations in this RISC-V hart can be observed
-to take place before the AMO.
-Conversely, if the {\em rl} bit is set, then other
-RISC-V harts will not observe the AMO before memory accesses
-preceding the AMO in this RISC-V hart. Setting both the {\em aq} and the {\em
-rl} bit on an AMO makes the sequence sequentially consistent, meaning that
-it cannot be reordered with earlier or later memory operations from the same
-hart.
-
-\begin{commentary}
-The AMOs were designed to implement the C11 and C++11 memory models
-efficiently. Although the FENCE R, RW instruction suffices to
-implement the {\em acquire} operation and FENCE RW, W suffices to
-implement {\em release}, both imply additional unnecessary ordering as
-compared to AMOs with the corresponding {\em aq} or {\em rl} bit set.
-\end{commentary}
-
-An example code sequence for a critical section guarded by a
-test-and-test-and-set spinlock is shown in Figure~\ref{critical}. Note the
-first AMO is marked {\em aq} to order the lock acquisition before the
-critical section, and the second AMO is marked {\em rl} to order
-the critical section before the lock relinquishment.
-
-\begin{figure}[h!]
-\begin{center}
-\begin{verbatim}
- li t0, 1 # Initialize swap value.
- again:
- lw t1, (a0) # Check if lock is held.
- bnez t1, again # Retry if held.
- amoswap.w.aq t1, t0, (a0) # Attempt to acquire lock.
- bnez t1, again # Retry if held.
- # ...
- # Critical section.
- # ...
- amoswap.w.rl x0, x0, (a0) # Release lock by storing 0.
-\end{verbatim}
-\end{center}
-\caption{Sample code for mutual exclusion. {\tt a0} contains the address of the lock.}
-\label{critical}
-\end{figure}
-
-\begin{commentary}
-We recommend the use of the AMO Swap idiom shown above for both lock
-acquire and release to simplify the implementation of speculative lock
-elision~\cite{Rajwar:2001:SLE}.
-\end{commentary}
-
-The instructions in the ``A'' extension can also be used to provide
-sequentially consistent loads and stores. A sequentially consistent load can
-be implemented as an LR with both {\em aq} and {\em rl} set. A sequentially
-consistent store can be implemented as an AMOSWAP that writes the old value to
-x0 and has both {\em aq} and {\em rl} set.
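Note on the removed src/a.tex chapter above: the LR/SC success and failure rules it states can be illustrated with a small executable model. The sketch below is not part of the deleted file and is not how any implementation works. It is a single-threaded toy, and the names `Memory`, `lr_w`, `sc_w`, `store_w`, `cas`, and the 64-byte `granule` are assumptions made up for illustration. It models a reservation set as one aligned granule containing the LR address, fails an SC that has no valid reservation, invalidates the hart's reservation on every SC, and returns the "unspecified" failure code 1. Sign extension, the aq/rl bits, and writes from non-hart devices are left out.

```python
class Memory:
    """Toy model of LR.W/SC.W reservations (hypothetical, for illustration)."""

    def __init__(self, granule=64):
        self.granule = granule      # assumed reservation-set size in bytes
        self.words = {}             # address -> 32-bit word
        self.reservations = {}      # hart id -> reserved granule index

    def _check_aligned(self, addr):
        # The chapter requires natural alignment; model the trap as an error.
        if addr % 4 != 0:
            raise ValueError("address-misaligned")

    def _granule_of(self, addr):
        return addr // self.granule

    def lr_w(self, hart, addr):
        """Load a word and register a reservation set covering it."""
        self._check_aligned(addr)
        self.reservations[hart] = self._granule_of(addr)
        return self.words.get(addr, 0)

    def store_w(self, hart, addr, value):
        """Ordinary store: kills other harts' reservations on this granule."""
        self._check_aligned(addr)
        self.words[addr] = value
        hit = self._granule_of(addr)
        for other, reserved in list(self.reservations.items()):
            if other != hart and reserved == hit:
                del self.reservations[other]

    def sc_w(self, hart, addr, value):
        """Store-conditional: returns 0 on success, 1 on failure."""
        self._check_aligned(addr)
        # Any SC invalidates this hart's reservation, success or not.
        reserved = self.reservations.pop(hart, None)
        if reserved != self._granule_of(addr):
            return 1                # no valid reservation covering addr
        self.store_w(hart, addr, value)
        return 0


def cas(mem, hart, addr, expected, desired):
    """Mirror of the compare-and-swap figure: 0 if swapped, 1 on mismatch."""
    while True:
        if mem.lr_w(hart, addr) != expected:
            return 1                # value does not match, so fail
        if mem.sc_w(hart, addr, desired) == 0:
            return 0                # store-conditional succeeded
        # otherwise retry, as the `bnez t0, cas` branch does
```

In this model a store by hart 1 to any word in the same granule makes hart 0's pending SC fail, which corresponds to the rule that an SC must fail if a store to the reservation set from another hart is observed between the LR and the SC. A real implementation may choose a different reservation-set size and may also fail an SC spuriously.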
diff --git a/src/b.tex b/src/b.tex
deleted file mode 100644
index 0c4e497..0000000
--- a/src/b.tex
+++ /dev/null
@@ -1,19 +0,0 @@
-\chapter{``B'' Standard Extension for Bit Manipulation, Version 0.0}
-\label{sec:bits}
-
-This chapter is a placeholder for a future standard extension to
-provide bit manipulation instructions, including instructions to
-insert, extract, and test bit fields, and for rotations, funnel
-shifts, and bit and byte permutations.
-
-\begin{commentary}
-Although bit manipulation instructions are very effective in some
-application domains, particularly when dealing with externally packed
-data structures, we excluded them from the base ISAs as they are not
-useful in all domains and can add additional complexity or instruction
-formats to supply all needed operands.
-
-We anticipate the B extension will be a brownfield encoding within the
-base 30-bit instruction space.
-\end{commentary}
-
diff --git a/src/c.tex b/src/c.tex
deleted file mode 100644
index ea63d22..0000000
--- a/src/c.tex
+++ /dev/null
@@ -1,1268 +0,0 @@
-\chapter{``C'' Standard Extension for Compressed Instructions, Version
-2.0}
-\label{compressed}
-
-This chapter describes the RISC-V
-standard compressed instruction-set extension, named ``C'', which
-reduces static and dynamic code size by adding short 16-bit
-instruction encodings for common operations. The C extension can be
-added to any of the base ISAs (RV32, RV64, RV128), and we use the
-generic term ``RVC'' to cover any of these. Typically, 50\%--60\% of
-the RISC-V instructions in a program can be replaced with RVC
-instructions, resulting in a 25\%--30\% code-size reduction.
-
-\section{Overview}
-
-RVC uses a simple compression scheme that offers shorter 16-bit
-versions of common 32-bit RISC-V instructions when:
-\begin{tightlist}
- \item the immediate or address offset is small, or
- \item one of the registers is the zero register ({\tt x0}), the
- ABI link register ({\tt x1}), or the ABI stack pointer ({\tt
- x2}), or
- \item the destination register and the first source register are
- identical, or
- \item the registers used are the 8 most popular ones.
-\end{tightlist}
-
-The C extension is compatible with all other standard instruction
-extensions. The C extension allows 16-bit instructions to be freely
-intermixed with 32-bit instructions, with the latter now able to start
-on any 16-bit boundary, i.e., IALIGN=16. With the addition of the C
-extension, no instructions can raise instruction-address-misaligned
-exceptions.
-
-\begin{commentary}
-Removing the 32-bit alignment constraint on the original 32-bit
-instructions allows significantly greater code density.
-\end{commentary}
-
-The compressed instruction encodings are mostly common across RV32C,
-RV64C, and RV128C, but as shown in Table~\ref{rvcopcodemap}, a few
-opcodes are used for different purposes depending on base ISA.
-For example, the wider address-space RV64C and RV128C variants require
-additional opcodes to compress loads and stores of 64-bit integer
-values, while RV32C uses the same opcodes to compress loads and stores
-of single-precision floating-point values. Similarly, RV128C requires
-additional opcodes to capture loads and stores of 128-bit integer
-values, while these same opcodes are used for loads and stores of
-double-precision floating-point values in RV32C and RV64C. If the C
-extension is implemented, the appropriate compressed floating-point
-load and store instructions must be provided whenever the relevant
-standard floating-point extension (F and/or D) is also implemented.
-In addition, RV32C includes a compressed jump and link instruction to
-compress short-range subroutine calls, where the same opcode is used
-to compress ADDIW for RV64C and RV128C.
-
-\begin{commentary}
-Double-precision loads and stores are a significant fraction of static
-and dynamic instructions, hence the motivation to include them in the
-RV32C and RV64C encoding.
-
-Although single-precision loads and stores are not a significant
-source of static or dynamic compression for benchmarks compiled for
-the currently supported ABIs, for microcontrollers that only provide
-hardware single-precision floating-point units and have an ABI that
-only supports single-precision floating-point numbers, the
-single-precision loads and stores will be used at least as frequently
-as double-precision loads and stores in the measured benchmarks.
-Hence, the motivation to provide compressed support for these in
-RV32C.
-
-Short-range subroutine calls are more likely in small binaries for
-microcontrollers, hence the motivation to include these in RV32C.
-
-Although reusing opcodes for different purposes for different base
-ISAs adds some complexity to documentation, the impact on
-implementation complexity is small even for designs that support
-multiple base ISAs. The compressed floating-point load
-and store variants use the same instruction format with the same
-register specifiers as the wider integer loads and stores.
-\end{commentary}
-
-RVC was designed under the constraint that each RVC instruction
-expands into a single 32-bit instruction in either the base ISA
-(RV32I/E, RV64I, or RV128I) or the F and D standard extensions where
-present. Adopting this constraint has two main benefits:
-
-\begin{tightlist}
-\item Hardware designs can simply expand RVC instructions during
- decode, simplifying verification and minimizing modifications to
- existing microarchitectures.
-\item Compilers can be unaware of the RVC extension and leave code
- compression to the assembler and linker, although a
- compression-aware compiler will generally be able to produce better
- results.
-\end{tightlist}
-
-\begin{commentary}
-We felt the multiple complexity reductions of a simple one-to-one mapping
-between C and base IFD instructions far outweighed the potential gains
-of a slightly denser encoding that added additional instructions only
-supported in the C extension, or that allowed encoding of multiple IFD
-instructions in one C instruction.
-\end{commentary}
-
-It is important to note that the C extension is not designed to be a
-stand-alone ISA, and is meant to be used alongside a base ISA.
-
-\begin{commentary}
-Variable-length instruction sets have long been used to improve code
-density. For example, the IBM Stretch~\cite{stretch}, developed in
-the late 1950s, had an ISA with 32-bit and 64-bit instructions, where
-some of the 32-bit instructions were compressed versions of the full
-64-bit instructions. Stretch also employed the concept of limiting
-the set of registers that were addressable in some of the shorter
-instruction formats, with short branch instructions that could only
-refer to one of the index registers. The later IBM 360
-architecture~\cite{ibm360} supported a simple variable-length
-instruction encoding with 16-bit, 32-bit, or 48-bit instruction
-formats.
-
-In 1963, CDC introduced the Cray-designed CDC 6600~\cite{cdc6600}, a
-precursor to RISC architectures, which featured a register-rich
-load-store architecture with instructions of two lengths, 15 bits and
-30 bits. The later Cray-1 design used a very similar instruction
-format, with 16-bit and 32-bit instruction lengths.
-
-The initial RISC ISAs from the 1980s all picked performance over code
-size, which was reasonable for a workstation environment, but not for
-embedded systems. Hence, both ARM and MIPS subsequently developed
-versions of their ISAs that reduced code size by providing an
-alternative 16-bit-wide instruction set in place of the standard
-32-bit-wide instructions. The compressed RISC ISAs reduced code size relative to
-their starting points by about 25--30\%, yielding code that was
-significantly \emph{smaller} than 80x86. This result surprised some,
-as their intuition was that the variable-length CISC ISA should be
-smaller than RISC ISAs that offered only 16-bit and 32-bit formats.
-
-Since the original RISC ISAs did not leave sufficient opcode space
-free to include these unplanned compressed instructions, they were
-instead developed as complete new ISAs. This meant compilers needed
-different code generators for the separate compressed ISAs. The first
-compressed RISC ISA extensions (e.g., ARM Thumb and MIPS16) used only
-a fixed 16-bit instruction size, which gave good reductions in static
-code size but caused an increase in dynamic instruction count, which
-led to lower performance compared to the original fixed-width 32-bit
-instruction size. This led to the development of a second generation
-of compressed RISC ISA designs with mixed 16-bit and 32-bit
-instruction lengths (e.g., ARM Thumb2, microMIPS, PowerPC VLE), so
-that performance was similar to pure 32-bit instructions but with
-significant code size savings. Unfortunately, these different
-generations of compressed ISAs are incompatible with each other and
-with the original uncompressed ISA, leading to significant complexity
-in documentation, implementations, and software tools support.
-
-Of the commonly used 64-bit ISAs, only PowerPC and microMIPS currently
-support a compressed instruction format. It is surprising that the
-most popular 64-bit ISA for mobile platforms (ARM v8) does not include
-a compressed instruction format given that static code size and
-dynamic instruction fetch bandwidth are important metrics. Although
-static code size is not a major concern in larger systems, instruction
-fetch bandwidth can be a major bottleneck in servers running
-commercial workloads, which often have a large instruction working
-set.
-
-Benefiting from 25 years of hindsight, RISC-V was designed to support
-compressed instructions from the outset, leaving enough opcode
-space for RVC to be added as a simple extension on top of the base ISA
-(along with many other extensions). The philosophy of RVC is to
-reduce code size for embedded applications \emph{and} to improve
-performance and energy-efficiency for all applications due to fewer
-misses in the instruction cache. Waterman shows that RVC fetches
-25\%--30\% fewer instruction bits, which reduces instruction cache
-misses by 20\%--25\%, or roughly the same performance impact as
-doubling the instruction cache size~\cite{waterman-ms}.
-\end{commentary}
-
-
-\section{Compressed Instruction Formats}
-
-Table~\ref{rvc-formats} shows the nine compressed instruction
-formats. CR, CI, and CSS can use any of the 32 RVI registers, but CIW,
-CL, CS, CA, and CB are limited to just 8 of them. Table~\ref{registers}
-lists these popular registers, which correspond to registers {\tt x8}
-to {\tt x15}. Note that there is a
-separate version of load and store instructions that use the stack
-pointer as the base address register, since saving to and restoring
-from the stack are so prevalent, and that they use the CI and CSS
-formats to allow access to all 32 data registers. CIW supplies an
-8-bit immediate for the ADDI4SPN instruction.
-
-\begin{commentary}
-The RISC-V ABI was changed to make the frequently used registers map
-to registers {\tt x8}--{\tt x15}. This simplifies the decompression
-decoder by having a contiguous naturally aligned set of register
-numbers, and is also compatible with the RV32E base ISA,
-which only has 16 integer registers.
-\end{commentary}
-
-Compressed register-based floating-point loads and stores also use the
-CL and CS formats respectively, with the eight registers mapping to
-{\tt f8} to {\tt f15}.
-
-\begin{commentary}
-The standard RISC-V calling convention maps the most frequently used
-floating-point registers to registers {\tt f8} to {\tt f15}, which
-allows the same register decompression decoding as for integer
-register numbers.
-\end{commentary}
-
-The formats were designed to keep bits for the two register source
-specifiers in the same place in all instructions, while the
-destination register field can move. When the full 5-bit destination
-register specifier is present, it is in the same place as in the
-32-bit RISC-V encoding. Where immediates are
-sign-extended, the sign-extension is always from bit 12. Immediate
-fields have been scrambled, as in the base specification, to reduce
-the number of immediate muxes required.
-
-\begin{commentary}
-The immediate fields are scrambled in the instruction formats instead
-of in sequential order so that as many bits as possible are in the
-same position in every instruction, thereby simplifying
-implementations.
-\end{commentary}
-
-For many RVC instructions, zero-valued immediates are disallowed and
-{\tt x0} is not a valid 5-bit register specifier. These restrictions
-free up encoding space for other instructions requiring fewer operand
-bits.
-
-\newcommand{\rdprime}{rd\,$'$}
-\newcommand{\rsoneprime}{rs1\,$'$}
-\newcommand{\rstwoprime}{rs2\,$'$}
-
-\begin{table}[h]
-{
-\begin{small}
-\begin{center}
-\begin{tabular}{c c p{0in}p{0.05in}p{0.05in}p{0.05in}p{0.05in}p{0.05in}p{0.05in}p{0.05in}p{0.05in}p{0.05in}p{0.05in}p{0.05in}p{0.05in}p{0.05in}p{0.05in}p{0.05in}p{0.05in}}
-& & & & & & & & & \\
-Format & Meaning &
-\instbit{15} &
-\instbit{14} &
-\instbit{13} &
-\multicolumn{1}{c}{\instbit{12}} &
-\instbit{11} &
-\instbit{10} &
-\instbit{9} &
-\instbit{8} &
-\instbit{7} &
-\instbit{6} &
-\multicolumn{1}{r}{\instbit{5}} &
-\instbit{4} &
-\instbit{3} &
-\instbit{2} &
-\instbit{1} &
-\instbit{0} \\
-\cline{3-18}
-
-CR & Register &
-\multicolumn{4}{|c|}{funct4} &
-\multicolumn{5}{c|}{rd/rs1} &
-\multicolumn{5}{c|}{rs2} &
-\multicolumn{2}{c|}{op} \\
-\cline{3-18}
-
-CI & Immediate &
-\multicolumn{3}{|c|}{funct3} &
-\multicolumn{1}{c|}{imm} &
-\multicolumn{5}{c|}{rd/rs1} &
-\multicolumn{5}{c|}{imm} &
-\multicolumn{2}{c|}{op} \\
-\cline{3-18}
-
-CSS & Stack-relative Store &
-\multicolumn{3}{|c|}{funct3} &
-\multicolumn{6}{c|}{imm} &
-\multicolumn{5}{c|}{rs2} &
-\multicolumn{2}{c|}{op} \\
-\cline{3-18}
-
-CIW & Wide Immediate &
-\multicolumn{3}{|c|}{funct3} &
-\multicolumn{8}{c|}{imm} &
-\multicolumn{3}{c|}{\rdprime} &
-\multicolumn{2}{c|}{op} \\
-\cline{3-18}
-
-CL & Load &
-\multicolumn{3}{|c|}{funct3} &
-\multicolumn{3}{c|}{imm} &
-\multicolumn{3}{c|}{\rsoneprime} &
-\multicolumn{2}{c|}{imm} &
-\multicolumn{3}{c|}{\rdprime} &
-\multicolumn{2}{c|}{op} \\
-\cline{3-18}
-
-CS & Store &
-\multicolumn{3}{|c|}{funct3} &
-\multicolumn{3}{c|}{imm} &
-\multicolumn{3}{c|}{\rsoneprime} &
-\multicolumn{2}{c|}{imm} &
-\multicolumn{3}{c|}{\rstwoprime} &
-\multicolumn{2}{c|}{op} \\
-\cline{3-18}
-
-CA & Arithmetic &
-\multicolumn{6}{|c|}{funct6} &
-\multicolumn{3}{c|}{\rdprime/\rsoneprime} &
-\multicolumn{2}{c|}{funct2} &
-\multicolumn{3}{c|}{\rstwoprime} &
-\multicolumn{2}{c|}{op} \\
-\cline{3-18}
-
-CB & Branch/Arithmetic &
-\multicolumn{3}{|c|}{funct3} &
-\multicolumn{3}{c|}{offset} &
-\multicolumn{3}{c|}{\rdprime/\rsoneprime} &
-\multicolumn{5}{c|}{offset} &
-\multicolumn{2}{c|}{op} \\
-\cline{3-18}
-
-CJ & Jump &
-\multicolumn{3}{|c|}{funct3} &
-\multicolumn{11}{c|}{jump target} &
-\multicolumn{2}{c|}{op} \\
-\cline{3-18}
-
-\end{tabular}
-\end{center}
-\end{small}
-}
-\caption{Compressed 16-bit RVC instruction formats.}
-\label{rvc-formats}
-\end{table}
-
-
-\begin{table}[H]
-{
-\begin{center}
-\begin{tabular}{l|c|c|c|c|c|c|c|c|}
-\cline{2-9}
-RVC Register Number & 000 & 001 & 010 & 011 & 100 & 101 & 110 & 111
-\\ \cline{2-9}
-Integer Register Number & {\tt x8} & {\tt x9} & {\tt x10} & {\tt x11} & {\tt x12} & {\tt x13} & {\tt x14} & {\tt x15} \\ \cline{2-9}
-Integer Register ABI Name & {\tt s0} & {\tt s1} & {\tt a0} & {\tt a1} & {\tt a2} & {\tt a3} & {\tt a4} & {\tt a5} \\ \cline{2-9}
-Floating-Point Register Number & {\tt f8} & {\tt f9} & {\tt f10} & {\tt f11} & {\tt f12} & {\tt f13} & {\tt f14} & {\tt f15} \\ \cline{2-9}
-Floating-Point Register ABI Name & {\tt fs0} & {\tt fs1} & {\tt fa0} & {\tt fa1} & {\tt fa2} & {\tt fa3} & {\tt fa4} & {\tt fa5} \\ \cline{2-9}
-\end{tabular}
-\end{center}
-}
-\caption{Registers specified by the three-bit {\em \rsoneprime}, {\em \rstwoprime}, and {\em \rdprime} fields of the CIW, CL, CS, CA, and CB formats.}
-\label{registers}
-\end{table}
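
The register mapping in the table above can be sketched in a few lines of Python. This is only an illustration: the helper name expand_rvc_register is hypothetical and not part of the specification. It shows that the three-bit field is simply prefixed with the bits 01 to form the full five-bit register number.

```python
def expand_rvc_register(field: int) -> int:
    """Map a 3-bit RVC register field to a full 5-bit register number."""
    if not 0 <= field <= 0b111:
        raise ValueError("RVC register field must fit in three bits")
    # Fields 000..111 select x8..x15 (or f8..f15): prepend the bits 01.
    return 0b01000 | field


# Print the integer-register row of the table.
for field in range(8):
    print(f"{field:03b} -> x{expand_rvc_register(field)}")
```

Because the eight registers are contiguous and naturally aligned, the expansion is a constant OR rather than a table lookup.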
-
-\section{Load and Store Instructions}
-
-To increase the reach of 16-bit instructions, data-transfer
-instructions use zero-extended immediates that are scaled by the size
-of the data in bytes: $\times$4 for words, $\times$8 for double words,
-and $\times$16 for quad words.
-
-RVC provides two variants of loads and stores. One uses the ABI stack
-pointer, {\tt x2}, as the base address and can target any data register. The
-other can reference one of 8 base address registers and one of 8 data
-registers.
-
-\subsection*{Stack-Pointer-Based Loads and Stores}
-
-\begin{center}
-\begin{tabular}{S@{}W@{}T@{}T@{}Y}
-\\
-\instbitrange{15}{13} &
-\multicolumn{1}{c}{\instbit{12}} &
-\instbitrange{11}{7} &
-\instbitrange{6}{2} &
-\instbitrange{1}{0} \\
-\hline
-\multicolumn{1}{|c|}{funct3} &
-\multicolumn{1}{c|}{imm} &
-\multicolumn{1}{c|}{rd} &
-\multicolumn{1}{c|}{imm} &
-\multicolumn{1}{c|}{op} \\
-\hline
-3 & 1 & 5 & 5 & 2 \\
-C.LWSP & offset[5] & dest$\neq$0 & offset[4:2$\vert$7:6] & C2 \\
-C.LDSP & offset[5] & dest$\neq$0 & offset[4:3$\vert$8:6] & C2 \\
-C.LQSP & offset[5] & dest$\neq$0 & offset[4$\vert$9:6] & C2 \\
-C.FLWSP& offset[5] & dest & offset[4:2$\vert$7:6] & C2 \\
-C.FLDSP& offset[5] & dest & offset[4:3$\vert$8:6] & C2 \\
-\end{tabular}
-\end{center}
-These instructions use the CI format.
-
-C.LWSP loads a 32-bit value from memory into register {\em rd}. It computes
-an effective address by adding the {\em zero}-extended offset, scaled by 4, to
-the stack pointer, {\tt x2}. It expands to {\tt lw rd, offset(x2)}.
-C.LWSP is only valid when $\textit{rd}{\neq}\texttt{x0}$;
-the code points with $\textit{rd}{=}\texttt{x0}$ are reserved.
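
As a minimal sketch of the C.LWSP immediate layout described above, the following Python function reassembles the byte offset from the scrambled CI-format fields. The function name decode_c_lwsp is hypothetical, and the sketch assumes the caller has already identified the instruction as C.LWSP (no opcode or funct3 check is made).

```python
def decode_c_lwsp(inst: int) -> tuple:
    """Return (rd, byte_offset) for a 16-bit C.LWSP encoding.

    Assumption: the caller has already matched funct3 and op.
    """
    rd = (inst >> 7) & 0x1F
    if rd == 0:
        raise ValueError("C.LWSP with rd = x0 is reserved")
    offset = (
        (((inst >> 12) & 0x1) << 5)    # inst[12]  -> offset[5]
        | (((inst >> 4) & 0x7) << 2)   # inst[6:4] -> offset[4:2]
        | (((inst >> 2) & 0x3) << 6)   # inst[3:2] -> offset[7:6]
    )
    return rd, offset
```

The offset is already a multiple of 4 because offset[1:0] is never encoded; the largest reachable offset is 252 bytes.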
-
-
-C.LDSP is an RV64C/RV128C-only instruction that loads a 64-bit value from memory into
-register {\em rd}. It computes its effective address by adding the
-zero-extended offset, scaled by 8, to the stack pointer, {\tt x2}.
-It expands to {\tt ld rd, offset(x2)}.
-C.LDSP is only valid when $\textit{rd}{\neq}\texttt{x0}$;
-the code points with $\textit{rd}{=}\texttt{x0}$ are reserved.
-
-C.LQSP is an RV128C-only instruction that loads a 128-bit value from memory
-into register {\em rd}. It computes its effective address by adding the
-zero-extended offset, scaled by 16, to the stack pointer, {\tt x2}.
-It expands to {\tt lq rd, offset(x2)}.
-C.LQSP is only valid when $\textit{rd}{\neq}\texttt{x0}$;
-the code points with $\textit{rd}{=}\texttt{x0}$ are reserved.
-
-C.FLWSP is an RV32FC-only instruction that loads a single-precision
-floating-point value from memory into floating-point register {\em rd}. It
-computes its effective address by adding the {\em zero}-extended offset,
-scaled by 4, to the stack pointer, {\tt x2}. It expands to {\tt flw rd,
-offset(x2)}.
-
-C.FLDSP is an RV32DC/RV64DC-only instruction that loads a double-precision
-floating-point value from memory into floating-point register {\em rd}. It
-computes its effective address by adding the {\em zero}-extended offset,
-scaled by 8, to the stack pointer, {\tt x2}. It expands to {\tt fld rd,
-offset(x2)}.
-
-\begin{center}
-\begin{tabular}{S@{}M@{}T@{}Y}
-\\
-\instbitrange{15}{13} &
-\instbitrange{12}{7} &
-\instbitrange{6}{2} &
-\instbitrange{1}{0} \\
-\hline
-\multicolumn{1}{|c|}{funct3} &
-\multicolumn{1}{c|}{imm} &
-\multicolumn{1}{c|}{rs2} &
-\multicolumn{1}{c|}{op} \\
-\hline
-3 & 6 & 5 & 2 \\
-C.SWSP & offset[5:2$\vert$7:6] & src & C2 \\
-C.SDSP & offset[5:3$\vert$8:6] & src & C2 \\
-C.SQSP & offset[5:4$\vert$9:6] & src & C2 \\
-C.FSWSP& offset[5:2$\vert$7:6] & src & C2 \\
-C.FSDSP& offset[5:3$\vert$8:6] & src & C2 \\
-\end{tabular}
-\end{center}
-These instructions use the CSS format.
-
-C.SWSP stores a 32-bit value in register {\em rs2} to memory. It computes
-an effective address by adding the {\em zero}-extended offset, scaled by 4, to
-the stack pointer, {\tt x2}.
-It expands to {\tt sw rs2, offset(x2)}.
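
The CSS-format immediate of C.SWSP can be unpacked in the same way. The sketch below is illustrative only: decode_c_swsp is a hypothetical name, and the instruction is assumed to have been identified as C.SWSP beforehand.

```python
def decode_c_swsp(inst: int) -> tuple:
    """Return (rs2, byte_offset) for a 16-bit C.SWSP encoding.

    Assumption: the caller has already matched funct3 and op.
    """
    rs2 = (inst >> 2) & 0x1F
    offset = (
        (((inst >> 9) & 0xF) << 2)     # inst[12:9] -> offset[5:2]
        | (((inst >> 7) & 0x3) << 6)   # inst[8:7]  -> offset[7:6]
    )
    return rs2, offset
```

Unlike the stack-pointer-based loads, the source register may be any of the 32 registers, including x0.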
-
-C.SDSP is an RV64C/RV128C-only instruction that stores a 64-bit value in register
-{\em rs2} to memory. It computes an effective address by adding the {\em
-zero}-extended offset, scaled by 8, to the stack pointer, {\tt x2}.
-It expands to {\tt sd rs2, offset(x2)}.
-
-C.SQSP is an RV128C-only instruction that stores a 128-bit value in register
-{\em rs2} to memory. It computes an effective address by adding the {\em
-zero}-extended offset, scaled by 16, to the stack pointer, {\tt x2}.
-It expands to {\tt sq rs2, offset(x2)}.
-
-C.FSWSP is an RV32FC-only instruction that stores a single-precision
-floating-point value in floating-point register {\em rs2} to memory. It
-computes an effective address by adding the {\em zero}-extended offset, scaled
-by 4, to the stack pointer, {\tt x2}. It expands to {\tt fsw rs2,
-offset(x2)}.
-
-C.FSDSP is an RV32DC/RV64DC-only instruction that stores a double-precision
-floating-point value in floating-point register {\em rs2} to memory. It
-computes an effective address by adding the {\em zero}-extended offset, scaled
-by 8, to the stack pointer, {\tt x2}. It expands to {\tt fsd rs2,
-offset(x2)}.
-
-\begin{commentary}
-Register save/restore code at function entry/exit represents a
-significant portion of static code size. The stack-pointer-based
-compressed loads and stores in RVC are effective at reducing the
-save/restore static code size by a factor of 2 while improving
-performance by reducing dynamic instruction bandwidth.
-
-A common mechanism used in other ISAs to further reduce
-save/restore code size is load-multiple and store-multiple
-instructions. We considered adopting these for RISC-V but noted the
-following drawbacks to these instructions:
-\begin{itemize}
-\item These instructions complicate processor implementations.
-\item For virtual memory systems, some data accesses could be
- resident in physical memory and some could not, which requires a
- new restart mechanism for partially executed instructions.
-\item Unlike the rest of the RVC instructions, there is no IFD
- equivalent to Load Multiple and Store Multiple.
-\item Unlike the rest of the RVC instructions, the compiler would
- have to be aware of these instructions both to generate the
- instructions and to allocate registers in an order that maximizes
- the chances of them being saved and restored, since they would
- be saved and restored in sequential order.
-\item Simple microarchitectural implementations will constrain how
- other instructions can be scheduled around the load and store
- multiple instructions, leading to a potential performance loss.
-\item The desire for sequential register allocation might conflict with
- the featured registers selected for the CIW, CL, CS, CA, and CB formats.
-\end{itemize}
-Furthermore, much of the gain can be realized in software by replacing
-prologue and epilogue code with subroutine calls to common
-prologue and epilogue code, a technique described in
-Section 5.6 of~\cite{waterman-phd}.
-
-While reasonable architects might come to different conclusions, we
-decided to omit load and store multiple and instead use the
-software-only approach of calling save/restore millicode routines to
-attain the greatest code size reduction.
-\end{commentary}
-
-\subsection*{Register-Based Loads and Stores}
-
-\begin{center}
-\begin{tabular}{S@{}S@{}S@{}Y@{}S@{}Y}
-\\
-\instbitrange{15}{13} &
-\instbitrange{12}{10} &
-\instbitrange{9}{7} &
-\instbitrange{6}{5} &
-\instbitrange{4}{2} &
-\instbitrange{1}{0} \\
-\hline
-\multicolumn{1}{|c|}{funct3} &
-\multicolumn{1}{c|}{imm} &
-\multicolumn{1}{c|}{\rsoneprime} &
-\multicolumn{1}{c|}{imm} &
-\multicolumn{1}{c|}{\rdprime} &
-\multicolumn{1}{c|}{op} \\
-\hline
-3 & 3 & 3 & 2 & 3 & 2 \\
-C.LW & offset[5:3] & base & offset[2$\vert$6] & dest & C0 \\
-C.LD & offset[5:3] & base & offset[7:6] & dest & C0 \\
-C.LQ & offset[5$\vert$4$\vert$8] & base & offset[7:6] & dest & C0 \\
-C.FLW& offset[5:3] & base & offset[2$\vert$6] & dest & C0 \\
-C.FLD& offset[5:3] & base & offset[7:6] & dest & C0 \\
-\end{tabular}
-\end{center}
-These instructions use the CL format.
-
-C.LW loads a 32-bit value from memory into register {\em \rdprime}. It computes
-an effective address by adding the {\em zero}-extended offset, scaled by 4, to
-the base address in register {\em \rsoneprime}.
-It expands to {\tt lw \rdprime, offset(\rsoneprime)}.
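
A minimal sketch of C.LW decoding follows, combining the three-bit register expansion with the CL-format immediate layout. The name decode_c_lw is hypothetical, and the sketch assumes the opcode and funct3 have already been checked.

```python
def decode_c_lw(inst: int) -> tuple:
    """Return (rd, rs1, byte_offset) for a 16-bit C.LW encoding.

    Assumption: the caller has already matched funct3 and op.
    """
    rd = 8 + ((inst >> 2) & 0x7)       # rd'  selects x8..x15
    rs1 = 8 + ((inst >> 7) & 0x7)      # rs1' selects x8..x15
    offset = (
        (((inst >> 10) & 0x7) << 3)    # inst[12:10] -> offset[5:3]
        | (((inst >> 6) & 0x1) << 2)   # inst[6]     -> offset[2]
        | (((inst >> 5) & 0x1) << 6)   # inst[5]     -> offset[6]
    )
    return rd, rs1, offset
```

With five encoded offset bits scaled by 4, the largest reachable offset is 124 bytes.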
-
-C.LD is an RV64C/RV128C-only instruction that loads a 64-bit value from memory into
-register {\em \rdprime}. It computes an effective address by adding the {\em
-zero}-extended offset, scaled by 8, to the base address in register {\em
-\rsoneprime}.
-It expands to {\tt ld \rdprime, offset(\rsoneprime)}.
-
-C.LQ is an RV128C-only instruction that loads a 128-bit value from memory into
-register {\em \rdprime}. It computes an effective address by adding the {\em
-zero}-extended offset, scaled by 16, to the base address in register {\em
-\rsoneprime}.
-It expands to {\tt lq \rdprime, offset(\rsoneprime)}.
-
-C.FLW is an RV32FC-only instruction that loads a single-precision
-floating-point value from memory into floating-point register {\em \rdprime}. It
-computes an effective address by adding the {\em zero}-extended offset, scaled
-by 4, to the base address in register {\em \rsoneprime}. It expands to {\tt flw
-\rdprime, offset(\rsoneprime)}.
-
-C.FLD is an RV32DC/RV64DC-only instruction that loads a double-precision
-floating-point value from memory into floating-point register {\em \rdprime}. It
-computes an effective address by adding the {\em zero}-extended offset, scaled
-by 8, to the base address in register {\em \rsoneprime}. It expands to {\tt fld
-\rdprime, offset(\rsoneprime)}.
-
-\begin{center}
-\begin{tabular}{S@{}S@{}S@{}Y@{}S@{}Y}
-\\
-\instbitrange{15}{13} &
-\instbitrange{12}{10} &
-\instbitrange{9}{7} &
-\instbitrange{6}{5} &
-\instbitrange{4}{2} &
-\instbitrange{1}{0} \\
-\hline
-\multicolumn{1}{|c|}{funct3} &
-\multicolumn{1}{c|}{imm} &
-\multicolumn{1}{c|}{\rsoneprime} &
-\multicolumn{1}{c|}{imm} &
-\multicolumn{1}{c|}{\rstwoprime} &
-\multicolumn{1}{c|}{op} \\
-\hline
-3 & 3 & 3 & 2 & 3 & 2 \\
-C.SW & offset[5:3] & base & offset[2$\vert$6] & src & C0 \\
-C.SD & offset[5:3] & base & offset[7:6] & src & C0 \\
-C.SQ & offset[5$\vert$4$\vert$8] & base & offset[7:6] & src & C0 \\
-C.FSW& offset[5:3] & base & offset[2$\vert$6] & src & C0 \\
-C.FSD& offset[5:3] & base & offset[7:6] & src & C0 \\
-\end{tabular}
-\end{center}
-These instructions use the CS format.
-
-C.SW stores a 32-bit value in register {\em \rstwoprime} to memory. It computes an
-effective address by adding the {\em zero}-extended offset, scaled by 4, to
-the base address in register {\em \rsoneprime}.
-It expands to {\tt sw \rstwoprime, offset(\rsoneprime)}.
-
-C.SD is an RV64C/RV128C-only instruction that stores a 64-bit value in
-register {\em \rstwoprime} to memory. It computes an effective address by adding
-the {\em zero}-extended offset, scaled by 8, to the base address in register
-{\em \rsoneprime}.
-It expands to {\tt sd \rstwoprime, offset(\rsoneprime)}.
-
-C.SQ is an RV128C-only instruction that stores a 128-bit value in register
-{\em \rstwoprime} to memory. It computes an effective address by adding the {\em
-zero}-extended offset, scaled by 16, to the base address in register {\em
-\rsoneprime}.
-It expands to {\tt sq \rstwoprime, offset(\rsoneprime)}.
-
-C.FSW is an RV32FC-only instruction that stores a single-precision
-floating-point value in floating-point register {\em \rstwoprime} to memory. It
-computes an effective address by adding the {\em zero}-extended offset, scaled
-by 4, to the base address in register {\em \rsoneprime}. It expands to {\tt fsw
-\rstwoprime, offset(\rsoneprime)}.
-
-C.FSD is an RV32DC/RV64DC-only instruction that stores a double-precision
-floating-point value in floating-point register {\em \rstwoprime} to memory. It
-computes an effective address by adding the {\em zero}-extended offset, scaled
-by 8, to the base address in register {\em \rsoneprime}. It expands to {\tt fsd
-\rstwoprime, offset(\rsoneprime)}.
-
-\section{Control Transfer Instructions}
-
-RVC provides unconditional jump instructions and conditional branch
-instructions. As with base RVI instructions, the offsets of all RVC
-control transfer instructions are in multiples of 2 bytes.
-
-\begin{center}
-\begin{tabular}{S@{}L@{}Y}
-\\
-\instbitrange{15}{13} &
-\instbitrange{12}{2} &
-\instbitrange{1}{0} \\
-\hline
-\multicolumn{1}{|c|}{funct3} &
-\multicolumn{1}{c|}{imm} &
-\multicolumn{1}{c|}{op} \\
-\hline
-3 & 11 & 2 \\
-C.J & offset[11$\vert$4$\vert$9:8$\vert$10$\vert$6$\vert$7$\vert$3:1$\vert$5] & C1 \\
-C.JAL & offset[11$\vert$4$\vert$9:8$\vert$10$\vert$6$\vert$7$\vert$3:1$\vert$5] & C1 \\
-\end{tabular}
-\end{center}
-These instructions use the CJ format.
-
-C.J performs an unconditional control transfer. The offset is sign-extended and
-added to the {\tt pc} to form the jump target address. C.J can therefore target
-a $\pm$\wunits{2}{KiB} range. C.J expands to {\tt jal x0, offset}.
-
-C.JAL is an RV32C-only instruction that performs the same operation as C.J,
-but additionally writes the address of the instruction following the jump
-({\tt pc}+2) to the link register, {\tt x1}. C.JAL expands to {\tt jal x1,
-offset}.
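
The scrambled CJ-format jump offset can be reassembled as sketched below. The function name decode_cj_offset is hypothetical; the bit assignments follow the field order listed in the table above, and the result is sign-extended from offset[11].

```python
def decode_cj_offset(inst: int) -> int:
    """Return the signed byte offset of a 16-bit C.J or C.JAL encoding."""

    def bit(n: int) -> int:
        return (inst >> n) & 1

    offset = (
        (bit(12) << 11)                 # inst[12]   -> offset[11]
        | (bit(11) << 4)                # inst[11]   -> offset[4]
        | (((inst >> 9) & 0x3) << 8)    # inst[10:9] -> offset[9:8]
        | (bit(8) << 10)                # inst[8]    -> offset[10]
        | (bit(7) << 6)                 # inst[7]    -> offset[6]
        | (bit(6) << 7)                 # inst[6]    -> offset[7]
        | (((inst >> 3) & 0x7) << 1)    # inst[5:3]  -> offset[3:1]
        | (bit(2) << 5)                 # inst[2]    -> offset[5]
    )
    # Sign-extend from bit 11 to obtain the range -2048..+2046.
    if offset & 0x800:
        offset -= 0x1000
    return offset
```

The extreme values returned, $-$2048 and $+$2046, correspond to the $\pm$2 KiB range stated above.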
-
-\begin{center}
-\begin{tabular}{E@{}T@{}T@{}Y}
-\\
-\instbitrange{15}{12} &
-\instbitrange{11}{7} &
-\instbitrange{6}{2} &
-\instbitrange{1}{0} \\
-\hline
-\multicolumn{1}{|c|}{funct4} &
-\multicolumn{1}{c|}{rs1} &
-\multicolumn{1}{c|}{rs2} &
-\multicolumn{1}{c|}{op} \\
-\hline
-4 & 5 & 5 & 2 \\
-C.JR & src$\neq$0 & 0 & C2 \\
-C.JALR & src$\neq$0 & 0 & C2 \\
-\end{tabular}
-\end{center}
-These instructions use the CR format.
-
-C.JR (jump register) performs an unconditional control transfer to
-the address in register {\em rs1}. C.JR expands to {\tt jalr x0, 0(rs1)}.
-C.JR is only valid when $\textit{rs1}{\neq}\texttt{x0}$; the code point
-with $\textit{rs1}{=}\texttt{x0}$ is reserved.
-
-C.JALR (jump and link register) performs the same operation as C.JR,
-but additionally writes the address of the instruction following the
-jump ({\tt pc}+2) to the link register, {\tt x1}. C.JALR expands to
-{\tt jalr x1, 0(rs1)}.
-C.JALR is only valid when $\textit{rs1}{\neq}\texttt{x0}$; the code point
-with $\textit{rs1}{=}\texttt{x0}$ corresponds to the C.EBREAK instruction.
-
-\begin{commentary}
-Strictly speaking, C.JALR does not expand exactly to a base RVI
-instruction as the value added to the {\tt pc} to form the link address is 2
-rather than 4 as in the base ISA, but supporting both offsets of 2 and
-4 bytes is only a very minor change to the base microarchitecture.
-\end{commentary}
-
-\begin{center}
-\begin{tabular}{S@{}S@{}S@{}T@{}Y}
-\\
-\instbitrange{15}{13} &
-\instbitrange{12}{10} &
-\instbitrange{9}{7} &
-\instbitrange{6}{2} &
-\instbitrange{1}{0} \\
-\hline
-\multicolumn{1}{|c|}{funct3} &
-\multicolumn{1}{c|}{imm} &
-\multicolumn{1}{c|}{\rsoneprime} &
-\multicolumn{1}{c|}{imm} &
-\multicolumn{1}{c|}{op} \\
-\hline
-3 & 3 & 3 & 5 & 2 \\
-C.BEQZ & offset[8$\vert$4:3] & src & offset[7:6$\vert$2:1$\vert$5] & C1 \\
-C.BNEZ & offset[8$\vert$4:3] & src & offset[7:6$\vert$2:1$\vert$5] & C1 \\
-\end{tabular}
-\end{center}
-These instructions use the CB format.
-
-C.BEQZ performs conditional control transfers. The offset is sign-extended
-and added to the {\tt pc} to form the branch target address. It can
-therefore target a $\pm$\wunits{256}{B} range. C.BEQZ takes the branch if the
-value in register {\em \rsoneprime} is zero. It expands to {\tt beq \rsoneprime, x0,
-offset}.
-
-C.BNEZ is defined analogously, but it takes the branch if {\em \rsoneprime} contains
-a nonzero value. It expands to {\tt bne \rsoneprime, x0, offset}.
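
The CB-format branch offset is unpacked in the same manner as the jump offset. The sketch below uses a hypothetical function name, decode_cb_branch, and assumes the instruction has already been identified as C.BEQZ or C.BNEZ.

```python
def decode_cb_branch(inst: int) -> tuple:
    """Return (rs1, signed_byte_offset) for a C.BEQZ or C.BNEZ encoding."""
    rs1 = 8 + ((inst >> 7) & 0x7)       # rs1' selects x8..x15
    offset = (
        (((inst >> 12) & 0x1) << 8)     # inst[12]    -> offset[8]
        | (((inst >> 10) & 0x3) << 3)   # inst[11:10] -> offset[4:3]
        | (((inst >> 5) & 0x3) << 6)    # inst[6:5]   -> offset[7:6]
        | (((inst >> 3) & 0x3) << 1)    # inst[4:3]   -> offset[2:1]
        | (((inst >> 2) & 0x1) << 5)    # inst[2]     -> offset[5]
    )
    # Sign-extend from bit 8 to obtain the range -256..+254.
    if offset & 0x100:
        offset -= 0x200
    return rs1, offset
```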
-
-\section{Integer Computational Instructions}
-
-RVC provides several instructions for integer arithmetic and constant generation.
-
-\subsection*{Integer Constant-Generation Instructions}
-
-The two constant-generation instructions both use the CI instruction
-format and can target any integer register.
-
-\vspace{-0.4in}
-\begin{center}
-\begin{tabular}{S@{}W@{}T@{}T@{}Y}
-\\
-\instbitrange{15}{13} &
-\multicolumn{1}{c}{\instbit{12}} &
-\instbitrange{11}{7} &
-\instbitrange{6}{2} &
-\instbitrange{1}{0} \\
-\hline
-\multicolumn{1}{|c|}{funct3} &
-\multicolumn{1}{c|}{imm[5]} &
-\multicolumn{1}{c|}{rd} &
-\multicolumn{1}{c|}{imm[4:0]} &
-\multicolumn{1}{c|}{op} \\
-\hline
-3 & 1 & 5 & 5 & 2 \\
-C.LI & imm[5] & dest$\neq$0 & imm[4:0] & C1 \\
-C.LUI & nzimm[17] & $\textrm{dest}{\neq}{\left\{0,2\right\}}$ & nzimm[16:12] & C1 \\
-\end{tabular}
-\end{center}
-C.LI loads the sign-extended 6-bit immediate, {\em imm}, into
-register {\em rd}.
-C.LI expands into {\tt addi rd, x0, imm}.
-C.LI is only valid when {\em rd}$\neq${\tt x0};
-the code points with {\em rd}={\tt x0} encode HINTs.
-
-C.LUI loads the non-zero 6-bit immediate field into bits 17--12 of the
-destination register, clears the bottom 12 bits, and sign-extends bit
-17 into all higher bits of the destination.
-C.LUI expands into {\tt lui rd, nzimm}.
-C.LUI is only valid when
-$\textit{rd}{\neq}{\left\{\texttt{x0},\texttt{x2}\right\}}$,
-and when the immediate is not equal to zero.
-The code points with {\em nzimm}=0 are reserved; the remaining code points
-with {\em rd}={\tt x0} are HINTs; and the remaining code points with
-{\em rd}={\tt x2} correspond to the C.ADDI16SP instruction.
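
To make the C.LUI sign-extension rule concrete, the following sketch computes the value written to the destination register. The name c_lui_value and the xlen parameter are illustrative assumptions; the reserved and re-purposed code points are rejected as described above.

```python
def c_lui_value(inst: int, xlen: int = 64) -> int:
    """Return the XLEN-bit value that C.LUI writes to rd."""
    rd = (inst >> 7) & 0x1F
    if rd in (0, 2):
        raise ValueError("rd = x0 is a HINT and rd = x2 is C.ADDI16SP")
    # inst[12] -> nzimm[17], inst[6:2] -> nzimm[16:12]
    imm6 = (((inst >> 12) & 0x1) << 5) | ((inst >> 2) & 0x1F)
    if imm6 == 0:
        raise ValueError("C.LUI with nzimm = 0 is reserved")
    value = imm6 << 12                  # the bottom 12 bits are cleared
    if value & (1 << 17):               # sign-extend from bit 17
        value -= 1 << 18
    return value & ((1 << xlen) - 1)
```

For example, an immediate field of 0x3F yields 0xFFFFF000 on RV32, since bit 17 is copied into all higher bits.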
-
-\subsection*{Integer Register-Immediate Operations}
-
-These integer register-immediate operations are encoded in the CI
-format and perform operations on an integer register and
-a 6-bit immediate.
-
-\vspace{-0.4in}
-\begin{center}
-\begin{tabular}{S@{}W@{}T@{}T@{}Y}
-\\
-\instbitrange{15}{13} &
-\multicolumn{1}{c}{\instbit{12}} &
-\instbitrange{11}{7} &
-\instbitrange{6}{2} &
-\instbitrange{1}{0} \\
-\hline
-\multicolumn{1}{|c|}{funct3} &
-\multicolumn{1}{c|}{imm[5]} &
-\multicolumn{1}{c|}{rd/rs1} &
-\multicolumn{1}{c|}{imm[4:0]} &
-\multicolumn{1}{c|}{op} \\
-\hline
-3 & 1 & 5 & 5 & 2 \\
-C.ADDI & nzimm[5] & dest$\neq$0 & nzimm[4:0] & C1 \\
-C.ADDIW & imm[5] & dest$\neq$0 & imm[4:0] & C1 \\
-C.ADDI16SP & nzimm[9] & 2 & nzimm[4$\vert$6$\vert$8:7$\vert$5] & C1 \\
-\end{tabular}
-\end{center}
-
-C.ADDI adds the non-zero sign-extended 6-bit immediate to the value in
-register {\em rd} then writes the result to {\em rd}. C.ADDI expands
-into {\tt addi rd, rd, nzimm}.
-C.ADDI is only valid when {\em rd}$\neq${\tt x0} and {\em nzimm}$\neq$0.
-The code points with {\em rd}={\tt x0} encode the C.NOP instruction;
-the remaining code points with {\em nzimm}=0 encode HINTs.
-
-C.ADDIW is an RV64C/RV128C-only instruction that performs the same
-computation but produces a 32-bit result, then sign-extends the result to
-64 bits. C.ADDIW expands into {\tt addiw rd, rd, imm}. The
-immediate can be zero for C.ADDIW, where this corresponds to {\tt
-sext.w rd}. C.ADDIW is only valid when {\em rd}$\neq${\tt x0};
-the code points with {\em rd}={\tt x0} are reserved.
-
-C.ADDI16SP shares the opcode with C.LUI, but has a destination field
-of {\tt x2}. C.ADDI16SP adds the non-zero sign-extended 6-bit immediate to
-the value in the stack pointer ({\tt sp}={\tt x2}), where the
-immediate is scaled to represent multiples of 16 in the range
-(-512,496). C.ADDI16SP is used to adjust the stack pointer in procedure
-prologues and epilogues. It expands into {\tt addi x2, x2, nzimm}.
-C.ADDI16SP is only valid when {\em nzimm}$\neq$0;
-the code point with {\em nzimm}=0 is reserved.
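
The C.ADDI16SP immediate can be reconstructed as sketched below. The function name decode_c_addi16sp is hypothetical, and the instruction is assumed to have been identified already (C.LUI opcode with a destination of x2).

```python
def decode_c_addi16sp(inst: int) -> int:
    """Return the signed stack adjustment encoded by C.ADDI16SP."""
    nzimm = (
        (((inst >> 12) & 0x1) << 9)    # inst[12]  -> nzimm[9]
        | (((inst >> 6) & 0x1) << 4)   # inst[6]   -> nzimm[4]
        | (((inst >> 5) & 0x1) << 6)   # inst[5]   -> nzimm[6]
        | (((inst >> 3) & 0x3) << 7)   # inst[4:3] -> nzimm[8:7]
        | (((inst >> 2) & 0x1) << 5)   # inst[2]   -> nzimm[5]
    )
    if nzimm == 0:
        raise ValueError("C.ADDI16SP with nzimm = 0 is reserved")
    # Sign-extend from bit 9; the result is always a multiple of 16.
    if nzimm & 0x200:
        nzimm -= 0x400
    return nzimm
```

The smallest and largest values this returns are $-$512 and 496, matching the range given above.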
-
-\begin{commentary}
-In the standard RISC-V calling convention, the stack pointer {\tt sp}
-is always 16-byte aligned.
-\end{commentary}
-
-\begin{center}
-\begin{tabular}{@{}S@{}K@{}S@{}Y}
-\\
-\instbitrange{15}{13} &
-\instbitrange{12}{5} &
-\instbitrange{4}{2} &
-\instbitrange{1}{0} \\
-\hline
-\multicolumn{1}{|c|}{funct3} &
-\multicolumn{1}{c|}{imm} &
-\multicolumn{1}{c|}{\rdprime} &
-\multicolumn{1}{c|}{op} \\
-\hline
-3 & 8 & 3 & 2 \\
-C.ADDI4SPN & nzuimm[5:4$\vert$9:6$\vert$2$\vert$3] & dest & C0 \\
-\end{tabular}
-\end{center}
-
-C.ADDI4SPN is a CIW-format instruction that adds a {\em zero}-extended
-non-zero immediate, scaled by 4, to the stack pointer, {\tt x2}, and
-writes the result to {\tt \rdprime}. This instruction is used
-to generate pointers to stack-allocated variables, and expands to
-{\tt addi \rdprime, x2, nzuimm}.
-C.ADDI4SPN is only valid when {\em nzuimm}$\neq$0;
-the code points with {\em nzuimm}=0 are reserved.
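
A sketch of C.ADDI4SPN decoding is given below, under the same assumptions as the earlier examples: decode_c_addi4spn is a hypothetical name, and opcode matching is left to the caller.

```python
def decode_c_addi4spn(inst: int) -> tuple:
    """Return (rd, nzuimm) for a 16-bit C.ADDI4SPN encoding."""
    rd = 8 + ((inst >> 2) & 0x7)        # rd' selects x8..x15
    nzuimm = (
        (((inst >> 11) & 0x3) << 4)     # inst[12:11] -> nzuimm[5:4]
        | (((inst >> 7) & 0xF) << 6)    # inst[10:7]  -> nzuimm[9:6]
        | (((inst >> 6) & 0x1) << 2)    # inst[6]     -> nzuimm[2]
        | (((inst >> 5) & 0x1) << 3)    # inst[5]     -> nzuimm[3]
    )
    if nzuimm == 0:
        raise ValueError("C.ADDI4SPN with nzuimm = 0 is reserved")
    return rd, nzuimm
```

The immediate is zero-extended, so the reachable offsets are the multiples of 4 from 4 to 1020.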
-
-\vspace{-0.4in}
-\begin{center}
-\begin{tabular}{S@{}W@{}T@{}T@{}Y}
-\\
-\instbitrange{15}{13} &
-\multicolumn{1}{c}{\instbit{12}} &
-\instbitrange{11}{7} &
-\instbitrange{6}{2} &
-\instbitrange{1}{0} \\
-\hline
-\multicolumn{1}{|c|}{funct3} &
-\multicolumn{1}{c|}{shamt[5]} &
-\multicolumn{1}{c|}{rd/rs1} &
-\multicolumn{1}{c|}{shamt[4:0]} &
-\multicolumn{1}{c|}{op} \\
-\hline
-3 & 1 & 5 & 5 & 2 \\
-C.SLLI & shamt[5] & dest$\neq$0 & shamt[4:0] & C2 \\
-\end{tabular}
-\end{center}
-
-C.SLLI is a CI-format instruction that performs a logical left shift
-of the value in register {\em rd} then writes the result to {\em rd}.
-The shift amount is encoded in the {\em shamt} field.
-For RV128C, a shift amount of zero is used to encode a shift of 64.
-C.SLLI expands into {\tt slli rd, rd, shamt}, except for
-RV128C with {\tt shamt=0}, which expands to {\tt slli rd, rd, 64}.
-
-For RV32C, {\em shamt[5]} must be zero; the code points with {\em shamt[5]}=1
-are designated for custom extensions. For RV32C and RV64C, the shift
-amount must be non-zero; the code points with {\em shamt}=0 are HINTs. For
-all base ISAs, the code points with {\em rd}={\tt x0} are HINTs, except those
-with {\em shamt[5]}=1 in RV32C.
-
-\vspace{-0.4in}
-\begin{center}
-\begin{tabular}{S@{}W@{}Y@{}S@{}T@{}Y}
-\\
-\instbitrange{15}{13} &
-\multicolumn{1}{c}{\instbit{12}} &
-\instbitrange{11}{10} &
-\instbitrange{9}{7} &
-\instbitrange{6}{2} &
-\instbitrange{1}{0} \\
-\hline
-\multicolumn{1}{|c|}{funct3} &
-\multicolumn{1}{c|}{shamt[5]} &
-\multicolumn{1}{|c|}{funct2} &
-\multicolumn{1}{c|}{\rdprime/\rsoneprime} &
-\multicolumn{1}{c|}{shamt[4:0]} &
-\multicolumn{1}{c|}{op} \\
-\hline
-3 & 1 & 2 & 3 & 5 & 2 \\
-C.SRLI & shamt[5] & C.SRLI & dest & shamt[4:0] & C1 \\
-C.SRAI & shamt[5] & C.SRAI & dest & shamt[4:0] & C1 \\
-\end{tabular}
-\end{center}
-
-C.SRLI is a CB-format instruction that performs a logical right shift
-of the value in register {\em \rdprime} then writes the result to {\em \rdprime}.
-The shift amount is encoded in the {\em shamt} field.
-For RV128C, a shift amount of zero is used to encode a shift of 64.
-Furthermore, the shift amount is sign-extended
-for RV128C, and so the legal shift amounts are 1--31, 64, and 96--127.
-C.SRLI expands into {\tt srli \rdprime, \rdprime, shamt},
-except for RV128C with {\tt shamt=0}, which expands to
-{\tt srli \rdprime, \rdprime, 64}.
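-
-As a non-normative illustration of the RV128C rule above, the effective
-shift amount can be derived from the 6-bit {\em shamt} field as shown
-here. The case split is only a restatement of the text, not an additional
-requirement:
-\begin{center}
-\begin{tabular}{|l|l|}
-\hline
-{\em shamt}[5:0] & Effective RV128C shift amount \\ \hline
-0 & 64 \\ \hline
-1--31 & 1--31 ({\em shamt[5]}=0, so sign extension adds only zeros) \\ \hline
-32--63 & 96--127 ({\em shamt[5]}=1, so sign extension to 7 bits adds 64) \\ \hline
-\end{tabular}
-\end{center}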
-
-For RV32C, {\em shamt[5]} must be zero; the code points with {\em shamt[5]}=1
-are designated for custom extensions. For RV32C and RV64C, the shift
-amount must be non-zero; the code points with {\em shamt}=0 are HINTs.
-
-C.SRAI is defined analogously to C.SRLI, but instead performs an arithmetic
-right shift.
-C.SRAI expands to {\tt srai \rdprime, \rdprime, shamt}.
-
-\begin{commentary}
-Left shifts are usually more frequent than right shifts, as left
-shifts are frequently used to scale address values. Right shifts have
-therefore been granted less encoding space and are placed in an
-encoding quadrant where all other immediates are sign-extended. For
-RV128, the decision was made to have the 6-bit shift-amount immediate
-also be sign-extended. Apart from reducing the decode complexity, we
-believe right-shift amounts of 96--127 will be more useful than 64--95,
-to allow extraction of tags located in the high portions of 128-bit
-address pointers. We note that RV128C will not be frozen at the same
-point as RV32C and RV64C, to allow evaluation of typical usage of
-128-bit address-space codes.
-\end{commentary}
-
-\begin{center}
-\begin{tabular}{S@{}W@{}Y@{}S@{}T@{}Y}
-\\
-\instbitrange{15}{13} &
-\multicolumn{1}{c}{\instbit{12}} &
-\instbitrange{11}{10} &
-\instbitrange{9}{7} &
-\instbitrange{6}{2} &
-\instbitrange{1}{0} \\
-\hline
-\multicolumn{1}{|c|}{funct3} &
-\multicolumn{1}{c|}{imm[5]} &
-\multicolumn{1}{|c|}{funct2} &
-\multicolumn{1}{c|}{\rdprime/\rsoneprime} &
-\multicolumn{1}{c|}{imm[4:0]} &
-\multicolumn{1}{c|}{op} \\
-\hline
-3 & 1 & 2 & 3 & 5 & 2 \\
-C.ANDI & imm[5] & C.ANDI & dest & imm[4:0] & C1 \\
-\end{tabular}
-\end{center}
-
-C.ANDI is a CB-format instruction that computes the bitwise AND of
-the value in register {\em \rdprime} and the sign-extended 6-bit immediate,
-then writes the result to {\em \rdprime}.
-C.ANDI expands to {\tt andi \rdprime, \rdprime, imm}.
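-
-For example, assuming the common ABI register name {\tt a0} (register
-{\tt x10}, one of the eight registers addressable as {\em \rdprime}), the
-following illustrative pair shows a compressed instruction and its expansion.
-The immediate $-8$ lies within the sign-extended 6-bit range $-32$ to $31$,
-and the AND clears the three least-significant bits of {\tt a0}:
-\begin{verbatim}
-    c.andi a0, -8        # compressed form
-    andi   a0, a0, -8    # equivalent base-ISA instruction
-\end{verbatim}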
-
-\subsection*{Integer Register-Register Operations}
-\vspace{-0.4in}
-\begin{center}
-\begin{tabular}{E@{}T@{}T@{}Y}
-\\
-\instbitrange{15}{12} &
-\instbitrange{11}{7} &
-\instbitrange{6}{2} &
-\instbitrange{1}{0} \\
-\hline
-\multicolumn{1}{|c|}{funct4} &
-\multicolumn{1}{c|}{rd/rs1} &
-\multicolumn{1}{c|}{rs2} &
-\multicolumn{1}{c|}{op} \\
-\hline
-4 & 5 & 5 & 2 \\
-C.MV & dest$\neq$0 & src$\neq$0 & C2 \\
-C.ADD & dest$\neq$0 & src$\neq$0 & C2 \\
-\end{tabular}
-\end{center}
-These instructions use the CR format.
-
-C.MV copies the value in register {\em rs2} into register {\em rd}. C.MV
-expands into {\tt add rd, x0, rs2}.
-C.MV is only valid when $\textit{rs2}{\neq}\texttt{x0}$; the code points
-with $\textit{rs2}{=}\texttt{x0}$ correspond to the C.JR instruction.
-The code points with $\textit{rs2}{\neq}\texttt{x0}$ and
-$\textit{rd}{=}\texttt{x0}$ are HINTs.
-
-\begin{commentary}
-C.MV expands to a different instruction than the canonical MV
-pseudoinstruction, which instead uses ADDI. Implementations that handle MV
-specially, e.g. using register-renaming hardware, may find it more convenient
-to expand C.MV to MV instead of ADD, at slight additional hardware cost.
-\end{commentary}
-
-C.ADD adds the values in registers {\em rd} and {\em rs2} and writes the
-result to register {\em rd}. C.ADD expands into {\tt add rd, rd, rs2}.
-C.ADD is only valid when $\textit{rs2}{\neq}\texttt{x0}$; the code points
-with $\textit{rs2}{=}\texttt{x0}$ correspond to the C.JALR and C.EBREAK instructions.
-The code points with $\textit{rs2}{\neq}\texttt{x0}$ and
-$\textit{rd}{=}\texttt{x0}$ are HINTs.
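-
-As an illustrative sketch (the register choices are arbitrary and neither
-operand is {\tt x0}), the two instructions and their expansions look as
-follows:
-\begin{verbatim}
-    c.mv  a0, a1     # expands to: add a0, x0, a1
-    c.add a0, a1     # expands to: add a0, a0, a1
-\end{verbatim}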
-
-\vspace{-0.4in}
-\begin{center}
-\begin{tabular}{M@{}S@{}Y@{}S@{}Y}
-\\
-\instbitrange{15}{10} &
-\instbitrange{9}{7} &
-\instbitrange{6}{5} &
-\instbitrange{4}{2} &
-\instbitrange{1}{0} \\
-\hline
-\multicolumn{1}{|c|}{funct6} &
-\multicolumn{1}{c|}{\rdprime/\rsoneprime} &
-\multicolumn{1}{c|}{funct2} &
-\multicolumn{1}{c|}{\rstwoprime} &
-\multicolumn{1}{c|}{op} \\
-\hline
-6 & 3 & 2 & 3 & 2 \\
-C.AND & dest & C.AND & src & C1 \\
-C.OR & dest & C.OR & src & C1 \\
-C.XOR & dest & C.XOR & src & C1 \\
-C.SUB & dest & C.SUB & src & C1 \\
-C.ADDW & dest & C.ADDW & src & C1 \\
-C.SUBW & dest & C.SUBW & src & C1 \\
-\end{tabular}
-\end{center}
-
-These instructions use the CA format.
-
-C.AND computes the bitwise AND of the values in registers {\em \rdprime}
-and {\em \rstwoprime}, then writes the result to register {\em \rdprime}.
-C.AND expands into {\tt and \rdprime, \rdprime, \rstwoprime}.
-
-C.OR computes the bitwise OR of the values in registers {\em \rdprime}
-and {\em \rstwoprime}, then writes the result to register {\em \rdprime}.
-C.OR expands into {\tt or \rdprime, \rdprime, \rstwoprime}.
-
-C.XOR computes the bitwise XOR of the values in registers {\em \rdprime}
-and {\em \rstwoprime}, then writes the result to register {\em \rdprime}.
-C.XOR expands into {\tt xor \rdprime, \rdprime, \rstwoprime}.
-
-C.SUB subtracts the value in register {\em \rstwoprime} from the value in
-register {\em \rdprime}, then writes the result to register {\em \rdprime}.
-C.SUB expands into {\tt sub \rdprime, \rdprime, \rstwoprime}.
-
-C.ADDW is an RV64C/RV128C-only instruction that adds the values in
-registers {\em \rdprime} and {\em \rstwoprime}, then sign-extends the lower
-32 bits of the sum before writing the result to register {\em \rdprime}.
-C.ADDW expands into {\tt addw \rdprime, \rdprime, \rstwoprime}.
-
-C.SUBW is an RV64C/RV128C-only instruction that subtracts the value in
-register {\em \rstwoprime} from the value in register {\em \rdprime}, then
-sign-extends the lower 32 bits of the difference before writing the result
-to register {\em \rdprime}. C.SUBW expands into {\tt subw \rdprime, \rdprime, \rstwoprime}.
-
-\begin{commentary}
-These six instructions do not provide large savings
-individually, but do not occupy much encoding space and are
-straightforward to implement, and as a group provide a worthwhile
-improvement in static and dynamic compression.
-\end{commentary}
-
-\subsection*{Defined Illegal Instruction}
-\vspace{-0.4in}
-\begin{center}
-\begin{tabular}{SW@{}T@{}T@{}Y}
-\\
-\instbitrange{15}{13} &
-\multicolumn{1}{c}{\instbit{12}} &
-\instbitrange{11}{7} &
-\instbitrange{6}{2} &
-\instbitrange{1}{0} \\
-\hline
-\multicolumn{1}{|c|}{0} &
-\multicolumn{1}{c|}{0} &
-\multicolumn{1}{c|}{0} &
-\multicolumn{1}{c|}{0} &
-\multicolumn{1}{c|}{0} \\
-\hline
-3 & 1 & 5 & 5 & 2 \\
-0 & 0 & 0 & 0 & 0 \\
-\end{tabular}
-\end{center}
-
-A 16-bit instruction with all bits zero is permanently reserved as an
-illegal instruction.
-\begin{commentary}
-We reserve all-zero instructions to be illegal instructions to help
-trap attempts to execute zeroed or non-existent portions of the
-memory space. The all-zero value should not be redefined in any
-non-standard extension. Similarly, we reserve instructions with all
-bits set to 1 (corresponding to very long instructions in the RISC-V
-variable-length encoding scheme) as illegal to capture another common
-value seen in non-existent memory regions.
-\end{commentary}
-
-\subsection*{NOP Instruction}
-\vspace{-0.4in}
-\begin{center}
-\begin{tabular}{SW@{}T@{}T@{}Y}
-\\
-\instbitrange{15}{13} &
-\multicolumn{1}{c}{\instbit{12}} &
-\instbitrange{11}{7} &
-\instbitrange{6}{2} &
-\instbitrange{1}{0} \\
-\hline
-\multicolumn{1}{|c|}{funct3} &
-\multicolumn{1}{c|}{imm[5]} &
-\multicolumn{1}{c|}{rd/rs1} &
-\multicolumn{1}{c|}{imm[4:0]} &
-\multicolumn{1}{c|}{op} \\
-\hline
-3 & 1 & 5 & 5 & 2 \\
-C.NOP & 0 & 0 & 0 & C1 \\
-\end{tabular}
-\end{center}
-
-C.NOP is a CI-format instruction that does not change any user-visible state,
-except for advancing the {\tt pc} and incrementing any applicable performance
-counters. C.NOP expands to {\tt nop}. C.NOP is only valid when {\em imm}=0;
-the code points with {\em imm}$\neq$0 encode HINTs.
-
-\subsection*{Breakpoint Instruction}
-\vspace{-0.4in}
-\begin{center}
-\begin{tabular}{E@{}U@{}Y}
-\\
-\instbitrange{15}{12} &
-\instbitrange{11}{2} &
-\instbitrange{1}{0} \\
-\hline
-\multicolumn{1}{|c|}{funct4} &
-\multicolumn{1}{c|}{0} &
-\multicolumn{1}{c|}{op} \\
-\hline
-4 & 10 & 2 \\
-C.EBREAK & 0 & C2 \\
-\end{tabular}
-\end{center}
-
-Debuggers can use the C.EBREAK instruction, which expands to {\tt ebreak},
-to cause control to be transferred back to the debugging environment.
-C.EBREAK shares its opcode with the C.ADD instruction, but has {\em rd}
-and {\em rs2} both zero, and thus can also use the CR format.
-
-\section{Usage of C Instructions in LR/SC Sequences}
-
-On implementations that support the C extension, compressed forms of the
-I instructions permitted inside constrained LR/SC sequences, as described in
-Section~\ref{sec:lrscseq}, are also permitted inside constrained LR/SC
-sequences.
-
-\begin{commentary}
-The implication is that any implementation that claims to support both
-the A and C extensions must ensure that LR/SC sequences containing
-valid C instructions will eventually complete.
-\end{commentary}
-
-\section{HINT Instructions}
-\label{sec:rvc-hints}
-
-A portion of the RVC encoding space is reserved for microarchitectural HINTs.
-Like the HINTs in the RV32I base ISA (see Section~\ref{sec:rv32i-hints}),
-these instructions do not modify any architectural state, except for advancing
-the {\tt pc} and any applicable performance counters. HINTs are
-executed as no-ops on implementations that ignore them.
-
-RVC HINTs are encoded as computational instructions that do not modify the
-architectural state, either because {\em rd}={\tt x0}
-(e.g. \mbox{C.ADD {\em x0}, {\em t0}}), or because {\em rd} is overwritten
-with a copy of itself (e.g. \mbox{C.ADDI {\em t0}, 0}).
-
-\begin{commentary}
-This HINT encoding has been chosen so that simple implementations can ignore
-HINTs altogether, and instead execute a HINT as a regular computational
-instruction that happens not to mutate the architectural state.
-\end{commentary}
-
-RVC HINTs do not necessarily expand to their RVI HINT counterparts. For
-example, \mbox{C.ADD {\em x0}, {\em a0}} might not encode the same HINT
-as \mbox{ADD {\em x0}, {\em x0}, {\em a0}}.
-
-\begin{commentary}
-The primary reason to not require an RVC HINT to expand to an RVI HINT
-is that HINTs are unlikely to be compressible in the same manner as
-the underlying computational instruction. Also, decoupling the RVC
-and RVI HINT mappings allows the scarce RVC HINT space to be allocated
-to the most popular HINTs, and in particular, to HINTs that are
-amenable to macro-op fusion.
-\end{commentary}
-
-Table~\ref{tab:rvc-hints} lists all RVC HINT code points. For RV32C, 78\% of
-the HINT space is reserved for standard HINTs.
-The remainder of the HINT space is designated for custom HINTs: no standard
-HINTs will ever be defined in this subspace.
-
-\begin{table}[hbt]
-\centering
-\begin{tabular}{|l|l|r|l|}
- \hline
- Instruction & Constraints & Code Points & Purpose \\ \hline \hline
- C.NOP & {\em nzimm}$\neq$0 & 63 & \multirow{6}{*}{\em Reserved for future standard use} \\ \cline{1-3}
- C.ADDI & {\em rd}$\neq${\tt x0}, {\em nzimm}=0 & 31 & \\ \cline{1-3}
- C.LI & {\em rd}={\tt x0} & 64 & \\ \cline{1-3}
- C.LUI & {\em rd}={\tt x0}, {\em nzimm}$\neq$0 & 63 & \\ \cline{1-3}
- C.MV & {\em rd}={\tt x0}, {\em rs2}$\neq${\tt x0} & 31 & \\ \cline{1-3}
- C.ADD & {\em rd}={\tt x0}, {\em rs2}$\neq${\tt x0}, {\em rs2}$\neq${\tt x2}--{\tt x5} & 27 & \\ \hline
- \multirow{4}{*}{C.ADD} & \multirow{4}{*}{{\em rd}={\tt x0}, {\em rs2}={\tt x2}--{\tt x5}}
- & \multirow{4}{*}{$4$}
- & ({\em rs2}={\tt x2}) C.NTL.P1 \\
- & & & ({\em rs2}={\tt x3}) C.NTL.PALL \\
- & & & ({\em rs2}={\tt x4}) C.NTL.S1 \\
- & & & ({\em rs2}={\tt x5}) C.NTL.ALL \\ \hline
- \multirow{2}{*}{C.SLLI} & \multirow{2}{*}{{\em rd}={\tt x0}, {\em nzimm}$\neq$0} & 31 (RV32) & \multirow{6}{*}{\em Designated for custom use} \\
- & & 63 (RV64/128) & \\ \cline{1-3}
- C.SLLI64 & {\em rd}={\tt x0} & 1 & \\ \cline{1-3}
- C.SLLI64 & {\em rd}$\neq${\tt x0}, RV32 and RV64 only & 31 & \\ \cline{1-3}
- C.SRLI64 & RV32 and RV64 only & 8 & \\ \cline{1-3}
- C.SRAI64 & RV32 and RV64 only & 8 & \\ \hline
-\end{tabular}
-\caption{RVC HINT instructions.}
-\label{tab:rvc-hints}
-\end{table}
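-
-The 78\% figure for RV32C can be checked against the code-point counts in
-Table~\ref{tab:rvc-hints}. The following tally is a reader's arithmetic
-rather than part of the specification, and it assumes that the four C.NTL
-code points count toward the standard share:
-\begin{eqnarray*}
-\mbox{standard} &=& 63 + 31 + 64 + 63 + 31 + 27 + 4 = 283 \\
-\mbox{custom (RV32)} &=& 31 + 1 + 31 + 8 + 8 = 79 \\
-\mbox{standard share} &=& 283 / (283 + 79) = 283/362 \approx 78.2\%
-\end{eqnarray*}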
-
-\clearpage
-
-\section{RVC Instruction Set Listings}
-
-Table~\ref{rvcopcodemap} shows a map of the major opcodes for RVC.
-Each row of the table corresponds to one quadrant of the encoding
-space. The last quadrant, which has the two
-least-significant bits set, corresponds to instructions wider
-than 16 bits, including those in the base ISAs. Several instructions
-are only valid for certain operands; when invalid, they are marked
-either {\em RES} to indicate that the opcode is reserved for future
-standard extensions; {\em Custom} to indicate that the opcode is designated
-for custom extensions; or {\em HINT} to indicate that the opcode
-is reserved for microarchitectural hints (see Section~\ref{sec:rvc-hints}).
-
-\input{rvc-opcode-map}
-
-Tables~\ref{rvc-instr-table0}--\ref{rvc-instr-table2} list the RVC instructions.
-\input{rvc-instr-table}
diff --git a/src/counters.tex b/src/counters.tex
deleted file mode 100644
index 545804b..0000000
--- a/src/counters.tex
+++ /dev/null
@@ -1,252 +0,0 @@
-\chapter{``Zicntr'' and ``Zihpm'' Counters}
-\label{counters}
-
-RISC-V ISAs provide a set of up to thirty-two 64-bit performance counters and
-timers that are accessible via unprivileged XLEN-bit read-only CSR
-registers {\tt 0xC00}--{\tt 0xC1F} (when XLEN=32, the upper 32 bits
-are accessed via CSR registers {\tt 0xC80}--{\tt 0xC9F}).
-These counters are divided between the ``Zicntr'' and ``Zihpm'' extensions.
-
-\section{``Zicntr'' Standard Extension for Base Counters and Timers}
-
-The Zicntr standard extension comprises the first three of these
-counters (CYCLE, TIME, and INSTRET), which
-have dedicated functions (cycle
-count, real-time clock, and instructions retired, respectively).
-The Zicntr extension depends on the Zicsr extension.
-
-\begin{commentary}
-We recommend provision of these basic counters in implementations as
-they are essential for basic performance analysis, adaptive and
-dynamic optimization, and to allow an application to work with
-real-time streams. Additional counters in the separate Zihpm extension can
-help diagnose performance problems and these should be made accessible
-from user-level application code with low overhead.
-
-Some execution environments might prohibit access to counters, for
-example, to impede timing side-channel attacks.
-\end{commentary}
-
-\vspace{-0.2in}
-\begin{center}
-\begin{tabular}{M@{}R@{}F@{}R@{}S}
-\\
-\instbitrange{31}{20} &
-\instbitrange{19}{15} &
-\instbitrange{14}{12} &
-\instbitrange{11}{7} &
-\instbitrange{6}{0} \\
-\hline
-\multicolumn{1}{|c|}{csr} &
-\multicolumn{1}{c|}{rs1} &
-\multicolumn{1}{c|}{funct3} &
-\multicolumn{1}{c|}{rd} &
-\multicolumn{1}{c|}{opcode} \\
-\hline
-12 & 5 & 3 & 5 & 7 \\
-RDCYCLE[H] & 0 & CSRRS & dest & SYSTEM \\
-RDTIME[H] & 0 & CSRRS & dest & SYSTEM \\
-RDINSTRET[H] & 0 & CSRRS & dest & SYSTEM \\
-\end{tabular}
-\end{center}
-
-For base ISAs with XLEN$\geq$64, CSR instructions can access the full
-64-bit CSRs directly. In particular, the RDCYCLE, RDTIME, and
-RDINSTRET pseudoinstructions read the full 64 bits of the {\tt cycle},
-{\tt time}, and {\tt instret} counters.
-
-\begin{commentary}
-The counter pseudoinstructions are mapped to the read-only {\tt csrrs
- rd, counter, x0} canonical form, but the other read-only CSR
-instruction forms (based on CSRRC/CSRRSI/CSRRCI) are also legal ways
-to read these CSRs.
-\end{commentary}
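-
-For instance, the following sketch spells out the canonical expansions using
-the CSR names {\tt cycle}, {\tt time}, and {\tt instret}; the destination
-register {\tt a0} is an arbitrary choice:
-\begin{verbatim}
-    rdcycle   a0     # csrrs a0, cycle,   x0
-    rdtime    a0     # csrrs a0, time,    x0
-    rdinstret a0     # csrrs a0, instret, x0
-\end{verbatim}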
-
-For base ISAs with XLEN=32, the Zicntr extension enables the three
-64-bit read-only counters to be accessed in 32-bit pieces.
-The RDCYCLE, RDTIME, and RDINSTRET pseudoinstructions provide the lower 32
-bits, and the RDCYCLEH, RDTIMEH, and RDINSTRETH pseudoinstructions provide
-the upper 32 bits of the respective counters.
-
-\begin{commentary}
-We required the counters be 64 bits wide, even when XLEN=32, as otherwise
-it is very difficult for software to determine if values have
-overflowed. For a low-end implementation, the upper 32 bits of each
-counter can be implemented using software counters incremented by a
-trap handler triggered by overflow of the lower 32 bits. The sample
-code given below shows how the full 64-bit width value can be
-safely read using the individual 32-bit width pseudoinstructions.
-\end{commentary}
-
-The RDCYCLE pseudoinstruction reads the low XLEN bits of the {\tt
- cycle} CSR which holds a count of the number of clock cycles
-executed by the processor core on which the hart is running from an
-arbitrary start time in the past. RDCYCLEH is only present when
-XLEN=32 and reads bits 63--32 of the same cycle
-counter. The underlying 64-bit counter should never overflow in
-practice. The rate at which the cycle counter advances will depend on
-the implementation and operating environment. The execution
-environment should provide a means to determine the current rate
-(cycles/second) at which the cycle counter is incrementing.
-
-\begin{commentary}
-RDCYCLE is intended to return the number of cycles executed by the
-processor core, not the hart. Precisely defining what is a ``core'' is
-difficult given some implementation choices (e.g., AMD Bulldozer).
-Precisely defining what is a ``clock cycle'' is also difficult given the
-range of implementations (including software emulations), but the
-intent is that RDCYCLE is used for performance monitoring along with the
-other performance counters. In particular, where there is one
-hart/core, one would expect cycle-count/instructions-retired to
-measure CPI for a hart.
-
-Cores don't have to be exposed to software at all, and an implementor
-might choose to pretend multiple harts on one physical core are
-running on separate cores with one hart/core, and provide separate
-cycle counters for each hart. This might make sense in a simple
-barrel processor (e.g., CDC 6600 peripheral processors) where
-inter-hart timing interactions are non-existent or minimal.
-
-Where there is more than one hart/core and dynamic multithreading, it
-is not generally possible to separate out cycles per hart (especially
-with SMT). It might be possible to define a separate performance
-counter that tried to capture the number of cycles a particular hart
-was running, but this definition would have to be very fuzzy to cover
-all the possible threading implementations. For example, should we
-only count cycles for which any instruction was issued to execution
-for this hart, and/or cycles any instruction retired, or include
-cycles this hart was occupying machine resources but couldn't execute
-due to stalls while other harts went into execution? Likely, ``all of
-the above'' would be needed to have understandable performance stats.
-This complexity of defining a per-hart cycle count, and also the need
-in any case for a total per-core cycle count when tuning multithreaded
-code led to just standardizing the per-core cycle counter, which also
-happens to work well for the common single hart/core case.
-
-Standardizing what happens during ``sleep'' is not practical given
-that what ``sleep'' means is not standardized across execution
-environments, but if the entire core is paused (entirely clock-gated
-or powered-down in deep sleep), then it is not executing clock cycles,
-and the cycle count shouldn't be increasing per the spec. There are
-many details, e.g., whether clock cycles required to reset a processor
-after waking up from a power-down event should be counted, and these
-are considered execution-environment-specific details.
-
-Even though there is no precise definition that works for all
-platforms, this is still a useful facility for most platforms, and an
-imprecise, common, ``usually correct'' standard here is better than no
-standard. The intent of RDCYCLE was primarily performance
-monitoring/tuning, and the specification was written with that goal in
-mind.
-\end{commentary}
-
-The RDTIME pseudoinstruction reads the low XLEN bits of the {\tt
- time} CSR, which counts wall-clock real time that has passed from an
-arbitrary start time in the past.
-RDTIMEH is only present when XLEN=32 and reads bits 63--32 of the same
-real-time counter.
-The underlying 64-bit counter increments by one with each tick of the
-real-time clock, and, for realistic real-time clock frequencies, should never
-overflow in practice.
-The execution environment should provide a means of determining the period of
-a counter tick (seconds/tick).
-The period should be constant within a small error bound.
-The environment should provide a means to determine the accuracy of the clock
-(i.e., the maximum relative error between the nominal and actual real-time
-clock periods).
-
-\begin{commentary}
-On some simple platforms, cycle count might represent a valid
-implementation of RDTIME, in which case RDTIME and RDCYCLE may
-return the same result.
-
-It is difficult to provide a strict mandate on clock period given the
-wide variety of possible implementation platforms. The maximum error
-bound should be set based on the requirements of the platform.
-\end{commentary}
-
-The real-time clocks of all harts
-must be synchronized to within one tick of the real-time clock.
-
-\begin{commentary}
-As with other architectural mandates, it suffices to appear ``as if''
-harts are synchronized to within one tick of the real-time clock,
-i.e., software is unable to observe that there is a greater delta
-between the real-time clock values observed on two harts.
-\end{commentary}
-
-The RDINSTRET pseudoinstruction reads the low XLEN bits of the {\tt
- instret} CSR, which counts the number of instructions retired by
-this hart from some arbitrary start point in the past. RDINSTRETH is
-only present when XLEN=32 and reads bits 63--32 of the same
-instruction counter. The underlying 64-bit counter should never
-overflow in practice.
-
-\begin{commentary}
-Instructions that cause synchronous exceptions, including ECALL and EBREAK,
-are not considered to retire and hence do not increment the {\tt instret} CSR.
-\end{commentary}
-
-The following code sequence will read a valid 64-bit cycle counter value into
-{\tt x3}:{\tt x2}, even if the counter overflows its lower half between reading its upper
-and lower halves.
-
-\begin{figure}[h!]
-\begin{center}
-\begin{verbatim}
- again:
- rdcycleh x3
- rdcycle x2
- rdcycleh x4
- bne x3, x4, again
-\end{verbatim}
-\end{center}
-\caption{Sample code for reading the 64-bit cycle counter when XLEN=32.}
-\label{rdcycle}
-\end{figure}
-
-\section{``Zihpm'' Standard Extension for Hardware Performance Counters}
-
-The Zihpm extension comprises up to 29 additional unprivileged 64-bit
-hardware performance counters, {\tt hpmcounter3}--{\tt hpmcounter31}.
-When XLEN=32, the upper 32 bits of these performance counters are
-accessible via additional CSRs {\tt hpmcounter3h}--{\tt
- hpmcounter31h}. The Zihpm extension depends on the Zicsr extension.
-
-\begin{commentary}
-In some applications, it is important to be able to read multiple
-counters at the same instant in time. When run under a multitasking
-environment, a user thread can suffer a context switch while
-attempting to read the counters. One solution is for the user thread
-to read the real-time counter before and after reading the other
-counters to determine if a context switch occurred in the middle of the
-sequence, in which case the reads can be retried. We considered
-adding output latches to allow a user thread to snapshot the counter
-values atomically, but this would increase the size of the user
-context, especially for implementations with a richer set of counters.
-\end{commentary}
-
-The implemented number and width of these additional counters, and the
-set of events they count, are platform-specific. Accessing an
-unimplemented or ill-configured counter may cause an illegal
-instruction exception or may return a constant value.
-
-The execution environment should provide a means to determine the
-number and width of the implemented counters, and an interface to
-configure the events to be counted by each counter.
-
-\begin{commentary}
- For execution environments implemented on RISC-V privileged
- platforms, the privileged architecture manual describes privileged
- CSRs that control access by lower privileged modes to these counters
- and that set the events to be counted.
-
- Alternative execution environments (e.g., user-level-only software
- performance models) may provide alternative mechanisms to configure
- the events counted by the performance counters.
-
- It would be useful to eventually standardize event settings to count
- ISA-level metrics, such as the number of floating-point instructions
- executed, and possibly a few common microarchitectural
- metrics, such as ``L1 instruction cache misses''.
-\end{commentary}
diff --git a/src/csr.tex b/src/csr.tex
deleted file mode 100644
index b642567..0000000
--- a/src/csr.tex
+++ /dev/null
@@ -1,260 +0,0 @@
-\chapter{``Zicsr'', Control and Status Register (CSR) Instructions, Version 2.0}
-\label{csrinsts}
-
-RISC-V defines a separate address space of 4096 Control and Status
-registers associated with each hart. This chapter defines the full
-set of CSR instructions that operate on these CSRs.
-
-\begin{commentary}
- While CSRs are primarily used by the privileged architecture, there
- are several uses in unprivileged code including for counters and
- timers, and for floating-point status.
-
- The counters and timers are no longer considered mandatory parts of
- the standard base ISAs, and so the CSR instructions required to
- access them have been moved out of Chapter~\ref{rv32} into this
- separate chapter.
-\end{commentary}
-
-\section{CSR Instructions}
-
-All CSR instructions atomically read-modify-write a single CSR, whose
-CSR specifier is encoded in the 12-bit {\em csr} field of the
-instruction held in bits 31--20. The immediate forms use a 5-bit
-zero-extended immediate encoded in the {\em rs1} field.
-
-\vspace{-0.2in}
-\begin{center}
-\begin{tabular}{M@{}R@{}F@{}R@{}S}
-\\
-\instbitrange{31}{20} &
-\instbitrange{19}{15} &
-\instbitrange{14}{12} &
-\instbitrange{11}{7} &
-\instbitrange{6}{0} \\
-\hline
-\multicolumn{1}{|c|}{csr} &
-\multicolumn{1}{c|}{rs1} &
-\multicolumn{1}{c|}{funct3} &
-\multicolumn{1}{c|}{rd} &
-\multicolumn{1}{c|}{opcode} \\
-\hline
-12 & 5 & 3 & 5 & 7 \\
-source/dest & source & CSRRW & dest & SYSTEM \\
-source/dest & source & CSRRS & dest & SYSTEM \\
-source/dest & source & CSRRC & dest & SYSTEM \\
-source/dest & uimm[4:0] & CSRRWI & dest & SYSTEM \\
-source/dest & uimm[4:0] & CSRRSI & dest & SYSTEM \\
-source/dest & uimm[4:0] & CSRRCI & dest & SYSTEM \\
-\end{tabular}
-\end{center}
-
-The CSRRW (Atomic Read/Write CSR) instruction atomically swaps values
-in the CSRs and integer registers. CSRRW reads the old value of the
-CSR, zero-extends the value to XLEN bits, then writes it to integer
-register {\em rd}. The initial value in {\em rs1} is written to the
-CSR. If {\em rd}={\tt x0}, then the instruction shall not read the CSR
-and shall not cause any of the side effects that might occur on a CSR
-read.
-
-The CSRRS (Atomic Read and Set Bits in CSR) instruction reads the
-value of the CSR, zero-extends the value to XLEN bits, and writes it
-to integer register {\em rd}. The initial value in integer register
-{\em rs1} is treated as a bit mask that specifies bit positions to be
-set in the CSR. Any bit that is high in {\em rs1} will cause the
-corresponding bit to be set in the CSR, if that CSR bit is writable.
-Other bits in the CSR are not explicitly written.
-
-The CSRRC (Atomic Read and Clear Bits in CSR) instruction reads the
-value of the CSR, zero-extends the value to XLEN bits, and writes it
-to integer register {\em rd}. The initial value in integer register
-{\em rs1} is treated as a bit mask that specifies bit positions to be
-cleared in the CSR. Any bit that is high in {\em rs1} will cause the
-corresponding bit to be cleared in the CSR, if that CSR bit is writable.
-Other bits in the CSR are not explicitly written.
-
-For both CSRRS and CSRRC, if {\em rs1}={\tt x0}, then the instruction
-will not write to the CSR at all, and so shall not cause any of the
-side effects that might otherwise occur on a CSR write, nor
-raise illegal instruction exceptions on accesses to read-only CSRs.
-Both CSRRS and CSRRC always read the addressed CSR and cause any read
-side effects regardless of {\em rs1} and {\em rd} fields. Note that
-if {\em rs1} specifies a register holding a zero value other than {\tt
- x0}, the instruction will still attempt to write the unmodified
-value back to the CSR and will cause any attendant side effects. A
-CSRRW with {\em rs1}={\tt x0} will attempt to write zero to the
-destination CSR.
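-
-The distinction can be illustrated with the read-only {\tt cycle} CSR. This
-is a sketch with arbitrarily chosen registers, and the exception noted in the
-comment follows from the rule that writes to read-only CSRs are illegal:
-\begin{verbatim}
-    csrrs a0, cycle, x0     # rs1=x0: read only, no CSR write is attempted
-    li    t0, 0
-    csrrs a0, cycle, t0     # rs1=t0 holds zero: a write is still attempted,
-                            # so an illegal instruction exception is raised
-\end{verbatim}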
-
-The CSRRWI, CSRRSI, and CSRRCI variants are similar to CSRRW, CSRRS,
-and CSRRC respectively, except they update the CSR using an XLEN-bit
-value obtained by zero-extending a 5-bit unsigned immediate (uimm[4:0]) field
-encoded in the {\em rs1} field instead of a value from an integer
-register. For CSRRSI and CSRRCI, if the uimm[4:0] field is zero, then
-these instructions will not write to the CSR, and shall not cause any
-of the side effects that might otherwise occur on a CSR write, nor raise
-illegal instruction exceptions on accesses to read-only CSRs.
-For CSRRWI, if {\em rd}={\tt x0}, then the instruction shall not read the
-CSR and shall not cause any of the side effects that might occur on a
-CSR read. Both CSRRSI and CSRRCI will always read the CSR and cause
-any read side effects regardless of {\em rd} and {\em rs1} fields.
-
-\begin{table}
- \centering
- \begin{tabular}{|l|c|c|c|c|}
- \hline
- \multicolumn{5}{|c|}{Register operand} \\
- \hline
- Instruction & \textit{rd} is \texttt{x0}
- & \textit{rs1} is \texttt{x0}
- & Reads CSR & Writes CSR \\
- \hline
- CSRRW & Yes & -- & No & Yes \\
- CSRRW & No & -- & Yes & Yes \\
- CSRRS/CSRRC & -- & Yes & Yes & No \\
- CSRRS/CSRRC & -- & No & Yes & Yes \\
- \hline
- \multicolumn{5}{|c|}{Immediate operand} \\
- \hline
- Instruction & \textit{rd} is \texttt{x0}
- & \textit{uimm}$=$0
- & Reads CSR & Writes CSR \\
- \hline
- CSRRWI & Yes & -- & No & Yes \\
- CSRRWI & No & -- & Yes & Yes \\
- CSRRSI/CSRRCI & -- & Yes & Yes & No \\
- CSRRSI/CSRRCI & -- & No & Yes & Yes \\
- \hline
- \end{tabular}
- \caption{Conditions determining whether a CSR instruction reads or writes
- the specified CSR.}
- \label{tab:csrsideeffects}
-\end{table}
-
-Table~\ref{tab:csrsideeffects} summarizes the behavior of the CSR
-instructions with respect to whether they read and/or write the CSR.
-
-For any event or consequence that occurs due to a CSR having a particular
-value, if a write to the CSR gives it that value, the resulting event or
-consequence is said to be an \emph{indirect effect} of the write.
-Indirect effects of a CSR write are not considered by the RISC-V ISA to
-be side effects of that write.
-
-\begin{commentary}
- An example of side effects for CSR accesses would be if reading from a
- specific CSR causes a light bulb to turn on, while writing an odd value
- to the same CSR causes the light to turn off.
- Assume writing an even value has no effect.
- In this case, both the read and write have side effects controlling
- whether the bulb is lit, as this condition is not determined solely
- from the CSR value.
- (Note that after writing an odd value to the CSR to turn off the light,
- then reading to turn the light on, writing again the same odd value
- causes the light to turn off again.
- Hence, on the last write, it is not a change in the CSR value that
- turns off the light.)
-
- On the other hand, if a bulb is rigged to light whenever the value
- of a particular CSR is odd, then turning the light on and off is not
- considered a side effect of writing to the CSR but merely an indirect
- effect of such writes.
-
- More concretely, the RISC-V privileged architecture defined in
- Volume~II specifies that certain combinations of CSR values cause a
- trap to occur.
- When an explicit write to a CSR creates the conditions that trigger the
- trap, the trap is not considered a side effect of the write but merely
- an indirect effect.
-
- Standard CSRs do not have any side effects on reads.
- Standard CSRs may have side effects on writes.
- Custom extensions might add CSRs for which accesses have side effects
- on either reads or writes.
-\end{commentary}
-
-Some CSRs, such as the instructions-retired counter, {\tt instret},
-may be modified as side effects of instruction execution. In these
-cases, if a CSR access instruction reads a CSR, it reads the value
-prior to the execution of the instruction. If a CSR access
-instruction writes such a CSR, the write is done instead of the
-increment. In particular, a value written to {\tt instret} by one
-instruction will be the value read by the following instruction.
-
-The assembler pseudoinstruction to read a CSR, CSRR {\em rd, csr}, is
-encoded as CSRRS {\em rd, csr, x0}. The assembler pseudoinstruction
-to write a CSR, CSRW {\em csr, rs1}, is encoded as CSRRW {\em x0, csr,
- rs1}, while CSRWI {\em csr, uimm}, is encoded as CSRRWI {\em x0,
- csr, uimm}.
-
-Further assembler pseudoinstructions are defined to set and clear
-bits in the CSR when the old value is not required: CSRS/CSRC {\em
- csr, rs1}; CSRSI/CSRCI {\em csr, uimm}.
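
The pseudoinstruction expansions above can be sketched as a lookup followed by an I-type encoding. This is a minimal sketch under stated assumptions: the SYSTEM major opcode, the funct3 values, and the field positions are taken from the instruction listings elsewhere in the manual rather than from this section, and the helper names expand_pseudo and encode_csr are hypothetical.

```python
SYSTEM = 0b1110011  # major opcode shared by all CSR instructions (assumed from the opcode map)
FUNCT3 = {"CSRRW": 0b001, "CSRRS": 0b010, "CSRRC": 0b011,
          "CSRRWI": 0b101, "CSRRSI": 0b110, "CSRRCI": 0b111}

def encode_csr(op, rd, csr, src):
    """Encode a CSR instruction; src is the rs1 number or the 5-bit uimm."""
    assert 0 <= rd < 32 and 0 <= src < 32 and 0 <= csr < 4096
    return (csr << 20) | (src << 15) | (FUNCT3[op] << 12) | (rd << 7) | SYSTEM

def expand_pseudo(name, *operands):
    """Map a pseudoinstruction to (base op, rd, csr, src)."""
    if name == "CSRR":                 # CSRR rd, csr  ->  CSRRS rd, csr, x0
        rd, csr = operands
        return ("CSRRS", rd, csr, 0)
    base = {"CSRW": "CSRRW", "CSRS": "CSRRS", "CSRC": "CSRRC",
            "CSRWI": "CSRRWI", "CSRSI": "CSRRSI", "CSRCI": "CSRRCI"}[name]
    csr, src = operands                # e.g. CSRW csr, rs1  ->  CSRRW x0, csr, rs1
    return (base, 0, csr, src)

# Reading CSR number 0xC02 into x10, then writing x11 to CSR number 0x003.
read_word = encode_csr(*expand_pseudo("CSRR", 10, 0xC02))
write_word = encode_csr(*expand_pseudo("CSRW", 0x003, 11))
```

All of the write-only forms use x0 as the destination, so by the table above they never read the CSR when the base instruction is CSRRW or CSRRWI.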
-
-
-\subsection*{CSR Access Ordering}
-
-Each RISC-V hart normally observes its own CSR accesses, including its
-implicit CSR accesses, as performed in program order.
-In particular, unless specified otherwise, a CSR access is performed
-after the execution of any prior instructions in program order whose behavior
-modifies or is modified by the CSR state and before the execution of any
-subsequent instructions in program order whose behavior modifies or is modified
-by the CSR state.
-Furthermore, an explicit CSR read returns the
-CSR state before the execution of the instruction, while an
-explicit CSR write suppresses and overrides any implicit writes or
-modifications to the same CSR by the same instruction.
-
-Likewise, any side effects from an explicit CSR access are normally
-observed to occur synchronously in program order.
-Unless specified otherwise, the full consequences of any such side
-effects are observable by the very next instruction, and no consequences
-may be observed out-of-order by preceding instructions.
-(Note the distinction made earlier between side effects and indirect
-effects of CSR writes.)
-
-For the RVWMO memory consistency model (Chapter~\ref{ch:memorymodel}),
-CSR accesses are weakly ordered by default,
-so other harts or devices may observe CSR accesses in an order
-different from program order. In addition, CSR accesses are not ordered with
-respect to explicit memory accesses, unless a CSR access modifies the execution
-behavior of the instruction that performs the explicit memory access or unless
-a CSR access and an explicit memory access are ordered by either the syntactic
-dependencies defined by the memory model or the ordering requirements defined
-by the Memory-Ordering PMAs section in Volume II of this manual. To enforce
-ordering in all other cases, software should execute a FENCE instruction
-between the relevant accesses. For the purposes of the FENCE instruction, CSR
-read accesses are classified as device input (I), and CSR write accesses are
-classified as device output (O).
-
-\begin{commentary}
-Informally, the CSR space acts as a weakly ordered memory-mapped I/O region, as
-defined by the Memory-Ordering PMAs section in Volume II of this manual. As a
-result, the order of CSR accesses with respect to all other accesses is
-constrained by the same mechanisms that constrain the order of memory-mapped
-I/O accesses to such a region.
-
-These CSR-ordering constraints are imposed to support ordering main
-memory and memory-mapped I/O accesses with respect to CSR accesses that
-are visible to, or affected by, devices or other harts.
-Examples include the {\tt time}, {\tt cycle}, and {\tt mcycle}
-CSRs, in addition to CSRs that reflect pending interrupts, like {\tt mip} and
-{\tt sip}.
-Note that implicit reads of such CSRs (e.g., taking an interrupt because of
-a change in {\tt mip}) are also ordered as device input.
-
-Most CSRs (including, e.g., the {\tt fcsr}) are not visible to other harts;
-their accesses can be freely reordered in the global memory order with respect
-to FENCE instructions without violating this specification.
-\end{commentary}
-
-The hardware platform may define that accesses to certain CSRs are
-strongly ordered, as defined by the Memory-Ordering PMAs section in Volume II
-of this manual. Accesses to strongly ordered CSRs have stronger ordering
-constraints with respect to both accesses to weakly ordered CSRs and accesses
-to memory-mapped I/O regions.
-
-\begin{commentary}
-The rules for the reordering of CSR accesses in the global memory order
-should probably be moved to Chapter~\ref{ch:memorymodel} concerning the
-RVWMO memory consistency model.
-\end{commentary}
diff --git a/src/d.tex b/src/d.tex
deleted file mode 100644
index 8119f47..0000000
--- a/src/d.tex
+++ /dev/null
@@ -1,442 +0,0 @@
-\chapter{``D'' Standard Extension for Double-Precision Floating-Point,
-Version 2.2}
-
-This chapter describes the standard double-precision floating-point
-instruction-set extension, which is named ``D'' and adds
-double-precision floating-point computational instructions compliant
-with the IEEE 754-2008 arithmetic standard. The D extension depends on
-the base single-precision instruction subset F.
-
-\section{D Register State}
-
-The D extension widens the 32 floating-point registers, {\tt f0}--{\tt
- f31}, to 64 bits (FLEN=64 in Figure~\ref{fprs}). The {\tt f}
-registers can now hold either 32-bit or 64-bit floating-point values
-as described below in Section~\ref{nanboxing}.
-
-\begin{commentary}
-FLEN can be 32, 64, or 128 depending on which of the F, D, and Q
-extensions are supported. There can be up to four different
-floating-point precisions supported, including H, F, D, and Q.
-\end{commentary}
-
-\section{NaN Boxing of Narrower Values}
-\label{nanboxing}
-
-When multiple floating-point precisions are supported, then valid
-values of narrower $n$-bit types, \mbox{$n<$ FLEN}, are represented in
-the lower $n$ bits of an FLEN-bit NaN value, in a process termed
-NaN-boxing. The upper bits of a valid NaN-boxed value must be all 1s.
-Valid NaN-boxed $n$-bit values therefore appear as negative quiet NaNs
-(qNaNs) when viewed as any wider $m$-bit value, \mbox{$n < m \leq$
- FLEN}. Any operation that writes a narrower result to an {\tt f}
-register must write all 1s to the uppermost FLEN$-n$ bits to yield a
-legal NaN-boxed value.
-
-\begin{commentary}
-Software might not know the current type of data stored in a
-floating-point register but has to be able to save and restore the
-register values, hence the result of using wider operations to
-transfer narrower values has to be defined. A common case is for
-callee-saved registers, but a standard convention is also desirable for
-features including varargs, user-level threading libraries, virtual
-machine migration, and debugging.
-\end{commentary}
-
-Floating-point $n$-bit transfer operations move external values held
-in IEEE standard formats into and out of the {\tt f} registers, and
-comprise floating-point loads and stores (FL$n$/FS$n$) and
-floating-point move instructions (FMV.$n$.X/FMV.X.$n$). A narrower
-$n$-bit transfer, \mbox{$n<$ FLEN}, into the {\tt f} registers will create a
-valid NaN-boxed value. A narrower $n$-bit transfer out of
-the floating-point registers will transfer the lower $n$ bits of the
-register ignoring the upper FLEN$-n$ bits.
-
-Apart from transfer operations described in the previous paragraph,
-all other floating-point operations on narrower $n$-bit values,
-\mbox{$n<$ FLEN}, check if the input operands are correctly NaN-boxed,
-i.e., all upper FLEN$-n$ bits are 1. If so, the $n$ least-significant
-bits of the input are used as the input value, otherwise the input
-value is treated as an $n$-bit canonical NaN.
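
The boxing and checking rules above can be sketched with plain integer bit operations. This is an illustration, not an implementation requirement: the helper names nan_box and read_operand are hypothetical, and the canonical single-precision NaN pattern 0x7FC00000 is assumed from the F extension chapter rather than stated in this section.

```python
CANONICAL_NAN = {32: 0x7FC00000}  # assumed canonical NaN bit pattern for n = 32

def nan_box(value: int, n: int, flen: int) -> int:
    """Place an n-bit value in an FLEN-bit register with the upper FLEN-n bits all 1s."""
    upper_ones = ((1 << (flen - n)) - 1) << n
    return upper_ones | (value & ((1 << n) - 1))

def read_operand(reg: int, n: int, flen: int) -> int:
    """Return the n-bit input a non-transfer operation would use."""
    upper_ones = ((1 << (flen - n)) - 1) << n
    if (reg & upper_ones) == upper_ones:
        return reg & ((1 << n) - 1)   # correctly boxed: use the low n bits
    return CANONICAL_NAN[n]           # otherwise treated as the n-bit canonical NaN

boxed_one = nan_box(0x3F800000, 32, 64)           # single-precision 1.0 in a 64-bit register
unboxed = read_operand(boxed_one, 32, 64)
misused = read_operand(0x3FF0000000000000, 32, 64)  # double-precision 1.0 read as single
```

The last line shows the error-catching property mentioned in the commentary below: a double-precision value consumed by a single-precision operation is seen as a NaN instead of as an arbitrary number.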
-
-\begin{commentary}
-Earlier versions of this document did not define the behavior of
-feeding the results of narrower or wider operands into an operation,
-except to require that wider saves and restores would preserve the
-value of a narrower operand. The new definition removes this
-implementation-specific behavior, while still accommodating both
-non-recoded and recoded implementations of the floating-point unit.
-The new definition also helps catch software errors by propagating
-NaNs if values are used incorrectly.
-
-Non-recoded implementations unpack and pack the operands to IEEE
-standard format on the input and output of every floating-point
-operation. The NaN-boxing cost to a non-recoded implementation is
-primarily in checking if the upper bits of a narrower operation
-represent a legal NaN-boxed value, and in writing all 1s to the upper
-bits of a result.
-
-Recoded implementations use a more convenient internal format to
-represent floating-point values, with an added exponent bit to allow
-all values to be held normalized. The cost to the recoded
-implementation is primarily the extra tagging needed to track the
-internal types and sign bits, but this can be done without adding new
-state bits by recoding NaNs internally in the exponent field. Small
-modifications are needed to the pipelines used to transfer values in
-and out of the recoded format, but the datapath and latency costs are
-minimal. The recoding process has to handle shifting of input
-subnormal values for wide operands in any case, and extracting the
-NaN-boxed value is a similar process to normalization except for
-skipping over leading-1 bits instead of skipping over leading-0 bits,
-allowing the datapath muxing to be shared.
-\end{commentary}
-
-\section{Double-Precision Load and Store Instructions}
-\label{fld_fsd}
-
-The FLD instruction loads a double-precision floating-point value from
-memory into floating-point register {\em rd}. FSD stores a double-precision
-value from the floating-point registers to memory.
-\begin{commentary}
-The double-precision value may be a NaN-boxed single-precision value.
-\end{commentary}
-
-\vspace{-0.2in}
-\begin{center}
-\begin{tabular}{M@{}R@{}F@{}R@{}O}
-\\
-\instbitrange{31}{20} &
-\instbitrange{19}{15} &
-\instbitrange{14}{12} &
-\instbitrange{11}{7} &
-\instbitrange{6}{0} \\
-\hline
-\multicolumn{1}{|c|}{imm[11:0]} &
-\multicolumn{1}{c|}{rs1} &
-\multicolumn{1}{c|}{width} &
-\multicolumn{1}{c|}{rd} &
-\multicolumn{1}{c|}{opcode} \\
-\hline
-12 & 5 & 3 & 5 & 7 \\
-offset[11:0] & base & D & dest & LOAD-FP \\
-\end{tabular}
-\end{center}
-
-\vspace{-0.2in}
-\begin{center}
-\begin{tabular}{O@{}R@{}R@{}F@{}R@{}O}
-\\
-\instbitrange{31}{25} &
-\instbitrange{24}{20} &
-\instbitrange{19}{15} &
-\instbitrange{14}{12} &
-\instbitrange{11}{7} &
-\instbitrange{6}{0} \\
-\hline
-\multicolumn{1}{|c|}{imm[11:5]} &
-\multicolumn{1}{c|}{rs2} &
-\multicolumn{1}{c|}{rs1} &
-\multicolumn{1}{c|}{width} &
-\multicolumn{1}{c|}{imm[4:0]} &
-\multicolumn{1}{c|}{opcode} \\
-\hline
-7 & 5 & 5 & 3 & 5 & 7 \\
-offset[11:5] & src & base & D & offset[4:0] & STORE-FP \\
-\end{tabular}
-\end{center}
-
-FLD and FSD are only guaranteed to execute atomically if the effective address
-is naturally aligned and XLEN$\geq$64.
-
-FLD and FSD do not modify the bits being transferred; in particular, the
-payloads of non-canonical NaNs are preserved.
-
-\section{Double-Precision Floating-Point Computational Instructions}
-
-The double-precision floating-point computational instructions are
-defined analogously to their single-precision counterparts, but operate on
-double-precision operands and produce double-precision results.
-\vspace{-0.2in}
-\begin{center}
-\begin{tabular}{R@{}F@{}R@{}R@{}F@{}R@{}O}
-\\
-\instbitrange{31}{27} &
-\instbitrange{26}{25} &
-\instbitrange{24}{20} &
-\instbitrange{19}{15} &
-\instbitrange{14}{12} &
-\instbitrange{11}{7} &
-\instbitrange{6}{0} \\
-\hline
-\multicolumn{1}{|c|}{funct5} &
-\multicolumn{1}{c|}{fmt} &
-\multicolumn{1}{c|}{rs2} &
-\multicolumn{1}{c|}{rs1} &
-\multicolumn{1}{c|}{rm} &
-\multicolumn{1}{c|}{rd} &
-\multicolumn{1}{c|}{opcode} \\
-\hline
-5 & 2 & 5 & 5 & 3 & 5 & 7 \\
-FADD/FSUB & D & src2 & src1 & RM & dest & OP-FP \\
-FMUL/FDIV & D & src2 & src1 & RM & dest & OP-FP \\
-FMIN-MAX & D & src2 & src1 & MIN/MAX & dest & OP-FP \\
-FSQRT & D & 0 & src & RM & dest & OP-FP \\
-\end{tabular}
-\end{center}
-
-\vspace{-0.2in}
-\begin{center}
-\begin{tabular}{R@{}F@{}R@{}R@{}F@{}R@{}O}
-\\
-\instbitrange{31}{27} &
-\instbitrange{26}{25} &
-\instbitrange{24}{20} &
-\instbitrange{19}{15} &
-\instbitrange{14}{12} &
-\instbitrange{11}{7} &
-\instbitrange{6}{0} \\
-\hline
-\multicolumn{1}{|c|}{rs3} &
-\multicolumn{1}{c|}{fmt} &
-\multicolumn{1}{c|}{rs2} &
-\multicolumn{1}{c|}{rs1} &
-\multicolumn{1}{c|}{rm} &
-\multicolumn{1}{c|}{rd} &
-\multicolumn{1}{c|}{opcode} \\
-\hline
-5 & 2 & 5 & 5 & 3 & 5 & 7 \\
-src3 & D & src2 & src1 & RM & dest & F[N]MADD/F[N]MSUB \\
-\end{tabular}
-\end{center}
-
-\section{Double-Precision Floating-Point Conversion and Move Instructions}
-
-Floating-point-to-integer and integer-to-floating-point conversion
-instructions are encoded in the OP-FP major opcode space.
-FCVT.W.D or FCVT.L.D converts a double-precision floating-point number
-in floating-point register {\em rs1} to a signed 32-bit or 64-bit
-integer, respectively, in integer register {\em rd}. FCVT.D.W
-or FCVT.D.L converts a 32-bit or 64-bit signed integer,
-respectively, in integer register {\em rs1} into a
-double-precision floating-point
-number in floating-point register {\em rd}. FCVT.WU.D,
-FCVT.LU.D, FCVT.D.WU, and FCVT.D.LU variants
-convert to or from unsigned integer values.
-For RV64, FCVT.W[U].D sign-extends the 32-bit result.
-FCVT.L[U].D and FCVT.D.L[U] are RV64-only instructions.
-The range of valid inputs for FCVT.{\em int}.D and
-the behavior for invalid inputs are the same as for FCVT.{\em int}.S.
-
-All floating-point to integer and integer to floating-point conversion
-instructions round according to the {\em rm} field. Note that FCVT.D.W[U] always
-produces an exact result and is unaffected by rounding mode.
-
-\vspace{-0.2in}
-\begin{center}
-\begin{tabular}{R@{}F@{}R@{}R@{}F@{}R@{}O}
-\\
-\instbitrange{31}{27} &
-\instbitrange{26}{25} &
-\instbitrange{24}{20} &
-\instbitrange{19}{15} &
-\instbitrange{14}{12} &
-\instbitrange{11}{7} &
-\instbitrange{6}{0} \\
-\hline
-\multicolumn{1}{|c|}{funct5} &
-\multicolumn{1}{c|}{fmt} &
-\multicolumn{1}{c|}{rs2} &
-\multicolumn{1}{c|}{rs1} &
-\multicolumn{1}{c|}{rm} &
-\multicolumn{1}{c|}{rd} &
-\multicolumn{1}{c|}{opcode} \\
-\hline
-5 & 2 & 5 & 5 & 3 & 5 & 7 \\
-FCVT.{\em int}.D & D & W[U]/L[U] & src & RM & dest & OP-FP \\
-FCVT.D.{\em int} & D & W[U]/L[U] & src & RM & dest & OP-FP \\
-\end{tabular}
-\end{center}
-
-The double-precision to single-precision and single-precision to
-double-precision conversion instructions, FCVT.S.D and FCVT.D.S, are
-encoded in the OP-FP major opcode space and both the source and
-destination are floating-point registers. The {\em rs2} field
-encodes the datatype of the source, and the {\em fmt} field encodes
-the datatype of the destination. FCVT.S.D rounds according to the
-RM field; FCVT.D.S will never round.
-
-\vspace{-0.2in}
-\begin{center}
-\begin{tabular}{R@{}F@{}R@{}R@{}F@{}R@{}O}
-\\
-\instbitrange{31}{27} &
-\instbitrange{26}{25} &
-\instbitrange{24}{20} &
-\instbitrange{19}{15} &
-\instbitrange{14}{12} &
-\instbitrange{11}{7} &
-\instbitrange{6}{0} \\
-\hline
-\multicolumn{1}{|c|}{funct5} &
-\multicolumn{1}{c|}{fmt} &
-\multicolumn{1}{c|}{rs2} &
-\multicolumn{1}{c|}{rs1} &
-\multicolumn{1}{c|}{rm} &
-\multicolumn{1}{c|}{rd} &
-\multicolumn{1}{c|}{opcode} \\
-\hline
-5 & 2 & 5 & 5 & 3 & 5 & 7 \\
-FCVT.S.D & S & D & src & RM & dest & OP-FP \\
-FCVT.D.S & D & S & src & RM & dest & OP-FP \\
-\end{tabular}
-\end{center}
-
-Floating-point to floating-point sign-injection instructions, FSGNJ.D,
-FSGNJN.D, and FSGNJX.D, are defined analogously to the single-precision
-sign-injection instructions.
-
-\vspace{-0.2in}
-\begin{center}
-\begin{tabular}{R@{}F@{}R@{}R@{}F@{}R@{}O}
-\\
-\instbitrange{31}{27} &
-\instbitrange{26}{25} &
-\instbitrange{24}{20} &
-\instbitrange{19}{15} &
-\instbitrange{14}{12} &
-\instbitrange{11}{7} &
-\instbitrange{6}{0} \\
-\hline
-\multicolumn{1}{|c|}{funct5} &
-\multicolumn{1}{c|}{fmt} &
-\multicolumn{1}{c|}{rs2} &
-\multicolumn{1}{c|}{rs1} &
-\multicolumn{1}{c|}{rm} &
-\multicolumn{1}{c|}{rd} &
-\multicolumn{1}{c|}{opcode} \\
-\hline
-5 & 2 & 5 & 5 & 3 & 5 & 7 \\
-FSGNJ & D & src2 & src1 & J[N]/JX & dest & OP-FP \\
-\end{tabular}
-\end{center}
-
-For XLEN$\geq$64 only, instructions are provided to move bit patterns
-between the floating-point and integer registers. FMV.X.D moves the
-double-precision value in floating-point register {\em rs1} to a
-representation in IEEE 754-2008 standard encoding in integer register
-{\em rd}. FMV.D.X moves the double-precision value encoded in IEEE
-754-2008 standard encoding from the integer register {\em rs1} to the
-floating-point register {\em rd}.
-
-FMV.X.D and FMV.D.X do not modify the bits being transferred; in particular, the
-payloads of non-canonical NaNs are preserved.
-
-\vspace{-0.2in}
-\begin{center}
-\begin{tabular}{R@{}F@{}R@{}R@{}F@{}R@{}O}
-\\
-\instbitrange{31}{27} &
-\instbitrange{26}{25} &
-\instbitrange{24}{20} &
-\instbitrange{19}{15} &
-\instbitrange{14}{12} &
-\instbitrange{11}{7} &
-\instbitrange{6}{0} \\
-\hline
-\multicolumn{1}{|c|}{funct5} &
-\multicolumn{1}{c|}{fmt} &
-\multicolumn{1}{c|}{rs2} &
-\multicolumn{1}{c|}{rs1} &
-\multicolumn{1}{c|}{rm} &
-\multicolumn{1}{c|}{rd} &
-\multicolumn{1}{c|}{opcode} \\
-\hline
-5 & 2 & 5 & 5 & 3 & 5 & 7 \\
-FMV.X.D & D & 0 & src & 000 & dest & OP-FP \\
-FMV.D.X & D & 0 & src & 000 & dest & OP-FP \\
-\end{tabular}
-\end{center}
-
-\begin{commentary}
- Early versions of the RISC-V ISA had additional instructions to
- allow RV32 systems to transfer between the upper and lower portions
- of a 64-bit floating-point register and an integer register.
- However, these would be the only instructions with partial register
- writes and would add complexity in implementations with recoded
- floating-point or register renaming, requiring a pipeline read-modify-write
- sequence. Scaling up to handling quad-precision for RV32 and RV64
- would also require additional instructions if they were to follow
- this pattern. The ISA was defined to reduce the number of explicit
- int-float register moves, by having conversions and comparisons
- write results to the appropriate register file, so we expect the
- benefit of these instructions to be lower than for other ISAs.
-
- We note that for systems that implement a 64-bit floating-point unit
- including fused multiply-add support and 64-bit floating-point loads
- and stores, the marginal hardware cost of moving from a 32-bit to
- a 64-bit integer datapath is low, and a software ABI supporting 32-bit
- wide address-space and pointers can be used to avoid growth of
- static data and dynamic memory traffic.
-\end{commentary}
-
-\section{Double-Precision Floating-Point Compare Instructions}
-
-The double-precision floating-point compare instructions are
-defined analogously to their single-precision counterparts, but operate on
-double-precision operands.
-
-\vspace{-0.2in}
-\begin{center}
-\begin{tabular}{S@{}F@{}R@{}R@{}F@{}R@{}O}
-\\
-\instbitrange{31}{27} &
-\instbitrange{26}{25} &
-\instbitrange{24}{20} &
-\instbitrange{19}{15} &
-\instbitrange{14}{12} &
-\instbitrange{11}{7} &
-\instbitrange{6}{0} \\
-\hline
-\multicolumn{1}{|c|}{funct5} &
-\multicolumn{1}{c|}{fmt} &
-\multicolumn{1}{c|}{rs2} &
-\multicolumn{1}{c|}{rs1} &
-\multicolumn{1}{c|}{rm} &
-\multicolumn{1}{c|}{rd} &
-\multicolumn{1}{c|}{opcode} \\
-\hline
-5 & 2 & 5 & 5 & 3 & 5 & 7 \\
-FCMP & D & src2 & src1 & EQ/LT/LE & dest & OP-FP \\
-\end{tabular}
-\end{center}
-
-\section{Double-Precision Floating-Point Classify Instruction}
-
-The double-precision floating-point classify instruction, FCLASS.D, is
-defined analogously to its single-precision counterpart, but operates on
-double-precision operands.
-
-\vspace{-0.2in}
-\begin{center}
-\begin{tabular}{S@{}F@{}R@{}R@{}F@{}R@{}O}
-\\
-\instbitrange{31}{27} &
-\instbitrange{26}{25} &
-\instbitrange{24}{20} &
-\instbitrange{19}{15} &
-\instbitrange{14}{12} &
-\instbitrange{11}{7} &
-\instbitrange{6}{0} \\
-\hline
-\multicolumn{1}{|c|}{funct5} &
-\multicolumn{1}{c|}{fmt} &
-\multicolumn{1}{c|}{rs2} &
-\multicolumn{1}{c|}{rs1} &
-\multicolumn{1}{c|}{rm} &
-\multicolumn{1}{c|}{rd} &
-\multicolumn{1}{c|}{opcode} \\
-\hline
-5 & 2 & 5 & 5 & 3 & 5 & 7 \\
-FCLASS & D & 0 & src & 001 & dest & OP-FP \\
-\end{tabular}
-\end{center}
diff --git a/src/extensions.tex b/src/extensions.tex
deleted file mode 100644
index 56cc912..0000000
--- a/src/extensions.tex
+++ /dev/null
@@ -1,383 +0,0 @@
-\chapter{Extending RISC-V}
-\label{extensions}
-
-In addition to supporting standard general-purpose software
-development, another goal of RISC-V is to provide a basis for more
-specialized instruction-set extensions or more customized
-accelerators. The instruction encoding spaces and optional
-variable-length instruction encoding are designed to make it easier to
-leverage software development effort for the standard ISA toolchain
-when building more customized processors. For example, the intent is
-to continue to provide full software support for implementations that
-only use the standard I base, perhaps together with many non-standard
-instruction-set extensions.
-
-This chapter describes various ways in which the base RISC-V ISA can
-be extended, together with the scheme for managing instruction-set
-extensions developed by independent groups. This volume only deals
-with the unprivileged ISA, although the same approach and terminology are
-used for supervisor-level extensions described in the second volume.
-
-\section{Extension Terminology}
-
-This section defines some standard terminology for describing RISC-V
-extensions.
-\vspace{-0.2in}
-\subsection*{Standard versus Non-Standard Extension}
-
-Any RISC-V processor implementation must support a base integer ISA
-(RV32I, RV32E, RV64I, or RV128I). In addition, an implementation may
-support one or more extensions. We divide extensions into two broad
-categories: {\em standard} versus {\em non-standard}.
-\begin{itemize}
-\item A standard extension is one that is generally useful and that is
- designed to not conflict with any other standard extension.
- Currently, ``MAFDQLCBTPV'', described in other chapters of this
- manual, are either complete or planned standard extensions.
-\item A non-standard extension may be highly specialized and may
- conflict with other standard or non-standard extensions. We
- anticipate a wide variety of non-standard extensions will be
- developed over time, with some eventually being promoted to standard
- extensions.
-\end{itemize}
-
-\vspace{-0.2in}
-\subsection*{Instruction Encoding Spaces and Prefixes}
-
-An instruction encoding space is some number of instruction bits
-within which a base ISA or ISA extension is encoded. RISC-V supports
-varying instruction lengths, but even within a single instruction
-length, there are various sizes of encoding space available. For
-example, the base ISAs are defined within a 30-bit encoding space (bits
-31--2 of the 32-bit instruction), while the atomic extension ``A''
-fits within a 25-bit encoding space (bits 31--7).
-
-We use the term {\em prefix} to refer to the bits to the {\em right}
-of an instruction encoding space (since instruction fetch in RISC-V is
-little-endian, the
-bits to the right are stored at earlier memory addresses, hence form a
-prefix in instruction-fetch order). The prefix for the standard base
-ISA encoding is the two-bit ``11'' field held in bits 1--0 of the
-32-bit word, while the prefix for the standard atomic extension ``A''
-is the seven-bit ``0101111'' field held in bits 6--0 of the 32-bit
-word representing the AMO major opcode. A quirk of the encoding
-format is that the 3-bit funct3 field used to encode a minor opcode is
-not contiguous with the major opcode bits in the 32-bit instruction
-format, but is considered part of the prefix for 22-bit instruction
-spaces.
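
For the contiguous cases described above (the 30-bit and 25-bit spaces), separating a 32-bit instruction word into its prefix and its encoding-space bits is a mask and a shift. The function name split_encoding is an assumption made for this sketch, and the sketch deliberately does not cover 22-bit spaces, whose prefix includes the non-contiguous funct3 field.

```python
def split_encoding(word: int, space_bits: int) -> tuple[int, int]:
    """Split a 32-bit instruction word into (encoding-space bits, prefix).

    Valid only when the prefix is the contiguous low-order field,
    i.e. space_bits is 30 (2-bit prefix) or 25 (7-bit prefix).
    """
    if space_bits not in (30, 25):
        raise ValueError("only contiguous prefixes are handled in this sketch")
    prefix_bits = 32 - space_bits
    prefix = word & ((1 << prefix_bits) - 1)   # bits to the right of the space
    space = (word & 0xFFFFFFFF) >> prefix_bits
    return space, prefix

# An arbitrary word whose low seven bits are the AMO major opcode 0101111.
word = (0x1ABCDEF << 7) | 0b0101111
amo_space, amo_prefix = split_encoding(word, 25)
base_space, base_prefix = split_encoding(word, 30)
```

The same word therefore has prefix 0101111 when viewed as a member of the 25-bit ``A'' space and prefix 11 when viewed as a member of the 30-bit base space.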
-
-Although an instruction encoding space could be of any size, adopting
-a smaller set of common sizes simplifies packing independently
-developed extensions into a single global encoding.
-Table~\ref{encodingspaces} gives the suggested sizes for RISC-V.
-
-\begin{table}[H]
-\begin{center}
-\begin{tabular}{|c|l|r|r|r|r|}
-\hline
-\multicolumn{1}{|c|}{Size} & \multicolumn{1}{|c|}{Usage} &
-\multicolumn{4}{|c|}{\# Available in standard instruction length} \\ \cline{3-6}
- & &
-\multicolumn{1}{|c|}{16-bit} &
-\multicolumn{1}{|c|}{32-bit} &
-\multicolumn{1}{|c|}{48-bit} &
-\multicolumn{1}{|c|}{64-bit} \\ \hline \hline
-14-bit & Quadrant of compressed 16-bit encoding & 3 & & & \\ \hline \hline
-22-bit & Minor opcode in base 32-bit encoding & & $2^{8}$ & $2^{20}$ & $2^{35}$ \\ \hline
-25-bit & Major opcode in base 32-bit encoding & & 32 & $2^{17}$ & $2^{32}$ \\ \hline
-30-bit & Quadrant of base 32-bit encoding & & 1 & $2^{12}$ & $2^{27}$ \\ \hline \hline
-32-bit & Minor opcode in 48-bit encoding & & & $2^{10}$ & $2^{25}$ \\ \hline
-37-bit & Major opcode in 48-bit encoding & & & 32 & $2^{20}$ \\ \hline
-40-bit & Quadrant of 48-bit encoding & & & 4 & $2^{17}$ \\ \hline \hline
-45-bit & Sub-minor opcode in 64-bit encoding & & & & $2^{12}$ \\ \hline
-48-bit & Minor opcode in 64-bit encoding & & & & $2^{9}$ \\ \hline
-52-bit & Major opcode in 64-bit encoding & & & & 32\\ \hline
-\end{tabular}
-\end{center}
-\caption{Suggested standard RISC-V instruction encoding space sizes.}
-\label{encodingspaces}
-\end{table}
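
The counts in the table above for the 32-bit, 48-bit, and 64-bit columns follow from one subtraction. The sketch below assumes that the instruction-length encoding consumes 2, 6, and 7 low-order bits of 32-bit, 48-bit, and 64-bit instructions respectively; that assumption comes from the length-encoding scheme described elsewhere in the manual, not from this section, and the names FREE_BITS and spaces_available are hypothetical. The 16-bit column is not covered, since its three quadrants are not a power of two.

```python
# Bits left over after the length-encoding prefix (assumed: 2, 6, and 7 prefix bits).
FREE_BITS = {32: 32 - 2, 48: 48 - 6, 64: 64 - 7}

def spaces_available(space_bits: int, insn_bits: int) -> int:
    """Number of distinct space_bits-wide encoding spaces in an insn_bits-wide format."""
    free = FREE_BITS[insn_bits]
    if space_bits > free:
        return 0                      # the space does not fit in this format
    return 2 ** (free - space_bits)   # each remaining bit doubles the count

row_22 = [spaces_available(22, length) for length in (32, 48, 64)]
row_40 = [spaces_available(40, length) for length in (32, 48, 64)]
```

Each entry is a power of two because every bit not consumed by the space itself selects among otherwise identical spaces.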
-
-\vspace{-0.2in}
-\subsection*{Greenfield versus Brownfield Extensions}
-
-We use the term {\em greenfield extension} to describe an extension
-that begins populating a new instruction encoding space, and hence can
-only cause encoding conflicts at the prefix level. We use the term
-{\em brownfield extension} to describe an extension that fits around
-existing encodings in a previously defined instruction space. A
-brownfield extension is necessarily tied to a particular greenfield
-parent encoding, and there may be multiple brownfield extensions to
-the same greenfield parent encoding. For example, the base ISAs are
-greenfield encodings of a 30-bit instruction space, while the FDQ
-floating-point extensions are all brownfield extensions adding to the
-parent base ISA 30-bit encoding space.
-
-Note that we consider the standard A extension to have a greenfield
-encoding as it defines a new previously empty 25-bit encoding space in
-the leftmost bits of the full 32-bit base instruction encoding, even
-though its standard prefix locates it within the 30-bit encoding space
-of its parent base ISA.
-Changing only its single 7-bit prefix could move the
-A extension to a different 30-bit encoding space while only worrying
-about conflicts at the prefix level, not within the encoding space
-itself.
-
-\begin{table}[H]
-{
-\begin{center}
-\begin{tabular}{|r|c|c|}
-\hline
- & Adds state & No new state \\ \hline
-Greenfield & RV32I(30), RV64I(30) & A(25) \\\hline
-Brownfield & F(I), D(F), Q(D) & M(I) \\
-\hline
-\end{tabular}
-\end{center}
-}
-\caption{Two-dimensional characterization of standard instruction-set
- extensions.}
-\label{exttax}
-\end{table}
-
-Table~\ref{exttax} shows the bases and standard extensions placed in a
-simple two-dimensional taxonomy. One axis is whether the extension is
-greenfield or brownfield, while the other axis is whether the
-extension adds architectural state. For greenfield extensions, the
-size of the instruction encoding space is given in parentheses. For
-brownfield extensions, the name of the extension (greenfield or
-brownfield) it builds upon is given in parentheses. Additional
-user-level architectural state usually implies changes to the
-supervisor-level system or possibly to the standard calling
-convention.
-
-Note that RV64I is not considered an extension of RV32I, but a
-different complete base encoding.
-
-\vspace{-0.2in}
-\subsection*{Standard-Compatible Global Encodings}
-
-A complete or {\em global} encoding of an ISA for an actual RISC-V
-implementation must allocate a unique non-conflicting prefix for every
-included instruction encoding space. The bases and every standard
-extension have each had a standard prefix allocated to ensure they can
-all coexist in a global encoding.
-
-A {\em standard-compatible} global encoding is one where the base and
-every included standard extension have their standard prefixes. A
-standard-compatible global encoding can include non-standard
-extensions that do not conflict with the included standard extensions.
-A standard-compatible global encoding can also use standard prefixes
-for non-standard extensions if the associated standard extensions are
-not included in the global encoding. In other words, a standard
-extension must use its standard prefix if included in a
-standard-compatible global encoding, but otherwise its prefix is free
-to be reallocated. These constraints allow a common toolchain to
-target the standard subset of any RISC-V standard-compatible global
-encoding.
-
-\vspace{-0.2in}
-\subsection*{Guaranteed Non-Standard Encoding Space}
-
-To support development of proprietary custom extensions, portions of
-the encoding space are guaranteed to never be used by standard
-extensions.
-
-\section{RISC-V Extension Design Philosophy}
-
-We intend to support a large number of independently developed
-extensions by encouraging extension developers to operate within
-instruction encoding spaces, and by providing tools to pack these into
-a standard-compatible global encoding by allocating unique prefixes.
-Some extensions are more naturally implemented as brownfield
-augmentations of existing extensions, and will share whatever prefix
-is allocated to their parent greenfield extension. The standard
-extension prefixes avoid spurious incompatibilities in the encoding of
-core functionality, while allowing custom packing of more esoteric
-extensions.
-
-This capability of repacking RISC-V extensions into different
-standard-compatible global encodings can be used in a number of ways.
-
-One use-case is developing highly specialized custom accelerators,
-designed to run kernels from important application domains. These
-might want to drop all but the base integer ISA and add in only the
-extensions that are required for the task in hand. The base ISAs have
-been designed to place minimal requirements on a hardware
-implementation, and have been encoded to use only a small fraction of a
-32-bit instruction encoding space.
-
-Another use-case is to build a research prototype for a new type of
-instruction-set extension. The researchers might not want to expend
-the effort to implement a variable-length instruction-fetch unit, and
-so would like to prototype their extension using a simple 32-bit
-fixed-width instruction encoding. However, this new extension might
-be too large to coexist with standard extensions in the 32-bit space.
-If the research experiments do not need all of the standard
-extensions, a standard-compatible global encoding might drop the
-unused standard extensions and reuse their prefixes to place the
-proposed extension in a non-standard location to simplify engineering
-of the research prototype. Standard tools will still be able to
-target the base and any standard extensions that are present to reduce
-development time. Once the instruction-set extension has been
-evaluated and refined, it could then be made available for packing
-into a larger variable-length encoding space to avoid conflicts with
-all standard extensions.
-
-The following sections describe increasingly sophisticated strategies
-for developing implementations with new instruction-set extensions.
-These are mostly intended for use in highly customized, educational,
-or experimental architectures rather than for the main line of RISC-V
-ISA development.
-
-\section{Extensions within fixed-width 32-bit instruction format}
-\label{fix32b}
-
-In this section, we discuss adding extensions to implementations that
-only support the base fixed-width 32-bit instruction format.
-
-\begin{commentary}
-We anticipate the simplest fixed-width 32-bit encoding will be popular for
-many restricted accelerators and research prototypes.
-\end{commentary}
-
-\subsection*{Available 30-bit instruction encoding spaces}
-
-In the standard encoding, three of the available 30-bit instruction
-encoding spaces (those with 2-bit prefixes 00, 01, and 10) are used to
-enable the optional compressed instruction extension. However, if the
-compressed instruction-set extension is not required, then these three
-further 30-bit encoding spaces become available. This quadruples the
-available encoding space within the 32-bit format.
-
-\subsection*{Available 25-bit instruction encoding spaces}
-
-A 25-bit instruction encoding space corresponds to a major opcode in
-the base and standard extension encodings.
-
-There are four major opcodes expressly designated for custom extensions
-(Table~\ref{opcodemap}), each of which represents a 25-bit encoding
-space. Two of these are reserved for eventual use in the RV128 base
-encoding (where they will become OP-IMM-64 and OP-64), but can be used for
-non-standard extensions for RV32 and RV64.
-
-The two major opcodes reserved for RV64 (OP-IMM-32 and OP-32) can also be
-used for non-standard extensions to RV32 only.
-
-If an implementation does not require floating-point, then the seven
-major opcodes reserved for standard floating-point extensions
-(LOAD-FP, STORE-FP, MADD, MSUB, NMSUB, NMADD, OP-FP) can be reused for
-non-standard extensions. Similarly, the AMO major opcode can be
-reused if the standard atomic extensions are not required.
-
-If an implementation does not require instructions longer than
-32 bits, then an additional four major opcodes are available (those
-marked in gray in Table~\ref{opcodemap}).
-
-The base RV32I encoding uses only 11 major opcodes plus 3 reserved
-opcodes, leaving up to 18 available for extensions. The base RV64I
-encoding uses only 13 major opcodes plus 3 reserved opcodes, leaving
-up to 16 available for extensions.
-
-\subsection*{Available 22-bit instruction encoding spaces}
-
-A 22-bit encoding space corresponds to a funct3 minor opcode space in
-the base and standard extension encodings. Several major opcodes have
-a funct3 field minor opcode that is not completely occupied, leaving
-available several 22-bit encoding spaces.
-
-Usually a major opcode selects the format used to encode operands in
-the remaining bits of the instruction, and ideally, an extension
-should follow the operand format of the major opcode to simplify
-hardware decoding.
-
-\subsection*{Other spaces}
-
-Smaller spaces are available under certain major opcodes, and not all
-minor opcodes are entirely filled.
-
-\section{Adding aligned 64-bit instruction extensions}
-
-The simplest approach to provide space for extensions that are too
-large for the base 32-bit fixed-width instruction format is to add
-naturally aligned 64-bit instructions. The implementation must still
-support the 32-bit base instruction format, but can require that
-64-bit instructions are aligned on 64-bit boundaries to simplify
-instruction fetch, with a 32-bit NOP instruction used as alignment
-padding where necessary.
-
-To simplify use of standard tools, the 64-bit instructions should be
-encoded as described in Figure~\ref{instlengthcode}. However, an
-implementation might choose a non-standard instruction-length encoding
-for 64-bit instructions, while retaining the standard encoding for
-32-bit instructions. For example, if compressed instructions are not
-required, then a 64-bit instruction could be encoded using one or more
-zero bits in the first two bits of an instruction.
-
-\begin{commentary}
-We anticipate processor generators that produce instruction-fetch
-units capable of automatically handling any combination of supported
-variable-length instruction encodings.
-\end{commentary}
-
-\section{Supporting VLIW encodings}
-
-Although RISC-V was not designed as a base for a pure VLIW machine,
-VLIW encodings can be added as extensions using several alternative
-approaches. In all cases, the base 32-bit encoding has to be supported
-to allow use of any standard software tools.
-
-\subsection*{Fixed-size instruction group}
-
-The simplest approach is to define a single large naturally aligned
-instruction format (e.g., 128 bits) within which VLIW operations are
-encoded. In a conventional VLIW, this approach would tend to waste
-instruction memory to hold NOPs, but a RISC-V-compatible
-implementation would have to also support the base 32-bit
-instructions, confining the VLIW code size expansion to
-VLIW-accelerated functions.
-
-\subsection*{Encoded-Length Groups}
-
-Another approach is to use the standard length encoding from
-Figure~\ref{instlengthcode} to encode parallel instruction groups,
-allowing NOPs to be compressed out of the VLIW instruction. For
-example, a 64-bit instruction could hold two 28-bit operations, while
-a 96-bit instruction could hold three 28-bit operations, and so on.
-Alternatively, a 48-bit instruction could hold one 42-bit operation,
-while a 96-bit instruction could hold two 42-bit operations, and so
-on.
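-
-As an illustrative sketch (not part of the specification), the
-arithmetic behind these examples can be checked as follows.  The sketch
-assumes that the standard length encoding consumes 6 bits of a 48-bit
-instruction, 7 bits of a 64-bit instruction, and 10 bits (a 7-bit
-prefix plus a 3-bit length field) of an instruction of 80 bits or
-more; the function names are hypothetical.

```python
def length_overhead_bits(total_bits: int) -> int:
    """Bits consumed by the length encoding (assumed values, see lead-in)."""
    if total_bits == 48:
        return 6                      # bits [5:0] mark a 48-bit instruction
    if total_bits == 64:
        return 7                      # bits [6:0] mark a 64-bit instruction
    nnn, rem = divmod(total_bits - 80, 16)
    if total_bits >= 80 and rem == 0 and nnn < 7:
        return 7 + 3                  # bits [6:0] plus the 3-bit nnn field
    raise ValueError(f"unsupported instruction length: {total_bits}")


def max_operations(total_bits: int, op_bits: int) -> int:
    """Number of op_bits-wide operations that fit in the remaining payload."""
    payload = total_bits - length_overhead_bits(total_bits)
    return payload // op_bits
```

-Under these assumptions, a 64-bit instruction leaves 57 payload bits
-and a 96-bit instruction leaves 86, which is why 28-bit and 42-bit
-operations pack as described.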
-
-This approach has the advantage of retaining the base ISA encoding for
-instructions holding a single operation, but has the disadvantage of
-requiring a new 28-bit or 42-bit encoding for operations within the
-VLIW instructions, and misaligned instruction fetch for larger groups.
-One simplification is to not allow VLIW instructions to straddle
-certain microarchitecturally significant boundaries (e.g., cache lines
-or virtual memory pages).
-
-\subsection*{Fixed-Size Instruction Bundles}
-
-Another approach, similar to Itanium, is to use a larger naturally
-aligned fixed instruction bundle size (e.g., 128 bits) across which
-parallel operation groups are encoded. This simplifies instruction
-fetch, but shifts the complexity to the group execution engine. To
-remain RISC-V compatible, the base 32-bit instruction would still have
-to be supported.
-
-\subsection*{End-of-Group bits in Prefix}
-
-None of the above approaches retains the RISC-V encoding for the
-individual operations within a VLIW instruction. Yet another approach
-is to repurpose the two prefix bits in the fixed-width 32-bit
-encoding. One prefix bit can be used to signal ``end-of-group'' if
-set, while the second bit could indicate execution under a predicate
-if clear. Standard RISC-V 32-bit instructions generated by tools
-unaware of the VLIW extension would have both prefix bits set (11) and
-thus have the correct semantics, with each instruction at the end of a
-group and not predicated.
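-
-A minimal sketch of how a decoder might interpret the repurposed
-prefix bits is given below.  The text does not say which of the two
-bits carries which meaning, so the assignment used here (bit 0 for
-``end-of-group'', bit 1 for predication) is purely an assumption, as
-are the function names.

```python
END_OF_GROUP_BIT = 0b01   # hypothetical choice: bit 0, set means end of group
NOT_PREDICATED_BIT = 0b10  # hypothetical choice: bit 1, clear means predicated


def decode_vliw_prefix(inst: int) -> dict:
    """Interpret the two low-order prefix bits of a 32-bit instruction."""
    return {
        "end_of_group": bool(inst & END_OF_GROUP_BIT),
        "predicated": not (inst & NOT_PREDICATED_BIT),
    }


def split_groups(insts: list) -> list:
    """Split an instruction stream into parallel groups at end-of-group marks."""
    groups, current = [], []
    for inst in insts:
        current.append(inst)
        if decode_vliw_prefix(inst)["end_of_group"]:
            groups.append(current)
            current = []
    if current:                       # trailing instructions without a marker
        groups.append(current)
    return groups
```

-A standard instruction with prefix bits 11 decodes as a one-instruction,
-unpredicated group, matching the behavior described above.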
-
-The main disadvantage of this approach is that the base ISAs lack the
-complex predication support usually required in an aggressive VLIW
-system, and it is difficult to add space to specify more predicate
-registers in the standard 30-bit encoding space.
diff --git a/src/f.tex b/src/f.tex
deleted file mode 100644
index 4e2f723..0000000
--- a/src/f.tex
+++ /dev/null
@@ -1,851 +0,0 @@
-\chapter{``F'' Standard Extension for Single-Precision Floating-Point,
-Version 2.2}
-\label{sec:single-float}
-
-This chapter describes the standard instruction-set extension for
-single-precision floating-point, which is named ``F'' and adds
-single-precision floating-point computational instructions compliant
-with the IEEE 754-2008 arithmetic standard~\cite{ieee754-2008}.
-The F extension depends on the ``Zicsr'' extension for control
-and status register access.
-
-\section{F Register State}
-
-The F extension adds 32 floating-point registers, {\tt f0}--{\tt f31},
-each 32 bits wide, and a floating-point control and status register
-{\tt fcsr}, which contains the operating mode and exception status of the
-floating-point unit. This additional state is shown in
-Figure~\ref{fprs}. We use the term FLEN to describe the width of the
-floating-point registers in the RISC-V ISA, and FLEN=32 for the F
-single-precision floating-point extension. Most floating-point
-instructions operate on values in the floating-point register file.
-Floating-point load and store instructions transfer floating-point
-values between registers and memory. Instructions to transfer values
-to and from the integer register file are also provided.
-
-\begin{figure}[htbp]
-{\footnotesize
-\begin{center}
-\begin{tabular}{p{2in}}
-\instbitrange{FLEN-1}{0} \\ \cline{1-1}
-\multicolumn{1}{|c|}{\reglabel{\ \ \ \ f0\ \ \ \ \ }} \\ \cline{1-1}
-\multicolumn{1}{|c|}{\reglabel{\ \ \ \ f1\ \ \ \ \ }} \\ \cline{1-1}
-\multicolumn{1}{|c|}{\reglabel{\ \ \ \ f2\ \ \ \ \ }} \\ \cline{1-1}
-\multicolumn{1}{|c|}{\reglabel{\ \ \ \ f3\ \ \ \ \ }} \\ \cline{1-1}
-\multicolumn{1}{|c|}{\reglabel{\ \ \ \ f4\ \ \ \ \ }} \\ \cline{1-1}
-\multicolumn{1}{|c|}{\reglabel{\ \ \ \ f5\ \ \ \ \ }} \\ \cline{1-1}
-\multicolumn{1}{|c|}{\reglabel{\ \ \ \ f6\ \ \ \ \ }} \\ \cline{1-1}
-\multicolumn{1}{|c|}{\reglabel{\ \ \ \ f7\ \ \ \ \ }} \\ \cline{1-1}
-\multicolumn{1}{|c|}{\reglabel{\ \ \ \ f8\ \ \ \ \ }} \\ \cline{1-1}
-\multicolumn{1}{|c|}{\reglabel{\ \ \ \ f9\ \ \ \ \ }} \\ \cline{1-1}
-\multicolumn{1}{|c|}{\reglabel{\ \ \ f10\ \ \ \ \ }} \\ \cline{1-1}
-\multicolumn{1}{|c|}{\reglabel{\ \ \ f11\ \ \ \ \ }} \\ \cline{1-1}
-\multicolumn{1}{|c|}{\reglabel{\ \ \ f12\ \ \ \ \ }} \\ \cline{1-1}
-\multicolumn{1}{|c|}{\reglabel{\ \ \ f13\ \ \ \ \ }} \\ \cline{1-1}
-\multicolumn{1}{|c|}{\reglabel{\ \ \ f14\ \ \ \ \ }} \\ \cline{1-1}
-\multicolumn{1}{|c|}{\reglabel{\ \ \ f15\ \ \ \ \ }} \\ \cline{1-1}
-\multicolumn{1}{|c|}{\reglabel{\ \ \ f16\ \ \ \ \ }} \\ \cline{1-1}
-\multicolumn{1}{|c|}{\reglabel{\ \ \ f17\ \ \ \ \ }} \\ \cline{1-1}
-\multicolumn{1}{|c|}{\reglabel{\ \ \ f18\ \ \ \ \ }} \\ \cline{1-1}
-\multicolumn{1}{|c|}{\reglabel{\ \ \ f19\ \ \ \ \ }} \\ \cline{1-1}
-\multicolumn{1}{|c|}{\reglabel{\ \ \ f20\ \ \ \ \ }} \\ \cline{1-1}
-\multicolumn{1}{|c|}{\reglabel{\ \ \ f21\ \ \ \ \ }} \\ \cline{1-1}
-\multicolumn{1}{|c|}{\reglabel{\ \ \ f22\ \ \ \ \ }} \\ \cline{1-1}
-\multicolumn{1}{|c|}{\reglabel{\ \ \ f23\ \ \ \ \ }} \\ \cline{1-1}
-\multicolumn{1}{|c|}{\reglabel{\ \ \ f24\ \ \ \ \ }} \\ \cline{1-1}
-\multicolumn{1}{|c|}{\reglabel{\ \ \ f25\ \ \ \ \ }} \\ \cline{1-1}
-\multicolumn{1}{|c|}{\reglabel{\ \ \ f26\ \ \ \ \ }} \\ \cline{1-1}
-\multicolumn{1}{|c|}{\reglabel{\ \ \ f27\ \ \ \ \ }} \\ \cline{1-1}
-\multicolumn{1}{|c|}{\reglabel{\ \ \ f28\ \ \ \ \ }} \\ \cline{1-1}
-\multicolumn{1}{|c|}{\reglabel{\ \ \ f29\ \ \ \ \ }} \\ \cline{1-1}
-\multicolumn{1}{|c|}{\reglabel{\ \ \ f30\ \ \ \ \ }} \\ \cline{1-1}
-\multicolumn{1}{|c|}{\reglabel{\ \ \ f31\ \ \ \ \ }} \\ \cline{1-1}
-\multicolumn{1}{c}{FLEN} \\
-
-\instbitrange{31}{0} \\ \cline{1-1}
-\multicolumn{1}{|c|}{\reglabel{fcsr}} \\ \cline{1-1}
-\multicolumn{1}{c}{32} \\
-\end{tabular}
-\end{center}
-}
-\caption{RISC-V standard F extension single-precision floating-point state.}
-\label{fprs}
-\end{figure}
-
-\begin{commentary}
-We considered a unified register file for both integer and
-floating-point values as this simplifies software register allocation
-and calling conventions, and reduces total user state. However, a
-split organization increases the total number of registers accessible
-with a given instruction width, simplifies provision of enough regfile
-ports for wide superscalar issue, supports decoupled
-floating-point-unit architectures, and simplifies use of internal
-floating-point encoding techniques. Compiler support and calling
-conventions for split register file architectures are well understood,
-and using dirty bits on floating-point register file state can reduce
-context-switch overhead.
-\end{commentary}
-
-\clearpage
-
-\section{Floating-Point Control and Status Register}
-
-The floating-point control and status register, {\tt fcsr}, is a RISC-V
-control and status register (CSR). It is a 32-bit read/write register that
-selects the dynamic rounding mode for floating-point arithmetic operations and
-holds the accrued exception flags, as shown in Figure~\ref{fcsr}.
-
-\begin{figure*}[h]
-{\footnotesize
-\begin{center}
-\begin{tabular}{K@{}E@{}ccccc}
-\instbitrange{31}{8} &
-\instbitrange{7}{5} &
-\instbit{4} &
-\instbit{3} &
-\instbit{2} &
-\instbit{1} &
-\instbit{0} \\
-\hline
-\multicolumn{1}{|c|}{{\em Reserved}} &
-\multicolumn{1}{c|}{Rounding Mode ({\tt frm})} &
-\multicolumn{5}{c|}{Accrued Exceptions ({\tt fflags})} \\
-\hline
-\multicolumn{1}{c}{} &
-\multicolumn{1}{c|}{} &
-\multicolumn{1}{c|}{NV} &
-\multicolumn{1}{c|}{DZ} &
-\multicolumn{1}{c|}{OF} &
-\multicolumn{1}{c|}{UF} &
-\multicolumn{1}{c|}{NX} \\
-\cline{3-7}
-24 & 3 & 1 & 1 & 1 & 1 & 1 \\
-\end{tabular}
-\end{center}
-}
-\vspace{-0.1in}
-\caption{Floating-point control and status register.}
-\label{fcsr}
-\end{figure*}
-
-The {\tt fcsr} register can be read and written with the FRCSR and
-FSCSR instructions, which are assembler pseudoinstructions built on the
-underlying CSR access instructions. FRCSR reads {\tt fcsr} by copying
-it into integer register {\em rd}. FSCSR swaps the value in {\tt
- fcsr} by copying the original value into integer register {\em rd},
-and then writing a new value obtained from integer register {\em rs1}
-into {\tt fcsr}.
-
-The fields within the {\tt fcsr} can also be accessed individually
-through different CSR addresses, and separate assembler pseudoinstructions are
-defined for these accesses. The FRRM instruction reads the Rounding
-Mode field {\tt frm} and copies it into the least-significant three
-bits of integer register {\em rd}, with zero in all other bits. FSRM
-swaps the value in {\tt frm} by copying the original value into
-integer register {\em rd}, and then writing a new value obtained from
-the three least-significant bits of integer register {\em rs1} into
-{\tt frm}. FRFLAGS and FSFLAGS are defined analogously for the
-Accrued Exception Flags field {\tt fflags}.
-
-Bits 31--8 of the {\tt fcsr} are reserved for other standard extensions. If
-these extensions are not present, implementations shall ignore writes to
-these bits and supply a zero value when read. Standard software should
-preserve the contents of these bits.
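-
-The field layout and swap semantics above can be sketched in software
-as follows.  This is an illustrative model only: the function names
-are hypothetical, and it assumes that no other extension claims bits
-31--8, so writes to those bits are ignored and they read as zero.

```python
FFLAGS_MASK = 0x1F            # bits 4:0, accrued exception flags
FRM_SHIFT = 5                 # bits 7:5, rounding mode
FRM_MASK = 0x7 << FRM_SHIFT
NV, DZ, OF, UF, NX = 0x10, 0x08, 0x04, 0x02, 0x01


def unpack_fcsr(fcsr: int) -> dict:
    """Split an fcsr value into its frm and fflags fields."""
    return {"frm": (fcsr & FRM_MASK) >> FRM_SHIFT, "fflags": fcsr & FFLAGS_MASK}


def fscsr(fcsr: int, rs1: int) -> tuple:
    """Model FSCSR: return (new fcsr, value written to rd)."""
    rd = fcsr
    new_fcsr = rs1 & (FRM_MASK | FFLAGS_MASK)   # reserved bits 31:8 ignored
    return new_fcsr, rd


def fsrm(fcsr: int, rs1: int) -> tuple:
    """Model FSRM: swap only the frm field; return (new fcsr, rd)."""
    rd = (fcsr & FRM_MASK) >> FRM_SHIFT          # old frm, zero-extended
    new_fcsr = (fcsr & ~FRM_MASK) | ((rs1 & 0x7) << FRM_SHIFT)
    return new_fcsr, rd
```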
-
-Floating-point operations use either a static rounding mode encoded in
-the instruction, or a dynamic rounding mode held in {\tt frm}.
-Rounding modes are encoded as shown in Table~\ref{rm}. A value of 111
-in the instruction's {\em rm} field selects the dynamic rounding mode
-held in {\tt frm}. The behavior of floating-point instructions that
-depend on rounding mode when executed with a reserved rounding mode is
-{\em reserved}, including both static reserved rounding modes (101--110) and
-dynamic reserved rounding modes (101--111). Some instructions, including
-widening conversions, have the {\em rm} field but are nevertheless
-mathematically unaffected by the rounding mode; software should set their
-{\em rm} field to RNE (000) but implementations must treat the {\em rm}
-field as usual (in particular, with regard to decoding legal vs. reserved
-encodings).
-
-\begin{table}[htp]
-\begin{small}
-\begin{center}
-\begin{tabular}{ccl}
-\hline
-\multicolumn{1}{|c|}{Rounding Mode} &
-\multicolumn{1}{c|}{Mnemonic} &
-\multicolumn{1}{c|}{Meaning} \\
-\hline
-\multicolumn{1}{|c|}{000} &
-\multicolumn{1}{l|}{RNE} &
-\multicolumn{1}{l|}{Round to Nearest, ties to Even}\\
-\hline
-\multicolumn{1}{|c|}{001} &
-\multicolumn{1}{l|}{RTZ} &
-\multicolumn{1}{l|}{Round towards Zero}\\
-\hline
-\multicolumn{1}{|c|}{010} &
-\multicolumn{1}{l|}{RDN} &
-\multicolumn{1}{l|}{Round Down (towards $-\infty$)}\\
-\hline
-\multicolumn{1}{|c|}{011} &
-\multicolumn{1}{l|}{RUP} &
-\multicolumn{1}{l|}{Round Up (towards $+\infty$)}\\
-\hline
-\multicolumn{1}{|c|}{100} &
-\multicolumn{1}{l|}{RMM} &
-\multicolumn{1}{l|}{Round to Nearest, ties to Max Magnitude}\\
-\hline
-\multicolumn{1}{|c|}{101} &
-\multicolumn{1}{l|}{} &
-\multicolumn{1}{l|}{\em Reserved for future use.}\\
-\hline
-\multicolumn{1}{|c|}{110} &
-\multicolumn{1}{l|}{} &
-\multicolumn{1}{l|}{\em Reserved for future use.}\\
-\hline
-\multicolumn{1}{|c|}{111} &
-\multicolumn{1}{l|}{DYN} &
-\multicolumn{1}{l|}{In instruction's {\em rm} field, selects dynamic rounding mode;}\\
-\multicolumn{1}{|c|}{} &
-\multicolumn{1}{l|}{} &
-\multicolumn{1}{l|}{In Rounding Mode register, {\em reserved}.}\\
-\hline
-\end{tabular}
-\end{center}
-\end{small}
-\caption{Rounding mode encoding.}
-\label{rm}
-\end{table}
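-
-The selection between static and dynamic rounding modes, using the
-encodings of Table~\ref{rm}, can be sketched as below.  The function
-name is hypothetical, and returning {\tt None} is merely one way of
-modeling a reserved encoding, whose behavior the specification leaves
-open.

```python
RNE, RTZ, RDN, RUP, RMM, DYN = 0b000, 0b001, 0b010, 0b011, 0b100, 0b111
VALID_MODES = {RNE, RTZ, RDN, RUP, RMM}


def effective_rounding_mode(rm: int, frm: int):
    """Return the rounding mode an instruction uses, or None if reserved.

    rm  -- the 3-bit rm field of the instruction
    frm -- the current contents of the frm field of fcsr
    """
    mode = frm if rm == DYN else rm   # 111 in rm defers to the dynamic mode
    return mode if mode in VALID_MODES else None
```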
-
-\begin{commentary}
-The C99 language standard effectively mandates the provision of a
-dynamic rounding mode register. In typical implementations, writes to
-the dynamic rounding mode CSR state will serialize the pipeline.
-Static rounding modes are used to implement specialized arithmetic
-operations that often have to switch frequently between different
-rounding modes.
-
-The ratified version of the F spec mandated that an illegal
-instruction exception was raised when an instruction was executed with
-a reserved dynamic rounding mode. This has been weakened to reserved,
-which matches the behavior of static rounding-mode instructions.
-Raising an illegal instruction exception is still valid behavior when
-encountering a reserved encoding, so implementations compatible with
-the ratified spec are compatible with the weakened spec.
-\end{commentary}
-
-The accrued exception flags indicate the exception conditions that
-have arisen on any floating-point arithmetic instruction since the
-field was last reset by software, as shown in Table~\ref{bitdef}.
-The base RISC-V ISA
-does not support generating a trap on the setting of a floating-point
-exception flag.
-
-\begin{table}[htp]
-\begin{small}
-\begin{center}
-\begin{tabular}{cl}
-\hline
-\multicolumn{1}{|c|}{Flag Mnemonic} &
-\multicolumn{1}{c|}{Flag Meaning} \\
-\hline
-\multicolumn{1}{|c|}{NV} &
-\multicolumn{1}{c|}{Invalid Operation}\\
-\hline
-\multicolumn{1}{|c|}{DZ} &
-\multicolumn{1}{c|}{Divide by Zero}\\
-\hline
-\multicolumn{1}{|c|}{OF} &
-\multicolumn{1}{c|}{Overflow}\\
-\hline
-\multicolumn{1}{|c|}{UF} &
-\multicolumn{1}{c|}{Underflow}\\
-\hline
-\multicolumn{1}{|c|}{NX} &
-\multicolumn{1}{c|}{Inexact}\\
-\hline
-\end{tabular}
-\end{center}
-\end{small}
-\caption{Accrued exception flag encoding.}
-\label{bitdef}
-\end{table}
-
-\begin{commentary}
-As allowed by the standard, we do not support traps on floating-point
-exceptions in the F extension, but instead require explicit checks of the flags
-in software. We considered adding branches controlled directly by the
-contents of the floating-point accrued exception flags, but ultimately chose
-to omit these instructions to keep the ISA simple.
-\end{commentary}
-
-\section{NaN Generation and Propagation}
-
-Except when otherwise stated, if the result of a floating-point operation is
-NaN, it is the canonical NaN. The canonical NaN has a positive sign and all
-significand bits clear except the MSB, a.k.a. the quiet bit. For
-single-precision floating-point, this corresponds to the pattern {\tt
-0x7fc00000}.
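-
-As a quick check of this bit pattern, the following sketch splits a
-single-precision value into its sign, exponent, and significand fields
-and confirms that {\tt 0x7fc00000} is a positive quiet NaN.  The helper
-names are hypothetical.

```python
import math
import struct

CANONICAL_NAN = 0x7FC00000
QUIET_BIT = 0x00400000        # MSB of the 23-bit significand field


def f32_fields(bits: int) -> tuple:
    """Return (sign, exponent, significand) of a 32-bit pattern."""
    return bits >> 31, (bits >> 23) & 0xFF, bits & 0x007FFFFF


def is_quiet_nan(bits: int) -> bool:
    """True if the pattern is a NaN with the quiet bit set."""
    _, exponent, significand = f32_fields(bits)
    return exponent == 0xFF and bool(significand & QUIET_BIT)


def bits_to_float(bits: int) -> float:
    """Reinterpret a 32-bit pattern as a single-precision value."""
    return struct.unpack("<f", struct.pack("<I", bits))[0]
```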
-
-\begin{commentary}
-We considered propagating NaN payloads, as is recommended by the standard,
-but this decision would have increased hardware cost. Moreover, since this
-feature is optional in the standard, it cannot be used in portable code.
-
-Implementors are free to provide a NaN payload propagation scheme as
-a non-standard extension enabled by a non-standard operating mode. However, the
-canonical NaN scheme described above must always be supported and should be
-the default mode.
-\end{commentary}
-
-\begin{commentary}
-We require implementations to return the standard-mandated default
-values in the case of exceptional conditions, without any further
-intervention on the part of user-level software (unlike the Alpha ISA
-floating-point trap barriers). We believe full hardware handling of
-exceptional cases will become more common, and so wish to avoid
-complicating the user-level ISA to optimize other approaches.
-Implementations can always trap to machine-mode software handlers to
-provide exceptional default values.
-\end{commentary}
-
-\section{Subnormal Arithmetic}
-
-Operations on subnormal numbers are handled in accordance with the IEEE
-754-2008 standard.
-
-In the parlance of the IEEE standard, tininess is detected after rounding.
-
-\begin{commentary}
-Detecting tininess after rounding results in fewer spurious underflow signals.
-\end{commentary}
-
-\section{Single-Precision Load and Store Instructions}
-
-Floating-point loads and stores use the same base+offset addressing
-mode as the integer base ISAs, with a base address in register {\em
- rs1} and a 12-bit signed byte offset. The FLW instruction loads a
-single-precision floating-point value from memory into floating-point
-register {\em rd}. FSW stores a single-precision value from
-floating-point register {\em rs2} to memory.
-
-\vspace{-0.2in}
-\begin{center}
-\begin{tabular}{M@{}R@{}F@{}R@{}O}
-\\
-\instbitrange{31}{20} &
-\instbitrange{19}{15} &
-\instbitrange{14}{12} &
-\instbitrange{11}{7} &
-\instbitrange{6}{0} \\
-\hline
-\multicolumn{1}{|c|}{imm[11:0]} &
-\multicolumn{1}{c|}{rs1} &
-\multicolumn{1}{c|}{width} &
-\multicolumn{1}{c|}{rd} &
-\multicolumn{1}{c|}{opcode} \\
-\hline
-12 & 5 & 3 & 5 & 7 \\
-offset[11:0] & base & W & dest & LOAD-FP \\
-\end{tabular}
-\end{center}
-
-\vspace{-0.2in}
-\begin{center}
-\begin{tabular}{O@{}R@{}R@{}F@{}R@{}O}
-\\
-\instbitrange{31}{25} &
-\instbitrange{24}{20} &
-\instbitrange{19}{15} &
-\instbitrange{14}{12} &
-\instbitrange{11}{7} &
-\instbitrange{6}{0} \\
-\hline
-\multicolumn{1}{|c|}{imm[11:5]} &
-\multicolumn{1}{c|}{rs2} &
-\multicolumn{1}{c|}{rs1} &
-\multicolumn{1}{c|}{width} &
-\multicolumn{1}{c|}{imm[4:0]} &
-\multicolumn{1}{c|}{opcode} \\
-\hline
-7 & 5 & 5 & 3 & 5 & 7 \\
-offset[11:5] & src & base & W & offset[4:0] & STORE-FP \\
-\end{tabular}
-\end{center}
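-
-The two formats above can be assembled in software as sketched below.
-The numeric values used for LOAD-FP ({\tt 0000111}), STORE-FP
-({\tt 0100111}), and the W width ({\tt 010}) are not given in this
-section; they are taken here as assumptions from the opcode map and
-instruction listings, and the function names are hypothetical.

```python
LOAD_FP = 0b0000111    # assumed major opcode value for LOAD-FP
STORE_FP = 0b0100111   # assumed major opcode value for STORE-FP
WIDTH_W = 0b010        # assumed width encoding for 32-bit accesses


def _imm12(offset: int) -> int:
    """Check and wrap a signed 12-bit byte offset into its 12-bit encoding."""
    if not -2048 <= offset <= 2047:
        raise ValueError("offset does not fit in 12 signed bits")
    return offset & 0xFFF


def encode_flw(rd: int, rs1: int, offset: int) -> int:
    """Encode FLW rd, offset(rs1) in the I-type layout shown above."""
    return (_imm12(offset) << 20) | (rs1 << 15) | (WIDTH_W << 12) | (rd << 7) | LOAD_FP


def encode_fsw(rs2: int, rs1: int, offset: int) -> int:
    """Encode FSW rs2, offset(rs1); the offset is split across two fields."""
    imm = _imm12(offset)
    return ((imm >> 5) << 25) | (rs2 << 20) | (rs1 << 15) | \
           (WIDTH_W << 12) | ((imm & 0x1F) << 7) | STORE_FP
```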
-
-FLW and FSW are only guaranteed to execute atomically if the effective address
-is naturally aligned.
-
-FLW and FSW do not modify the bits being transferred; in particular, the
-payloads of non-canonical NaNs are preserved.
-
-As described in Section~\ref{sec:rv32:ldst}, the EEI defines whether
-misaligned floating-point loads and stores are handled invisibly or raise
-a contained or fatal trap.
-
-\section{Single-Precision Floating-Point Computational Instructions}
-\label{sec:single-float-compute}
-
-Floating-point arithmetic instructions with one or two source operands use the
-R-type format with the OP-FP major opcode. FADD.S and FMUL.S perform
-single-precision floating-point addition and multiplication respectively,
-between {\em rs1} and {\em rs2}. FSUB.S performs the single-precision
-floating-point subtraction of {\em rs2} from {\em rs1}. FDIV.S performs the
-single-precision floating-point division of {\em rs1} by {\em rs2}. FSQRT.S
-computes the square root of {\em rs1}. In each case, the result is written to
-{\em rd}.
-
-The 2-bit floating-point format field {\em fmt} is encoded as shown in
-Table~\ref{tab:fmt}. It is set to {\em S} (00) for all instructions in
-the F extension.
-
-\begin{table}[htp]
-\begin{small}
-\begin{center}
-\begin{tabular}{|c|c|l|}
-\hline
-{\em fmt} field &
-Mnemonic &
-Meaning \\
-\hline
-00 & S & 32-bit single-precision \\
-01 & D & 64-bit double-precision \\
-10 & H & 16-bit half-precision \\
-11 & Q & 128-bit quad-precision \\
-\hline
-\end{tabular}
-\end{center}
-\end{small}
-\caption{Format field encoding.}
-\label{tab:fmt}
-\end{table}
-
-All floating-point operations that perform rounding can select the
-rounding mode using the {\em rm} field with the encoding shown in
-Table~\ref{rm}.
-
-Floating-point minimum-number and maximum-number instructions FMIN.S and
-FMAX.S write, respectively, the smaller or larger of {\em rs1} and {\em rs2}
-to {\em rd}. For the purposes of these instructions only, the value $-0.0$ is
-considered to be less than the value $+0.0$. If both inputs are NaNs, the
-result is the canonical NaN. If only one operand is a NaN, the result is the
-non-NaN operand. Signaling NaN inputs set the invalid operation exception flag,
-even when the result is not NaN.
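-
-The rules above can be modeled on raw 32-bit patterns as in the
-following sketch.  It is illustrative only; the helper names are
-hypothetical, and each function returns the result pattern together
-with the exception flags it would raise.

```python
CANONICAL_NAN = 0x7FC00000
NV = 0x10                       # invalid operation flag


def is_nan(b: int) -> bool:
    return (b & 0x7F800000) == 0x7F800000 and (b & 0x007FFFFF) != 0


def is_snan(b: int) -> bool:
    return is_nan(b) and not (b & 0x00400000)   # quiet bit clear


def _order_key(b: int) -> int:
    """Integer key ordered like the float value, with -0.0 below +0.0."""
    magnitude = b & 0x7FFFFFFF
    return -magnitude - 1 if b >> 31 else magnitude


def _min_max(a: int, b: int, pick_a) -> tuple:
    flags = NV if is_snan(a) or is_snan(b) else 0
    if is_nan(a) and is_nan(b):
        return CANONICAL_NAN, flags
    if is_nan(a):
        return b, flags             # only one NaN: return the other operand
    if is_nan(b):
        return a, flags
    return (a if pick_a(_order_key(a), _order_key(b)) else b), flags


def fmin_s(a: int, b: int) -> tuple:
    return _min_max(a, b, lambda ka, kb: ka <= kb)


def fmax_s(a: int, b: int) -> tuple:
    return _min_max(a, b, lambda ka, kb: ka >= kb)
```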
-
-\begin{commentary}
-Note that in version 2.2 of the F extension, the FMIN.S and FMAX.S
-instructions were amended to implement the proposed IEEE 754-201x
-minimumNumber and maximumNumber operations, rather than the IEEE 754-2008
-minNum and maxNum operations. These operations differ in their handling of
-signaling NaNs.
-\end{commentary}
-
-\vspace{-0.2in}
-\begin{center}
-\begin{tabular}{R@{}F@{}R@{}R@{}F@{}R@{}O}
-\\
-\instbitrange{31}{27} &
-\instbitrange{26}{25} &
-\instbitrange{24}{20} &
-\instbitrange{19}{15} &
-\instbitrange{14}{12} &
-\instbitrange{11}{7} &
-\instbitrange{6}{0} \\
-\hline
-\multicolumn{1}{|c|}{funct5} &
-\multicolumn{1}{c|}{fmt} &
-\multicolumn{1}{c|}{rs2} &
-\multicolumn{1}{c|}{rs1} &
-\multicolumn{1}{c|}{rm} &
-\multicolumn{1}{c|}{rd} &
-\multicolumn{1}{c|}{opcode} \\
-\hline
-5 & 2 & 5 & 5 & 3 & 5 & 7 \\
-FADD/FSUB & S & src2 & src1 & RM & dest & OP-FP \\
-FMUL/FDIV & S & src2 & src1 & RM & dest & OP-FP \\
-FSQRT & S & 0 & src & RM & dest & OP-FP \\
-FMIN-MAX & S & src2 & src1 & MIN/MAX & dest & OP-FP \\
-\end{tabular}
-\end{center}
-
-Floating-point fused multiply-add instructions require a new standard
-instruction format. R4-type instructions specify three source
-registers ({\em rs1}, {\em rs2}, and {\em rs3}) and a destination
-register ({\em rd}). This format is only used by the floating-point
-fused multiply-add instructions.
-
-FMADD.S multiplies the values in {\em
-rs1} and {\em rs2}, adds the value in {\em rs3}, and writes the final
-result to {\em rd}. FMADD.S computes {\em (rs1$\times$rs2)+rs3}.
-
-FMSUB.S multiplies the values in {\em rs1} and {\em rs2}, subtracts
-the value in {\em rs3}, and writes the final result to {\em rd}.
-FMSUB.S computes {\em (rs1$\times$rs2)-rs3}.
-
-FNMSUB.S multiplies the
-values in {\em rs1} and {\em rs2}, negates the product, adds the value
-in {\em rs3}, and writes the final result to {\em rd}. FNMSUB.S
-computes {\em -(rs1$\times$rs2)+rs3}.
-
-FNMADD.S multiplies the values
-in {\em rs1} and {\em rs2}, negates the product, subtracts the value
-in {\em rs3}, and writes the final result to {\em rd}. FNMADD.S
-computes {\em -(rs1$\times$rs2)-rs3}.
-
-\begin{commentary}
-The FNMSUB and FNMADD instructions are counterintuitively named, owing to the
-naming of the corresponding instructions in MIPS-IV. The MIPS instructions
-were defined to negate the sum, rather than negating the product as the
-RISC-V instructions do, so the naming scheme was more rational at the time.
-The two definitions differ with respect to signed-zero results. The RISC-V
-definition matches the behavior of the x86 and ARM fused multiply-add
-instructions, but unfortunately the RISC-V FNMSUB and FNMADD instruction
-names are swapped compared to x86 and ARM.
-\end{commentary}
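-
-The signed-zero difference between negating the product and negating
-the sum can be seen in the sketch below.  Python arithmetic is not
-fused, so the operands are chosen such that every intermediate result
-is exact; the function names are hypothetical.

```python
import math


def negate_product_nmsub(a: float, b: float, c: float) -> float:
    """RISC-V FNMSUB definition: -(a*b) + c."""
    return -(a * b) + c


def negate_sum_nmsub(a: float, b: float, c: float) -> float:
    """Negate-the-sum definition: -((a*b) - c)."""
    return -((a * b) - c)


def sign_of(x: float) -> float:
    """Return +1.0 or -1.0 according to the sign bit, even for zeros."""
    return math.copysign(1.0, x)
```

-With {\em a}, {\em b}, and {\em c} all equal to 1.0 under
-round-to-nearest, the first form yields $+0.0$ and the second yields
-$-0.0$.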
-
-\vspace{-0.2in}
-\begin{center}
-\begin{tabular}{R@{}F@{}R@{}R@{}F@{}R@{}O}
-\\
-\instbitrange{31}{27} &
-\instbitrange{26}{25} &
-\instbitrange{24}{20} &
-\instbitrange{19}{15} &
-\instbitrange{14}{12} &
-\instbitrange{11}{7} &
-\instbitrange{6}{0} \\
-\hline
-\multicolumn{1}{|c|}{rs3} &
-\multicolumn{1}{c|}{fmt} &
-\multicolumn{1}{c|}{rs2} &
-\multicolumn{1}{c|}{rs1} &
-\multicolumn{1}{c|}{rm} &
-\multicolumn{1}{c|}{rd} &
-\multicolumn{1}{c|}{opcode} \\
-\hline
-5 & 2 & 5 & 5 & 3 & 5 & 7 \\
-src3 & S & src2 & src1 & RM & dest & F[N]MADD/F[N]MSUB \\
-\end{tabular}
-\end{center}
-
-\begin{commentary}
- The fused multiply-add (FMA) instructions consume a large part of the
- 32-bit instruction encoding space. Some alternatives considered were
- to restrict FMA to only use dynamic rounding modes, but static
- rounding modes are useful in code that exploits the lack of product
- rounding. Another alternative would have been to use rd to provide
- rs3, but this would require additional move instructions in some
- common sequences. The current design still leaves a large portion of
- the 32-bit encoding space open while avoiding having FMA be
- non-orthogonal.
-\end{commentary}
-
-The fused multiply-add instructions must set the invalid operation exception flag
-when the multiplicands are $\infty$ and zero, even when the addend is a quiet
-NaN.
-\begin{commentary}
-The IEEE 754-2008 standard permits, but does not require, raising the
-invalid exception for the operation \mbox{$\infty\times 0\ +$ qNaN}.
-\end{commentary}
-
-\section{Single-Precision Floating-Point Conversion and Move \mbox{Instructions}}
-
-Floating-point-to-integer and integer-to-floating-point conversion
-instructions are encoded in the OP-FP major opcode space.
-FCVT.W.S or FCVT.L.S converts a floating-point number
-in floating-point register {\em rs1} to a signed 32-bit or 64-bit
-integer, respectively, in integer register {\em rd}. FCVT.S.W
-or FCVT.S.L converts a 32-bit or 64-bit signed integer,
-respectively, in integer register {\em rs1} into a floating-point
-number in floating-point register {\em rd}. FCVT.WU.S,
-FCVT.LU.S, FCVT.S.WU, and FCVT.S.LU variants
-convert to or from unsigned integer values.
-For XLEN$>32$, FCVT.W[U].S sign-extends the 32-bit result to the
-destination register width.
-FCVT.L[U].S and FCVT.S.L[U] are RV64-only instructions.
-If the rounded result is not representable in the destination format,
-it is clipped to the nearest value and the invalid flag is set.
-Table~\ref{tab:int_conv} gives the range of valid inputs for FCVT.{\em int}.S
-and the behavior for invalid inputs.
-
-\begin{table}[htp]
-\begin{small}
-\begin{center}
-\begin{tabular}{|l|r|r|r|r|}
-\hline
- & FCVT.W.S & FCVT.WU.S & FCVT.L.S & FCVT.LU.S \\
-\hline
-Minimum valid input (after rounding) & $-2^{31}$ & 0 & $-2^{63}$ & 0 \\
-Maximum valid input (after rounding) & $2^{31}-1$ & $2^{32}-1$ & $2^{63}-1$ & $2^{64}-1$ \\
-\hline
-Output for out-of-range negative input & $-2^{31}$ & 0 & $-2^{63}$ & 0 \\
-Output for $-\infty$ & $-2^{31}$ & 0 & $-2^{63}$ & 0 \\
-Output for out-of-range positive input & $2^{31}-1$ & $2^{32}-1$ & $2^{63}-1$ & $2^{64}-1$ \\
-Output for $+\infty$ or NaN & $2^{31}-1$ & $2^{32}-1$ & $2^{63}-1$ & $2^{64}-1$ \\
-\hline
-\end{tabular}
-\end{center}
-\end{small}
-\caption{Domains of float-to-integer conversions and behavior for invalid inputs.}
-\label{tab:int_conv}
-\end{table}
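-
-The clipping behavior in Table~\ref{tab:int_conv} can be modeled as
-in the following sketch.  It is illustrative only: the function name
-is hypothetical, and it handles just the round-to-nearest, ties-to-even
-mode, relying on Python's built-in {\tt round()} for that rounding.

```python
import math

NV, NX = 0x10, 0x01             # invalid and inexact flags


def fcvt_int_s(x: float, bits: int = 32, signed: bool = True) -> tuple:
    """Model FCVT.{W,WU,L,LU}.S under RNE; return (integer result, flags)."""
    lo = -(1 << (bits - 1)) if signed else 0
    hi = (1 << (bits - 1)) - 1 if signed else (1 << bits) - 1
    if math.isnan(x) or x == math.inf:
        return hi, NV               # +infinity and NaN give the maximum value
    if x == -math.inf:
        return lo, NV
    rounded = round(x)              # ties to even, returns an int
    if rounded < lo:
        return lo, NV               # validity is judged after rounding
    if rounded > hi:
        return hi, NV
    return rounded, (NX if rounded != x else 0)
```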
-
-All floating-point-to-integer and integer-to-floating-point conversion
-instructions round according to the {\em rm} field. A floating-point register
-can be initialized to floating-point positive zero using FCVT.S.W {\em rd},
-{\tt x0}, which will never set any exception flags.
-
-All floating-point conversion instructions set the Inexact exception flag if
-the rounded result differs from the operand value and the Invalid exception
-flag is not set.
-
-\vspace{-0.2in}
-\begin{center}
-\begin{tabular}{R@{}F@{}R@{}R@{}F@{}R@{}O}
-\\
-\instbitrange{31}{27} &
-\instbitrange{26}{25} &
-\instbitrange{24}{20} &
-\instbitrange{19}{15} &
-\instbitrange{14}{12} &
-\instbitrange{11}{7} &
-\instbitrange{6}{0} \\
-\hline
-\multicolumn{1}{|c|}{funct5} &
-\multicolumn{1}{c|}{fmt} &
-\multicolumn{1}{c|}{rs2} &
-\multicolumn{1}{c|}{rs1} &
-\multicolumn{1}{c|}{rm} &
-\multicolumn{1}{c|}{rd} &
-\multicolumn{1}{c|}{opcode} \\
-\hline
-5 & 2 & 5 & 5 & 3 & 5 & 7 \\
-FCVT.{\em int}.{\em fmt} & S & W[U]/L[U] & src & RM & dest & OP-FP \\
-FCVT.{\em fmt}.{\em int} & S & W[U]/L[U] & src & RM & dest & OP-FP \\
-\end{tabular}
-\end{center}
-
-Floating-point to floating-point sign-injection instructions, FSGNJ.S,
-FSGNJN.S, and FSGNJX.S, produce a result that takes all bits except
-the sign bit from {\em rs1}. For FSGNJ, the result's sign bit is {\em
- rs2}'s sign bit; for FSGNJN, the result's sign bit is the opposite
-of {\em rs2}'s sign bit; and for FSGNJX, the sign bit is the XOR of
-the sign bits of {\em rs1} and {\em rs2}. Sign-injection instructions
-do not set floating-point exception flags, nor do they canonicalize
-NaNs. Note, FSGNJ.S {\em rx, ry,
- ry} moves {\em ry} to {\em rx} (assembler pseudoinstruction FMV.S {\em rx,
- ry}); FSGNJN.S {\em rx, ry, ry} moves the negation of {\em ry} to
-{\em rx} (assembler pseudoinstruction FNEG.S {\em rx, ry}); and FSGNJX.S {\em rx,
- ry, ry} moves the absolute value of {\em ry} to {\em rx} (assembler
-pseudoinstruction FABS.S {\em rx, ry}).
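-
-\begin{commentary}
-The pseudoinstruction identities above can be checked directly. As a sketch,
-write $s_1$ and $s_2$ for the sign bits of {\em rs1} and {\em rs2} (notation
-introduced here for illustration only) and $s_d$ for the sign bit of the result:
-\[
-\begin{array}{ll}
-\mbox{FSGNJ.S:} & s_d = s_2 \\
-\mbox{FSGNJN.S:} & s_d = 1 - s_2 \\
-\mbox{FSGNJX.S:} & s_d = s_1 \oplus s_2
-\end{array}
-\]
-When both sources are the same register {\em ry} with sign bit $s$, these reduce
-to $s_d = s$ (a move), $s_d = 1 - s$ (a negation), and $s_d = s \oplus s = 0$
-(an absolute value).
-\end{commentary}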
-
-\vspace{-0.2in}
-\begin{center}
-\begin{tabular}{R@{}F@{}R@{}R@{}F@{}R@{}O}
-\\
-\instbitrange{31}{27} &
-\instbitrange{26}{25} &
-\instbitrange{24}{20} &
-\instbitrange{19}{15} &
-\instbitrange{14}{12} &
-\instbitrange{11}{7} &
-\instbitrange{6}{0} \\
-\hline
-\multicolumn{1}{|c|}{funct5} &
-\multicolumn{1}{c|}{fmt} &
-\multicolumn{1}{c|}{rs2} &
-\multicolumn{1}{c|}{rs1} &
-\multicolumn{1}{c|}{rm} &
-\multicolumn{1}{c|}{rd} &
-\multicolumn{1}{c|}{opcode} \\
-\hline
-5 & 2 & 5 & 5 & 3 & 5 & 7 \\
-FSGNJ & S & src2 & src1 & J[N]/JX & dest & OP-FP \\
-\end{tabular}
-\end{center}
-
-\begin{commentary}
-The sign-injection instructions
-provide floating-point MV, ABS, and NEG,
-as well as supporting a few other operations, including the IEEE copySign
-operation and sign manipulation in transcendental math function
-libraries. Although MV, ABS, and NEG only need a single register
-operand, whereas FSGNJ instructions need two, it is unlikely most
-microarchitectures would add optimizations to benefit from the reduced
-number of register reads for these relatively infrequent instructions.
-Even in this case, a microarchitecture can simply detect when both
-source registers are the same for FSGNJ instructions and only read a
-single copy.
-\end{commentary}
-
-Instructions are provided to move bit patterns between the
-floating-point and integer registers. FMV.X.W moves the
-single-precision value in floating-point register {\em rs1}
-represented in IEEE 754-2008 encoding to the lower 32 bits of integer
-register {\em rd}. The bits are not
-modified in the transfer, and in particular, the payloads of
-non-canonical NaNs are preserved.
-For RV64, the higher 32 bits of the destination
-register are filled with copies of the floating-point number's sign
-bit.
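-
-\begin{commentary}
-As an illustrative sketch of the RV64 behavior (the symbols $b$ and $s$ are
-introduced here for exposition), let $b$ be the 32-bit pattern read from
-{\em rs1}, interpreted as an unsigned integer, and let $s$ be its bit 31. The
-64-bit pattern written to {\em rd} by FMV.X.W is then
-\[
-b + s \cdot (2^{64} - 2^{32}).
-\]
-For instance, $-1.0$ has the single-precision encoding {\tt 0xBF800000}, so
-on RV64 FMV.X.W produces {\tt 0xFFFFFFFFBF800000}, while $+1.0$
-({\tt 0x3F800000}) produces {\tt 0x000000003F800000}.
-\end{commentary}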
-
-FMV.W.X moves the single-precision value encoded in IEEE
-754-2008 standard encoding from the lower 32 bits of integer register
-{\em rs1} to the floating-point register {\em rd}. The bits are not
-modified in the transfer, and in particular, the payloads of
-non-canonical NaNs are preserved.
-
-\begin{commentary}
-The FMV.W.X and FMV.X.W instructions were previously called FMV.S.X
-and FMV.X.S. The use of W is more consistent with their semantics as
-an instruction that moves 32 bits without interpreting them. This
-became clearer after defining NaN-boxing. To avoid disturbing
-existing code, both the W and S versions will be supported by tools.
-\end{commentary}
-
-\vspace{-0.2in}
-\begin{center}
-\begin{tabular}{R@{}F@{}R@{}R@{}F@{}R@{}O}
-\\
-\instbitrange{31}{27} &
-\instbitrange{26}{25} &
-\instbitrange{24}{20} &
-\instbitrange{19}{15} &
-\instbitrange{14}{12} &
-\instbitrange{11}{7} &
-\instbitrange{6}{0} \\
-\hline
-\multicolumn{1}{|c|}{funct5} &
-\multicolumn{1}{c|}{fmt} &
-\multicolumn{1}{c|}{rs2} &
-\multicolumn{1}{c|}{rs1} &
-\multicolumn{1}{c|}{rm} &
-\multicolumn{1}{c|}{rd} &
-\multicolumn{1}{c|}{opcode} \\
-\hline
-5 & 2 & 5 & 5 & 3 & 5 & 7 \\
-FMV.X.W & S & 0 & src & 000 & dest & OP-FP \\
-FMV.W.X & S & 0 & src & 000 & dest & OP-FP \\
-\end{tabular}
-\end{center}
-
-\begin{commentary}
-The base floating-point ISA was defined so as to allow implementations
-to employ an internal recoding of the floating-point format in
-registers to simplify handling of subnormal values and possibly to
-reduce functional unit latency. To this end, the F extension avoids
-representing integer values in the floating-point registers by
-defining conversion and comparison operations that read and write the
-integer register file directly. This also removes many of the common
-cases where explicit moves between integer and floating-point
-registers are required, reducing instruction count and critical paths
-for common mixed-format code sequences.
-\end{commentary}
-
-\section{Single-Precision Floating-Point Compare Instructions}
-
-Floating-point compare instructions (FEQ.S, FLT.S, FLE.S) perform the
-specified comparison between floating-point registers ($\mbox{\em rs1}
-= \mbox{\em rs2}$, $\mbox{\em rs1} < \mbox{\em rs2}$, $\mbox{\em rs1} \leq
-\mbox{\em rs2}$) writing 1 to the integer register {\em rd} if the condition
-holds, and 0 otherwise.
-
-FLT.S and FLE.S perform what the IEEE 754-2008 standard refers to as {\em
-signaling} comparisons: that is, they set the invalid operation exception flag
-if either input is NaN. FEQ.S performs a {\em quiet} comparison: it only
-sets the invalid operation exception flag if either input is a signaling NaN.
-For all three instructions,
-the result is 0 if either operand is NaN.
-
-\vspace{-0.2in}
-\begin{center}
-\begin{tabular}{S@{}F@{}R@{}R@{}F@{}R@{}O}
-\\
-\instbitrange{31}{27} &
-\instbitrange{26}{25} &
-\instbitrange{24}{20} &
-\instbitrange{19}{15} &
-\instbitrange{14}{12} &
-\instbitrange{11}{7} &
-\instbitrange{6}{0} \\
-\hline
-\multicolumn{1}{|c|}{funct5} &
-\multicolumn{1}{c|}{fmt} &
-\multicolumn{1}{c|}{rs2} &
-\multicolumn{1}{c|}{rs1} &
-\multicolumn{1}{c|}{rm} &
-\multicolumn{1}{c|}{rd} &
-\multicolumn{1}{c|}{opcode} \\
-\hline
-5 & 2 & 5 & 5 & 3 & 5 & 7 \\
-FCMP & S & src2 & src1 & EQ/LT/LE & dest & OP-FP \\
-\end{tabular}
-\end{center}
-
-\begin{commentary}
-The F extension provides a $\leq$ comparison, whereas the base ISAs provide
-a $\geq$ branch comparison. Because $\leq$ can be synthesized from $\geq$ and
-vice versa, there is no performance implication to this inconsistency, but it
-is nevertheless an unfortunate incongruity in the ISA.
-\end{commentary}
-
-\section{Single-Precision Floating-Point Classify Instruction}
-
-The FCLASS.S instruction examines the value in floating-point register {\em
-rs1} and writes to integer register {\em rd} a 10-bit mask that indicates
-the class of the floating-point number. The format of the mask is
-described in Table~\ref{tab:fclass}. The corresponding bit in {\em rd} will
-be set if the property is true and clear otherwise. All other bits in
-{\em rd} are cleared. Note that exactly one bit in {\em rd} will be set.
-FCLASS.S does not set the floating-point exception flags.
-
-\vspace{-0.2in}
-\begin{center}
-\begin{tabular}{S@{}F@{}R@{}R@{}F@{}R@{}O}
-\\
-\instbitrange{31}{27} &
-\instbitrange{26}{25} &
-\instbitrange{24}{20} &
-\instbitrange{19}{15} &
-\instbitrange{14}{12} &
-\instbitrange{11}{7} &
-\instbitrange{6}{0} \\
-\hline
-\multicolumn{1}{|c|}{funct5} &
-\multicolumn{1}{c|}{fmt} &
-\multicolumn{1}{c|}{rs2} &
-\multicolumn{1}{c|}{rs1} &
-\multicolumn{1}{c|}{rm} &
-\multicolumn{1}{c|}{rd} &
-\multicolumn{1}{c|}{opcode} \\
-\hline
-5 & 2 & 5 & 5 & 3 & 5 & 7 \\
-FCLASS & S & 0 & src & 001 & dest & OP-FP \\
-\end{tabular}
-\end{center}
-
-\begin{table}[htp]
-\begin{small}
-\begin{center}
-\begin{tabular}{|c|l|}
-\hline
-{\em rd} bit &
-Meaning \\
-\hline
-0 & {\em rs1} is $-\infty$. \\
-1 & {\em rs1} is a negative normal number. \\
-2 & {\em rs1} is a negative subnormal number. \\
-3 & {\em rs1} is $-0$. \\
-4 & {\em rs1} is $+0$. \\
-5 & {\em rs1} is a positive subnormal number. \\
-6 & {\em rs1} is a positive normal number. \\
-7 & {\em rs1} is $+\infty$. \\
-8 & {\em rs1} is a signaling NaN. \\
-9 & {\em rs1} is a quiet NaN. \\
-\hline
-\end{tabular}
-\end{center}
-\end{small}
-\caption{Format of result of FCLASS instruction.}
-\label{tab:fclass}
-\end{table}
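-
-\begin{commentary}
-As an informal example of reading Table~\ref{tab:fclass}: because exactly one
-bit is set, the value written to {\em rd} is $2^{k}$ for the matching bit
-position $k$ (the symbol $k$ is introduced here for illustration). An operand
-of $-0$ yields $2^{3} = 8$, a positive normal number yields $2^{6} = 64$, and a
-quiet NaN yields $2^{9} = 512$.
-\end{commentary}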
diff --git a/src/history.tex b/src/history.tex
deleted file mode 100644
index 0e6e816..0000000
--- a/src/history.tex
+++ /dev/null
@@ -1,403 +0,0 @@
-\chapter{History and Acknowledgments}
-\label{history}
-
-\section{``Why Develop a new ISA?'' Rationale from Berkeley Group}
-
-We developed RISC-V to support our own needs in research and
-education, where our group is particularly interested in actual
-hardware implementations of research ideas (we have completed eleven
-different silicon fabrications of RISC-V since the first edition of
-this specification), and in providing real implementations for
-students to explore in classes (RISC-V processor RTL designs have been
-used in multiple undergraduate and graduate classes at Berkeley). In
-our current research, we are especially interested in the move towards
-specialized and heterogeneous accelerators, driven by the power
-constraints imposed by the end of conventional transistor scaling. We
-wanted a highly flexible and extensible base ISA around which to build
-our research effort.
-
-A question we have been repeatedly asked is ``Why develop a new ISA?''
-The most obvious benefit of using an existing commercial ISA is the
-large and widely supported software ecosystem, both development tools
-and ported applications, which can be leveraged in research and
-teaching. Other benefits include the existence of large amounts of
-documentation and tutorial examples. However, our experience of using
-commercial instruction sets for research and teaching is that these
-benefits are smaller in practice, and do not outweigh the
-disadvantages:
-
-\begin{itemize}
-\item {\bf Commercial ISAs are proprietary.} Except for SPARC V8,
- which is an open IEEE standard~\cite{sparcieee1994}, most owners of
- commercial ISAs carefully guard their intellectual property and do
- not welcome freely available competitive implementations. This is
- much less of an issue for academic research and teaching using only
- software simulators, but has been a major concern for groups wishing
- to share actual RTL implementations. It is also a major concern for
- entities who do not want to trust the few sources of commercial ISA
- implementations, but who are prohibited from creating their own
- clean room implementations. We cannot guarantee that all RISC-V
- implementations will be free of third-party patent infringements,
- but we can guarantee we will not attempt to sue a RISC-V
- implementor.
-
-\item {\bf Commercial ISAs are only popular in certain market
- domains.} The most obvious examples at time of writing are that
- the ARM architecture is not well supported in the server space, and
- the Intel x86 architecture (or for that matter, almost every other
- architecture) is not well supported in the mobile space, though both
- Intel and ARM are attempting to enter each other's market segments.
- Another example is ARC and Tensilica, which provide extensible cores
- but are focused on the embedded space. This market segmentation
- dilutes the benefit of supporting a particular commercial ISA as in
- practice the software ecosystem only exists for certain domains, and
- has to be built for others.
-
-\item {\bf Commercial ISAs come and go.} Previous research
- infrastructures have been built around commercial ISAs that are no
- longer popular (SPARC, MIPS) or even no longer in production
- (Alpha). These lose the benefit of an active software ecosystem,
- and the lingering intellectual property issues around the ISA and
- supporting tools interfere with the ability of interested third
- parties to continue supporting the ISA. An open ISA might also lose
- popularity, but any interested party can continue using and
- developing the ecosystem.
-
-\item {\bf Popular commercial ISAs are complex.} The dominant
- commercial ISAs (x86 and ARM) are both very complex to implement in
- hardware to the level of supporting common software stacks and
- operating systems. Worse, nearly all the complexity is due to bad,
- or at least outdated, ISA design decisions rather than features that
- truly improve efficiency.
-
-\item {\bf Commercial ISAs alone are not enough to bring up
- applications.} Even if we expend the effort to implement a
- commercial ISA, this is not enough to run existing applications for
- that ISA. Most applications need a complete ABI (application binary
- interface) to run, not just the user-level ISA. Most ABIs rely on
- libraries, which in turn rely on operating system support. To run an
- existing operating system requires implementing the supervisor-level
- ISA and device interfaces expected by the OS. These are usually
- much less well-specified and considerably more complex to
- implement than the user-level ISA.
-
-\item {\bf Popular commercial ISAs were not designed for extensibility.} The
- dominant commercial ISAs were not particularly designed for
- extensibility, and as a consequence have added considerable
- instruction encoding complexity as their instruction sets have
- grown. Companies such as Tensilica (acquired by Cadence) and ARC
- (acquired by Synopsys) have built ISAs and toolchains around
- extensibility, but have focused on embedded applications rather than
- general-purpose computing systems.
-
-\item {\bf A modified commercial ISA is a new ISA.} One of our main
- goals is to support architecture research, including major ISA
- extensions. Even small extensions diminish the benefit of using a
- standard ISA, as compilers have to be modified and applications
- rebuilt from source code to use the extension. Larger extensions
- that introduce new architectural state also require modifications to
- the operating system. Ultimately, the modified commercial ISA
- becomes a new ISA, but carries along all the legacy baggage of the
- base ISA.
-\end{itemize}
-
-Our position is that the ISA is perhaps the most important interface
-in a computing system, and there is no reason that such an important
-interface should be proprietary. The dominant commercial ISAs are
-based on instruction-set concepts that were already well known over 30
-years ago. Software developers should be able to target an open
-standard hardware target, and commercial processor designers should
-compete on implementation quality.
-
-We are far from the first to contemplate an open ISA design suitable
-for hardware implementation. We also considered other existing open
-ISA designs, of which the closest to our goals was the OpenRISC
-architecture~\cite{openriscarch}. We decided against adopting the
-OpenRISC ISA for several technical reasons:
-
-\begin{itemize}
-\item OpenRISC has condition codes and branch delay slots, which
- complicate higher performance implementations.
-\item OpenRISC uses a fixed 32-bit encoding and 16-bit immediates,
- which precludes a denser instruction encoding and limits space for
- later expansion of the ISA.
-\item OpenRISC does not support the 2008 revision to the IEEE 754
- floating-point standard.
-\item The OpenRISC 64-bit design had not been completed when we began.
-\end{itemize}
-
-By starting from a clean slate, we could design an ISA that met all of
-our goals, though of course, this took far more effort than we had
-planned at the outset. We have now invested considerable effort in
-building up the RISC-V ISA infrastructure, including documentation,
-compiler tool chains, operating system ports, reference ISA
-simulators, FPGA implementations, efficient ASIC implementations,
-architecture test suites, and teaching materials. Since the last
-edition of this manual, there has been considerable uptake of the
-RISC-V ISA in both academia and industry, and we have created the
-non-profit RISC-V Foundation to protect and promote the standard. The
-RISC-V Foundation website at \url{https://riscv.org} contains the latest
-information on the Foundation membership and various open-source
-projects using RISC-V.
-
-
-\section{History from Revision 1.0 of ISA manual}
-
-The RISC-V ISA and instruction-set manual builds upon several earlier
-projects. Several aspects of the supervisor-level machine and the
-overall format of the manual date back to the T0 (Torrent-0) vector
-microprocessor project at UC Berkeley and ICSI, begun in 1992. T0 was
-a vector processor based on the MIPS-II ISA, with Krste Asanovi\'{c}
-as main architect and RTL designer, and Brian Kingsbury and Bertrand
-Irissou as principal VLSI implementors. David Johnson at ICSI was a
-major contributor to the T0 ISA design, particularly supervisor mode,
-and to the manual text. John Hauser also provided considerable
-feedback on the T0 ISA design.
-
-The Scale (Software-Controlled Architecture for Low Energy) project at
-MIT, begun in 2000, built upon the T0 project infrastructure, refined
-the supervisor-level interface, and moved away from the MIPS scalar
-ISA by dropping the branch delay slot. Ronny Krashinsky and
-Christopher Batten were the principal architects of the Scale
-Vector-Thread processor at MIT, while Mark Hampton ported the
-GCC-based compiler infrastructure and tools for Scale.
-
-A lightly edited version of the T0 MIPS scalar processor specification
-(MIPS-6371) was used in teaching a new version of the MIT 6.371
-Introduction to VLSI Systems class in the Fall 2002 semester, with
-Chris Terman and Krste Asanovi\'{c} as lecturers. Chris Terman
-contributed most of the lab material for the class (there was no
-TA!). The 6.371 class evolved into the trial 6.884 Complex Digital
-Design class at MIT, taught by Arvind and Krste Asanovi\'{c} in Spring
-2005, which became a regular Spring class 6.375. A reduced version of
-the Scale MIPS-based scalar ISA, named SMIPS, was used in 6.884/6.375.
-Christopher Batten was the TA for the early offerings of these classes
-and developed a considerable amount of documentation and lab material
-based around the SMIPS ISA. This same SMIPS lab material was adapted
-and enhanced by TA Yunsup Lee for the UC Berkeley Fall 2009 CS250 VLSI
-Systems Design class taught by John Wawrzynek, Krste Asanovi\'{c}, and
-John Lazzaro.
-
-The Maven (Malleable Array of Vector-thread ENgines) project was a
-second-generation vector-thread architecture. Its design was led by
-Christopher Batten when he was an Exchange Scholar at UC Berkeley starting
-in summer 2007. Hidetaka Aoki, a visiting industrial fellow from
-Hitachi, gave considerable feedback on the early Maven ISA and
-microarchitecture design. The Maven infrastructure was based on the
-Scale infrastructure but the Maven ISA moved further away from the
-MIPS ISA variant defined in Scale, with a unified floating-point and
-integer register file. Maven was designed to support experimentation
-with alternative data-parallel accelerators. Yunsup Lee was the main
-implementor of the various Maven vector units, while Rimas Avi\v{z}ienis
-was the main implementor of the various Maven scalar units.
-Yunsup Lee and Christopher Batten ported GCC to work with the new
-Maven ISA. Christopher Celio provided the initial definition of a
-traditional vector instruction set (``Flood'') variant of Maven.
-
-Based on experience with all these previous projects, the RISC-V ISA
-definition was begun in Summer 2010, with Andrew Waterman, Yunsup Lee,
-Krste Asanovi\'{c}, and David Patterson as principal designers.
-An initial version of the RISC-V
-32-bit instruction subset was used in the UC Berkeley Fall 2010 CS250
-VLSI Systems Design class, with Yunsup Lee as TA. RISC-V is a clean
-break from the earlier MIPS-inspired designs. John Hauser contributed
-to the floating-point ISA definition, including the sign-injection
-instructions and a register encoding scheme that permits
-internal recoding of floating-point values.
-
-\section{History from Revision 2.0 of ISA manual}
-
-Multiple implementations of RISC-V processors have been completed,
-including several silicon fabrications, as shown in
-Table~\ref{silicon}.
-
-\begin{table*}[!h]
-\begin{center}
-\begin{tabular}{|l|r|l|l|}
-\hline
-\multicolumn{1}{|c|}{Name} & \multicolumn{1}{|c|}{Tapeout Date} & \multicolumn{1}{|c|}{Process} & \multicolumn{1}{|c|}{ISA} \\ \hline
-\hline
-Raven-1 & May 29, 2011 & ST 28nm FDSOI & RV64G1\_Xhwacha1 \\ \hline
-EOS14 & April 1, 2012 & IBM 45nm SOI & RV64G1p1\_Xhwacha2 \\ \hline
-EOS16 & August 17, 2012 & IBM 45nm SOI & RV64G1p1\_Xhwacha2 \\ \hline
-Raven-2 & August 22, 2012 & ST 28nm FDSOI & RV64G1p1\_Xhwacha2 \\ \hline
-EOS18 & February 6, 2013 & IBM 45nm SOI & RV64G1p1\_Xhwacha2 \\ \hline
-EOS20 & July 3, 2013 & IBM 45nm SOI & RV64G1p99\_Xhwacha2 \\ \hline
-Raven-3 & September 26, 2013 & ST 28nm SOI & RV64G1p99\_Xhwacha2 \\ \hline
-EOS22 & March 7, 2014 & IBM 45nm SOI & RV64G1p9999\_Xhwacha3 \\ \hline
-\end{tabular}
-\end{center}
-\vspace{-0.15in}
-\caption{Fabricated RISC-V testchips.}
-\label{silicon}
-\end{table*}
-
-The first RISC-V processors to be fabricated were written in Verilog and
-manufactured in a pre-production \wunits{28}{nm} FDSOI technology from
-ST as the Raven-1 testchip in 2011. Two cores were developed by Yunsup
-Lee and Andrew Waterman, advised by Krste Asanovi\'{c}, and fabricated
-together: 1) an RV64 scalar core with error-detecting flip-flops, and 2)
-an RV64 core with an attached 64-bit floating-point vector unit. The
-first microarchitecture was informally known as ``TrainWreck'', due to
-the short time available to complete the design with immature design
-libraries.
-
-Subsequently, a clean microarchitecture for an in-order decoupled RV64
-core was developed by Andrew Waterman, Rimas Avi\v{z}ienis, and Yunsup
-Lee, advised by Krste Asanovi\'{c}, and, continuing the railway theme,
-was codenamed ``Rocket'' after George Stephenson's successful steam
-locomotive design. Rocket was written in Chisel, a new hardware
-design language developed at UC Berkeley. The IEEE floating-point
-units used in Rocket were developed by John Hauser, Andrew
-Waterman, and Brian Richards.
-Rocket has since been refined and developed further, and has been
-fabricated two more times in \wunits{28}{nm} FDSOI (Raven-2, Raven-3),
-and five times in IBM \wunits{45}{nm} SOI technology (EOS14, EOS16,
-EOS18, EOS20, EOS22) for a photonics project. Work is ongoing to make
-the Rocket design available as a parameterized RISC-V processor
-generator.
-
-EOS14--EOS22 chips include early versions of Hwacha, a 64-bit IEEE
-floating-point vector unit, developed by Yunsup Lee, Andrew Waterman,
-Huy Vo, Albert Ou, Quan Nguyen, and Stephen Twigg, advised by Krste
-Asanovi\'{c}. EOS16--EOS22 chips include dual cores with a
-cache-coherence protocol developed by Henry Cook and Andrew Waterman,
-advised by Krste Asanovi\'{c}. EOS14 silicon has successfully run at
-\wunits{1.25}{GHz}. EOS16 silicon suffered from a bug in the IBM pad
-libraries. EOS18 and EOS20 have successfully run at \wunits{1.35}{GHz}.
-
-Contributors to the Raven testchips include Yunsup Lee, Andrew Waterman,
-Rimas Avi\v{z}ienis, Brian Zimmer, Jaehwa Kwak, Ruzica Jevti\'{c},
-Milovan Blagojevi\'{c}, Alberto Puggelli, Steven Bailey, Ben Keller,
-Pi-Feng Chiu, Brian Richards, Borivoje Nikoli\'{c}, and Krste
-Asanovi\'{c}.
-
-Contributors to the EOS testchips include Yunsup Lee, Rimas
-Avi\v{z}ienis, Andrew Waterman, Henry Cook, Huy Vo, Daiwei Li, Chen Sun,
-Albert Ou, Quan Nguyen, Stephen Twigg, Vladimir Stojanovi\'{c}, and
-Krste Asanovi\'{c}.
-
-Andrew Waterman and Yunsup Lee developed the C++ ISA simulator
-``Spike'', used as a golden model in development and named after the
-golden spike used to celebrate completion of the US transcontinental
-railway. Spike has been made available as a BSD open-source project.
-
-Andrew Waterman completed a Master's thesis with a preliminary design
-of the RISC-V compressed instruction set~\cite{waterman-ms}.
-
-Various FPGA implementations of RISC-V have been completed,
-primarily as part of integrated demos for the Par Lab project research
-retreats. The largest FPGA design has 3 cache-coherent RV64IMA
-processors running a research operating system. Contributors to the
-FPGA implementations include Andrew Waterman, Yunsup Lee, Rimas
-Avi\v{z}ienis, and Krste Asanovi\'{c}.
-
-RISC-V processors have been used in several classes at UC Berkeley.
-Rocket was used in the Fall 2011 offering of CS250 as a basis for class
-projects, with Brian Zimmer as TA. For the undergraduate CS152 class in
-Spring 2012, Christopher Celio used Chisel to write a suite of educational
-RV32 processors, named ``Sodor'' after the island on which ``Thomas the
-Tank Engine'' and friends live. The suite includes a microcoded core,
-an unpipelined core, and 2, 3, and 5-stage pipelined cores, and is
-publicly available under a BSD license. The suite was subsequently
-updated and used again in CS152 in Spring 2013, with Yunsup Lee as TA,
-and in Spring 2014, with Eric Love as TA.
-Christopher Celio also developed an out-of-order RV64 design known as BOOM
-(Berkeley Out-of-Order Machine), with accompanying pipeline
-visualizations, that was used in the CS152 classes. The CS152 classes
-also used cache-coherent versions of the Rocket core developed by Andrew
-Waterman and Henry Cook.
-
-Over the summer of 2013, the RoCC (Rocket Custom Coprocessor)
-interface was defined to simplify adding custom accelerators to the
-Rocket core. Rocket and the RoCC interface were used extensively in
-the Fall 2013 CS250 VLSI class taught by Jonathan Bachrach, with
-several student accelerator projects built to the RoCC interface. The
-Hwacha vector unit has been rewritten as a RoCC coprocessor.
-
-Two Berkeley undergraduates, Quan Nguyen and Albert Ou,
-successfully ported Linux to run on RISC-V in Spring 2013.
-
-Colin Schmidt successfully completed an LLVM backend for RISC-V 2.0 in
-January 2014.
-
-Darius Rad at Bluespec contributed soft-float ABI support to the GCC port in
-March 2014.
-
-John Hauser contributed the definition of the floating-point classification
-instructions.
-
-We are aware of several other RISC-V core implementations, including
-one in Verilog by Tommy Thorn, and one in Bluespec by Rishiyur Nikhil.
-
-\section*{Acknowledgments}
-
-Thanks to Christopher F. Batten, Preston Briggs, Christopher Celio, David
-Chisnall, Stefan Freudenberger, John Hauser, Ben Keller, Rishiyur
-Nikhil, Michael Taylor, Tommy Thorn, and Robert Watson for comments on
-the draft ISA version 2.0 specification.
-
-\section{History from Revision 2.1}
-
-Uptake of the RISC-V ISA has been very rapid since the introduction of
-the frozen version 2.0 in May 2014, with too much activity to record
-in a short history section such as this. Perhaps the most important
-single event was the formation of the non-profit RISC-V Foundation in
-August 2015. The Foundation will now take over stewardship of the
-official RISC-V ISA standard, and the official website {\tt riscv.org}
-is the best place to obtain news and updates on the RISC-V standard.
-
-\section*{Acknowledgments}
-
-Thanks to Scott Beamer, Allen J. Baum, Christopher Celio, David Chisnall,
-Paul Clayton, Palmer Dabbelt, Jan Gray, Michael Hamburg, and John
-Hauser for comments on the version 2.0 specification.
-
-\section{History from Revision 2.2}
-
-
-\section*{Acknowledgments}
-
-Thanks to Jacob Bachmeyer, Alex Bradbury, David Horner, Stefan O'Rear,
-and Joseph Myers for comments on the version 2.1 specification.
-
-\section{History for Revision 2.3}
-
-Uptake of RISC-V continues at breakneck pace.
-
-John Hauser and Andrew Waterman contributed a hypervisor ISA extension
-based upon a proposal from Paolo Bonzini.
-
-Daniel Lustig, Arvind, Krste Asanovi\'{c}, Shaked Flur, Paul Loewenstein, Yatin
-Manerkar, Luc Maranget, Margaret Martonosi, Vijayanand Nagarajan, Rishiyur
-Nikhil, Jonas Oberhauser, Christopher Pulte, Jose Renau, Peter Sewell, Susmit
-Sarkar, Caroline Trippel, Muralidaran Vijayaraghavan, Andrew Waterman, Derek
-Williams, Andrew Wright, and Sizhuo Zhang contributed the memory consistency
-model.
-
-\section{Funding}
-
-Development of the RISC-V architecture and implementations has been
-partially funded by the following sponsors.
-\begin{itemize}
-
-\item {\bf Par Lab:} Research supported by Microsoft (Award \#024263) and Intel (Award
- \#024894) funding and by matching funding by U.C. Discovery
- (Award \#DIG07-10227). Additional support came from Par Lab
- affiliates Nokia, NVIDIA, Oracle, and Samsung.
-
-\item {\bf Project Isis:} DoE Award DE-SC0003624.
-
-\item {\bf ASPIRE Lab}: DARPA PERFECT program, Award
- HR0011-12-2-0016. DARPA POEM program Award HR0011-11-C-0100. The
- Center for Future Architectures Research (C-FAR), a STARnet center
- funded by the Semiconductor Research Corporation. Additional
- support from ASPIRE industrial sponsor, Intel, and ASPIRE
- affiliates, Google, Hewlett Packard Enterprise, Huawei, Nokia,
- NVIDIA, Oracle, and Samsung.
-
-\end{itemize}
-
-The content of this paper does not necessarily reflect the position or the
-policy of the US government and no official endorsement should be
-inferred.
diff --git a/src/intro.tex b/src/intro.tex
deleted file mode 100644
index 7a74ab7..0000000
--- a/src/intro.tex
+++ /dev/null
@@ -1,770 +0,0 @@
-\chapter{Introduction}
-
-RISC-V (pronounced ``risk-five'') is a new instruction-set
-architecture (ISA) that was originally designed to support computer
-architecture research and education, but which we now hope will also
-become a standard free and open architecture for industry
-implementations. Our goals in defining RISC-V include:
-\vspace{-0.1in}
-\begin{itemize}
-\parskip 0pt
-\itemsep 1pt
-\item A completely {\em open} ISA that is freely available to
- academia and industry.
-\item A {\em real} ISA suitable for direct native hardware implementation,
- not just simulation or binary translation.
-\item An ISA that avoids ``over-architecting'' for a particular
- microarchitecture style (e.g., microcoded, in-order, decoupled,
- out-of-order) or implementation technology (e.g., full-custom, ASIC,
- FPGA), but which allows efficient implementation in any of these.
-\item An ISA separated into a {\em small} base integer ISA, usable by
- itself as a base for customized accelerators or for educational
- purposes, and optional standard extensions, to support
- general-purpose software development.
-\item Support for the revised 2008 IEEE-754 floating-point standard~\cite{ieee754-2008}.
-\item An ISA supporting extensive ISA extensions and
- specialized variants.
-\item Both 32-bit and 64-bit address space variants for
- applications, operating system kernels, and hardware implementations.
-\item An ISA with support for highly parallel multicore
- or manycore implementations, including heterogeneous multiprocessors.
-\item Optional {\em variable-length instructions} to both expand available
- instruction encoding space and to support an optional {\em dense
- instruction encoding} for improved performance, static code size,
- and energy efficiency.
-\item A fully virtualizable ISA to ease hypervisor development.
-\item An ISA that simplifies experiments with new privileged architecture designs.
-\end{itemize}
-\vspace{-0.1in}
-
-\begin{commentary}
- Commentary on our design decisions is formatted as in this
- paragraph. This non-normative text can be skipped if the reader is
- only interested in the specification itself.
-\end{commentary}
-\begin{commentary}
-The name RISC-V was chosen to represent the fifth major RISC ISA
-design from UC Berkeley (RISC-I~\cite{riscI-isca1981},
-RISC-II~\cite{Katevenis:1983}, SOAR~\cite{Ungar:1984}, and
-SPUR~\cite{spur-jsscc1989} were the first four). We also pun on the
-use of the Roman numeral ``V'' to signify ``variations'' and
-``vectors'', as support for a range of architecture research,
-including various data-parallel accelerators, is an explicit goal of
-the ISA design.
-\end{commentary}
-
-The RISC-V ISA is defined avoiding implementation details as much as
-possible (although commentary is included on implementation-driven
-decisions) and should be read as the software-visible interface to a
-wide variety of implementations rather than as the design of a
-particular hardware artifact. The RISC-V manual is structured in two
-volumes. This volume covers the design of the base {\em unprivileged}
-instructions, including optional unprivileged ISA extensions.
-Unprivileged instructions are those that are generally usable in all
-privilege modes in all privileged architectures, though behavior might
-vary depending on privilege mode and privilege architecture. The
-second volume provides the design of the first (``classic'')
-privileged architecture. The manuals use IEC 80000-13:2008
-conventions, with a byte of 8 bits.
-
-\begin{commentary}
-In the unprivileged ISA design, we tried to remove any dependence on
-particular microarchitectural features, such as cache line size, or on
-privileged architecture details, such as page translation. This is
-both for simplicity and to allow maximum flexibility for alternative
-microarchitectures or alternative privileged architectures.
-\end{commentary}
-
-
-\section{RISC-V Hardware Platform Terminology}
-
-A RISC-V hardware platform can contain one or more RISC-V-compatible
-processing cores together with other non-RISC-V-compatible cores,
-fixed-function accelerators, various physical memory structures, I/O
-devices, and an interconnect structure to allow the components to
-communicate.
-
-A component is termed a {\em core} if it contains an independent
-instruction fetch unit. A RISC-V-compatible core might support
-multiple RISC-V-compatible hardware threads, or {\em harts}, through
-multithreading.
-
-A RISC-V core might have additional specialized instruction-set
-extensions or an added {\em coprocessor}. We use the term {\em
- coprocessor} to refer to a unit that is attached to a RISC-V core
-and is mostly sequenced by a RISC-V instruction stream, but which
-contains additional architectural state and instruction-set
-extensions, and possibly some limited autonomy relative to the
-primary RISC-V instruction stream.
-
-We use the term {\em accelerator} to refer to either a
-non-programmable fixed-function unit or a core that can operate
-autonomously but is specialized for certain tasks. In RISC-V systems,
-we expect many programmable accelerators will be RISC-V-based cores
-with specialized instruction-set extensions and/or customized
-coprocessors. I/O accelerators, which offload I/O processing tasks
-from the main application cores, are an important class of RISC-V
-accelerators.
-
-The system-level organization of a RISC-V hardware platform can range
-from a single-core microcontroller to a many-thousand-node cluster of
-shared-memory manycore server nodes. Even small systems-on-a-chip
-might be structured as a hierarchy of multicomputers and/or
-multiprocessors to modularize development effort or to provide secure
-isolation between subsystems.
-
-\section{RISC-V Software Execution Environments and Harts}
-
-The behavior of a RISC-V program depends on the execution environment
-in which it runs. A RISC-V execution environment interface (EEI)
-defines the initial state of the program, the number and type of harts
-in the environment including the privilege modes supported by the
-harts, the accessibility and attributes of memory and I/O regions, the
-behavior of all legal instructions executed on each hart (i.e., the
-ISA is one component of the EEI), and the handling of any interrupts
-or exceptions raised during execution including environment calls.
-Examples of EEIs include the Linux application binary interface (ABI)
-and the RISC-V supervisor binary interface (SBI). The implementation
-of a RISC-V execution environment can be pure hardware, pure software,
-or a combination of hardware and software. For example, opcode traps
-and software emulation can be used to implement functionality not
-provided in hardware. Examples of execution environment
-implementations include:
-\begin{itemize}
- \item ``Bare metal'' hardware platforms where harts are directly
- implemented by physical processor threads and instructions have
- full access to the physical address space. The hardware platform
- defines an execution environment that begins at power-on reset.
- \item RISC-V operating systems that provide multiple user-level
- execution environments by multiplexing user-level harts onto
- available physical processor threads and by controlling access to
- memory via virtual memory.
- \item RISC-V hypervisors that provide multiple supervisor-level
- execution environments for guest operating systems.
- \item RISC-V emulators, such as Spike, QEMU or rv8, which emulate
- RISC-V harts on an underlying x86 system, and which can provide
- either a user-level or a supervisor-level execution environment.
-\end{itemize}
-
-\begin{commentary}
- A bare hardware platform can be considered to define an EEI, where
- the accessible harts, memory, and other devices populate the
- environment, and the initial state is that at power-on reset.
- Generally, most software is designed to use a more abstract
- interface to the hardware, as more abstract EEIs provide greater
- portability across different hardware platforms. Often EEIs are
- layered on top of one another, where one higher-level EEI uses
- another lower-level EEI.
-\end{commentary}
-
-From the perspective of software running in a given execution
-environment, a hart is a resource that autonomously fetches and
-executes RISC-V instructions within that execution environment. In
-this respect, a hart behaves like a hardware thread resource even if
-time-multiplexed onto real hardware by the execution environment.
-Some EEIs support the creation and destruction of additional harts,
-for example, via environment calls to fork new harts.
-
-The execution environment is responsible for ensuring the eventual forward
-progress of each of its harts.
-For a given hart, that responsibility is suspended while the hart is
-exercising a mechanism that explicitly waits for an event, such as the
-wait-for-interrupt instruction defined in Volume II of this specification; and
-that responsibility ends if the hart is terminated.
-The following events constitute forward progress:
-\vspace{-0.2in}
-\begin{itemize}
-\parskip 0pt
-\itemsep 1pt
-\item The retirement of an instruction.
-\item A trap, as defined in Section~\ref{sec:trap-defn}.
-\item Any other event defined by an extension to constitute forward progress.
-\end{itemize}
-
-\begin{commentary}
-The term hart was introduced in the work on
-Lithe~\cite{lithe-pan-hotpar09,lithe-pan-pldi10} to provide a term to
-represent an abstract execution resource as opposed to a software
-thread programming abstraction.
-
-The important distinction between a hardware thread (hart) and a
-software thread context is that the software running inside an
-execution environment is not responsible for causing progress of each
-of its harts; that is the responsibility of the outer execution
-environment. So the environment's harts operate like hardware threads
-from the perspective of the software inside the execution environment.
-
-An execution environment implementation might time-multiplex a set of
-guest harts onto fewer host harts provided by its own execution
-environment but must do so in a way that guest harts operate like
-independent hardware threads. In particular, if there are more guest
-harts than host harts then the execution environment must be able to
-preempt the guest harts and must not wait indefinitely for guest
-software on a guest hart to ``yield'' control of the guest hart.
-\end{commentary}
-
-\section{RISC-V ISA Overview}
-
-A RISC-V ISA is defined as a base integer ISA, which must be present
-in any implementation, plus optional extensions to the base ISA. The
-base integer ISAs are very similar to those of the early RISC processors
-except with no branch delay slots and with support for optional
-variable-length instruction encodings. A base is carefully
-restricted to a minimal set of instructions sufficient to provide a
-reasonable target for compilers, assemblers, linkers, and operating
-systems (with additional privileged operations), and so provides
-a convenient ISA and software toolchain ``skeleton'' around which more
-customized processor ISAs can be built.
-
-Although it is convenient to speak of {\em the} RISC-V ISA, RISC-V is
-actually a family of related ISAs, of which there are currently four
-base ISAs. Each base integer instruction set is characterized by the
-width of the integer registers and the corresponding size of the
-address space and by the number of integer registers. There are two
-primary base integer variants, RV32I and RV64I, described in
-Chapters~\ref{rv32} and \ref{rv64}, which provide 32-bit or 64-bit
-address spaces respectively. We use the term XLEN to refer to the
-width of an integer register in bits (either 32 or 64).
-Chapter~\ref{rv32e} describes the RV32E subset variant of the RV32I
-base instruction set, which has been added to support small
-microcontrollers, and which has half the number of integer registers.
-Chapter~\ref{rv128} sketches a future RV128I variant of the base
-integer instruction set supporting a flat 128-bit address space
-(XLEN=128). The base integer instruction sets use a two's-complement
-representation for signed integer values.
-
-\begin{commentary}
-Although 64-bit address spaces are a requirement for larger systems,
-we believe 32-bit address spaces will remain adequate for many
-embedded and client devices for decades to come and will be desirable
-to lower memory traffic and energy consumption. In addition, 32-bit
-address spaces are sufficient for educational purposes. A larger flat
-128-bit address space might eventually be required, so we ensured this
-could be accommodated within the RISC-V ISA framework.
-\end{commentary}
-
-\begin{commentary}
-The four base ISAs in RISC-V are treated as distinct base ISAs. A
-common question is why is there not a single ISA, and in particular,
-why is RV32I not a strict subset of RV64I? Some earlier ISA designs
-(SPARC, MIPS) adopted a strict superset policy when increasing address
-space size to support running existing 32-bit binaries on new 64-bit
-hardware.
-
-The main advantage of explicitly separating base ISAs is that each
-base ISA can be optimized for its needs without needing to support
-all the operations needed for other base ISAs. For example, RV64I can
-omit instructions and CSRs that are only needed to cope with the
-narrower registers in RV32I. The RV32I variants can use encoding
-space otherwise reserved for instructions only required by wider
-address-space variants.
-
-The main disadvantage of not treating the design as a single ISA is
-that it complicates the hardware needed to emulate one base ISA on
-another (e.g., RV32I on RV64I). However, differences in addressing
-and illegal instruction traps generally mean some mode switch would be
-required in hardware in any case even with full superset instruction
-encodings, and the different RISC-V base ISAs are similar enough that
-supporting multiple versions is relatively low cost. Although some
-have proposed that the strict superset design would allow legacy
-32-bit libraries to be linked with 64-bit code, this is impractical,
-even with compatible encodings, due to the differences in
-software calling conventions and system-call interfaces.
-
-The RISC-V privileged architecture provides fields in {\tt
- misa} to control the unprivileged ISA at each level to support emulating
-different base ISAs on the same hardware. We note that newer SPARC
-and MIPS ISA revisions have deprecated support for running 32-bit code
-unchanged on 64-bit systems.
-
-A related question is why there is a different encoding for 32-bit
-adds in RV32I (ADD) and RV64I (ADDW)? The ADDW opcode could be used
-for 32-bit adds in RV32I and ADDD for 64-bit adds in RV64I, instead of
-the existing design which uses the same opcode ADD for 32-bit adds in
-RV32I and 64-bit adds in RV64I with a different opcode ADDW for 32-bit
-adds in RV64I. This would also be more consistent with the use of the
-same LW opcode for 32-bit load in both RV32I and RV64I. The very
-first versions of RISC-V ISA did have a variant of this alternate
-design, but the RISC-V design was changed to the current choice in
-January 2011. Our focus was on supporting 32-bit integers in the
-64-bit ISA, not on providing compatibility with the 32-bit ISA, and the
-motivation was to remove the asymmetry that arose from having only some
-opcodes in RV32I carry a *W suffix (e.g., ADDW, but AND not ANDW). In
-hindsight, this was perhaps not well-justified and a consequence of
-designing both ISAs at the same time as opposed to adding one later to
-sit on top of another, and also from a belief we had to fold platform
-requirements into the ISA spec which would imply that all the RV32I
-instructions would have been required in RV64I. It is too late to
-change the encoding now, but this is also of little practical
-consequence for the reasons stated above.
-
-It has been noted we could enable the *W variants as an extension to
-RV32I systems to provide a common encoding across RV64I and a future
-RV32 variant.
-\end{commentary}
-
-RISC-V has been designed to support extensive customization and
-specialization. Each base integer ISA can be extended with one or
-more optional instruction-set extensions. An extension may be
-categorized as either standard, custom, or non-conforming.
-For this purpose, we divide each RISC-V
-instruction-set encoding space (and related encoding spaces such as
-the CSRs) into three disjoint categories: {\em standard}, {\em
- reserved}, and {\em custom}. Standard extensions and encodings
-are defined by RISC-V International; any extensions not defined by
-RISC-V International are {\em non-standard}.
-Each base ISA and its standard extensions use only standard encodings,
-and shall not conflict with each other in their uses of these encodings.
-Reserved encodings are currently not defined but are saved for future
-standard extensions; once thus used, they become standard encodings.
-Custom encodings shall never be used for standard extensions and are
-made available for vendor-specific non-standard extensions.
-Non-standard extensions are either custom extensions, which use only
-custom encodings, or {\em non-conforming} extensions, which use any
-standard or reserved encoding.
-Instruction-set extensions are generally shared but may provide slightly different
-functionality depending on the base ISA. Chapter~\ref{extensions}
-describes various ways of extending the RISC-V ISA. We have also
-developed a naming convention for RISC-V base instructions and
-instruction-set extensions, described in detail in
-Chapter~\ref{naming}.
-
-To support more general software development, a set of standard
-extensions is defined to provide integer multiply/divide, atomic
-operations, and single- and double-precision floating-point arithmetic.
-The base integer ISA is named ``I'' (prefixed by RV32 or RV64
-depending on integer register width), and contains integer
-computational instructions, integer loads, integer stores, and
-control-flow instructions. The standard integer multiplication and
-division extension is named ``M'', and adds instructions to multiply
-and divide values held in the integer registers. The standard atomic
-instruction extension, denoted by ``A'', adds instructions that
-atomically read, modify, and write memory for inter-processor
-synchronization. The standard single-precision floating-point
-extension, denoted by ``F'', adds floating-point registers,
-single-precision computational instructions, and single-precision
-loads and stores. The standard double-precision floating-point
-extension, denoted by ``D'', expands the floating-point registers, and
-adds double-precision computational instructions, loads, and stores.
-The standard ``C'' compressed instruction extension
-provides narrower 16-bit forms of common instructions.
-
-Beyond the base integer ISA and these standard extensions, we believe
-it is rare that a new instruction will provide a significant benefit
-for all applications, although it may be very beneficial for a certain
-domain. As energy efficiency concerns are forcing greater
-specialization, we believe it is important to simplify the required
-portion of an ISA specification. Whereas other architectures usually
-treat their ISA as a single entity, which changes to a new version as
-instructions are added over time, RISC-V will endeavor to keep the
-base and each standard extension constant over time, and instead layer
-new instructions as further optional extensions. For example, the
-base integer ISAs will continue as fully supported standalone ISAs,
-regardless of any subsequent extensions.
-
-\section{Memory}
-
-A RISC-V hart has a single byte-addressable address space
-of $2^{\text{XLEN}}$ bytes for all memory
-accesses. A {\em word} of memory is defined as \wunits{32}{bits}
-(\wunits{4}{bytes}). Correspondingly, a {\em halfword} is \wunits{16}{bits}
-(\wunits{2}{bytes}), a {\em doubleword} is \wunits{64}{bits}
-(\wunits{8}{bytes}), and a {\em quadword} is \wunits{128}{bits}
-(\wunits{16}{bytes}).
-The memory address space is circular, so that the byte at address
-$2^{\text{XLEN}}-1$ is adjacent to the byte at address zero. Accordingly, memory
-address computations done by the hardware ignore overflow and instead
-wrap around modulo $2^{\text{XLEN}}$.
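As an illustrative sketch only (not part of the specification), the wrap-around rule can be modeled by reducing every computed address modulo $2^{\text{XLEN}}$. The function name `effective_address` and its parameters are hypothetical, chosen for this example.

```python
def effective_address(base: int, offset: int, xlen: int = 32) -> int:
    """Model a hart's address computation: add a (possibly negative)
    offset to a base address, ignore overflow, and wrap modulo 2**xlen."""
    return (base + offset) % (1 << xlen)


# The byte at the top of the address space is adjacent to address zero.
top = (1 << 32) - 1
wrapped = effective_address(top, 1, xlen=32)
```

Python's `%` operator returns a non-negative result for a positive modulus, so negative offsets wrap in the same way as positive ones.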
-
-
-The execution environment determines the mapping of hardware resources into
-a hart's address space.
-Different address ranges of a hart's address space may (1)~be vacant, or
-(2)~contain {\em main memory}, or (3)~contain one or more {\em I/O devices}.
-Reads and writes of I/O devices may have visible side effects, but accesses
-to main memory cannot.
-Although it is possible for the execution environment to call everything in
-a hart's address space an I/O device, it is usually expected that some
-portion will be specified as main memory.
-
-When a RISC-V platform has multiple harts, the address spaces of any two
-harts may be entirely the same, or entirely different, or may be partly
-different but sharing some subset of resources, mapped into the same or
-different address ranges.
-
-\begin{commentary}
-For a purely ``bare metal'' environment, all harts may see an identical
-address space, accessed entirely by physical addresses.
-However, when the execution environment includes an operating system
-employing address translation, it is common for each hart to be given a
-virtual address space that is largely or entirely its own.
-\end{commentary}
-
-Executing each RISC-V machine instruction entails one or more memory
-accesses, subdivided into {\em
-implicit} and {\em explicit} accesses. For each instruction executed, an {\em
-implicit} memory read (instruction fetch) is done to obtain the encoded
-instruction to execute. Many RISC-V instructions perform no further memory
-accesses beyond instruction fetch. Specific load and store instructions
-perform an {\em explicit} read or write of memory at an address determined by
-the instruction. The execution environment may dictate that instruction
-execution performs other {\em implicit} memory accesses (such as to implement
-address translation) beyond those documented for the unprivileged ISA.
-
-The execution environment determines what portions of the
-non-vacant address space are
-accessible for each kind of memory access. For example, the set of locations
-that can be implicitly read for instruction fetch may or may not have any
-overlap with the set of locations that can be explicitly read by a load
-instruction; and the set of locations that can be explicitly written by
-a store instruction may be only a subset of locations that can be read.
-Ordinarily, if an instruction attempts to access memory at an inaccessible
-address, an exception is raised for the instruction.
-Vacant locations in the address space are never accessible.
-
-Except when specified otherwise, implicit reads that do not raise an exception
-may occur arbitrarily early and speculatively, even before the machine could
-possibly prove that the read will be needed. For instance, a valid
-implementation could attempt to read all of main memory at the earliest
-opportunity, cache as many fetchable (executable) bytes as possible for later
-instruction fetches, and avoid reading main memory for instruction fetches ever
-again. To ensure that certain implicit reads are ordered only after writes to
-the same memory locations, software must execute specific fence or cache-control
-instructions defined for this purpose (such as the FENCE.I instruction
-defined in Chapter~\ref{chap:zifencei}).
-
-The memory accesses (implicit or explicit) made by a hart may appear to occur
-in a different order as perceived by another hart or by any other agent that
-can access the same memory. This perceived reordering of memory accesses is
-always constrained, however, by the applicable memory consistency model. The
-default memory consistency model for RISC-V is the RISC-V Weak Memory Ordering
-(RVWMO), defined in Chapter~\ref{ch:memorymodel} and in the appendices.
-Optionally, an implementation may adopt the stronger model of Total Store
-Ordering, as defined in Chapter~\ref{sec:ztso}. The execution environment may
-also add constraints that further limit the perceived reordering of memory
-accesses.
-Since the RVWMO model is the weakest model allowed for any RISC-V
-implementation, software written for this model is compatible with the
-actual memory consistency rules of all RISC-V implementations. As with
-implicit reads, software must execute fence or cache-control instructions to
-ensure specific ordering of memory accesses beyond the requirements of the
-assumed memory consistency model and execution environment.
-
-\section{Base Instruction-Length Encoding}
-
-The base RISC-V ISA has fixed-length 32-bit instructions that must be
-naturally aligned on 32-bit boundaries. However, the standard RISC-V
-encoding scheme is designed to support ISA extensions with
-variable-length instructions, where each instruction can be any number
-of 16-bit instruction {\em parcels} in length and parcels are
-naturally aligned on 16-bit boundaries. The standard compressed ISA
-extension described in Chapter~\ref{compressed} reduces code size by
-providing compressed 16-bit instructions and relaxes the alignment
-constraints to allow all instructions (16 bit and 32 bit) to be
-aligned on any 16-bit boundary to improve code density.
-
-We use the term IALIGN (measured in bits) to refer to the instruction-address
-alignment constraint the implementation enforces. IALIGN is 32 bits in the
-base ISA, but some ISA extensions, including the compressed ISA extension,
-relax IALIGN to 16 bits. IALIGN may not take on any value other than 16 or
-32.
-
-We use the term ILEN (measured in bits) to refer to the maximum
-instruction length supported by an implementation, which is always
-a multiple of IALIGN. For implementations supporting only a base
-instruction set, ILEN is 32 bits. Implementations supporting longer
-instructions have larger values of ILEN.
-
-Figure~\ref{instlengthcode} illustrates the standard RISC-V
-instruction-length encoding convention. All the 32-bit instructions
-in the base ISA have their lowest two bits set to {\tt 11}. The
-optional compressed 16-bit instruction-set extensions have their
-lowest two bits equal to {\tt 00}, {\tt 01}, or {\tt 10}.
-
-\subsection*{Expanded Instruction-Length Encoding}
-
-A portion of the 32-bit instruction-encoding space has been tentatively
-allocated for instructions longer than 32 bits. The entirety of this space is
-reserved at this time, and the following proposal for encoding instructions
-longer than 32 bits is not considered frozen.
-
-Standard instruction-set extensions
-encoded with more than 32 bits have additional low-order bits set to {\tt 1},
-with the conventions for 48-bit and 64-bit lengths shown in
-Figure~\ref{instlengthcode}. Instruction lengths between 80 bits and 176 bits
-are encoded using a 3-bit field in bits [14:12] giving the number of 16-bit
-words in addition to the first 5$\times$16-bit words. The encoding with bits
-[14:12] set to {\tt 111} is reserved for future longer instruction encodings.
-
-
-\begin{figure}[hbt]
-{
-\begin{center}
-\begin{tabular}{ccccl}
-\cline{4-4}
-& & & \multicolumn{1}{|c|}{\tt xxxxxxxxxxxxxxaa} & 16-bit ({\tt aa}
-$\neq$ {\tt 11})\\
-\cline{4-4}
-\\
-\cline{3-4}
-& & \multicolumn{1}{|c|}{\tt xxxxxxxxxxxxxxxx}
-& \multicolumn{1}{c|}{\tt xxxxxxxxxxxbbb11} & 32-bit ({\tt bbb}
-$\neq$ {\tt 111}) \\
-\cline{3-4}
-\\
-\cline{2-4}
-\hspace{0.1in}
-& \multicolumn{1}{c|}{$\cdot\cdot\cdot${\tt xxxx} }
-& \multicolumn{1}{c|}{\tt xxxxxxxxxxxxxxxx}
-& \multicolumn{1}{c|}{\tt xxxxxxxxxx011111} & 48-bit \\
-\cline{2-4}
-\\
-\cline{2-4}
-\hspace{0.1in}
-& \multicolumn{1}{c|}{$\cdot\cdot\cdot${\tt xxxx} }
-& \multicolumn{1}{c|}{\tt xxxxxxxxxxxxxxxx}
-& \multicolumn{1}{c|}{\tt xxxxxxxxx0111111} & 64-bit \\
-\cline{2-4}
-\\
-\cline{2-4}
-\hspace{0.1in}
-& \multicolumn{1}{c|}{$\cdot\cdot\cdot${\tt xxxx} }
-& \multicolumn{1}{c|}{\tt xxxxxxxxxxxxxxxx}
-& \multicolumn{1}{c|}{\tt xnnnxxxxx1111111} & (80+16*{\tt nnn})-bit,
- {\tt nnn}$\neq${\tt 111} \\
-\cline{2-4}
-\\
-\cline{2-4}
-\hspace{0.1in}
-& \multicolumn{1}{c|}{$\cdot\cdot\cdot${\tt xxxx} }
-& \multicolumn{1}{c|}{\tt xxxxxxxxxxxxxxxx}
-& \multicolumn{1}{c|}{\tt x111xxxxx1111111} & Reserved for $\geq$192 bits \\
-\cline{2-4}
-\\
-Byte Address: & \multicolumn{1}{r}{base+4} & \multicolumn{1}{r}{base+2} & \multicolumn{1}{r}{base} & \\
- \end{tabular}
-\end{center}
-}
-\caption{RISC-V instruction length encoding. Only the 16-bit and 32-bit encodings are considered frozen at this time.}
-\label{instlengthcode}
-\end{figure}
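The length-encoding convention in the figure can be sketched as a small decoder that inspects only the first 16-bit parcel. This is an illustration under the assumption that the tentative encodings for lengths above 32 bits are used as drawn; the text above notes that those encodings are not frozen. The function name `instruction_length_bits` is hypothetical.

```python
from typing import Optional


def instruction_length_bits(first_parcel: int) -> Optional[int]:
    """Return the instruction length in bits implied by the first 16-bit
    parcel, or None for the pattern reserved for >=192-bit encodings."""
    p = first_parcel & 0xFFFF
    if p & 0b11 != 0b11:              # aa != 11
        return 16
    if p & 0b11100 != 0b11100:        # bbb != 111
        return 32
    if p & 0b111111 == 0b011111:      # bits [5:0] = 011111
        return 48
    if p & 0b1111111 == 0b0111111:    # bits [6:0] = 0111111
        return 64
    # Bits [6:0] are all ones here; bits [14:12] hold nnn.
    nnn = (p >> 12) & 0b111
    if nnn != 0b111:
        return 80 + 16 * nnn
    return None                        # reserved for longer encodings
```

Because the checks use only the low-order bits of the first parcel (plus bits [14:12] for the longest forms), a fetch unit modeled this way never needs to look at later parcels to find the instruction boundary.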
-
-\begin{commentary}
-Given the code size and energy savings of a compressed format, we
-wanted to build in support for a compressed format to the ISA encoding
-scheme rather than adding this as an afterthought, but to allow
-simpler implementations we didn't want to make the compressed format
-mandatory. We also wanted to optionally allow longer instructions to
-support experimentation and larger instruction-set extensions.
-Although our encoding convention required a tighter encoding of the
-core RISC-V ISA, this has several beneficial effects.
-
-An implementation of the standard IMAFD ISA need only hold the
-most-significant 30 bits in instruction caches (a 6.25\% saving). On
-instruction cache refills, any instructions encountered with either
-low bit clear should be recoded into illegal 30-bit instructions
-before storing in the cache to preserve illegal instruction exception
-behavior.
-
-Perhaps more importantly, by condensing our base ISA into a subset of
-the 32-bit instruction word, we leave more space available for
-non-standard and custom extensions. In particular, the base RV32I ISA
-uses less than 1/8 of the encoding space in the 32-bit instruction
-word. As described in Chapter~\ref{extensions}, an implementation
-that does not require support for the standard compressed instruction
-extension can map 3 additional non-conforming 30-bit instruction
-spaces into the 32-bit fixed-width format, while preserving support
-for standard $\geq$32-bit instruction-set extensions. Further, if the
-implementation also does not need instructions longer than 32 bits,
-it can recover a further four major opcodes for non-conforming extensions.
-\end{commentary}
-
-Encodings with bits [15:0] all zeros are defined as illegal
-instructions. These instructions are considered to be of minimal
-length: 16 bits if any 16-bit instruction-set extension is present,
-otherwise 32 bits. The encoding with bits [ILEN-1:0] all ones is also
-illegal; this instruction is considered to be ILEN bits long.
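A minimal sketch of the two guaranteed-illegal patterns, assuming the fetched bits are held in a Python integer and ILEN is known to the caller. The name `is_defined_illegal` is hypothetical, and the check covers only these two patterns, not other illegal or reserved encodings.

```python
def is_defined_illegal(fetched_bits: int, ilen: int = 32) -> bool:
    """True if the fetched bits match one of the two patterns the text
    defines as illegal: bits [15:0] all zeros, or bits [ILEN-1:0] all ones."""
    low_parcel_zero = (fetched_bits & 0xFFFF) == 0
    ilen_mask = (1 << ilen) - 1
    all_ones = (fetched_bits & ilen_mask) == ilen_mask
    return low_parcel_zero or all_ones
```

Note that the all-ones test depends on ILEN: a 32-bit word of all ones matches when `ilen` is 32 but not when `ilen` is 64.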
-
-\begin{commentary}
-We consider it a feature that any length of instruction containing all
-zero bits is not legal, as this quickly traps erroneous jumps into
-zeroed memory regions. Similarly, we also reserve the instruction
-encoding containing all ones to be an illegal instruction, to catch
-the other common pattern observed with unprogrammed non-volatile
-memory devices, disconnected memory buses, or broken memory devices.
-
-Software can rely on a naturally aligned 32-bit word containing zero to
-act as an illegal instruction on all RISC-V implementations, to be used
-by software where an illegal instruction is explicitly desired.
-Defining a corresponding known illegal value for all ones is more
-difficult due to the variable-length encoding. Software cannot
-generally use the illegal value of ILEN bits of all 1s, as software
-might not know ILEN for the eventual target machine (e.g., if software
-is compiled into a standard binary library used by many different
-machines). Defining a 32-bit word of all ones as illegal was also
-considered, as all machines must support a 32-bit instruction size, but
-this requires the instruction-fetch unit on machines with ILEN$>$32 to
-report an illegal instruction exception rather than an access-fault
-exception when such an instruction borders a protection boundary,
-complicating variable-instruction-length fetch and decode.
-\end{commentary}
-
-RISC-V base ISAs have either little-endian or big-endian memory systems,
-with the privileged architecture further defining bi-endian operation.
-Instructions are stored in memory as a sequence of 16-bit little-endian
-parcels, regardless of memory system endianness.
-Parcels forming one instruction are stored at increasing
-halfword addresses, with the lowest-addressed parcel holding the
-lowest-numbered bits in the instruction specification.
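The parcel layout can be illustrated with a short sketch that serializes an instruction into the bytes that would appear at increasing addresses. The function name `instruction_to_memory_bytes` is hypothetical; the example value `0x00000013` is the standard encoding of `addi x0, x0, 0`.

```python
def instruction_to_memory_bytes(instr: int, length_bits: int) -> bytes:
    """Split an instruction into 16-bit parcels, lowest-numbered bits
    first, and store each parcel as two little-endian bytes."""
    out = bytearray()
    for i in range(length_bits // 16):
        parcel = (instr >> (16 * i)) & 0xFFFF
        out += parcel.to_bytes(2, "little")
    return bytes(out)


nop_bytes = instruction_to_memory_bytes(0x00000013, 32)
```

Since each parcel is little-endian and parcels are stored lowest bits first, the result is the same as writing the whole instruction as one little-endian integer, regardless of the endianness used for data accesses.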
-
-\begin{commentary}
-We originally chose little-endian byte ordering for the RISC-V memory system
-because little-endian systems are currently dominant commercially (all
-x86 systems; iOS, Android, and Windows for ARM). A minor point is
-that we have also found little-endian memory systems to be more
-natural for hardware designers. However, certain application areas,
-such as IP networking, operate on big-endian data structures, and
-certain legacy code bases have been built assuming big-endian
-processors, so we have defined big-endian and bi-endian variants of RISC-V.
-
-We have to fix the order in which instruction parcels are stored in
-memory, independent of memory system endianness, to ensure that the
-length-encoding bits always appear first in halfword address
-order. This allows the length of a variable-length instruction to be
-quickly determined by an instruction-fetch unit by examining only the
-first few bits of the first 16-bit instruction parcel.
-
-We further make the instruction parcels themselves little-endian to decouple
-the instruction encoding from the memory system endianness altogether.
-This design benefits both software tooling and bi-endian hardware.
-Otherwise, for instance, a RISC-V assembler or disassembler would always need
-to know the intended active endianness, even though in bi-endian systems the
-endianness mode might change dynamically during execution.
-In contrast, by giving instructions a fixed endianness, it is sometimes
-possible for carefully written software to be endianness-agnostic even in
-binary form, much like position-independent code.
-
-The choice to have instructions be only little-endian does have consequences,
-however, for RISC-V software that encodes or decodes machine instructions.
-Big-endian JIT compilers, for example, must swap the byte order when storing
-to instruction memory.
-
-Once we had decided to fix on a little-endian instruction encoding, this
-naturally led to placing the length-encoding bits in the LSB positions of the
-instruction format to avoid breaking up opcode fields.
-\end{commentary}
-
-\section{Exceptions, Traps, and Interrupts}
-\label{sec:trap-defn}
-
-We use the term {\em exception} to refer to an unusual condition
-occurring at run time associated with an instruction in the current
-RISC-V hart. We use the term {\em interrupt} to refer to an external
-asynchronous event that may cause a RISC-V hart to experience an
-unexpected transfer of control. We use the term {\em trap} to refer
-to the transfer of control to a trap handler caused by either an
-exception or an interrupt.
-
-The instruction descriptions in the following chapters describe conditions
-that can raise an exception during execution. The general behavior of
-most RISC-V EEIs is that a trap to some handler occurs when an
-exception is signaled on an instruction (except for floating-point
-exceptions, which, in the standard floating-point extensions, do not
-cause traps). The manner in which interrupts are generated, routed
-to, and enabled by a hart depends on the EEI.
-
-\begin{commentary}
-Our use of ``exception'' and ``trap'' is compatible with that in the IEEE-754
-floating-point standard.
-\end{commentary}
-
-How traps are handled and made visible to software running on the hart
-depends on the enclosing execution environment. From the perspective
-of software running inside an execution environment, traps encountered
-by a hart at runtime can have four different effects:
-\begin{description}
- \item[Contained Trap:] The trap is visible to, and handled by,
- software running inside the execution environment. For example,
- in an EEI providing both supervisor and user
- mode on harts, an ECALL by a user-mode hart will generally result
- in a transfer of control to a supervisor-mode handler running on
- the same hart. Similarly, in the same environment, when a hart is
- interrupted, an interrupt handler will be run in supervisor mode
- on the hart.
- \item[Requested Trap:] The trap is a synchronous exception that is
- an explicit call to the execution environment requesting an action
- on behalf of software inside the execution environment. An
- example is a system call. In this case, execution may or may not
- resume on the hart after the requested action is taken by the
- execution environment. For example, a system call could remove the
- hart or cause an orderly termination of the entire execution environment.
- \item[Invisible Trap:] The trap is handled transparently by the
- execution environment and execution resumes normally after the
- trap is handled. Examples include emulating missing instructions,
- handling non-resident page faults in a demand-paged virtual-memory
- system, or handling device interrupts for a different job in a
- multiprogrammed machine. In these cases, the software running
- inside the execution environment is not aware of the trap (we
- ignore timing effects in these definitions).
- \item[Fatal Trap:] The trap represents a fatal failure and causes
- the execution environment to terminate execution. Examples
- include failing a virtual-memory page-protection check or allowing
- a watchdog timer to expire. Each EEI should define how execution
- is terminated and reported to an external environment.
-\end{description}
-
-Table~\ref{table:trapcharacteristics} shows the characteristics of each
-kind of trap.
-
-\begin{table}[hbt]
- \centering
- \begin{tabular}{|l|c|c|c|c|}
- \hline
- & Contained & Requested & Invisible & Fatal\\
- \hline
- Execution terminates & No & No$^{1}$ & No & Yes \\
- Software is oblivious & No & No & Yes & Yes$^{2}$ \\
- Handled by environment & No & Yes & Yes & Yes \\
- \hline
- \end{tabular}
- \caption{Characteristics of traps. Notes: 1) Termination may be
- requested. 2) Imprecise fatal traps might be observable by software.}
-\label{table:trapcharacteristics}
-\end{table}
-
-The EEI defines for each trap whether it is handled precisely, though
-the recommendation is to maintain preciseness where possible.
-Contained and requested traps can be observed to be imprecise by
-software inside the execution environment. Invisible traps, by
-definition, cannot be observed to be precise or imprecise by software
-running inside the execution environment. Fatal traps can be observed
-to be imprecise by software running inside the execution environment,
-if known-errorful instructions do not cause immediate termination.
-
-Because this document describes unprivileged instructions, traps are
-rarely mentioned. Architectural means to handle contained traps are
-defined in the privileged architecture manual, along with other
-features to support richer EEIs. Unprivileged instructions that are
-defined solely to cause requested traps are documented here.
-Invisible traps are, by their nature, out of scope for this document.
-Instruction encodings that are not defined here and not defined by
-some other means may cause a fatal trap.
-
-\section{UNSPECIFIED Behaviors and Values}
-
-The architecture fully describes what implementations must do and any
-constraints on what they may do. In cases where the architecture
-intentionally does not constrain implementations, the term \unspecified\
-is explicitly used.
-
-The term \unspecified\ refers to a behavior or value that is
-intentionally unconstrained. The definition of these behaviors or
-values is open to extensions, platform standards, or implementations.
-Extensions, platform standards, or implementation documentation may
-provide normative content to further constrain cases that the base
-architecture defines as \unspecified.
-
-Like the base architecture, extensions should fully describe allowable
-behavior and values and use the term \unspecified\ for cases that are
-intentionally unconstrained. These cases may be constrained or defined
-by other extensions, platform standards, or implementations.
diff --git a/src/j.tex b/src/j.tex
deleted file mode 100644
index 37a8b37..0000000
--- a/src/j.tex
+++ /dev/null
@@ -1,13 +0,0 @@
-\chapter{``J'' Standard Extension for Dynamically Translated Languages, Version 0.0}
-\label{sec:j}
-
-This chapter is a placeholder for a future standard extension to
-support dynamically translated languages.
-
-\begin{commentary}
- Many popular languages are usually implemented via dynamic
- translation, including Java and JavaScript. These languages can
- benefit from additional ISA support for dynamic checks and garbage
- collection.
-\end{commentary}
-
diff --git a/src/m.tex b/src/m.tex
deleted file mode 100644
index 3aceb8b..0000000
--- a/src/m.tex
+++ /dev/null
@@ -1,188 +0,0 @@
-\chapter{``M'' Standard Extension for Integer Multiplication and
- Division, Version 2.0}
-
-This chapter describes the standard integer multiplication and
-division instruction extension, which is named ``M'' and contains
-instructions that multiply or divide values held in two integer
-registers.
-
-\begin{commentary}
-We separate integer multiply and divide out from the base to simplify
-low-end implementations, or for applications where integer multiply
-and divide operations are either infrequent or better handled in
-attached accelerators.
-\end{commentary}
-
-\section{Multiplication Operations}
-\label{multiplication-operations}
-
-\vspace{-0.2in}
-\begin{center}
-\begin{tabular}{S@{}R@{}R@{}S@{}R@{}O}
-\\
-\instbitrange{31}{25} &
-\instbitrange{24}{20} &
-\instbitrange{19}{15} &
-\instbitrange{14}{12} &
-\instbitrange{11}{7} &
-\instbitrange{6}{0} \\
-\hline
-\multicolumn{1}{|c|}{funct7} &
-\multicolumn{1}{c|}{rs2} &
-\multicolumn{1}{c|}{rs1} &
-\multicolumn{1}{c|}{funct3} &
-\multicolumn{1}{c|}{rd} &
-\multicolumn{1}{c|}{opcode} \\
-\hline
-7 & 5 & 5 & 3 & 5 & 7 \\
-MULDIV & multiplier & multiplicand & MUL/MULH[[S]U] & dest & OP \\
-MULDIV & multiplier & multiplicand & MULW & dest & OP-32 \\
-\end{tabular}
-\end{center}
-
-MUL performs an XLEN-bit$\times$XLEN-bit multiplication
-of {\em rs1} by {\em rs2} and places the
-lower XLEN bits in the destination register. MULH, MULHU, and MULHSU
-perform the same multiplication but return the upper XLEN bits of the
-full 2$\times$XLEN-bit product, for signed$\times$signed,
-unsigned$\times$unsigned, and \wunits{signed}{\em rs1}$\times$\wunits{unsigned}{\em rs2} multiplication,
-respectively. If both the high and low bits of the same product are
-required, then the recommended code sequence is: MULH[[S]U] {\em rdh,
- rs1, rs2}; MUL {\em rdl, rs1, rs2} (source register specifiers must
-be in the same order and {\em rdh} cannot be the same as {\em rs1} or {\em
- rs2}). Microarchitectures can then fuse these into a single
-multiply operation instead of performing two separate multiplies.
-
-\begin{commentary}
-MULHSU is used in multi-word signed multiplication to multiply the
-most-significant word of the multiplicand (which contains the sign bit)
-with the less-significant words of the multiplier (which are unsigned).
-\end{commentary}
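As a rough reference model of the multiplication semantics described above, the following Python sketch computes the low and high halves of the product for an arbitrary XLEN. The helper names and the `xlen` parameter are illustrative assumptions, not part of the specification. Register values are modeled as unsigned integers in the range 0 to 2^XLEN - 1.

```python
def to_signed(value, xlen):
    """Interpret the low `xlen` bits of `value` as a two's-complement integer."""
    value &= (1 << xlen) - 1
    return value - (1 << xlen) if value >> (xlen - 1) else value


def mul(rs1, rs2, xlen=64):
    """Lower XLEN bits of the product (identical for signed and unsigned operands)."""
    return (rs1 * rs2) & ((1 << xlen) - 1)


def mulh(rs1, rs2, xlen=64):
    """Upper XLEN bits of the signed x signed product."""
    product = to_signed(rs1, xlen) * to_signed(rs2, xlen)
    return (product >> xlen) & ((1 << xlen) - 1)


def mulhu(rs1, rs2, xlen=64):
    """Upper XLEN bits of the unsigned x unsigned product."""
    mask = (1 << xlen) - 1
    return ((rs1 & mask) * (rs2 & mask)) >> xlen


def mulhsu(rs1, rs2, xlen=64):
    """Upper XLEN bits of the signed rs1 x unsigned rs2 product."""
    mask = (1 << xlen) - 1
    product = to_signed(rs1, xlen) * (rs2 & mask)
    return (product >> xlen) & mask
```

Python's arbitrary-precision integers and arithmetic right shift make the full 2×XLEN-bit product directly available, so each function is a one-line restatement of the text rather than a model of any particular multiplier design.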
-
-MULW is an RV64 instruction that multiplies the lower 32 bits of the source
-registers, placing the sign-extension of the lower 32 bits of the result
-into the destination register.
-
-\begin{commentary}
-In RV64, MUL can be used to obtain the upper 32 bits of the 64-bit product,
-but signed arguments must be proper 32-bit signed values, whereas unsigned
-arguments must have their upper 32 bits clear. If the
-arguments are not known to be sign- or zero-extended, an alternative is to
-shift both arguments left by 32 bits, then use MULH[[S]U].
-\end{commentary}
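The following Python sketch illustrates MULW and the commentary's remark about using MUL on RV64 to obtain the upper 32 bits of a 64-bit product. The function names are hypothetical, and registers are modeled as unsigned 64-bit integers.

```python
MASK64 = (1 << 64) - 1


def sext32(value):
    """Sign-extend the low 32 bits of `value` to a 64-bit register value."""
    value &= 0xFFFFFFFF
    if value & 0x80000000:
        value -= 1 << 32
    return value & MASK64


def mulw(rs1, rs2):
    """MULW: multiply the low 32 bits of each source, sign-extend the low 32 bits of the result."""
    return sext32((rs1 & 0xFFFFFFFF) * (rs2 & 0xFFFFFFFF))


def mul64(rs1, rs2):
    """MUL on RV64: lower 64 bits of the product."""
    return (rs1 * rs2) & MASK64


# Signed 32-bit operands must be properly sign-extended before using MUL.
minus_two = sext32(0xFFFFFFFE)
signed_high = mul64(minus_two, 3) >> 32

# Unsigned 32-bit operands must have their upper 32 bits clear.
unsigned_high = mul64(0xFFFFFFFF, 0xFFFFFFFF) >> 32
```

Here `signed_high` holds the upper word of -6 as a 64-bit value, and `unsigned_high` holds the upper word of 0xFFFFFFFE00000001.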
-
-\section{Division Operations}
-
-\vspace{-0.2in}
-\begin{center}
-\begin{tabular}{S@{}R@{}R@{}O@{}R@{}O}
-\\
-\instbitrange{31}{25} &
-\instbitrange{24}{20} &
-\instbitrange{19}{15} &
-\instbitrange{14}{12} &
-\instbitrange{11}{7} &
-\instbitrange{6}{0} \\
-\hline
-\multicolumn{1}{|c|}{funct7} &
-\multicolumn{1}{c|}{rs2} &
-\multicolumn{1}{c|}{rs1} &
-\multicolumn{1}{c|}{funct3} &
-\multicolumn{1}{c|}{rd} &
-\multicolumn{1}{c|}{opcode} \\
-\hline
-7 & 5 & 5 & 3 & 5 & 7 \\
-MULDIV & divisor & dividend & DIV[U]/REM[U] & dest & OP \\
-MULDIV & divisor & dividend & DIV[U]W/REM[U]W & dest & OP-32 \\
-\end{tabular}
-\end{center}
-
-DIV and DIVU perform an XLEN bits by XLEN bits signed and unsigned integer
-division of {\em rs1} by {\em rs2}, rounding towards zero.
-REM and REMU provide the remainder of the corresponding division operation.
-For REM, the sign of the result equals the sign of the dividend.
-
-\begin{commentary}
-For both signed and unsigned division, it holds that
-\mbox{$\textrm{dividend} = \textrm{divisor} \times \textrm{quotient} + \textrm{remainder}$}.
-\end{commentary}
-
-If both the quotient and remainder
-are required from the same division, the recommended code sequence is:
-DIV[U] {\em rdq, rs1, rs2}; REM[U] {\em rdr, rs1, rs2} ({\em rdq}
-cannot be the same as {\em rs1} or {\em rs2}). Microarchitectures can
-then fuse these into a single divide operation instead of performing
-two separate divides.
-
-DIVW and DIVUW are RV64 instructions that divide the
-lower 32 bits of {\em rs1} by the lower 32 bits of {\em rs2}, treating
-them as signed and unsigned integers respectively, placing the 32-bit
-quotient in {\em rd}, sign-extended to 64 bits. REMW and REMUW
-are RV64 instructions that provide the corresponding
-signed and unsigned remainder operations respectively. Both REMW and
-REMUW always sign-extend the 32-bit result to 64 bits, including on a
-divide by zero.
-
-The semantics for division by zero and division overflow are summarized in
-Table~\ref{tab:divby0}. The quotient of division by zero has all bits set, and
-the remainder of division by zero equals the dividend. Signed division overflow
-occurs only when the most-negative integer is divided by $-1$. The quotient of
-a signed division with overflow is equal to the dividend, and the remainder is
-zero. Unsigned division overflow cannot occur.
-
-\begin{table}[h]
-\centering
-\begin{tabular}{|l|c|c||c|c|c|c|}
-\hline
-Condition & Dividend & Divisor & DIVU[W] & REMU[W] & DIV[W] & REM[W] \\ \hline
-Division by zero & $x$ & 0 & $2^{L}-1$ & $x$ & $-1$ & $x$ \\
-Overflow (signed only) & $-2^{L-1}$ & $-1$ & -- & -- & $-2^{L-1}$ & 0 \\
-\hline
-\end{tabular}
-\caption{Semantics for division by zero and division overflow.
-L is the width of the operation in bits: XLEN for DIV[U] and REM[U], or
-32 for DIV[U]W and REM[U]W.}
-\label{tab:divby0}
-\end{table}
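The division rules, including the special cases in Table~\ref{tab:divby0}, can be restated as a small Python reference model. This is a sketch under assumptions: the function names and the `xlen` parameter are illustrative, and register values are modeled as unsigned integers of width `xlen`. Passing `xlen=32` models the core of the W variants before their result is sign-extended to 64 bits.

```python
def to_signed(value, xlen):
    """Interpret the low `xlen` bits of `value` as a two's-complement integer."""
    value &= (1 << xlen) - 1
    return value - (1 << xlen) if value >> (xlen - 1) else value


def divu(rs1, rs2, xlen=64):
    mask = (1 << xlen) - 1
    a, b = rs1 & mask, rs2 & mask
    return mask if b == 0 else a // b  # division by zero: all bits set


def remu(rs1, rs2, xlen=64):
    mask = (1 << xlen) - 1
    a, b = rs1 & mask, rs2 & mask
    return a if b == 0 else a % b  # division by zero: dividend


def div(rs1, rs2, xlen=64):
    mask = (1 << xlen) - 1
    a, b = to_signed(rs1, xlen), to_signed(rs2, xlen)
    if b == 0:
        return mask  # -1, i.e. all bits set
    if a == -(1 << (xlen - 1)) and b == -1:
        return a & mask  # signed overflow: quotient equals the dividend
    quotient = abs(a) // abs(b)  # round towards zero
    if (a < 0) != (b < 0):
        quotient = -quotient
    return quotient & mask


def rem(rs1, rs2, xlen=64):
    mask = (1 << xlen) - 1
    a, b = to_signed(rs1, xlen), to_signed(rs2, xlen)
    if b == 0:
        return a & mask  # division by zero: dividend
    if a == -(1 << (xlen - 1)) and b == -1:
        return 0  # signed overflow: remainder is zero
    remainder = abs(a) % abs(b)
    if a < 0:
        remainder = -remainder  # sign of the result follows the dividend
    return remainder & mask
```

With these definitions, dividend = divisor × quotient + remainder holds modulo 2^XLEN in every case, including division by zero and signed overflow.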
-
-\begin{commentary}
-We considered raising exceptions on integer divide by zero, with these
-exceptions causing a trap in most execution environments. However,
-this would be the only arithmetic trap in the standard ISA
-(floating-point exceptions set flags and write default values, but do
-not cause traps) and would require language implementors to interact
-with the execution environment's trap handlers for this case.
-Further, where language standards mandate that a divide-by-zero
-exception must cause an immediate control flow change, only a single
-branch instruction needs to be added to each divide operation, and
-this branch instruction can be inserted after the divide and should
-normally be very predictably not taken, adding little runtime
-overhead.
-
-The value of all bits set is returned for both unsigned and signed
-divide by zero to simplify the divider circuitry. The value of all 1s
-is both the natural value to return for unsigned divide, representing
-the largest unsigned number, and also the natural result for simple
-unsigned divider implementations. Signed division is often
-implemented using an unsigned division circuit and specifying the same
-overflow result simplifies the hardware.
-\end{commentary}
-
-\section{Zmmul Extension, Version 1.0}
-
-The Zmmul extension implements the multiplication subset of the M extension.
-It adds all of the instructions defined in Section~\ref{multiplication-operations},
-namely: MUL, MULH, MULHU, MULHSU, and (for RV64 only) MULW.
-The encodings are identical to those of the corresponding M-extension instructions.
-
-\begin{commentary}
-The Zmmul extension enables low-cost implementations that require
-multiplication operations but not division.
-For many microcontroller applications, division operations are too
-infrequent to justify the cost of divider hardware.
-By contrast, multiplication operations are more frequent, making the cost of
-multiplier hardware more justifiable.
-Simple FPGA soft cores particularly benefit from eliminating division but
-retaining multiplication, since many FPGAs provide hardwired multipliers
-but require dividers be implemented in soft logic.
-\end{commentary}
diff --git a/src/memory-model-alloy.tex b/src/memory-model-alloy.tex
deleted file mode 100644
index c584931..0000000
--- a/src/memory-model-alloy.tex
+++ /dev/null
@@ -1,269 +0,0 @@
-\section{Formal Axiomatic Specification in Alloy}
-\label{sec:alloy}
-
-\lstdefinelanguage{alloy}{
- morekeywords={abstract, sig, extends, pred, fun, fact, no, set, one, lone, let, not, all, iden, some, run, for},
- morecomment=[l]{//},
- morecomment=[s]{/*}{*/},
- commentstyle=\color{green!40!black},
- keywordstyle=\color{blue!40!black},
- moredelim=**[is][\color{red}]{@}{@},
- escapeinside={!}{!},
-}
-\lstset{language=alloy}
-\lstset{aboveskip=0pt}
-\lstset{belowskip=0pt}
-
-We present a formal specification of the RVWMO memory model in Alloy (\url{http://alloy.mit.edu}).
-This model is available online at \url{https://github.com/daniellustig/riscv-memory-model}.
-
-The online material also contains some litmus tests and some examples of how Alloy can be used to model check some of the mappings in Section~\ref{sec:memory:porting}.
-
-\begin{figure}[h!]
- {
- \tt\bfseries\centering\footnotesize
- \begin{lstlisting}
-////////////////////////////////////////////////////////////////////////////////
-// =RVWMO PPO=
-
-// Preserved Program Order
-fun ppo : Event->Event {
- // same-address ordering
- po_loc :> Store
- + rdw
- + (AMO + StoreConditional) <: rfi
-
- // explicit synchronization
- + ppo_fence
- + Acquire <: ^po :> MemoryEvent
- + MemoryEvent <: ^po :> Release
- + RCsc <: ^po :> RCsc
- + pair
-
- // syntactic dependencies
- + addrdep
- + datadep
- + ctrldep :> Store
-
- // pipeline dependencies
- + (addrdep+datadep).rfi
- + addrdep.^po :> Store
-}
-
-// the global memory order respects preserved program order
-fact { ppo in ^gmo }
-\end{lstlisting}}
- \caption{The RVWMO memory model formalized in Alloy (1/5: PPO)}
- \label{fig:alloy1}
-\end{figure}
-\begin{figure}[h!]
- {
- \tt\bfseries\centering\footnotesize
- \begin{lstlisting}
-////////////////////////////////////////////////////////////////////////////////
-// =RVWMO axioms=
-
-// Load Value Axiom
-fun candidates[r: MemoryEvent] : set MemoryEvent {
- (r.~^gmo & Store & same_addr[r]) // writes preceding r in gmo
- + (r.^~po & Store & same_addr[r]) // writes preceding r in po
-}
-
-fun latest_among[s: set Event] : Event { s - s.~^gmo }
-
-pred LoadValue {
- all w: Store | all r: Load |
- w->r in rf <=> w = latest_among[candidates[r]]
-}
-
-// Atomicity Axiom
-pred Atomicity {
- all r: Store.~pair | // starting from the lr,
- no x: Store & same_addr[r] | // there is no store x to the same addr
- x not in same_hart[r] // such that x is from a different hart,
- and x in r.~rf.^gmo // x follows (the store r reads from) in gmo,
- and r.pair in x.^gmo // and r follows x in gmo
-}
-
-// Progress Axiom implicit: Alloy only considers finite executions
-
-pred RISCV_mm { LoadValue and Atomicity /* and Progress */ }
-
-\end{lstlisting}}
- \caption{The RVWMO memory model formalized in Alloy (2/5: Axioms)}
- \label{fig:alloy2}
-\end{figure}
-\begin{figure}[h!]
- {
- \tt\bfseries\centering\footnotesize
- \begin{lstlisting}
-////////////////////////////////////////////////////////////////////////////////
-// Basic model of memory
-
-sig Hart { // hardware thread
- start : one Event
-}
-sig Address {}
-abstract sig Event {
- po: lone Event // program order
-}
-
-abstract sig MemoryEvent extends Event {
- address: one Address,
- acquireRCpc: lone MemoryEvent,
- acquireRCsc: lone MemoryEvent,
- releaseRCpc: lone MemoryEvent,
- releaseRCsc: lone MemoryEvent,
- addrdep: set MemoryEvent,
- ctrldep: set Event,
- datadep: set MemoryEvent,
- gmo: set MemoryEvent, // global memory order
- rf: set MemoryEvent
-}
-sig LoadNormal extends MemoryEvent {} // l{b|h|w|d}
-sig LoadReserve extends MemoryEvent { // lr
- pair: lone StoreConditional
-}
-sig StoreNormal extends MemoryEvent {} // s{b|h|w|d}
-// all StoreConditionals in the model are assumed to be successful
-sig StoreConditional extends MemoryEvent {} // sc
-sig AMO extends MemoryEvent {} // amo
-sig NOP extends Event {}
-
-fun Load : Event { LoadNormal + LoadReserve + AMO }
-fun Store : Event { StoreNormal + StoreConditional + AMO }
-
-sig Fence extends Event {
- pr: lone Fence, // opcode bit
- pw: lone Fence, // opcode bit
- sr: lone Fence, // opcode bit
- sw: lone Fence // opcode bit
-}
-sig FenceTSO extends Fence {}
-
-/* Alloy encoding detail: opcode bits are either set (encoded, e.g.,
- * as f.pr in iden) or unset (f.pr not in iden). The bits cannot be used for
- * anything else */
-fact { pr + pw + sr + sw in iden }
-// likewise for ordering annotations
-fact { acquireRCpc + acquireRCsc + releaseRCpc + releaseRCsc in iden }
-// don't try to encode FenceTSO via pr/pw/sr/sw; just use it as-is
-fact { no FenceTSO.(pr + pw + sr + sw) }
-\end{lstlisting}}
- \caption{The RVWMO memory model formalized in Alloy (3/5: model of memory)}
- \label{fig:alloy3}
-\end{figure}
-
-\begin{figure}[h!]
- {
- \tt\bfseries\centering\footnotesize
- \begin{lstlisting}
-////////////////////////////////////////////////////////////////////////////////
-// =Basic model rules=
-
-// Ordering annotation groups
-fun Acquire : MemoryEvent { MemoryEvent.acquireRCpc + MemoryEvent.acquireRCsc }
-fun Release : MemoryEvent { MemoryEvent.releaseRCpc + MemoryEvent.releaseRCsc }
-fun RCpc : MemoryEvent { MemoryEvent.acquireRCpc + MemoryEvent.releaseRCpc }
-fun RCsc : MemoryEvent { MemoryEvent.acquireRCsc + MemoryEvent.releaseRCsc }
-
-// There is no such thing as store-acquire or load-release, unless it's both
-fact { Load & Release in Acquire }
-fact { Store & Acquire in Release }
-
-// FENCE PPO
-fun FencePRSR : Fence { Fence.(pr & sr) }
-fun FencePRSW : Fence { Fence.(pr & sw) }
-fun FencePWSR : Fence { Fence.(pw & sr) }
-fun FencePWSW : Fence { Fence.(pw & sw) }
-
-fun ppo_fence : MemoryEvent->MemoryEvent {
- (Load <: ^po :> FencePRSR).(^po :> Load)
- + (Load <: ^po :> FencePRSW).(^po :> Store)
- + (Store <: ^po :> FencePWSR).(^po :> Load)
- + (Store <: ^po :> FencePWSW).(^po :> Store)
- + (Load <: ^po :> FenceTSO) .(^po :> MemoryEvent)
- + (Store <: ^po :> FenceTSO) .(^po :> Store)
-}
-
-// auxiliary definitions
-fun po_loc : Event->Event { ^po & address.~address }
-fun same_hart[e: Event] : set Event { e + e.^~po + e.^po }
-fun same_addr[e: Event] : set Event { e.address.~address }
-
-// initial stores
-fun NonInit : set Event { Hart.start.*po }
-fun Init : set Event { Event - NonInit }
-fact { Init in StoreNormal }
-fact { Init->(MemoryEvent & NonInit) in ^gmo }
-fact { all e: NonInit | one e.*~po.~start } // each event is in exactly one hart
-fact { all a: Address | one Init & a.~address } // one init store per address
-fact { no Init <: po and no po :> Init }
-\end{lstlisting}}
- \caption{The RVWMO memory model formalized in Alloy (4/5: Basic model rules)}
- \label{fig:alloy4}
-\end{figure}
-
-\begin{figure}[h!]
- {
- \tt\bfseries\centering\footnotesize
- \begin{lstlisting}
-// po
-fact { acyclic[po] }
-
-// gmo
-fact { total[^gmo, MemoryEvent] } // gmo is a total order over all MemoryEvents
-
-//rf
-fact { rf.~rf in iden } // each read returns the value of only one write
-fact { rf in Store <: address.~address :> Load }
-fun rfi : MemoryEvent->MemoryEvent { rf & (*po + *~po) }
-
-//dep
-fact { no StoreNormal <: (addrdep + ctrldep + datadep) }
-fact { addrdep + ctrldep + datadep + pair in ^po }
-fact { datadep in datadep :> Store }
-fact { ctrldep.*po in ctrldep }
-fact { no pair & (^po :> (LoadReserve + StoreConditional)).^po }
-fact { StoreConditional in LoadReserve.pair } // assume all SCs succeed
-
-// rdw
-fun rdw : Event->Event {
- (Load <: po_loc :> Load) // start with all same_address load-load pairs,
- - (~rf.rf) // subtract pairs that read from the same store,
- - (po_loc.rfi) // and subtract out "fri-rfi" patterns
-}
-
-// filter out redundant instances and/or visualizations
-fact { no gmo & gmo.gmo } // keep the visualization uncluttered
-fact { all a: Address | some a.~address }
-
-////////////////////////////////////////////////////////////////////////////////
-// =Optional: opcode encoding restrictions=
-
-// the list of blessed fences
-fact { Fence in
- Fence.pr.sr
- + Fence.pw.sw
- + Fence.pr.pw.sw
- + Fence.pr.sr.sw
- + FenceTSO
- + Fence.pr.pw.sr.sw
-}
-
-pred restrict_to_current_encodings {
- no (LoadNormal + StoreNormal) & (Acquire + Release)
-}
-
-////////////////////////////////////////////////////////////////////////////////
-// =Alloy shortcuts=
-pred acyclic[rel: Event->Event] { no iden & ^rel }
-pred total[rel: Event->Event, bag: Event] {
- all disj e, e': bag | e->e' in rel + ~rel
- acyclic[rel]
-}
-\end{lstlisting}}
- \caption{The RVWMO memory model formalized in Alloy (5/5: Auxiliaries)}
- \label{fig:alloy5}
-\end{figure}
-
diff --git a/src/memory-model-herd.tex b/src/memory-model-herd.tex
deleted file mode 100644
index de4a59e..0000000
--- a/src/memory-model-herd.tex
+++ /dev/null
@@ -1,160 +0,0 @@
-\section{Formal Axiomatic Specification in Herd}
-\label{sec:herd}
-
-The tool \textsf{herd} takes a memory model and a litmus test as input and simulates the execution of the test on top of the memory model. Memory models are written in the domain-specific language \textsc{Cat}. This section provides two \textsc{Cat} memory models of RVWMO. The first model, Figure~\ref{fig:herd2}, follows the \emph{global memory order}, Chapter~\ref{ch:memorymodel}, definition of~RVWMO, as much as is possible for a \textsc{Cat} model. The second model, Figure~\ref{fig:herd3}, is an equivalent, more efficient, partial-order-based RVWMO model.
-
-The simulator~\textsf{herd} is part of the \textsf{diy} tool suite --- see \url{http://diy.inria.fr} for software and documentation. The models and more are available online at~\url{http://diy.inria.fr/cats7/riscv/}.
-
-\begin{figure}[h!]
- {
- \tt\bfseries\centering\footnotesize
- \begin{lstlisting}
-(*************)
-(* Utilities *)
-(*************)
-
-(* All fence relations *)
-let fence.r.r = [R];fencerel(Fence.r.r);[R]
-let fence.r.w = [R];fencerel(Fence.r.w);[W]
-let fence.r.rw = [R];fencerel(Fence.r.rw);[M]
-let fence.w.r = [W];fencerel(Fence.w.r);[R]
-let fence.w.w = [W];fencerel(Fence.w.w);[W]
-let fence.w.rw = [W];fencerel(Fence.w.rw);[M]
-let fence.rw.r = [M];fencerel(Fence.rw.r);[R]
-let fence.rw.w = [M];fencerel(Fence.rw.w);[W]
-let fence.rw.rw = [M];fencerel(Fence.rw.rw);[M]
-let fence.tso =
- let f = fencerel(Fence.tso) in
- ([W];f;[W]) | ([R];f;[M])
-
-let fence =
- fence.r.r | fence.r.w | fence.r.rw |
- fence.w.r | fence.w.w | fence.w.rw |
- fence.rw.r | fence.rw.w | fence.rw.rw |
- fence.tso
-
-(* Same address, no W to the same address in-between *)
-let po-loc-no-w = po-loc \ (po-loc?;[W];po-loc)
-(* Read same write *)
-let rsw = rf^-1;rf
-(* Acquire, or stronger *)
-let AQ = Acq|AcqRel
-(* Release or stronger *)
-and RL = Rel|AcqRel
-(* All RCsc *)
-let RCsc = Acq|Rel|AcqRel
-(* Amo events are both R and W, relation rmw relates paired lr/sc *)
-let AMO = R & W
-let StCond = range(rmw)
-
-(*************)
-(* ppo rules *)
-(*************)
-
-(* Overlapping-Address Orderings *)
-let r1 = [M];po-loc;[W]
-and r2 = ([R];po-loc-no-w;[R]) \ rsw
-and r3 = [AMO|StCond];rfi;[R]
-(* Explicit Synchronization *)
-and r4 = fence
-and r5 = [AQ];po;[M]
-and r6 = [M];po;[RL]
-and r7 = [RCsc];po;[RCsc]
-and r8 = rmw
-(* Syntactic Dependencies *)
-and r9 = [M];addr;[M]
-and r10 = [M];data;[W]
-and r11 = [M];ctrl;[W]
-(* Pipeline Dependencies *)
-and r12 = [R];(addr|data);[W];rfi;[R]
-and r13 = [R];addr;[M];po;[W]
-
-let ppo = r1 | r2 | r3 | r4 | r5 | r6 | r7 | r8 | r9 | r10 | r11 | r12 | r13
-\end{lstlisting}
- }
- \caption{{\tt riscv-defs.cat}, a herd definition of preserved program order (1/3)}
- \label{fig:herd1}
-\end{figure}
-
-\begin{figure}[ht!]
- {
- \tt\bfseries\centering\footnotesize
- \begin{lstlisting}
-Total
-
-(* Notice that herd has defined its own rf relation *)
-
-(* Define ppo *)
-include "riscv-defs.cat"
-
-(********************************)
-(* Generate global memory order *)
-(********************************)
-
-let gmo0 = (* precursor: i.e., build gmo as a total order that includes gmo0 *)
- loc & (W\FW) * FW | # Final write after any write to the same location
- ppo | # ppo compatible
- rfe # includes herd external rf (optimization)
-
-(* Walk over all linear extensions of gmo0 *)
-with gmo from linearizations(M\IW,gmo0)
-
-(* Add initial writes upfront -- convenient for computing rfGMO *)
-let gmo = gmo | loc & IW * (M\IW)
-
-(**********)
-(* Axioms *)
-(**********)
-
-(* Compute rf according to the load value axiom, aka rfGMO *)
-let WR = loc & ([W];(gmo|po);[R])
-let rfGMO = WR \ (loc&([W];gmo);WR)
-
-(* Check equality of herd rf and of rfGMO *)
-empty (rf\rfGMO)|(rfGMO\rf) as RfCons
-
-(* Atomicity axiom *)
-let infloc = (gmo & loc)^-1
-let inflocext = infloc & ext
-let winside = (infloc;rmw;inflocext) & (infloc;rf;rmw;inflocext) & [W]
-empty winside as Atomic
-\end{lstlisting}
- }
- \caption{{\tt riscv.cat}, a herd version of the RVWMO memory model (2/3)}
- \label{fig:herd2}
-\end{figure}
-
-\begin{figure}[h!]
- {
- \tt\bfseries\centering\footnotesize
- \begin{lstlisting}
-Partial
-
-(***************)
-(* Definitions *)
-(***************)
-
-(* Define ppo *)
-include "riscv-defs.cat"
-
-(* Compute coherence relation *)
-include "cos-opt.cat"
-
-(**********)
-(* Axioms *)
-(**********)
-
-(* Sc per location *)
-acyclic co|rf|fr|po-loc as Coherence
-
-(* Main model axiom *)
-acyclic co|rfe|fr|ppo as Model
-
-(* Atomicity axiom *)
-empty rmw & (fre;coe) as Atomic
-\end{lstlisting}
- }
- \caption{{\tt riscv.cat}, an alternative herd presentation of the RVWMO memory model (3/3)}
- \label{fig:herd3}
-\end{figure}
-
diff --git a/src/naming.tex b/src/naming.tex
deleted file mode 100644
index bfd67d4..0000000
--- a/src/naming.tex
+++ /dev/null
@@ -1,189 +0,0 @@
-\chapter{ISA Extension Naming Conventions}
-\label{naming}
-
-This chapter describes the RISC-V ISA extension naming scheme that is
-used to concisely describe the set of instructions present in a
-hardware implementation, or the set of instructions used by an
-application binary interface (ABI).
-
-\begin{commentary}
-The RISC-V ISA is designed to support a wide variety of
-implementations with various experimental instruction-set extensions.
-We have found that an organized naming scheme simplifies software
-tools and documentation.
-\end{commentary}
-
-\section{Case Sensitivity}
-
-The ISA naming strings are case insensitive.
-
-\section{Base Integer ISA}
-RISC-V ISA strings begin with either RV32I, RV32E, RV64I, or RV128I
-indicating the supported address space size in bits for the base
-integer ISA.
-
-\section{Instruction-Set Extension Names}
-
-Standard ISA extensions are given a name consisting of a single
-letter. For example, the first four standard
-extensions to the integer bases are:
-``M'' for integer multiplication and division,
-``A'' for atomic memory instructions,
-``F'' for single-precision floating-point instructions, and
-``D'' for double-precision floating-point instructions.
-Any RISC-V instruction-set variant can be succinctly described by
-concatenating the base integer prefix with the names of the included
-extensions, e.g., ``RV64IMAFD''.
-
-We have also defined an abbreviation ``G'' to represent the ``IMAFDZicsr\_Zifencei''
-base and extensions, as this is intended to represent our standard
-general-purpose ISA.
-
-Standard extensions to the RISC-V ISA are given other reserved
-letters, e.g., ``Q'' for quad-precision floating-point, or
-``C'' for the 16-bit compressed instruction format.
-
-Some ISA extensions depend on the presence of other extensions, e.g., ``D''
-depends on ``F'' and ``F'' depends on ``Zicsr''. These dependences may be
-implicit in the ISA name: for example, RV32IF is equivalent to RV32IFZicsr,
-and RV32ID is equivalent to RV32IFD and RV32IFDZicsr.
-
-\section{Version Numbers}
-Recognizing that instruction sets may expand or alter over time, we
-encode extension version numbers following the extension name. Version
-numbers are divided into major and minor version numbers, separated by
-a ``p''. If the minor version is ``0'', then ``p0'' can be omitted
-from the version string. Changes in major version numbers imply a
-loss of backwards compatibility, whereas changes in only the minor
-version number must be backwards-compatible. For example, the
-original 64-bit standard ISA defined in release 1.0 of this manual can
-be written in full as ``RV64I1p0M1p0A1p0F1p0D1p0'', more concisely as
-``RV64I1M1A1F1D1''.
-
-We introduced the version numbering scheme with the second release. Hence, we
-define the default version of a standard extension to be the version present at that
-time, e.g., ``RV32I'' is equivalent to ``RV32I2''.
-
-\section{Underscores}
-
-Underscores ``\_'' may be used to separate ISA extensions to
-improve readability and to provide disambiguation, e.g., ``RV32I2\_M2\_A2''.
-
-Because the ``P'' extension for Packed SIMD can be confused for the decimal
-point in a version number, it must be preceded by an underscore if it follows
-a number. For example, ``rv32i2p2'' means version 2.2 of RV32I, whereas
-``rv32i2\_p2'' means version 2.0 of RV32I with version 2.0 of the P extension.
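The version and underscore rules can be illustrated with a minimal Python parser. It is a sketch under assumptions: the function name and the returned tuple layout are hypothetical, and only the base width plus single-letter names (the base letter and single-letter extensions) are handled. Multi-letter names with the ``Z'', ``S'', or ``X'' prefixes are out of scope. The parser is also lenient: it does not reject a ``P'' that directly follows a number without the required underscore.

```python
import re

# One single-letter name with an optional version: a major number,
# optionally followed by "p" and a minor number. A leading underscore is allowed.
_TOKEN = re.compile(r"_?([a-z])(?:(\d+)(?:p(\d+))?)?")


def parse_isa_string(isa):
    """Return (xlen, [(letter, major, minor), ...]); major/minor are None when no version is given."""
    text = isa.lower()  # ISA naming strings are case insensitive
    base = re.match(r"rv(32|64|128)", text)
    if base is None:
        raise ValueError("ISA string must start with RV32, RV64, or RV128")
    xlen = int(base.group(1))
    position = base.end()
    extensions = []
    while position < len(text):
        token = _TOKEN.match(text, position)
        if token is None:
            raise ValueError("unsupported token at offset %d" % position)
        letter, major, minor = token.groups()
        if major is None:
            extensions.append((letter, None, None))  # default version
        else:
            # An omitted minor version is equivalent to "p0".
            extensions.append((letter, int(major), int(minor) if minor else 0))
        position = token.end()
    return xlen, extensions


version_2_2 = parse_isa_string("rv32i2p2")
with_p_extension = parse_isa_string("rv32i2_p2")
```

Because the pattern greedily treats a ``p'' plus digits after a major number as the minor version, `rv32i2p2` parses as version 2.2 of RV32I, while the underscore in `rv32i2_p2` forces the ``P'' to start a new extension.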
-
-\section{Additional Standard Extension Names}
-
-Standard extensions can also be named using a single ``Z'' followed by an
-alphabetical name and an optional version number. For example,
-``Zifencei'' names the instruction-fetch fence extension described in
-Chapter~\ref{chap:zifencei}; ``Zifencei2'' and ``Zifencei2p0'' name version
-2.0 of same.
-
-The first letter following the ``Z'' conventionally indicates the most closely
-related alphabetical extension category, IMAFDQCVH. For the ``Zam''
-extension for misaligned atomics, for example, the letter ``a'' indicates the
-extension is related to the ``A'' standard extension. If multiple ``Z''
-extensions are named, they should be ordered first by category, then
-alphabetically within a category---for example, ``Zicsr\_Zifencei\_Zam''.
-
-Extensions with the ``Z'' prefix must be separated
-from other multi-letter extensions by an underscore, e.g.,
-``RV32IMACZicsr\_Zifencei''.
-
-\section{Supervisor-level Instruction-Set Extensions}
-
-Standard supervisor-level instruction-set extensions are defined in Volume II,
-but are named using ``S'' as a prefix, followed by an alphabetical name and an
-optional version number. Supervisor-level extensions must be separated from
-other multi-letter extensions by an underscore.
-
-Standard supervisor-level extensions should be listed after standard
-unprivileged extensions. If multiple supervisor-level extensions are listed,
-they should be ordered alphabetically.
-
-\section{Machine-level Instruction-Set Extensions}
-
-Standard machine-level instruction-set extensions are prefixed with the three
-letters ``Zxm''.
-
-Standard machine-level extensions should be listed after standard
-lesser-privileged extensions. If multiple machine-level extensions are listed,
-they should be ordered alphabetically.
-
-\section{Non-Standard Extension Names}
-
-Non-standard extensions are named using a single ``X'' followed by an
-alphabetical name and an optional version number.
-For example, ``Xhwacha'' names the Hwacha vector-fetch ISA extension;
-``Xhwacha2'' and ``Xhwacha2p0'' name version 2.0 of the same extension.
-
-Non-standard extensions must be listed after all standard extensions.
-They must be separated from other multi-letter extensions
-by an underscore. For example, an ISA with non-standard extensions
-Argle and Bargle may be named ``RV64IZifencei\_Xargle\_Xbargle''.
-
-If multiple non-standard extensions are listed, they should be ordered
-alphabetically.
-
-\section{Subset Naming Convention}
-Table~\ref{isanametable} summarizes the standardized extension names.
-~\\
-\begin{table}[h]
-\centering
-\begin{tabular}{|l|c|c|}
-\hline
-Subset & Name & Implies \\
-\hline
-\hline
-\multicolumn{3}{|c|}{Base ISA}\\
-\hline
-Integer & I & \\
-Reduced Integer & E & \\
-\hline
-\hline
-\multicolumn{3}{|c|}{Standard Unprivileged Extensions}\\
-\hline
-Integer Multiplication and Division & M & \\
-Atomics & A & \\
-Single-Precision Floating-Point & F & Zicsr \\
-Double-Precision Floating-Point & D & F \\
-\hline
-General & G & IMAFDZicsr\_Zifencei \\
-\hline
-Quad-Precision Floating-Point & Q & D\\
-16-bit Compressed Instructions & C & \\
-Packed-SIMD Extensions & P & \\
-Vector Extension & V & D \\
-Hypervisor Extension & H & \\
-Control and Status Register Access & Zicsr & \\
-Instruction-Fetch Fence & Zifencei & \\
-Misaligned Atomics & Zam & A \\
-Total Store Ordering & Ztso & \\
-\hline
-\hline
-\multicolumn{3}{|c|}{Standard Supervisor-Level Extensions}\\
-\hline
-Supervisor-level extension ``def'' & Sdef & \\
-\hline
-\hline
-\multicolumn{3}{|c|}{Standard Machine-Level Extensions}\\
-\hline
-Machine-level extension ``jkl'' & Zxmjkl & \\
-\hline
-\hline
-\multicolumn{3}{|c|}{Non-Standard Extensions}\\
-\hline
-Non-standard extension ``mno'' & Xmno & \\
-\hline
-\end{tabular}
-\caption{Standard ISA extension names. The table also defines the
- canonical order in which extension names must appear in the name
- string, with top-to-bottom order in the table indicating first-to-last
- order in the name string, e.g., RV32IMACV is legal, whereas RV32IMAVC is not.}
-\label{isanametable}
-\end{table}
-
-
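Reader's note on the naming rules in src/naming.tex above: the version, underscore, and prefix conventions can be sketched as a small tokenizer. This is a minimal illustration, not part of the removed sources. The function name `parse_isa_name` and its return shape are hypothetical. It assumes that names can be lowercased before parsing (the chapter's examples use both cases), that any token starting with Z, S, or X is a multi-letter extension, and that an omitted version is reported as `None` (meaning "default version"). It does not check the canonical ordering from the table.

```python
import re

# Optional version suffix: major, then an optional "p" + minor (e.g. "2p1").
_VERSION = r"(?:(\d+)(?:p(\d+))?)?"
# Multi-letter names start with z, s, or x and run to the next digit,
# underscore, or end of string.
_MULTI = re.compile(r"([zsx][a-z]+)" + _VERSION)
# Single-letter names are any other letter (i, e, m, a, f, d, g, q, c, p, ...).
_SINGLE = re.compile(r"([a-rt-wy])" + _VERSION)


def parse_isa_name(name):
    """Split an ISA name string into (xlen, [(extension, version), ...]).

    version is (major, minor), or None when no version was written.
    Hypothetical helper; lenient about ordering and about a bare "p"
    that follows a number without an underscore.
    """
    s = name.lower()
    prefix = re.match(r"rv(32|64|128)", s)
    if prefix is None:
        raise ValueError("expected a name starting with RV32, RV64, or RV128")
    xlen = int(prefix.group(1))
    pos = prefix.end()
    extensions = []
    while pos < len(s):
        if s[pos] == "_":  # underscores only separate tokens
            pos += 1
            continue
        m = _MULTI.match(s, pos) or _SINGLE.match(s, pos)
        if m is None:
            raise ValueError("unexpected character at offset %d" % pos)
        ext, major, minor = m.groups()
        # "p0" may be omitted, so a missing minor version means 0.
        version = None if major is None else (int(major), int(minor or 0))
        extensions.append((ext, version))
        pos = m.end()
    return xlen, extensions
```

With this sketch, "rv32i2p2" yields a single entry for I at version (2, 2), while "rv32i2_p2" yields I at (2, 0) followed by P at (2, 0), matching the Packed SIMD example in the Underscores section. It also shows why consecutive multi-letter extensions need an underscore: without one, "ZicsrZifencei" would be read as a single name.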
diff --git a/src/p.tex b/src/p.tex
deleted file mode 100644
index dac4e4f..0000000
--- a/src/p.tex
+++ /dev/null
@@ -1,14 +0,0 @@
-\chapter{``P'' Standard Extension for Packed-SIMD Instructions,
- Version 0.2}
-\label{sec:packedsimd}
-
-\begin{commentary}
- Discussions at the 5th RISC-V workshop indicated a desire to drop
- this packed-SIMD proposal for floating-point registers in favor of
- standardizing on the V extension for large floating-point SIMD
- operations. However, there was interest in packed-SIMD fixed-point
- operations for use in the integer registers of small RISC-V
- implementations. A task group is working to define the new P
- extension.
-\end{commentary}
-