author: Andrew Waterman <andrew@sifive.com> 2017-02-01 20:41:47 -0800
committer: Andrew Waterman <andrew@sifive.com> 2017-02-01 20:41:47 -0800
commit: ab6f8c9bd7bc85361fcf35667d1fddfaf367a53f
tree: 716a2118ca0565dbb4e7903723f283ae4dd13c46 /src/c.tex
parent: 207a7c6ee51aa2fd74d4618cd1369ddc21706b9e
Reorganize directory structure
Diffstat (limited to 'src/c.tex')
-rw-r--r-- src/c.tex 1162
1 files changed, 1162 insertions, 0 deletions
diff --git a/src/c.tex b/src/c.tex
new file mode 100644
index 0000000..2c81f7b
--- /dev/null
+++ b/src/c.tex
@@ -0,0 +1,1162 @@
+\chapter{``C'' Standard Extension for Compressed Instructions, Version
+1.9}
+\label{compressed}
+
+This chapter describes the current draft proposal for the RISC-V
+standard compressed instruction set extension, named ``C'', which
+reduces static and dynamic code size by adding short 16-bit
+instruction encodings for common operations. The C extension can be
+added to any of the base ISAs (RV32, RV64, RV128), and we use the
+generic term ``RVC'' to cover any of these. Typically, 50\%--60\% of
+the RISC-V instructions in a program can be replaced with RVC
+instructions, resulting in a 25\%--30\% code-size reduction.
+
+We believe this draft represents a close-to-final design for RV32C
+and RV64C (it seems premature to freeze RV128C), though we are
+requesting one more round of comments, hence the 1.9 revision number.
+Please send your comments to the {\tt isa-dev} mailing list at {\tt
+ isa-dev@lists.riscv.org}.
+
+\section{Overview}
+
+RVC uses a simple compression scheme that offers shorter 16-bit
+versions of common 32-bit RISC-V instructions when:
+\begin{tightlist}
+ \item the immediate or address offset is small, or
+ \item one of the registers is the zero register ({\tt x0}), the
+ ABI link register ({\tt x1}), or the ABI stack pointer ({\tt
+ x2}), or
+ \item the destination register and the first source register are
+ identical, or
+ \item the registers used are the 8 most popular ones.
+\end{tightlist}
+
+The C extension is compatible with all other standard instruction
+extensions. The C extension allows 16-bit instructions to be freely
+intermixed with 32-bit instructions, with the latter now able to start
+on any 16-bit boundary.
+
+\begin{commentary}
+Removing the 32-bit alignment constraint on the original 32-bit
+instructions allows significantly greater code density.
+\end{commentary}
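As an illustration (not part of this specification), the C sketch below shows how software such as a disassembler might find instruction boundaries once 16-bit and 32-bit instructions are intermixed on 16-bit boundaries. It assumes the base ISA's length-encoding convention, in which a 16-bit parcel whose low two bits are not {\tt 11} starts a compressed instruction, and it deliberately stops at encodings longer than 32 bits. The function names {\tt insn\_length} and {\tt count\_insns} are hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

/* Illustrative sketch; function names are hypothetical.
 * Length in bytes of the instruction whose first (lowest-addressed)
 * 16-bit parcel is `parcel`. Only 16-bit and 32-bit encodings are
 * handled; longer encodings (bits [1:0] == 11 and bits [4:2] == 111)
 * are reported as 0 so the caller can stop. */
static size_t insn_length(uint16_t parcel)
{
    if ((parcel & 0x3u) != 0x3u)
        return 2;              /* bits [1:0] != 11: compressed (RVC) */
    if ((parcel & 0x1cu) != 0x1cu)
        return 4;              /* bits [4:2] != 111: standard 32-bit */
    return 0;                  /* longer encoding: not handled here */
}

/* Count instructions in a buffer of 16-bit parcels, where a 32-bit
 * instruction may start on any parcel (16-bit) boundary. Stops early
 * at an unsupported or truncated instruction. */
static size_t count_insns(const uint16_t *parcels, size_t nparcels)
{
    size_t i = 0, count = 0;
    while (i < nparcels) {
        size_t len = insn_length(parcels[i]);
        if (len == 0 || i + len / 2 > nparcels)
            break;
        i += len / 2;          /* advance by 1 or 2 parcels */
        count++;
    }
    return count;
}
```

With the parcels {\tt 0x4532}, {\tt 0x2503}, {\tt 0x00C1}, {\tt 0x8082}, {\tt count\_insns} reports three instructions: a 16-bit instruction, a 32-bit instruction spanning two parcels, and a second 16-bit instruction.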
+
+The compressed instruction encodings are mostly common across RV32C,
+RV64C, and RV128C, but as shown in Table~\ref{rvcopcodemap}, a few
+opcodes are used for different purposes depending on base ISA width.
+For example, the wider address-space RV64C and RV128C variants require
+additional opcodes to compress loads and stores of 64-bit integer
+values, while RV32C uses the same opcodes to compress loads and stores
+of single-precision floating-point values. Similarly, RV128C requires
+additional opcodes to capture loads and stores of 128-bit integer
+values, while these same opcodes are used for loads and stores of
+double-precision floating-point values in RV32C and RV64C. If the C
+extension is implemented, the appropriate compressed floating-point
+load and store instructions must be provided whenever the relevant
+standard floating-point extension (F and/or D) is also implemented.
+In addition, RV32C includes a compressed jump-and-link instruction to
+compress short-range subroutine calls; the same opcode is used to
+compress ADDIW in RV64C and RV128C.
+
+\begin{commentary}
+Double-precision loads and stores are a significant fraction of static
+and dynamic instructions, hence the motivation to include them in the
+RV32C and RV64C encoding.
+
+Although single-precision loads and stores are not a significant
+source of static or dynamic compression for benchmarks compiled for
+the currently supported ABIs, for microcontrollers that only provide
+hardware single-precision floating-point units and have an ABI that
+only supports single-precision floating-point numbers, the
+single-precision loads and stores will be used at least as frequently
+as double-precision loads and stores in the measured benchmarks.
+This motivates providing compressed support for them in RV32C.
+
+Short-range subroutine calls are more likely in small binaries for
+microcontrollers, hence the motivation to include these in RV32C.
+
+Although reusing opcodes for different purposes for different base
+register widths adds some complexity to documentation, the impact on
+implementation complexity is small even for designs that support
+multiple base ISA register widths. The compressed floating-point load
+and store variants use the same instruction format with the same
+register specifiers as the wider integer loads and stores.
+\end{commentary}
+
+RVC was designed under the constraint that each RVC instruction
+expands into a single 32-bit instruction in either the base ISA
+(RV32I/E, RV64I, or RV128I) or the F and D standard extensions where
+present. Adopting this constraint has two main benefits:
+
+\begin{tightlist}
+\item Hardware designs can simply expand RVC instructions during
+ decode, simplifying verification and minimizing modifications to
+ existing microarchitectures.
+\item Compilers can be unaware of the RVC extension and leave code
+ compression to the assembler and linker, although a
+ compression-aware compiler will generally be able to produce better
+ results.
+\end{tightlist}
+
+\begin{commentary}
+We felt the multiple complexity reductions of a simple one-to-one mapping
+between C and base IFD instructions far outweighed the potential gains
+of a slightly denser encoding that added additional instructions only
+supported in the C extension, or that allowed encoding of multiple IFD
+instructions in one C instruction.
+\end{commentary}
+
+It is important to note that the C extension is not designed to be a
+stand-alone ISA, and is meant to be used alongside a base ISA.
+
+\begin{commentary}
+Variable-length instruction sets have long been used to improve code
+density. For example, the IBM Stretch~\cite{stretch}, developed in
+the late 1950s, had an ISA with 32-bit and 64-bit instructions, where
+some of the 32-bit instructions were compressed versions of the full
+64-bit instructions. Stretch also employed the concept of limiting
+the set of registers that were addressable in some of the shorter
+instruction formats, with short branch instructions that could only
+refer to one of the index registers. The later IBM 360
+architecture~\cite{ibm360} supported a simple variable-length
+instruction encoding with 16-bit, 32-bit, or 48-bit instruction
+formats.
+
+In 1963, CDC introduced the Cray-designed CDC 6600~\cite{cdc6600}, a
+precursor to RISC architectures, that introduced a register-rich
+load-store architecture with instructions of two lengths, 15 bits and
+30 bits. The later Cray-1 design used a very similar instruction
+format, with 16-bit and 32-bit instruction lengths.
+
+The initial RISC ISAs from the 1980s all picked performance over code
+size, which was reasonable for a workstation environment, but not for
+embedded systems. Hence, both ARM and MIPS subsequently made versions
+of their ISAs that reduced code size by offering an alternative
+16-bit wide instruction set instead of the standard 32-bit wide
+instructions. The compressed RISC ISAs reduced code size relative to
+their starting points by about 25--30\%, yielding code that was
+significantly \emph{smaller} than 80x86. This result surprised some,
+as their intuition was that the variable-length CISC ISA should be
+smaller than RISC ISAs that offered only 16-bit and 32-bit formats.
+
+Since the original RISC ISAs did not leave sufficient opcode space
+free to include these unplanned compressed instructions, they were
+instead developed as complete new ISAs. This meant compilers needed
+different code generators for the separate compressed ISAs. The first
+compressed RISC ISA extensions (e.g., ARM Thumb and MIPS16) used only
+a fixed 16-bit instruction size, which gave good reductions in static
+code size but caused an increase in dynamic instruction count, which
+led to lower performance compared to the original fixed-width 32-bit
+instruction size. This led to the development of a second generation
+of compressed RISC ISA designs with mixed 16-bit and 32-bit
+instruction lengths (e.g., ARM Thumb2, microMIPS, PowerPC VLE), so
+that performance was similar to pure 32-bit instructions but with
+significant code size savings. Unfortunately, these different
+generations of compressed ISAs are incompatible with each other and
+with the original uncompressed ISA, leading to significant complexity
+in documentation, implementations, and software tools support.
+
+Of the commonly used 64-bit ISAs, only PowerPC and microMIPS currently
+support a compressed instruction format. It is surprising that the
+most popular 64-bit ISA for mobile platforms (ARMv8) does not include
+a compressed instruction format given that static code size and
+dynamic instruction fetch bandwidth are important metrics. Although
+static code size is not a major concern in larger systems, instruction
+fetch bandwidth can be a major bottleneck in servers running
+commercial workloads, which often have a large instruction working
+set.
+
+Benefiting from 25 years of hindsight, RISC-V was designed to support
+compressed instructions from the outset, leaving enough opcode
+space for RVC to be added as a simple extension on top of the base ISA
+(along with many other extensions). The philosophy of RVC is to
+reduce code size for embedded applications \emph{and} to improve
+performance and energy-efficiency for all applications due to fewer
+misses in the instruction cache. Waterman shows that RVC fetches
+25\%--30\% fewer instruction bits, which reduces instruction cache
+misses by 20\%--25\%, or roughly the same performance impact as
+doubling the instruction cache size~\cite{waterman-ms}.
+\end{commentary}
+
+
+\section{Compressed Instruction Formats}
+
+Table~\ref{formats} shows the eight compressed instruction
+formats. CR, CI, and CSS can use any of the 32 RVI registers, but CIW,
+CL, CS, and CB are limited to just 8 of them. Table~\ref{registers}
+lists these popular registers, which correspond to registers {\tt x8}
+to {\tt x15}. Note that there is a
+separate version of load and store instructions that use the stack
+pointer as the base address register, since saving to and restoring
+from the stack are so prevalent, and that they use the CI and CSS
+formats to allow access to all 32 data registers. CIW supplies an
+8-bit immediate for the ADDI4SPN instruction.
+
+\begin{commentary}
+The RISC-V ABI was changed to make the frequently used registers map
+to registers {\tt x8}--{\tt x15}. This simplifies the decompression
+decoder by having a contiguous naturally aligned set of register
+numbers, and is also compatible with the RV32E subset base
+specification, which only has 16 integer registers.
+\end{commentary}
+
+Compressed register-based floating-point loads and stores also use the
+CL and CS formats respectively, with the eight registers mapping to
+{\tt f8} to {\tt f15}.
+
+\begin{commentary}
+The standard RISC-V calling convention maps the most frequently used
+floating-point registers to registers {\tt f8} to {\tt f15}, which
+allows the same register decompression decoding as for integer
+register numbers.
+\end{commentary}
+
+The formats were designed to keep bits for the two register source
+specifiers in the same place in all instructions, while the
+destination register field can move. When the full 5-bit destination
+register specifier is present, it is in the same place as in the
+32-bit RISC-V encoding. Where immediates are
+sign-extended, the sign-extension is always from bit 12. Immediate
+fields have been scrambled, as in the base specification, to reduce
+the number of immediate muxes required.
+
+\begin{commentary}
+The immediate fields are scrambled in the instruction formats instead
+of in sequential order so that as many bits as possible are in the
+same position in every instruction, thereby simplifying
+implementations. For example, immediate bits 17--10 are always sourced from
+the same instruction bit positions. Five other immediate bits (5, 4,
+3, 1, and 0) have just two source instruction bits, while four (9, 7,
+6, and 2) have three sources and one (8) has four sources.
+\end{commentary}
+
+For many RVC instructions, zero-valued immediates are disallowed and
+{\tt x0} is not a valid 5-bit register specifier. These restrictions
+free up encoding space for other instructions requiring fewer operand
+bits.
+
+\begin{table}[h]
+{
+\begin{small}
+\begin{center}
+\begin{tabular}{c c p{0in}p{0.05in}p{0.05in}p{0.05in}p{0.05in}p{0.05in}p{0.05in}p{0.05in}p{0.05in}p{0.05in}p{0.05in}p{0.05in}p{0.05in}p{0.05in}p{0.05in}p{0.05in}p{0.05in}}
+& & & & & & & & & \\
+Format & Meaning &
+\instbit{15} &
+\instbit{14} &
+\instbit{13} &
+\multicolumn{1}{c}{\instbit{12}} &
+\instbit{11} &
+\instbit{10} &
+\instbit{9} &
+\instbit{8} &
+\instbit{7} &
+\instbit{6} &
+\multicolumn{1}{r}{\instbit{5}} &
+\instbit{4} &
+\instbit{3} &
+\instbit{2} &
+\instbit{1} &
+\instbit{0} \\
+\cline{3-18}
+
+CR & Register &
+\multicolumn{4}{|c|}{funct4} &
+\multicolumn{5}{c|}{rd/rs1} &
+\multicolumn{5}{c|}{rs2} &
+\multicolumn{2}{c|}{op} \\
+\cline{3-18}
+
+CI & Immediate &
+\multicolumn{3}{|c|}{funct3} &
+\multicolumn{1}{c|}{imm} &
+\multicolumn{5}{c|}{rd/rs1} &
+\multicolumn{5}{c|}{imm} &
+\multicolumn{2}{c|}{op} \\
+\cline{3-18}
+
+CSS & Stack-relative Store &
+\multicolumn{3}{|c|}{funct3} &
+\multicolumn{6}{c|}{imm} &
+\multicolumn{5}{c|}{rs2} &
+\multicolumn{2}{c|}{op} \\
+\cline{3-18}
+
+CIW & Wide Immediate &
+\multicolumn{3}{|c|}{funct3} &
+\multicolumn{8}{c|}{imm} &
+\multicolumn{3}{c|}{rd$'$} &
+\multicolumn{2}{c|}{op} \\
+\cline{3-18}
+
+CL & Load &
+\multicolumn{3}{|c|}{funct3} &
+\multicolumn{3}{c|}{imm} &
+\multicolumn{3}{c|}{rs1$'$} &
+\multicolumn{2}{c|}{imm} &
+\multicolumn{3}{c|}{rd$'$} &
+\multicolumn{2}{c|}{op} \\
+\cline{3-18}
+
+CS & Store &
+\multicolumn{3}{|c|}{funct3} &
+\multicolumn{3}{c|}{imm} &
+\multicolumn{3}{c|}{rs1$'$} &
+\multicolumn{2}{c|}{imm} &
+\multicolumn{3}{c|}{rs2$'$} &
+\multicolumn{2}{c|}{op} \\
+\cline{3-18}
+
+CB & Branch &
+\multicolumn{3}{|c|}{funct3} &
+\multicolumn{3}{c|}{offset} &
+\multicolumn{3}{c|}{rs1$'$} &
+\multicolumn{5}{c|}{offset} &
+\multicolumn{2}{c|}{op} \\
+\cline{3-18}
+
+CJ & Jump &
+\multicolumn{3}{|c|}{funct3} &
+\multicolumn{11}{c|}{jump target} &
+\multicolumn{2}{c|}{op} \\
+\cline{3-18}
+
+\end{tabular}
+\end{center}
+\end{small}
+}
+\caption{Compressed 16-bit RVC instruction formats.}
+\label{formats}
+\end{table}
+
+
+\begin{table}[H]
+{
+\begin{center}
+\begin{tabular}{l|c|c|c|c|c|c|c|c|}
+\cline{2-9}
+RVC Register Number & 000 & 001 & 010 & 011 & 100 & 101 & 110 & 111
+\\ \cline{2-9}
+Integer Register Number & {\tt x8} & {\tt x9} & {\tt x10} & {\tt x11} & {\tt x12} & {\tt x13} & {\tt x14} & {\tt x15} \\ \cline{2-9}
+Integer Register ABI Name & {\tt s0} & {\tt s1} & {\tt a0} & {\tt a1} & {\tt a2} & {\tt a3} & {\tt a4} & {\tt a5} \\ \cline{2-9}
+Floating-Point Register Number & {\tt f8} & {\tt f9} & {\tt f10} & {\tt f11} & {\tt f12} & {\tt f13} & {\tt f14} & {\tt f15} \\ \cline{2-9}
+Floating-Point Register ABI Name & {\tt fs0} & {\tt fs1} & {\tt fa0} & {\tt fa1} & {\tt fa2} & {\tt fa3} & {\tt fa4} & {\tt fa5} \\ \cline{2-9}
+\end{tabular}
+\end{center}
+}
+\caption{Registers specified by the three-bit rs1$'$, rs2$'$, and rd$'$ fields of the CIW, CL, CS, and CB formats.}
+\label{registers}
+\end{table}
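Because the popular registers form the contiguous range {\tt x8}--{\tt x15} (and {\tt f8}--{\tt f15}), a decoder can recover the full register number from a three-bit field simply by adding 8. The following C sketch assumes the field positions shown in Table~\ref{formats}, with rs1$'$ in bits 9--7 and rd$'$/rs2$'$ in bits 4--2; the helper names are hypothetical.

```c
#include <stdint.h>

/* Illustrative sketch; helper names are hypothetical.
 * Map a 3-bit RVC register field (rs1', rs2', or rd') to the full
 * register number: 000 -> 8 (x8/s0 or f8/fs0) ... 111 -> 15 (x15/a5 or f15/fa5). */
static unsigned rvc_full_reg(unsigned field3)
{
    return 8u + (field3 & 0x7u);
}

/* rs1' sits in bits [9:7] (CL, CS, CB formats). */
static unsigned rvc_rs1p(uint16_t insn)
{
    return rvc_full_reg((insn >> 7) & 0x7u);
}

/* rd' or rs2' sits in bits [4:2] (CIW, CL, CS formats). */
static unsigned rvc_rdp(uint16_t insn)
{
    return rvc_full_reg((insn >> 2) & 0x7u);
}
```

For example, in the C.LW encoding {\tt 0x415C} the rs1$'$ field is {\tt 010} ({\tt x10}) and the rd$'$ field is {\tt 111} ({\tt x15}).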
+
+\section{Load and Store Instructions}
+
+To increase the reach of 16-bit instructions, data-transfer
+instructions use zero-extended immediates that are scaled by the size
+of the data in bytes: $\times$4 for words, $\times$8 for double words,
+and $\times$16 for quad words.
+
+RVC provides two variants of loads and stores. One uses the ABI stack
+pointer, {\tt x2}, as the base address and can target any data register. The
+other can reference one of 8 base address registers and one of 8 data
+registers.
+
+\subsection*{Stack-Pointer-Based Loads and Stores}
+
+\begin{center}
+\begin{tabular}{S@{}W@{}T@{}T@{}Y}
+\\
+\instbitrange{15}{13} &
+\multicolumn{1}{c}{\instbit{12}} &
+\instbitrange{11}{7} &
+\instbitrange{6}{2} &
+\instbitrange{1}{0} \\
+\hline
+\multicolumn{1}{|c|}{funct3} &
+\multicolumn{1}{c|}{imm} &
+\multicolumn{1}{c|}{rd} &
+\multicolumn{1}{c|}{imm} &
+\multicolumn{1}{c|}{op} \\
+\hline
+3 & 1 & 5 & 5 & 2 \\
+C.LWSP & offset[5] & dest$\neq$0 & offset[4:2$\vert$7:6] & C2 \\
+C.LDSP & offset[5] & dest$\neq$0 & offset[4:3$\vert$8:6] & C2 \\
+C.LQSP & offset[5] & dest$\neq$0 & offset[4$\vert$9:6] & C2 \\
+C.FLWSP& offset[5] & dest & offset[4:2$\vert$7:6] & C2 \\
+C.FLDSP& offset[5] & dest & offset[4:3$\vert$8:6] & C2 \\
+\end{tabular}
+\end{center}
+These instructions use the CI format.
+
+C.LWSP loads a 32-bit value from memory into register {\em rd}. It computes
+an effective address by adding the {\em zero}-extended offset, scaled by 4, to
+the stack pointer, {\tt x2}. It expands to {\tt lw rd, offset[7:2](x2)}.
+
+C.LDSP is an RV64C/RV128C-only instruction that loads a 64-bit value from memory into
+register {\em rd}. It computes its effective address by adding the
+zero-extended offset, scaled by 8, to the stack pointer, {\tt x2}.
+It expands to {\tt ld rd, offset[8:3](x2)}.
+
+C.LQSP is an RV128C-only instruction that loads a 128-bit value from memory
+into register {\em rd}. It computes its effective address by adding the
+zero-extended offset, scaled by 16, to the stack pointer, {\tt x2}.
+It expands to {\tt lq rd, offset[9:4](x2)}.
+
+C.FLWSP is an RV32FC-only instruction that loads a single-precision
+floating-point value from memory into floating-point register {\em rd}. It
+computes its effective address by adding the {\em zero}-extended offset,
+scaled by 4, to the stack pointer, {\tt x2}. It expands to {\tt flw rd,
+offset[7:2](x2)}.
+
+C.FLDSP is an RV32DC/RV64DC-only instruction that loads a double-precision
+floating-point value from memory into floating-point register {\em rd}. It
+computes its effective address by adding the {\em zero}-extended offset,
+scaled by 8, to the stack pointer, {\tt x2}. It expands to {\tt fld rd,
+offset[8:3](x2)}.
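To make the scrambled CI immediate concrete, here is a hedged C sketch of expanding C.LWSP into its 32-bit {\tt lw} equivalent. The offset bit placement follows the table above; the funct3 value ({\tt 010}) and quadrant ({\tt 10}) are taken from the RVC opcode map, and the {\tt lw} fields follow the base I-type encoding. The function names are hypothetical, and a return value of 0 is used here only as an ``invalid'' marker.

```c
#include <stdint.h>

/* Illustrative sketch; function names are hypothetical.
 * C.LWSP offset (CI format), zero-extended and a multiple of 4:
 *   inst[12] -> offset[5], inst[6:4] -> offset[4:2], inst[3:2] -> offset[7:6] */
static uint32_t c_lwsp_offset(uint16_t insn)
{
    return (((insn >> 12) & 0x1u) << 5) |
           (((insn >> 4)  & 0x7u) << 2) |
           (((insn >> 2)  & 0x3u) << 6);
}

/* Expand C.LWSP (quadrant 10, funct3 010) to "lw rd, offset(x2)".
 * Returns 0 if insn is not a legal C.LWSP (including rd == x0). */
static uint32_t expand_c_lwsp(uint16_t insn)
{
    uint32_t rd = (insn >> 7) & 0x1fu;
    if ((insn & 0x3u) != 0x2u || ((insn >> 13) & 0x7u) != 0x2u || rd == 0)
        return 0;
    /* I-type: imm[11:0] | rs1 = x2 | funct3 = 010 (LW) | rd | opcode 0000011 (LOAD) */
    return (c_lwsp_offset(insn) << 20) | (2u << 15) | (0x2u << 12) |
           (rd << 7) | 0x03u;
}
```

Under these assumptions, {\tt 0x4532} ({\tt c.lwsp x10, 12}) expands to {\tt 0x00C12503}, the encoding of {\tt lw x10, 12(x2)}.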
+
+\begin{center}
+\begin{tabular}{S@{}M@{}T@{}Y}
+\\
+\instbitrange{15}{13} &
+\instbitrange{12}{7} &
+\instbitrange{6}{2} &
+\instbitrange{1}{0} \\
+\hline
+\multicolumn{1}{|c|}{funct3} &
+\multicolumn{1}{c|}{imm} &
+\multicolumn{1}{c|}{rs2} &
+\multicolumn{1}{c|}{op} \\
+\hline
+3 & 6 & 5 & 2 \\
+C.SWSP & offset[5:2$\vert$7:6] & src & C2 \\
+C.SDSP & offset[5:3$\vert$8:6] & src & C2 \\
+C.SQSP & offset[5:4$\vert$9:6] & src & C2 \\
+C.FSWSP& offset[5:2$\vert$7:6] & src & C2 \\
+C.FSDSP& offset[5:3$\vert$8:6] & src & C2 \\
+\end{tabular}
+\end{center}
+These instructions use the CSS format.
+
+C.SWSP stores a 32-bit value in register {\em rs2} to memory. It computes
+an effective address by adding the {\em zero}-extended offset, scaled by 4, to
+the stack pointer, {\tt x2}.
+It expands to {\tt sw rs2, offset[7:2](x2)}.
+
+C.SDSP is an RV64C/RV128C-only instruction that stores a 64-bit value in register
+{\em rs2} to memory. It computes an effective address by adding the {\em
+zero}-extended offset, scaled by 8, to the stack pointer, {\tt x2}.
+It expands to {\tt sd rs2, offset[8:3](x2)}.
+
+C.SQSP is an RV128C-only instruction that stores a 128-bit value in register
+{\em rs2} to memory. It computes an effective address by adding the {\em
+zero}-extended offset, scaled by 16, to the stack pointer, {\tt x2}.
+It expands to {\tt sq rs2, offset[9:4](x2)}.
+
+C.FSWSP is an RV32FC-only instruction that stores a single-precision
+floating-point value in floating-point register {\em rs2} to memory. It
+computes an effective address by adding the {\em zero}-extended offset, scaled
+by 4, to the stack pointer, {\tt x2}. It expands to {\tt fsw rs2,
+offset[7:2](x2)}.
+
+C.FSDSP is an RV32DC/RV64DC-only instruction that stores a double-precision
+floating-point value in floating-point register {\em rs2} to memory. It
+computes an effective address by adding the {\em zero}-extended offset, scaled
+by 8, to the stack pointer, {\tt x2}. It expands to {\tt fsd rs2,
+offset[8:3](x2)}.
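The CSS store offsets are laid out differently from the CI load offsets. A matching C sketch for C.SWSP follows, under the same assumptions: funct3 {\tt 110} and quadrant {\tt 10} from the opcode map, the base S-type layout for {\tt sw}, and hypothetical function names.

```c
#include <stdint.h>

/* Illustrative sketch; function names are hypothetical.
 * C.SWSP offset (CSS format), zero-extended and a multiple of 4:
 *   inst[12:9] -> offset[5:2], inst[8:7] -> offset[7:6] */
static uint32_t c_swsp_offset(uint16_t insn)
{
    return (((insn >> 9) & 0xfu) << 2) | (((insn >> 7) & 0x3u) << 6);
}

/* Expand C.SWSP (quadrant 10, funct3 110) to "sw rs2, offset(x2)".
 * Returns 0 if insn is not a C.SWSP. */
static uint32_t expand_c_swsp(uint16_t insn)
{
    if ((insn & 0x3u) != 0x2u || ((insn >> 13) & 0x7u) != 0x6u)
        return 0;
    uint32_t rs2 = (insn >> 2) & 0x1fu;
    uint32_t off = c_swsp_offset(insn);
    /* S-type: imm[11:5] | rs2 | rs1 = x2 | funct3 = 010 (SW) | imm[4:0] | opcode 0100011 (STORE) */
    return ((off >> 5) << 25) | (rs2 << 20) | (2u << 15) | (0x2u << 12) |
           ((off & 0x1fu) << 7) | 0x23u;
}
```

Under these assumptions, {\tt 0xC606} ({\tt c.swsp x1, 12}) expands to {\tt 0x00112623}, that is, {\tt sw x1, 12(x2)}.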
+
+\begin{commentary}
+Register save/restore code at function entry/exit represents a
+significant portion of static code size. The stack-pointer-based
+compressed loads and stores in RVC are effective at reducing the
+save/restore static code size by a factor of 2 while improving
+performance by reducing dynamic instruction bandwidth.
+
+A common mechanism used in other ISAs to further reduce
+save/restore code size is load-multiple and store-multiple
+instructions. We considered adopting these for RISC-V but noted the
+following drawbacks to these instructions:
+\begin{itemize}
+\item These instructions complicate processor implementations.
+\item For virtual memory systems, some data accesses could be
+ resident in physical memory and some could not, which requires a
+ new restart mechanism for partially executed instructions.
+\item Unlike the rest of the RVC instructions, there is no IFD
+ equivalent to Load Multiple and Store Multiple.
+\item Unlike the rest of the RVC instructions, the compiler would
+ have to be aware of these instructions to both generate the
+ instructions and to allocate registers in an order that maximizes
+ the chances of them being saved and restored together, since these
+ instructions operate on registers in sequential order.
+\item Simple microarchitectural implementations will constrain how
+ other instructions can be scheduled around the load and store
+ multiple instructions, leading to a potential performance loss.
+\item The desire for sequential register allocation might conflict with
+ the featured registers selected for the CIW, CL, CS, and CB formats.
+\end{itemize}
+Furthermore, much of the gains can be realized in software by replacing
+prologue and epilogue code with subroutine calls to common
+prologue and epilogue code, a technique described in
+Section 5.6 of~\cite{waterman-phd}.
+
+While reasonable architects might come to different conclusions, we
+decided to omit load and store multiple and instead use the
+software-only approach of calling save/restore millicode routines to
+attain the greatest code size reduction.
+\end{commentary}
+
+\subsection*{Register-Based Loads and Stores}
+
+\begin{center}
+\begin{tabular}{S@{}S@{}S@{}Y@{}S@{}Y}
+\\
+\instbitrange{15}{13} &
+\instbitrange{12}{10} &
+\instbitrange{9}{7} &
+\instbitrange{6}{5} &
+\instbitrange{4}{2} &
+\instbitrange{1}{0} \\
+\hline
+\multicolumn{1}{|c|}{funct3} &
+\multicolumn{1}{c|}{imm} &
+\multicolumn{1}{c|}{rs1$'$} &
+\multicolumn{1}{c|}{imm} &
+\multicolumn{1}{c|}{rd$'$} &
+\multicolumn{1}{c|}{op} \\
+\hline
+3 & 3 & 3 & 2 & 3 & 2 \\
+C.LW & offset[5:3] & base & offset[2$\vert$6] & dest & C0 \\
+C.LD & offset[5:3] & base & offset[7:6] & dest & C0 \\
+C.LQ & offset[5$\vert$4$\vert$8] & base & offset[7:6] & dest & C0 \\
+C.FLW& offset[5:3] & base & offset[2$\vert$6] & dest & C0 \\
+C.FLD& offset[5:3] & base & offset[7:6] & dest & C0 \\
+\end{tabular}
+\end{center}
+These instructions use the CL format.
+
+C.LW loads a 32-bit value from memory into register {\em rd$'$}. It computes
+an effective address by adding the {\em zero}-extended offset, scaled by 4, to
+the base address in register {\em rs1$'$}.
+It expands to {\tt lw rd$'$, offset[6:2](rs1$'$)}.
+
+C.LD is an RV64C/RV128C-only instruction that loads a 64-bit value from memory into
+register {\em rd$'$}. It computes an effective address by adding the {\em
+zero}-extended offset, scaled by 8, to the base address in register {\em
+rs1$'$}.
+It expands to {\tt ld rd$'$, offset[7:3](rs1$'$)}.
+
+C.LQ is an RV128C-only instruction that loads a 128-bit value from memory into
+register {\em rd$'$}. It computes an effective address by adding the {\em
+zero}-extended offset, scaled by 16, to the base address in register {\em
+rs1$'$}.
+It expands to {\tt lq rd$'$, offset[8:4](rs1$'$)}.
+
+C.FLW is an RV32FC-only instruction that loads a single-precision
+floating-point value from memory into floating-point register {\em rd$'$}. It
+computes an effective address by adding the {\em zero}-extended offset, scaled
+by 4, to the base address in register {\em rs1$'$}. It expands to {\tt flw
+rd$'$, offset[6:2](rs1$'$)}.
+
+C.FLD is an RV32DC/RV64DC-only instruction that loads a double-precision
+floating-point value from memory into floating-point register {\em rd$'$}. It
+computes an effective address by adding the {\em zero}-extended offset, scaled
+by 8, to the base address in register {\em rs1$'$}. It expands to {\tt fld
+rd$'$, offset[7:3](rs1$'$)}.
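The register-based forms combine a scrambled offset with the three-bit register fields. The C sketch below decodes the word and double-word offset layouts from the CL table above and expands C.LW; the funct3 value ({\tt 010}) and quadrant ({\tt 00}) are taken from the opcode map, the {\tt lw} layout from the base ISA, and the function names are hypothetical.

```c
#include <stdint.h>

/* Illustrative sketch; function names are hypothetical.
 * Word offset (C.LW, C.SW, C.FLW, C.FSW), zero-extended, multiple of 4:
 *   inst[12:10] -> offset[5:3], inst[6] -> offset[2], inst[5] -> offset[6] */
static uint32_t cl_word_offset(uint16_t insn)
{
    return (((insn >> 10) & 0x7u) << 3) |
           (((insn >> 6)  & 0x1u) << 2) |
           (((insn >> 5)  & 0x1u) << 6);
}

/* Double-word offset (C.LD, C.SD, C.FLD, C.FSD), multiple of 8:
 *   inst[12:10] -> offset[5:3], inst[6:5] -> offset[7:6] */
static uint32_t cl_dword_offset(uint16_t insn)
{
    return (((insn >> 10) & 0x7u) << 3) | (((insn >> 5) & 0x3u) << 6);
}

/* Expand C.LW (quadrant 00, funct3 010) to "lw rd', offset(rs1')",
 * mapping the 3-bit register fields to x8-x15. Returns 0 otherwise. */
static uint32_t expand_c_lw(uint16_t insn)
{
    if ((insn & 0x3u) != 0x0u || ((insn >> 13) & 0x7u) != 0x2u)
        return 0;
    uint32_t rs1 = 8u + ((insn >> 7) & 0x7u);
    uint32_t rd  = 8u + ((insn >> 2) & 0x7u);
    /* I-type: imm[11:0] | rs1 | funct3 = 010 (LW) | rd | opcode 0000011 (LOAD) */
    return (cl_word_offset(insn) << 20) | (rs1 << 15) | (0x2u << 12) |
           (rd << 7) | 0x03u;
}
```

Under these assumptions, {\tt 0x415C} encodes {\tt c.lw x15, 4(x10)} and expands to {\tt 0x00452783}.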
+
+\begin{center}
+\begin{tabular}{S@{}S@{}S@{}Y@{}S@{}Y}
+\\
+\instbitrange{15}{13} &
+\instbitrange{12}{10} &
+\instbitrange{9}{7} &
+\instbitrange{6}{5} &
+\instbitrange{4}{2} &
+\instbitrange{1}{0} \\
+\hline
+\multicolumn{1}{|c|}{funct3} &
+\multicolumn{1}{c|}{imm} &
+\multicolumn{1}{c|}{rs1$'$} &
+\multicolumn{1}{c|}{imm} &
+\multicolumn{1}{c|}{rs2$'$} &
+\multicolumn{1}{c|}{op} \\
+\hline
+3 & 3 & 3 & 2 & 3 & 2 \\
+C.SW & offset[5:3] & base & offset[2$\vert$6] & src & C0 \\
+C.SD & offset[5:3] & base & offset[7:6] & src & C0 \\
+C.SQ & offset[5$\vert$4$\vert$8] & base & offset[7:6] & src & C0 \\
+C.FSW& offset[5:3] & base & offset[2$\vert$6] & src & C0 \\
+C.FSD& offset[5:3] & base & offset[7:6] & src & C0 \\
+\end{tabular}
+\end{center}
+These instructions use the CS format.
+
+C.SW stores a 32-bit value in register {\em rs2$'$} to memory. It computes an
+effective address by adding the {\em zero}-extended offset, scaled by 4, to
+the base address in register {\em rs1$'$}.
+It expands to {\tt sw rs2$'$, offset[6:2](rs1$'$)}.
+
+C.SD is an RV64C/RV128C-only instruction that stores a 64-bit value in
+register {\em rs2$'$} to memory. It computes an effective address by adding
+the {\em zero}-extended offset, scaled by 8, to the base address in register
+{\em rs1$'$}.
+It expands to {\tt sd rs2$'$, offset[7:3](rs1$'$)}.
+
+C.SQ is an RV128C-only instruction that stores a 128-bit value in register
+{\em rs2$'$} to memory. It computes an effective address by adding the {\em
+zero}-extended offset, scaled by 16, to the base address in register {\em
+rs1$'$}.
+It expands to {\tt sq rs2$'$, offset[8:4](rs1$'$)}.
+
+C.FSW is an RV32FC-only instruction that stores a single-precision
+floating-point value in floating-point register {\em rs2$'$} to memory. It
+computes an effective address by adding the {\em zero}-extended offset, scaled
+by 4, to the base address in register {\em rs1$'$}. It expands to {\tt fsw
+rs2$'$, offset[6:2](rs1$'$)}.
+
+C.FSD is an RV32DC/RV64DC-only instruction that stores a double-precision
+floating-point value in floating-point register {\em rs2$'$} to memory. It
+computes an effective address by adding the {\em zero}-extended offset, scaled
+by 8, to the base address in register {\em rs1$'$}. It expands to {\tt fsd
+rs2$'$, offset[7:3](rs1$'$)}.
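Because C.SD and C.FSW share an opcode (see Table~\ref{rvcopcodemap}), an expander must know the base ISA width. This C sketch assumes RV64C, where funct3 {\tt 111} in quadrant {\tt 00} is C.SD, and emits the base {\tt sd} S-type encoding; the name {\tt expand\_c\_sd} is hypothetical.

```c
#include <stdint.h>

/* Illustrative sketch, assuming RV64C (on RV32C these bits encode C.FSW).
 * Expand C.SD (quadrant 00, funct3 111) to "sd rs2', offset(rs1')".
 *   inst[12:10] -> offset[5:3], inst[6:5] -> offset[7:6]
 * The offset is zero-extended and a multiple of 8. Returns 0 otherwise. */
static uint32_t expand_c_sd(uint16_t insn)
{
    if ((insn & 0x3u) != 0x0u || ((insn >> 13) & 0x7u) != 0x7u)
        return 0;
    uint32_t off = (((insn >> 10) & 0x7u) << 3) | (((insn >> 5) & 0x3u) << 6);
    uint32_t rs1 = 8u + ((insn >> 7) & 0x7u);   /* base register x8-x15 */
    uint32_t rs2 = 8u + ((insn >> 2) & 0x7u);   /* source register x8-x15 */
    /* S-type: imm[11:5] | rs2 | rs1 | funct3 = 011 (SD) | imm[4:0] | opcode 0100011 (STORE) */
    return ((off >> 5) << 25) | (rs2 << 20) | (rs1 << 15) | (0x3u << 12) |
           ((off & 0x1fu) << 7) | 0x23u;
}
```

Under these assumptions, {\tt 0xE500} ({\tt c.sd x8, 8(x10)}) expands to {\tt 0x00853423}.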
+
+\section{Control Transfer Instructions}
+
+RVC provides unconditional jump instructions and conditional branch
+instructions. As with base RVI instructions, the offsets of all RVC
+control transfer instructions are in multiples of 2 bytes.
+
+\begin{center}
+\begin{tabular}{S@{}L@{}Y}
+\\
+\instbitrange{15}{13} &
+\instbitrange{12}{2} &
+\instbitrange{1}{0} \\
+\hline
+\multicolumn{1}{|c|}{funct3} &
+\multicolumn{1}{c|}{imm} &
+\multicolumn{1}{c|}{op} \\
+\hline
+3 & 11 & 2 \\
+C.J & offset[11$\vert$4$\vert$9:8$\vert$10$\vert$6$\vert$7$\vert$3:1$\vert$5] & C1 \\
+C.JAL & offset[11$\vert$4$\vert$9:8$\vert$10$\vert$6$\vert$7$\vert$3:1$\vert$5] & C1 \\
+\end{tabular}
+\end{center}
+These instructions use the CJ format.
+
+C.J performs an unconditional control transfer. The offset is sign-extended and
+added to the {\tt pc} to form the jump target address. C.J can therefore target
+a $\pm$\wunits{2}{KiB} range. C.J expands to {\tt jal x0, offset[11:1]}.
+
+C.JAL is an RV32C-only instruction that performs the same operation as C.J,
+but additionally writes the address of the instruction following the jump
+({\tt pc}+2) to the link register, {\tt x1}. C.JAL expands to {\tt jal x1,
+offset[11:1]}.
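The CJ offset is the most heavily scrambled immediate in RVC. The C sketch below decodes it as a direct transcription of the bit layout in the table above and, for testing, re-encodes it; the encoder and both function names are illustrative additions rather than part of the specification.

```c
#include <stdint.h>

/* Illustrative sketch; function names are hypothetical.
 * Decode the CJ jump offset used by C.J and C.JAL:
 *   inst[12]->off[11], inst[11]->off[4], inst[10:9]->off[9:8], inst[8]->off[10],
 *   inst[7]->off[6], inst[6]->off[7], inst[5:3]->off[3:1], inst[2]->off[5]
 * Bit 0 is always zero; the result is sign-extended from bit 11. */
static int32_t cj_offset(uint16_t insn)
{
    uint32_t off =
        (((insn >> 12) & 0x1u) << 11) |
        (((insn >> 11) & 0x1u) << 4)  |
        (((insn >> 9)  & 0x3u) << 8)  |
        (((insn >> 8)  & 0x1u) << 10) |
        (((insn >> 7)  & 0x1u) << 6)  |
        (((insn >> 6)  & 0x1u) << 7)  |
        (((insn >> 3)  & 0x7u) << 1)  |
        (((insn >> 2)  & 0x1u) << 5);
    return (int32_t)(off ^ 0x800u) - 0x800;   /* range -2048 .. +2046 */
}

/* Inverse mapping: place an even offset in [-2048, 2046] into inst[12:2]. */
static uint16_t cj_encode_offset(int32_t offset)
{
    uint32_t o = (uint32_t)offset & 0xfffu;
    return (uint16_t)(
        (((o >> 11) & 0x1u) << 12) |
        (((o >> 4)  & 0x1u) << 11) |
        (((o >> 8)  & 0x3u) << 9)  |
        (((o >> 10) & 0x1u) << 8)  |
        (((o >> 6)  & 0x1u) << 7)  |
        (((o >> 7)  & 0x1u) << 6)  |
        (((o >> 1)  & 0x7u) << 3)  |
        (((o >> 5)  & 0x1u) << 2));
}
```

For example, assuming C.J occupies funct3 {\tt 101} in quadrant {\tt 01}, {\tt 0xBFFD} decodes to an offset of $-2$ and {\tt 0xA001} to 0.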
+
+\begin{center}
+\begin{tabular}{E@{}T@{}T@{}Y}
+\\
+\instbitrange{15}{12} &
+\instbitrange{11}{7} &
+\instbitrange{6}{2} &
+\instbitrange{1}{0} \\
+\hline
+\multicolumn{1}{|c|}{funct4} &
+\multicolumn{1}{c|}{rs1} &
+\multicolumn{1}{c|}{rs2} &
+\multicolumn{1}{c|}{op} \\
+\hline
+4 & 5 & 5 & 2 \\
+C.JR & src$\neq$0 & 0 & C2 \\
+C.JALR & src$\neq$0 & 0 & C2 \\
+\end{tabular}
+\end{center}
+These instructions use the CR format.
+
+C.JR (jump register) performs an unconditional control transfer to
+the address in register {\em rs1}. C.JR expands to {\tt jalr x0, rs1, 0}.
+
+C.JALR (jump and link register) performs the same operation as C.JR,
+but additionally writes the address of the instruction following the
+jump ({\tt pc}+2) to the link register, {\tt x1}. C.JALR expands to
+{\tt jalr x1, rs1, 0}.
+
+\begin{commentary}
+Strictly speaking, C.JALR does not expand exactly to a base RVI
+instruction as the value added to the PC to form the link address is 2
+rather than 4 as in the base ISA, but supporting both offsets of 2 and
+4 bytes is only a very minor change to the base microarchitecture.
+\end{commentary}
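A C sketch of the C.JR/C.JALR expansion follows. It assumes the funct4 values {\tt 1000} (C.JR) and {\tt 1001} (C.JALR) from the opcode map and the base {\tt jalr} I-type layout; the function name is hypothetical. As the commentary notes, the expansion is not exact, because the link value written by C.JALR is {\tt pc}+2.

```c
#include <stdint.h>

/* Illustrative sketch; function name is hypothetical.
 * Expand C.JR / C.JALR (CR format, quadrant 10) into JALR:
 *   funct4 1000, rs1 != 0, rs2 == 0  ->  jalr x0, 0(rs1)   (C.JR)
 *   funct4 1001, rs1 != 0, rs2 == 0  ->  jalr x1, 0(rs1)   (C.JALR)
 * Returns 0 for any other halfword. Hardware must still link pc+2
 * (not pc+4) for C.JALR; an encoding-level expansion cannot express that. */
static uint32_t expand_c_jr_jalr(uint16_t insn)
{
    uint32_t funct4 = (insn >> 12) & 0xfu;
    uint32_t rs1 = (insn >> 7) & 0x1fu;
    uint32_t rs2 = (insn >> 2) & 0x1fu;
    if ((insn & 0x3u) != 0x2u || rs1 == 0 || rs2 != 0)
        return 0;
    uint32_t rd;
    if (funct4 == 0x8u)
        rd = 0;        /* C.JR: discard the link address */
    else if (funct4 == 0x9u)
        rd = 1;        /* C.JALR: link into x1 */
    else
        return 0;
    /* I-type: imm = 0 | rs1 | funct3 = 000 | rd | opcode 1100111 (JALR) */
    return (rs1 << 15) | (rd << 7) | 0x67u;
}
```

Under these assumptions, {\tt 0x8082} ({\tt c.jr x1}) expands to {\tt 0x00008067}, and {\tt 0x9782} ({\tt c.jalr x15}) expands to {\tt 0x000780E7}.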
+
+\begin{center}
+\begin{tabular}{S@{}S@{}S@{}T@{}Y}
+\\
+\instbitrange{15}{13} &
+\instbitrange{12}{10} &
+\instbitrange{9}{7} &
+\instbitrange{6}{2} &
+\instbitrange{1}{0} \\
+\hline
+\multicolumn{1}{|c|}{funct3} &
+\multicolumn{1}{c|}{imm} &
+\multicolumn{1}{c|}{rs1$'$} &
+\multicolumn{1}{c|}{imm} &
+\multicolumn{1}{c|}{op} \\
+\hline
+3 & 3 & 3 & 5 & 2 \\
+C.BEQZ & offset[8$\vert$4:3] & src & offset[7:6$\vert$2:1$\vert$5] & C1 \\
+C.BNEZ & offset[8$\vert$4:3] & src & offset[7:6$\vert$2:1$\vert$5] & C1 \\
+\end{tabular}
+\end{center}
+These instructions use the CB format.
+
+C.BEQZ performs conditional control transfers. The offset is sign-extended
+and added to the {\tt pc} to form the branch target address. It can
+therefore target a $\pm$\wunits{256}{B} range. C.BEQZ takes the branch if the
+value in register {\em rs1$'$} is zero. It expands to {\tt beq rs1$'$, x0,
+offset[8:1]}.
+
+C.BNEZ is defined analogously, but it takes the branch if {\em rs1$'$} contains
+a nonzero value. It expands to {\tt bne rs1$'$, x0, offset[8:1]}.
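The branch offset decodes much like the jump offset, but it is 9 bits wide and sign-extended from bit 8. The C sketch below (function names hypothetical) decodes it from the CB layout in the table above and forms the branch target from the {\tt pc}.

```c
#include <stdint.h>

/* Illustrative sketch; function names are hypothetical.
 * CB branch offset used by C.BEQZ and C.BNEZ:
 *   inst[12] -> off[8], inst[11:10] -> off[4:3],
 *   inst[6:5] -> off[7:6], inst[4:3] -> off[2:1], inst[2] -> off[5]
 * Sign-extended from bit 8: range -256 .. +254 bytes. */
static int32_t cb_offset(uint16_t insn)
{
    uint32_t off =
        (((insn >> 12) & 0x1u) << 8) |
        (((insn >> 10) & 0x3u) << 3) |
        (((insn >> 5)  & 0x3u) << 6) |
        (((insn >> 3)  & 0x3u) << 1) |
        (((insn >> 2)  & 0x1u) << 5);
    return (int32_t)(off ^ 0x100u) - 0x100;
}

/* Branch target: pc plus the sign-extended offset (wraps modulo 2^64). */
static uint64_t cb_target(uint64_t pc, uint16_t insn)
{
    return pc + (uint64_t)(int64_t)cb_offset(insn);
}
```

Assuming C.BEQZ occupies funct3 {\tt 110} in quadrant {\tt 01}, {\tt 0xDD7D} ({\tt c.beqz x10} with offset $-2$) decodes to $-2$, while {\tt 0xCC7D} and {\tt 0xD001} give the extremes $+254$ and $-256$.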
+
+\section{Integer Computational Instructions}
+
+RVC provides several instructions for integer arithmetic and constant generation.
+
+\subsection*{Integer Constant-Generation Instructions}
+
+The two constant-generation instructions both use the CI instruction
+format and can target any integer register except {\tt x0} (C.LUI
+additionally excludes {\tt x2}).
+
+\vspace{-0.4in}
+\begin{center}
+\begin{tabular}{S@{}W@{}T@{}T@{}Y}
+\\
+\instbitrange{15}{13} &
+\multicolumn{1}{c}{\instbit{12}} &
+\instbitrange{11}{7} &
+\instbitrange{6}{2} &
+\instbitrange{1}{0} \\
+\hline
+\multicolumn{1}{|c|}{funct3} &
+\multicolumn{1}{c|}{imm[5]} &
+\multicolumn{1}{c|}{rd} &
+\multicolumn{1}{c|}{imm[4:0]} &
+\multicolumn{1}{c|}{op} \\
+\hline
+3 & 1 & 5 & 5 & 2 \\
+C.LI & imm[5] & dest$\neq$0 & imm[4:0] & C1 \\
+C.LUI & nzimm[17] & $\textrm{dest}{\neq}{\left\{0,2\right\}}$ & nzimm[16:12] & C1 \\
+\end{tabular}
+\end{center}
+C.LI loads the sign-extended 6-bit immediate, {\em imm}, into
+register {\em rd}. C.LI is only valid when {\em rd}$\neq${\tt x0}.
+C.LI expands into {\tt addi rd, x0, imm[5:0]}.
+
+C.LUI loads the non-zero 6-bit immediate field into bits 17--12 of the
+destination register, clears the bottom 12 bits, and sign-extends bit
+17 into all higher bits of the destination. C.LUI is only valid when
+$\textit{rd}{\neq}{\left\{\texttt{x0},\texttt{x2}\right\}}$,
+and when the immediate is not equal to zero.
+C.LUI expands into {\tt lui rd, nzimm[17:12]}.
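+
+As a sketch, assuming a hypothetical decoder that holds the 16-bit
+instruction in a {\tt uint16\_t}, the CI immediate and the values written
+by C.LI and C.LUI on RV32 could be computed as follows. The validity
+checks on {\em rd} and on a zero C.LUI immediate are omitted for brevity.

```c
#include <stdint.h>

/* Sign-extend the low `bits` bits of x (1 <= bits <= 31). */
static int32_t sext(uint32_t x, unsigned bits) {
    int32_t v = (int32_t)(x & ((1u << bits) - 1u));
    int32_t m = (int32_t)1 << (bits - 1u);
    return (v ^ m) - m;
}

/* CI immediate: insn[12] is imm[5], insn[6:2] is imm[4:0]. */
static int32_t ci_imm6(uint16_t insn) {
    uint32_t in = insn;
    uint32_t imm = (((in >> 12) & 0x1u) << 5) | ((in >> 2) & 0x1Fu);
    return sext(imm, 6);
}

/* rd is insn[11:7]. */
static unsigned ci_rd(uint16_t insn) { return ((unsigned)insn >> 7) & 0x1Fu; }

/* Value written by C.LI, i.e. addi rd, x0, imm. */
static int32_t c_li_value(uint16_t insn) { return ci_imm6(insn); }

/* Value written by C.LUI on RV32: the immediate lands in bits 17..12, the
   low 12 bits are cleared, and bit 17 is sign-extended. Multiplying by 4096
   instead of shifting avoids left-shifting a negative int (undefined in C). */
static int32_t c_lui_value(uint16_t insn) { return ci_imm6(insn) * 4096; }
```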
+
+\subsection*{Integer Register-Immediate Operations}
+
+These integer register-immediate operations are encoded in the CI
+format and perform operations on any non-{\tt x0} integer register and
+a 6-bit immediate. Except for C.ADDI, described below, the immediate
+cannot be zero.
+
+\vspace{-0.4in}
+\begin{center}
+\begin{tabular}{S@{}W@{}T@{}T@{}Y}
+\\
+\instbitrange{15}{13} &
+\multicolumn{1}{c}{\instbit{12}} &
+\instbitrange{11}{7} &
+\instbitrange{6}{2} &
+\instbitrange{1}{0} \\
+\hline
+\multicolumn{1}{|c|}{funct3} &
+\multicolumn{1}{c|}{imm[5]} &
+\multicolumn{1}{c|}{rd/rs1} &
+\multicolumn{1}{c|}{imm[4:0]} &
+\multicolumn{1}{c|}{op} \\
+\hline
+3 & 1 & 5 & 5 & 2 \\
+C.ADDI & nzimm[5] & dest & nzimm[4:0] & C1 \\
+C.ADDIW & imm[5] & dest$\neq$0 & imm[4:0] & C1 \\
+C.ADDI16SP & nzimm[9] & 2 & nzimm[4$\vert$6$\vert$8:7$\vert$5] & C1 \\
+\end{tabular}
+\end{center}
+
+C.ADDI adds the non-zero sign-extended 6-bit immediate to the value in
+register {\em rd}, then writes the result to {\em rd}. C.ADDI expands
+into {\tt addi rd, rd, nzimm[5:0]}.
+
+C.ADDIW is an RV64C/RV128C-only instruction that performs the same
+computation but produces a 32-bit result, then sign-extends the result
+to 64 bits. C.ADDIW expands into {\tt addiw rd, rd, imm[5:0]}. The
+immediate can be zero for C.ADDIW, in which case it corresponds to
+{\tt sext.w rd}.
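+
+A minimal C model of the C.ADDIW result on RV64 is shown below. The
+function name is hypothetical. The model makes the 32-bit wraparound and
+the subsequent sign extension explicit, and an immediate of zero
+reproduces {\tt sext.w}.

```c
#include <stdint.h>

/* Hypothetical model of c.addiw rd, imm on RV64: add in 32 bits, wrap
   modulo 2^32, then sign-extend bit 31 to 64 bits. imm == 0 gives sext.w. */
static int64_t c_addiw_result(int64_t rd_value, int32_t imm) {
    uint32_t low = (uint32_t)rd_value + (uint32_t)imm;  /* well-defined wraparound */
    int64_t r = (int64_t)low;                           /* 0 .. 2^32-1 */
    if (r >= INT64_C(0x80000000))
        r -= INT64_C(0x100000000);                      /* sign-extend bit 31 */
    return r;
}
```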
+
+C.ADDI16SP shares the opcode with C.LUI, but has a destination field
+of {\tt x2}. C.ADDI16SP adds the non-zero sign-extended 6-bit immediate to
+the value in the stack pointer ({\tt sp}={\tt x2}) and writes the result
+back to {\tt sp}. The immediate is scaled to represent non-zero multiples
+of 16 in the range $[-512,496]$. C.ADDI16SP is used to adjust the stack
+pointer in procedure prologues and epilogues. It expands into
+{\tt addi x2, x2, nzimm[9:4]}.
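+
+The scrambled immediate field of C.ADDI16SP can be reassembled as in the
+following C sketch. The function names are hypothetical, and the bit
+positions are taken from the format table above. The decoded value is
+already scaled, since the encoded bits are {\em nzimm[9:4]}.

```c
#include <stdint.h>

/* Sign-extend the low `bits` bits of x (1 <= bits <= 31). */
static int32_t sext(uint32_t x, unsigned bits) {
    int32_t v = (int32_t)(x & ((1u << bits) - 1u));
    int32_t m = (int32_t)1 << (bits - 1u);
    return (v ^ m) - m;
}

/* Hypothetical helper: rebuild nzimm[9|4|6|8:7|5] from insn[12] and insn[6:2]. */
static int32_t c_addi16sp_imm(uint16_t insn) {
    uint32_t in = insn, imm = 0;
    imm |= ((in >> 12) & 0x1u) << 9;  /* insn[12]  -> nzimm[9]   */
    imm |= ((in >> 6)  & 0x1u) << 4;  /* insn[6]   -> nzimm[4]   */
    imm |= ((in >> 5)  & 0x1u) << 6;  /* insn[5]   -> nzimm[6]   */
    imm |= ((in >> 3)  & 0x3u) << 7;  /* insn[4:3] -> nzimm[8:7] */
    imm |= ((in >> 2)  & 0x1u) << 5;  /* insn[2]   -> nzimm[5]   */
    return sext(imm, 10);             /* multiple of 16 in [-512, 496] */
}
```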
+
+\begin{commentary}
+In the standard RISC-V calling convention, the stack pointer {\tt sp}
+is always 16-byte aligned.
+\end{commentary}
+
+\begin{center}
+\begin{tabular}{@{}S@{}K@{}S@{}Y}
+\\
+\instbitrange{15}{13} &
+\instbitrange{12}{5} &
+\instbitrange{4}{2} &
+\instbitrange{1}{0} \\
+\hline
+\multicolumn{1}{|c|}{funct3} &
+\multicolumn{1}{c|}{imm} &
+\multicolumn{1}{c|}{rd$'$} &
+\multicolumn{1}{c|}{op} \\
+\hline
+3 & 8 & 3 & 2 \\
+C.ADDI4SPN & zimm[5:4$\vert$9:6$\vert$2$\vert$3] & dest & C0 \\
+\end{tabular}
+\end{center}
+
+C.ADDI4SPN is a CIW-format RV32C/RV64C-only instruction that adds a
+{\em zero}-extended non-zero immediate, scaled by 4, to the stack pointer,
+{\tt x2}, and writes the result to {\tt rd$'$}. This instruction is used
+to generate pointers to stack-allocated variables, and expands to
+{\tt addi rd$'$, x2, zimm[9:2]}.
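+
+A corresponding sketch for the CIW-format immediate of C.ADDI4SPN
+follows. The helper names are hypothetical. Because the immediate is
+zero-extended and scaled by 4, the decoded offsets are multiples of 4
+from 4 to 1020; an all-zero field does not encode a valid C.ADDI4SPN.

```c
#include <stdint.h>

/* Hypothetical helper: rebuild zimm[5:4|9:6|2|3] from insn[12:5]. */
static uint32_t c_addi4spn_imm(uint16_t insn) {
    uint32_t in = insn, imm = 0;
    imm |= ((in >> 11) & 0x3u) << 4;  /* insn[12:11] -> zimm[5:4] */
    imm |= ((in >> 7)  & 0xFu) << 6;  /* insn[10:7]  -> zimm[9:6] */
    imm |= ((in >> 6)  & 0x1u) << 2;  /* insn[6]     -> zimm[2]   */
    imm |= ((in >> 5)  & 0x1u) << 3;  /* insn[5]     -> zimm[3]   */
    return imm;  /* zero-extended; 0 is not a valid C.ADDI4SPN immediate */
}

/* rd' (insn[4:2]) selects one of x8..x15. */
static unsigned ciw_rd(uint16_t insn) { return 8u + (((unsigned)insn >> 2) & 0x7u); }
```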
+
+
+\vspace{-0.4in}
+\begin{center}
+\begin{tabular}{S@{}W@{}T@{}T@{}Y}
+\\
+\instbitrange{15}{13} &
+\multicolumn{1}{c}{\instbit{12}} &
+\instbitrange{11}{7} &
+\instbitrange{6}{2} &
+\instbitrange{1}{0} \\
+\hline
+\multicolumn{1}{|c|}{funct3} &
+\multicolumn{1}{c|}{shamt[5]} &
+\multicolumn{1}{c|}{rd/rs1} &
+\multicolumn{1}{c|}{shamt[4:0]} &
+\multicolumn{1}{c|}{op} \\
+\hline
+3 & 1 & 5 & 5 & 2 \\
+C.SLLI & shamt[5] & dest$\neq$0 & shamt[4:0] & C2 \\
+\end{tabular}
+\end{center}
+
+C.SLLI is a CI-format instruction that performs a logical left shift
+of the value in register {\em rd}, then writes the result to {\em rd}.
+The shift amount is encoded in the {\em shamt} field, where {\em
+ shamt[5]} must be zero for RV32C. For RV32C and RV64C, the shift
+amount must be non-zero. For RV128C, a shift amount of zero is used
+to encode a shift of 64. C.SLLI expands into {\tt slli rd, rd,
+ shamt[5:0]}, except for RV128C with {\tt shamt=0}, which expands to
+{\tt slli rd, rd, 64}.
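+
+The rules above for the C.SLLI shift amount can be summarized in a small
+C function. This is a sketch under the assumption that the caller passes
+the XLEN (32, 64, or 128). Returning $-1$ for invalid encodings is a
+convention of this example only.

```c
#include <stdint.h>

/* Hypothetical helper: effective C.SLLI shift amount for XLEN 32, 64, or 128.
   Returns -1 for encodings the text rules out (a convention of this sketch). */
static int c_slli_shamt(uint16_t insn, unsigned xlen) {
    unsigned in = insn;
    unsigned shamt = (((in >> 12) & 0x1u) << 5) | ((in >> 2) & 0x1Fu);
    if (xlen == 128)
        return shamt == 0 ? 64 : (int)shamt;  /* zero encodes a shift of 64 */
    if (shamt == 0)
        return -1;                            /* must be non-zero on RV32C/RV64C */
    if (xlen == 32 && (shamt & 0x20u))
        return -1;                            /* shamt[5] must be zero on RV32C */
    return (int)shamt;
}
```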
+
+\vspace{-0.4in}
+\begin{center}
+\begin{tabular}{S@{}W@{}Y@{}S@{}T@{}Y}
+\\
+\instbitrange{15}{13} &
+\multicolumn{1}{c}{\instbit{12}} &
+\instbitrange{11}{10} &
+\instbitrange{9}{7} &
+\instbitrange{6}{2} &
+\instbitrange{1}{0} \\
+\hline
+\multicolumn{1}{|c|}{funct3} &
+\multicolumn{1}{c|}{shamt[5]} &
+\multicolumn{1}{c|}{funct2} &
+\multicolumn{1}{c|}{rd$'$/rs1$'$} &
+\multicolumn{1}{c|}{shamt[4:0]} &
+\multicolumn{1}{c|}{op} \\
+\hline
+3 & 1 & 2 & 3 & 5 & 2 \\
+C.SRLI & shamt[5] & C.SRLI & dest & shamt[4:0] & C1 \\
+C.SRAI & shamt[5] & C.SRAI & dest & shamt[4:0] & C1 \\
+\end{tabular}
+\end{center}
+
+C.SRLI is a CB-format instruction that performs a logical right shift
+of the value in register {\em rd$'$}, then writes the result to {\em rd$'$}.
+The shift amount is encoded in the {\em shamt} field, where {\em
+ shamt[5]} must be zero for RV32C. For RV32C and RV64C, the shift
+amount must be non-zero. For RV128C, a shift amount of zero is used
+to encode a shift of 64. Furthermore, the shift amount is sign-extended
+for RV128C, and so the legal shift amounts are 1--31, 64, and 96--127.
+C.SRLI expands into {\tt srli rd$'$, rd$'$, shamt[5:0]},
+except for RV128C with {\tt shamt=0}, which expands to
+{\tt srli rd$'$, rd$'$, 64}.
+
+C.SRAI is defined analogously to C.SRLI, but instead performs an arithmetic
+right shift.
+C.SRAI expands to {\tt srai rd$'$, rd$'$, shamt[5:0]}.
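+
+The following sketch, again with hypothetical names, computes the
+effective C.SRLI/C.SRAI shift amount, including the RV128C sign extension
+that yields 1--31, 64, and 96--127. It assumes the standard {\em funct2}
+values from the RVC instruction listings (00 for C.SRLI, 01 for C.SRAI).

```c
#include <stdint.h>

/* Hypothetical helper: effective C.SRLI/C.SRAI shift amount. On RV128C the
   6-bit field is sign-extended and read modulo 128, with zero meaning 64.
   Returns -1 for encodings not valid on RV32C/RV64C. */
static int c_right_shamt(uint16_t insn, unsigned xlen) {
    unsigned in = insn;
    unsigned raw = (((in >> 12) & 0x1u) << 5) | ((in >> 2) & 0x1Fu);
    if (xlen == 128) {
        if (raw == 0)
            return 64;
        return (raw & 0x20u) ? (int)raw + 64 : (int)raw;  /* 32..63 -> 96..127 */
    }
    if (raw == 0 || (xlen == 32 && (raw & 0x20u)))
        return -1;
    return (int)raw;
}

/* Assumed funct2 (insn[11:10]) values: 00 is C.SRLI, 01 is C.SRAI. */
static int c_is_srai(uint16_t insn) { return (((unsigned)insn >> 10) & 0x3u) == 0x1u; }
```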
+
+\begin{commentary}
+Left shifts are usually more frequent than right shifts, as left
+shifts are frequently used to scale address values. Right shifts have
+therefore been granted less encoding space and are placed in an
+encoding quadrant where all other immediates are sign-extended. For
+RV128, the decision was made to have the 6-bit shift-amount immediate
+also be sign-extended. Apart from reducing the decode complexity, we
+believe right-shift amounts of 96--127 will be more useful than 64--95,
+to allow extraction of tags located in the high portions of 128-bit
+address pointers. We note that RV128C will not be frozen at the same
+point as RV32C and RV64C, to allow evaluation of typical usage of
+128-bit address-space codes.
+\end{commentary}
+
+\begin{center}
+\begin{tabular}{S@{}W@{}Y@{}S@{}T@{}Y}
+\\
+\instbitrange{15}{13} &
+\multicolumn{1}{c}{\instbit{12}} &
+\instbitrange{11}{10} &
+\instbitrange{9}{7} &
+\instbitrange{6}{2} &
+\instbitrange{1}{0} \\
+\hline
+\multicolumn{1}{|c|}{funct3} &
+\multicolumn{1}{c|}{imm[5]} &
+\multicolumn{1}{c|}{funct2} &
+\multicolumn{1}{c|}{rd$'$/rs1$'$} &
+\multicolumn{1}{c|}{imm[4:0]} &
+\multicolumn{1}{c|}{op} \\
+\hline
+3 & 1 & 2 & 3 & 5 & 2 \\
+C.ANDI & imm[5] & C.ANDI & dest & imm[4:0] & C1 \\
+\end{tabular}
+\end{center}
+
+C.ANDI is a CB-format instruction that computes the bitwise AND of
+the value in register {\em rd$'$} and the sign-extended 6-bit immediate,
+then writes the result to {\em rd$'$}.
+C.ANDI expands to {\tt andi rd$'$, rd$'$, imm[5:0]}.
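+
+For illustration, an RV32 model of C.ANDI might look as follows. The
+function name is hypothetical. The key step is sign-extending
+{\em imm[5]} before the AND, so a negative immediate clears only
+low-order bits.

```c
#include <stdint.h>

/* Hypothetical model of c.andi rd', imm on RV32: the 6-bit immediate is
   sign-extended to 32 bits before the AND. */
static uint32_t c_andi_rv32(uint32_t rd_value, uint16_t insn) {
    uint32_t in = insn;
    uint32_t imm = (((in >> 12) & 0x1u) << 5) | ((in >> 2) & 0x1Fu);
    if (imm & 0x20u)
        imm |= 0xFFFFFFC0u;  /* copy imm[5] into bits 31..6 */
    return rd_value & imm;
}
```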
+
+\subsection*{Integer Register-Register Operations}
+\vspace{-0.4in}
+\begin{center}
+\begin{tabular}{E@{}T@{}T@{}Y}
+\\
+\instbitrange{15}{12} &
+\instbitrange{11}{7} &
+\instbitrange{6}{2} &
+\instbitrange{1}{0} \\
+\hline
+\multicolumn{1}{|c|}{funct4} &
+\multicolumn{1}{c|}{rd/rs1} &
+\multicolumn{1}{c|}{rs2} &
+\multicolumn{1}{c|}{op} \\
+\hline
+4 & 5 & 5 & 2 \\
+C.MV & dest$\neq$0 & src$\neq$0 & C2 \\
+C.ADD & dest$\neq$0 & src$\neq$0 & C2 \\
+\end{tabular}
+\end{center}
+These instructions use the CR format.
+
+C.MV copies the value in register {\em rs2} into register {\em rd}. C.MV
+expands into {\tt add rd, x0, rs2}.
+
+C.ADD adds the values in registers {\em rd} and {\em rs2} and writes the
+result to register {\em rd}. C.ADD expands into {\tt add rd, rd, rs2}.
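+
+A hypothetical classifier for these two CR-format instructions is
+sketched below. It assumes the standard encodings from the RVC listings,
+with C.MV and C.ADD in quadrant C2 and {\em funct4} values 1000 and 1001
+respectively. An {\em rs2} of {\tt x0} is reported as some other
+instruction, since C.EBREAK, for example, shares this opcode space.

```c
#include <stdint.h>

typedef enum { CR_OTHER, CR_MV, CR_ADD } cr_kind;

static unsigned cr_rd(uint16_t insn)  { return ((unsigned)insn >> 7) & 0x1Fu; }
static unsigned cr_rs2(uint16_t insn) { return ((unsigned)insn >> 2) & 0x1Fu; }

/* Hypothetical classifier for quadrant-C2 CR opcodes with funct4 1000 (C.MV)
   and 1001 (C.ADD); rs2 == x0 selects other instructions in this space. */
static cr_kind cr_decode(uint16_t insn) {
    unsigned funct4 = ((unsigned)insn >> 12) & 0xFu;
    if ((insn & 0x3u) != 0x2u || cr_rs2(insn) == 0)
        return CR_OTHER;
    if (funct4 == 0x8u) return CR_MV;   /* add rd, x0, rs2 */
    if (funct4 == 0x9u) return CR_ADD;  /* add rd, rd, rs2 */
    return CR_OTHER;
}
```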
+
+\vspace{-0.4in}
+\begin{center}
+\begin{tabular}{M@{}S@{}Y@{}S@{}Y}
+\\
+\instbitrange{15}{10} &
+\instbitrange{9}{7} &
+\instbitrange{6}{5} &
+\instbitrange{4}{2} &
+\instbitrange{1}{0} \\
+\hline
+\multicolumn{1}{|c|}{funct6} &
+\multicolumn{1}{c|}{rd$'$/rs1$'$} &
+\multicolumn{1}{c|}{funct2} &
+\multicolumn{1}{c|}{rs2$'$} &
+\multicolumn{1}{c|}{op} \\
+\hline
+6 & 3 & 2 & 3 & 2 \\
+C.AND & dest & C.AND & src & C1 \\
+C.OR & dest & C.OR & src & C1 \\
+C.XOR & dest & C.XOR & src & C1 \\
+C.SUB & dest & C.SUB & src & C1 \\
+C.ADDW & dest & C.ADDW & src & C1 \\
+C.SUBW & dest & C.SUBW & src & C1 \\
+\end{tabular}
+\end{center}
+
+These instructions use the CS format.
+
+C.AND computes the bitwise AND of the values in registers {\em rd$'$}
+and {\em rs2$'$}, then writes the result to register {\em rd$'$}.
+C.AND expands into {\tt and rd$'$, rd$'$, rs2$'$}.
+
+C.OR computes the bitwise OR of the values in registers {\em rd$'$}
+and {\em rs2$'$}, then writes the result to register {\em rd$'$}.
+C.OR expands into {\tt or rd$'$, rd$'$, rs2$'$}.
+
+C.XOR computes the bitwise XOR of the values in registers {\em rd$'$}
+and {\em rs2$'$}, then writes the result to register {\em rd$'$}.
+C.XOR expands into {\tt xor rd$'$, rd$'$, rs2$'$}.
+
+C.SUB subtracts the value in register {\em rs2$'$} from the value in
+register {\em rd$'$}, then writes the result to register {\em rd$'$}.
+C.SUB expands into {\tt sub rd$'$, rd$'$, rs2$'$}.
+
+C.ADDW is an RV64C/RV128C-only instruction that adds the values in
+registers {\em rd$'$} and {\em rs2$'$}, then sign-extends the lower
+32 bits of the sum before writing the result to register {\em rd$'$}.
+C.ADDW expands into {\tt addw rd$'$, rd$'$, rs2$'$}.
+
+C.SUBW is an RV64C/RV128C-only instruction that subtracts the value in
+register {\em rs2$'$} from the value in register {\em rd$'$}, then
+sign-extends the lower 32 bits of the difference before writing the result
+to register {\em rd$'$}. C.SUBW expands into {\tt subw rd$'$, rd$'$, rs2$'$}.
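+
+The six operations can be modeled together on RV64 as below. This is a
+sketch with hypothetical function names. The opcode checks assume the
+standard encodings from the RVC listings ({\em funct6} 100011 for
+C.SUB/C.XOR/C.OR/C.AND and 100111 for C.SUBW/C.ADDW, with {\em funct2}
+selecting the operation). Register values are modeled as {\tt uint64\_t}
+so that wraparound is well defined in C.

```c
#include <stdint.h>

/* Sign-extend the low 32 bits of v, using only unsigned arithmetic. */
static uint64_t sext32(uint64_t v) {
    v &= UINT64_C(0xFFFFFFFF);
    return (v & UINT64_C(0x80000000)) ? (v | UINT64_C(0xFFFFFFFF00000000)) : v;
}

/* Hypothetical check: quadrant C1, funct3 100, insn[11:10] == 11, and, when
   insn[12] is set (word ops), funct2 of 00 or 01. */
static int cs_alu_valid(uint16_t insn) {
    unsigned in = insn;
    if ((in & 0x3u) != 0x1u || (in >> 13) != 0x4u || ((in >> 10) & 0x3u) != 0x3u)
        return 0;
    return !((in >> 12) & 0x1u) || ((in >> 5) & 0x3u) <= 0x1u;
}

/* Value written to rd' on RV64, given the old rd' (a) and rs2' (b) values.
   Assumes cs_alu_valid(insn). */
static uint64_t cs_alu_result(uint16_t insn, uint64_t a, uint64_t b) {
    unsigned in = insn, f2 = (in >> 5) & 0x3u;
    if ((in >> 12) & 0x1u)                             /* word ops */
        return f2 == 0 ? sext32(a - b) : sext32(a + b);  /* c.subw / c.addw */
    switch (f2) {
    case 0:  return a - b;                             /* c.sub */
    case 1:  return a ^ b;                             /* c.xor */
    case 2:  return a | b;                             /* c.or  */
    default: return a & b;                             /* c.and */
    }
}
```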
+
+\begin{commentary}
+This group of six instructions does not provide large savings
+individually, but the instructions do not occupy much encoding space,
+are straightforward to implement, and as a group provide a worthwhile
+improvement in static and dynamic compression.
+\end{commentary}
+
+\subsection*{Defined Illegal Instruction}
+\vspace{-0.4in}
+\begin{center}
+\begin{tabular}{S@{}W@{}T@{}T@{}Y}
+\\
+\instbitrange{15}{13} &
+\multicolumn{1}{c}{\instbit{12}} &
+\instbitrange{11}{7} &
+\instbitrange{6}{2} &
+\instbitrange{1}{0} \\
+\hline
+\multicolumn{1}{|c|}{0} &
+\multicolumn{1}{c|}{0} &
+\multicolumn{1}{c|}{0} &
+\multicolumn{1}{c|}{0} &
+\multicolumn{1}{c|}{0} \\
+\hline
+3 & 1 & 5 & 5 & 2 \\
+0 & 0 & 0 & 0 & 0 \\
+\end{tabular}
+\end{center}
+
+A 16-bit instruction with all bits zero is permanently reserved as an
+illegal instruction.
+\begin{commentary}
+We reserve all-zero instructions to be illegal instructions to help
+trap attempts to execute zeroed or non-existent portions of the
+memory space. The all-zero value should not be redefined in any
+non-standard extension. Similarly, we reserve instructions with all
+bits set to 1 (corresponding to very long instructions in the RISC-V
+variable-length encoding scheme) as illegal to capture another common
+value seen in non-existent memory regions.
+\end{commentary}
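+
+The all-zero parcel falls in quadrant C0 with {\em funct3} 000, which is
+the C.ADDI4SPN opcode in the standard RVC listings, and its zero
+immediate field is one that C.ADDI4SPN already excludes. The following
+sketch, with hypothetical helper names, makes this explicit.

```c
#include <stdint.h>

/* The all-zero parcel is permanently reserved as an illegal instruction. */
static int rvc_is_defined_illegal(uint16_t insn) { return insn == 0; }

/* Hypothetical helpers, assuming the standard listing: quadrant C0 with
   funct3 000 is the C.ADDI4SPN opcode, and insn[12:5] is its immediate. */
static int is_addi4spn_opcode(uint16_t insn) {
    return (insn & 0x3u) == 0 && ((unsigned)insn >> 13) == 0;
}
static unsigned addi4spn_imm_field(uint16_t insn) { return ((unsigned)insn >> 5) & 0xFFu; }
```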
+
+\subsection*{NOP Instruction}
+\vspace{-0.4in}
+\begin{center}
+\begin{tabular}{S@{}W@{}T@{}T@{}Y}
+\\
+\instbitrange{15}{13} &
+\multicolumn{1}{c}{\instbit{12}} &
+\instbitrange{11}{7} &
+\instbitrange{6}{2} &
+\instbitrange{1}{0} \\
+\hline
+\multicolumn{1}{|c|}{funct3} &
+\multicolumn{1}{c|}{imm[5]} &
+\multicolumn{1}{c|}{rd/rs1} &
+\multicolumn{1}{c|}{imm[4:0]} &
+\multicolumn{1}{c|}{op} \\
+\hline
+3 & 1 & 5 & 5 & 2 \\
+C.NOP & 0 & 0 & 0 & C1 \\
+\end{tabular}
+\end{center}
+
+C.NOP is a CI-format instruction that does not change any user-visible state,
+except for advancing the {\tt pc}. C.NOP is encoded as {\tt c.addi x0, 0} and
+so expands to {\tt addi x0, x0, 0}.
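+
+To see why C.NOP is the parcel {\tt 0x0001}, the following sketch packs
+CI fields into a 16-bit word. The encoder is hypothetical and assumes
+that C.ADDI uses {\em funct3} 000 in quadrant C1 ({\em op}=01).

```c
#include <stdint.h>

/* Hypothetical CI-format encoder: funct3 | imm[5] | rd | imm[4:0] | op. */
static uint16_t encode_ci(unsigned funct3, int imm6, unsigned rd, unsigned op) {
    unsigned imm = (unsigned)imm6 & 0x3Fu;  /* 6-bit two's-complement field */
    return (uint16_t)(((funct3 & 0x7u) << 13) | ((imm >> 5) << 12) |
                      ((rd & 0x1Fu) << 7) | ((imm & 0x1Fu) << 2) | (op & 0x3u));
}
```

+With these assumptions, {\tt encode\_ci(0, 0, 0, 1)} yields {\tt 0x0001}.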
+
+\subsection*{Breakpoint Instruction}
+\vspace{-0.4in}
+\begin{center}
+\begin{tabular}{E@{}U@{}Y}
+\\
+\instbitrange{15}{12} &
+\instbitrange{11}{2} &
+\instbitrange{1}{0} \\
+\hline
+\multicolumn{1}{|c|}{funct4} &
+\multicolumn{1}{c|}{0} &
+\multicolumn{1}{c|}{op} \\
+\hline
+4 & 10 & 2 \\
+C.EBREAK & 0 & C2 \\
+\end{tabular}
+\end{center}
+
+Debuggers can use the C.EBREAK instruction, which expands to {\tt ebreak},
+to cause control to be transferred back to the debugging environment.
+C.EBREAK shares the opcode with the C.ADD instruction, but with {\em
+ rd} and {\em rs2} both zero, and so also uses the CR format.
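+
+Similarly, a hypothetical CR-format encoder shows that C.EBREAK is the
+C.ADD opcode with both register fields zero. It assumes C.ADD uses
+{\em funct4} 1001 in quadrant C2 ({\em op}=10), per the standard RVC
+listings.

```c
#include <stdint.h>

/* Hypothetical CR-format encoder: funct4 | rd/rs1 | rs2 | op. */
static uint16_t encode_cr(unsigned funct4, unsigned rd, unsigned rs2, unsigned op) {
    return (uint16_t)(((funct4 & 0xFu) << 12) | ((rd & 0x1Fu) << 7) |
                      ((rs2 & 0x1Fu) << 2) | (op & 0x3u));
}
```

+Under these assumptions, {\tt encode\_cr(0x9, 0, 0, 2)} gives the
+C.EBREAK parcel {\tt 0x9002}.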
+
+\section{Usage of C Instructions in LR/SC Sequences}
+
+On implementations that support the C extension, compressed forms of
+the I instructions permitted inside LR/SC sequences can be used while
+retaining the guarantee of eventual success, as described in
+Section~\ref{lrscseq}.
+
+\begin{commentary}
+The implication is that any implementation that claims to support both
+the A and C extensions must ensure that LR/SC sequences containing
+valid C instructions will eventually complete.
+\end{commentary}
+
+\clearpage
+
+\section{RVC Instruction Set Listings}
+
+Table~\ref{rvcopcodemap} shows a map of the major opcodes for RVC.
+Opcodes with the lower two bits set correspond to instructions wider
+than 16 bits, including those in the base ISAs. Several instructions
+are only valid for certain operands; when invalid, they are marked
+either {\em RES} to indicate that the opcode is reserved for future
+standard extensions; {\em NSE} to indicate that the opcode is reserved
+for non-standard extensions; or {\em HINT} to indicate that the opcode
+is reserved for future standard microarchitectural hints.
+Instructions marked {\em HINT} must execute as no-ops on
+implementations for which the hint has no effect.
+
+\begin{commentary}
+The HINT instructions are designed to support future addition of
+microarchitectural hints that might affect performance but cannot
+affect architectural state. The HINT encodings have been chosen so
+that simple implementations can ignore the HINT encoding and execute
+the HINT as a regular operation that does not change architectural
+state. For example, C.ADD is a HINT if the destination register is
+{\tt x0}, where the five-bit rs2 field encodes details of the HINT.
+However, a simple implementation can execute the HINT as an add
+to register {\tt x0}, which has no effect.
+\end{commentary}
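+
+The 16-bit length rule and the C.ADD HINT example above can be sketched
+in C as follows. The helper names are hypothetical, and the C.ADD
+encoding ({\em funct4} 1001, quadrant C2) is assumed from the standard
+RVC listings.

```c
#include <stdint.h>

/* A parcel whose low two bits are not both set is a 16-bit RVC instruction. */
static int rvc_is_compressed(uint16_t parcel) { return (parcel & 0x3u) != 0x3u; }

/* Quadrant (0, 1, or 2) of a compressed parcel, i.e. C0..C2 in the opcode map. */
static unsigned rvc_quadrant(uint16_t parcel) { return parcel & 0x3u; }

/* Hypothetical check for the C.ADD HINT: C.ADD's opcode with rd == x0 and
   rs2 != x0. Executing it as a plain add to x0 has no architectural effect. */
static int is_c_add_hint(uint16_t insn) {
    unsigned in = insn;
    return (in >> 12) == 0x9u && (in & 0x3u) == 0x2u &&
           ((in >> 7) & 0x1Fu) == 0 && ((in >> 2) & 0x1Fu) != 0;
}
```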
+
+\input{rvc-opcode-map}
+
+Tables~\ref{rvc-instr-table0}--\ref{rvc-instr-table2} list the RVC instructions.
+\input{rvc-instr-table}