author | Andrew Waterman <andrew@sifive.com> | 2017-02-01 20:41:47 -0800 |
---|---|---|
committer | Andrew Waterman <andrew@sifive.com> | 2017-02-01 20:41:47 -0800 |
commit | ab6f8c9bd7bc85361fcf35667d1fddfaf367a53f (patch) | |
tree | 716a2118ca0565dbb4e7903723f283ae4dd13c46 /src/c.tex | |
parent | 207a7c6ee51aa2fd74d4618cd1369ddc21706b9e (diff) | |
Reorganize directory structure
Diffstat (limited to 'src/c.tex')
-rw-r--r-- | src/c.tex | 1162 |
1 files changed, 1162 insertions, 0 deletions
diff --git a/src/c.tex b/src/c.tex new file mode 100644 index 0000000..2c81f7b --- /dev/null +++ b/src/c.tex @@ -0,0 +1,1162 @@ +\chapter{``C'' Standard Extension for Compressed Instructions, Version +1.9} +\label{compressed} + +This chapter describes the current draft proposal for the RISC-V +standard compressed instruction set extension, named ``C'', which +reduces static and dynamic code size by adding short 16-bit +instruction encodings for common operations. The C extension can be +added to any of the base ISAs (RV32, RV64, RV128), and we use the +generic term ``RVC'' to cover any of these. Typically, 50\%--60\% of +the RISC-V instructions in a program can be replaced with RVC +instructions, resulting in a 25\%--30\% code-size reduction. + +We believe this draft represents a close-to-final design for RV32C +and RV64C (it seems premature to freeze RV128C), though we are +requesting one more round of comments, hence the 1.9 revision number. +Please send your comments to the {\tt isa-dev} mailing list at {\tt + isa-dev@lists.riscv.org}. + +\section{Overview} + +RVC uses a simple compression scheme that offers shorter 16-bit +versions of common 32-bit RISC-V instructions when: +\begin{tightlist} + \item the immediate or address offset is small, or + \item one of the registers is the zero register ({\tt x0}), the + ABI link register ({\tt x1}), or the ABI stack pointer ({\tt + x2}), or + \item the destination register and the first source register are + identical, or + \item the registers used are the 8 most popular ones. +\end{tightlist} + +The C extension is compatible with all other standard instruction +extensions. The C extension allows 16-bit instructions to be freely +intermixed with 32-bit instructions, with the latter now able to start +on any 16-bit boundary. + +\begin{commentary} +Removing the 32-bit alignment constraint on the original 32-bit +instructions allows significantly greater code density.
+\end{commentary} + +The compressed instruction encodings are mostly common across RV32C, +RV64C, and RV128C, but as shown in Table~\ref{rvcopcodemap}, a few +opcodes are used for different purposes depending on base ISA width. +For example, the wider address-space RV64C and RV128C variants require +additional opcodes to compress loads and stores of 64-bit integer +values, while RV32C uses the same opcodes to compress loads and stores +of single-precision floating-point values. Similarly, RV128C requires +additional opcodes to capture loads and stores of 128-bit integer +values, while these same opcodes are used for loads and stores of +double-precision floating-point values in RV32C and RV64C. If the C +extension is implemented, the appropriate compressed floating-point +load and store instructions must be provided whenever the relevant +standard floating-point extension (F and/or D) is also implemented. +In addition, RV32C includes a compressed jump and link instruction to +compress short-range subroutine calls, where the same opcode is used +to compress ADDIW for RV64C and RV128C. + +\begin{commentary} +Double-precision loads and stores are a significant fraction of static +and dynamic instructions, hence the motivation to include them in the +RV32C and RV64C encoding. + +Although single-precision loads and stores are not a significant +source of static or dynamic compression for benchmarks compiled for +the currently supported ABIs, for microcontrollers that only provide +hardware single-precision floating-point units and have an ABI that +only supports single-precision floating-point numbers, the +single-precision loads and stores will be used at least as frequently +as double-precision loads and stores in the measured benchmarks. +Hence, the motivation to provide compressed support for these in +RV32C. + +Short-range subroutine calls are more likely in small binaries for +microcontrollers, hence the motivation to include these in RV32C. 
+ +Although reusing opcodes for different purposes for different base +register widths adds some complexity to documentation, the impact on +implementation complexity is small even for designs that support +multiple base ISA register widths. The compressed floating-point load +and store variants use the same instruction format with the same +register specifiers as the wider integer loads and stores. +\end{commentary} + +RVC was designed under the constraint that each RVC instruction +expands into a single 32-bit instruction in either the base ISA +(RV32I/E, RV64I, or RV128I) or the F and D standard extensions where +present. Adopting this constraint has two main benefits: + +\begin{tightlist} +\item Hardware designs can simply expand RVC instructions during + decode, simplifying verification and minimizing modifications to + existing microarchitectures. +\item Compilers can be unaware of the RVC extension and leave code + compression to the assembler and linker, although a + compression-aware compiler will generally be able to produce better + results. +\end{tightlist} + +\begin{commentary} +We felt the multiple complexity reductions of a simple one-to-one mapping +between C and base IFD instructions far outweighed the potential gains +of a slightly denser encoding that added additional instructions only +supported in the C extension, or that allowed encoding of multiple IFD +instructions in one C instruction. +\end{commentary} + +It is important to note that the C extension is not designed to be a +stand-alone ISA, and is meant to be used alongside a base ISA. + +\begin{commentary} +Variable-length instruction sets have long been used to improve code +density. For example, the IBM Stretch~\cite{stretch}, developed in +the late 1950s, had an ISA with 32-bit and 64-bit instructions, where +some of the 32-bit instructions were compressed versions of the full +64-bit instructions.
Stretch also employed the concept of limiting +the set of registers that were addressable in some of the shorter +instruction formats, with short branch instructions that could only +refer to one of the index registers. The later IBM 360 +architecture~\cite{ibm360} supported a simple variable-length +instruction encoding with 16-bit, 32-bit, or 48-bit instruction +formats. + +In 1963, CDC introduced the Cray-designed CDC 6600~\cite{cdc6600}, a +precursor to RISC architectures, that introduced a register-rich +load-store architecture with instructions of two lengths, 15 bits and +30 bits. The later Cray-1 design used a very similar instruction +format, with 16-bit and 32-bit instruction lengths. + +The initial RISC ISAs from the 1980s all picked performance over code +size, which was reasonable for a workstation environment, but not for +embedded systems. Hence, both ARM and MIPS subsequently made versions +of the ISAs that offered smaller code size by offering an alternative +16-bit wide instruction set instead of the standard 32-bit wide +instructions. The compressed RISC ISAs reduced code size relative to +their starting points by about 25--30\%, yielding code that was +significantly \emph{smaller} than 80x86. This result surprised some, +as their intuition was that the variable-length CISC ISA should be +smaller than RISC ISAs that offered only 16-bit and 32-bit formats. + +Since the original RISC ISAs did not leave sufficient opcode space +free to include these unplanned compressed instructions, they were +instead developed as complete new ISAs. This meant compilers needed +different code generators for the separate compressed ISAs. The first +compressed RISC ISA extensions (e.g., ARM Thumb and MIPS16) used only +a fixed 16-bit instruction size, which gave good reductions in static +code size but caused an increase in dynamic instruction count, which +led to lower performance compared to the original fixed-width 32-bit +instruction size.
This led to the development of a second generation +of compressed RISC ISA designs with mixed 16-bit and 32-bit +instruction lengths (e.g., ARM Thumb2, microMIPS, PowerPC VLE), so +that performance was similar to pure 32-bit instructions but with +significant code size savings. Unfortunately, these different +generations of compressed ISAs are incompatible with each other and +with the original uncompressed ISA, leading to significant complexity +in documentation, implementations, and software tools support. + +Of the commonly used 64-bit ISAs, only PowerPC and microMIPS currently +support a compressed instruction format. It is surprising that the +most popular 64-bit ISA for mobile platforms (ARM v8) does not include +a compressed instruction format given that static code size and +dynamic instruction fetch bandwidth are important metrics. Although +static code size is not a major concern in larger systems, instruction +fetch bandwidth can be a major bottleneck in servers running +commercial workloads, which often have a large instruction working +set. + +Benefiting from 25 years of hindsight, RISC-V was designed to support +compressed instructions from the outset, leaving enough opcode +space for RVC to be added as a simple extension on top of the base ISA +(along with many other extensions). The philosophy of RVC is to +reduce code size for embedded applications \emph{and} to improve +performance and energy-efficiency for all applications due to fewer +misses in the instruction cache. Waterman shows that RVC fetches +25\%--30\% fewer instruction bits, which reduces instruction cache +misses by 20\%--25\%, or roughly the same performance impact as +doubling the instruction cache size~\cite{waterman-ms}. +\end{commentary} + + +\section{Compressed Instruction Formats} + +Table~\ref{formats} shows the eight compressed instruction +formats. CR, CI, and CSS can use any of the 32 RVI registers, but CIW, +CL, CS, and CB are limited to just 8 of them.
Table~\ref{registers} +lists these popular registers, which correspond to registers {\tt x8} +to {\tt x15}. Note that there is a +separate version of load and store instructions that use the stack +pointer as the base address register, since saving to and restoring +from the stack are so prevalent, and that they use the CI and CSS +formats to allow access to all 32 data registers. CIW supplies an +8-bit immediate for the ADDI4SPN instruction. + +\begin{commentary} +The RISC-V ABI was changed to make the frequently used registers map +to registers {\tt x8}--{\tt x15}. This simplifies the decompression +decoder by having a contiguous naturally aligned set of register +numbers, and is also compatible with the RV32E subset base +specification, which only has 16 integer registers. +\end{commentary} + +Compressed register-based floating-point loads and stores also use the +CL and CS formats respectively, with the eight registers mapping to +{\tt f8} to {\tt f15}. + +\begin{commentary} +The standard RISC-V calling convention maps the most frequently used +floating-point registers to registers {\tt f8} to {\tt f15}, which +allows the same register decompression decoding as for integer +register numbers. +\end{commentary} + +The formats were designed to keep bits for the two register source +specifiers in the same place in all instructions, while the +destination register field can move. When the full 5-bit destination +register specifier is present, it is in the same place as in the +32-bit RISC-V encoding. Where immediates are +sign-extended, the sign-extension is always from bit 12. Immediate +fields have been scrambled, as in the base specification, to reduce +the number of immediate muxes required. + +\begin{commentary} +The immediate fields are scrambled in the instruction formats instead +of in sequential order so that as many bits as possible are in the +same position in every instruction, thereby simplifying +implementations. 
For example, immediate bits 17--10 are always sourced from +the same instruction bit positions. Five other immediate bits (5, 4, +3, 1, and 0) have just two source instruction bits, while four (9, 7, +6, and 2) have three sources and one (8) has four sources. +\end{commentary} + +For many RVC instructions, zero-valued immediates are disallowed and +{\tt x0} is not a valid 5-bit register specifier. These restrictions +free up encoding space for other instructions requiring fewer operand +bits. + +\begin{table}[h] +{ +\begin{small} +\begin{center} +\begin{tabular}{c c p{0in}p{0.05in}p{0.05in}p{0.05in}p{0.05in}p{0.05in}p{0.05in}p{0.05in}p{0.05in}p{0.05in}p{0.05in}p{0.05in}p{0.05in}p{0.05in}p{0.05in}p{0.05in}p{0.05in}} +& & & & & & & & & \\ +Format & Meaning & +\instbit{15} & +\instbit{14} & +\instbit{13} & +\multicolumn{1}{c}{\instbit{12}} & +\instbit{11} & +\instbit{10} & +\instbit{9} & +\instbit{8} & +\instbit{7} & +\instbit{6} & +\multicolumn{1}{r}{\instbit{5}} & +\instbit{4} & +\instbit{3} & +\instbit{2} & +\instbit{1} & +\instbit{0} \\ +\cline{3-18} + +CR & Register & +\multicolumn{4}{|c|}{funct4} & +\multicolumn{5}{c|}{rd/rs1} & +\multicolumn{5}{c|}{rs2} & +\multicolumn{2}{c|}{op} \\ +\cline{3-18} + +CI & Immediate & +\multicolumn{3}{|c|}{funct3} & +\multicolumn{1}{c|}{imm} & +\multicolumn{5}{c|}{rd/rs1} & +\multicolumn{5}{c|}{imm} & +\multicolumn{2}{c|}{op} \\ +\cline{3-18} + +CSS & Stack-relative Store & +\multicolumn{3}{|c|}{funct3} & +\multicolumn{6}{c|}{imm} & +\multicolumn{5}{c|}{rs2} & +\multicolumn{2}{c|}{op} \\ +\cline{3-18} + +CIW & Wide Immediate & +\multicolumn{3}{|c|}{funct3} & +\multicolumn{8}{c|}{imm} & +\multicolumn{3}{c|}{rd$'$} & +\multicolumn{2}{c|}{op} \\ +\cline{3-18} + +CL & Load & +\multicolumn{3}{|c|}{funct3} & +\multicolumn{3}{c|}{imm} & +\multicolumn{3}{c|}{rs1$'$} & +\multicolumn{2}{c|}{imm} & +\multicolumn{3}{c|}{rd$'$} & +\multicolumn{2}{c|}{op} \\ +\cline{3-18} + +CS & Store & +\multicolumn{3}{|c|}{funct3} &
+\multicolumn{3}{c|}{imm} & +\multicolumn{3}{c|}{rs1$'$} & +\multicolumn{2}{c|}{imm} & +\multicolumn{3}{c|}{rs2$'$} & +\multicolumn{2}{c|}{op} \\ +\cline{3-18} + +CB & Branch & +\multicolumn{3}{|c|}{funct3} & +\multicolumn{3}{c|}{offset} & +\multicolumn{3}{c|}{rs1$'$} & +\multicolumn{5}{c|}{offset} & +\multicolumn{2}{c|}{op} \\ +\cline{3-18} + +CJ & Jump & +\multicolumn{3}{|c|}{funct3} & +\multicolumn{11}{c|}{jump target} & +\multicolumn{2}{c|}{op} \\ +\cline{3-18} + +\end{tabular} +\end{center} +\end{small} +} +\caption{Compressed 16-bit RVC instruction formats.} +\label{formats} +\end{table} + + +\begin{table}[H] +{ +\begin{center} +\begin{tabular}{l|c|c|c|c|c|c|c|c|} +\cline{2-9} +RVC Register Number & 000 & 001 & 010 & 011 & 100 & 101 & 110 & 111 +\\ \cline{2-9} +Integer Register Number & {\tt x8} & {\tt x9} & {\tt x10} & {\tt x11} & {\tt x12} & {\tt x13} & {\tt x14} & {\tt x15} \\ \cline{2-9} +Integer Register ABI Name & {\tt s0} & {\tt s1} & {\tt a0} & {\tt a1} & {\tt a2} & {\tt a3} & {\tt a4} & {\tt a5} \\ \cline{2-9} +Floating-Point Register Number & {\tt f8} & {\tt f9} & {\tt f10} & {\tt f11} & {\tt f12} & {\tt f13} & {\tt f14} & {\tt f15} \\ \cline{2-9} +Floating-Point Register ABI Name & {\tt fs0} & {\tt fs1} & {\tt fa0} & {\tt fa1} & {\tt fa2} & {\tt fa3} & {\tt fa4} & {\tt fa5} \\ \cline{2-9} +\end{tabular} +\end{center} +} +\caption{Registers specified by the three-bit rs1', rs2', and rd' fields of the CIW, CL, CS, and CB formats.} +\label{registers} +\end{table} + +\section{Load and Store Instructions} + +To increase the reach of 16-bit instructions, data-transfer +instructions use zero-extended immediates that are scaled by the size +of the data in bytes: $\times$4 for words, $\times$8 for double words, +and $\times$16 for quad words. + +RVC provides two variants of loads and stores. One uses the ABI stack +pointer, {\tt x2}, as the base address and can target any data register. 
The +other can reference one of 8 base address registers and one of 8 data +registers. + +\subsection*{Stack-Pointer-Based Loads and Stores} + +\begin{center} +\begin{tabular}{S@{}W@{}T@{}T@{}Y} +\\ +\instbitrange{15}{13} & +\multicolumn{1}{c}{\instbit{12}} & +\instbitrange{11}{7} & +\instbitrange{6}{2} & +\instbitrange{1}{0} \\ +\hline +\multicolumn{1}{|c|}{funct3} & +\multicolumn{1}{c|}{imm} & +\multicolumn{1}{c|}{rd} & +\multicolumn{1}{c|}{imm} & +\multicolumn{1}{c|}{op} \\ +\hline +3 & 1 & 5 & 5 & 2 \\ +C.LWSP & offset[5] & dest$\neq$0 & offset[4:2$\vert$7:6] & C2 \\ +C.LDSP & offset[5] & dest$\neq$0 & offset[4:3$\vert$8:6] & C2 \\ +C.LQSP & offset[5] & dest$\neq$0 & offset[4$\vert$9:6] & C2 \\ +C.FLWSP& offset[5] & dest & offset[4:2$\vert$7:6] & C2 \\ +C.FLDSP& offset[5] & dest & offset[4:3$\vert$8:6] & C2 \\ +\end{tabular} +\end{center} +These instructions use the CI format. + +C.LWSP loads a 32-bit value from memory into register {\em rd}. It computes +an effective address by adding the {\em zero}-extended offset, scaled by 4, to +the stack pointer, {\tt x2}. It expands to {\tt lw rd, offset[7:2](x2)}. + +C.LDSP is an RV64C/RV128C-only instruction that loads a 64-bit value from memory into +register {\em rd}. It computes its effective address by adding the +zero-extended offset, scaled by 8, to the stack pointer, {\tt x2}. +It expands to {\tt ld rd, offset[8:3](x2)}. + +C.LQSP is an RV128C-only instruction that loads a 128-bit value from memory +into register {\em rd}. It computes its effective address by adding the +zero-extended offset, scaled by 16, to the stack pointer, {\tt x2}. +It expands to {\tt lq rd, offset[9:4](x2)}. + +C.FLWSP is an RV32FC-only instruction that loads a single-precision +floating-point value from memory into floating-point register {\em rd}. It +computes its effective address by adding the {\em zero}-extended offset, +scaled by 4, to the stack pointer, {\tt x2}. It expands to {\tt flw rd, +offset[7:2](x2)}. 
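As an illustration of the CI-format offset scrambling tabulated above for C.LWSP, the following Python sketch reassembles the zero-extended offset and forms the text of the equivalent 32-bit instruction. The names decode_c_lwsp and expand_c_lwsp are hypothetical helpers chosen for this sketch, and the funct3 and op fields are assumed to have been matched already, since the opcode map is given elsewhere.

```python
def decode_c_lwsp(inst):
    """Extract rd and the byte offset from a 16-bit C.LWSP encoding.

    Hypothetical helper: assumes funct3 and op were already matched
    against the opcode map, so only the operand fields are decoded.
    """
    rd = (inst >> 7) & 0x1F                 # inst[11:7], full 5-bit specifier
    if rd == 0:
        raise ValueError("C.LWSP requires rd != x0")
    offset = (((inst >> 12) & 0x1) << 5     # inst[12]  -> offset[5]
              | ((inst >> 4) & 0x7) << 2    # inst[6:4] -> offset[4:2]
              | ((inst >> 2) & 0x3) << 6)   # inst[3:2] -> offset[7:6]
    return rd, offset


def expand_c_lwsp(inst):
    """Return the assembly text of the 32-bit LW that C.LWSP expands to."""
    rd, offset = decode_c_lwsp(inst)
    return f"lw x{rd}, {offset}(x2)"
```

For example, the halfword 0x40B2 decodes to rd = x1 with a byte offset of 12. Because the offset is zero-extended and word-scaled, the largest reachable offset in this sketch is 252 bytes.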
+ +C.FLDSP is an RV32DC/RV64DC-only instruction that loads a double-precision +floating-point value from memory into floating-point register {\em rd}. It +computes its effective address by adding the {\em zero}-extended offset, +scaled by 8, to the stack pointer, {\tt x2}. It expands to {\tt fld rd, +offset[8:3](x2)}. + +\begin{center} +\begin{tabular}{S@{}M@{}T@{}Y} +\\ +\instbitrange{15}{13} & +\instbitrange{12}{7} & +\instbitrange{6}{2} & +\instbitrange{1}{0} \\ +\hline +\multicolumn{1}{|c|}{funct3} & +\multicolumn{1}{c|}{imm} & +\multicolumn{1}{c|}{rs2} & +\multicolumn{1}{c|}{op} \\ +\hline +3 & 6 & 5 & 2 \\ +C.SWSP & offset[5:2$\vert$7:6] & src & C2 \\ +C.SDSP & offset[5:3$\vert$8:6] & src & C2 \\ +C.SQSP & offset[5:4$\vert$9:6] & src & C2 \\ +C.FSWSP& offset[5:2$\vert$7:6] & src & C2 \\ +C.FSDSP& offset[5:3$\vert$8:6] & src & C2 \\ +\end{tabular} +\end{center} +These instructions use the CSS format. + +C.SWSP stores a 32-bit value in register {\em rs2} to memory. It computes +an effective address by adding the {\em zero}-extended offset, scaled by 4, to +the stack pointer, {\tt x2}. +It expands to {\tt sw rs2, offset[7:2](x2)}. + +C.SDSP is an RV64C/RV128C-only instruction that stores a 64-bit value in register +{\em rs2} to memory. It computes an effective address by adding the {\em +zero}-extended offset, scaled by 8, to the stack pointer, {\tt x2}. +It expands to {\tt sd rs2, offset[8:3](x2)}. + +C.SQSP is an RV128C-only instruction that stores a 128-bit value in register +{\em rs2} to memory. It computes an effective address by adding the {\em +zero}-extended offset, scaled by 16, to the stack pointer, {\tt x2}. +It expands to {\tt sq rs2, offset[9:4](x2)}. + +C.FSWSP is an RV32FC-only instruction that stores a single-precision +floating-point value in floating-point register {\em rs2} to memory. It +computes an effective address by adding the {\em zero}-extended offset, scaled +by 4, to the stack pointer, {\tt x2}. 
It expands to {\tt fsw rs2, +offset[7:2](x2)}. + +C.FSDSP is an RV32DC/RV64DC-only instruction that stores a double-precision +floating-point value in floating-point register {\em rs2} to memory. It +computes an effective address by adding the {\em zero}-extended offset, scaled +by 8, to the stack pointer, {\tt x2}. It expands to {\tt fsd rs2, +offset[8:3](x2)}. + +\begin{commentary} +Register save/restore code at function entry/exit represents a +significant portion of static code size. The stack-pointer-based +compressed loads and stores in RVC are effective at reducing the +save/restore static code size by a factor of 2 while improving +performance by reducing dynamic instruction bandwidth. + +A common mechanism used in other ISAs to further reduce +save/restore code size is load-multiple and store-multiple +instructions. We considered adopting these for RISC-V but noted the +following drawbacks to these instructions: +\begin{itemize} +\item These instructions complicate processor implementations. +\item For virtual memory systems, some data accesses could be + resident in physical memory and some could not, which requires a + new restart mechanism for partially executed instructions. +\item Unlike the rest of the RVC instructions, there is no IFD + equivalent to Load Multiple and Store Multiple. +\item Unlike the rest of the RVC instructions, the compiler would + have to be aware of these instructions to both generate the + instructions and to allocate registers in an order to maximize + the chances of them being saved and restored, since they would + be saved and restored in sequential order. +\item Simple microarchitectural implementations will constrain how + other instructions can be scheduled around the load and store + multiple instructions, leading to a potential performance loss. +\item The desire for sequential register allocation might conflict with + the featured registers selected for the CIW, CL, CS, and CB formats.
+\end{itemize} +Furthermore, much of the gain can be realized in software by replacing +prologue and epilogue code with subroutine calls to common +prologue and epilogue code, a technique described in +Section 5.6 of~\cite{waterman-phd}. + +While reasonable architects might come to different conclusions, we +decided to omit load and store multiple and instead use the +software-only approach of calling save/restore millicode routines to +attain the greatest code size reduction. +\end{commentary} + +\subsection*{Register-Based Loads and Stores} + +\begin{center} +\begin{tabular}{S@{}S@{}S@{}Y@{}S@{}Y} +\\ +\instbitrange{15}{13} & +\instbitrange{12}{10} & +\instbitrange{9}{7} & +\instbitrange{6}{5} & +\instbitrange{4}{2} & +\instbitrange{1}{0} \\ +\hline +\multicolumn{1}{|c|}{funct3} & +\multicolumn{1}{c|}{imm} & +\multicolumn{1}{c|}{rs1$'$} & +\multicolumn{1}{c|}{imm} & +\multicolumn{1}{c|}{rd$'$} & +\multicolumn{1}{c|}{op} \\ +\hline +3 & 3 & 3 & 2 & 3 & 2 \\ +C.LW & offset[5:3] & base & offset[2$\vert$6] & dest & C0 \\ +C.LD & offset[5:3] & base & offset[7:6] & dest & C0 \\ +C.LQ & offset[5$\vert$4$\vert$8] & base & offset[7:6] & dest & C0 \\ +C.FLW& offset[5:3] & base & offset[2$\vert$6] & dest & C0 \\ +C.FLD& offset[5:3] & base & offset[7:6] & dest & C0 \\ +\end{tabular} +\end{center} +These instructions use the CL format. + +C.LW loads a 32-bit value from memory into register {\em rd$'$}. It computes +an effective address by adding the {\em zero}-extended offset, scaled by 4, to +the base address in register {\em rs1$'$}. +It expands to {\tt lw rd$'$, offset[6:2](rs1$'$)}. + +C.LD is an RV64C/RV128C-only instruction that loads a 64-bit value from memory into +register {\em rd$'$}. It computes an effective address by adding the {\em +zero}-extended offset, scaled by 8, to the base address in register {\em +rs1$'$}. +It expands to {\tt ld rd$'$, offset[7:3](rs1$'$)}.
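The CL-format fields described above for C.LW and C.LD can be decoded along the following lines. This is a minimal Python sketch under stated assumptions: rvc_reg and decode_cl are hypothetical names, the access width is passed in by the caller rather than derived from funct3, and the 16-byte C.LQ case is left out.

```python
def rvc_reg(field):
    """Map a 3-bit RVC register field to the register number 8..15."""
    return 8 + (field & 0x7)


def decode_cl(inst, width):
    """Decode rd', rs1', and the byte offset of a CL-format load.

    width is 4 for C.LW/C.FLW and 8 for C.LD/C.FLD. It is a parameter of
    this sketch only; a real decoder would derive it from funct3 and the
    base ISA width.
    """
    rd = rvc_reg(inst >> 2)                    # inst[4:2] -> rd'
    rs1 = rvc_reg(inst >> 7)                   # inst[9:7] -> rs1'
    offset = ((inst >> 10) & 0x7) << 3         # inst[12:10] -> offset[5:3]
    if width == 4:
        offset |= ((inst >> 6) & 0x1) << 2     # inst[6] -> offset[2]
        offset |= ((inst >> 5) & 0x1) << 6     # inst[5] -> offset[6]
    elif width == 8:
        offset |= ((inst >> 5) & 0x3) << 6     # inst[6:5] -> offset[7:6]
    else:
        raise ValueError("this sketch handles only 4- and 8-byte accesses")
    return rd, rs1, offset
```

With width set to 4, the halfword 0x41C8 yields rd$'$ = x10, rs1$'$ = x11, and an offset of 4 bytes. The same function serves the CS-format stores if the first returned register is read as rs2$'$, since the two formats place their fields identically.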
+ +C.LQ is an RV128C-only instruction that loads a 128-bit value from memory into +register {\em rd$'$}. It computes an effective address by adding the {\em +zero}-extended offset, scaled by 16, to the base address in register {\em +rs1$'$}. +It expands to {\tt lq rd$'$, offset[8:4](rs1$'$)}. + +C.FLW is an RV32FC-only instruction that loads a single-precision +floating-point value from memory into floating-point register {\em rd$'$}. It +computes an effective address by adding the {\em zero}-extended offset, scaled +by 4, to the base address in register {\em rs1$'$}. It expands to {\tt flw +rd$'$, offset[6:2](rs1$'$)}. + +C.FLD is an RV32DC/RV64DC-only instruction that loads a double-precision +floating-point value from memory into floating-point register {\em rd$'$}. It +computes an effective address by adding the {\em zero}-extended offset, scaled +by 8, to the base address in register {\em rs1$'$}. It expands to {\tt fld +rd$'$, offset[7:3](rs1$'$)}. + +\begin{center} +\begin{tabular}{S@{}S@{}S@{}Y@{}S@{}Y} +\\ +\instbitrange{15}{13} & +\instbitrange{12}{10} & +\instbitrange{9}{7} & +\instbitrange{6}{5} & +\instbitrange{4}{2} & +\instbitrange{1}{0} \\ +\hline +\multicolumn{1}{|c|}{funct3} & +\multicolumn{1}{c|}{imm} & +\multicolumn{1}{c|}{rs1$'$} & +\multicolumn{1}{c|}{imm} & +\multicolumn{1}{c|}{rs2$'$} & +\multicolumn{1}{c|}{op} \\ +\hline +3 & 3 & 3 & 2 & 3 & 2 \\ +C.SW & offset[5:3] & base & offset[2$\vert$6] & src & C0 \\ +C.SD & offset[5:3] & base & offset[7:6] & src & C0 \\ +C.SQ & offset[5$\vert$4$\vert$8] & base & offset[7:6] & src & C0 \\ +C.FSW& offset[5:3] & base & offset[2$\vert$6] & src & C0 \\ +C.FSD& offset[5:3] & base & offset[7:6] & src & C0 \\ +\end{tabular} +\end{center} +These instructions use the CS format. + +C.SW stores a 32-bit value in register {\em rs2$'$} to memory. It computes an +effective address by adding the {\em zero}-extended offset, scaled by 4, to +the base address in register {\em rs1$'$}. 
+It expands to {\tt sw rs2$'$, offset[6:2](rs1$'$)}. + +C.SD is an RV64C/RV128C-only instruction that stores a 64-bit value in +register {\em rs2$'$} to memory. It computes an effective address by adding +the {\em zero}-extended offset, scaled by 8, to the base address in register +{\em rs1$'$}. +It expands to {\tt sd rs2$'$, offset[7:3](rs1$'$)}. + +C.SQ is an RV128C-only instruction that stores a 128-bit value in register +{\em rs2$'$} to memory. It computes an effective address by adding the {\em +zero}-extended offset, scaled by 16, to the base address in register {\em +rs1$'$}. +It expands to {\tt sq rs2$'$, offset[8:4](rs1$'$)}. + +C.FSW is an RV32FC-only instruction that stores a single-precision +floating-point value in floating-point register {\em rs2$'$} to memory. It +computes an effective address by adding the {\em zero}-extended offset, scaled +by 4, to the base address in register {\em rs1$'$}. It expands to {\tt fsw +rs2$'$, offset[6:2](rs1$'$)}. + +C.FSD is an RV32DC/RV64DC-only instruction that stores a double-precision +floating-point value in floating-point register {\em rs2$'$} to memory. It +computes an effective address by adding the {\em zero}-extended offset, scaled +by 8, to the base address in register {\em rs1$'$}. It expands to {\tt fsd +rs2$'$, offset[7:3](rs1$'$)}. + +\section{Control Transfer Instructions} + +RVC provides unconditional jump instructions and conditional branch +instructions. As with base RVI instructions, the offsets of all RVC +control transfer instructions are in multiples of 2 bytes.
+ +\begin{center} +\begin{tabular}{S@{}L@{}Y} +\\ +\instbitrange{15}{13} & +\instbitrange{12}{2} & +\instbitrange{1}{0} \\ +\hline +\multicolumn{1}{|c|}{funct3} & +\multicolumn{1}{c|}{imm} & +\multicolumn{1}{c|}{op} \\ +\hline +3 & 11 & 2 \\ +C.J & offset[11$\vert$4$\vert$9:8$\vert$10$\vert$6$\vert$7$\vert$3:1$\vert$5] & C1 \\ +C.JAL & offset[11$\vert$4$\vert$9:8$\vert$10$\vert$6$\vert$7$\vert$3:1$\vert$5] & C1 \\ +\end{tabular} +\end{center} +These instructions use the CJ format. + +C.J performs an unconditional control transfer. The offset is sign-extended and +added to the {\tt pc} to form the jump target address. C.J can therefore target +a $\pm$\wunits{2}{KiB} range. C.J expands to {\tt jal x0, offset[11:1]}. + +C.JAL is an RV32C-only instruction that performs the same operation as C.J, +but additionally writes the address of the instruction following the jump +({\tt pc}+2) to the link register, {\tt x1}. C.JAL expands to {\tt jal x1, +offset[11:1]}. + +\begin{center} +\begin{tabular}{E@{}T@{}T@{}Y} +\\ +\instbitrange{15}{12} & +\instbitrange{11}{7} & +\instbitrange{6}{2} & +\instbitrange{1}{0} \\ +\hline +\multicolumn{1}{|c|}{funct4} & +\multicolumn{1}{c|}{rs1} & +\multicolumn{1}{c|}{rs2} & +\multicolumn{1}{c|}{op} \\ +\hline +4 & 5 & 5 & 2 \\ +C.JR & src$\neq$0 & 0 & C2 \\ +C.JALR & src$\neq$0 & 0 & C2 \\ +\end{tabular} +\end{center} +These instructions use the CR format. + +C.JR (jump register) performs an unconditional control transfer to +the address in register {\em rs1}. C.JR expands to {\tt jalr x0, rs1, 0}. + +C.JALR (jump and link register) performs the same operation as C.JR, +but additionally writes the address of the instruction following the +jump ({\tt pc}+2) to the link register, {\tt x1}. C.JALR expands to +{\tt jalr x1, rs1, 0}. 
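The CJ-format offset used by C.J and C.JAL is the most heavily scrambled immediate in this section, so a small worked decoder may help. The Python sketch below is illustrative only: decode_cj_offset is a hypothetical name, and the bit map is transcribed from the offset[11|4|9:8|10|6|7|3:1|5] field order tabulated above.

```python
def decode_cj_offset(inst):
    """Reassemble the sign-extended byte offset of a CJ-format jump."""
    # (instruction bit, offset bit) pairs, reading inst[12:2] left to
    # right against offset[11|4|9:8|10|6|7|3:1|5].
    bit_map = [(12, 11), (11, 4), (10, 9), (9, 8), (8, 10), (7, 6),
               (6, 7), (5, 3), (4, 2), (3, 1), (2, 5)]
    offset = 0
    for inst_bit, off_bit in bit_map:
        offset |= ((inst >> inst_bit) & 1) << off_bit
    if offset & 0x800:          # offset[11] is the sign bit
        offset -= 0x1000
    return offset
```

Offset bit 0 is never encoded, so every result is a multiple of 2, and the reachable values run from -2048 to +2046 bytes, matching the stated range. For instance, the halfword 0xBFFD decodes to an offset of -2.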
+ +\begin{commentary} +Strictly speaking, C.JALR does not expand exactly to a base RVI +instruction as the value added to the PC to form the link address is 2 +rather than 4 as in the base ISA, but supporting both offsets of 2 and +4 bytes is only a very minor change to the base microarchitecture. +\end{commentary} + +\begin{center} +\begin{tabular}{S@{}S@{}S@{}T@{}Y} +\\ +\instbitrange{15}{13} & +\instbitrange{12}{10} & +\instbitrange{9}{7} & +\instbitrange{6}{2} & +\instbitrange{1}{0} \\ +\hline +\multicolumn{1}{|c|}{funct3} & +\multicolumn{1}{c|}{imm} & +\multicolumn{1}{c|}{rs1$'$} & +\multicolumn{1}{c|}{imm} & +\multicolumn{1}{c|}{op} \\ +\hline +3 & 3 & 3 & 5 & 2 \\ +C.BEQZ & offset[8$\vert$4:3] & src & offset[7:6$\vert$2:1$\vert$5] & C1 \\ +C.BNEZ & offset[8$\vert$4:3] & src & offset[7:6$\vert$2:1$\vert$5] & C1 \\ +\end{tabular} +\end{center} +These instructions use the CB format. + +C.BEQZ performs conditional control transfers. The offset is sign-extended +and added to the {\tt pc} to form the branch target address. It can +therefore target a $\pm$\wunits{256}{B} range. C.BEQZ takes the branch if the +value in register {\em rs1$'$} is zero. It expands to {\tt beq rs1$'$, x0, +offset[8:1]}. + +C.BNEZ is defined analogously, but it takes the branch if {\em rs1$'$} contains +a nonzero value. It expands to {\tt bne rs1$'$, x0, offset[8:1]}. + +\section{Integer Computational Instructions} + +RVC provides several instructions for integer arithmetic and constant generation. + +\subsection*{Integer Constant-Generation Instructions} + +The two constant-generation instructions both use the CI instruction +format and can target any integer register. 
+ +\vspace{-0.4in} +\begin{center} +\begin{tabular}{S@{}W@{}T@{}T@{}Y} +\\ +\instbitrange{15}{13} & +\multicolumn{1}{c}{\instbit{12}} & +\instbitrange{11}{7} & +\instbitrange{6}{2} & +\instbitrange{1}{0} \\ +\hline +\multicolumn{1}{|c|}{funct3} & +\multicolumn{1}{c|}{imm[5]} & +\multicolumn{1}{c|}{rd} & +\multicolumn{1}{c|}{imm[4:0]} & +\multicolumn{1}{c|}{op} \\ +\hline +3 & 1 & 5 & 5 & 2 \\ +C.LI & imm[5] & dest$\neq$0 & imm[4:0] & C1 \\ +C.LUI & nzimm[17] & $\textrm{dest}{\neq}{\left\{0,2\right\}}$ & nzimm[16:12] & C1 \\ +\end{tabular} +\end{center} +C.LI loads the sign-extended 6-bit immediate, {\em imm}, into +register {\em rd}. C.LI is only valid when {\em rd}$\neq${\tt x0}. +C.LI expands into {\tt addi rd, x0, imm[5:0]}. + +C.LUI loads the non-zero 6-bit immediate field into bits 17--12 of the +destination register, clears the bottom 12 bits, and sign-extends bit +17 into all higher bits of the destination. C.LUI is only valid when +$\textit{rd}{\neq}{\left\{\texttt{x0},\texttt{x2}\right\}}$, +and when the immediate is not equal to zero. +C.LUI expands into {\tt lui rd, nzimm[17:12]}. + +\subsection*{Integer Register-Immediate Operations} + +These integer register-immediate operations are encoded in the CI +format and perform operations on any non-{\tt x0} integer register and +a 6-bit immediate. The immediate cannot be zero. 
+ +\vspace{-0.4in} +\begin{center} +\begin{tabular}{S@{}W@{}T@{}T@{}Y} +\\ +\instbitrange{15}{13} & +\multicolumn{1}{c}{\instbit{12}} & +\instbitrange{11}{7} & +\instbitrange{6}{2} & +\instbitrange{1}{0} \\ +\hline +\multicolumn{1}{|c|}{funct3} & +\multicolumn{1}{c|}{imm[5]} & +\multicolumn{1}{c|}{rd/rs1} & +\multicolumn{1}{c|}{imm[4:0]} & +\multicolumn{1}{c|}{op} \\ +\hline +3 & 1 & 5 & 5 & 2 \\ +C.ADDI & nzimm[5] & dest & nzimm[4:0] & C1 \\ +C.ADDIW & imm[5] & dest$\neq$0 & imm[4:0] & C1 \\ +C.ADDI16SP & nzimm[9] & 2 & nzimm[4$\vert$6$\vert$8:7$\vert$5] & C1 \\ +\end{tabular} +\end{center} + +C.ADDI adds the non-zero sign-extended 6-bit immediate to the value in +register {\em rd} then writes the result to {\em rd}. C.ADDI expands +into {\tt addi rd, rd, nzimm[5:0]}. + +C.ADDIW is an RV64C/RV128C-only instruction that performs the same +computation but produces a 32-bit result, then sign-extends the result to +64 bits. C.ADDIW expands into {\tt addiw rd, rd, imm[5:0]}. The +immediate can be zero for C.ADDIW, where this corresponds to {\tt +sext.w rd}. + +C.ADDI16SP shares the opcode with C.LUI, but has a destination field +of {\tt x2}. C.ADDI16SP adds the non-zero sign-extended 6-bit immediate to +the value in the stack pointer ({\tt sp}={\tt x2}), where the +immediate is scaled to represent non-zero multiples of 16 in the range +$-$512 to 496, inclusive. C.ADDI16SP is used to adjust the stack pointer in procedure +prologues and epilogues. It expands into {\tt addi x2, x2, nzimm[9:4]}. + +\begin{commentary} +In the standard RISC-V calling convention, the stack pointer {\tt sp} +is always 16-byte aligned. 
+\end{commentary} + +\begin{center} +\begin{tabular}{@{}S@{}K@{}S@{}Y} +\\ +\instbitrange{15}{13} & +\instbitrange{12}{5} & +\instbitrange{4}{2} & +\instbitrange{1}{0} \\ +\hline +\multicolumn{1}{|c|}{funct3} & +\multicolumn{1}{c|}{imm} & +\multicolumn{1}{c|}{rd$'$} & +\multicolumn{1}{c|}{op} \\ +\hline +3 & 8 & 3 & 2 \\ +C.ADDI4SPN & zimm[5:4$\vert$9:6$\vert$2$\vert$3] & dest & C0 \\ +\end{tabular} +\end{center} + +C.ADDI4SPN is a CIW-format RV32C/RV64C-only instruction that adds a +{\em zero}-extended non-zero immediate, scaled by 4, to the stack pointer, +{\tt x2}, and writes the result to {\tt rd$'$}. This instruction is used +to generate pointers to stack-allocated variables, and expands to +{\tt addi rd$'$, x2, zimm[9:2]}. + + +\vspace{-0.4in} +\begin{center} +\begin{tabular}{S@{}W@{}T@{}T@{}Y} +\\ +\instbitrange{15}{13} & +\multicolumn{1}{c}{\instbit{12}} & +\instbitrange{11}{7} & +\instbitrange{6}{2} & +\instbitrange{1}{0} \\ +\hline +\multicolumn{1}{|c|}{funct3} & +\multicolumn{1}{c|}{shamt[5]} & +\multicolumn{1}{c|}{rd/rs1} & +\multicolumn{1}{c|}{shamt[4:0]} & +\multicolumn{1}{c|}{op} \\ +\hline +3 & 1 & 5 & 5 & 2 \\ +C.SLLI & shamt[5] & dest$\neq$0 & shamt[4:0] & C2 \\ +\end{tabular} +\end{center} + +C.SLLI is a CI-format instruction that performs a logical left shift +of the value in register {\em rd} then writes the result to {\em rd}. +The shift amount is encoded in the {\em shamt} field, where {\em + shamt[5]} must be zero for RV32C. For RV32C and RV64C, the shift +amount must be non-zero. For RV128C, a shift amount of zero is used +to encode a shift of 64. C.SLLI expands into {\tt slli rd, rd, + shamt[5:0]}, except for RV128C with {\tt shamt=0}, which expands to +{\tt slli rd, rd, 64}. 
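The C.SLLI rules above (CI-format fields, the per-XLEN restrictions on the shift amount, and the RV128C special case) can be sketched in a few lines of Python. This is only an illustrative sketch: the function name {\tt expand\_c\_slli} and the {\tt xlen} parameter are hypothetical, and the funct3 and op fields are assumed to have already identified the instruction as C.SLLI.

```python
def expand_c_slli(inst: int, xlen: int) -> str:
    """Expand a 16-bit C.SLLI encoding into the text of the base `slli`.

    Hypothetical helper: only the CI-format rd and shamt fields are decoded;
    funct3 and op are assumed to have been checked by the caller.
    """
    if xlen not in (32, 64, 128):
        raise ValueError("xlen must be 32, 64, or 128")

    rd = (inst >> 7) & 0x1F                 # inst[11:7] holds rd/rs1
    shamt_hi = (inst >> 12) & 0x1           # inst[12]   holds shamt[5]
    shamt_lo = (inst >> 2) & 0x1F           # inst[6:2]  holds shamt[4:0]
    shamt = (shamt_hi << 5) | shamt_lo

    if rd == 0:
        raise ValueError("C.SLLI requires rd != x0")
    if xlen == 32 and shamt_hi:
        raise ValueError("shamt[5] must be zero for RV32C")
    if shamt == 0:
        if xlen != 128:
            raise ValueError("shift amount must be non-zero for RV32C and RV64C")
        shamt = 64                          # RV128C: zero encodes a shift of 64

    return f"slli x{rd}, x{rd}, {shamt}"


if __name__ == "__main__":
    print(expand_c_slli(0x050A, 64))    # rd = x10, shamt field = 2
    print(expand_c_slli(0x0502, 128))   # rd = x10, shamt field = 0 on RV128C
```

Raising an error for the invalid cases is a choice made for this sketch; a real decoder would follow whatever the opcode map specifies for those encodings.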
+ +\vspace{-0.4in} +\begin{center} +\begin{tabular}{S@{}W@{}Y@{}S@{}T@{}Y} +\\ +\instbitrange{15}{13} & +\multicolumn{1}{c}{\instbit{12}} & +\instbitrange{11}{10} & +\instbitrange{9}{7} & +\instbitrange{6}{2} & +\instbitrange{1}{0} \\ +\hline +\multicolumn{1}{|c|}{funct3} & +\multicolumn{1}{c|}{shamt[5]} & +\multicolumn{1}{|c|}{funct2} & +\multicolumn{1}{c|}{rd$'$/rs1$'$} & +\multicolumn{1}{c|}{shamt[4:0]} & +\multicolumn{1}{c|}{op} \\ +\hline +3 & 1 & 2 & 3 & 5 & 2 \\ +C.SRLI & shamt[5] & C.SRLI & dest & shamt[4:0] & C1 \\ +C.SRAI & shamt[5] & C.SRAI & dest & shamt[4:0] & C1 \\ +\end{tabular} +\end{center} + +C.SRLI is a CB-format instruction that performs a logical right shift +of the value in register {\em rd$'$} then writes the result to {\em rd$'$}. +The shift amount is encoded in the {\em shamt} field, where {\em + shamt[5]} must be zero for RV32C. For RV32C and RV64C, the shift +amount must be non-zero. For RV128C, a shift amount of zero is used +to encode a shift of 64. Furthermore, the shift amount is sign-extended +for RV128C, and so the legal shift amounts are 1--31, 64, and 96--127. +C.SRLI expands into {\tt srli rd$'$, rd$'$, shamt[5:0]}, +except for RV128C with {\tt shamt=0}, which expands to +{\tt srli rd$'$, rd$'$, 64}. + +C.SRAI is defined analogously to C.SRLI, but instead performs an arithmetic +right shift. +C.SRAI expands to {\tt srai rd$'$, rd$'$, shamt[5:0]}. + +\begin{commentary} +Left shifts are usually more frequent than right shifts, as left +shifts are frequently used to scale address values. Right shifts have +therefore been granted less encoding space and are placed in an +encoding quadrant where all other immediates are sign-extended. For +RV128, the decision was made to have the 6-bit shift-amount immediate +also be sign-extended. Apart from reducing the decode complexity, we +believe right-shift amounts of 96--127 will be more useful than 64--95, +to allow extraction of tags located in the high portions of 128-bit +address pointers. 
We note that RV128C will not be frozen at the same +point as RV32C and RV64C, to allow evaluation of typical usage of +128-bit address-space codes. +\end{commentary} + +\begin{center} +\begin{tabular}{S@{}W@{}Y@{}S@{}T@{}Y} +\\ +\instbitrange{15}{13} & +\multicolumn{1}{c}{\instbit{12}} & +\instbitrange{11}{10} & +\instbitrange{9}{7} & +\instbitrange{6}{2} & +\instbitrange{1}{0} \\ +\hline +\multicolumn{1}{|c|}{funct3} & +\multicolumn{1}{c|}{imm[5]} & +\multicolumn{1}{|c|}{funct2} & +\multicolumn{1}{c|}{rd$'$/rs1$'$} & +\multicolumn{1}{c|}{imm[4:0]} & +\multicolumn{1}{c|}{op} \\ +\hline +3 & 1 & 2 & 3 & 5 & 2 \\ +C.ANDI & imm[5] & C.ANDI & dest & imm[4:0] & C1 \\ +\end{tabular} +\end{center} + +C.ANDI is a CB-format instruction that computes the bitwise AND +of the value in register {\em rd$'$} and the sign-extended 6-bit immediate, +then writes the result to {\em rd$'$}. +C.ANDI expands to {\tt andi rd$'$, rd$'$, imm[5:0]}. + +\subsection*{Integer Register-Register Operations} +\vspace{-0.4in} +\begin{center} +\begin{tabular}{E@{}T@{}T@{}Y} +\\ +\instbitrange{15}{12} & +\instbitrange{11}{7} & +\instbitrange{6}{2} & +\instbitrange{1}{0} \\ +\hline +\multicolumn{1}{|c|}{funct4} & +\multicolumn{1}{c|}{rd/rs1} & +\multicolumn{1}{c|}{rs2} & +\multicolumn{1}{c|}{op} \\ +\hline +4 & 5 & 5 & 2 \\ +C.MV & dest$\neq$0 & src$\neq$0 & C2 \\ +C.ADD & dest$\neq$0 & src$\neq$0 & C2 \\ +\end{tabular} +\end{center} +These instructions use the CR format. + +C.MV copies the value in register {\em rs2} into register {\em rd}. C.MV +expands into {\tt add rd, x0, rs2}. + +C.ADD adds the values in registers {\em rd} and {\em rs2} and writes the +result to register {\em rd}. C.ADD expands into {\tt add rd, rd, rs2}. 
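As a rough illustration of the two CR-format expansions above, the following Python sketch maps C.MV and C.ADD onto the operands of the base {\tt add} and runs them on a toy register file. The names {\tt expand\_cr} and {\tt execute\_add}, and the choice of a 64-bit register width, are assumptions made for this example only; operands are taken as already-decoded register numbers.

```python
XLEN = 64  # assumed register width for this sketch


def expand_cr(mnemonic: str, rd: int, rs2: int) -> tuple:
    """Return the (rd, rs1, rs2) operands of the base `add` that a
    CR-format C.MV or C.ADD expands to (hypothetical helper)."""
    if not (0 < rd < 32 and 0 < rs2 < 32):
        # Both table rows require dest != 0 and src != 0.
        raise ValueError("C.MV and C.ADD require rd != x0 and rs2 != x0")
    if mnemonic == "c.mv":
        return (rd, 0, rs2)     # add rd, x0, rs2
    if mnemonic == "c.add":
        return (rd, rd, rs2)    # add rd, rd, rs2
    raise ValueError(f"unsupported mnemonic: {mnemonic}")


def execute_add(regs: list, rd: int, rs1: int, rs2: int) -> None:
    """Execute the base `add` on a toy register file in which x0 reads as zero."""
    result = (regs[rs1] + regs[rs2]) & ((1 << XLEN) - 1)
    if rd != 0:                 # writes to x0 are discarded
        regs[rd] = result


regs = [0] * 32
regs[11] = 7
execute_add(regs, *expand_cr("c.mv", 10, 11))    # x10 <- x11
execute_add(regs, *expand_cr("c.add", 10, 11))   # x10 <- x10 + x11
print(regs[10])
```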
+ +\vspace{-0.4in} +\begin{center} +\begin{tabular}{M@{}S@{}Y@{}S@{}Y} +\\ +\instbitrange{15}{10} & +\instbitrange{9}{7} & +\instbitrange{6}{5} & +\instbitrange{4}{2} & +\instbitrange{1}{0} \\ +\hline +\multicolumn{1}{|c|}{funct6} & +\multicolumn{1}{c|}{rd$'$/rs1$'$} & +\multicolumn{1}{c|}{funct} & +\multicolumn{1}{c|}{rs2$'$} & +\multicolumn{1}{c|}{op} \\ +\hline +6 & 3 & 2 & 3 & 2 \\ +C.AND & dest & C.AND & src & C1 \\ +C.OR & dest & C.OR & src & C1 \\ +C.XOR & dest & C.XOR & src & C1 \\ +C.SUB & dest & C.SUB & src & C1 \\ +C.ADDW & dest & C.ADDW & src & C1 \\ +C.SUBW & dest & C.SUBW & src & C1 \\ +\end{tabular} +\end{center} + +These instructions use the CS format. + +C.AND computes the bitwise AND of the values in registers {\em rd$'$} +and {\em rs2$'$}, then writes the result to register {\em rd$'$}. +C.AND expands into {\tt and rd$'$, rd$'$, rs2$'$}. + +C.OR computes the bitwise OR of the values in registers {\em rd$'$} +and {\em rs2$'$}, then writes the result to register {\em rd$'$}. +C.OR expands into {\tt or rd$'$, rd$'$, rs2$'$}. + +C.XOR computes the bitwise XOR of the values in registers {\em rd$'$} +and {\em rs2$'$}, then writes the result to register {\em rd$'$}. +C.XOR expands into {\tt xor rd$'$, rd$'$, rs2$'$}. + +C.SUB subtracts the value in register {\em rs2$'$} from the value in +register {\em rd$'$}, then writes the result to register {\em rd$'$}. +C.SUB expands into {\tt sub rd$'$, rd$'$, rs2$'$}. + +C.ADDW is an RV64C/RV128C-only instruction that adds the values in +registers {\em rd$'$} and {\em rs2$'$}, then sign-extends the lower +32 bits of the sum before writing the result to register {\em rd$'$}. +C.ADDW expands into {\tt addw rd$'$, rd$'$, rs2$'$}. + +C.SUBW is an RV64C/RV128C-only instruction that subtracts the value in +register {\em rs2$'$} from the value in register {\em rd$'$}, then +sign-extends the lower 32 bits of the difference before writing the result +to register {\em rd$'$}. 
C.SUBW expands into {\tt subw rd$'$, rd$'$, rs2$'$}. + +\begin{commentary} +This group of six instructions do not provide large savings +individually, but do not occupy much encoding space and are +straightforward to implement, and as a group provide a worthwhile +improvement in static and dynamic compression. +\end{commentary} + +\subsection*{Defined Illegal Instruction} +\vspace{-0.4in} +\begin{center} +\begin{tabular}{SW@{}T@{}T@{}Y} +\\ +\instbitrange{15}{13} & +\multicolumn{1}{c}{\instbit{12}} & +\instbitrange{11}{7} & +\instbitrange{6}{2} & +\instbitrange{1}{0} \\ +\hline +\multicolumn{1}{|c|}{0} & +\multicolumn{1}{c|}{0} & +\multicolumn{1}{c|}{0} & +\multicolumn{1}{c|}{0} & +\multicolumn{1}{c|}{0} \\ +\hline +3 & 1 & 5 & 5 & 2 \\ +0 & 0 & 0 & 0 & 0 \\ +\end{tabular} +\end{center} + +A 16-bit instruction with all bits zero is permanently reserved as an +illegal instruction. +\begin{commentary} +We reserve all-zero instructions to be illegal instructions to help +trap attempts to execute zero-ed or non-existent portions of the +memory space. The all-zero value should not be redefined in any +non-standard extension. Similarly, we reserve instructions with all +bits set to 1 (corresponding to very long instructions in the RISC-V +variable-length encoding scheme) as illegal to capture another common +value seen in non-existent memory regions. +\end{commentary} + +\subsection*{NOP Instruction} +\vspace{-0.4in} +\begin{center} +\begin{tabular}{SW@{}T@{}T@{}Y} +\\ +\instbitrange{15}{13} & +\multicolumn{1}{c}{\instbit{12}} & +\instbitrange{11}{7} & +\instbitrange{6}{2} & +\instbitrange{1}{0} \\ +\hline +\multicolumn{1}{|c|}{funct3} & +\multicolumn{1}{c|}{imm[5]} & +\multicolumn{1}{c|}{rd/rs1} & +\multicolumn{1}{c|}{imm[4:0]} & +\multicolumn{1}{c|}{op} \\ +\hline +3 & 1 & 5 & 5 & 2 \\ +C.NOP & 0 & 0 & 0 & C1 \\ +\end{tabular} +\end{center} + +C.NOP is a CI-format instruction that does not change any user-visible state, +except for advancing the {\tt pc}. 
C.NOP is encoded as {\tt c.addi x0, 0} and +so expands to {\tt addi x0, x0, 0}. + +\subsection*{Breakpoint Instruction} +\vspace{-0.4in} +\begin{center} +\begin{tabular}{E@{}U@{}Y} +\\ +\instbitrange{15}{12} & +\instbitrange{11}{2} & +\instbitrange{1}{0} \\ +\hline +\multicolumn{1}{|c|}{funct4} & +\multicolumn{1}{c|}{0} & +\multicolumn{1}{c|}{op} \\ +\hline +4 & 10 & 2 \\ +C.EBREAK & 0 & C2 \\ +\end{tabular} +\end{center} + +Debuggers can use the C.EBREAK instruction, which expands to {\tt ebreak}, +to cause control to be transferred back to the debugging environment. +C.EBREAK shares the opcode with the C.ADD instruction, but with {\em + rd} and {\em rs2} both zero, and thus can also use the CR format. + +\section{Usage of C Instructions in LR/SC Sequences} + +On implementations that support the C extension, compressed forms of +the I instructions permitted inside LR/SC sequences can be used while +retaining the guarantee of eventual success, as described in +Section~\ref{lrscseq}. + +\begin{commentary} +The implication is that any implementation that claims to support both +the A and C extensions must ensure that LR/SC sequences containing +valid C instructions will eventually complete. +\end{commentary} + +\clearpage + +\section{RVC Instruction Set Listings} + +Table~\ref{rvcopcodemap} shows a map of the major opcodes for RVC. +Opcodes with both of the lower two bits set correspond to instructions wider +than 16 bits, including those in the base ISAs. Several instructions +are only valid for certain operands; when invalid, they are marked +either {\em RES} to indicate that the opcode is reserved for future +standard extensions; {\em NSE} to indicate that the opcode is reserved +for non-standard extensions; or {\em HINT} to indicate that the opcode +is reserved for future standard microarchitectural hints. +Instructions marked {\em HINT} must execute as no-ops on +implementations for which the hint has no effect. 
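The rule that both low opcode bits being set marks an instruction wider than 16 bits, together with the all-zero defined illegal instruction, can be sketched as a small classifier. The function name {\tt classify\_parcel} and the returned strings are hypothetical; the quadrant names C0 to C2 are taken to correspond to op values 0 to 2, matching the op column used in the tables of this chapter.

```python
def classify_parcel(parcel: int) -> str:
    """Classify the first 16-bit parcel of an instruction (illustrative only)."""
    parcel &= 0xFFFF
    if parcel == 0:
        # All bits zero: permanently reserved as an illegal instruction.
        return "illegal"
    op = parcel & 0b11              # inst[1:0]
    if op == 0b11:
        # Both low bits set: not an RVC encoding.
        return "wider than 16 bits"
    return f"RVC quadrant C{op}"


if __name__ == "__main__":
    for parcel in (0x0000, 0x0001, 0x0013):
        print(f"{parcel:#06x}: {classify_parcel(parcel)}")
```

Here {\tt 0x0001} is the C.NOP encoding ({\tt c.addi x0, 0}) and {\tt 0x0013} is the low half of the 32-bit {\tt addi x0, x0, 0}.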
+ +\begin{commentary} +The HINT instructions are designed to support future addition of +microarchitectural hints that might affect performance but cannot +affect architectural state. The HINT encodings have been chosen so +that simple implementations can ignore the HINT encoding and execute +the HINT as a regular operation that does not change architectural +state. For example, C.ADD is a HINT if the destination register is +{\tt x0}, where the five-bit rs2 field encodes details of the HINT. +However, a simple implementation can execute the HINT as an add +to register {\tt x0}, which will have no effect. +\end{commentary} + +\input{rvc-opcode-map} + +Tables~\ref{rvc-instr-table0}--\ref{rvc-instr-table2} list the RVC instructions. +\input{rvc-instr-table}