1 files changed, 918 insertions, 0 deletions
diff --git a/src/c-st-ext.adoc b/src/c-st-ext.adoc
new file mode 100644
index 0000000..702ac9f
--- /dev/null
+++ b/src/c-st-ext.adoc
@@ -0,0 +1,918 @@
+[[compressed]]
+== "C" Extension for Compressed Instructions, Version 2.0
+
+This chapter describes the RISC-V standard compressed instruction-set
+extension, named "C", which reduces static and dynamic code size by
+adding short 16-bit instruction encodings for common operations. The C
+extension can be added to any of the base ISAs (RV32I, RV32E, RV64I, RV64E), and
+we use the generic term "RVC" to cover any of these. Typically,
+50%-60% of the RISC-V instructions in a program can be replaced with RVC
+instructions, resulting in a 25%-30% code-size reduction.
+
+=== Overview
+
+RVC uses a simple compression scheme that offers shorter 16-bit versions
+of common 32-bit RISC-V instructions when:
+
+* the immediate or address offset is small, or
+* one of the registers is the zero register (`x0`), the ABI link register
+(`x1`), or the ABI stack pointer (`x2`), or
+* the destination register and the first source register are identical, or
+* the registers used are the 8 most popular ones.
+
+The C extension is compatible with all other standard instruction
+extensions. The C extension allows 16-bit instructions to be freely
+intermixed with 32-bit instructions, with the latter now able to start
+on any 16-bit boundary, i.e., IALIGN=16. With the addition of the C
+extension, no instructions can raise instruction-address-misaligned
+exceptions.
+
+[NOTE]
+====
+Removing the 32-bit alignment constraint on the original 32-bit
+instructions allows significantly greater code density.
+====
+
+The compressed instruction encodings are mostly common across RV32C and
+RV64C, but as shown in <<rvc-instr-table0>>, a few opcodes are used for
+different purposes depending on base ISA. For example, the wider
+address-space RV64C variant requires additional opcodes to
+compress loads and stores of 64-bit integer values, while RV32C uses the
+same opcodes to compress loads and stores of single-precision
+floating-point values.
+If the C extension is implemented, the
+appropriate compressed floating-point load and store instructions must
+be provided whenever the relevant standard floating-point extension (F
+and/or D) is also implemented. In addition, RV32C includes a compressed
+jump and link instruction to compress short-range subroutine calls,
+where the same opcode is used to compress ADDIW for RV64C.
+
+[NOTE]
+====
+Double-precision loads and stores are a significant fraction of static
+and dynamic instructions, hence the motivation to include them in the
+RV32C and RV64C encoding.
+
+Although single-precision loads and stores are not a significant source
+of static or dynamic compression for benchmarks compiled for the
+currently supported ABIs, for microcontrollers that only provide
+hardware single-precision floating-point units and have an ABI that only
+supports single-precision floating-point numbers, the single-precision
+loads and stores will be used at least as frequently as double-precision
+loads and stores in the measured benchmarks. Hence, the motivation to
+provide compressed support for these in RV32C.
+
+Short-range subroutine calls are more likely in small binaries for
+microcontrollers, hence the motivation to include these in RV32C.
+
+Although reusing opcodes for different purposes for different base ISAs
+adds some complexity to documentation, the impact on implementation
+complexity is small even for designs that support multiple base ISAs.
+The compressed floating-point load and store variants use the same
+instruction format with the same register specifiers as the wider
+integer loads and stores.
+====
+
+RVC was designed under the constraint that each RVC instruction expands
+into a single 32-bit instruction in either the base ISA (RV32I/E or RV64I/E)
+or the F and D standard extensions where present. Adopting
+this constraint has two main benefits:
+
+* Hardware designs can simply expand RVC instructions during decode,
+simplifying verification and minimizing modifications to existing
+microarchitectures.
+
+* Compilers can be unaware of the RVC extension and leave code compression
+to the assembler and linker, although a compression-aware compiler will
+generally be able to produce better results.
+
+[NOTE]
+====
+We felt the multiple complexity reductions of a simple one-one mapping
+between C and base IFD instructions far outweighed the potential gains
+of a slightly denser encoding that added additional instructions only
+supported in the C extension, or that allowed encoding of multiple IFD
+instructions in one C instruction.
+====
+
+It is important to note that the C extension is not designed to be a
+stand-alone ISA, and is meant to be used alongside a base ISA.
+
+[NOTE]
+====
+Variable-length instruction sets have long been used to improve code
+density. For example, the IBM Stretch cite:[stretch], developed in the late 1950s, had
+an ISA with 32-bit and 64-bit instructions, where some of the 32-bit
+instructions were compressed versions of the full 64-bit instructions.
+Stretch also employed the concept of limiting the set of registers that
+were addressable in some of the shorter instruction formats, with short
+branch instructions that could only refer to one of the index registers.
+The later IBM 360 architecture cite:[ibm360] supported a simple variable-length
+instruction encoding with 16-bit, 32-bit, or 48-bit instruction formats.
+
+In 1963, CDC introduced the Cray-designed CDC 6600 cite:[cdc6600], a precursor to RISC
+architectures, that introduced a register-rich load-store architecture
+with instructions of two lengths, 15-bits and 30-bits. The later Cray-1
+design used a very similar instruction format, with 16-bit and 32-bit
+instruction lengths.
+
+The initial RISC ISAs from the 1980s all picked performance over code
+size, which was reasonable for a workstation environment, but not for
+embedded systems. Hence, both ARM and MIPS subsequently made versions of
+the ISAs that offered smaller code size by offering an alternative
+16-bit wide instruction set instead of the standard 32-bit wide
+instructions. The compressed RISC ISAs reduced code size relative to
+their starting points by about 25-30%, yielding code that was
+significantly smaller than 80x86. This result surprised some, as their
+intuition was that the variable-length CISC ISA should be smaller than
+RISC ISAs that offered only 16-bit and 32-bit formats.
+
+Since the original RISC ISAs did not leave sufficient opcode space free
+to include these unplanned compressed instructions, they were instead
+developed as complete new ISAs. This meant compilers needed different
+code generators for the separate compressed ISAs. The first compressed
+RISC ISA extensions (e.g., ARM Thumb and MIPS16) used only a fixed
+16-bit instruction size, which gave good reductions in static code size
+but caused an increase in dynamic instruction count, which led to lower
+performance compared to the original fixed-width 32-bit instruction
+size. This led to the development of a second generation of compressed
+RISC ISA designs with mixed 16-bit and 32-bit instruction lengths (e.g.,
+ARM Thumb2, microMIPS, PowerPC VLE), so that performance was similar to
+pure 32-bit instructions but with significant code size savings.
+Unfortunately, these different generations of compressed ISAs are
+incompatible with each other and with the original uncompressed ISA,
+leading to significant complexity in documentation, implementations, and
+software tools support.
+
+Of the commonly used 64-bit ISAs, only PowerPC and microMIPS currently
+supports a compressed instruction format. It is surprising that the most
+popular 64-bit ISA for mobile platforms (ARM v8) does not include a
+compressed instruction format given that static code size and dynamic
+instruction fetch bandwidth are important metrics. Although static code
+size is not a major concern in larger systems, instruction fetch
+bandwidth can be a major bottleneck in servers running commercial
+workloads, which often have a large instruction working set.
+
+Benefiting from 25 years of hindsight, RISC-V was designed to support
+compressed instructions from the outset, leaving enough opcode space for
+RVC to be added as a simple extension on top of the base ISA (along with
+many other extensions). The philosophy of RVC is to reduce code size for
+embedded applications _and_ to improve performance and energy-efficiency
+for all applications due to fewer misses in the instruction cache.
+Waterman shows that RVC fetches 25%-30% fewer instruction bits, which
+reduces instruction cache misses by 20%-25%, or roughly the same
+performance impact as doubling the instruction cache size. cite:[waterman-ms]
+====
+
+=== Compressed Instruction Formats
+((((compressed, formats))))
+
+<<rvc-form>> shows the nine compressed instruction
+formats. CR, CI, and CSS can use any of the 32 RVI registers, but CIW,
+CL, CS, CA, and CB are limited to just 8 of them.
+<<registers>> lists these popular registers, which
+correspond to registers `x8` to `x15`. Note that there is a separate
+version of load and store instructions that use the stack pointer as the
+base address register, since saving to and restoring from the stack are
+so prevalent, and that they use the CI and CSS formats to allow access
+to all 32 data registers. CIW supplies an 8-bit immediate for the
+ADDI4SPN instruction.
+
+[NOTE]
+====
+The RISC-V ABI was changed to make the frequently used registers map to
+registers 'x8-x15'. This simplifies the decompression decoder by
+having a contiguous naturally aligned set of register numbers, and is
+also compatible with the RV32E and RV64E base ISAs, which only have 16 integer
+registers.
+====
+Compressed register-based floating-point loads and stores also use the
+CL and CS formats respectively, with the eight registers mapping to `f8` to `f15`.
+((((calling convention, standard))))
+[NOTE]
+====
+_The standard RISC-V calling convention maps the most frequently used
+floating-point registers to registers `f8` to `f15`, which allows the
+same register decompression decoding as for integer register numbers._
+====
+((((register source specifiers, c-ext))))
+The formats were designed to keep bits for the two register source
+specifiers in the same place in all instructions, while the destination
+register field can move. When the full 5-bit destination register
+specifier is present, it is in the same place as in the 32-bit RISC-V
+encoding. Where immediates are sign-extended, the sign extension is
+always from bit 12. Immediate fields have been scrambled, as in the base
+specification, to reduce the number of immediate multiplexers required.
+[NOTE]
+====
+The immediate fields are scrambled in the instruction formats instead of
+in sequential order so that as many bits as possible are in the same
+position in every instruction, thereby simplifying implementations.
+====
+
+For many RVC instructions, zero-valued immediates are disallowed and
+`x0` is not a valid 5-bit register specifier. These restrictions free up
+encoding space for other instructions requiring fewer operand bits.
+
+//[[cr-register]]
+//include::images/wavedrom/cr-register.edn[]
+//.Compressed 16-bit RVC instructions
+//(((compressed, 16-bit)))
+
+[[rvc-form]]
+.Compressed 16-bit RVC instruction formats
+//[%header]
+[float="center",align="center",cols="1a, 2a",frame="none",grid="none"]
+|===
+|
+[%autowidth,float="right",align="right",cols="^,^",frame="none",grid="none",options="noheader"]
+!===
+!Format ! Meaning
+!CR ! Register
+!CI ! Immediate
+!CSS ! Stack-relative Store
+!CIW ! Wide Immediate
+!CL ! Load
+!CS ! Store
+!CA ! Arithmetic
+!CB ! Branch/Arithmetic
+!CJ ! Jump
+!===
+|
+[float="left",align="left",cols="1,1,1,1,1,1,1",options="noheader"]
+!===
+^!15 14 13 ^!12 ^!11 10 ^!9 8 7 ^!6 5 ^!4 3 2 ^!1 0
+2+^!funct4 2+^!rd/rs1 2+^!rs2 ^!  op
+^!funct3 ^!imm 2+^!rd/rs1  2+^!imm ^!  op
+^!funct3 3+^!imm  2+^!rs2 ^!  op
+^!funct3 4+^!imm ^!rd{prime} ^! op
+^!funct3 2+^!imm ^!rs1{prime} ^!imm ^!rd{prime} ^! op
+^!funct3 2+^!imm ^!rs1{prime} ^! imm ^!rs2{prime} ^! op
+3+^!funct6 ^!rd{prime}/rs1{prime} ^!funct2 ^!rs2{prime} ^! op
+^!funct3 2+^!offset ^!rd{prime}/rs1{prime} 2+^!offset ^! op
+^!funct3 5+^!jump target ^! op
+!===
+|===
+
+[[registers]]
+.Registers specified by the three-bit _rs1_{prime}, _rs2_{prime}, and _rd_{prime} fields of the CIW, CL, CS, CA, and CB formats.
+//[cols="20%,10%,10%,10%,10%,10%,10%,10%,10%"]
+[float="center",align="center",cols="1a, 1a",frame="none",grid="none"]
+|===
+|
+[%autowidth,cols="<",frame="none",grid="none",options="noheader"]
+!===
+!RVC Register Number
+!Integer Register Number
+!Integer Register ABI Name
+!Floating-Point Register Number
+!Floating-Point Register ABI Name
+!===
+|
+
+[%autowidth,cols="^,^,^,^,^,^,^,^",options="noheader"]
+!===
+!`000` !`001` !`010` !`011` !`100` !`101` !`110` !`111`
+!`x8` !`x9` !`x10` !`x11` !`x12` !`x13` !`x14`!`x15`
+!`s0` !`s1` !`a0` !`a1` !`a2` !`a3` !`a4`!`a5`
+!`f8` !`f9` !`f10` !`f11` !`f12` !`f13`!`f14` !`f15`
+!`fs0` !`fs1` !`fa0` !`fa1` !`fa2`!`fa3` !`fa4` !`fa5`
+!===
+|===
+
+
+=== Load and Store Instructions
+
+To increase the reach of 16-bit instructions, data-transfer instructions
+use zero-extended immediates that are scaled by the size of the data in
+bytes: ×4 for words, ×8 for double
+words, and ×16 for quad words.
+
+RVC provides two variants of loads and stores. One uses the ABI stack
+pointer, `x2`, as the base address and can target any data register. The
+other can reference one of 8 base address registers and one of 8 data
+registers.
+
+==== Stack-Pointer-Based Loads and Stores
+
+include::images/wavedrom/c-sp-load-store.edn[]
+[[c-sp-load-store]]
+//.Stack-Pointer-Based Loads and Stores--these instructions use the CI format.
+
+These instructions use the CI format.
+
+C.LWSP loads a 32-bit value from memory into register _rd_. It computes
+an effective address by adding the _zero_-extended offset, scaled by 4,
+to the stack pointer, `x2`. It expands to `lw rd, offset(x2)`. C.LWSP is
+valid only when _rd_≠`x0`; the code points with _rd_=`x0` are reserved.
+
+C.LDSP is an RV64C-only instruction that loads a 64-bit value
+from memory into register _rd_. It computes its effective address by
+adding the zero-extended offset, scaled by 8, to the stack pointer,
+`x2`. It expands to `ld rd, offset(x2)`. C.LDSP is valid only when
+_rd_≠`x0`; the code points with
+_rd_=`x0` are reserved.
+
+C.FLWSP is an RV32FC-only instruction that loads a single-precision
+floating-point value from memory into floating-point register _rd_. It
+computes its effective address by adding the _zero_-extended offset,
+scaled by 4, to the stack pointer, `x2`. It expands to
+`flw rd, offset(x2)`.
+
+C.FLDSP is an RV32DC/RV64DC-only instruction that loads a
+double-precision floating-point value from memory into floating-point
+register _rd_. It computes its effective address by adding the
+_zero_-extended offset, scaled by 8, to the stack pointer, `x2`. It
+expands to `fld rd, offset(x2)`.
+
+include::images/wavedrom/c-sp-load-store-css.edn[]
+[[c-sp-load-store-css]]
+//.Stack-Pointer-Based Loads and Stores--these instructions use the CSS format.
+
+These instructions use the CSS format.
+
+C.SWSP stores a 32-bit value in register _rs2_ to memory. It computes an
+effective address by adding the _zero_-extended offset, scaled by 4, to
+the stack pointer, `x2`. It expands to `sw rs2, offset(x2)`.
+
+C.SDSP is an RV64C-only instruction that stores a 64-bit value in
+register _rs2_ to memory. It computes an effective address by adding the
+_zero_-extended offset, scaled by 8, to the stack pointer, `x2`. It
+expands to `sd rs2, offset(x2)`.
+
+C.FSWSP is an RV32FC-only instruction that stores a single-precision
+floating-point value in floating-point register _rs2_ to memory. It
+computes an effective address by adding the _zero_-extended offset,
+scaled by 4, to the stack pointer, `x2`. It expands to
+`fsw rs2, offset(x2)`.
+
+C.FSDSP is an RV32DC/RV64DC-only instruction that stores a
+double-precision floating-point value in floating-point register _rs2_
+to memory. It computes an effective address by adding the
+_zero_-extended offset, scaled by 8, to the stack pointer, `x2`. It
+expands to `fsd rs2, offset(x2)`.
+
+[NOTE]
+====
+Register save/restore code at function entry/exit represents a
+significant portion of static code size. The stack-pointer-based
+compressed loads and stores in RVC are effective at reducing the
+save/restore static code size by a factor of 2 while improving
+performance by reducing dynamic instruction bandwidth.
+
+A common mechanism used in other ISAs to further reduce save/restore
+code size is load-multiple and store-multiple instructions. We
+considered adopting these for RISC-V but noted the following drawbacks
+to these instructions:
+
+* These instructions complicate processor implementations.
+* For virtual memory systems, some data accesses could be resident in
+physical memory and some could not, which requires a new restart
+mechanism for partially executed instructions.
+* Unlike the rest of the RVC instructions, there is no IFD equivalent to
+Load Multiple and Store Multiple.
+* Unlike the rest of the RVC instructions, the compiler would have to be aware
+of these load-multiple and store-multiple instructions to both allocate
+registers in the expected order and also to schedule the loads and
+stores contiguously and in the proper order, to maximize the chances of them
+being detected and replaced by an assembler or linker with the equivalent
+load-multiple or store-multiple compressed instruction.
+* Simple microarchitectural implementations will constrain how other
+instructions can be scheduled around the load and store multiple
+instructions, leading to a potential performance loss.
+* The desire for sequential register allocation might conflict with the
+featured registers selected for the CIW, CL, CS, CA, and CB formats.
+
+Furthermore, much of the gains can be realized in software by replacing
+prologue and epilogue code with subroutine calls to common prologue and
+epilogue code, a technique described in Section 5.6 of cite:[waterman-phd].
+
+While reasonable architects might come to different conclusions, we
+decided to omit load and store multiple and instead use the
+software-only approach of calling save/restore millicode routines to
+attain the greatest code size reduction.
+====
+
+==== Register-Based Loads and Stores
+
+[[reg-based-ldnstr]]
+include::images/wavedrom/reg-based-ldnstr.edn[]
+//.Compressed, register-based load and stores--these instructions use the CL format.
+(((compressed, register-based load and store)))
+These instructions use the CL format.
+
+C.LW loads a 32-bit value from memory into register
+`_rd′_`. It computes an effective address by adding the
+_zero_-extended offset, scaled by 4, to the base address in register
+`_rs1′_`. It expands to `lw rd′, offset(rs1′)`.
+
+C.LD is an RV64C-only instruction that loads a 64-bit value from
+memory into register `_rd′_`. It computes an effective
+address by adding the _zero_-extended offset, scaled by 8, to the base
+address in register `_rs1′_`. It expands to
+`ld rd′, offset(rs1′)`.
+
+C.FLW is an RV32FC-only instruction that loads a single-precision
+floating-point value from memory into floating-point register
+`_rd′_`. It computes an effective address by adding the
+_zero_-extended offset, scaled by 4, to the base address in register
+`_rs1′_`. It expands to
+`flw rd′, offset(rs1′)`.
+
+C.FLD is an RV32DC/RV64DC-only instruction that loads a double-precision
+floating-point value from memory into floating-point register
+`_rd′_`. It computes an effective address by adding the
+_zero_-extended offset, scaled by 8, to the base address in register
+`_rs1′_`. It expands to
+`fld rd′, offset(rs1′)`.
+
+[[c-cs-format-ls]]
+include::images/wavedrom/c-cs-format-ls.edn[]
+//.Compressed, CS format load and store--these instructions use the CS format.
+(((compressed, cs-format load and store)))
+
+These instructions use the CS format.
+
+C.SW stores a 32-bit value in register `_rs2′_` to memory.
+It computes an effective address by adding the _zero_-extended offset,
+scaled by 4, to the base address in register `_rs1′_`. It
+expands to `sw rs2′, offset(rs1′)`.
+
+C.SD is an RV64C-only instruction that stores a 64-bit value in
+register `_rs2′_` to memory. It computes an effective
+address by adding the _zero_-extended offset, scaled by 8, to the base
+address in register `_rs1′_`. It expands to
+`sd rs2′, offset(rs1′)`.
+
+C.FSW is an RV32FC-only instruction that stores a single-precision
+floating-point value in floating-point register `_rs2′_` to
+memory. It computes an effective address by adding the _zero_-extended
+offset, scaled by 4, to the base address in register
+`_rs1′_`. It expands to
+`fsw rs2′, offset(rs1′)`.
+
+C.FSD is an RV32DC/RV64DC-only instruction that stores a
+double-precision floating-point value in floating-point register
+`_rs2′_` to memory. It computes an effective address by
+adding the _zero_-extended offset, scaled by 8, to the base address in
+register `_rs1′_`. It expands to
+`fsd rs2′, offset(rs1′)`.
+
+=== Control Transfer Instructions
+
+RVC provides unconditional jump instructions and conditional branch
+instructions. As with base RVI instructions, the offsets of all RVC
+control transfer instructions are in multiples of 2 bytes.
+
+[[c-cj-format-ls]]
+include::images/wavedrom/c-cj-format-ls.edn[]
+//.Compressed, CJ format load and store--these instructions use the CJ format.
+(((compressed, cj-format load and store)))
+
+These instructions use the CJ format.
+
+C.J performs an unconditional control transfer. The offset is
+sign-extended and added to the `pc` to form the jump target address. C.J
+can therefore target a {pm}2 KiB range. C.J expands to
+`jal x0, offset`.
+
+C.JAL is an RV32C-only instruction that performs the same operation as
+C.J, but additionally writes the address of the instruction following
+the jump (`pc+2`) to the link register, `x1`. C.JAL expands to
+`jal x1, offset`.
+
+[[c-cr-format-ls]]
+include::images/wavedrom/c-cr-format-ls.edn[]
+//.Compressed, CR format load and store--these instructions use the CR format.
+(((compressed, cr-format load and store)))
+
+These instructions use the CR format.
+
+C.JR (jump register) performs an unconditional control transfer to the
+address in register _rs1_. C.JR expands to `jalr x0, 0(rs1)`. C.JR is
+valid only when _rs1_≠`x0`; the code
+point with _rs1_=`x0` is reserved.
+
+C.JALR (jump and link register) performs the same operation as C.JR, but
+additionally writes the address of the instruction following the jump
+(`pc`+2) to the link register, `x1`. C.JALR expands to
+`jalr x1, 0(rs1)`. C.JALR is valid only when
+_rs1_≠`x0`; the code point with
+_rs1_=`x0` corresponds to the C.EBREAK
+instruction.
+
+[NOTE]
+====
+Strictly speaking, C.JALR does not expand exactly to a base RVI
+instruction as the value added to the PC to form the link address is 2
+rather than 4 as in the base ISA, but supporting both offsets of 2 and 4
+bytes is only a very minor change to the base microarchitecture.
+====
+
+[[c-cb-format-ls]]
+include::images/wavedrom/c-cb-format-ls.edn[]
+//.Compressed, CB format load and store--these instructions use the CB format.
+(((compressed, cb-format load and store)))
+
+These instructions use the CB format.
+
+C.BEQZ performs conditional control transfers. The offset is
+sign-extended and added to the `pc` to form the branch target address.
+It can therefore target a {pm}256 B range. C.BEQZ takes the
+branch if the value in register _rs1′_ is zero. It
+expands to `beq rs1′, x0, offset`.
+
+C.BNEZ is defined analogously, but it takes the branch if
+_rs1′_ contains a nonzero value. It expands to
+`bne rs1′, x0, offset`.
+
+=== Integer Computational Instructions
+
+RVC provides several instructions for integer arithmetic and constant
+generation.
+
+==== Integer Constant-Generation Instructions
+
+The two constant-generation instructions both use the CI instruction
+format and can target any integer register.
+
+[[c-integer-const-gen]]
+include::images/wavedrom/c-integer-const-gen.edn[]
+//.Integer constant generation format.
+(((compressed, integer constant generation)))
+
+
+C.LI loads the sign-extended 6-bit immediate, _imm_, into register _rd_.
+C.LI expands into `addi rd, x0, imm`.
+The C.LI code points with _rd_=`x0` are HINTs.
+
+C.LUI loads the non-zero 6-bit immediate field into bits 17–12 of the
+destination register, clears the bottom 12 bits, and sign-extends bit 17
+into all higher bits of the destination. C.LUI expands into
+`lui rd, imm`. C.LUI is valid only when
+_rd_≠`x2`,
+and when the immediate is not equal to zero. The code points with
+_imm_=0 are reserved.
+The code points with _rd_=`x2` and _imm_≠0 correspond to the
+C.ADDI16SP instruction.
+The code points with _rd_=`x0` and _imm_≠0 are HINTs.
+
+==== Integer Register-Immediate Operations
+
+These integer register-immediate operations are encoded in the CI format
+and perform operations on an integer register and a 6-bit immediate.
+
+[[c-integer-register-immediate]]
+include::images/wavedrom/c-int-reg-immed.edn[]
+//.Integer register-immediate format.
+(((compressed, integer register-immediate)))
+
+C.ADDI adds the non-zero sign-extended 6-bit immediate to the value in
+register _rd_ then writes the result to _rd_. C.ADDI expands into
+`addi rd, rd, imm`.
+The code points with _rd_≠0 and _imm_=0 are HINTs.
+The code points with _rd_=`x0` encode the C.NOP instruction, of
+which the code points with _imm_≠0 are HINTs.
+
+
+C.ADDIW is an RV64C-only instruction that performs the same
+computation but produces a 32-bit result, then sign-extends result to 64
+bits. C.ADDIW expands into `addiw rd, rd, imm`. The immediate can be
+zero for C.ADDIW, where this corresponds to `sext.w rd`. C.ADDIW is
+valid only when _rd_≠`x0`; the code points with
+_rd_=`x0` are reserved.
+
+C.ADDI16SP (add immediate to stack pointer)
+shares the opcode with C.LUI, but has a destination field of
+`x2`. C.ADDI16SP adds the non-zero sign-extended 6-bit immediate to the
+value in the stack pointer (`sp=x2`), where the immediate is scaled to
+represent multiples of 16 in the range [-512, 496]. C.ADDI16SP is used to
+adjust the stack pointer in procedure prologues and epilogues. It
+expands into `addi x2, x2, nzimm[9:4]`. C.ADDI16SP is valid only when
+_nzimm_≠0; the code point with _nzimm_=0 is reserved.
+
+[NOTE]
+====
+In the standard RISC-V calling convention, the stack pointer `sp` is
+always 16-byte aligned.
+====
+
+[[c-ciw]]
+include::images/wavedrom/c-ciw.edn[]
+//.CIW format.
+(((compressed, CIW)))
+C.ADDI4SPN (add immediate to stack pointer, non-destructive)
+is a CIW-format instruction that adds a _zero_-extended
+non-zero immediate, scaled by 4, to the stack pointer, `x2`, and writes
+the result to `rd′`. This instruction is used to generate
+pointers to stack-allocated variables, and expands to
+`addi rd′, x2, nzuimm[9:2]`. C.ADDI4SPN is valid only when
+_nzuimm_≠0; the code points with _nzuimm_=0 are
+reserved.
+
+[[c-ci]]
+include::images/wavedrom/c-ci.edn[]
+//.CI format.
+(((compressed, CI)))
+
+C.SLLI is a CI-format instruction that performs a logical left shift of
+the value in register _rd_ then writes the result to _rd_. The shift
+amount is encoded in the _shamt_ field.
+C.SLLI expands into `slli rd, rd, shamt[5:0]`.
+
+The C.SLLI code points with _shamt_=0 or with _rd_=`x0` are HINTs.
+
+For RV32C, _shamt[5]_ must be zero; the code points with _shamt[5]_=1
+are designated for custom extensions.
+
+[[c-srli-srai]]
+
+include::images/wavedrom/c-srli-srai.edn[]
+//.C-SRLI-SRAI format.
+(((compressed, C.SRLI, C.SRAI)))
+
+C.SRLI is a CB-format instruction that performs a logical right shift of
+the value in register _rd′_ then writes the result to
+_rd′_. The shift amount is encoded in the _shamt_ field.
+C.SRLI expands into `srli rd′, rd′, shamt`.
+
+The C.SRLI code points with _shamt_=0 are HINTs.
+
+For RV32C, _shamt[5]_ must be zero; the code points with _shamt[5]_=1
+are designated for custom extensions.
+
+C.SRAI is defined analogously to C.SRLI, but instead performs an
+arithmetic right shift. C.SRAI expands to
+`srai rd′, rd′, shamt`.
+
+[NOTE]
+====
+Left shifts are usually more frequent than right shifts, as left shifts
+are frequently used to scale address values. Right shifts have therefore
+been granted less encoding space and are placed in an encoding quadrant
+where all other immediates are sign-extended.
+====
+[[c-andi]]
+include::images/wavedrom/c-andi.edn[]
+//.C.ANDI format
+(((compressed, C.ANDI)))
+
+C.ANDI is a CB-format instruction that computes the bitwise AND of the
+value in register _rd′_ and the sign-extended 6-bit
+immediate, then writes the result to _rd′_. C.ANDI
+expands to `andi rd′, rd′, imm`.
+
+==== Integer Register-Register Operations
+
+[[c-cr]]
+include::images/wavedrom/c-int-reg-to-reg-cr-format.edn[]
+//C.CR format
+((((compressed. C.CR))))
+These instructions use the CR format.
+
+C.MV copies the value in register _rs2_ into register _rd_. C.MV expands
+into `add rd, x0, rs2`. C.MV is valid only when
+_rs2_≠`x0`; the code points with _rs2_=`x0` correspond to the C.JR instruction. The code points with _rs2_≠`x0` and _rd_=`x0` are HINTs.
+
+[NOTE]
+====
+_C.MV expands to a different instruction than the canonical MV
+pseudoinstruction, which instead uses ADDI. Implementations that handle
+MV specially, e.g. using register-renaming hardware, may find it more
+convenient to expand C.MV to MV instead of ADD, at slight additional
+hardware cost._
+====
+
+C.ADD adds the values in registers _rd_ and _rs2_ and writes the result
+to register _rd_. C.ADD expands into `add rd, rd, rs2`. C.ADD is only
+valid when _rs2_≠`x0`; the code points with _rs2_=`x0` correspond to the C.JALR
+and C.EBREAK instructions. The code points with _rs2_≠`x0` and _rd_=`x0` are HINTs.
+
+[[c-ca]]
+include::images/wavedrom/c-int-reg-to-reg-ca-format.edn[]
+//C.CA format
+((((compressed. C.CA))))
+
+These instructions use the CA format.
+
+`C.AND` computes the bitwise `AND` of the values in registers
+_rd′_ and _rs2′_, then writes the result
+to register _rd′_. `C.AND` expands into
+`and rd′, rd′, rs2′`.
+
+`C.OR` computes the bitwise `OR` of the values in registers
+_rd′_ and _rs2′_, then writes the result
+to register _rd′_. `C.OR` expands into
+`or rd′, rd′, rs2′`.
+
+`C.XOR` computes the bitwise `XOR` of the values in registers
+_rd′_ and _rs2′_, then writes the result
+to register _rd′_. `C.XOR` expands into
+`xor rd′, rd′, rs2′`.
+
+`C.SUB` subtracts the value in register _rs2′_ from the
+value in register _rd′_, then writes the result to
+register _rd′_. `C.SUB` expands into
+`sub rd′, rd′, rs2′`.
+
+`C.ADDW` is an RV64C-only instruction that adds the values in
+registers _rd′_ and _rs2′_, then
+sign-extends the lower 32 bits of the sum before writing the result to
+register _rd′_. `C.ADDW` expands into
+`addw rd′, rd′, rs2′`.
+
+`C.SUBW` is an RV64C-only instruction that subtracts the value in
+register _rs2′_ from the value in register
+_rd′_, then sign-extends the lower 32 bits of the
+difference before writing the result to register _rd′_.
+`C.SUBW` expands into `subw rd′, rd′, rs2′`.
+
+[NOTE]
+====
+This group of six instructions do not provide large savings
+individually, but do not occupy much encoding space and are
+straightforward to implement, and as a group provide a worthwhile
+improvement in static and dynamic compression.
+====
+
+==== Defined Illegal Instruction
+
+[[c-def-illegal-inst]]
+include::images/wavedrom/c-def-illegal-inst.edn[]
+((((compressed. C.DIINST))))
+
+A 16-bit instruction with all bits zero is permanently reserved as an
+illegal instruction.
+
+[NOTE]
+====
+We reserve all-zero instructions to be illegal instructions to help trap
+attempts to execute zero-ed or non-existent portions of the memory
+space. The all-zero value should not be redefined in any non-standard
+extension. Similarly, we reserve instructions with all bits set to 1
+(corresponding to very long instructions in the RISC-V variable-length
+encoding scheme) as illegal to capture another common value seen in
+non-existent memory regions.
+====
+
+==== NOP Instruction
+
+[[c-nop-instr]]
+include::images/wavedrom/c-nop-instr.edn[]
+((((compressed. C.NOPINSTR))))
+
+`C.NOP` is a CI-format instruction that does not change any user-visible
+state, except for advancing the `pc` and incrementing any applicable
+performance counters. `C.NOP` expands to `nop`. The `C.NOP` code points
+with _imm_≠0 encode HINTs.
+
+==== Breakpoint Instruction
+
+[[c-breakpoint-instr]]
+include::images/wavedrom/c-breakpoint-instr.edn[]
+((((compressed. C.BREAKPOINTINSTR))))
+
+Debuggers can use the `C.EBREAK` instruction, which expands to `ebreak`,
+to cause control to be transferred back to the debugging environment.
+`C.EBREAK` shares the opcode with the `C.ADD` instruction, but with _rd_ and
+_rs2_ both zero, thus can also use the `CR` format.
+
+=== Usage of C Instructions in LR/SC Sequences
+
+On implementations that support the C extension, compressed forms of the
+I instructions permitted inside constrained LR/SC sequences, as
+described in <<sec:lrscseq>>, are also permitted
+inside constrained LR/SC sequences.
+
+[NOTE]
+====
+The implication is that any implementation that claims to support both
+the A and C extensions must ensure that LR/SC sequences containing valid
+C instructions will eventually complete.
+====
+
+[[rvc-hints]]
+=== HINT Instructions
+
+A portion of the RVC encoding space is reserved for microarchitectural
+HINTs. Like the HINTs in the RV32I base ISA (see
+<<rv32i-hints>>), these instructions do not
+modify any architectural state, except for advancing the `pc` and any
+applicable performance counters. HINTs are executed as no-ops on
+implementations that ignore them.
+
+RVC HINTs are encoded as computational instructions that do not modify
+the architectural state, either because _rd_=`x0` (e.g.
+`C.ADD _x0_, _t0_`), or because _rd_ is overwritten with a copy of itself
+(e.g. `C.ADDI _t0_, 0`).
+
+[NOTE]
+====
+This HINT encoding has been chosen so that simple implementations can
+ignore HINTs altogether, and instead execute a HINT as a regular
+computational instruction that happens not to mutate the architectural
+state.
+====
+
+RVC HINTs do not necessarily expand to their RVI HINT counterparts. For
+example, `C.ADD` _x0_, _a0_ might not encode the same HINT as
+`ADD` _x0_, _x0_, _a0_.
+
+[NOTE]
+====
+The primary reason to not require an RVC HINT to expand to an RVI HINT
+is that HINTs are unlikely to be compressible in the same manner as the
+underlying computational instruction. Also, decoupling the RVC and RVI
+HINT mappings allows the scarce RVC HINT space to be allocated to the
+most popular HINTs, and in particular, to HINTs that are amenable to
+macro-op fusion.
+====
+
+<<rvc-t-hints>> lists all RVC HINT code points. For RV32C, 78%
+of the HINT space is reserved for standard HINTs. The remainder of the HINT space is designated for custom HINTs;
+no standard HINTs will ever be defined in this subspace.
+
+[[rvc-t-hints]]
+.RVC HINT instructions.
+[cols="<,<,>,<",options="header",]
+|===
+|Instruction |Constraints |Code Points |Purpose
+
+|C.NOP |_imm_≠0 |63 .6+.^|_Designated for future standard use_
+
+|C.ADDI | _rd_≠`x0`, _imm_=0 |31
+
+|C.LI | _rd_=`x0` |64
+
+|C.LUI | _rd_=`x0`, _imm_≠0 |63
+
+|C.MV | _rd_=`x0`, _rs2_≠`x0` |31
+
+|C.ADD | _rd_=`x0`, _rs2_≠`x0`, _rs2_≠`x2-x5` | 27
+
+|C.ADD | _rd_=`x0`, _rs2_=`x2-x5` |4|(rs2=x2) C.NTL.P1 (rs2=x3) C.NTL.PALL (rs2=x4) C.NTL.S1 (rs2=x5) C.NTL.ALL
+
+|C.SLLI |_rd_=`x0` or _imm_=0 |63 (RV32), 95 (RV64)  .3+.^|_Designated for custom use_
+
+|C.SRLI | _imm_=0 |8
+
+|C.SRAI | _imm_=0 |8
+|===
+
+=== RVC Instruction Set Listings
+
+<<rvcopcodemap>> shows a map of the major
+opcodes for RVC. Each row of the table corresponds to one quadrant of
+the encoding space. The last quadrant, which has the two
+least-significant bits set, corresponds to instructions wider than 16
+bits, including those in the base ISAs. Several instructions are only
+valid for certain operands; when invalid, they are marked either _RES_
+to indicate that the opcode is reserved for future standard extensions;
+_Custom_ to indicate that the opcode is designated for custom
+extensions; or _HINT_ to indicate that the opcode is reserved for
+microarchitectural hints (see <<rvc-hints>>).
+
+<<<
+
+[[rvcopcodemap]]
+.RVC opcode map instructions.
+[%autowidth,float="center",align="center",cols=">,^,^,^,^,^,^,^,^,^,<]
+|===
+2+>|inst[15:13] +
+inst[1:0] ^.^s|000 ^.^s|001 ^.^s|010 ^.^s|011 ^.^s|100 ^.^s|101 ^.^s|110 ^.^s|111 |
+
+2+>.^|00 .^|ADDI4SPN ^.^|FLD +
+FLD ^.^| LW ^.^| FLW +
+LD ^.^| _Reserved_ ^.^| FSD +
+FSD ^.^| SW ^.^| FSW +
+SD
+^.^| RV32 +
+RV64
+
+2+>.^|01 ^.^|ADDI ^.^|JAL +
+ADDIW ^.^|LI ^.^|LUI/ADDI16SP ^.^|MISC-ALU ^.^|J ^.^|BEQZ ^.^|BNEZ ^.^|RV32 +
+RV64
+
+2+>.^|10 ^.^|SLLI ^.^|FLDSP +
+FLDSP ^.^|LWSP ^.^|FLWSP +
+LDSP ^.^|J[AL]R/MV/ADD ^.^|FSDSP +
+FSDSP ^.^|SWSP ^.^|FSWSP +
+SDSP ^.^|RV32 +
+RV64
+
+2+>.^|11 9+^|>16b
+|===
+
+<<rvc-instr-table0>>, <<rvc-instr-table1>>, and <<rvc-instr-table2>> list the RVC instructions.
+
+[[rvc-instr-table0]]
+.Instruction listing for RVC, Quadrant 0
+include::images/bytefield/rvc-instr-quad0.edn[]
+
+[[rvc-instr-table1]]
+.Instruction listing for RVC, Quadrant 1
+include::images/bytefield/rvc-instr-quad1.edn[]
+
+[[rvc-instr-table2]]
+.Instruction listing for RVC, Quadrant 2
+include::images/bytefield/rvc-instr-quad2.edn[]