author | elisa <elisa@riscv.org> | 2021-10-05 15:06:55 -0700 |
---|---|---|
committer | elisa <elisa@riscv.org> | 2021-10-05 15:06:55 -0700 |
commit | 5921e762efd81682f720130a0d72ce8a1a0da16e (patch) | |
tree | 98f1ec7488a804b717b9d2c5e323791834e76739 | |
parent | c6ae16c883f6b937c9696c427f046bfc9f8b25f6 (diff) | |
adoc formatting and table fixes for intro, a, c, counters, m, rvmo chapters
-rw-r--r-- | src/a-st-ext.adoc | 21 | ||||
-rw-r--r-- | src/c-st-ext.adoc | 200 | ||||
-rw-r--r-- | src/counters.adoc | 36 | ||||
-rw-r--r-- | src/intro.adoc | 2 | ||||
-rw-r--r-- | src/m-st-ext.adoc | 21 | ||||
-rw-r--r-- | src/riscv-isa-unpr-conv-review.pdf | bin | 7444541 -> 6729005 bytes | |||
-rw-r--r-- | src/rvwmo.adoc | 686 |
7 files changed, 477 insertions, 489 deletions
diff --git a/src/a-st-ext.adoc b/src/a-st-ext.adoc index 7468009..34cdb2c 100644 --- a/src/a-st-ext.adoc +++ b/src/a-st-ext.adoc @@ -1,7 +1,7 @@ [[atomics]] -== `A` Standard Extension for Atomic Instructions, Version 2.1 +== A Standard Extension for Atomic Instructions, Version 2.1 -The standard atomic-instruction extension, named `A`, contains +The standard atomic-instruction extension, named A, contains instructions that atomically read-modify-write memory to support synchronization between multiple RISC-V harts running in the same memory space. The two forms of atomic instruction provided are @@ -111,7 +111,7 @@ software should only assume the failure code will be non-zero. [NOTE] ==== -We reserve a failure code of 1 to mean `unspecified` so that simple +We reserve a failure code of 1 to mean *unspecified* so that simple implementations may return this value using the existing mux required for the SLT/SLTU instructions. More specific failure codes might be defined in future versions or extensions to the ISA. @@ -223,6 +223,7 @@ instruction unless the _rl_ bit is also set. LR._rl_ and SC._aq_ instructions are not guaranteed to provide any stronger ordering than those with both bits clear, but may result in lower performance. +.Sample code for compare-and-swap function using LR/SC. .... # a0 holds address of memory location # a1 holds expected value @@ -256,10 +257,10 @@ sequence in the case of failure, and must comprise at most 16 instructions placed sequentially in memory. * An LR/SC sequence begins with an LR instruction and ends with an SC instruction. The dynamic code executed between the LR and SC -instructions can only contain instructions from the base `I` +instructions can only contain instructions from the base _I_ instruction set, excluding loads, stores, backward jumps, taken backward -branches, JALR, FENCE, and SYSTEM instructions. 
If the `C` extension -is supported, then compressed forms of the aforementioned `I` +branches, JALR, FENCE, and SYSTEM instructions. If the _C_ extension +is supported, then compressed forms of the aforementioned _I_ instructions are also permitted. * The code to retry a failing LR/SC sequence can contain backwards jumps and/or branches to repeat the LR/SC sequence, but otherwise has the same @@ -371,7 +372,7 @@ is not naturally aligned, an address-misaligned exception or an access-fault exception will be generated. The access-fault exception can be generated for a memory access that would otherwise be able to complete except for the misalignment, if the misaligned access should -not be emulated. The ``Zam`` extension, described in +not be emulated. The _Zam_ extension, described in <<zam>>, relaxes this requirement and specifies the semantics of misaligned AMOs. @@ -379,7 +380,7 @@ The operations supported are swap, integer add, bitwise AND, bitwise OR, bitwise XOR, and signed and unsigned integer maximum and minimum. Without ordering constraints, these AMOs can be used to implement parallel reduction operations, where typically the return value would be -discarded by writing to `x0`. +discarded by writing to _x0_. [NOTE] ==== @@ -388,7 +389,7 @@ parallel systems better than LR/SC or CAS. A simple microarchitecture can implement AMOs using the LR/SC primitives, provided the implementation can guarantee the AMO eventually completes. More complex implementations might also implement AMOs at memory controllers, and can -optimize away fetching the original value when the destination is `x0`. +optimize away fetching the original value when the destination is *x0*. The set of AMOs was chosen to support the C11/C++11 atomic memory operations efficiently, and also to support parallel reductions in @@ -444,7 +445,7 @@ acquire and release to simplify the implementation of speculative lock elision cite:[Rajwar:2001:SLE]. 
==== -The instructions in the `A` extension can also be used to provide +The instructions in the _A_ extension can also be used to provide sequentially consistent loads and stores. A sequentially consistent load can be implemented as an LR with both _aq_ and _rl_ set. A sequentially consistent store can be implemented as an AMOSWAP that writes the old diff --git a/src/c-st-ext.adoc b/src/c-st-ext.adoc index 79904d8..f750a33 100644 --- a/src/c-st-ext.adoc +++ b/src/c-st-ext.adoc @@ -1,11 +1,11 @@ [[compressed]] -== `C` Standard Extension for Compressed Instructions, Version 2.0 +== C Standard Extension for Compressed Instructions, Version 2.0 This chapter describes the current proposal for the RISC-V standard -compressed instruction-set extension, named `C`, which reduces static +compressed instruction-set extension, named _C_, which reduces static and dynamic code size by adding short 16-bit instruction encodings for common operations. The C extension can be added to any of the base ISAs -(RV32, RV64, RV128), and we use the generic term `RVC` to cover any of +(RV32, RV64, RV128), and we use the generic term _RVC_ to cover any of these. Typically, 50%–60% of the RISC-V instructions in a program can be replaced with RVC instructions, resulting in a 25%–30% code-size reduction. @@ -17,8 +17,8 @@ of common 32-bit RISC-V instructions when: * the immediate or address offset is small, or -* one of the registers is the zero register (`x0`), the ABI link register -(`x1`), or the ABI stack pointer (`x2`), or +* one of the registers is the zero register (_x0_), the ABI link register +(_x1_), or the ABI stack pointer (_x2_), or * the destination register and the first source register are identical, or @@ -177,7 +177,7 @@ Table <<rvcopcodemap>> shows the nine compressed instruction formats. CR, CI, and CSS can use any of the 32 RVI registers, but CIW, CL, CS, CA, and CB are limited to just 8 of them. 
Table <<registers>> lists these popular registers, which -correspond to registers `x8` to `x15`. Note that there is a separate +correspond to registers _x8_ to _x15_. Note that there is a separate version of load and store instructions that use the stack pointer as the base address register, since saving to and restoring from the stack are so prevalent, and that they use the CI and CSS formats to allow access @@ -187,7 +187,7 @@ ADDI4SPN instruction. [NOTE] ==== The RISC-V ABI was changed to make the frequently used registers map to -registers `x8`–`x15`. This simplifies the decompression decoder by +registers *x8*–*x15*. This simplifies the decompression decoder by having a contiguous naturally aligned set of register numbers, and is also compatible with the RV32E base ISA, which only has 16 integer registers. @@ -195,14 +195,14 @@ registers. (((compressed, loads and stores))) Compressed register-based floating-point loads and stores also use the -CL and CS formats respectively, with the eight registers mapping to `f8` -to `f15`. +CL and CS formats respectively, with the eight registers mapping to _f8_ +to _f15_. (((calling convention, standard))) [NOTE] ==== The standard RISC-V calling convention maps the most frequently used -floating-point registers to registers `f8` to `f15`, which allows the +floating-point registers to registers *f8* to *f15*, which allows the same register decompression decoding as for integer register numbers. ==== (((register source specifiers, c-ext))) @@ -223,7 +223,7 @@ position in every instruction, thereby simplifying implementations. ==== For many RVC instructions, zero-valued immediates are disallowed and -`x0` is not a valid 5-bit register specifier. These restrictions free up +_x0_ is not a valid 5-bit register specifier. These restrictions free up encoding space for other instructions requiring fewer operand bits. //include::images/wavedrom/cr-register.adoc[] @@ -267,13 +267,13 @@ CS, CA, and CB formats. 
|=== |RVC Register Number |000 |001 |010 |011 |100 |101 |110 |111 -|Integer Register Number |`x8` |`x9` |`x10` |`x11` |`x12` |`x13` |`x14`|`x15` +|Integer Register Number |_x8_ |_x9_ |_x10_ |_x11_ |_x12_ |_x13_ |_x14_|_x15_ -|Integer Register ABI Name |`s0` |`s1` |`a0` |`a1` |`a2` |`a3` |`a4`|`a5` +|Integer Register ABI Name |_s0_ |_s1_ |_a0_ |_a1_ |_a2_ |_a3_ |_a4_|_a5_ -|Floating-Point Register Number |`f8` |`f9` |`f10` |`f11` |`f12` |`f13`|`f14` |`f15` +|Floating-Point Register Number |_f8_ |_f9_ |_f10_ |_f11_ |_f12_ |_f13_|_f14_ |_f15_ -|Floating-Point Register ABI Name |`fs0` |`fs1` |`fa0` |`fa1` |`fa2`|`fa3` |`fa4` |`fa5` +|Floating-Point Register ABI Name |_fs0_ |_fs1_ |_fa0_ |_fa1_ |_fa2_|_fa3_ |_fa4_ |_fa5_ |=== @@ -285,7 +285,7 @@ bytes: latexmath:[$\times$]4 for words, latexmath:[$\times$]8 for double words, and latexmath:[$\times$]16 for quad words. RVC provides two variants of loads and stores. One uses the ABI stack -pointer, `x2`, as the base address and can target any data register. The +pointer, _x2_, as the base address and can target any data register. The other can reference one of 8 base address registers and one of 8 data registers. @@ -300,7 +300,7 @@ These instructions use the CI format. C.LWSP loads a 32-bit value from memory into register _rd_. It computes an effective address by adding the _zero_-extended offset, scaled by 4, -to the stack pointer, `x2`. It expands to `lw rd, offset(x2)`. C.LWSP is +to the stack pointer, _x2_. It expands to _lw rd, offset(x2)_. C.LWSP is only valid when latexmath:[$\textit{rd}{\neq}\texttt{x0}$]; the code points with latexmath:[$\textit{rd}{=}\texttt{x0}$] are reserved. (((operations, memory))) @@ -308,28 +308,28 @@ points with latexmath:[$\textit{rd}{=}\texttt{x0}$] are reserved. C.LDSP is an RV64C/RV128C-only instruction that loads a 64-bit value from memory into register _rd_. It computes its effective address by adding the zero-extended offset, scaled by 8, to the stack pointer, -`x2`. 
It expands to `ld rd, offset(x2)`. C.LDSP is only valid when +_x2_. It expands to _ld rd, offset(x2)_. C.LDSP is only valid when latexmath:[$\textit{rd}{\neq}\texttt{x0}$]; the code points with latexmath:[$\textit{rd}{=}\texttt{x0}$] are reserved. C.LQSP is an RV128C-only instruction that loads a 128-bit value from memory into register _rd_. It computes its effective address by adding -the zero-extended offset, scaled by 16, to the stack pointer, `x2`. It -expands to `lq rd, offset(x2)`. C.LQSP is only valid when +the zero-extended offset, scaled by 16, to the stack pointer, _x2_. It +expands to _lq rd, offset(x2)_. C.LQSP is only valid when latexmath:[$\textit{rd}{\neq}\texttt{x0}$]; the code points with latexmath:[$\textit{rd}{=}\texttt{x0}$] are reserved. C.FLWSP is an RV32FC-only instruction that loads a single-precision floating-point value from memory into floating-point register _rd_. It computes its effective address by adding the _zero_-extended offset, -scaled by 4, to the stack pointer, `x2`. It expands to -`flw rd, offset(x2)`. +scaled by 4, to the stack pointer, _x2_. It expands to +_flw rd, offset(x2)_. C.FLDSP is an RV32DC/RV64DC-only instruction that loads a double-precision floating-point value from memory into floating-point register _rd_. It computes its effective address by adding the -_zero_-extended offset, scaled by 8, to the stack pointer, `x2`. It -expands to `fld rd, offset(x2)`. +_zero_-extended offset, scaled by 8, to the stack pointer, _x2_. It +expands to _fld rd, offset(x2)_. include::images/wavedrom/sp-base-ls-2.adoc[] [sp-base-ls-2] @@ -340,29 +340,29 @@ These instructions use the CSS format. C.SWSP stores a 32-bit value in register _rs2_ to memory. It computes an effective address by adding the _zero_-extended offset, scaled by 4, to -the stack pointer, `x2`. It expands to `sw rs2, offset(x2)`. +the stack pointer, _x2_. It expands to _sw rs2, offset(x2)_. 
C.SDSP is an RV64C/RV128C-only instruction that stores a 64-bit value in register _rs2_ to memory. It computes an effective address by adding the -_zero_-extended offset, scaled by 8, to the stack pointer, `x2`. It -expands to `sd rs2, offset(x2)`. +_zero_-extended offset, scaled by 8, to the stack pointer, _x2_. It +expands to _sd rs2, offset(x2)_. C.SQSP is an RV128C-only instruction that stores a 128-bit value in register _rs2_ to memory. It computes an effective address by adding the -_zero_-extended offset, scaled by 16, to the stack pointer, `x2`. It -expands to `sq rs2, offset(x2)`. +_zero_-extended offset, scaled by 16, to the stack pointer, _x2_. It +expands to _sq rs2, offset(x2)_. C.FSWSP is an RV32FC-only instruction that stores a single-precision floating-point value in floating-point register _rs2_ to memory. It computes an effective address by adding the _zero_-extended offset, -scaled by 4, to the stack pointer, `x2`. It expands to -`fsw rs2, offset(x2)`. +scaled by 4, to the stack pointer, _x2_. It expands to +_fsw rs2, offset(x2)_. C.FSDSP is an RV32DC/RV64DC-only instruction that stores a double-precision floating-point value in floating-point register _rs2_ to memory. It computes an effective address by adding the -_zero_-extended offset, scaled by 8, to the stack pointer, `x2`. It -expands to `fsd rs2, offset(x2)`. +_zero_-extended offset, scaled by 8, to the stack pointer, _x2_. It +expands to _fsd rs2, offset(x2)_. [NOTE] ==== @@ -422,31 +422,31 @@ These instructions use the CL format. C.LW loads a 32-bit value from memory into register _rd latexmath:[$'$]_. It computes an effective address by adding the _zero_-extended offset, scaled by 4, to the base address in register -_rs1 latexmath:[$'$]_. It expands to `lw rd, offset(rs1)`. +_rs1 latexmath:[$'$]_. It expands to _lw rd, offset(rs1)_. C.LD is an RV64C/RV128C-only instruction that loads a 64-bit value from memory into register _rd latexmath:[$'$]_. 
It computes an effective address by adding the _zero_-extended offset, scaled by 8, to the base address in register _rs1 latexmath:[$'$]_. It expands to -`ld rd', offset(rs1')`. +_ld rd', offset(rs1')_. C.LQ is an RV128C-only instruction that loads a 128-bit value from memory into register _rd latexmath:[$'$]_. It computes an effective address by adding the _zero_-extended offset, scaled by 16, to the base address in register _rs1 latexmath:[$'$]_. It expands to -`lq rd, offset(rs1)`. +_lq rd, offset(rs1)_. C.FLW is an RV32FC-only instruction that loads a single-precision floating-point value from memory into floating-point register _rd latexmath:[$'$]_. It computes an effective address by adding the _zero_-extended offset, scaled by 4, to the base address in register -_rs1 latexmath:[$'$]_. It expands to `flw rd, offset(rs1)`. +_rs1 latexmath:[$'$]_. It expands to _flw rd, offset(rs1)_. C.FLD is an RV32DC/RV64DC-only instruction that loads a double-precision floating-point value from memory into floating-point register _rd latexmath:[$'$]_. It computes an effective address by adding the _zero_-extended offset, scaled by 8, to the base address in register -_rs1 latexmath:[$'$]_. It expands to `fld rd, offset(rs1)`. +_rs1 latexmath:[$'$]_. It expands to _fld rd, offset(rs1)_. S@S@S@Y@S@Y + & & & & & + @@ -464,32 +464,32 @@ These instructions use the CS format. C.SW stores a 32-bit value in register _rs2 latexmath:[$'$]_ to memory. It computes an effective address by adding the _zero_-extended offset, scaled by 4, to the base address in register _rs1 latexmath:[$'$]_. It -expands to `sw rs2, offset(rs1)`. +expands to _sw rs2, offset(rs1)_. C.SD is an RV64C/RV128C-only instruction that stores a 64-bit value in register _rs2 latexmath:[$'$]_ to memory. It computes an effective address by adding the _zero_-extended offset, scaled by 8, to the base address in register _rs1 latexmath:[$'$]_. It expands to -`sd rs2, offset(rs1)`. +_sd rs2, offset(rs1)_. 
C.SQ is an RV128C-only instruction that stores a 128-bit value in register _rs2 latexmath:[$'$]_ to memory. It computes an effective address by adding the _zero_-extended offset, scaled by 16, to the base address in register _rs1 latexmath:[$'$]_. It expands to -`sq rs2, offset(rs1)`. +_sq rs2, offset(rs1)_. C.FSW is an RV32FC-only instruction that stores a single-precision floating-point value in floating-point register _rs2 latexmath:[$'$]_ to memory. It computes an effective address by adding the _zero_-extended offset, scaled by 4, to the base address in register -_rs1 latexmath:[$'$]_. It expands to `fsw rs2, offset(rs1)`. +_rs1 latexmath:[$'$]_. It expands to _fsw rs2, offset(rs1)_. C.FSD is an RV32DC/RV64DC-only instruction that stores a double-precision floating-point value in floating-point register _rs2 latexmath:[$'$]_ to memory. It computes an effective address by adding the _zero_-extended offset, scaled by 8, to the base address in register _rs1 latexmath:[$'$]_. It expands to -`fsd rs2, offset(rs1)`. +_fsd rs2, offset(rs1)_. === Control Transfer Instructions @@ -511,14 +511,14 @@ offset[11latexmath:[$\vert$]4latexmath:[$\vert$]9:8latexmath:[$\vert$]10latexmat These instructions use the CJ format. C.J performs an unconditional control transfer. The offset is -sign-extended and added to the `pc` to form the jump target address. C.J +sign-extended and added to the _pc_ to form the jump target address. C.J can therefore target a latexmath:[$\pm$] range. C.J expands to -`jal x0, offset`. +_jal x0, offset_. C.JAL is an RV32C-only instruction that performs the same operation as C.J, but additionally writes the address of the instruction following -the jump (`pc`+2) to the link register, `x1`. C.JAL expands to -`jal x1, offset`. +the jump (_pc_+2) to the link register, _x1_. C.JAL expands to +_jal x1, offset_. E@T@T@Y + & & & + @@ -530,14 +530,14 @@ C.JALR & srclatexmath:[$\neq$]0 & 0 & C2 + These instructions use the CR format. 
C.JR (jump register) performs an unconditional control transfer to the -address in register _rs1_. C.JR expands to `jalr x0, 0(rs1)`. C.JR is +address in register _rs1_. C.JR expands to _jalr x0, 0(rs1)_. C.JR is only valid when latexmath:[$\textit{rs1}{\neq}\texttt{x0}$]; the code point with latexmath:[$\textit{rs1}{=}\texttt{x0}$] is reserved. C.JALR (jump and link register) performs the same operation as C.JR, but additionally writes the address of the instruction following the jump -(`pc`+2) to the link register, `x1`. C.JALR expands to -`jalr x1, 0(rs1)`. C.JALR is only valid when +(_pc_+2) to the link register, _x1_. C.JALR expands to +_jalr x1, 0(rs1)_. C.JALR is only valid when latexmath:[$\textit{rs1}{\neq}\texttt{x0}$]; the code point with latexmath:[$\textit{rs1}{=}\texttt{x0}$] corresponds to the C.EBREAK instruction. @@ -562,14 +562,14 @@ offset[7:6latexmath:[$\vert$]2:1latexmath:[$\vert$]5] & C1 + These instructions use the CB format. C.BEQZ performs conditional control transfers. The offset is -sign-extended and added to the `pc` to form the branch target address. +sign-extended and added to the _pc_ to form the branch target address. It can therefore target a latexmath:[$\pm$] range. C.BEQZ takes the branch if the value in register _rs1 latexmath:[$'$]_ is zero. It -expands to `beq rs1, x0, offset`. +expands to _beq rs1, x0, offset_. C.BNEZ is defined analogously, but it takes the branch if _rs1 latexmath:[$'$]_ contains a nonzero value. It expands to -`bne rs1, x0, offset`. +_bne rs1, x0, offset_. === Integer Computational Instructions @@ -591,17 +591,17 @@ latexmath:[$\textrm{dest}{\neq}{\left\{0,2\right\}}$] & nzimm[16:12] & C1 + C.LI loads the sign-extended 6-bit immediate, _imm_, into register _rd_. -C.LI expands into `addi rd, x0, imm`. C.LI is only valid when -_rd_latexmath:[$\neq$]`x0`; the code points with _rd_=`x0` encode HINTs. +C.LI expands into _addi rd, x0, imm_. 
C.LI is only valid when +_rd_latexmath:[$\neq$]_x0_; the code points with _rd_=_x0_ encode HINTs. C.LUI loads the non-zero 6-bit immediate field into bits 17–12 of the destination register, clears the bottom 12 bits, and sign-extends bit 17 into all higher bits of the destination. C.LUI expands into -`lui rd, nzimm`. C.LUI is only valid when +_lui rd, nzimm_. C.LUI is only valid when latexmath:[$\textit{rd}{\neq}{\left\{\texttt{x0},\texttt{x2}\right\}}$], and when the immediate is not equal to zero. The code points with -_nzimm_=0 are reserved; the remaining code points with _rd_=`x0` are -HINTs; and the remaining code points with _rd_=`x2` correspond to the +_nzimm_=0 are reserved; the remaining code points with _rd_=_x0_ are +HINTs; and the remaining code points with _rd_=_x2_ correspond to the C.ADDI16SP instruction. ==== Integer Register-Immediate Operations @@ -621,29 +621,29 @@ C1 + C.ADDI adds the non-zero sign-extended 6-bit immediate to the value in register _rd_ then writes the result to _rd_. C.ADDI expands into -`addi rd, rd, nzimm`. C.ADDI is only valid when -_rd_latexmath:[$\neq$]`x0` and _nzimm_latexmath:[$\neq$]0. The code -points with _rd_=`x0` encode the C.NOP instruction; the remaining code +_addi rd, rd, nzimm_. C.ADDI is only valid when +_rd_latexmath:[$\neq$]_x0_ and _nzimm_latexmath:[$\neq$]0. The code +points with _rd_=_x0_ encode the C.NOP instruction; the remaining code points with _nzimm_=0 encode HINTs. C.ADDIW is an RV64C/RV128C-only instruction that performs the same computation but produces a 32-bit result, then sign-extends result to 64 -bits. C.ADDIW expands into `addiw rd, rd, imm`. The immediate can be -zero for C.ADDIW, where this corresponds to ` sext.w rd`. C.ADDIW is -only valid when _rd_latexmath:[$\neq$]`x0`; the code points with -_rd_=`x0` are reserved. +bits. C.ADDIW expands into _addiw rd, rd, imm_. The immediate can be +zero for C.ADDIW, where this corresponds to _sext.w rd_. 
C.ADDIW is +only valid when _rd_latexmath:[$\neq$]_x0_; the code points with +_rd_=_x0_ are reserved. C.ADDI16SP shares the opcode with C.LUI, but has a destination field of -`x2`. C.ADDI16SP adds the non-zero sign-extended 6-bit immediate to the -value in the stack pointer (`sp`=`x2`), where the immediate is scaled to +_x2_. C.ADDI16SP adds the non-zero sign-extended 6-bit immediate to the +value in the stack pointer (_sp_=_x2_), where the immediate is scaled to represent multiples of 16 in the range (-512,496). C.ADDI16SP is used to adjust the stack pointer in procedure prologues and epilogues. It -expands into `addi x2, x2, nzimm`. C.ADDI16SP is only valid when +expands into _addi x2, x2, nzimm_. C.ADDI16SP is only valid when _nzimm_latexmath:[$\neq$]0; the code point with _nzimm_=0 is reserved. [NOTE] ==== -In the standard RISC-V calling convention, the stack pointer `sp` is +In the standard RISC-V calling convention, the stack pointer *sp* is always 16-byte aligned. ==== @@ -656,9 +656,9 @@ nzuimm[5:4latexmath:[$\vert$]9:6latexmath:[$\vert$]2latexmath:[$\vert$]3] & dest & C0 + C.ADDI4SPN is a CIW-format instruction that adds a _zero_-extended -non-zero immediate, scaled by 4, to the stack pointer, `x2`, and writes -the result to `rd`. This instruction is used to generate pointers to -stack-allocated variables, and expands to `addi rd, x2, nzuimm`. +non-zero immediate, scaled by 4, to the stack pointer, _x2_, and writes +the result to _rd_. This instruction is used to generate pointers to +stack-allocated variables, and expands to _addi rd, x2, nzuimm_. C.ADDI4SPN is only valid when _nzuimm_latexmath:[$\neq$]0; the code points with _nzuimm_=0 are reserved. @@ -672,13 +672,13 @@ C.SLLI is a CI-format instruction that performs a logical left shift of the value in register _rd_ then writes the result to _rd_. The shift amount is encoded in the _shamt_ field. For RV128C, a shift amount of zero is used to encode a shift of 64. 
C.SLLI expands into -`slli rd, rd, shamt`, except for RV128C with `shamt=0`, which expands to -`slli rd, rd, 64`. +_slli rd, rd, shamt_, except for RV128C with _shamt=0_, which expands to +_slli rd, rd, 64_. For RV32C, _shamt[5]_ must be zero; the code points with _shamt[5]_=1 are designated for custom extensions. For RV32C and RV64C, the shift amount must be non-zero; the code points with _shamt_=0 are HINTs. For -all base ISAs, the code points with _rd_=`x0` are HINTs, except those +all base ISAs, the code points with _rd_=_x0_ are HINTs, except those with _shamt[5]_=1 in RV32C. S@W@Y@S@T@Y + @@ -694,15 +694,15 @@ _rd latexmath:[$'$]_. The shift amount is encoded in the _shamt_ field. For RV128C, a shift amount of zero is used to encode a shift of 64. Furthermore, the shift amount is sign-extended for RV128C, and so the legal shift amounts are 1–31, 64, and 96–127. C.SRLI expands into -`srli rd', rd', shamt`, except for RV128C with `shamt=0`, which -expands to `srli rd, rd, 64`. +_srli rd', rd', shamt_, except for RV128C with _shamt=0_, which +expands to _srli rd, rd, 64_. For RV32C, _shamt[5]_ must be zero; the code points with _shamt[5]_=1 are designated for custom extensions. For RV32C and RV64C, the shift amount must be non-zero; the code points with _shamt_=0 are HINTs. C.SRAI is defined analogously to C.SRLI, but instead performs an -arithmetic right shift. C.SRAI expands to `srai rd, rd, shamt`. +arithmetic right shift. C.SRAI expands to _srai rd, rd, shamt_. [NOTE] ==== @@ -727,7 +727,7 @@ C.ANDI & imm[5] & C.ANDI & dest & imm[4:0] & C1 + C.ANDI is a CB-format instruction that computes the bitwise AND of the value in register _rd latexmath:[$'$]_ and the sign-extended 6-bit immediate, then writes the result to _rd latexmath:[$'$]_. C.ANDI -expands to `andi rd, rd, imm`. +expands to _andi rd, rd, imm_. 
==== Integer Register-Register Operations @@ -741,7 +741,7 @@ C.ADD & destlatexmath:[$\neq$]0 & srclatexmath:[$\neq$]0 & C2 + These instructions use the CR format. C.MV copies the value in register _rs2_ into register _rd_. C.MV expands -into `add rd, x0, rs2`. C.MV is only valid when +into _add rd, x0, rs2_. C.MV is only valid when latexmath:[$\textit{rs2}{\neq}\texttt{x0}$]; the code points with latexmath:[$\textit{rs2}{=}\texttt{x0}$] correspond to the C.JR instruction. The code points with @@ -758,7 +758,7 @@ hardware cost. ==== C.ADD adds the values in registers _rd_ and _rs2_ and writes the result -to register _rd_. C.ADD expands into `add rd, rd, rs2`. C.ADD is only +to register _rd_. C.ADD expands into _add rd, rd, rs2_. C.ADD is only valid when latexmath:[$\textit{rs2}{\neq}\texttt{x0}$]; the code points with latexmath:[$\textit{rs2}{=}\texttt{x0}$] correspond to the C.JALR and C.EBREAK instructions. The code points with @@ -781,34 +781,34 @@ These instructions use the CA format. C.AND computes the bitwise AND of the values in registers _rd latexmath:[$'$]_ and _rs2 latexmath:[$'$]_, then writes the result to register _rd latexmath:[$'$]_. C.AND expands into -`and rd, rd, rs2`. +_and rd, rd, rs2_. C.OR computes the bitwise OR of the values in registers _rd latexmath:[$'$]_ and _rs2 latexmath:[$'$]_, then writes the result to register _rd latexmath:[$'$]_. C.OR expands into -`or rd′, rd′, rs2′`. +_or rd′, rd′, rs2′_. C.XOR computes the bitwise XOR of the values in registers _rd latexmath:[$'$]_ and _rs2 latexmath:[$'$]_, then writes the result to register _rd latexmath:[$'$]_. C.XOR expands into -`xor rd', rd', rs2'. +_xor rd', rd', rs2'_. C.SUB subtracts the value in register _rs2 latexmath:[$'$]_ from the value in register _rd latexmath:[$'$]_, then writes the result to register _rd latexmath:[$'$]_. C.SUB expands into -`sub rd', rd', rs2'. +_sub rd', rd', rs2'_. 
C.ADDW is an RV64C/RV128C-only instruction that adds the values in registers _rd latexmath:[$'$]_ and _rs2 latexmath:[$'$]_, then sign-extends the lower 32 bits of the sum before writing the result to register _rd latexmath:[$'$]_. C.ADDW expands into -`addw rd', rd', rs2'`. +_addw rd', rd', rs2'_. C.SUBW is an RV64C/RV128C-only instruction that subtracts the value in register _rs2 latexmath:[$'$]_ from the value in register _rd latexmath:[$'$]_, then sign-extends the lower 32 bits of the difference before writing the result to register _rd latexmath:[$'$]_. -C.SUBW expands into `subw rd', rd', rs2'`. +C.SUBW expands into _subw rd', rd', rs2'_. [NOTE] ==== @@ -849,8 +849,8 @@ SW@T@T@Y + C.NOP & 0 & 0 & 0 & C1 + C.NOP is a CI-format instruction that does not change any user-visible -state, except for advancing the `pc` and incrementing any applicable -performance counters. C.NOP expands to `nop`. C.NOP is only valid when +state, except for advancing the _pc_ and incrementing any applicable +performance counters. C.NOP expands to _nop_. C.NOP is only valid when _imm_=0; the code points with _imm_latexmath:[$\neq$]0 encode HINTs. ==== Breakpoint Instruction @@ -861,7 +861,7 @@ E@U@Y + & 10 & 2 + C.EBREAK & 0 & C2 + -Debuggers can use the C.EBREAK instruction, which expands to `ebreak`, +Debuggers can use the C.EBREAK instruction, which expands to _ebreak_, to cause control to be transferred back to the debugging environment. C.EBREAK shares the opcode with the C.ADD instruction, but with _rd_ and _rs2_ both zero, thus can also use the CR format. @@ -886,12 +886,12 @@ C instructions will eventually complete. A portion of the RVC encoding space is reserved for microarchitectural HINTs. Like the HINTs in the RV32I base ISA (see <<rv32i-hints>>, these instructions do not -modify any architectural state, except for advancing the `pc` and any +modify any architectural state, except for advancing the _pc_ and any applicable performance counters. 
HINTs are executed as no-ops on implementations that ignore them. RVC HINTs are encoded as computational instructions that do not modify -the architectural state, either because _rd_=`x0` (e.g. +the architectural state, either because _rd_=_x0_ (e.g. C.ADD _x0_, _t0_), or because _rd_ is overwritten with a copy of itself (e.g. C.ADDI _t0_, 0). @@ -930,22 +930,22 @@ no standard HINTs will ever be defined in this subspace. |C.NOP |_nzimm_latexmath:[$\neq$]0 |63 .6+^.>s|_Reserved for future standard use_ -|C.ADDI | _rd_latexmath:[$\neq$]`x0`, _nzimm_=0 |31 +|C.ADDI | _rd_latexmath:[$\neq$]_x0_, _nzimm_=0 |31 -|C.LI | _rd_=`x0` |64 +|C.LI | _rd_=_x0_ |64 -|C.LUI | _rd_=`x0`, _nzimm_latexmath:[$\neq$]0 |63 +|C.LUI | _rd_=_x0_, _nzimm_latexmath:[$\neq$]0 |63 -|C.MV | _rd_=`x0`, _rs2_latexmath:[$\neq$]`x0` |31 +|C.MV | _rd_=_x0_, _rs2_latexmath:[$\neq$]_x0_ |31 -|C.ADD | _rd_=`x0`, _rs2_latexmath:[$\neq$]`x0` |31 +|C.ADD | _rd_=_x0_, _rs2_latexmath:[$\neq$]_x0_ |31 -|C.SLLI |_rd_=`x0`, _nzimm_latexmath:[$\neq$]0 |31 (RV32), 63 (RV64/128) .5+^.>s|_Designated +|C.SLLI |_rd_=_x0_, _nzimm_latexmath:[$\neq$]0 |31 (RV32), 63 (RV64/128) .5+^.>s|_Designated for custom use_ -|C.SLLI64 | _rd_=`x0` |1 +|C.SLLI64 | _rd_=_x0_ |1 -|C.SLLI64 | _rd_latexmath:[$\neq$]`x0`, RV32 and RV64 only |31 +|C.SLLI64 | _rd_latexmath:[$\neq$]_x0_, RV32 and RV64 only |31 |C.SRLI64 | RV32 and RV64 only |8 diff --git a/src/counters.adoc b/src/counters.adoc index 670ed6e..a92b3fc 100644 --- a/src/counters.adoc +++ b/src/counters.adoc @@ -1,10 +1,10 @@ [[perf-counters]] == Counters -RISC-V ISAs provide a set of up to 32latexmath:[$\times$]64-bit +RISC-V ISAs provide a set of up to 32_X_64-bit performance counters and timers that are accessible via unprivileged -XLEN read-only CSR registers `0xC00`–`0xC1F` (with the upper 32 bits -accessed via CSR registers `0xC80`–`0xC9F` on RV32). 
The first three of +XLEN read-only CSR registers _0xC00_–_0xC1F_ (with the upper 32 bits +accessed via CSR registers _0xC80_–_0xC9F_ on RV32). The first three of these (CYCLE, TIME, and INSTRET) have dedicated functions (cycle count, real-time clock, and instructions-retired respectively), while the remaining counters, if implemented, provide programmable event counting. @@ -22,8 +22,8 @@ RV32I provides a number of 64-bit read-only user-level counters, which are mapped into the 12-bit CSR address space and accessed in 32-bit pieces using CSRRS instructions. In RV64I, the CSR instructions can manipulate 64-bit CSRs. In particular, the RDCYCLE, RDTIME, and -RDINSTRET pseudoinstructions read the full 64 bits of the `cycle`, -`time`, and `instret` counters. Hence, the RDCYCLEH, RDTIMEH, and +RDINSTRET pseudoinstructions read the full 64 bits of the _cycle_, +_time_, and _instret_ counters. Hence, the RDCYCLEH, RDTIMEH, and RDINSTRETH instructions are RV32I-only. [NOTE] @@ -33,7 +33,7 @@ timing side-channel attacks. ==== (((counters, pseudoinstruction))) -The RDCYCLE pseudoinstruction reads the low XLEN bits of the `cycle` +The RDCYCLE pseudoinstruction reads the low XLEN bits of the _cycle_ CSR which holds a count of the number of clock cycles executed by the processor core on which the hart is running from an arbitrary start time in the past. RDCYCLEH is an RV32I-only instruction that reads bits 63–32 @@ -46,9 +46,9 @@ environment should provide a means to determine the current rate [TIP] ==== RDCYCLE is intended to return the number of cycles executed by the -processor core, not the hart. Precisely defining what is +processor core, not the hart. Precisely defining what is a "core" difficult given some implementation choices (e.g., AMD Bulldozer). 
-Precisely defining what is a `clock cycle` is also difficult given the +Precisely defining what is a "clock cycle" is also difficult given the range of implementations (including software emulations), but the intent is that RDCYCLE is used for performance monitoring along with the other performance counters. In particular, where there is one hart/core, one @@ -71,7 +71,7 @@ threading implementations. For example, should we only count cycles for which any instruction was issued to execution for this hart, and/or cycles any instruction retired, or include cycles this hart was occupying machine resources but couldn’t execute due to stalls while -other harts went into execution? Likely, `all of the above` would be +other harts went into execution? Likely, _all of the above_ would be needed to have understandable performance stats. This complexity of defining a per-hart cycle count, and also the need in any case for a total per-core cycle count when tuning multithreaded code led to just @@ -79,8 +79,8 @@ standardizing the per-core cycle counter, which also happens to work well for the common single hart/core case. (((counters, handling sleep cycles))) -Standardizing what happens during `sleep` is not practical given that -what `sleep` means is not standardized across execution environments, +Standardizing what happens during "sleep" is not practical given that +what "sleep" means is not standardized across execution environments, but if the entire core is paused (entirely clock-gated or powered-down in deep sleep), then it is not executing clock cycles, and the cycle count shouldn’t be increasing per the spec. There are many details, @@ -90,12 +90,12 @@ execution-environment-specific details. Even though there is no precise definition that works for all platforms, this is still a useful facility for most platforms, and an imprecise, -common, `usually correct` standard here is better than no standard. +common, "usually correct" standard here is better than no standard. 
The intent of RDCYCLE was primarily performance monitoring/tuning, and the specification was written with that goal in mind. ==== -The RDTIME pseudoinstruction reads the low XLEN bits of the ` time` CSR, +The RDTIME pseudoinstruction reads the low XLEN bits of the *time* CSR, which counts wall-clock real time that has passed from an arbitrary start time in the past. RDTIMEH is an RV32I-only instruction that reads bits 63–32 of the same real-time counter. The underlying 64-bit counter @@ -116,14 +116,14 @@ portable, rather than using RDCYCLE to measure wall-clock time. (((counters, pseudoinstructions))) The RDINSTRET pseudoinstruction reads the low XLEN bits of the -` instret` CSR, which counts the number of instructions retired by this +*instret* CSR, which counts the number of instructions retired by this hart from some arbitrary start point in the past. RDINSTRETH is an RV32I-only instruction that reads bits 63–32 of the same instruction counter. The underlying 64-bit counter should never overflow in practice. The following code sequence will read a valid 64-bit cycle counter value -into `x3`:`x2`, even if the counter overflows its lower half between +into _x3_:_x2_, even if the counter overflows its lower half between reading its upper and lower halves. .Sample code for reading the 64-bit cycle counter in RV32. @@ -168,9 +168,9 @@ implementations with a richer set of counters. (((counters, performance))) There is CSR space allocated for 29 additional unprivileged 64-bit -hardware performance counters, `hpmcounter3`–`hpmcounter31`. For RV32, +hardware performance counters, _hpmcounter3_–_hpmcounter31_. For RV32, the upper 32 bits of these performance counters is accessible via -additional CSRs `hpmcounter3h`–` hpmcounter31h`. These counters count +additional CSRs _hpmcounter3h_–_hpmcounter31h_. These counters count platform-specific events and are configured via additional privileged registers. 
The number and width of these additional counters, and the set of events they count is platform-specific. @@ -184,6 +184,6 @@ counted. It would be useful to eventually standardize event settings to count ISA-level metrics, such as the number of floating-point instructions executed for example, and possibly a few common microarchitectural -metrics, such as `L1 instruction cache misses`. +metrics, such as _L1 instruction cache misses_. ==== diff --git a/src/intro.adoc b/src/intro.adoc index f455e54..3b318bc 100644 --- a/src/intro.adoc +++ b/src/intro.adoc @@ -730,5 +730,3 @@ behavior and values and use the term _unspecified_ for cases that are intentiona unconstrained. These cases may be constrained or defined by other extensions, platform standards, or implementations. - -Susan Anstey diff --git a/src/m-st-ext.adoc b/src/m-st-ext.adoc index ae5e91c..6538750 100644 --- a/src/m-st-ext.adoc +++ b/src/m-st-ext.adoc @@ -1,8 +1,8 @@ [[mstandard]] -== _M_ Standard Extension for Integer Multiplication and Division, Version 2.0 +== M Standard Extension for Integer Multiplication and Division, Version 2.0 This chapter describes the standard integer multiplication and division -instruction extension, which is named _M_ and contains instructions +instruction extension, which is named M and contains instructions that multiply or divide values held in two integer registers. [TIP] @@ -23,12 +23,12 @@ image::image_placeholder.png[] (((MUL, MULHU))) (((MUL, MULHSU))) -MUL performs an XLEN-bit_X_XLEN-bit multiplication of +MUL performs an XLEN-bit X XLEN-bit multiplication of _rs1_ by _rs2_ and places the lower XLEN bits in the destination register. MULH, MULHU, and MULHSU perform the same multiplication but -return the upper XLEN bits of the full 2_X_XLEN-bit -product, for signed_X_signed, -unsigned_X_unsigned, and _rs1X_ unsigned _rs2_ multiplication, respectively. 
+return the upper XLEN bits of the full 2 X XLEN-bit +product, for signed X signed, +unsigned X unsigned, and _rs1X_ unsigned _rs2_ multiplication, respectively. If both the high and low bits of the same product are required, then t he recommended code sequence is: MULH[[S]U] _rdh, rs1, rs2_; MUL _rdl, rs1, rs2_ (source register specifiers must be @@ -104,11 +104,10 @@ overflow cannot occur. [cols="<,^,^,^,^,^,^",options="header",] |=== |Condition |Dividend |Divisor |DIVU[W] |REMU[W] |DIV[W] |REM[W] -|Division by zero |latexmath:[$x$] |0 |latexmath:[$2^{L}-1$] -|latexmath:[$x$] |latexmath:[$-1$] |latexmath:[$x$] -|Overflow (signed only) |latexmath:[$-2^{L-1}$] |latexmath:[$-1$] |– |– -|latexmath:[$-2^{L-1}$] |0 +|Division by zero |latexmath:[$x$] |0 |latexmath:[$2^{L}-1$] |latexmath:[$x$] |latexmath:[$-1$] |latexmath:[$x$] + +|Overflow (signed only) |latexmath:[$-2^{L-1}$] |latexmath:[$-1$] |– |– |latexmath:[$-2^{L-1}$] |0 |=== In <<divby0>>, L is the width of the operation in bits: XLEN for DIV[U] and REM[U], or 32 for DIV[U]W and REM[U]W. @@ -147,7 +146,7 @@ of the corresponding M-extension instructions. [NOTE] ==== -The Zmmul extension enables low-cost implementations that require +The *Zmmul* extension enables low-cost implementations that require multiplication operations but not division. For many microcontroller applications, division operations are too infrequent to justify the cost of divider hardware. By contrast, multiplication operations are more diff --git a/src/riscv-isa-unpr-conv-review.pdf b/src/riscv-isa-unpr-conv-review.pdf Binary files differindex 1200701..917caf4 100644 --- a/src/riscv-isa-unpr-conv-review.pdf +++ b/src/riscv-isa-unpr-conv-review.pdf diff --git a/src/rvwmo.adoc b/src/rvwmo.adoc index e49d990..885e7ca 100644 --- a/src/rvwmo.adoc +++ b/src/rvwmo.adoc @@ -3,7 +3,7 @@ This chapter defines the RISC-V memory consistency model. A memory consistency model is a set of rules specifying the values that can be -returned by loads of memory. 
RISC-V uses a memory model called `RVWMO` +returned by loads of memory. RISC-V uses a memory model called RVWMO (RISC-V Weak Memory Ordering) which is designed to provide flexibility for architects to build high-performance scalable designs while simultaneously supporting a tractable programming model. @@ -17,14 +17,14 @@ instructions from the first hart being executed in a different order. Therefore, multithreaded code may require explicit synchronization to guarantee ordering between memory instructions from different harts. The base RISC-V ISA provides a FENCE instruction for this purpose, described -in <<fence>>, while the atomics extension `A` +in <<fence>>, while the atomics extension ^A^ additionally defines load-reserved/store-conditional and atomic read-modify-write instructions. (((atomics, misaligned))) -The standard ISA extension for misaligned atomics `Zam` +The standard ISA extension for misaligned atomics _Zam_ (<<zam>>) and the standard ISA extension for total -store ordering `Ztso` (<<ztso>>) augment RVWMO +store ordering _Ztso_ (<<ztso>>) augment RVWMO with additional rules specific to those extensions. The appendices to this specification provide both axiomatic and @@ -33,12 +33,14 @@ additional explanatory material. ((FENCE)) ((SFENCE)) +[NOTE] +==== This chapter defines the memory model for regular main memory operations. The interaction of the memory model with I/O memory, instruction fetches, FENCE.I, page table walks, and SFENCE.VMA is not (yet) formalized. Some or all of the above may be formalized in a future revision of this specification. The RV128 base ISA and future ISA -extensions such as the `V` vector and `J` JIT extensions will need +extensions such as the V vector and J JIT extensions will need to be incorporated into a future revision as well. Memory consistency models supporting overlapping memory accesses of @@ -47,6 +49,7 @@ research and are not yet fully understood. 
The specifics of how memory accesses of different sizes interact under RVWMO are specified to the best of our current abilities, but they are subject to revision should new issues be uncovered. +==== [[rvwmo]] === Definition of the RVWMO Memory Model @@ -86,10 +89,13 @@ multiple memory operations if XLENlatexmath:[$<$]64, as stated in gives rise to a single memory operation that is both a load operation and a store operation simultaneously. +[NOTE] +==== Instructions in the RV128 base instruction set and in future ISA extensions such as V (vector) and P (SIMD) may give rise to multiple memory operations. However, the memory model for these extensions has not yet been formalized. +==== A misaligned load or store instruction may be decomposed into a set of component memory operations of any granularity. An FLD or FSD @@ -98,19 +104,22 @@ a set of component memory operations of any granularity. The memory operations generated by such instructions are not ordered with respect to each other in program order, but they are ordered normally with respect to the memory operations generated by preceding and subsequent -instructions in program order. The atomics extension `A` does not +instructions in program order. The atomics extension ^A^ does not require execution environments to support misaligned atomic instructions -at all; however, if misaligned atomics are supported via the `Zam` +at all; however, if misaligned atomics are supported via the _Zam_ extension, LRs, SCs, and AMOs may be decomposed subject to the constraints of the atomicity axiom for misaligned atomics, which is defined in <<zam>>. ((decomposition)) +[NOTE] +==== The decomposition of misaligned memory operations down to byte granularity facilitates emulation on implementations that do not natively support misaligned accesses. Such implementations might, for example, simply iterate over the bytes of a misaligned access one by one. 
+==== An LR instruction and an SC instruction are said to be _paired_ if the LR precedes the SC in program order and if there are no other LR or SC @@ -121,38 +130,41 @@ whether an SC must succeed, may succeed, or must fail is defined in <<lrsc>>. Load and store operations may also carry one or more ordering -annotations from the following set: `acquire-RCpc`, `acquire-RCsc`, -`release-RCpc`, and `release-RCsc`. An AMO or LR instruction with -_aq_ set has an `acquire-RCsc` annotation. An AMO or SC instruction -with _rl_ set has a `release-RCsc` annotation. An AMO, LR, or SC -instruction with both _aq_ and _rl_ set has both `acquire-RCsc` and -`release-RCsc` annotations. - -For convenience, we use the term `acquire annotation` to refer to an +annotations from the following set: _acquire-RCpc_, _acquire-RCsc_, +_release-RCpc_, and _release-RCsc_. An AMO or LR instruction with +_aq_ set has an _acquire-RCsc_ annotation. An AMO or SC instruction +with _rl_ set has a _release-RCsc_ annotation. An AMO, LR, or SC +instruction with both _aq_ and _rl_ set has both _acquire-RCsc_ and +_release-RCsc_ annotations. + +For convenience, we use the term _acquire annotation_ to refer to an acquire-RCpc annotation or an acquire-RCsc annotation. Likewise, a -`release annotation` refers to a release-RCpc annotation or a -release-RCsc annotation. An `RCpc annotation` refers to an -acquire-RCpc annotation or a release-RCpc annotation. An `RCsc -annotation` refers to an acquire-RCsc annotation or a release-RCsc +_release annotation_ refers to a release-RCpc annotation or a +release-RCsc annotation. An _RCpc annotation_ refers to an +acquire-RCpc annotation or a release-RCpc annotation. An _RCsc +annotation_ refers to an acquire-RCsc annotation or a release-RCsc annotation. 
-In the memory model literature, the term `RCpc` stands for release +[NOTE] +==== +In the memory model literature, the term *RCpc* stands for release consistency with processor-consistent synchronization operations, and -the term `RCsc` stands for release consistency with sequentially -consistent synchronization operations . +the term *RCsc* stands for release consistency with sequentially +consistent synchronization operations. While there are many different definitions for acquire and release annotations in the literature, in the context of RVWMO these terms are concisely and completely defined by Preserved Program Order rules <<rcsc>>. -`RCpc` annotations are currently only used when implicitly assigned to -every memory access per the standard extension `Ztso` +*RCpc* annotations are currently only used when implicitly assigned to +every memory access per the standard extension *Ztso* (<<ztso>>). Furthermore, although the ISA does not currently contain native load-acquire or store-release instructions, nor RCpc variants thereof, the RVWMO model itself is designed to be forwards-compatible with the potential addition of any or all of the above into the ISA in a future extension. +==== [[mem-dependencies]] ==== Syntactic Dependencies @@ -160,7 +172,7 @@ above into the ISA in a future extension. The definition of the RVWMO memory model depends in part on the notion of a syntactic dependency, defined as follows. -In the context of defining dependencies, a `register` refers either to +In the context of defining dependencies, a _register_ refers either to an entire general-purpose register, some portion of a CSR, or an entire CSR. The granularity at which dependencies are tracked through CSRs is specific to each CSR and is defined in @@ -173,79 +185,81 @@ destination registers. This section provides a general definition of all of these terms; however, <<source-dest-regs>> provides a complete listing of the specifics for each instruction. 
-In general, a register latexmath:[$r$] other than `x0` is a _source -register_ for an instruction latexmath:[$i$] if any of the following +In general, a register _r_ other than _x0_ is a _source +register_ for an instruction _i_ if any of the following hold: -* In the opcode of latexmath:[$i$], _rs1_, _rs2_, or _rs3_ is set to -latexmath:[$r$] -* latexmath:[$i$] is a CSR instruction, and in the opcode of -latexmath:[$i$], _csr_ is set to latexmath:[$r$], unless latexmath:[$i$] -is CSRRW or CSRRWI and _rd_ is set to `x0` -* latexmath:[$r$] is a CSR and an implicit source register for -latexmath:[$i$], as defined in <<source-dest-regs>> -* latexmath:[$r$] is a CSR that aliases with another source register for -latexmath:[$i$] +* In the opcode of _i_, _rs1_, _rs2_, or _rs3_ is set to +_r_ +* _i_ is a CSR instruction, and in the opcode of +_i_, _csr_ is set to _r_, unless _i_ +is CSRRW or CSRRWI and _rd_ is set to _x0_ +* _r_ is a CSR and an implicit source register for +_i_, as defined in <<source-dest-regs>> +* _r_ is a CSR that aliases with another source register for +_i_ Memory instructions also further specify which source registers are _address source registers_ and which are _data source registers_. 
-In general, a register latexmath:[$r$] other than `x0` is a _destination -register_ for an instruction latexmath:[$i$] if any of the following +In general, a register _r_ other than _x0_ is a _destination +register_ for an instruction _i_ if any of the following hold: -* In the opcode of latexmath:[$i$], _rd_ is set to latexmath:[$r$] -* latexmath:[$i$] is a CSR instruction, and in the opcode of -latexmath:[$i$], _csr_ is set to latexmath:[$r$], unless latexmath:[$i$] -is CSRRS or CSRRC and _rs1_ is set to `x0` or latexmath:[$i$] is CSRRSI +* In the opcode of _i_, _rd_ is set to _r_ +* _i_ is a CSR instruction, and in the opcode of +_i_, _csr_ is set to _r_, unless _i_ +is CSRRS or CSRRC and _rs1_ is set to _x0_ or _i_ is CSRRSI or CSRRCI and uimm[4:0] is set to zero. -* latexmath:[$r$] is a CSR and an implicit destination register for -latexmath:[$i$], as defined in <<source-dest-regs>> -* latexmath:[$r$] is a CSR that aliases with another destination -register for latexmath:[$i$] +* _r_ is a CSR and an implicit destination register for +_i_, as defined in <<source-dest-regs>> +* _r_ is a CSR that aliases with another destination +register for _i_ Most non-memory instructions _carry a dependency_ from each of their source registers to each of their destination registers. However, there are exceptions to this rule; see <<>>source-dest-regs>>. 
-Instruction latexmath:[$j$] has a _syntactic dependency_ on instruction -latexmath:[$i$] via destination register latexmath:[$s$] of -latexmath:[$i$] and source register latexmath:[$r$] of latexmath:[$j$] +Instruction _j_ has a _syntactic dependency_ on instruction +_i_ via destination register _s_ of +_i_ and source register _r_ of _j_ if either of the following hold: -* latexmath:[$s$] is the same as latexmath:[$r$], and no instruction -program-ordered between latexmath:[$i$] and latexmath:[$j$] has -latexmath:[$r$] as a destination register -* There is an instruction latexmath:[$m$] program-ordered between -latexmath:[$i$] and latexmath:[$j$] such that all of the following hold: -. latexmath:[$j$] has a syntactic dependency on latexmath:[$m$] via -destination register latexmath:[$q$] and source register latexmath:[$r$] -. latexmath:[$m$] has a syntactic dependency on latexmath:[$i$] via -destination register latexmath:[$s$] and source register latexmath:[$p$] -. latexmath:[$m$] carries a dependency from latexmath:[$p$] to -latexmath:[$q$] - -Finally, in the definitions that follow, let latexmath:[$a$] and -latexmath:[$b$] be two memory operations, and let latexmath:[$i$] and -latexmath:[$j$] be the instructions that generate latexmath:[$a$] and -latexmath:[$b$], respectively. 
- -latexmath:[$b$] has a _syntactic address dependency_ on latexmath:[$a$] -if latexmath:[$r$] is an address source register for latexmath:[$j$] and -latexmath:[$j$] has a syntactic dependency on latexmath:[$i$] via source -register latexmath:[$r$] - -latexmath:[$b$] has a _syntactic data dependency_ on latexmath:[$a$] if -latexmath:[$b$] is a store operation, latexmath:[$r$] is a data source -register for latexmath:[$j$], and latexmath:[$j$] has a syntactic -dependency on latexmath:[$i$] via source register latexmath:[$r$] - -latexmath:[$b$] has a _syntactic control dependency_ on latexmath:[$a$] -if there is an instruction latexmath:[$m$] program-ordered between -latexmath:[$i$] and latexmath:[$j$] such that latexmath:[$m$] is a -branch or indirect jump and latexmath:[$m$] has a syntactic dependency -on latexmath:[$i$]. - +* _s_ is the same as _r_, and no instruction +program-ordered between _i_ and _j_ has +_r_ as a destination register +* There is an instruction _m_ program-ordered between +_i_ and _j_ such that all of the following hold: +. _j_ has a syntactic dependency on _m_ via +destination register _q_ and source register _r_ +. _m_ has a syntactic dependency on _i_ via +destination register _s_ and source register _p_ +. _m_ carries a dependency from _p_ to +_q_ + +Finally, in the definitions that follow, let ^A^ and +_b_ be two memory operations, and let _i_ and +_j_ be the instructions that generate l^A^ and +_b_, respectively. 
+ +_b_ has a _syntactic address dependency_ on l^A^ +if _r_ is an address source register for _j_ and +_j_ has a syntactic dependency on _i_ via source +register _r_ + +_b_ has a _syntactic data dependency_ on l^A^ if +_b_ is a store operation, _r_ is a data source +register for _j_, and _j_ has a syntactic +dependency on _i_ via source register _r_ + +_b_ has a _syntactic control dependency_ on l^A^ +if there is an instruction _m_ program-ordered between +_i_ and _j_ such that _m_ is a +branch or indirect jump and _m_ has a syntactic dependency +on _i_. + +[NOTE] +==== Generally speaking, non-AMO load instructions do not have data source registers, and unconditional non-AMO store instructions do not have destination registers. However, a successful SC instruction is @@ -253,6 +267,7 @@ considered to have the register specified in _rd_ as a destination register, and hence it is possible for an instruction to have a syntactic dependency on a successful SC instruction that precedes it in program order. +==== ==== Preserved Program Order @@ -263,52 +278,52 @@ _preserved program order_. The complete definition of preserved program order is as follows (and note that AMOs are simultaneously both loads and stores): memory -operation latexmath:[$a$] precedes memory operation latexmath:[$b$] in +operation l^A^ precedes memory operation _b_ in preserved program order (and hence also in the global memory order) if -latexmath:[$a$] precedes latexmath:[$b$] in program order, -latexmath:[$a$] and latexmath:[$b$] both access regular main memory +l^A^ precedes _b_ in program order, +l^A^ and _b_ both access regular main memory (rather than I/O regions), and any of the following hold: [[overlapping-orering]] * Overlapping-Address Orderings: -. latexmath:[$b$] is a store, and -latexmath:[$a$] and latexmath:[$b$] access overlapping memory addresses -. 
latexmath:[$a$] and latexmath:[$b$] are loads, -latexmath:[$x$] is a byte read by both latexmath:[$a$] and -latexmath:[$b$], there is no store to latexmath:[$x$] between -latexmath:[$a$] and latexmath:[$b$] in program order, and -latexmath:[$a$] and latexmath:[$b$] return values for latexmath:[$x$] +. _b_ is a store, and +l^A^ and _b_ access overlapping memory addresses +. l^A^ and _b_ are loads, +_x_ is a byte read by both l^A^ and +_b_, there is no store to _x_ between +l^A^ and _b_ in program order, and +l^A^ and _b_ return values for _x_ written by different memory operations -. latexmath:[$a$] is -generated by an AMO or SC instruction, latexmath:[$b$] is a load, and -latexmath:[$b$] returns a value written by latexmath:[$a$] +. l^A^ is +generated by an AMO or SC instruction, _b_ is a load, and +_b_ returns a value written by l^A^ * Explicit Synchronization . There is a FENCE instruction that -orders latexmath:[$a$] before latexmath:[$b$] -. latexmath:[$a$] has an acquire +orders l^A^ before _b_ +. l^A^ has an acquire annotation -. latexmath:[$b$] has a release annotation -. latexmath:[$a$] and latexmath:[$b$] both have +. _b_ has a release annotation +. l^A^ and _b_ both have RCsc annotations -. {empty} latexmath:[$a$] is paired with -latexmath:[$b$] +. {empty} l^A^ is paired with +_b_ * Syntactic Dependencies -. latexmath:[$b$] has a syntactic address -dependency on latexmath:[$a$] -. latexmath:[$b$] has a syntactic data -dependency on latexmath:[$a$] -. latexmath:[$b$] is a store, and -latexmath:[$b$] has a syntactic control dependency on latexmath:[$a$] +. _b_ has a syntactic address +dependency on l^A^ +. _b_ has a syntactic data +dependency on l^A^ +. _b_ is a store, and +_b_ has a syntactic control dependency on l^A^ * Pipeline Dependencies -. 
latexmath:[$b$] is a -load, and there exists some store latexmath:[$m$] between -latexmath:[$a$] and latexmath:[$b$] in program order such that -latexmath:[$m$] has an address or data dependency on latexmath:[$a$], -and latexmath:[$b$] returns a value written by latexmath:[$m$] -. latexmath:[$b$] is a store, and -there exists some instruction latexmath:[$m$] between latexmath:[$a$] -and latexmath:[$b$] in program order such that latexmath:[$m$] has an -address dependency on latexmath:[$a$] +. _b_ is a +load, and there exists some store _m_ between +l^A^ and _b_ in program order such that +_m_ has an address or data dependency on l^A^, +and _b_ returns a value written by _m_ +. _b_ is a store, and +there exists some instruction _m_ between l^A^ +and _b_ in program order such that _m_ has an +address dependency on l^A^ ==== Memory Model Axioms @@ -320,25 +335,25 @@ axiom_, and the _progress axiom_. [[ax-load]] ===== Load Value Axiom -Each byte of each load latexmath:[$i$] returns the value written to that +Each byte of each load _i_ returns the value written to that byte by the store that is the latest in global memory order among the following stores: -. Stores that write that byte and that precede latexmath:[$i$] in the +. Stores that write that byte and that precede _i_ in the global memory order -. Stores that write that byte and that precede latexmath:[$i$] in +. 
Stores that write that byte and that precede _i_ in program order [[ax-atom]] ===== Atomicity Axiom -If latexmath:[$r$] and latexmath:[$w$] are paired load and store +If _r_ and _w_ are paired load and store operations generated by aligned LR and SC instructions in a hart -latexmath:[$h$], latexmath:[$s$] is a store to byte latexmath:[$x$], and -latexmath:[$r$] returns a value written by latexmath:[$s$], then -latexmath:[$s$] must precede latexmath:[$w$] in the global memory order, -and there can be no store from a hart other than latexmath:[$h$] to byte -latexmath:[$x$] following latexmath:[$s$] and preceding latexmath:[$w$] +_h_, _s_ is a store to byte _x_, and +_r_ returns a value written by _s_, then +_s_ must precede _w_ in the global memory order, +and there can be no store from a hart other than _h_ to byte +_x_ following _s_ and preceding _w_ in the global memory order. The theoretically supports LR/SC pairs of different widths and to @@ -359,9 +374,9 @@ infinite sequence of other memory operations. [cols="<,<,<",options="header",] |=== |Name |Portions Tracked as Independent Units |Aliases -|`fflags` |Bits 4, 3, 2, 1, 0 |`fcsr` -|`frm` |entire CSR |`fcsr` -|`fcsr` |Bits 7-5, 4, 3, 2, 1, 0 |`fflags`, `frm` +|_fflags_ |Bits 4, 3, 2, 1, 0 |_fcsr_ +|_frm_ |entire CSR |_fcsr_ +|_fcsr_ |Bits 7-5, 4, 3, 2, 1, 0 |_fflags_, _frm_ |=== Note: read-only CSRs are not listed, as they do not participate in the @@ -375,291 +390,296 @@ registers for each instruction. These listings are used in the definition of syntactic dependencies in <<mem-dependencies>>. -The term `accumulating CSR` is used to describe a CSR that is both a +The term _accumulating CSR_ is used to describe a CSR that is both a source and a destination register, but which carries a dependency only from itself to itself. 
Instructions carry a dependency from each source register in the -`Source Registers` column to each destination register in the -`Destination Registers` column, from each source register in the -`Source Registers` column to each CSR in the `Accumulating CSRs` -column, and from each CSR in the `Accumulating CSRs` column to itself, +_Source Registers_ column to each destination register in the +_Destination Registers_ column, from each source register in the +_Source Registers_ column to each CSR in the _Accumulating CSRs_ +column, and from each CSR in the _Accumulating CSRs_ column to itself, except where annotated otherwise. Key: -latexmath:[$^A$]Address source register +- ^A^: Address source register -latexmath:[$^D$]Data source register +- ^D^: Data source register -latexmath:[$^\dagger$]The instruction does not carry a dependency from +- latexmath:[$^\dagger$]: The instruction does not carry a dependency from any source register to any destination register -latexmath:[$^\ddagger$]The instruction carries dependencies from source +- latexmath:[$^\ddagger$]: The instruction carries dependencies from source register(s) to destination register(s) as specified -[cols="<,<,<,<",] +.RV32I Base Integer Instruction Set +[%header,cols="<,<,<,<"] |=== -|*RV32I Base Integer Instruction Set* | | | -| |Source |Destination |Accumulating -| |Registers |Registers |CSRs +||Source Registers |Destination Registers|Accumulating CSRs + |LUI | |_rd_ | + |AUIPC | |_rd_ | + |JAL | |_rd_ | -|JALRlatexmath:[$^\dagger$] |_rs1_ |_rd_ | + +|JALR latexmath:[$^\dagger$] |_rs1_ |_rd_ | + |BEQ |_rs1_, _rs2_ | | + |BNE |_rs1_, _rs2_ | | + |BLT |_rs1_, _rs2_ | | + |BGE |_rs1_, _rs2_ | | + |BLTU |_rs1_, _rs2_ | | + |BGEU |_rs1_, _rs2_ | | -|LBlatexmath:[$^\dagger$] |_rs1_latexmath:[$^A$] |_rd_ | -|LHlatexmath:[$^\dagger$] |_rs1_latexmath:[$^A$] |_rd_ | -|LWlatexmath:[$^\dagger$] |_rs1_latexmath:[$^A$] |_rd_ | -|LBUlatexmath:[$^\dagger$] |_rs1_latexmath:[$^A$] |_rd_ | -|LHUlatexmath:[$^\dagger$] 
|_rs1_latexmath:[$^A$] |_rd_ | -|SB |_rs1_latexmath:[$^A$], _rs2_latexmath:[$^D$] | | -|SH |_rs1_latexmath:[$^A$], _rs2_latexmath:[$^D$] | | -|SW |_rs1_latexmath:[$^A$], _rs2_latexmath:[$^D$] | | + +|LB latexmath:[$^\dagger$] | _rs1_ ^A^ | _rd_ | + +|LH latexmath:[$^\dagger$] | _rs1_ ^A^ | _rd_ | + +|LW latexmath:[$^\dagger$] | _rs1_ ^A^ | _rd_ | + +|LBU latexmath:[$^\dagger$] | _rs1_ ^A^ | _rd_ | + +|LHU latexmath:[$^\dagger$] | _rs1_ ^A^ | _rd_ | + +|SB |_rs1_ ^A^, _rs2_ ^D^ | | + +|SH |_rs1_ ^A^, _rs2_ ^D^ | | + +|SW |_rs1_ ^A^, _rs2_ ^D^ | | + |ADDI |_rs1_ |_rd_ | + |SLTI |_rs1_ |_rd_ | + |SLTIU |_rs1_ |_rd_ | + |XORI |_rs1_ |_rd_ | + |ORI |_rs1_ |_rd_ | + |ANDI |_rs1_ |_rd_ | + |SLLI |_rs1_ |_rd_ | + |SRLI |_rs1_ |_rd_ | + |SRAI |_rs1_ |_rd_ | + |ADD |_rs1_, _rs2_ |_rd_ | + |SUB |_rs1_, _rs2_ |_rd_ | + |SLL |_rs1_, _rs2_ |_rd_ | + |SLT |_rs1_, _rs2_ |_rd_ | + |SLTU |_rs1_, _rs2_ |_rd_ | + |XOR |_rs1_, _rs2_ |_rd_ | + |SRL |_rs1_, _rs2_ |_rd_ | + |SRA |_rs1_, _rs2_ |_rd_ | + |OR |_rs1_, _rs2_ |_rd_ | + |AND |_rs1_, _rs2_ |_rd_ | + |FENCE | | | + |FENCE.I | | | + |ECALL | | | + |EBREAK | | | -|=== -[cols="<,<,<,<,<",] -|=== -|RV32I Base Integer Instruction Set (continued) | | | | +|CSRRW latexmath:[$^\ddagger$] unless rd=x0 |_rs1_, _csr_^*^ | _rd_, _csr_ | ^*^ + +|CSRRS latexmath:[$^\ddagger$] |_rs1_, _csr_ unless _rs1_=_x0_ |_rd_ ^*^, _csr_ |^*^ -| |Source |Destination |Accumulating | +|CSRRC latexmath:[$^\ddagger$] |_rs1_, _csr_ unless _rs1_=_x0_ |_rd_ ^*^, _csr_ |^*^ -| |Registers |Registers |CSRs | +4+|latexmath:[$\ddagger$]carries a dependency from _rs1_ to _csr_ and from _csr_ to _rd_ -|CSRRWlatexmath:[$^\ddagger$] |_rs1_, _csr_latexmath:[$^*$] |_rd_, _csr_ -| |latexmath:[$^*$]unless _rd_=`x0` -|CSRRSlatexmath:[$^\ddagger$] |_rs1_, _csr_ |_rd_latexmath:[$^*$], _csr_ -| |latexmath:[$^*$]unless _rs1_=`x0` +|CSRRWI latexmath:[$^\ddagger$] |_csr_ ^*^ |_rd_, _csr_ |^*^unless _rd_=_x0_ -|CSRRClatexmath:[$^\ddagger$] |_rs1_, _csr_ |_rd_latexmath:[$^*$], _csr_ -| 
|latexmath:[$^*$]unless _rs1_=`x0`
+|CSRRSI latexmath:[$^\ddagger$] |_csr_ |_rd_, _csr_^*^ |^*^unless uimm[4:0]=0
-| |latexmath:[$\ddagger$]carries a dependency from _rs1_ to _csr_ and
-from _csr_ to _rd_ | | |
+|CSRRCI latexmath:[$^\ddagger$] |_csr_ |_rd_, _csr_^*^ |^*^unless uimm[4:0]=0
+
+4+|latexmath:[$\ddagger$]carries a dependency from _csr_ to _rd_
 |===
-[cols="<,<,<,<,<",]
+.RV64I Base Integer Instruction Set
+[%header, cols="<,<,<,<",]
 |===
-|RV32I Base Integer Instruction Set (continued) | | | |
+||Source Registers |Destination Registers |Accumulating CSRs
-| |Source |Destination |Accumulating |
+|_LWU_ latexmath:[$^\dagger$] |_rs1_ ^A^ |_rd_ |
-| |Registers |Registers |CSRs |
+|_LD_ latexmath:[$^\dagger$] |_rs1_ ^A^ |_rd_ |
-|CSRRWIlatexmath:[$^\ddagger$] |_csr_latexmath:[$^*$] |_rd_, _csr_ |
-|latexmath:[$^*$]unless _rd_=`x0`
+|SD |_rs1_ ^A^, _rs2_ ^D^ | |
-|CSRRSIlatexmath:[$^\ddagger$] |_csr_ |_rd_, _csr_latexmath:[$^*$] |
-|latexmath:[$^*$]unless uimm[4:0]=0
+|SLLI | _rs1_ | _rd_ |
-|CSRRCIlatexmath:[$^\ddagger$] |_csr_ |_rd_, _csr_latexmath:[$^*$] |
-|latexmath:[$^*$]unless uimm[4:0]=0
+|SRLI | _rs1_ | _rd_ |
-| |latexmath:[$\ddagger$]carries a dependency from _csr_ to _rd_ | | |
-|===
+|SRAI | _rs1_ | _rd_ |
-[cols="<,<,<,<",]
-|===
-|*RV64I Base Integer Instruction Set* | | |
-| |Source |Destination |Accumulating
-| |Registers |Registers |CSRs
-|LWUlatexmath:[$^\dagger$] |_rs1_latexmath:[$^A$] |_rd_ |
-|LDlatexmath:[$^\dagger$] |_rs1_latexmath:[$^A$] |_rd_ |
-|SD |_rs1_latexmath:[$^A$], _rs2_latexmath:[$^D$] | |
-|SLLI |_rs1_ |_rd_ |
-|SRLI |_rs1_ |_rd_ |
-|SRAI |_rs1_ |_rd_ |
-|ADDIW |_rs1_ |_rd_ |
-|SLLIW |_rs1_ |_rd_ |
-|SRLIW |_rs1_ |_rd_ |
-|SRAIW |_rs1_ |_rd_ |
-|ADDW |_rs1_, _rs2_ |_rd_ |
-|SUBW |_rs1_, _rs2_ |_rd_ |
-|SLLW |_rs1_, _rs2_ |_rd_ |
-|SRLW |_rs1_, _rs2_ |_rd_ |
-|SRAW |_rs1_, _rs2_ |_rd_ |
+|ADDIW | _rs1_ | _rd_ |
+
+|SLLIW | _rs1_ | _rd_ |
+
+|SRLIW | _rs1_ | _rd_ |
+
+|SRAIW | _rs1_ | _rd_ |
+
+|ADDW | _rs1_, _rs2_ |_rd_ |
+
+|SUBW | _rs1_, _rs2_ |_rd_ |
+
+|SLLW | _rs1_, _rs2_ |_rd_ |
+
+|SRLW | _rs1_, _rs2_ |_rd_ |
+
+|SRAW | _rs1_, _rs2_ |_rd_ |
 |===
-[cols="<,<,<,<",]
+.RV32M Standard Extension
+[%header,cols="<,<,<,<",]
 |===
-|*RV32M Standard Extension* | | |
-| |Source |Destination |Accumulating
-| |Registers |Registers |CSRs
-|MUL |_rs1_, _rs2_ |_rd_ |
-|MULH |_rs1_, _rs2_ |_rd_ |
+| |Source Regisers |Destination Registers |Accumulating CSRs
+
+|MUL | _rs1_, _rs2_ |_rd_ |
+
+|MULH | _rs1_, _rs2_ |_rd_ |
+
 |MULHSU |_rs1_, _rs2_ |_rd_ |
+
 |MULHU |_rs1_, _rs2_ |_rd_ |
+
 |DIV |_rs1_, _rs2_ |_rd_ |
+
 |DIVU |_rs1_, _rs2_ |_rd_ |
+
 |REM |_rs1_, _rs2_ |_rd_ |
+
 |REMU |_rs1_, _rs2_ |_rd_ |
 |===
-[cols="<,<,<,<",]
+.RV64M Standard Extension
+[%header, cols="<,<,<,<",]
 |===
-|*RV64M Standard Extension* | | |
-| |Source |Destination |Accumulating
-| |Registers |Registers |CSRs
+||Source Registers |Destination Registers |Accumulating CSRs
+
 |MULW |_rs1_, _rs2_ |_rd_ |
+
 |DIVW |_rs1_, _rs2_ |_rd_ |
+
 |DIVUW |_rs1_, _rs2_ |_rd_ |
+
 |REMW |_rs1_, _rs2_ |_rd_ |
+
 |REMUW |_rs1_, _rs2_ |_rd_ |
 |===
-[cols="<,<,<,<,<",]
+.RV32A Standard Extension
+[%header,cols="<,<,<,<,<",]
 |===
-|*RV32A Standard Extension* | | | |
+||Source Registers |Destination Registers |Accumulating CSRs|
-| |Source |Destination |Accumulating |
+|LR.W latexmath:[$^\dagger$] | _rs1_ ^A^ | _rd_ | |
-| |Registers |Registers |CSRs |
+|SC.W latexmath:[$^\dagger$] | _rs1_ ^A^, _rs2_ ^D^ | _rd_ ^*^ | | ^*^ if successful
-|LR.Wlatexmath:[$^\dagger$] |_rs1_latexmath:[$^A$] |_rd_ | |
+|AMOSWAP.W latexmath:[$^\dagger$] |_rs1_ ^A^, _rs2_ ^D^ |_rd_ | |
-|SC.Wlatexmath:[$^\dagger$] |_rs1_latexmath:[$^A$],
-_rs2_latexmath:[$^D$] |_rd_latexmath:[$^*$] | |latexmath:[$^*$]if
-successful
+|AMOADD.W latexmath:[$^\dagger$] |_rs1_ ^A^, _rs2_ ^D^ |_rd_ | |
-|AMOSWAP.Wlatexmath:[$^\dagger$] |_rs1_latexmath:[$^A$],
-_rs2_latexmath:[$^D$] |_rd_ | |
+|AMOXOR.W latexmath:[$^\dagger$] |_rs1_ ^A^, _rs2_ ^D^ |_rd_ | |
-|AMOADD.Wlatexmath:[$^\dagger$] |_rs1_latexmath:[$^A$],
-_rs2_latexmath:[$^D$] |_rd_ | |
+|AMOAND.W latexmath:[$^\dagger$] |_rs1_ ^A^, _rs2_ ^D^ |_rd_ | |
-|AMOXOR.Wlatexmath:[$^\dagger$] |_rs1_latexmath:[$^A$],
-_rs2_latexmath:[$^D$] |_rd_ | |
+|AMOOR.W latexmath:[$^\dagger$] |_rs1_ ^A^, _rs2_^D^ |_rd_ | |
-|AMOAND.Wlatexmath:[$^\dagger$] |_rs1_latexmath:[$^A$],
-_rs2_latexmath:[$^D$] |_rd_ | |
+|AMOMIN.W latexmath:[$^\dagger$] |_rs1_ ^A^, _rs2_ ^D^ |_rd_ | |
-|AMOOR.Wlatexmath:[$^\dagger$] |_rs1_latexmath:[$^A$],
-_rs2_latexmath:[$^D$] |_rd_ | |
+|AMOMAX.W latexmath:[$^\dagger$] |_rs1_ ^A^, _rs2_ ^D^ |_rd_ | |
-|AMOMIN.Wlatexmath:[$^\dagger$] |_rs1_latexmath:[$^A$],
-_rs2_latexmath:[$^D$] |_rd_ | |
+|AMOMINU.W latexmath:[$^\dagger$] |_rs1_ ^A^, _rs2_ ^D^ |_rd_ | |
-|AMOMAX.Wlatexmath:[$^\dagger$] |_rs1_latexmath:[$^A$],
-_rs2_latexmath:[$^D$] |_rd_ | |
-
-|AMOMINU.Wlatexmath:[$^\dagger$] |_rs1_latexmath:[$^A$],
-_rs2_latexmath:[$^D$] |_rd_ | |
-
-|AMOMAXU.Wlatexmath:[$^\dagger$] |_rs1_latexmath:[$^A$],
-_rs2_latexmath:[$^D$] |_rd_ | |
+|AMOMAXU.W latexmath:[$^\dagger$] |_rs1_ ^A^, _rs2_ ^D^ |_rd_ | |
 |===
-[cols="<,<,<,<,<",]
+.RV64A Standard Extension
+[%header,cols="<,<,<,<,<",]
 |===
-|*RV64A Standard Extension* | | | |
-| |Source |Destination |Accumulating |
+| |Source Registers |Destination Registers |Accumulating CSRs|
-| |Registers |Registers |CSRs |
+|LR.D latexmath:[$^\dagger$] |_rs1_ ^A^ |_rd_ | |
-|LR.Dlatexmath:[$^\dagger$] |_rs1_latexmath:[$^A$] |_rd_ | |
-
-|SC.Dlatexmath:[$^\dagger$] |_rs1_latexmath:[$^A$],
-_rs2_latexmath:[$^D$] |_rd_latexmath:[$^*$] | |latexmath:[$^*$]if
+|SC.D latexmath:[$^\dagger$] |_rs1_ ^A^, _rs2_ ^D^ |_rd_ ^*^ | |^*^if
 successful
-|AMOSWAP.Dlatexmath:[$^\dagger$] |_rs1_latexmath:[$^A$],
-_rs2_latexmath:[$^D$] |_rd_ | |
+|AMOSWAP.D latexmath:[$^\dagger$] |_rs1_ ^A^, _rs2_ ^D^ |_rd_ | |
-|AMOADD.Dlatexmath:[$^\dagger$] |_rs1_latexmath:[$^A$],
-_rs2_latexmath:[$^D$] |_rd_ | |
+|AMOADD.D latexmath:[$^\dagger$] |_rs1_ ^A^, _rs2_ ^D^ |_rd_ | |
-|AMOXOR.Dlatexmath:[$^\dagger$] |_rs1_latexmath:[$^A$],
-_rs2_latexmath:[$^D$] |_rd_ | |
+|AMOXOR.D latexmath:[$^\dagger$] |_rs1_ ^A^, _rs2_ ^D^ |_rd_ | |
-|AMOAND.Dlatexmath:[$^\dagger$] |_rs1_latexmath:[$^A$],
-_rs2_latexmath:[$^D$] |_rd_ | |
+|AMOAND.D latexmath:[$^\dagger$] |_rs1_ ^A^, _rs2_^D^ |_rd_ | |
-|AMOOR.Dlatexmath:[$^\dagger$] |_rs1_latexmath:[$^A$],
-_rs2_latexmath:[$^D$] |_rd_ | |
+|AMOOR.D latexmath:[$^\dagger$] |_rs1_ ^A^, _rs2_^D^ |_rd_ | |
-|AMOMIN.Dlatexmath:[$^\dagger$] |_rs1_latexmath:[$^A$],
-_rs2_latexmath:[$^D$] |_rd_ | |
+|AMOMIN.D latexmath:[$^\dagger$] |_rs1_ ^A^, _rs2_^D^ |_rd_ | |
-|AMOMAX.Dlatexmath:[$^\dagger$] |_rs1_latexmath:[$^A$],
-_rs2_latexmath:[$^D$] |_rd_ | |
+|AMOMAX.D latexmath:[$^\dagger$] |_rs1_ ^A^, _rs2_^D^ |_rd_ | |
-|AMOMINU.Dlatexmath:[$^\dagger$] |_rs1_latexmath:[$^A$],
-_rs2_latexmath:[$^D$] |_rd_ | |
+|AMOMINU.D latexmath:[$^\dagger$] |_rs1_ ^A^, _rs2_^D^ |_rd_ | |
-|AMOMAXU.Dlatexmath:[$^\dagger$] |_rs1_latexmath:[$^A$],
-_rs2_latexmath:[$^D$] |_rd_ | |
+|AMOMAXU.D latexmath:[$^\dagger$] |_rs1_ ^A^, _rs2_^D^ |_rd_ | |
 |===
+.RV32F Standard Extension
 [cols="<,<,<,<,<",]
 |===
-|*RV32F Standard Extension* | | | |
-| |Source |Destination |Accumulating |
+| |Source Registers |Destination Registers |Accumulating CSRs |
-| |Registers |Registers |CSRs |
-|FLWlatexmath:[$^\dagger$] |_rs1_latexmath:[$^A$] |_rd_ | |
+|FLWlatexmath:[$^\dagger$] |_rs1_ ^A^ |_rd_ | |
-|FSW |_rs1_latexmath:[$^A$], _rs2_latexmath:[$^D$] | | |
+|FSW |_rs1_ ^A^, _rs2_^D^ | | |
-|FMADD.S |_rs1_, _rs2_, _rs3_, frmlatexmath:[$^*$] |_rd_ |NV, OF, UF, NX
-|latexmath:[$^*$]if rm=111
+|FMADD.S |_rs1_, _rs2_, _rs3_, frm^*^ |_rd_ |NV, OF, UF, NX |^*^if rm=111
-|FMSUB.S |_rs1_, _rs2_, _rs3_, frmlatexmath:[$^*$] |_rd_ |NV, OF, UF, NX
-|latexmath:[$^*$]if rm=111
+|FMSUB.S |_rs1_, _rs2_, _rs3_, frm^*^ |_rd_ |NV, OF, UF, NX |^*^if rm=111
-|FNMSUB.S |_rs1_, _rs2_, _rs3_, frmlatexmath:[$^*$] |_rd_ |NV, OF, UF,
-NX |latexmath:[$^*$]if rm=111
+|FNMSUB.S |_rs1_, _rs2_, _rs3_, frm^*^ |_rd_ |NV, OF, UF, NX |^*^if rm=111
-|FNMADD.S |_rs1_, _rs2_, _rs3_, frmlatexmath:[$^*$] |_rd_ |NV, OF, UF,
-NX |latexmath:[$^*$]if rm=111
+|FNMADD.S |_rs1_, _rs2_, _rs3_, frm^*^ |_rd_ |NV, OF, UF, NX |^*^if rm=111
-|FADD.S |_rs1_, _rs2_, frmlatexmath:[$^*$] |_rd_ |NV, OF, NX
-|latexmath:[$^*$]if rm=111
+|FADD.S |_rs1_, _rs2_, frm^*^ |_rd_ |NV, OF, NX |^*^if rm=111
-|FSUB.S |_rs1_, _rs2_, frmlatexmath:[$^*$] |_rd_ |NV, OF, NX
-|latexmath:[$^*$]if rm=111
+|FSUB.S |_rs1_, _rs2_, frm^*^ |_rd_ |NV, OF, NX |^*^if rm=111
-|FMUL.S |_rs1_, _rs2_, frmlatexmath:[$^*$] |_rd_ |NV, OF, UF, NX
-|latexmath:[$^*$]if rm=111
+|FMUL.S |_rs1_, _rs2_, frm^*^ |_rd_ |NV, OF, UF, NX |^*^if rm=111
-|FDIV.S |_rs1_, _rs2_, frmlatexmath:[$^*$] |_rd_ |NV, DZ, OF, UF, NX
-|latexmath:[$^*$]if rm=111
+|FDIV.S |_rs1_, _rs2_, frm^*^ |_rd_ |NV, DZ, OF, UF, NX |^*^if rm=111
-|FSQRT.S |_rs1_, frmlatexmath:[$^*$] |_rd_ |NV, NX |latexmath:[$^*$]if
-rm=111
+|FSQRT.S |_rs1_, frm^*^ |_rd_ |NV, NX |^*^if rm=111
 |FSGNJ.S |_rs1_, _rs2_ |_rd_ | |
@@ -671,11 +691,9 @@ rm=111
 |FMAX.S |_rs1_, _rs2_ |_rd_ |NV |
-|FCVT.W.S |_rs1_, frmlatexmath:[$^*$] |_rd_ |NV, NX |latexmath:[$^*$]if
-rm=111
+|FCVT.W.S |_rs1_, frm^*^ |_rd_ |NV, NX |^*^if rm=111
-|FCVT.WU.S |_rs1_, frmlatexmath:[$^*$] |_rd_ |NV, NX |latexmath:[$^*$]if
-rm=111
+|FCVT.WU.S |_rs1_, frm^*^ |_rd_ |NV, NX |^*^if rm=111
 |FMV.X.W |_rs1_ |_rd_ | |
@@ -687,76 +705,57 @@ rm=111
 |FCLASS.S |_rs1_ |_rd_ | |
-|FCVT.S.W |_rs1_, frmlatexmath:[$^*$] |_rd_ |NX |latexmath:[$^*$]if
-rm=111
+|FCVT.S.W |_rs1_, frm^*^ |_rd_ |NX |^*^if rm=111
-|FCVT.S.WU |_rs1_, frmlatexmath:[$^*$] |_rd_ |NX |latexmath:[$^*$]if
-rm=111
+|FCVT.S.WU |_rs1_, frm^*^ |_rd_ |NX |^*^if rm=111
 |FMV.W.X |_rs1_ |_rd_ | |
 |===
-[cols="<,<,<,<,<",]
+.RV64F Standard Extension
+[%heaser,cols="<,<,<,<,<",]
 |===
-|*RV64F Standard Extension* | | | |
+| |Source Regsiters |Destination Registers |Accumulating CSRs|
-| |Source |Destination |Accumulating |
+|FCVT.L.S |_rs1_, frm^*^ |_rd_ |NV, NX |^*^if rm=111
-| |Registers |Registers |CSRs |
+|FCVT.LU.S |_rs1_, frm^*^ |_rd_ |NV, NX |^*^if rm=111
-|FCVT.L.S |_rs1_, frmlatexmath:[$^*$] |_rd_ |NV, NX |latexmath:[$^*$]if
-rm=111
+|FCVT.S.L |_rs1_, frm^*^ |_rd_ |NX |^*^if rm=111
-|FCVT.LU.S |_rs1_, frmlatexmath:[$^*$] |_rd_ |NV, NX |latexmath:[$^*$]if
-rm=111
-
-|FCVT.S.L |_rs1_, frmlatexmath:[$^*$] |_rd_ |NX |latexmath:[$^*$]if
-rm=111
-
-|FCVT.S.LU |_rs1_, frmlatexmath:[$^*$] |_rd_ |NX |latexmath:[$^*$]if
-rm=111
+|FCVT.S.LU |_rs1_, frm^*^ |_rd_ |NX |^*^if rm=111
 |===
-[cols="<,<,<,<,<",]
+.RV32D Standard Extension
+[%header,cols="<,<,<,<,<",]
 |===
-|*RV32D Standard Extension* | | | |
-| |Source |Destination |Accumulating |
+| |Source Regsters|Destination Regsiters |Accumulating CSRs |
-| |Registers |Registers |CSRs |
-|FLDlatexmath:[$^\dagger$] |_rs1_latexmath:[$^A$] |_rd_ | |
+|FLD latexmath:[$^\dagger$] |_rs1_ ^A^ |_rd_ | |
-|FSD |_rs1_latexmath:[$^A$], _rs2_latexmath:[$^D$] | | |
+|FSD |_rs1_ ^A^, _rs2_^D^ | | |
-|FMADD.D |_rs1_, _rs2_, _rs3_, frmlatexmath:[$^*$] |_rd_ |NV, OF, UF, NX
-|latexmath:[$^*$]if rm=111
+|FMADD.D |_rs1_, _rs2_, _rs3_, frm^*^ |_rd_ |NV, OF, UF, NX |^*^if rm=111
-|FMSUB.D |_rs1_, _rs2_, _rs3_, frmlatexmath:[$^*$] |_rd_ |NV, OF, UF, NX
-|latexmath:[$^*$]if rm=111
+|FMSUB.D |_rs1_, _rs2_, _rs3_, frm^*^ |_rd_ |NV, OF, UF, NX |^*^if rm=111
-|FNMSUB.D |_rs1_, _rs2_, _rs3_, frmlatexmath:[$^*$] |_rd_ |NV, OF, UF,
-NX |latexmath:[$^*$]if rm=111
+|FNMSUB.D |_rs1_, _rs2_, _rs3_, frm^*^ |_rd_ |NV, OF, UF, NX |^*^if rm=111
-|FNMADD.D |_rs1_, _rs2_, _rs3_, frmlatexmath:[$^*$] |_rd_ |NV, OF, UF,
-NX |latexmath:[$^*$]if rm=111
+|FNMADD.D |_rs1_, _rs2_, _rs3_, frm^*^ |_rd_ |NV, OF, UF, NX |^*^if rm=111
-|FADD.D |_rs1_, _rs2_, frmlatexmath:[$^*$] |_rd_ |NV, OF, NX
-|latexmath:[$^*$]if rm=111
+|FADD.D |_rs1_, _rs2_, frm^*^ |_rd_ |NV, OF, NX |^*^if rm=111
-|FSUB.D |_rs1_, _rs2_, frmlatexmath:[$^*$] |_rd_ |NV, OF, NX
-|latexmath:[$^*$]if rm=111
+|FSUB.D |_rs1_, _rs2_, frm^*^ |_rd_ |NV, OF, NX |^*^if rm=111
-|FMUL.D |_rs1_, _rs2_, frmlatexmath:[$^*$] |_rd_ |NV, OF, UF, NX
-|latexmath:[$^*$]if rm=111
+|FMUL.D |_rs1_, _rs2_, frm^*^ |_rd_ |NV, OF, UF, NX |^*^if rm=111
-|FDIV.D |_rs1_, _rs2_, frmlatexmath:[$^*$] |_rd_ |NV, DZ, OF, UF, NX
-|latexmath:[$^*$]if rm=111
+|FDIV.D |_rs1_, _rs2_, frm^*^ |_rd_ |NV, DZ, OF, UF, NX |^*^if rm=111
-|FSQRT.D |_rs1_, frmlatexmath:[$^*$] |_rd_ |NV, NX |latexmath:[$^*$]if
-rm=111
+|FSQRT.D |_rs1_, frm^*^ |_rd_ |NV, NX |^*^if rm=111
 |FSGNJ.D |_rs1_, _rs2_ |_rd_ | |
@@ -768,8 +767,7 @@ rm=111
 |FMAX.D |_rs1_, _rs2_ |_rd_ |NV |
-|FCVT.S.D |_rs1_, frmlatexmath:[$^*$] |_rd_ |NV, OF, UF, NX
-|latexmath:[$^*$]if rm=111
+|FCVT.S.D |_rs1_, frm^*^ |_rd_ |NV, OF, UF, NX |^*^if rm=111
 |FCVT.D.S |_rs1_ |_rd_ |NV |
@@ -781,11 +779,9 @@ rm=111
 |FCLASS.D |_rs1_ |_rd_ | |
-|FCVT.W.D |_rs1_, frmlatexmath:[$^*$] |_rd_ |NV, NX |latexmath:[$^*$]if
-rm=111
+|FCVT.W.D |_rs1_,^*^ |_rd_ |NV, NX |^*^if rm=111
-|FCVT.WU.D |_rs1_, frmlatexmath:[$^*$] |_rd_ |NV, NX |latexmath:[$^*$]if
-rm=111
+|FCVT.WU.D |_rs1_, frm^*^ |_rd_ |NV, NX |^*^if rm=111
 |FCVT.D.W |_rs1_ |_rd_ | |
@@ -793,27 +789,21 @@ rm=111
 |===
-[cols="<,<,<,<,<",]
+.RV64D Standard Extension
+[%header,cols="<,<,<,<,<",]
 |===
-|*RV64D Standard Extension* | | | |
-
-| |Source |Destination |Accumulating |
-| |Registers |Registers |CSRs |
+| |Source Regsiters |Destination Registers |Accumulating CSRs |
-|FCVT.L.D |_rs1_, frmlatexmath:[$^*$] |_rd_ |NV, NX |latexmath:[$^*$]if
-rm=111
+|FCVT.L.D |_rs1_, frm^*^ |_rd_ |NV, NX |^*^if rm=111
-|FCVT.LU.D |_rs1_, frmlatexmath:[$^*$] |_rd_ |NV, NX |latexmath:[$^*$]if
-rm=111
+|FCVT.LU.D |_rs1_, frm^*^ |_rd_ |NV, NX |^*^if rm=111
 |FMV.X.D |_rs1_ |_rd_ | |
-|FCVT.D.L |_rs1_, frmlatexmath:[$^*$] |_rd_ |NX |latexmath:[$^*$]if
-rm=111
+|FCVT.D.L |_rs1_, frm^*^ |_rd_ |NX |^*^if rm=111
-|FCVT.D.LU |_rs1_, frmlatexmath:[$^*$] |_rd_ |NX |latexmath:[$^*$]if
-rm=111
+|FCVT.D.LU |_rs1_, frm^*^ |_rd_ |NX |^*^if rm=111
 |FMV.D.X |_rs1_ |_rd_ | |