Diffstat (limited to 'src')
364 files changed, 7584 insertions, 7328 deletions
diff --git a/src/a-st-ext.adoc b/src/a-st-ext.adoc index ff6e3f3..a5e0b9f 100644 --- a/src/a-st-ext.adoc +++ b/src/a-st-ext.adoc @@ -54,7 +54,7 @@ same address domain. [[sec:lrsc]] === "Zalrsc" Extension for Load-Reserved/Store-Conditional Instructions -include::images/wavedrom/load-reserve-st-conditional.adoc[] +include::images/wavedrom/load-reserve-st-conditional.edn[] Complex atomic memory operations on a single memory word or doubleword are performed with the load-reserved (LR) and store-conditional (SC) @@ -66,8 +66,14 @@ if the reservation is still valid and the reservation set contains the bytes being written. If the SC.W succeeds, the instruction writes the word in _rs2_ to memory, and it writes zero to _rd_. If the SC.W fails, the instruction does not write to memory, and it writes a nonzero value -to _rd_. For the purposes of memory protection, a failed SC.W may be -treated like a store. Regardless of success or failure, executing an +to _rd_. +No SC.W instruction shall retire unless it passes memory permission checks, +but it is UNSPECIFIED whether any side effects of implicit address translation +and protection memory accesses (such as setting a page-table entry D bit) +occur on a failed SC.W. +For the purposes of memory protection, a failed SC.W may be +treated like a store. +Regardless of success or failure, executing an SC.W instruction invalidates any reservation held by this hart. LR.D and SC.D act analogously on doublewords and are only available on RV64. For RV64, LR.W and SC.W sign-extend the value placed in _rd_. @@ -113,7 +119,7 @@ assume the failure code will be non-zero. [NOTE] ==== We reserve a failure code of 1 to mean ''unspecified'' so that simple -implementations may return this value using the existing mux required +implementations may return this value using the existing multiplexer required for the SLT/SLTU instructions. More specific failure codes might be defined in future versions or extensions to the ISA. 
==== @@ -227,8 +233,8 @@ instruction unless the _rl_ bit is also set. LR._rl_ and SC._aq_ instructions are not guaranteed to provide any stronger ordering than those with both bits clear, but may result in lower performance. -<<< - +[NOTE] +==== [[cas]] [source,asm] .Sample code for compare-and-swap function using LR/SC. @@ -250,6 +256,7 @@ those with both bits clear, but may result in lower performance. LR/SC can be used to construct lock-free data structures. An example using LR/SC to implement a compare-and-swap function is shown in <<cas>>. If inlined, compare-and-swap functionality need only take four instructions. +==== [[sec:lrscseq]] === Eventual Success of Store-Conditional Instructions @@ -264,9 +271,9 @@ instructions placed sequentially in memory. instruction. The dynamic code executed between the LR and SC instructions can only contain instructions from the base ''I'' instruction set, excluding loads, stores, backward jumps, taken backward -branches, JALR, FENCE, and SYSTEM instructions. If the ''C'' extension -is supported, then compressed forms of the aforementioned ''I'' -instructions are also permitted. +branches, JALR, FENCE, and SYSTEM instructions. +Compressed forms of the aforementioned ''I'' instructions in the Zca and Zcb +extensions are also permitted. * The code to retry a failing LR/SC sequence can contain backwards jumps and/or branches to repeat the LR/SC sequence, but otherwise has the same constraint as the code between the LR and SC. @@ -355,7 +362,7 @@ substantially easier to provide in some microarchitectural styles. [[sec:amo]] === "Zaamo" Extension for Atomic Memory Operations -include::images/wavedrom/atomic-mem.adoc[] +include::images/wavedrom/atomic-mem.edn[] The atomic memory operation (AMO) instructions perform read-modify-write operations for multiprocessor synchronization and are encoded with an @@ -434,6 +441,8 @@ both imply additional unnecessary ordering as compared to AMOs with the corresponding _aq_ or _rl_ bit set. 
==== +[NOTE] +==== An example code sequence for a critical section guarded by a test-and-test-and-set spinlock is shown in Example <<critical>>. Note the first AMO is marked _aq_ to @@ -441,8 +450,6 @@ order the lock acquisition before the critical section, and the second AMO is marked _rl_ to order the critical section before the lock relinquishment. -<<< - [[critical]] [source,asm] .Sample code for mutual exclusion. `a0` contains the address of the lock. @@ -457,9 +464,7 @@ relinquishment. # ... amoswap.w.rl x0, x0, (a0) # Release lock by storing 0. -[NOTE] -==== -We recommend the use of the AMO Swap idiom shown above for both lock +We recommend the use of the AMO Swap idiom shown in <<critical>> for both lock acquire and release to simplify the implementation of speculative lock elision. cite:[Rajwar:2001:SLE] ==== diff --git a/src/b-st-ext.adoc b/src/b-st-ext.adoc index 0dfb273..99f332f 100644 --- a/src/b-st-ext.adoc +++ b/src/b-st-ext.adoc @@ -16,8 +16,6 @@ The instructions have mnemonics and encodings that are independent of the extens Thus, when implementing extensions with overlapping instructions, there is no redundancy in logic or encoding. The bitmanip extensions are defined for RV32 and RV64. -Most of the instructions are expected to be forward compatible with RV128. -While the shift-immediate instructions are defined to have at most a 6-bit immediate field, a 7th bit is available in the encoding space should this be needed for RV128. === Word Instructions @@ -191,7 +189,7 @@ along with their specific mapping: |✓ |✓ -|orc.b _rd_, _rs1_, _rs2_ +|orc.b _rd_, _rs_ |<<#insns-orc_b>> | |✓ @@ -452,7 +450,7 @@ The shift and add instructions do a left shift of 1, 2, or 3 because these are c While the shift and add instructions are limited to a maximum left shift of 3, the slli instruction (from the base ISA) can be used to perform similar shifts for indexing into arrays of wider elements. 
The slli.uw -- added in this extension -- can be used when the index is to be interpreted as an unsigned word. -The following instructions (and pseudoinstructions) comprise the Zba extension: +The following instructions comprise the Zba extension: [%header,cols="^1,^1,4,8"] |=== @@ -501,11 +499,6 @@ The following instructions (and pseudoinstructions) comprise the Zba extension: |slli.uw _rd_, _rs1_, _imm_ |<<#insns-slli_uw>> -| -|✓ -|zext.w _rd_, _rs_ -|<<#insns-add_uw>> - |=== [#zbb,reftext="Basic bit-manipulation"] @@ -633,7 +626,7 @@ instructions that return the smaller/larger of two operands. These instructions perform the sign extension or zero extension of the least significant 8 bits or 16 bits of the source register. -These instructions replace the generalized idioms `slli rD,rS,(XLEN-<size>) + srli` (for zero extension) or `slli + srai` (for sign extension) for the sign extension of 8-bit and 16-bit quantities, and for the zero extension of 16-bit quantities. +These instructions replace the generalized idioms `slli rd,rs,(XLEN-<size>) + srai` (for sign extension of 8-bit and 16-bit quantities) and `slli + srli` (for zero extension of 16-bit quantities). [%header,cols="^1,^1,4,8"] |=== @@ -660,7 +653,7 @@ These instructions replace the generalized idioms `slli rD,rS,(XLEN-<size>) + sr ===== Bitwise rotation -Bitwise rotation instructions are similar to the shift-logical operations from the base spec. However, where the shift-logical +Bitwise rotation instructions are similar to the shift-logical operations from the base spec. However, where the shift-logical instructions shift in zeros, the rotate instructions shift in the bits that were shifted out of the other side of the value. Such operations are also referred to as ‘circular shifts’. @@ -836,7 +829,7 @@ a single bit in a register. The bit is specified by its index. 
|=== -[#zbkb,reftext="Bit-manipulation for Cryptography"] +[[zbkb,Bit-manipulation for Cryptography]] ==== Zbkb: Bit-manipulation for Cryptography This extension contains instructions essential for implementing @@ -850,89 +843,89 @@ common operations in cryptographic workloads. |Instruction -| ✓ -| ✓ -| rol +| ✓ +| ✓ +| rol | <<insns-rol>> -| -| ✓ -| rolw +| +| ✓ +| rolw | <<insns-rolw>> -| ✓ -| ✓ -| ror +| ✓ +| ✓ +| ror | <<insns-ror>> -| ✓ -| ✓ -| rori +| ✓ +| ✓ +| rori | <<insns-rori>> -| -| ✓ -| roriw +| +| ✓ +| roriw | <<insns-roriw>> -| -| ✓ -| rorw +| +| ✓ +| rorw | <<insns-rorw>> -| ✓ -| ✓ -| andn +| ✓ +| ✓ +| andn | <<insns-andn>> -| ✓ -| ✓ -| orn +| ✓ +| ✓ +| orn | <<insns-orn>> -| ✓ -| ✓ -| xnor +| ✓ +| ✓ +| xnor | <<insns-xnor>> -| ✓ -| ✓ -| pack +| ✓ +| ✓ +| pack | <<insns-pack>> -| ✓ -| ✓ -| packh +| ✓ +| ✓ +| packh | <<insns-packh>> -| -| ✓ -| packw +| +| ✓ +| packw | <<insns-packw>> -| ✓ -| ✓ -| rev.b -| <<insns-revb>> +| ✓ +| ✓ +| brev8 +| <<insns-brev8>> -| ✓ -| ✓ -| rev8 +| ✓ +| ✓ +| rev8 | <<insns-rev8>> -| ✓ -| -| zip +| ✓ +| +| zip | <<insns-zip>> -| ✓ -| -| unzip +| ✓ +| +| unzip | <<insns-unzip>> |=== -[#zbkc,reftext="Carry-less multiplication for Cryptography"] +[[zbkc,Carry-less multiplication for Cryptography]] ==== Zbkc: Carry-less multiplication for Cryptography Carry-less multiplication is the multiplication in the polynomial ring over @@ -960,7 +953,7 @@ efficiently implement the GHASH operation, which is part of this workload. |=== -[#zbkx,reftext="Crossbar permutations"] +[[zbkx,Crossbar permutations]] ==== Zbkx: Crossbar permutations These instructions implement a "lookup table" for 4 and 8 bit elements @@ -984,13 +977,13 @@ latency does not depend on the (secret) data being operated on. 
|✓ |✓ -|xperm.n _rd_, _rs1_, _rs2_ -|<<#insns-xpermn>> +|xperm4 _rd_, _rs1_, _rs2_ +|<<#insns-xperm4>> |✓ |✓ -|xperm.b _rd_, _rs1_, _rs2_ -|<<#insns-xpermb>> +|xperm8 _rd_, _rs1_, _rs2_ +|<<#insns-xperm8>> |=== @@ -1072,7 +1065,7 @@ Encoding:: ]} .... -Description:: +Description:: This instruction performs the bitwise logical AND operation between _rs1_ and the bitwise inversion of _rs2_. Operation:: @@ -1287,7 +1280,7 @@ Encoding (RV64):: .... Description:: -This instruction returns a single bit extracted from _rs1_ at the index specified in _rs2_. +This instruction returns a single bit extracted from _rs1_ at the index specified in _shamt_. The index is read from the lower log2(XLEN) bits of _shamt_. For RV32, the encodings corresponding to shamt[5]=1 are reserved. @@ -1726,7 +1719,7 @@ Encoding:: ]} .... -Description:: +Description:: This instruction counts the number of 0's before the first 1, starting at the most-significant bit (i.e., XLEN-1) and progressing to bit 0. Accordingly, if the input is 0, the output is XLEN, and if the most-significant bit of the input is a 1, the output is 0. Operation:: @@ -1832,7 +1825,7 @@ Encoding:: { bits: 7, name: 0x30, attr: ['CPOP'] }, ]} .... -Description:: +Description:: This instructions counts the number of 1's (i.e., set bits) in the source register. Operation:: @@ -1890,7 +1883,7 @@ Encoding:: { bits: 7, name: 0x30, attr: ['CPOPW'] }, ]} .... -Description:: +Description:: This instructions counts the number of 1's (i.e., set bits) in the least-significant word of the source register. Operation:: @@ -1940,7 +1933,7 @@ Encoding:: ]} .... -Description:: +Description:: This instruction counts the number of 0's before the first 1, starting at the least-significant bit (i.e., 0) and progressing to the most-significant bit (i.e., XLEN-1). Accordingly, if the input is 0, the output is XLEN, and if the least-significant bit of the input is a 1, the output is 0. 
@@ -2029,7 +2022,7 @@ Included in:: ==== max Synopsis:: -Maximum +Maximum Mnemonic:: max _rd_, _rs1_, _rs2_ @@ -2260,7 +2253,7 @@ Encoding:: ]} .... -Description:: +Description:: Combines the bits within each byte using bitwise logical OR. This sets the bits of each byte in the result _rd_ to all zeros if no bit within the respective byte of _rs_ is set, or to all ones if any bit within the respective byte of _rs_ is set. @@ -2314,7 +2307,7 @@ Encoding:: ]} .... -Description:: +Description:: This instruction performs the bitwise logical OR operation between _rs1_ and the bitwise inversion of _rs2_. Operation:: @@ -2362,7 +2355,7 @@ Encoding:: ]} .... -Description:: +Description:: The pack instruction packs the XLEN/2-bit lower halves of _rs1_ and _rs2_ into _rd_, with _rs1_ in the lower half and _rs2_ in the upper half. @@ -2386,6 +2379,13 @@ Included in:: |Ratified |=== +NOTE: For RV32, the `pack` instruction with _rs2_=`x0` is the `zext.h` +instruction. +Hence, for RV32, any extension that contains the `pack` instruction also +contains the `zext.h` instruction (but not necessarily the `c.zext.h` +instruction, which is only guaranteed to exist if both the Zcb and Zbb +extensions are implemented). + <<< [#insns-packh,reftext="Pack low bytes of registers"] ==== packh @@ -2409,8 +2409,8 @@ Encoding:: ]} .... -Description:: -And the packh instruction packs the least-significant bytes of +Description:: +The packh instruction packs the least-significant bytes of _rs1_ and _rs2_ into the 16 least-significant bits of _rd_, zero extending the rest of _rd_. @@ -2448,8 +2448,7 @@ Encoding:: [wavedrom, , svg] .... {reg:[ -{bits: 2, name: 0x3}, -{bits: 5, name: 0xe}, +{bits: 7, name: 0x3b, attr: ['OP-32']}, {bits: 5, name: 'rd'}, {bits: 3, name: 0x4}, {bits: 5, name: 'rs1'}, @@ -2458,7 +2457,7 @@ Encoding:: ]} .... 
-Description:: +Description:: This instruction packs the low 16 bits of _rs1_ and _rs2_ into the 32 least-significant bits of _rd_, sign extending the 32-bit result to the rest of _rd_. @@ -2484,6 +2483,13 @@ Included in:: |Ratified |=== +NOTE: For RV64, the `packw` instruction with _rs2_=`x0` is the `zext.h` +instruction. +Hence, for RV64, any extension that contains the `packw` instruction also +contains the `zext.h` instruction (but not necessarily the `c.zext.h` +instruction, which is only guaranteed to exist if both the Zcb and Zbb +extensions are implemented). + <<< [#insns-rev8,reftext="Byte-reverse register"] ==== rev8 @@ -2518,7 +2524,7 @@ Encoding (RV64):: ]} .... -Description:: +Description:: This instruction reverses the order of the bytes in _rs_. Operation:: @@ -2568,14 +2574,14 @@ Included in:: |=== <<< -[#insns-revb,reftext="Reverse bits in bytes"] -==== rev.b +[#insns-brev8,reftext="Reverse bits in bytes"] +==== brev8 Synopsis:: Reverse the bits in each byte of a source register. Mnemonic:: -rev.b _rd_, _rs_ +brev8 _rd_, _rs_ Encoding:: [wavedrom, , svg] @@ -2589,7 +2595,7 @@ Encoding:: ]} .... -Description:: +Description:: This instruction reverses the order of the bits in every byte of a register. Operation:: @@ -2692,7 +2698,7 @@ Encoding:: Description:: This instruction performs a rotate left on the least-significant word of _rs1_ by the amount in least-significant 5 bits of _rs2_. -The resulting word value is sign-extended by copying bit 31 to all of the more-significant bits. +The resulting word value is sign-extended by copying bit 31 to all of the more-significant bits. Operation:: [source,sail] @@ -2808,7 +2814,7 @@ Encoding (RV64):: ]} .... -Description:: +Description:: This instruction performs a rotate right of _rs1_ by the amount in the least-significant log2(XLEN) bits of _shamt_. For RV32, the encodings corresponding to shamt[5]=1 are reserved. @@ -2862,7 +2868,7 @@ Encoding:: ]} .... 
-Description:: +Description:: This instruction performs a rotate right on the least-significant word of _rs1_ by the amount in the least-significant log2(XLEN) bits of _shamt_. @@ -2917,7 +2923,7 @@ Encoding:: ]} .... -Description:: +Description:: This instruction performs a rotate right on the least-significant word of _rs1_ by the amount in least-significant 5 bits of _rs2_. The resultant word is sign-extended by copying bit 31 to all of the more-significant bits. @@ -3368,7 +3374,8 @@ This instruction is the same as *slli* with *zext.w* performed on _rs1_ before s ==== unzip Synopsis:: -Implements the inverse of the zip instruction. +Place odd and even bits of the source register into upper and lower halves of +the destination register, respectively. Mnemonic:: unzip _rd_, _rs_ @@ -3381,15 +3388,15 @@ Encoding:: {bits: 5, name: 'rd'}, {bits: 3, name: 0x5}, {bits: 5, name: 'rs1'}, -{bits: 5, name: 0x1f}, +{bits: 5, name: 0xf}, {bits: 7, name: 0x4}, ]} .... -Description:: -This instruction gathers bits from the high and low halves of the source -word into odd/even bit positions in the destination word. -It is the inverse of the <<insns-zip,zip>> instruction. +Description:: +This instruction scatters all of the odd and even bits of a source word into +the high and low halves of a destination word. +It is the inverse of the <<insns-zip-sc,zip>> instruction. This instruction is available only on RV32. Operation:: @@ -3445,7 +3452,7 @@ Encoding:: ]} .... -Description:: +Description:: This instruction performs the bit-wise exclusive-NOR operation on _rs1_ and _rs2_. Operation:: @@ -3471,21 +3478,20 @@ Included in:: |=== <<< -[#insns-xpermb,reftext="Crossbar permutation (bytes)"] -==== xperm.b +[#insns-xperm8,reftext="Crossbar permutation (bytes)"] +==== xperm8 Synopsis:: Byte-wise lookup of indices into a vector in registers. Mnemonic:: -xperm.b _rd_, _rs1_, _rs2_ +xperm8 _rd_, _rs1_, _rs2_ Encoding:: [wavedrom, , svg] .... 
{reg:[ -{bits: 2, name: 0x3}, -{bits: 5, name: 0xc}, +{bits: 7, name: 0x33, attr: ['OP'] }, {bits: 5, name: 'rd'}, {bits: 3, name: 0x4}, {bits: 5, name: 'rs1'}, @@ -3494,8 +3500,8 @@ Encoding:: ]} .... -Description:: -The xperm.b instruction operates on bytes. +Description:: +The xperm8 instruction operates on bytes. The _rs1_ register contains a vector of XLEN/8 8-bit elements. The _rs2_ register contains a vector of XLEN/8 8-bit indexes. The result is each element in _rs2_ replaced by the indexed element in _rs1_, @@ -3504,15 +3510,15 @@ or zero if the index into _rs2_ is out of bounds. Operation:: [source,sail] -- -val xpermb_lookup : (bits(8), xlenbits) -> bits(8) -function xpermb_lookup (idx, lut) = { +val xperm8_lookup : (bits(8), xlenbits) -> bits(8) +function xperm8_lookup (idx, lut) = { (lut >> (idx @ 0b000))[7..0] } -function clause execute ( XPERM_B (rs2,rs1,rd)) = { +function clause execute ( XPERM8 (rs2,rs1,rd)) = { result : xlenbits = EXTZ(0b0); foreach(i from 0 to xlen by 8) { - result[i+7..i] = xpermn_lookup(X(rs2)[i+7..i], X(rs1)); + result[i+7..i] = xperm8_lookup(X(rs2)[i+7..i], X(rs1)); }; X(rd) = result; RETIRE_SUCCESS @@ -3532,21 +3538,20 @@ Included in:: |=== <<< -[#insns-xpermn,reftext="Crossbar permutation (nibbles)"] -==== xperm.n +[#insns-xperm4,reftext="Crossbar permutation (nibbles)"] +==== xperm4 Synopsis:: Nibble-wise lookup of indices into a vector. Mnemonic:: -xperm.n _rd_, _rs1_, _rs2_ +xperm4 _rd_, _rs1_, _rs2_ Encoding:: [wavedrom, , svg] .... {reg:[ -{bits: 2, name: 0x3}, -{bits: 5, name: 0xc}, +{bits: 7, name: 0x33, attr: ['OP'] }, {bits: 5, name: 'rd'}, {bits: 3, name: 0x2}, {bits: 5, name: 'rs1'}, @@ -3555,8 +3560,8 @@ Encoding:: ]} .... -Description:: -The xperm.n instruction operates on nibbles. +Description:: +The xperm4 instruction operates on nibbles. The _rs1_ register contains a vector of XLEN/4 4-bit elements. The _rs2_ register contains a vector of XLEN/4 4-bit indexes. 
The result is each element in _rs2_ replaced by the indexed element in _rs1_, @@ -3565,15 +3570,15 @@ or zero if the index into _rs2_ is out of bounds. Operation:: [source,sail] -- -val xpermn_lookup : (bits(4), xlenbits) -> bits(4) -function xpermn_lookup (idx, lut) = { +val xperm4_lookup : (bits(4), xlenbits) -> bits(4) +function xperm4_lookup (idx, lut) = { (lut >> (idx @ 0b00))[3..0] } -function clause execute ( XPERM_N (rs2,rs1,rd)) = { +function clause execute ( XPERM4 (rs2,rs1,rd)) = { result : xlenbits = EXTZ(0b0); foreach(i from 0 to xlen by 4) { - result[i+3..i] = xpermn_lookup(X(rs2)[i+3..i], X(rs1)); + result[i+3..i] = xperm4_lookup(X(rs2)[i+3..i], X(rs1)); }; X(rd) = result; RETIRE_SUCCESS @@ -3660,8 +3665,8 @@ Included in:: ==== zip Synopsis:: -Gather odd and even bits of the source word into upper/lower halves of the -destination. +Interleave upper and lower halves of the source register into odd and even +bits of the destination register, respectively. Mnemonic:: zip _rd_, _rs_ @@ -3674,15 +3679,15 @@ Encoding:: {bits: 5, name: 'rd'}, {bits: 3, name: 0x1}, {bits: 5, name: 'rs1'}, -{bits: 5, name: 0x1e}, +{bits: 5, name: 0xf}, {bits: 7, name: 0x4}, ]} .... -Description:: -This instruction scatters all of the odd and even bits of a source word into -the high and low halves of a destination word. -It is the inverse of the <<insns-unzip,unzip>> instruction. +Description:: +This instruction gathers bits from the high and low halves of the source +word into odd/even bit positions in the destination word. +It is the inverse of the <<insns-unzip-sc,unzip>> instruction. This instruction is available only on RV32. 
Operation:: @@ -3731,70 +3736,70 @@ A full example of a *strlen* function, which uses these techniques and also demo -- #include <sys/asm.h> - .text - .globl strlen - .type strlen, @function + .text + .globl strlen + .type strlen, @function strlen: - andi a3, a0, (SZREG-1) // offset - andi a1, a0, -SZREG // align pointer + andi a3, a0, (SZREG-1) // offset + andi a1, a0, -SZREG // align pointer .Lprologue: - li a4, SZREG - sub a4, a4, a3 // XLEN - offset - slli a3, a3, 3 // offset * 8 - REG_L a2, 0(a1) // chunk - /* - * Shift the partial/unaligned chunk we loaded to remove the bytes - * from before the start of the string, adding NUL bytes at the end. - */ -#if __BYTE_ORDER__ == __ORDER_LITTLE_ENDIAN__ - srl a2, a2 ,a3 // chunk >> (offset * 8) + li a4, SZREG + sub a4, a4, a3 // XLEN - offset + slli a3, a3, 3 // offset * 8 + REG_L a2, 0(a1) // chunk + /* + * Shift the partial/unaligned chunk we loaded to remove the bytes + * from before the start of the string, adding NUL bytes at the end. + */ +#if __BYTE_ORDER__ == __ORDER_LITTLE_ENDIAN__ + srl a2, a2 ,a3 // chunk >> (offset * 8) #else - sll a2, a2, a3 + sll a2, a2, a3 #endif - orc.b a2, a2 - not a2, a2 - /* - * Non-NUL bytes in the string have been expanded to 0x00, while - * NUL bytes have become 0xff. Search for the first set bit - * (corresponding to a NUL byte in the original chunk). - */ + orc.b a2, a2 + not a2, a2 + /* + * Non-NUL bytes in the string have been expanded to 0x00, while + * NUL bytes have become 0xff. Search for the first set bit + * (corresponding to a NUL byte in the original chunk). + */ #if __BYTE_ORDER__ == __ORDER_LITTLE_ENDIAN__ - ctz a2, a2 + ctz a2, a2 #else - clz a2, a2 + clz a2, a2 #endif - /* - * The first chunk is special: compare against the number of valid - * bytes in this chunk. - */ - srli a0, a2, 3 - bgtu a4, a0, .Ldone - addi a3, a1, SZREG - li a4, -1 - .align 2 - /* - * Our critical loop is 4 instructions and processes data in 4 byte - * or 8 byte chunks. 
- */ + /* + * The first chunk is special: compare against the number of valid + * bytes in this chunk. + */ + srli a0, a2, 3 + bgtu a4, a0, .Ldone + addi a3, a1, SZREG + li a4, -1 + .align 2 + /* + * Our critical loop is 4 instructions and processes data in 4 byte + * or 8 byte chunks. + */ .Lloop: - REG_L a2, SZREG(a1) - addi a1, a1, SZREG - orc.b a2, a2 - beq a2, a4, .Lloop + REG_L a2, SZREG(a1) + addi a1, a1, SZREG + orc.b a2, a2 + beq a2, a4, .Lloop .Lepilogue: - not a2, a2 + not a2, a2 #if __BYTE_ORDER__ == __ORDER_LITTLE_ENDIAN__ - ctz a2, a2 + ctz a2, a2 #else - clz a2, a2 + clz a2, a2 #endif - sub a1, a1, a3 - add a0, a0, a1 - srli a2, a2, 3 - add a0, a0, a2 + sub a1, a1, a3 + add a0, a0, a1 + srli a2, a2, 3 + add a0, a0, a2 .Ldone: - ret + ret -- ==== strcmp @@ -3821,7 +3826,7 @@ strcmp: addi a0, a0, SZREG addi a1, a1, SZREG beq a2, a3, .Lloop - + # Words don't match, and no null byte in first word. # Get bytes in big-endian order and compare. #if __BYTE_ORDER__ == __ORDER_LITTLE_ENDIAN__ diff --git a/src/bfloat16.adoc b/src/bfloat16.adoc index ba3e8bc..a5bbb58 100644 --- a/src/bfloat16.adoc +++ b/src/bfloat16.adoc @@ -1,5 +1,5 @@ [[bf16]] -== "BF16" Extensions for for BFloat16-precision Floating-Point, Version 1.0 +== "BF16" Extensions for BFloat16-precision Floating-Point, Version 1.0 [[BF16_introduction]] === Introduction @@ -33,7 +33,7 @@ point, widening multiply-accumulate instructions became much more common. Also, complicated dot product instructions started to show up including those that packed two FP16 numbers in a 32-bit register, multiplied these by another pair of FP16 numbers in another register, added these two products to an FP32 accumulate value in a 3rd register -and returned an FP32 result. +and returned an FP32 result. 
Experts working in machine learning at Google who continued to work with FP32 values noted that the least significant 16 bits of their mantissas were not always needed @@ -122,7 +122,7 @@ For BF16 these values are: .BF16 parameters [cols = "2,1"] |=== -| Parameter | Value +| Parameter | Value |radix (b)|2 |significand (p)|8 |emax|127 @@ -162,9 +162,9 @@ inputs and they can produce subnormal results. [NOTE] ==== Future floating-point extensions, including those that operate on BF16 values, may chose not to support subnormal numbers. -The comments about supporting subnormal BF16 values are limited to those instructions defined in this specification. +The comments about supporting subnormal BF16 values are limited to those instructions defined in this specification. ==== - + ===== Infinities: Infinities are used to represent values that are too large to be represented by the target format. These are usually produced as a result of overflows (depending on the rounding mode), but can also @@ -174,7 +174,7 @@ Infinities are important for keeping meaningless results from being operated upo ===== NaNs -NaN stands for Not a Number. +NaN stands for Not a Number. There are two types of NaNs: signalling (sNaN) and quiet (qNaN). No computational instruction will ever produce an sNaN; These are only provided as input data. Operating on an sNaN will cause @@ -209,10 +209,10 @@ are more significant than the operand's most significant bit. ===== Rounding Modes: -As is the case with other floating-point instructions, +As is the case with other floating-point instructions, the BF16 instructions support all 5 RISC-V Floating-point rounding modes. 
These modes can be specified in the `rm` field of scalar instructions -as well as in the `frm` CSR +as well as in the `frm` CSR [%autowidth] .RISC-V Floating Point Rounding Modes @@ -225,9 +225,9 @@ as well as in the `frm` CSR |011 | RUP | Round Up (towards +∞) |100 | RMM | Round to Nearest, ties to Max Magnitude |=== - + As with other scalar floating-point instructions, the rounding mode field -`rm` can also take on the +`rm` can also take on the `DYN` encoding, which indicates that the instruction uses the rounding mode specified in the `frm` CSR. @@ -259,7 +259,7 @@ that is different from the final result rounding. This tininess detection requi exponent were unbounded. This means that the input to the rounder is always a normal number. This is different from the final result rounding where the input to the rounder is a subnormal number when -the value is too small to be represented as a normal number in the target format. +the value is too small to be represented as a normal number in the target format. The two different roundings can result in underflow being signalled for results that are rounded back to the normal range. @@ -289,7 +289,7 @@ The BF16 extensions defined in this specification (i.e., `Zfbfmin`, `Zvfbfwma`) depend on the `"V"` Vector Extension for Application Processors or the `Zve32f` Vector Extension for Embedded Processors. -As stated later in this specification, +As stated later in this specification, there exists a dependency between the newly defined extensions: `Zvfbfwma` depends on `Zfbfmin` and `Zvfbfmin`. @@ -306,9 +306,9 @@ instructions. This extension provides the minimal set of instructions needed to enable scalar support of the BF16 format. It enables BF16 as an interchange format as it provides conversion -between BF16 values and FP32 values. +between BF16 values and FP32 values. 
-This extension requires the single-precision floating-point extension +This extension depends upon the single-precision floating-point extension `F`, and the `FLH`, `FSH`, `FMV.X.H`, and `FMV.H.X` instructions as defined in the `Zfh` extension. @@ -318,15 +318,15 @@ While conversion instructions tend to include all supported formats, in these ex only support conversion between BF16 and FP32 as we are targeting a special use case. These extensions are intended to support the case where BF16 values are used as reduced precision versions of FP32 values, where use of BF16 provides a two-fold advantage for -storage, bandwidth, and computation. In this use case, the BF16 values are typically -multiplied by each other and accumulated into FP32 sums. +storage, bandwidth, and computation. In this use case, the BF16 values are typically +multiplied by each other and accumulated into FP32 sums. These sums are typically converted to BF16 and then used as subsequent inputs. The operations on the BF16 values can be performed on the CPU or a loosely coupled coprocessor. Subsequent extensions might provide support for native BF16 arithmetic. Such extensions could add additional conversion -instructions to allow all supported formats to be converted to and from BF16. +instructions to allow all supported formats to be converted to and from BF16. ==== [NOTE] @@ -334,7 +334,7 @@ instructions to allow all supported formats to be converted to and from BF16. BF16 addition, subtraction, multiplication, division, and square-root operations can be faithfully emulated by converting the BF16 operands to single-precision, performing the operation using single-precision arithmetic, and then converting back to BF16. Performing -BF16 fused multiply-addition using this method can produce results that differ by 1-ulp +BF16 fused multiply-addition using this method can produce results that differ by 1-ulp on some inputs for the RNE and RMM rounding modes. 
@@ -342,12 +342,12 @@ Conversions between BF16 and formats larger than FP32 can be emulated. Exact widening conversions from BF16 can be synthesized by first converting to FP32 and then converting from FP32 to the target -precision. +precision. Conversions narrowing to BF16 can be synthesized by first converting to FP32 through a series of halving steps and then -converting from FP32 to the target precision. +converting from FP32 to BF16. As with the fused multiply-addition instruction described above, -this method of converting values to BF16 can be off by 1-ulp +this method of converting values to BF16 can be off by 1-ulp on some inputs for the RNE and RMM rounding modes. ==== @@ -358,7 +358,7 @@ on some inputs for the RNE and RMM rounding modes. |Instruction |FCVT.BF16.S | <<insns-fcvt.bf16.s>> |FCVT.S.BF16 | <<insns-fcvt.s.bf16>> -|FLH | +|FLH | |FSH | |FMV.H.X | |FMV.X.H | @@ -372,8 +372,7 @@ This extension provides the minimal set of instructions needed to enable vector format. It enables BF16 as an interchange format as it provides conversion between BF16 values and FP32 values. -This extension requires either the -"V" extension or the `Zve32f` embedded vector extension. +This extension depends upon `Zve32f` vector extension. [NOTE] ==== @@ -381,15 +380,15 @@ While conversion instructions tend to include all supported formats, in these ex only support conversion between BF16 and FP32 as we are targeting a special use case. These extensions are intended to support the case where BF16 values are used as reduced precision versions of FP32 values, where use of BF16 provides a two-fold advantage for -storage, bandwidth, and computation. In this use case, the BF16 values are typically -multiplied by each other and accumulated into FP32 sums. +storage, bandwidth, and computation. In this use case, the BF16 values are typically +multiplied by each other and accumulated into FP32 sums. These sums are typically converted to BF16 and then used as subsequent inputs. 
The operations on the BF16 values can be performed on the CPU or a loosely coupled coprocessor. Subsequent extensions might provide support for native BF16 arithmetic. Such extensions could add additional conversion -instructions to allow all supported formats to be converted to and from BF16. +instructions to allow all supported formats to be converted to and from BF16. ==== [NOTE] @@ -397,7 +396,7 @@ instructions to allow all supported formats to be converted to and from BF16. BF16 addition, subtraction, multiplication, division, and square-root operations can be faithfully emulated by converting the BF16 operands to single-precision, performing the operation using single-precision arithmetic, and then converting back to BF16. Performing -BF16 fused multiply-addition using this method can produce results that differ by 1-ulp +BF16 fused multiply-addition using this method can produce results that differ by 1-ulp on some inputs for the RNE and RMM rounding modes. Conversions between BF16 and formats larger than FP32 can be @@ -426,9 +425,9 @@ the desired rounding mode. ==== `Zvfbfwma` - Vector BF16 widening mul-add This extension provides -a vector widening BF16 mul-add instruction that accumulates into FP32. +a vector widening BF16 mul-add instruction that accumulates into FP32. -This extension requires the `Zvfbfmin` extension and the `Zfbfmin` extension. +This extension depends upon the `Zvfbfmin` extension and the `Zfbfmin` extension. [%autowidth] [%header,cols="2,4"] @@ -526,7 +525,7 @@ used in bits 24:20 to indicate that the source is BF16. ==== -Description:: +Description:: Converts a BF16 value to an FP32 value. The conversion is exact. This instruction is similar to other widening @@ -534,7 +533,7 @@ floating-point-to-floating-point conversion instructions. 
[NOTE] ==== -If the input is normal or infinity, the BF16 encoded value is shifted +If the input is normal or infinity, the BF16 encoded value is shifted to the left by 16 places and the least significant 16 bits are written with 0s. @@ -575,7 +574,7 @@ Encoding:: .... Reserved Encodings:: -* `SEW` is any value other than 16 +* `SEW` is any value other than 16 Arguments:: @@ -593,8 +592,8 @@ Arguments:: -Description:: -Narrowing convert from FP32 to BF16. Round according to the _frm_ register. +Description:: +Narrowing convert from FP32 to BF16. Round according to the _frm_ register. This instruction is similar to `vfncvt.f.f.w` which converts a floating-point value in a 2*SEW-width format into an SEW-width format. @@ -632,7 +631,7 @@ Encoding:: .... Reserved Encodings:: -* `SEW` is any value other than 16 +* `SEW` is any value other than 16 Arguments:: [%autowidth] @@ -647,7 +646,7 @@ Arguments:: | Vd | output | 32 | FP32 Result |=== -Description:: +Description:: Widening convert from BF16 to FP32. The conversion is exact. This instruction is similar to `vfwcvt.f.f.v` which converts a @@ -656,7 +655,7 @@ However, here the SEW-width format is limited to BF16. [NOTE] ==== -If the input is normal or infinity, the BF16 encoded value is shifted +If the input is normal or infinity, the BF16 encoded value is shifted to the left by 16 places and the least significant 16 bits are written with 0s. ==== @@ -670,7 +669,7 @@ Included in: <<zvfbfmin>> // include::insns/vfwmaccbf16.adoc[] // <<< [#insns-vfwmaccbf16, reftext="Vector BF16 widening multiply-accumulate"] -==== vfwmaccbf16 +==== vfwmaccbf16 Synopsis:: Vector BF16 widening multiply-accumulate @@ -708,7 +707,7 @@ Encoding (Vector-Scalar):: .... 
Reserved Encodings:: -* `SEW` is any value other than 16 +* `SEW` is any value other than 16 Arguments:: [%autowidth] @@ -725,7 +724,7 @@ Arguments:: | Vd | output | 32 | FP32 Result |=== -Description:: +Description:: This instruction performs a widening fused multiply-accumulate operation, where each pair of BF16 values are multiplied and their @@ -738,7 +737,7 @@ and `vs2` and FP32 accumulate value is read from `vd`. The FP32 result is written to the destination register `vd`. The vector-scalar version is similar, but instead of reading elements -from `vs1`, a scalar BF16 value is read from the FPU register `rs1`. +from `vs1`, a scalar BF16 value is read from the FPU register `rs1`. Exceptions: Overflow, Underflow, Inexact, Invalid diff --git a/src/c-st-ext.adoc b/src/c-st-ext.adoc index 97aca5f..05878b0 100644 --- a/src/c-st-ext.adoc +++ b/src/c-st-ext.adoc @@ -4,7 +4,7 @@ This chapter describes the RISC-V standard compressed instruction-set extension, named "C", which reduces static and dynamic code size by adding short 16-bit instruction encodings for common operations. The C -extension can be added to any of the base ISAs (RV32, RV64, RV128), and +extension can be added to any of the base ISAs (RV32I, RV32E, RV64I, RV64E), and we use the generic term "RVC" to cover any of these. Typically, 50%-60% of the RISC-V instructions in a program can be replaced with RVC instructions, resulting in a 25%-30% code-size reduction. @@ -33,28 +33,26 @@ Removing the 32-bit alignment constraint on the original 32-bit instructions allows significantly greater code density. ==== -The compressed instruction encodings are mostly common across RV32C, -RV64C, and RV128C, but as shown in <<rvc-instr-table0, Table 34>>, a few opcodes are used for +The compressed instruction encodings are mostly common across RV32C and +RV64C, but as shown in <<rvc-instr-table0, Table 34>>, a few opcodes are used for different purposes depending on base ISA. 
For example, the wider -address-space RV64C and RV128C variants require additional opcodes to +address-space RV64C variant requires additional opcodes to compress loads and stores of 64-bit integer values, while RV32C uses the same opcodes to compress loads and stores of single-precision -floating-point values. Similarly, RV128C requires additional opcodes to -capture loads and stores of 128-bit integer values, while these same -opcodes are used for loads and stores of double-precision floating-point -values in RV32C and RV64C. If the C extension is implemented, the +floating-point values. +If the C extension is implemented, the appropriate compressed floating-point load and store instructions must be provided whenever the relevant standard floating-point extension (F and/or D) is also implemented. In addition, RV32C includes a compressed jump and link instruction to compress short-range subroutine calls, -where the same opcode is used to compress ADDIW for RV64C and RV128C. +where the same opcode is used to compress ADDIW for RV64C. -[TIP] +[NOTE] ==== Double-precision loads and stores are a significant fraction of static and dynamic instructions, hence the motivation to include them in the RV32C and RV64C encoding. - + Although single-precision loads and stores are not a significant source of static or dynamic compression for benchmarks compiled for the currently supported ABIs, for microcontrollers that only provide @@ -76,8 +74,8 @@ integer loads and stores. ==== RVC was designed under the constraint that each RVC instruction expands -into a single 32-bit instruction in either the base ISA (RV32I/E, RV64I/E, -or RV128I) or the F and D standard extensions where present. Adopting +into a single 32-bit instruction in either the base ISA (RV32I/E or RV64I/E) +or the F and D standard extensions where present. 
Adopting this constraint has two main benefits: * Hardware designs can simply expand RVC instructions during decode, @@ -100,7 +98,7 @@ instructions in one C instruction. It is important to note that the C extension is not designed to be a stand-alone ISA, and is meant to be used alongside a base ISA. -[TIP] +[NOTE] ==== Variable-length instruction sets have long been used to improve code density. For example, the IBM Stretch cite:[stretch], developed in the late 1950s, had @@ -197,14 +195,14 @@ _The standard RISC-V calling convention maps the most frequently used floating-point registers to registers `f8` to `f15`, which allows the same register decompression decoding as for integer register numbers._ ==== -((((register source spcifiers, c-ext)))) +((((register source specifiers, c-ext)))) The formats were designed to keep bits for the two register source specifiers in the same place in all instructions, while the destination register field can move. When the full 5-bit destination register specifier is present, it is in the same place as in the 32-bit RISC-V encoding. Where immediates are sign-extended, the sign extension is always from bit 12. Immediate fields have been scrambled, as in the base -specification, to reduce the number of immediate muxes required. +specification, to reduce the number of immediate multiplexers required. [NOTE] ==== The immediate fields are scrambled in the instruction formats instead of @@ -217,7 +215,7 @@ For many RVC instructions, zero-valued immediates are disallowed and encoding space for other instructions requiring fewer operand bits. //[[cr-register]] -//include::images/wavedrom/cr-register.adoc[] +//include::images/wavedrom/cr-register.edn[] //.Compressed 16-bit RVC instructions //(((compressed, 16-bit))) @@ -226,7 +224,7 @@ encoding space for other instructions requiring fewer operand bits. 
//[%header] [float="center",align="center",cols="1a, 2a",frame="none",grid="none"] |=== -| +| [%autowidth,float="right",align="right",cols="^,^",frame="none",grid="none",options="noheader"] !=== !Format ! Meaning @@ -243,7 +241,7 @@ encoding space for other instructions requiring fewer operand bits. | [float="left",align="left",cols="1,1,1,1,1,1,1",options="noheader"] !=== -2+^!15 14 13 12 2+^!11 10 9 8 7 2+^!6 5 4 3 2 ^!1 0 +^!15 14 13 ^!12 ^!11 10 ^!9 8 7 ^!6 5 ^!4 3 2 ^!1 0 2+^!funct4 2+^!rd/rs1 2+^!rs2 ^! op ^!funct3 ^!imm 2+^!rd/rs1 2+^!imm ^! op ^!funct3 3+^!imm 2+^!rs2 ^! op @@ -261,14 +259,14 @@ encoding space for other instructions requiring fewer operand bits. //[cols="20%,10%,10%,10%,10%,10%,10%,10%,10%"] [float="center",align="center",cols="1a, 1a",frame="none",grid="none"] |=== -| +| [%autowidth,cols="<",frame="none",grid="none",options="noheader"] !=== !RVC Register Number !Integer Register Number -!Integer Register ABI Name +!Integer Register ABI Name !Floating-Point Register Number -!Floating-Point Register ABI Name +!Floating-Point Register ABI Name !=== | @@ -297,7 +295,7 @@ registers. ==== Stack-Pointer-Based Loads and Stores -include::images/wavedrom/c-sp-load-store.adoc[] +include::images/wavedrom/c-sp-load-store.edn[] [[c-sp-load-store]] //.Stack-Pointer-Based Loads and Stores--these instructions use the CI format. @@ -306,21 +304,14 @@ These instructions use the CI format. C.LWSP loads a 32-bit value from memory into register _rd_. It computes an effective address by adding the _zero_-extended offset, scaled by 4, to the stack pointer, `x2`. It expands to `lw rd, offset(x2)`. C.LWSP is -only valid when _rd_≠x0 the code points with _rd_=x0 are reserved. +valid only when _rd_≠`x0`; the code points with _rd_=`x0` are reserved. -C.LDSP is an RV64C/RV128C-only instruction that loads a 64-bit value +C.LDSP is an RV64C-only instruction that loads a 64-bit value from memory into register _rd_. 
It computes its effective address by adding the zero-extended offset, scaled by 8, to the stack pointer, -`x2`. It expands to `ld rd, offset(x2)`. C.LDSP is only valid when -_rd_≠x0 the code points with -_rd_=x0 are reserved. - -C.LQSP is an RV128C-only instruction that loads a 128-bit value from -memory into register _rd_. It computes its effective address by adding -the zero-extended offset, scaled by 16, to the stack pointer, `x2`. It -expands to `lq rd, offset(x2)`. C.LQSP is only valid when -_rd_≠x0 the code points with -_rd_=x0 are reserved. +`x2`. It expands to `ld rd, offset(x2)`. C.LDSP is valid only when +_rd_≠`x0`; the code points with +_rd_=`x0` are reserved. C.FLWSP is an RV32FC-only instruction that loads a single-precision floating-point value from memory into floating-point register _rd_. It @@ -334,7 +325,7 @@ register _rd_. It computes its effective address by adding the _zero_-extended offset, scaled by 8, to the stack pointer, `x2`. It expands to `fld rd, offset(x2)`. -include::images/wavedrom/c-sp-load-store-css.adoc[] +include::images/wavedrom/c-sp-load-store-css.edn[] [[c-sp-load-store-css]] //.Stack-Pointer-Based Loads and Stores--these instructions use the CSS format. @@ -344,16 +335,11 @@ C.SWSP stores a 32-bit value in register _rs2_ to memory. It computes an effective address by adding the _zero_-extended offset, scaled by 4, to the stack pointer, `x2`. It expands to `sw rs2, offset(x2)`. -C.SDSP is an RV64C/RV128C-only instruction that stores a 64-bit value in +C.SDSP is an RV64C-only instruction that stores a 64-bit value in register _rs2_ to memory. It computes an effective address by adding the _zero_-extended offset, scaled by 8, to the stack pointer, `x2`. It expands to `sd rs2, offset(x2)`. -C.SQSP is an RV128C-only instruction that stores a 128-bit value in -register _rs2_ to memory. It computes an effective address by adding the -_zero_-extended offset, scaled by 16, to the stack pointer, `x2`. 
It -expands to `sq rs2, offset(x2)`. - C.FSWSP is an RV32FC-only instruction that stores a single-precision floating-point value in floating-point register _rs2_ to memory. It computes an effective address by adding the _zero_-extended offset, @@ -385,11 +371,12 @@ physical memory and some could not, which requires a new restart mechanism for partially executed instructions. * Unlike the rest of the RVC instructions, there is no IFD equivalent to Load Multiple and Store Multiple. -* Unlike the rest of the RVC instructions, the compiler would have to be -aware of these instructions to both generate the instructions and to -allocate registers in an order to maximize the chances of the them being -saved and stored, since they would be saved and restored in sequential -order. +* Unlike the rest of the RVC instructions, the compiler would have to be aware +of these load-multiple and store-multiple instructions to both allocate +registers in the expected order and also to schedule the loads and +stores contiguously and in the proper order, to maximize the chances of them +being detected and replaced by an assembler or linker with the equivalent +load-multiple or store-multiple compressed instruction. * Simple microarchitectural implementations will constrain how other instructions can be scheduled around the load and store multiple instructions, leading to a potential performance loss. @@ -409,7 +396,7 @@ attain the greatest code size reduction. ==== Register-Based Loads and Stores [[reg-based-ldnstr]] -include::images/wavedrom/reg-based-ldnstr.adoc[] +include::images/wavedrom/reg-based-ldnstr.edn[] //.Compressed, register-based load and stores--these instructions use the CL format. (((compressed, register-based load and store))) These instructions use the CL format. @@ -419,18 +406,12 @@ C.LW loads a 32-bit value from memory into register _zero_-extended offset, scaled by 4, to the base address in register `_rs1′_`. It expands to `lw rd′, offset(rs1′)`. 
-C.LD is an RV64C/RV128C-only instruction that loads a 64-bit value from +C.LD is an RV64C-only instruction that loads a 64-bit value from memory into register `_rd′_`. It computes an effective address by adding the _zero_-extended offset, scaled by 8, to the base address in register `_rs1′_`. It expands to `ld rd′, offset(rs1′)`. -C.LQ is an RV128C-only instruction that loads a 128-bit value from -memory into register `_rd′_`. It computes an effective -address by adding the _zero_-extended offset, scaled by 16, to the base -address in register `_rs1′_`. It expands to -`lq rd′, offset(rs1′)`. - C.FLW is an RV32FC-only instruction that loads a single-precision floating-point value from memory into floating-point register `_rd′_`. It computes an effective address by adding the @@ -446,7 +427,7 @@ _zero_-extended offset, scaled by 8, to the base address in register `fld rd′, offset(rs1′)`. [[c-cs-format-ls]] -include::images/wavedrom/c-cs-format-ls.adoc[] +include::images/wavedrom/c-cs-format-ls.edn[] //.Compressed, CS format load and store--these instructions use the CS format. (((compressed, cs-format load and store))) @@ -457,18 +438,12 @@ It computes an effective address by adding the _zero_-extended offset, scaled by 4, to the base address in register `_rs1′_`. It expands to `sw rs2′, offset(rs1′)`. -C.SD is an RV64C/RV128C-only instruction that stores a 64-bit value in +C.SD is an RV64C-only instruction that stores a 64-bit value in register `_rs2′_` to memory. It computes an effective address by adding the _zero_-extended offset, scaled by 8, to the base address in register `_rs1′_`. It expands to `sd rs2′, offset(rs1′)`. -C.SQ is an RV128C-only instruction that stores a 128-bit value in -register `_rs2′_` to memory. It computes an effective -address by adding the _zero_-extended offset, scaled by 16, to the base -address in register `_rs1′_`. It expands to -`sq rs2′, offset(rs1′)`. 
- C.FSW is an RV32FC-only instruction that stores a single-precision floating-point value in floating-point register `_rs2′_` to memory. It computes an effective address by adding the _zero_-extended @@ -490,7 +465,7 @@ instructions. As with base RVI instructions, the offsets of all RVC control transfer instructions are in multiples of 2 bytes. [[c-cj-format-ls]] -include::images/wavedrom/c-cj-format-ls.adoc[] +include::images/wavedrom/c-cj-format-ls.edn[] //.Compressed, CJ format load and store--these instructions use the CJ format. (((compressed, cj-format load and store))) @@ -507,7 +482,7 @@ the jump (`pc+2`) to the link register, `x1`. C.JAL expands to `jal x1, offset`. [[c-cr-format-ls]] -include::images/wavedrom/c-cr-format-ls.adoc[] +include::images/wavedrom/c-cr-format-ls.edn[] //.Compressed, CR format load and store--these instructions use the CR format. (((compressed, cr-format load and store))) @@ -515,18 +490,18 @@ These instructions use the CR format. C.JR (jump register) performs an unconditional control transfer to the address in register _rs1_. C.JR expands to `jalr x0, 0(rs1)`. C.JR is -only valid when latexmath:[$\textit{rs1}{\neq}\texttt{x0}$]; the code -point with latexmath:[$\textit{rs1}{=}\texttt{x0}$] is reserved. +valid only when _rs1_≠`x0`; the code +point with _rs1_=`x0` is reserved. C.JALR (jump and link register) performs the same operation as C.JR, but additionally writes the address of the instruction following the jump (`pc`+2) to the link register, `x1`. C.JALR expands to -`jalr x1, 0(rs1)`. C.JALR is only valid when -latexmath:[$\textit{rs1}{\neq}\texttt{x0}$]; the code point with -latexmath:[$\textit{rs1}{=}\texttt{x0}$] corresponds to the C.EBREAK +`jalr x1, 0(rs1)`. C.JALR is valid only when +_rs1_≠`x0`; the code point with +_rs1_=`x0` corresponds to the C.EBREAK instruction. 
-[TIP] +[NOTE] ==== Strictly speaking, C.JALR does not expand exactly to a base RVI instruction as the value added to the PC to form the link address is 2 @@ -535,7 +510,7 @@ bytes is only a very minor change to the base microarchitecture. ==== [[c-cb-format-ls]] -include::images/wavedrom/c-cb-format-ls.adoc[] +include::images/wavedrom/c-cb-format-ls.edn[] //.Compressed, CB format load and store--these instructions use the CB format. (((compressed, cb-format load and store))) @@ -562,24 +537,25 @@ The two constant-generation instructions both use the CI instruction format and can target any integer register. [[c-integer-const-gen]] -include::images/wavedrom/c-integer-const-gen.adoc[] +include::images/wavedrom/c-integer-const-gen.edn[] //.Integer constant generation format. (((compressed, integer constant generation))) C.LI loads the sign-extended 6-bit immediate, _imm_, into register _rd_. -C.LI expands into `addi rd, x0, imm`. C.LI is only valid when -`_rd_≠x0`; the code points with `_rd_=x0` encode HINTs. +C.LI expands into `addi rd, x0, imm`. +The C.LI code points with _rd_=`x0` are HINTs. C.LUI loads the non-zero 6-bit immediate field into bits 17–12 of the destination register, clears the bottom 12 bits, and sign-extends bit 17 into all higher bits of the destination. C.LUI expands into -`lui rd, imm`. C.LUI is only valid when -latexmath:[$\textit{rd}{\neq}{\left\{\texttt{x0},\texttt{x2}\right\}}$], +`lui rd, imm`. C.LUI is valid only when +_rd_≠`x2`, and when the immediate is not equal to zero. The code points with -_imm_=0 are reserved; the remaining code points with _rd_=`x0` are -HINTs; and the remaining code points with _rd_=`x2` correspond to the +_imm_=0 are reserved. +The code points with _rd_=`x2` and _imm_≠0 correspond to the C.ADDI16SP instruction. +The code points with _rd_=`x0` and _imm_≠0 are HINTs. 
==== Integer Register-Immediate Operations @@ -587,30 +563,32 @@ These integer register-immediate operations are encoded in the CI format and perform operations on an integer register and a 6-bit immediate. [[c-integer-register-immediate]] -include::images/wavedrom/c-int-reg-immed.adoc[] +include::images/wavedrom/c-int-reg-immed.edn[] //.Integer register-immediate format. (((compressed, integer register-immediate))) C.ADDI adds the non-zero sign-extended 6-bit immediate to the value in register _rd_ then writes the result to _rd_. C.ADDI expands into -`addi rd, rd, imm`. C.ADDI is only valid when -`_rd_≠x0` and `_imm_≠0`. The code -points with `_rd_=x0` encode the C.NOP instruction; the remaining code -points with _imm_=0 encode HINTs. +`addi rd, rd, imm`. +The code points with _rd_≠0 and _imm_=0 are HINTs. +The code points with _rd_=`x0` encode the C.NOP instruction, of +which the code points with _imm_≠0 are HINTs. + -C.ADDIW is an RV64C/RV128C-only instruction that performs the same +C.ADDIW is an RV64C-only instruction that performs the same computation but produces a 32-bit result, then sign-extends result to 64 bits. C.ADDIW expands into `addiw rd, rd, imm`. The immediate can be zero for C.ADDIW, where this corresponds to `sext.w rd`. C.ADDIW is -only valid when `_rd_≠x0`; the code points with -`_rd_=x0` are reserved. +valid only when _rd_≠`x0`; the code points with +_rd_=`x0` are reserved. -C.ADDI16SP shares the opcode with C.LUI, but has a destination field of +C.ADDI16SP (add immediate to stack pointer) +shares the opcode with C.LUI, but has a destination field of `x2`. C.ADDI16SP adds the non-zero sign-extended 6-bit immediate to the value in the stack pointer (`sp=x2`), where the immediate is scaled to -represent multiples of 16 in the range (-512,496). C.ADDI16SP is used to +represent multiples of 16 in the range [-512, 496]. C.ADDI16SP is used to adjust the stack pointer in procedure prologues and epilogues. It -expands into `addi x2, x2, nzimm[9:4]`. 
C.ADDI16SP is only valid when +expands into `addi x2, x2, nzimm[9:4]`. C.ADDI16SP is valid only when _nzimm_≠0; the code point with _nzimm_=0 is reserved. [NOTE] @@ -620,53 +598,48 @@ always 16-byte aligned. ==== [[c-ciw]] -include::images/wavedrom/c-ciw.adoc[] +include::images/wavedrom/c-ciw.edn[] //.CIW format. (((compressed, CIW))) -C.ADDI4SPN is a CIW-format instruction that adds a _zero_-extended +C.ADDI4SPN (add immediate to stack pointer, non-destructive) +is a CIW-format instruction that adds a _zero_-extended non-zero immediate, scaled by 4, to the stack pointer, `x2`, and writes the result to `rd′`. This instruction is used to generate pointers to stack-allocated variables, and expands to -`addi rd′, x2, nzuimm[9:2]`. C.ADDI4SPN is only valid when +`addi rd′, x2, nzuimm[9:2]`. C.ADDI4SPN is valid only when _nzuimm_≠0; the code points with _nzuimm_=0 are reserved. [[c-ci]] -include::images/wavedrom/c-ci.adoc[] +include::images/wavedrom/c-ci.edn[] //.CI format. (((compressed, CI))) C.SLLI is a CI-format instruction that performs a logical left shift of the value in register _rd_ then writes the result to _rd_. The shift -amount is encoded in the _shamt_ field. For RV128C, a shift amount of -zero is used to encode a shift of 64. C.SLLI expands into -`slli rd, rd, shamt[5:0]`, except for RV128C with `shamt=0`, which expands to -`slli rd, rd, 64`. +amount is encoded in the _shamt_ field. +C.SLLI expands into `slli rd, rd, shamt[5:0]`. + +The C.SLLI code points with _shamt_=0 or with _rd_=`x0` are HINTs. For RV32C, _shamt[5]_ must be zero; the code points with _shamt[5]_=1 -are designated for custom extensions. For RV32C and RV64C, the shift -amount must be non-zero; the code points with _shamt_=0 are HINTs. For -all base ISAs, the code points with `_rd_=x0` are HINTs, except those -with _shamt[5]_=1 in RV32C. +are designated for custom extensions. 
[[c-srli-srai]] -include::images/wavedrom/c-srli-srai.adoc[] + +include::images/wavedrom/c-srli-srai.edn[] //.C-SRLI-SRAI format. (((compressed, C.SRLI, C.SRAI))) C.SRLI is a CB-format instruction that performs a logical right shift of the value in register _rd′_ then writes the result to _rd′_. The shift amount is encoded in the _shamt_ field. -For RV128C, a shift amount of zero is used to encode a shift of 64. -Furthermore, the shift amount is sign-extended for RV128C, and so the -legal shift amounts are 1-31, 64, and 96-127. C.SRLI expands into -`srli rd′, rd′, shamt`, except for -RV128C with `shamt=0`, which expands to -`srli rd′, rd′, 64`. +C.SRLI expands into `srli rd′, rd′, shamt`. + +The C.SRLI code points with _shamt_=0 are HINTs. For RV32C, _shamt[5]_ must be zero; the code points with _shamt[5]_=1 -are designated for custom extensions. For RV32C and RV64C, the shift -amount must be non-zero; the code points with _shamt_=0 are HINTs. +are designated for custom extensions. C.SRAI is defined analogously to C.SRLI, but instead performs an arithmetic right shift. C.SRAI expands to @@ -677,16 +650,10 @@ arithmetic right shift. C.SRAI expands to Left shifts are usually more frequent than right shifts, as left shifts are frequently used to scale address values. Right shifts have therefore been granted less encoding space and are placed in an encoding quadrant -where all other immediates are sign-extended. For RV128, the decision -was made to have the 6-bit shift-amount immediate also be sign-extended. -Apart from reducing the decode complexity, we believe right-shift -amounts of 96-127 will be more useful than 64-95, to allow extraction of -tags located in the high portions of 128-bit address pointers. We note -that RV128C will not be frozen at the same point as RV32C and RV64C, to -allow evaluation of typical usage of 128-bit address-space codes. +where all other immediates are sign-extended. 
==== [[c-andi]] -include::images/wavedrom/c-andi.adoc[] +include::images/wavedrom/c-andi.edn[] //.C.ANDI format (((compressed, C.ANDI))) @@ -698,16 +665,16 @@ expands to `andi rd′, rd′, imm`. ==== Integer Register-Register Operations [[c-cr]] -include::images/wavedrom/c-int-reg-to-reg-cr-format.adoc[] +include::images/wavedrom/c-int-reg-to-reg-cr-format.edn[] //C.CR format ((((compressed. C.CR)))) These instructions use the CR format. C.MV copies the value in register _rs2_ into register _rd_. C.MV expands -into `add rd, x0, rs2`. C.MV is only valid when -`rs2≠x0` the code points with `rs2=x0` correspond to the C.JR instruction. The code points with `rs2≠x0` and `rd=x0` are HINTs. +into `add rd, x0, rs2`. C.MV is valid only when +_rs2_≠`x0`; the code points with _rs2_=`x0` correspond to the C.JR instruction. The code points with _rs2_≠`x0` and _rd_=`x0` are HINTs. -[TIP] +[NOTE] ==== _C.MV expands to a different instruction than the canonical MV pseudoinstruction, which instead uses ADDI. Implementations that handle @@ -718,11 +685,11 @@ hardware cost._ C.ADD adds the values in registers _rd_ and _rs2_ and writes the result to register _rd_. C.ADD expands into `add rd, rd, rs2`. C.ADD is only -valid when `rs2≠x0` the code points with `rs2=x0` correspond to the C.JALR -and C.EBREAK instructions. The code points with `rs2≠x0` and rd=x0 are HINTs. +valid when _rs2_≠`x0`; the code points with _rs2_=`x0` correspond to the C.JALR +and C.EBREAK instructions. The code points with _rs2_≠`x0` and _rd_=`x0` are HINTs. [[c-ca]] -include::images/wavedrom/c-int-reg-to-reg-ca-format.adoc[] +include::images/wavedrom/c-int-reg-to-reg-ca-format.edn[] //C.CA format ((((compressed. C.CA)))) @@ -731,34 +698,34 @@ These instructions use the CA format. `C.AND` computes the bitwise `AND` of the values in registers _rd′_ and _rs2′_, then writes the result to register _rd′_. `C.AND` expands into -*`_and rd′, rd′, rs2′_`*. +`and rd′, rd′, rs2′`. 
`C.OR` computes the bitwise `OR` of the values in registers _rd′_ and _rs2′_, then writes the result to register _rd′_. `C.OR` expands into -*`_or rd′, rd′, rs2′_`*. +`or rd′, rd′, rs2′`. `C.XOR` computes the bitwise `XOR` of the values in registers _rd′_ and _rs2′_, then writes the result to register _rd′_. `C.XOR` expands into -*`_xor rd′, rd′, rs2′_`*. +`xor rd′, rd′, rs2′`. `C.SUB` subtracts the value in register _rs2′_ from the value in register _rd′_, then writes the result to register _rd′_. `C.SUB` expands into -*`_sub rd′, rd′, rs2′_`*. +`sub rd′, rd′, rs2′`. -`C.ADDW` is an RV64C/RV128C-only instruction that adds the values in +`C.ADDW` is an RV64C-only instruction that adds the values in registers _rd′_ and _rs2′_, then sign-extends the lower 32 bits of the sum before writing the result to register _rd′_. `C.ADDW` expands into -*`_addw rd′, rd′, rs2′_`*. +`addw rd′, rd′, rs2′`. -`C.SUBW` is an RV64C/RV128C-only instruction that subtracts the value in +`C.SUBW` is an RV64C-only instruction that subtracts the value in register _rs2′_ from the value in register _rd′_, then sign-extends the lower 32 bits of the difference before writing the result to register _rd′_. -`C.SUBW` expands into *`_subw rd′, rd′, rs2′_`*. +`C.SUBW` expands into `subw rd′, rd′, rs2′`. [NOTE] ==== @@ -771,7 +738,7 @@ improvement in static and dynamic compression. ==== Defined Illegal Instruction [[c-def-illegal-inst]] -include::images/wavedrom/c-def-illegal-inst.adoc[] +include::images/wavedrom/c-def-illegal-inst.edn[] ((((compressed. C.DIINST)))) A 16-bit instruction with all bits zero is permanently reserved as an @@ -791,18 +758,18 @@ non-existent memory regions. ==== NOP Instruction [[c-nop-instr]] -include::images/wavedrom/c-nop-instr.adoc[] +include::images/wavedrom/c-nop-instr.edn[] ((((compressed. C.NOPINSTR)))) `C.NOP` is a CI-format instruction that does not change any user-visible state, except for advancing the `pc` and incrementing any applicable -performance counters. 
`C.NOP` expands to `nop`. `C.NOP` is only valid when -_imm_=0; the code points with _imm_≠0 encode HINTs. +performance counters. `C.NOP` expands to `nop`. The `C.NOP` code points +with _imm_≠0 encode HINTs. ==== Breakpoint Instruction [[c-breakpoint-instr]] -include::images/wavedrom/c-breakpoint-instr.adoc[] +include::images/wavedrom/c-breakpoint-instr.edn[] ((((compressed. C.BREAKPOINTINSTR)))) Debuggers can use the `C.EBREAK` instruction, which expands to `ebreak`, @@ -883,17 +850,13 @@ no standard HINTs will ever be defined in this subspace. |C.ADD | _rd_=`x0`, _rs2_≠`x0`, _rs2_≠`x2-x5` | 27 -|C.ADD | _rd_=`x0`, _rs2_≠`x2-x5` |4|(rs2=x2) C.NTL.P1 (rs2=x3) C.NTL.PALL (rs2=x4) C.NTL.S1 (rs2=x5) C.NTL.ALL - -|C.SLLI |_rd_=`x0`, _imm_≠0 |31 (RV32), 63 (RV64/128) .5+.^|_Designated for custom use_ - -|C.SLLI64 | _rd_=_x0_ |1 +|C.ADD | _rd_=`x0`, _rs2_=`x2-x5` |4|(rs2=x2) C.NTL.P1 (rs2=x3) C.NTL.PALL (rs2=x4) C.NTL.S1 (rs2=x5) C.NTL.ALL -|C.SLLI64 | _rd_≠`x0`, RV32 and RV64 only |31 +|C.SLLI |_rd_=`x0` or _imm_=0 |63 (RV32), 95 (RV64) .3+.^|_Designated for custom use_ -|C.SRLI64 | RV32 and RV64 only |8 +|C.SRLI | _imm_=0 |8 -|C.SRAI64 | RV32 and RV64 only |8 +|C.SRAI | _imm_=0 |8 |=== === RVC Instruction Set Listings @@ -907,7 +870,7 @@ valid for certain operands; when invalid, they are marked either _RES_ to indicate that the opcode is reserved for future standard extensions; _Custom_ to indicate that the opcode is designated for custom extensions; or _HINT_ to indicate that the opcode is reserved for -microarchitectural hints (see <<rvc-hints, Section 18.7>>). +microarchitectural hints (see <<rvc-hints>>). <<< @@ -919,35 +882,23 @@ microarchitectural hints (see <<rvc-hints, Section 18.7>>). 
inst[1:0] ^.^s|000 ^.^s|001 ^.^s|010 ^.^s|011 ^.^s|100 ^.^s|101 ^.^s|110 ^.^s|111 | 2+>.^|00 .^|ADDI4SPN ^.^|FLD + -FLD + -LQ ^.^| LW ^.^| FLW + -LD + +FLD ^.^| LW ^.^| FLW + LD ^.^| _Reserved_ ^.^| FSD + -FSD + -SQ ^.^| SW ^.^| FSW + -SD + -SD +FSD ^.^| SW ^.^| FSW + +SD ^.^| RV32 + -RV64 + -RV128 +RV64 2+>.^|01 ^.^|ADDI ^.^|JAL + -ADDIW + ADDIW ^.^|LI ^.^|LUI/ADDI16SP ^.^|MISC-ALU ^.^|J ^.^|BEQZ ^.^|BNEZ ^.^|RV32 + -RV64 + -RV128 +RV64 2+>.^|10 ^.^|SLLI ^.^|FLDSP + -FLDSP + -LQSP ^.^|LWSP ^.^|FLWSP + -LDSP + +FLDSP ^.^|LWSP ^.^|FLWSP + LDSP ^.^|J[AL]R/MV/ADD ^.^|FSDSP + -FSDSP + -SQSP ^.^|SWSP ^.^|FSWSP + -SDSP + +FSDSP ^.^|SWSP ^.^|FSWSP + SDSP ^.^|RV32 + -RV64 + -RV128 +RV64 2+>.^|11 9+^|>16b |=== @@ -956,15 +907,15 @@ RV128 [[rvc-instr-table0]] .Instruction listing for RVC, Quadrant 0 -include::images/bytefield/rvc-instr-quad0.adoc[] +include::images/bytefield/rvc-instr-quad0.edn[] //include::images/bytefield/rvc-instr-quad0.png[] [[rvc-instr-table1]] .Instruction listing for RVC, Quadrant 1 -include::images/bytefield/rvc-instr-quad1.adoc[] +include::images/bytefield/rvc-instr-quad1.edn[] //include::images/bytefield/rvc-instr-quad1.png[] [[rvc-instr-table2]] .Instruction listing for RVC, Quadrant 2 -include::images/bytefield/rvc-instr-quad2.adoc[] +include::images/bytefield/rvc-instr-quad2.edn[] //include::images/bytefield/rvc-instr-quad2.png[] diff --git a/src/cmo.adoc b/src/cmo.adoc index 710106e..4a446ae 100644 --- a/src/cmo.adoc +++ b/src/cmo.adoc @@ -193,7 +193,7 @@ caches_, and each coherent cache has the following behaviors, assuming all operations are performed by the agents in a set of coherent agents: * A coherent cache is permitted to allocate and deallocate copies of a cache - block and perform read and write transfers as described in <<#memory-caches>> + block and perform read and write transfers as described in <<#memory-caches>> * A coherent cache is permitted to perform a write transfer to memory provided that a store operation has modified the 
data in the cache block since the most @@ -282,7 +282,7 @@ load. In particular, an additional condition is added to the Load Value Axiom: global memory order, either the value of the latest store to _x_ that precedes the latest clean or flush operation on _x_ or the value of any store to _x_ that both precedes _i_ and succeeds the latest clean or flush operation on _x_ - that precedes _i_ + that precedes _i_ . The value of any store to _x_ by a non-coherent agent regardless of the above conditions @@ -302,17 +302,17 @@ described in the <<#csr_state>> section, or due to the address translation and protection mechanisms. The trapping behavior of CMO instructions is described in the following sections. -===== Illegal Instruction and Virtual Instruction Exceptions +===== Illegal-Instruction and Virtual-Instruction Exceptions Cache-block management instructions and cache-block zero instructions may raise -illegal instruction exceptions or virtual instruction exceptions depending on +illegal-instruction exceptions or virtual-instruction exceptions depending on the current privilege mode and the state of the CMO control registers described in the <<#csr_state>> section. -Cache-block prefetch instructions raise neither illegal instruction exceptions -nor virtual instruction exceptions. +Cache-block prefetch instructions raise neither illegal-instruction exceptions +nor virtual-instruction exceptions. -===== Page Fault, Guest-Page Fault, and Access Fault Exceptions +===== Page-Fault, Guest-Page-Fault, and Access-Fault Exceptions Similar to load and store instructions, CMO instructions are explicit memory access instructions that compute an effective address. The effective address is @@ -352,8 +352,8 @@ instruction is permitted to access the physical addresses, but an instruction fetch is permitted to access the physical addresses, whether a cache-block management instruction is permitted to access the cache block is UNSPECIFIED. 
If access to the cache block is not permitted, a cache-block management instruction -raises a store page fault or store guest-page fault exception if address -translation does not permit any access or raises a store access fault exception +raises a store page-fault or store guest-page-fault exception if address +translation does not permit any access or raises a store access-fault exception otherwise. During address translation, the instruction also checks the accessed bit and may either raise an exception or set the bit as required. @@ -370,9 +370,9 @@ A cache-block zero instruction is permitted to access the specified cache block whenever a store instruction is permitted to access the corresponding physical addresses and when the PMAs indicate that cache-block zero instructions are a supported access type. If access to the cache block is not permitted, a -cache-block zero instruction raises a store page fault or store guest-page fault +cache-block zero instruction raises a store page-fault or store guest-page-fault exception if address translation does not permit write access or raises a store -access fault exception otherwise. During address translation, the instruction +access-fault exception otherwise. During address translation, the instruction also checks the accessed and dirty bits and may either raise an exception or set the bits as required. @@ -384,9 +384,9 @@ exceptions and shall not access any caches or memory. During address translation, the instruction does _not_ check the accessed and dirty bits and neither raises an exception nor sets the bits. -When a page fault, guest-page fault, or access fault exception is taken, the -relevant *tval CSR is written with the faulting effective address (i.e. the same -faulting address value as for other causes of these exceptions). +When a page-fault, guest-page-fault, or access-fault exception is taken, the +relevant *tval CSR is written with the faulting effective address (i.e. the +value of _rs1_). 
[NOTE] ==== @@ -400,9 +400,9 @@ management instructions like store/AMO instructions, so store/AMO exceptions are appropriate for these instructions, regardless of the permissions required._ ==== -===== Address Misaligned Exceptions +===== Address-Misaligned Exceptions -CMO instructions do _not_ generate address misaligned exceptions. +CMO instructions do _not_ generate address-misaligned exceptions. ===== Breakpoint Exceptions and Debug Mode Entry @@ -419,15 +419,15 @@ the following common trigger module behaviors:_ * Type 2 address/data match triggers, i.e. `tdata1.type=2`, should be unsupported - + * The size of a memory access equals the size of the cache block accessed, and the compare values follow from the addresses of the NAPOT memory region corresponding to the cache block containing the effective address - + * Unless an encoding for a cache block is added to the `mcontrol6.size` field, an address trigger should only match a memory access from a CBO instruction if `mcontrol6.size=0` - + _If the Zicbom extension is implemented, this specification recommends the following additional trigger module behaviors:_ @@ -477,11 +477,11 @@ instructions and cache-block zero instructions: [wavedrom, , svg] .... {reg:[ - { bits: 7, name: 'opcode'}, - { bits: 5, name: 0x0 }, - { bits: 3, name: 'funct3'}, - { bits: 5, name: 0x0}, - { bits: 12, name: 'operation'}, + { bits: 7, name: 'opcode'}, + { bits: 5, name: 0x0 }, + { bits: 3, name: 'funct3'}, + { bits: 5, name: 0x0}, + { bits: 12, name: 'operation'}, ]} .... @@ -533,13 +533,6 @@ mechanism. [#csr_state,reftext="Control and Status Register State"] === Control and Status Register State -[NOTE] -==== -_The CMO extensions rely on state in {csrname} CSRs that will be defined in a -future update to the privileged architecture. 
If this CSR update is not -ratified, the CMO extension will define its own CSRs._ -==== - Three CSRs control the execution of CMO instructions: * `m{csrname}` @@ -559,12 +552,12 @@ generic format: |=== | Bits | Name | Description -| [5:4] | `CBIE` | Cache Block Invalidate instruction Enable +| [5:4] | `CBIE` | Cache Block Invalidate instruction Enable. *WARL*. Enables the execution of the cache block invalidate instruction, `CBO.INVAL`, in a lower privilege mode: -* `00`: The instruction raises an illegal instruction or virtual instruction +* `00`: The instruction raises an illegal-instruction or virtual-instruction exception * `01`: The instruction is executed and performs a flush operation * `10`: _Reserved_ @@ -575,7 +568,7 @@ a lower privilege mode: Enables the execution of the cache block clean instruction, `CBO.CLEAN`, and the cache block flush instruction, `CBO.FLUSH`, in a lower privilege mode: -* `0`: The instruction raises an illegal instruction or virtual instruction +* `0`: The instruction raises an illegal-instruction or virtual-instruction exception * `1`: The instruction is executed @@ -584,7 +577,7 @@ cache block flush instruction, `CBO.FLUSH`, in a lower privilege mode: Enables the execution of the cache block zero instruction, `CBO.ZERO`, in a lower privilege mode: -* `0`: The instruction raises an illegal instruction or virtual instruction +* `0`: The instruction raises an illegal-instruction or virtual-instruction exception * `1`: The instruction is executed @@ -593,24 +586,24 @@ lower privilege mode: The x{csrname} registers control CBO instruction execution based on the current privilege mode and the state of the appropriate CSRs, as detailed below. 
-A `CBO.INVAL` instruction executes or raises either an illegal instruction -exception or a virtual instruction exception based on the state of the +A `CBO.INVAL` instruction executes or raises either an illegal-instruction +exception or a virtual-instruction exception based on the state of the `x{csrname}.CBIE` fields: [source,sail,subs="attributes+"] -- -// illegal instruction exceptions +// illegal-instruction exceptions if (((priv_mode != M) && (m{csrname}.CBIE == 00)) || ((priv_mode == U) && (s{csrname}.CBIE == 00))) { - <raise illegal instruction exception> + <raise illegal-instruction exception> } -// virtual instruction exceptions +// virtual-instruction exceptions else if (((priv_mode == VS) && (h{csrname}.CBIE == 00)) || ((priv_mode == VU) && ((h{csrname}.CBIE == 00) || (s{csrname}.CBIE == 00)))) { - <raise virtual instruction exception> + <raise virtual-instruction exception> } // execute instruction else @@ -647,23 +640,23 @@ either traps or performs a flush operation in a lower privileged level._ ==== A `CBO.CLEAN` or `CBO.FLUSH` instruction executes or raises an illegal -instruction or virtual instruction exception based on the state of the +instruction or virtual-instruction exception based on the state of the `x{csrname}.CBCFE` bits: [source,sail,subs="attributes+"] -- -// illegal instruction exceptions +// illegal-instruction exceptions if (((priv_mode != M) && !m{csrname}.CBCFE) || ((priv_mode == U) && !s{csrname}.CBCFE)) { - <raise illegal instruction exception> + <raise illegal-instruction exception> } -// virtual instruction exceptions +// virtual-instruction exceptions else if (((priv_mode == VS) && !h{csrname}.CBCFE) || ((priv_mode == VU) && !(h{csrname}.CBCFE && s{csrname}.CBCFE))) { - <raise virtual instruction exception> + <raise virtual-instruction exception> } // execute instruction else @@ -673,23 +666,23 @@ else -- -Finally, a `CBO.ZERO` instruction executes or raises an illegal instruction or -virtual instruction exception based on 
the state of the `x{csrname}.CBZE` bits: +Finally, a `CBO.ZERO` instruction executes or raises an illegal-instruction or +virtual-instruction exception based on the state of the `x{csrname}.CBZE` bits: [source,sail,subs="attributes+"] -- -// illegal instruction exceptions +// illegal-instruction exceptions if (((priv_mode != M) && !m{csrname}.CBZE) || ((priv_mode == U) && !s{csrname}.CBZE)) { - <raise illegal instruction exception> + <raise illegal-instruction exception> } -// virtual instruction exceptions +// virtual-instruction exceptions else if (((priv_mode == VS) && !h{csrname}.CBZE) || ((priv_mode == VU) && !(h{csrname}.CBZE && s{csrname}.CBZE))) { - <raise virtual instruction exception> + <raise virtual-instruction exception> } // execute instruction else @@ -722,14 +715,14 @@ following operations: non-coherent agents visible to the set of coherent agents at a point common to both sets by deallocating all copies of a cache block from the set of coherent caches up to that point - + * A clean operation makes data from store operations performed by the set of coherent agents visible to a set of non-coherent agents at a point common to both sets by performing a write transfer of a copy of a cache block to that point provided a coherent agent performed a store operation that modified the data in the cache block since the previous invalidate, clean, or flush operation on the cache block - + * A flush operation atomically performs a clean operation followed by an invalidate operation @@ -877,11 +870,11 @@ Encoding:: [wavedrom, , svg] .... {reg:[ - { bits: 7, name: 0xF, attr: ['MISC-MEM'] }, - { bits: 5, name: 0x0 }, - { bits: 3, name: 0x2, attr: ['CBO'] }, - { bits: 5, name: 'rs1', attr: ['base'] }, - { bits: 12, name: 0x001, attr: ['CBO.CLEAN'] }, + { bits: 7, name: 0xF, attr: ['MISC-MEM'] }, + { bits: 5, name: 0x0 }, + { bits: 3, name: 0x2, attr: ['CBO'] }, + { bits: 5, name: 'rs1', attr: ['base'] }, + { bits: 12, name: 0x001, attr: ['CBO.CLEAN'] }, ]} .... 
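The encoding diagram above can be read as a plain bit-field layout. As a rough sketch only, and not part of the specification text, the following Python packs the fields in the order the wavedrom source lists them, least-significant field first: a 7-bit opcode (0xF, MISC-MEM), a 5-bit zero field, a 3-bit funct3 (0x2, CBO), the 5-bit _rs1_ base register, and the 12-bit operation field. The names `encode_cbo` and `CBO_OPERATION` are hypothetical and chosen purely for illustration; the operation values are the ones given in the CBO encoding diagrams of this file.

```python
# Illustrative sketch: pack a CBO instruction word from the fields shown in
# the encoding diagrams. `encode_cbo` and `CBO_OPERATION` are hypothetical
# names, not part of any RISC-V toolchain.

MISC_MEM_OPCODE = 0x0F  # bits [6:0]
CBO_FUNCT3 = 0x2        # bits [14:12]

# 12-bit operation field, bits [31:20], as given in the encoding diagrams.
CBO_OPERATION = {
    "cbo.inval": 0x000,
    "cbo.clean": 0x001,
    "cbo.flush": 0x002,
    "cbo.zero": 0x004,
}


def encode_cbo(mnemonic: str, rs1: int) -> int:
    """Return the 32-bit instruction word for a CBO instruction."""
    if not 0 <= rs1 <= 31:
        raise ValueError("rs1 must name one of x0..x31")
    operation = CBO_OPERATION[mnemonic]
    return (
        (operation << 20)
        | (rs1 << 15)
        | (CBO_FUNCT3 << 12)
        | (0 << 7)  # bits [11:7] are zero in all four encodings
        | MISC_MEM_OPCODE
    )


if __name__ == "__main__":
    # cbo.clean with base register x10
    print(f"{encode_cbo('cbo.clean', 10):#010x}")
```

Keeping the operation values in a table means the four instructions differ only in the top 12 bits, which mirrors how the diagrams differ only in their last field.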
@@ -920,22 +913,26 @@ Encoding:: [wavedrom, , svg] .... {reg:[ - { bits: 7, name: 0xF, attr: ['MISC-MEM'] }, - { bits: 5, name: 0x0 }, - { bits: 3, name: 0x2, attr: ['CBO'] }, - { bits: 5, name: 'rs1', attr: ['base'] }, - { bits: 12, name: 0x002, attr: ['CBO.FLUSH'] }, + { bits: 7, name: 0xF, attr: ['MISC-MEM'] }, + { bits: 5, name: 0x0 }, + { bits: 3, name: 0x2, attr: ['CBO'] }, + { bits: 5, name: 'rs1', attr: ['base'] }, + { bits: 12, name: 0x002, attr: ['CBO.FLUSH'] }, ]} .... Description:: A *cbo.flush* instruction performs a flush operation on the cache block whose -effective address is the base address specified in _rs1_. The offset operand may -be omitted; otherwise, any expression that computes the offset shall evaluate to -zero. The instruction operates on the set of coherent caches accessed by the +that contains the address specified in _rs1_. It is not required that _rs1_ is +aligned to the size of a cache block. On faults, the faulting virtual address +is considered to be the value in rs1, rather than the base address of the cache +block. The instruction operates on the set of coherent caches accessed by the agent executing the instruction. +The assembly _offset_ operand may be omitted. If it isn't then any expression +that computes the offset shall evaluate to zero. + Operation:: [source,sail] -- @@ -955,23 +952,28 @@ Encoding:: [wavedrom, , svg] .... {reg:[ - { bits: 7, name: 0xF, attr: ['MISC-MEM'] }, - { bits: 5, name: 0x0 }, - { bits: 3, name: 0x2, attr: ['CBO'] }, - { bits: 5, name: 'rs1', attr: ['base'] }, - { bits: 12, name: 0x000, attr: ['CBO.INVAL'] }, + { bits: 7, name: 0xF, attr: ['MISC-MEM'] }, + { bits: 5, name: 0x0 }, + { bits: 3, name: 0x2, attr: ['CBO'] }, + { bits: 5, name: 'rs1', attr: ['base'] }, + { bits: 12, name: 0x000, attr: ['CBO.INVAL'] }, ]} .... Description:: A *cbo.inval* instruction performs an invalidate operation on the cache block -whose effective address is the base address specified in _rs1_. 
The offset -operand may be omitted; otherwise, any expression that computes the offset shall -evaluate to zero. The instruction operates on the set of coherent caches -accessed by the agent executing the instruction. Depending on CSR programming, -the instruction may perform a flush operation instead of an invalidate -operation. +that contains the address specified in _rs1_. It is not required that _rs1_ is +aligned to the size of a cache block. On faults, the faulting virtual address +is considered to be the value in rs1, rather than the base address of the cache +block. The instruction operates on the set of coherent caches accessed by the +agent executing the instruction. + +Depending on CSR programming, the instruction may perform a flush operation +instead of an invalidate operation. + +The assembly _offset_ operand may be omitted. If it isn't then any expression +that computes the offset shall evaluate to zero. [NOTE] ==== @@ -1000,21 +1002,25 @@ Encoding:: [wavedrom, , svg] .... {reg:[ - { bits: 7, name: 0xF, attr: ['MISC-MEM'] }, - { bits: 5, name: 0x0 }, - { bits: 3, name: 0x2, attr: ['CBO'] }, - { bits: 5, name: 'rs1', attr: ['base'] }, - { bits: 12, name: 0x004, attr: ['CBO.ZERO'] }, + { bits: 7, name: 0xF, attr: ['MISC-MEM'] }, + { bits: 5, name: 0x0 }, + { bits: 3, name: 0x2, attr: ['CBO'] }, + { bits: 5, name: 'rs1', attr: ['base'] }, + { bits: 12, name: 0x004, attr: ['CBO.ZERO'] }, ]} .... Description:: A *cbo.zero* instruction performs stores of zeros to the full set of bytes -corresponding to the cache block whose effective address is the base address -specified in _rs1_. The offset operand may be omitted; otherwise, any expression -that computes the offset shall evaluate to zero. An implementation may or may -not update the entire set of bytes atomically. +corresponding to the cache block that contains the address specified in _rs1_. +It is not required that _rs1_ is aligned to the size of a cache block. 
On +faults, the faulting virtual address is considered to be the value in rs1, +rather than the base address of the cache block. An implementation may or +may not update the entire set of bytes atomically. + +The assembly _offset_ operand may be omitted. If it isn't then any expression +that computes the offset shall evaluate to zero. Operation:: [source,sail] @@ -1036,12 +1042,12 @@ Encoding:: [wavedrom, , svg] .... {reg:[ - { bits: 7, name: 0x13, attr: ['OP-IMM'] }, - { bits: 5, name: 0x0, attr: ['offset[4:0]'] }, - { bits: 3, name: 0x6, attr: ['ORI'] }, - { bits: 5, name: 'rs1', attr: ['base'] }, - { bits: 5, name: 0x0, attr: ['PREFETCH.I'] }, - { bits: 7, name: 'imm[11:5]', attr: ['offset[11:5]'] }, + { bits: 7, name: 0x13, attr: ['OP-IMM'] }, + { bits: 5, name: 0x0, attr: ['offset[4:0]'] }, + { bits: 3, name: 0x6, attr: ['ORI'] }, + { bits: 5, name: 'rs1', attr: ['base'] }, + { bits: 5, name: 0x0, attr: ['PREFETCH.I'] }, + { bits: 7, name: 'imm[11:5]', attr: ['offset[11:5]'] }, ]} .... @@ -1079,12 +1085,12 @@ Encoding:: [wavedrom, , svg] .... {reg:[ - { bits: 7, name: 0x13, attr: ['OP-IMM'] }, - { bits: 5, name: 0x0, attr: ['offset[4:0]'] }, - { bits: 3, name: 0x6, attr: ['ORI'] }, - { bits: 5, name: 'rs1', attr: ['base'] }, - { bits: 5, name: 0x1, attr: ['PREFETCH.R'] }, - { bits: 7, name: 'imm[11:5]', attr: ['offset[11:5]'] }, + { bits: 7, name: 0x13, attr: ['OP-IMM'] }, + { bits: 5, name: 0x0, attr: ['offset[4:0]'] }, + { bits: 3, name: 0x6, attr: ['ORI'] }, + { bits: 5, name: 'rs1', attr: ['base'] }, + { bits: 5, name: 0x1, attr: ['PREFETCH.R'] }, + { bits: 7, name: 'imm[11:5]', attr: ['offset[11:5]'] }, ]} .... @@ -1122,12 +1128,12 @@ Encoding:: [wavedrom, , svg] .... 
{reg:[ - { bits: 7, name: 0x13, attr: ['OP-IMM'] }, - { bits: 5, name: 0x0, attr: ['offset[4:0]'] }, - { bits: 3, name: 0x6, attr: ['ORI'] }, - { bits: 5, name: 'rs1', attr: ['base'] }, - { bits: 5, name: 0x3, attr: ['PREFETCH.W'] }, - { bits: 7, name: 'imm[11:5]', attr: ['offset[11:5]'] }, + { bits: 7, name: 0x13, attr: ['OP-IMM'] }, + { bits: 5, name: 0x0, attr: ['offset[4:0]'] }, + { bits: 3, name: 0x6, attr: ['ORI'] }, + { bits: 5, name: 'rs1', attr: ['base'] }, + { bits: 5, name: 0x3, attr: ['PREFETCH.W'] }, + { bits: 7, name: 'imm[11:5]', attr: ['offset[11:5]'] }, ]} .... diff --git a/src/colophon.adoc b/src/colophon.adoc index b7b52af..66fc078 100644 --- a/src/colophon.adoc +++ b/src/colophon.adoc @@ -1,6 +1,13 @@ [colophon] -= Preface - +== Preface +// Had to make the above a level 1 heading (two equals signs) to avoid error when building +// the ISA manual as a book with other "parts". This is opposite to what the adoc says to do +// but otherwise asciidoctor creates the error message: +// +// asciidoctor: ERROR: ext/riscv-isa-manual/src/colophon.adoc: line 2: invalid part, must have at least one section (e.g., chapter, appendix, etc.) +// +// See asciidoctor doc which seems wrong: https://docs.asciidoctor.org/asciidoc/latest/sections/colophon/ +[.big]*_Preface to Document Version 20250508_* This document describes the RISC-V unprivileged architecture. 
@@ -18,7 +25,6 @@ The document contains the following versions of the RISC-V ISA modules: |*RV32E* |*2.0* |*Ratified* |*RV64E* |*2.0* |*Ratified* |*RV64I* |*2.1* |*Ratified* -|_RV128I_ |_1.7_ |_Draft_ h|Extension h|Version h|Status @@ -29,11 +35,13 @@ h|Extension h|Version h|Status |*Zihintpause* |*2.0* |*Ratified* |*Zimop* | *1.0* | *Ratified* |*Zicond* | *1.0* |*Ratified* +|*Zilsd* | *1.0* |*Ratified* |*M* |*2.0* |*Ratified* |*Zmmul* |*1.0* |*Ratified* |*A* |*2.1* |*Ratified* |*Zawrs* |*1.01* |*Ratified* -|*Zacas* |*1.0* |*Ratifed* +|*Zacas* |*1.0* |*Ratified* +|*Zabha* |*1.0* |*Ratified* |*RVWMO* |*2.0* |*Ratified* |*Ztso* |*1.0* |*Ratified* |*CMO* |*1.0* |*Ratified* @@ -42,6 +50,7 @@ h|Extension h|Version h|Status |*Q* |*2.2* |*Ratified* |*Zfh* |*1.0* |*Ratified* |*Zfhmin* |*1.0* |*Ratified* +|*BF16* |*1.0* |*Ratified* |*Zfa* |*1.0* |*Ratified* |*Zfinx* |*1.0* |*Ratified* |*Zdinx* |*1.0* |*Ratified* @@ -49,8 +58,80 @@ h|Extension h|Version h|Status |*Zhinxmin* |*1.0* |*Ratified* |*C* |*2.0* |*Ratified* |*Zce* |*1.0* |*Ratified* +|*Zclsd* |*1.0* |*Ratified* +|*B* |*1.0* |*Ratified* +|*V* |*1.0* |*Ratified* +|*Zbkb* |*1.0* |*Ratified* +|*Zbkc* |*1.0* |*Ratified* +|*Zbkx* |*1.0* |*Ratified* +|*Zk* |*1.0* |*Ratified* +|*Zks* |*1.0* |*Ratified* +|*Zvbb* |*1.0* |*Ratified* +|*Zvbc* |*1.0* |*Ratified* +|*Zvkg* |*1.0* |*Ratified* +|*Zvkned* |*1.0* |*Ratified* +|*Zvknhb* |*1.0* |*Ratified* +|*Zvksed* |*1.0* |*Ratified* +|*Zvksh* |*1.0* |*Ratified* +|*Zvkt* |*1.0* |*Ratified* +|*Zicfiss* |*1.0* |*Ratified* +|*Zicfilp* |*1.0* |*Ratified* +|=== + +The changes in this version of the document include: + +* The inclusion of all ratified extensions through May 2025. +* Removal of all unratified material. +* Addition of the BFloat16-preceision Floating Point extension. +* Addition of the Zabha extension for Byte and Halfword Atomic Memory Operations. + + +[.big]*_Preface to Document Version 20240411_* + +This document describes the RISC-V unprivileged architecture. 
+It contains the following versions of the RISC-V ISA modules: + +[%autowidth,float="center",align="center",cols="^,<,^",options="header"] +|=== +|Base |Version |Status +|*RV32I* |*2.1* |*Ratified* +|*RV32E* |*2.0* |*Ratified* +|*RV64E* |*2.0* |*Ratified* +|*RV64I* |*2.1* |*Ratified* + +h|Extension h|Version h|Status + +|*Zifencei* |*2.0* |*Ratified* +|*Zicsr* |*2.0* |*Ratified* +|*Zicntr* |*2.0* |*Ratified* +|*Zihintntl* |*1.0* |*Ratified* +|*Zihintpause* |*2.0* |*Ratified* +|*Zimop* | *1.0* | *Ratified* +|*Zicond* | *1.0* |*Ratified* +|*Zilsd* | *1.0* |*Ratified* +|*M* |*2.0* |*Ratified* +|*Zmmul* |*1.0* |*Ratified* +|*A* |*2.1* |*Ratified* +|*Zawrs* |*1.01* |*Ratified* +|*Zacas* |*1.0* |*Ratified* +|*Zabha* |*1.0* |*Ratified* +|*RVWMO* |*2.0* |*Ratified* +|*Ztso* |*1.0* |*Ratified* +|*CMO* |*1.0* |*Ratified* +|*F* |*2.2* |*Ratified* +|*D* |*2.2* |*Ratified* +|*Q* |*2.2* |*Ratified* +|*Zfh* |*1.0* |*Ratified* +|*Zfhmin* |*1.0* |*Ratified* +|*Zfa* |*1.0* |*Ratified* +|*Zfinx* |*1.0* |*Ratified* +|*Zdinx* |*1.0* |*Ratified* +|*Zhinx* |*1.0* |*Ratified* +|*Zhinxmin* |*1.0* |*Ratified* +|*C* |*2.0* |*Ratified* +|*Zce* |*1.0* |*Ratified* +|*Zclsd* |*1.0* |*Ratified* |*B* |*1.0* |*Ratified* -|_P_ |_0.2_ |_Draft_ |*V* |*1.0* |*Ratified* |*Zbkb* |*1.0* |*Ratified* |*Zbkc* |*1.0* |*Ratified* @@ -65,13 +146,16 @@ h|Extension h|Version h|Status |*Zvksed* |*1.0* |*Ratified* |*Zvksh* |*1.0* |*Ratified* |*Zvkt* |*1.0* |*Ratified* +|*Zicfiss* |*1.0* |*Ratified* +|*Zicfilp* |*1.0* |*Ratified* |=== The changes in this version of the document include: -* The inclusion of all ratified extensions through March 2024. +* The inclusion of all ratified extensions through February 2025. * The draft Zam extension has been removed, in favor of the definition of a misaligned atomicity granule PMA. * The concept of vacant memory regions has been superseded by inaccessible memory or I/O regions. +* The removal of unratified content, including the sketch of the RV128I base ISA. 
[.big]*_Preface to Document Version 20191213-Base-Ratified_* @@ -209,7 +293,7 @@ inaccuracies. floating-point instructions in the 2-bit _fmt field._ * Defined the signed-zero behavior of FMIN._fmt_ and FMAX._fmt_, and changed their behavior on signaling-NaN inputs to conform to the -minimumNumber and maximumNumber operations in the proposed IEEE 754-201x +`minimumNumber` and `maximumNumber` operations in the proposed IEEE 754-201x specification. * The memory consistency model, RVWMO, has been defined. * The "Zam" extension, which permits misaligned AMOs and specifies @@ -378,4 +462,3 @@ added. allocated for user-defined custom extensions. * A typographical error that suggested that stores source their data from _rd_ has been corrected to refer to _rs2_. - diff --git a/src/counters-f.adoc b/src/counters-f.adoc deleted file mode 100644 index 4678d78..0000000 --- a/src/counters-f.adoc +++ /dev/null @@ -1,167 +0,0 @@ -== Counters - -RISC-V ISAs provide a set of up to 32latexmath:[$\times$]64-bit -performance counters and timers that are accessible via unprivileged -XLEN read-only CSR registers `0xC00`–`0xC1F` (with the upper 32 bits -accessed via CSR registers `0xC80`–`0xC9F` on RV32). The first three of -these (CYCLE, TIME, and INSTRET) have dedicated functions (cycle count, -real-time clock, and instructions-retired respectively), while the -remaining counters, if implemented, provide programmable event counting. - -=== Base Counters and Timers - -M@R@F@R@S + -& & & & + -& & & & + -& 5 & 3 & 5 & 7 + -RDCYCLE[H] & 0 & CSRRS & dest & SYSTEM + -RDTIME[H] & 0 & CSRRS & dest & SYSTEM + -RDINSTRET[H] & 0 & CSRRS & dest & SYSTEM + - -RV32I provides a number of 64-bit read-only user-level counters, which -are mapped into the 12-bit CSR address space and accessed in 32-bit -pieces using CSRRS instructions. In RV64I, the CSR instructions can -manipulate 64-bit CSRs. 
In particular, the RDCYCLE, RDTIME, and -RDINSTRET pseudoinstructions read the full 64 bits of the `cycle`, -`time`, and `instret` counters. Hence, the RDCYCLEH, RDTIMEH, and -RDINSTRETH instructions are RV32I-only. - -Some execution environments might prohibit access to counters to impede -timing side-channel attacks. - -The RDCYCLE pseudoinstruction reads the low XLEN bits of the ` cycle` -CSR which holds a count of the number of clock cycles executed by the -processor core on which the hart is running from an arbitrary start time -in the past. RDCYCLEH is an RV32I-only instruction that reads bits 63–32 -of the same cycle counter. The underlying 64-bit counter should never -overflow in practice. The rate at which the cycle counter advances will -depend on the implementation and operating environment. The execution -environment should provide a means to determine the current rate -(cycles/second) at which the cycle counter is incrementing. - -RDCYCLE is intended to return the number of cycles executed by the -processor core, not the hart. Precisely defining what is a ``core'' is -difficult given some implementation choices (e.g., AMD Bulldozer). -Precisely defining what is a ``clock cycle'' is also difficult given the -range of implementations (including software emulations), but the intent -is that RDCYCLE is used for performance monitoring along with the other -performance counters. In particular, where there is one hart/core, one -would expect cycle-count/instructions-retired to measure CPI for a hart. - -Cores don’t have to be exposed to software at all, and an implementor -might choose to pretend multiple harts on one physical core are running -on separate cores with one hart/core, and provide separate cycle -counters for each hart. This might make sense in a simple barrel -processor (e.g., CDC 6600 peripheral processors) where inter-hart timing -interactions are non-existent or minimal. 
- -Where there is more than one hart/core and dynamic multithreading, it is -not generally possible to separate out cycles per hart (especially with -SMT). It might be possible to define a separate performance counter that -tried to capture the number of cycles a particular hart was running, but -this definition would have to be very fuzzy to cover all the possible -threading implementations. For example, should we only count cycles for -which any instruction was issued to execution for this hart, and/or -cycles any instruction retired, or include cycles this hart was -occupying machine resources but couldn’t execute due to stalls while -other harts went into execution? Likely, ``all of the above'' would be -needed to have understandable performance stats. This complexity of -defining a per-hart cycle count, and also the need in any case for a -total per-core cycle count when tuning multithreaded code led to just -standardizing the per-core cycle counter, which also happens to work -well for the common single hart/core case. - -Standardizing what happens during ``sleep'' is not practical given that -what ``sleep'' means is not standardized across execution environments, -but if the entire core is paused (entirely clock-gated or powered-down -in deep sleep), then it is not executing clock cycles, and the cycle -count shouldn’t be increasing per the spec. There are many details, -e.g., whether clock cycles required to reset a processor after waking up -from a power-down event should be counted, and these are considered -execution-environment-specific details. - -Even though there is no precise definition that works for all platforms, -this is still a useful facility for most platforms, and an imprecise, -common, ``usually correct'' standard here is better than no standard. -The intent of RDCYCLE was primarily performance monitoring/tuning, and -the specification was written with that goal in mind. 
- -The RDTIME pseudoinstruction reads the low XLEN bits of the ` time` CSR, -which counts wall-clock real time that has passed from an arbitrary -start time in the past. RDTIMEH is an RV32I-only instruction that reads -bits 63–32 of the same real-time counter. The underlying 64-bit counter -should never overflow in practice. The execution environment should -provide a means of determining the period of the real-time counter -(seconds/tick). The period must be constant. The real-time clocks of all -harts in a single user application should be synchronized to within one -tick of the real-time clock. The environment should provide a means to -determine the accuracy of the clock. - -On some simple platforms, cycle count might represent a valid -implementation of RDTIME, but in this case, platforms should implement -the RDTIME instruction as an alias for RDCYCLE to make code more -portable, rather than using RDCYCLE to measure wall-clock time. - -The RDINSTRET pseudoinstruction reads the low XLEN bits of the -` instret` CSR, which counts the number of instructions retired by this -hart from some arbitrary start point in the past. RDINSTRETH is an -RV32I-only instruction that reads bits 63–32 of the same instruction -counter. The underlying 64-bit counter should never overflow in -practice. - -The following code sequence will read a valid 64-bit cycle counter value -into `x3`:`x2`, even if the counter overflows its lower half between -reading its upper and lower halves. - -.... - again: - rdcycleh x3 - rdcycle x2 - rdcycleh x4 - bne x3, x4, again -.... - -We recommend provision of these basic counters in implementations as -they are essential for basic performance analysis, adaptive and dynamic -optimization, and to allow an application to work with real-time -streams. Additional counters should be provided to help diagnose -performance problems and these should be made accessible from user-level -application code with low overhead. 
- -We required the counters be 64 bits wide, even on RV32, as otherwise it -is very difficult for software to determine if values have overflowed. -For a low-end implementation, the upper 32 bits of each counter can be -implemented using software counters incremented by a trap handler -triggered by overflow of the lower 32 bits. The sample code described -above shows how the full 64-bit width value can be safely read using the -individual 32-bit instructions. - -In some applications, it is important to be able to read multiple -counters at the same instant in time. When run under a multitasking -environment, a user thread can suffer a context switch while attempting -to read the counters. One solution is for the user thread to read the -real-time counter before and after reading the other counters to -determine if a context switch occurred in the middle of the sequence, in -which case the reads can be retried. We considered adding output latches -to allow a user thread to snapshot the counter values atomically, but -this would increase the size of the user context, especially for -implementations with a richer set of counters. - -=== Hardware Performance Counters - -There is CSR space allocated for 29 additional unprivileged 64-bit -hardware performance counters, `hpmcounter3`–`hpmcounter31`. For RV32, -the upper 32 bits of these performance counters is accessible via -additional CSRs `hpmcounter3h`–` hpmcounter31h`. These counters count -platform-specific events and are configured via additional privileged -registers. The number and width of these additional counters, and the -set of events they count is platform-specific. - -The privileged architecture manual describes the privileged CSRs -controlling access to these counters and to set the events to be -counted. 
- -It would be useful to eventually standardize event settings to count -ISA-level metrics, such as the number of floating-point instructions -executed for example, and possibly a few common microarchitectural -metrics, such as ``L1 instruction cache misses''. diff --git a/src/counters.adoc b/src/counters.adoc index f4a34af..a56ce02 100644 --- a/src/counters.adoc +++ b/src/counters.adoc @@ -14,7 +14,7 @@ counters (CYCLE, TIME, and INSTRET), which have dedicated functions (cycle count, real-time clock, and instructions retired, respectively). The Zicntr extension depends on the Zicsr extension. -[TIP] +[NOTE] ==== We recommend provision of these basic counters in implementations as they are essential for basic performance analysis, adaptive and dynamic @@ -27,7 +27,7 @@ Some execution environments might prohibit access to counters, for example, to impede timing side-channel attacks. ==== -include::images/wavedrom/counters-diag.adoc[] +include::images/wavedrom/counters-diag.edn[] For base ISAs with XLEN≥64, CSR instructions can access @@ -35,7 +35,7 @@ the full 64-bit CSRs directly. In particular, the RDCYCLE, RDTIME, and RDINSTRET pseudoinstructions read the full 64 bits of the `cycle`, `time`, and `instret` counters. -[TIP] +[NOTE] ==== The counter pseudoinstructions are mapped to the read-only `csrrs rd, counter, x0` canonical form, but the other read-only CSR @@ -47,7 +47,7 @@ For base ISAs with XLEN=32, the Zicntr extension enables the three RDTIME, and RDINSTRET pseudoinstructions provide the lower 32 bits, and the RDCYCLEH, RDTIMEH, and RDINSTRETH pseudoinstructions provide the upper 32 bits of the respective counters. -[TIP] +[NOTE] ==== We required the counters be 64 bits wide, even when XLEN=32, as otherwise it is very difficult for software to determine if values have @@ -67,7 +67,7 @@ overflow in practice. The rate at which the cycle counter advances will depend on the implementation and operating environment. 
The execution environment should provide a means to determine the current rate (cycles/second) at which the cycle counter is incrementing. -[TIP] +[NOTE] ==== RDCYCLE is intended to return the number of cycles executed by the processor core, not the hart. Precisely defining what is a "core" is @@ -78,7 +78,7 @@ is that RDCYCLE is used for performance monitoring along with the other performance counters. In particular, where there is one hart/core, one would expect cycle-count/instructions-retired to measure CPI for a hart. -Cores don't have to be exposed to software at all, and an implementor +Cores don't have to be exposed to software at all, and an implementer might choose to pretend multiple harts on one physical core are running on separate cores with one hart/core, and provide separate cycle counters for each hart. This might make sense in a simple barrel @@ -128,7 +128,7 @@ should be constant within a small error bound. The environment should provide a means to determine the accuracy of the clock (i.e., the maximum relative error between the nominal and actual real-time clock periods). -[TIP] +[NOTE] ==== On some simple platforms, cycle count might represent a valid implementation of RDTIME, in which case RDTIME and RDCYCLE may return @@ -141,12 +141,28 @@ bound should be set based on the requirements of the platform. The real-time clocks of all harts must be synchronized to within one tick of the real-time clock. -[TIP] +[NOTE] ==== As with other architectural mandates, it suffices to appear "as if" harts are synchronized to within one tick of the real-time clock, i.e., software is unable to observe that there is a greater delta between the real-time clock values observed on two harts. + +If, for example, the real-time clock increments at a frequency of 1 GHz, then +all harts must appear to be synchronized to within 1 nsec. 
+But it is also acceptable for this example implementation to only update the +real-time clock at, say, a frequency of 100 MHz with increments of 10 ticks. +As long as software cannot observe this seeming violation of the above +synchronization requirement, and software always observes time across harts to +be monotonically nondecreasing, then this implementation is compliant. + +A platform spec may then, for example, specify an apparent real-time clock +tick frequency (e.g. 1 GHz) and also a minimum update frequency (e.g. 100 MHz) +at which updated time values are guaranteed to be observable by software. +Software may read time more frequently, but it should only observe +monotonically nondecreasing values and it should observe a new value at least +once every 10 ns (corresponding to the 100 MHz update frequency in this +example). ==== The RDINSTRET pseudoinstruction reads the low XLEN bits of the `instret` CSR, which counts the number of instructions retired by this @@ -154,7 +170,7 @@ hart from some arbitrary start point in the past. RDINSTRETH is only present when XLEN=32 and reads bits 63-32 of the same instruction counter. The underlying 64-bit counter should never overflow in practice. -[TIP] +[NOTE] ==== Instructions that cause synchronous exceptions, including ECALL and EBREAK, are not considered to retire and hence do not increment the @@ -180,7 +196,7 @@ hardware performance counters, `hpmcounter3-hpmcounter31`. When XLEN=32, the upper 32 bits of these performance counters are accessible via additional CSRs `hpmcounter3h- hpmcounter31h`. The Zihpm extension depends on the Zicsr extension. -[TIP] +[NOTE] ==== In some applications, it is important to be able to read multiple counters at the same instant in time. When run under a multitasking @@ -202,7 +218,7 @@ exception or may return a constant value. 
The execution environment should provide a means to determine the number and width of the implemented counters, and an interface to configure the events to be counted by each counter. -[TIP] +[NOTE] ==== For execution environments implemented on RISC-V privileged platforms, the privileged architecture manual describes privileged CSRs controlling diff --git a/src/d-st-ext.adoc b/src/d-st-ext.adoc index 7c5eb4c..9cfd49b 100644 --- a/src/d-st-ext.adoc +++ b/src/d-st-ext.adoc @@ -39,7 +39,7 @@ floating-point register but has to be able to save and restore the register values, hence the result of using wider operations to transfer narrower values has to be defined. A common case is for callee-saved registers, but a standard convention is also desirable for features -including varargs, user-level threading libraries, virtual machine +including variadic functions, user-level threading libraries, virtual machine migration, and debugging. ==== @@ -58,7 +58,7 @@ so, the _n_ least-significant bits of the input are used as the input value, otherwise the input value is treated as an _n_-bit canonical NaN. -[TIP] +[NOTE] ==== Earlier versions of this document did not define the behavior of feeding the results of narrower or wider operands into an operation, except to @@ -87,7 +87,7 @@ but the datapath and latency costs are minimal. The recoding process has to handle shifting of input subnormal values for wide operands in any case, and extracting the NaN-boxed value is a similar process to normalization except for skipping over leading-1 bits instead of -skipping over leading-0 bits, allowing the datapath muxing to be shared. +skipping over leading-0 bits, allowing the datapath multiplexing to be shared. ==== [[fld_fsd]] @@ -103,7 +103,7 @@ value from the floating-point registers to memory. The double-precision value may be a NaN-boxed single-precision value. 
==== -include::images/wavedrom/double-ls.adoc[] +include::images/wavedrom/double-ls.edn[] [[double-ls]] //.Double-precision load and store @@ -119,7 +119,7 @@ The double-precision floating-point computational instructions are defined analogously to their single-precision counterparts, but operate on double-precision operands and produce double-precision results. -include::images/wavedrom/double-fl-compute.adoc[] +include::images/wavedrom/double-fl-compute.edn[] [[fl-compute]] //.Double-precision float computational @@ -143,7 +143,7 @@ All floating-point to integer and integer to floating-point conversion instructions round according to the _rm_ field. Note FCVT.D.W[U] always produces an exact result and is unaffected by rounding mode. -include::images/wavedrom/double-fl-convert-mv.adoc[] +include::images/wavedrom/double-fl-convert-mv.edn[] [[fl-convert-mv]] //.Double-precision float convert and move @@ -157,7 +157,7 @@ never round. (((double-precision, to single-precision))) (((single-precision, to double-precision ))) -include::images/wavedrom/fcvt-sd-ds.adoc[] +include::images/wavedrom/fcvt-sd-ds.edn[] [[fcvt-sd-ds]] //.Double-precision FCVT.S.D and FCVT.D.S @@ -166,7 +166,7 @@ FSGNJN.D, and FSGNJX.D are defined analogously to the single-precision sign-injection instruction. //FSGNJ.D, FSGNJN.D, and FSGNJX.D -include::images/wavedrom/fsjgnjnx-d.adoc[] +include::images/wavedrom/fsjgnjnx-d.edn[] //.Double-precision sign-injection For XLEN≥64 only, instructions are provided to move bit @@ -180,11 +180,11 @@ register _rd_. FMV.X.D and FMV.D.X do not modify the bits being transferred; in particular, the payloads of non-canonical NaNs are preserved. 
-include::images/wavedrom/d-xwwx.adoc[] +include::images/wavedrom/d-xwwx.edn[] [[fmvxddx]] //.Double-precision float move to _rd_ -[TIP] +[NOTE] ==== Early versions of the RISC-V ISA had additional instructions to allow RV32 systems to transfer between the upper and lower portions of a @@ -214,7 +214,7 @@ analogously to their single-precision counterparts, but operate on double-precision operands. (((floating-point, compare))) -include::images/wavedrom/double-fl-compare.adoc[] +include::images/wavedrom/double-fl-compare.edn[] [[fl-compare]] //.Double-precision float compare @@ -225,8 +225,6 @@ defined analogously to its single-precision counterpart, but operates on double-precision operands. (((floating-point, classify))) -include::images/wavedrom/double-fl-class.adoc[] +include::images/wavedrom/double-fl-class.edn[] [[fl-class]] //.Double-precision float classify - - diff --git a/src/example/sgemm.S b/src/example/sgemm.S index e29cc8d..328149d 100644 --- a/src/example/sgemm.S +++ b/src/example/sgemm.S @@ -51,7 +51,7 @@ sgemm_nn: sd s1, OFFSET(sp) sd s2, OFFSET(sp) - # Check for zero size matrices + # Check for zero size matrices beqz n, exit beqz m, exit beqz k, exit @@ -73,12 +73,12 @@ c_row_loop: # Loop across rows of C blocks mv cnp, cp # Initialize C n-loop pointer c_col_loop: # Loop across one row of C blocks - vsetvli nvl, nt, e32, ta, ma # 32-bit vectors, LMUL=1 + vsetvli nvl, nt, e32, m1, ta, ma # 32-bit vectors, LMUL=1 mv akp, ap # reset pointer into A to beginning mv bkp, bnp # step to next column in B matrix - # Initalize current C submatrix block from memory. + # Initialize current C submatrix block from memory. 
vle32.v v0, (cnp); add ccp, cnp, cstride; vle32.v v1, (ccp); add ccp, ccp, cstride; vle32.v v2, (ccp); add ccp, ccp, cstride; @@ -169,7 +169,7 @@ c_col_loop: # Loop across one row of C blocks j k_loop 1: vfmacc.vf v15, ft15, v16 - + # Save C matrix block back to memory vse32.v v0, (cnp); add ccp, cnp, cstride; vse32.v v1, (ccp); add ccp, ccp, cstride; @@ -204,7 +204,7 @@ c_col_loop: # Loop across one row of C blocks add ap, ap, t6 # Move A matrix pointer down 16 rows slli t6, cstride, 4 # Multiply cstride by 16 add cp, cp, t6 # Move C matrix pointer down 16 rows - + slti t6, m, 16 beqz t6, c_row_loop diff --git a/src/example/strcmp.s b/src/example/strcmp.s index c657703..5e9177c 100644 --- a/src/example/strcmp.s +++ b/src/example/strcmp.s @@ -15,10 +15,10 @@ loop: vmseq.vi v0, v8, 0 # Flag zero bytes in src1 vmsne.vv v1, v8, v16 # Flag if src1 != src2 vmor.mm v0, v0, v1 # Combine exit conditions - + vfirst.m a2, v0 # ==0 or != ? csrr t1, vl # Get number of bytes fetched - + bltz a2, loop # Loop if all same and no zero byte add a0, a0, a2 # Get src1 element address @@ -30,5 +30,3 @@ loop: sub a0, a3, a4 # Return value. 
ret - - diff --git a/src/example/vvaddint32.s b/src/example/vvaddint32.s index 22305d9..34d849b 100644 --- a/src/example/vvaddint32.s +++ b/src/example/vvaddint32.s @@ -8,7 +8,7 @@ # a0 = n, a1 = x, a2 = y, a3 = z # Non-vector instructions are indented vvaddint32: - vsetvli t0, a0, e32, ta, ma # Set vector length based on 32-bit vectors + vsetvli t0, a0, e32, m1, ta, ma # Set vector length based on 32-bit vectors vle32.v v0, (a1) # Get first vector sub a0, a0, t0 # Decrement number done slli t0, t0, 2 # Multiply number done by 4 bytes diff --git a/src/extending.adoc b/src/extending.adoc deleted file mode 100644 index 9124a26..0000000 --- a/src/extending.adoc +++ /dev/null @@ -1,370 +0,0 @@ -[[extending]] -== Extending RISC-V - -In addition to supporting standard general-purpose software development, -another goal of RISC-V is to provide a basis for more specialized -instruction-set extensions or more customized accelerators. The -instruction encoding spaces and optional variable-length instruction -encoding are designed to make it easier to leverage software development -effort for the standard ISA toolchain when building more customized -processors. For example, the intent is to continue to provide full -software support for implementations that only use the standard I base, -perhaps together with many non-standard instruction-set extensions. - -This chapter describes various ways in which the base RISC-V ISA can be -extended, together with the scheme for managing instruction-set -extensions developed by independent groups. This volume only deals with -the unprivileged ISA, although the same approach and terminology is used -for supervisor-level extensions described in the second volume. - -=== Extension Terminology - -This section defines some standard terminology for describing RISC-V -extensions. - -==== Standard versus Non-Standard Extension - -Any RISC-V processor implementation must support a base integer ISA -(RV32I, RV32E, RV64I, RV64E, or RV128I). 
In addition, an implementation may -support one or more extensions. We divide extensions into two broad -categories: _standard_ versus _non-standard_. - -* A standard extension is one that is generally useful and that is -designed to not conflict with any other standard extension. Currently, -"MAFDQLCBTPV", described in other chapters of this manual, are either -complete or planned standard extensions. -* A non-standard extension may be highly specialized and may conflict -with other standard or non-standard extensions. We anticipate a wide -variety of non-standard extensions will be developed over time, with -some eventually being promoted to standard extensions. - -==== Instruction Encoding Spaces and Prefixes - -An instruction encoding space is some number of instruction bits within -which a base ISA or ISA extension is encoded. RISC-V supports varying -instruction lengths, but even within a single instruction length, there -are various sizes of encoding space available. For example, the base -ISAs are defined within a 30-bit encoding space (bits 31-2 of the 32-bit -instruction), while the atomic extension "A" fits within a 25-bit -encoding space (bits 31-7). - -We use the term _prefix_ to refer to the bits to the _right_ of an -instruction encoding space (since instruction fetch in RISC-V is -little-endian, the bits to the right are stored at earlier memory -addresses, hence form a prefix in instruction-fetch order). The prefix -for the standard base ISA encoding is the two-bit "11" field held in -bits 1-0 of the 32-bit word, while the prefix for the standard atomic -extension "A" is the seven-bit "0101111" field held in bits 6-0 of -the 32-bit word representing the AMO major opcode. A quirk of the -encoding format is that the 3-bit funct3 field used to encode a minor -opcode is not contiguous with the major opcode bits in the 32-bit -instruction format, but is considered part of the prefix for 22-bit -instruction spaces. 
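The notion of a prefix described above can be checked on concrete instruction words. This is a minimal Python sketch under stated assumptions: `prefix` and `minor_prefix` are hypothetical helper names, and the two instruction words are hand-assembled examples (`addi x0, x0, 0` and `amoadd.w x0, x0, (x0)`) chosen for illustration.

```python
def prefix(insn, width):
    """Return the low `width` bits of a 32-bit instruction word."""
    return insn & ((1 << width) - 1)


def minor_prefix(insn):
    """Return (major opcode bits 6-0, funct3 bits 14-12).

    Together these form the non-contiguous prefix of a 22-bit encoding space.
    """
    return prefix(insn, 7), (insn >> 12) & 0b111


NOP = 0x00000013       # addi x0, x0, 0
AMOADD_W = 0x0000202F  # amoadd.w x0, x0, (x0)

# Both words carry the two-bit "11" prefix of the 30-bit base encoding space.
base_ok = prefix(NOP, 2) == 0b11 and prefix(AMOADD_W, 2) == 0b11
# Only the AMO word carries the seven-bit "0101111" prefix of the A extension.
amo_ok = prefix(AMOADD_W, 7) == 0b0101111 and prefix(NOP, 7) != 0b0101111
```

The `minor_prefix` helper makes the quirk visible: the funct3 bits sit at positions 14-12, separated from the major opcode by the five-bit _rd_ field.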
- -Although an instruction encoding space could be of any size, adopting a -smaller set of common sizes simplifies packing independently developed -extensions into a single global encoding. -<<encodingspaces>> gives the suggested sizes for RISC-V. - -[[encodingspaces]] -.Suggested standard RISC-V instruction encoding space sizes. -[%autowidth,float="center",align="center",cols="^,<,>,>,>,>", options="header"] -|=== -|Size |Usage -4+^| # Available in standard instruction length -| | |16-bit |32-bit |48-bit |64-bit - -6+| -|14-bit |Quadrant of compressed 16-bit encoding |3 | | | - -6+| -|22-bit |Minor opcode in base 32-bit encoding | |latexmath:[$2^{8}$] -|latexmath:[$2^{20}$] |latexmath:[$2^{35}$] - -|25-bit |Major opcode in base 32-bit encoding | |32 -|latexmath:[$2^{17}$] |latexmath:[$2^{32}$] - -|30-bit |Quadrant of base 32-bit encoding | |1 |latexmath:[$2^{12}$] -|latexmath:[$2^{27}$] - -6+| -|32-bit |Minor opcode in 48-bit encoding | | |latexmath:[$2^{10}$] -|latexmath:[$2^{25}$] - -|37-bit |Major opcode in 48-bit encoding | | |32 |latexmath:[$2^{20}$] - -|40-bit |Quadrant of 48-bit encoding | | |4 |latexmath:[$2^{17}$] - -6+| -|45-bit |Sub-minor opcode in 64-bit encoding | | | |latexmath:[$2^{12}$] - -|48-bit |Minor opcode in 64-bit encoding | | | |latexmath:[$2^{9}$] - -|52-bit |Major opcode in 64-bit encoding | | | |32 -|=== - -==== Greenfield versus Brownfield Extensions - -We use the term _greenfield extension_ to describe an extension that -begins populating a new instruction encoding space, and hence can only -cause encoding conflicts at the prefix level. We use the term -_brownfield extension_ to describe an extension that fits around -existing encodings in a previously defined instruction space. A -brownfield extension is necessarily tied to a particular greenfield -parent encoding, and there may be multiple brownfield extensions to the -same greenfield parent encoding. 
For example, the base ISAs are -greenfield encodings of a 30-bit instruction space, while the FDQ -floating-point extensions are all brownfield extensions adding to the -parent base ISA 30-bit encoding space. - -Note that we consider the standard A extension to have a greenfield -encoding as it defines a new previously empty 25-bit encoding space in -the leftmost bits of the full 32-bit base instruction encoding, even -though its standard prefix locates it within the 30-bit encoding space -of its parent base ISA. Changing only its single 7-bit prefix could move -the A extension to a different 30-bit encoding space while only worrying -about conflicts at the prefix level, not within the encoding space -itself. - -[[exttax]] -.Two-dimensional characterization of standard instruction-set extensions. -[cols="^,^,^",options="header",] -[%autowidth, float="center", align="center"] -|=== -| |Adds state |No new state -|Greenfield |RV32I(30), RV64I(30) |A(25) -|Brownfield |F(I), D(F), Q(D) |M(I) -|=== - -<<exttax>> shows the bases and standard extensions placed -in a simple two-dimensional taxonomy. One axis is whether the extension -is greenfield or brownfield, while the other axis is whether the -extension adds architectural state. For greenfield extensions, the size -of the instruction encoding space is given in parentheses. For -brownfield extensions, the name of the extension (greenfield or -brownfield) it builds upon is given in parentheses. Additional -user-level architectural state usually implies changes to the -supervisor-level system or possibly to the standard calling convention. - -Note that RV64I is not considered an extension of RV32I, but a different -complete base encoding. - -==== Standard-Compatible Global Encodings - -A complete or _global_ encoding of an ISA for an actual RISC-V -implementation must allocate a unique non-conflicting prefix for every -included instruction encoding space. 
The bases and every standard -extension have each had a standard prefix allocated to ensure they can -all coexist in a global encoding. - -A _standard-compatible_ global encoding is one where the base and every -included standard extension have their standard prefixes. A -standard-compatible global encoding can include non-standard extensions -that do not conflict with the included standard extensions. A -standard-compatible global encoding can also use standard prefixes for -non-standard extensions if the associated standard extensions are not -included in the global encoding. In other words, a standard extension -must use its standard prefix if included in a standard-compatible global -encoding, but otherwise its prefix is free to be reallocated. These -constraints allow a common toolchain to target the standard subset of -any RISC-V standard-compatible global encoding. - -==== Guaranteed Non-Standard Encoding Space - -To support development of proprietary custom extensions, portions of the -encoding space are guaranteed to never be used by standard extensions. - -=== RISC-V Extension Design Philosophy - -We intend to support a large number of independently developed -extensions by encouraging extension developers to operate within -instruction encoding spaces, and by providing tools to pack these into a -standard-compatible global encoding by allocating unique prefixes. Some -extensions are more naturally implemented as brownfield augmentations of -existing extensions, and will share whatever prefix is allocated to -their parent greenfield extension. The standard extension prefixes avoid -spurious incompatibilities in the encoding of core functionality, while -allowing custom packing of more esoteric extensions. - -This capability of repacking RISC-V extensions into different -standard-compatible global encodings can be used in a number of ways. 
- -One use-case is developing highly specialized custom accelerators, -designed to run kernels from important application domains. These might -want to drop all but the base integer ISA and add in only the extensions -that are required for the task in hand. The base ISAs have been designed -to place minimal requirements on a hardware implementation, and has been -encoded to use only a small fraction of a 32-bit instruction encoding -space. - -Another use-case is to build a research prototype for a new type of -instruction-set extension. The researchers might not want to expend the -effort to implement a variable-length instruction-fetch unit, and so -would like to prototype their extension using a simple 32-bit -fixed-width instruction encoding. However, this new extension might be -too large to coexist with standard extensions in the 32-bit space. If -the research experiments do not need all of the standard extensions, a -standard-compatible global encoding might drop the unused standard -extensions and reuse their prefixes to place the proposed extension in a -non-standard location to simplify engineering of the research prototype. -Standard tools will still be able to target the base and any standard -extensions that are present to reduce development time. Once the -instruction-set extension has been evaluated and refined, it could then -be made available for packing into a larger variable-length encoding -space to avoid conflicts with all standard extensions. - -The following sections describe increasingly sophisticated strategies -for developing implementations with new instruction-set extensions. -These are mostly intended for use in highly customized, educational, or -experimental architectures rather than for the main line of RISC-V ISA -development. - -[[fix32b]] -=== Extensions within fixed-width 32-bit instruction format - -In this section, we discuss adding extensions to implementations that -only support the base fixed-width 32-bit instruction format. 
-[NOTE] -==== -We anticipate the simplest fixed-width 32-bit encoding will be popular -for many restricted accelerators and research prototypes. -==== -==== Available 30-bit instruction encoding spaces - -In the standard encoding, three of the available 30-bit instruction -encoding spaces (those with 2-bit prefixes `00`, `01`, and `10`) are used to -enable the optional compressed instruction extension. However, if the -compressed instruction-set extension is not required, then these three -further 30-bit encoding spaces become available. This quadruples the -available encoding space within the 32-bit format. - -==== Available 25-bit instruction encoding spaces - -A 25-bit instruction encoding space corresponds to a major opcode in the -base and standard extension encodings. - -There are four major opcodes expressly designated for custom extensions -<<opcodemap>>, each of which represents a 25-bit -encoding space. Two of these are reserved for eventual use in the RV128 -base encoding (will be OP-IMM-64 and OP-64), but can be used for -non-standard extensions for RV32 and RV64. - -The two major opcodes reserved for RV64 (OP-IMM-32 and OP-32) can also -be used for non-standard extensions to RV32 only. - -If an implementation does not require floating-point, then the seven -major opcodes reserved for standard floating-point extensions (LOAD-FP, -STORE-FP, MADD, MSUB, NMSUB, NMADD, OP-FP) can be reused for -non-standard extensions. Similarly, the AMO major opcode can be reused -if the standard atomic extensions are not required. - -If an implementation does not require instructions longer than 32-bits, -then an additional four major opcodes are available (those marked in -gray in <<opcodemap>>). - -The base RV32I encoding uses only 11 major opcodes plus 3 reserved -opcodes, leaving up to 18 available for extensions. The base RV64I -encoding uses only 13 major opcodes plus 3 reserved opcodes, leaving up -to 16 available for extensions. 
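The opcode counts quoted above follow from simple arithmetic over the 32 major opcodes of the 32-bit format. A small Python check, with the hypothetical helper name `available_major_opcodes`, assuming that every major opcode not used or reserved by the base is free:

```python
# Bits 6-2 of a 32-bit instruction select one of 2**5 = 32 major opcodes.
MAJOR_OPCODES = 2 ** 5


def available_major_opcodes(used_by_base, reserved):
    """Upper bound on major opcodes left for extensions."""
    return MAJOR_OPCODES - used_by_base - reserved


rv32i_free = available_major_opcodes(used_by_base=11, reserved=3)
rv64i_free = available_major_opcodes(used_by_base=13, reserved=3)
```

These are upper bounds: they are reached only if the floating-point, AMO, and longer-than-32-bit opcodes are also given up, as the preceding paragraphs describe.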
- -==== Available 22-bit instruction encoding spaces - -A 22-bit encoding space corresponds to a funct3 minor opcode space in -the base and standard extension encodings. Several major opcodes have a -funct3 field minor opcode that is not completely occupied, leaving -available several 22-bit encoding spaces. - -Usually a major opcode selects the format used to encode operands in the -remaining bits of the instruction, and ideally, an extension should -follow the operand format of the major opcode to simplify hardware -decoding. - -==== Other spaces - -Smaller spaces are available under certain major opcodes, and not all -minor opcodes are entirely filled. - -=== Adding aligned 64-bit instruction extensions - -The simplest approach to provide space for extensions that are too large -for the base 32-bit fixed-width instruction format is to add naturally -aligned 64-bit instructions. The implementation must still support the -32-bit base instruction format, but can require that 64-bit instructions -are aligned on 64-bit boundaries to simplify instruction fetch, with a -32-bit NOP instruction used as alignment padding where necessary. - -To simplify use of standard tools, the 64-bit instructions should be -encoded as described in <<instlengthcode, Table 1>>. -However, an implementation might choose a non-standard -instruction-length encoding for 64-bit instructions, while retaining the -standard encoding for 32-bit instructions. For example, if compressed -instructions are not required, then a 64-bit instruction could be -encoded using one or more zero bits in the first two bits of an -instruction. -[NOTE] -==== -We anticipate processor generators that produce instruction-fetch units -capable of automatically handling any combination of supported -variable-length instruction encodings. 
-==== -=== Supporting VLIW encodings - -Although RISC-V was not designed as a base for a pure VLIW machine, VLIW -encodings can be added as extensions using several alternative -approaches. In all cases, the base 32-bit encoding has to be supported -to allow use of any standard software tools. - -==== Fixed-size instruction group - -The simplest approach is to define a single large naturally aligned -instruction format (e.g., 128 bits) within which VLIW operations are -encoded. In a conventional VLIW, this approach would tend to waste -instruction memory to hold NOPs, but a RISC-V-compatible implementation -would have to also support the base 32-bit instructions, confining the -VLIW code size expansion to VLIW-accelerated functions. - -==== Encoded-Length Groups - -Another approach is to use the standard length encoding from -<<instlengthcode>> to encode parallel -instruction groups, allowing NOPs to be compressed out of the VLIW -instruction. For example, a 64-bit instruction could hold two 28-bit -operations, while a 96-bit instruction could hold three 28-bit -operations, and so on. Alternatively, a 48-bit instruction could hold -one 42-bit operation, while a 96-bit instruction could hold two 42-bit -operations, and so on. - -This approach has the advantage of retaining the base ISA encoding for -instructions holding a single operation, but has the disadvantage of -requiring a new 28-bit or 42-bit encoding for operations within the VLIW -instructions, and misaligned instruction fetch for larger groups. One -simplification is to not allow VLIW instructions to straddle certain -microarchitecturally significant boundaries (e.g., cache lines or -virtual memory pages). - -==== Fixed-Size Instruction Bundles - -Another approach, similar to Itanium, is to use a larger naturally -aligned fixed instruction bundle size (e.g., 128 bits) across which -parallel operation groups are encoded. 
This simplifies instruction -fetch, but shifts the complexity to the group execution engine. To -remain RISC-V compatible, the base 32-bit instruction would still have -to be supported. - -==== End-of-Group bits in Prefix - -None of the above approaches retains the RISC-V encoding for the -individual operations within a VLIW instruction. Yet another approach is -to repurpose the two prefix bits in the fixed-width 32-bit encoding. One -prefix bit can be used to signal "end-of-group" if set, while the -second bit could indicate execution under a predicate if clear. Standard -RISC-V 32-bit instructions generated by tools unaware of the VLIW -extension would have both prefix bits set (11) and thus have the correct -semantics, with each instruction at the end of a group and not -predicated. - -The main disadvantage of this approach is that the base ISAs lack the -complex predication support usually required in an aggressive VLIW -system, and it is difficult to add space to specify more predicate -registers in the standard 30-bit encoding space. - diff --git a/src/f-st-ext.adoc b/src/f-st-ext.adoc index 96d5b44..8b8e9c3 100644 --- a/src/f-st-ext.adoc +++ b/src/f-st-ext.adoc @@ -21,13 +21,13 @@ instructions operate on values in the floating-point register file. Floating-point load and store instructions transfer floating-point values between registers and memory. Instructions to transfer values to and from the integer register file are also provided. -[TIP] +[NOTE] ==== We considered a unified register file for both integer and floating-point values as this simplifies software register allocation and calling conventions, and reduces total user state. 
However, a split organization increases the total number of registers accessible with a -given instruction width, simplifies provision of enough regfile ports +given instruction width, simplifies provision of enough register file ports for wide superscalar issue, supports decoupled floating-point-unit architectures, and simplifies use of internal floating-point encoding techniques. Compiler support and calling conventions for split register @@ -87,7 +87,7 @@ operations and holds the accrued exception flags, as shown in <<fcsr>>. [[fcsr, Floating-Point Control and Status Register]] .Floating-point control and status register -include::images/wavedrom/float-csr.adoc[] +include::images/wavedrom/float-csr.edn[] The `fcsr` register can be read and written with the FRCSR and FSCSR instructions, which are assembler pseudoinstructions built on the @@ -116,7 +116,7 @@ the instruction, or a dynamic rounding mode held in `frm`. Rounding modes are encoded as shown in <<rm>>. A value of 111 in the instruction's _rm_ field selects the dynamic rounding mode held in `frm`. The behavior of floating-point instructions that depend on -rounding mode when executed with a reserved rounding mode is _reserved_, including both static reserved rounding modes (101-110) and dynamic reserved rounding modes (101-111). Some instructions, including widening conversions, have the _rm_ field but are nevertheless mathematically unaffected by the rounding mode; software should set their _rm_ field to +rounding mode when executed with a reserved rounding mode is _reserved_, including both static reserved rounding modes (101-110) and dynamic reserved rounding modes (101-111). Some instructions, including widening conversions, have the _rm_ field but are nevertheless mathematically unaffected by the rounding mode; software should set their _rm_ field to RNE (000) but implementations must treat the _rm_ field as usual (in particular, with regard to decoding legal vs. reserved encodings). 
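The rounding-mode rule in the hunk above can be sketched in C. This is a minimal illustration, not text from the patch: the helper `effective_rm` and the `RM_RESERVED` marker are hypothetical names, and the encodings other than RNE (000) and the dynamic selector (111) are assumed from the F extension's rounding-mode table, which this diff does not show.

```c
#include <stdint.h>

/* Rounding-mode encodings. Only RNE (000) and the dynamic selector (111)
 * are spelled out in the hunk above; the others are assumed from the
 * F-extension rounding-mode table. */
enum {
    RM_RNE = 0, /* round to nearest, ties to even */
    RM_RTZ = 1, /* round towards zero */
    RM_RDN = 2, /* round down */
    RM_RUP = 3, /* round up */
    RM_RMM = 4, /* round to nearest, ties to max magnitude */
    RM_DYN = 7  /* in the instruction's rm field: use the mode held in frm */
};

/* Hypothetical marker meaning "reserved encoding, behavior is reserved". */
#define RM_RESERVED (-1)

/* Resolve the rounding mode an instruction would use.
 * instr_rm: 3-bit rm field of the instruction (static mode, or 111 = dynamic)
 * frm:      3-bit dynamic rounding mode held in fcsr
 * Static 101-110 are reserved; dynamic 101-111 are reserved. */
static int effective_rm(unsigned instr_rm, unsigned frm)
{
    instr_rm &= 7u;
    frm &= 7u;

    if (instr_rm == (unsigned)RM_DYN) {
        /* Dynamic mode: every frm value above RMM, including 111, is reserved. */
        return frm <= (unsigned)RM_RMM ? (int)frm : RM_RESERVED;
    }
    /* Static mode: only 101 and 110 can reach here as reserved values. */
    return instr_rm <= (unsigned)RM_RMM ? (int)instr_rm : RM_RESERVED;
}
```

For example, `effective_rm(RM_DYN, RM_RTZ)` yields `RM_RTZ`, while `effective_rm(RM_DYN, 7)` yields `RM_RESERVED`, because 111 is reserved when it appears in `frm` even though it is a legal selector in the instruction's _rm_ field.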
@@ -189,14 +189,14 @@ quiet bit. For single-precision floating-point, this corresponds to the pattern (((NaN, generation))) (((NaN, propagation))) -[TIP] +[NOTE] ==== We considered propagating NaN payloads, as is recommended by the standard, but this decision would have increased hardware cost. Moreover, since this feature is optional in the standard, it cannot be used in portable code. -Implementors are free to provide a NaN payload propagation scheme as a +Implementers are free to provide a NaN payload propagation scheme as a nonstandard extension enabled by a nonstandard operating mode. However, the canonical NaN scheme described above must always be supported and should be the default mode. ==== ''' @@ -231,7 +231,7 @@ signals. Floating-point loads and stores use the same base+offset addressing mode as the integer base ISAs, with a base address in register _rs1_ and a 12-bit signed byte offset. The FLW instruction loads a single-precision floating-point value from memory into floating-point register _rd_. FSW stores a single-precision value from floating-point register _rs2_ to memory. -include::images/wavedrom/sp-load-store-2.adoc[] +include::images/wavedrom/sp-load-store-2.edn[] [[sp-ldst]] //.SP load and store @@ -278,12 +278,12 @@ Signaling NaN inputs set the invalid operation exception flag, even when the res ==== Note that in version 2.2 of the F extension, the FMIN.S and FMAX.S instructions were amended to implement the proposed IEEE 754-201x -minimumNumber and maximumNumber operations, rather than the IEEE +`minimumNumber` and `maximumNumber` operations, rather than the IEEE 754-2008 minNum and maxNum operations. These operations differ in their handling of signaling NaNs. 
==== -include::images/wavedrom/spfloat.adoc[] +include::images/wavedrom/spfloat.edn[] [[spfloat]] //.Single-Precision Floating-Point Computational Instructions (((floating point, fused multiply-add))) @@ -311,11 +311,12 @@ product as the RISC-V instructions do, so the naming scheme was more rational at the time. The two definitions differ with respect to signed-zero results. The RISC-V definition matches the behavior of the x86 and ARM fused multiply-add instructions, but unfortunately the -RISC-V FNMSUB and FNMADD instruction names are swapped compared to x86 -and ARM. +RISC-V FNMSUB and FNMADD instruction names are swapped as compared to x86, +whereas the RISC-V FMSUB and FNMSUB instruction names are swapped as +compared to ARM. ==== -include::images/wavedrom/spfloat2.adoc[] +include::images/wavedrom/spfloat2.edn[] [[fnmaddsub]] //.F[N]MADD/F[N]MSUB instructions @@ -389,7 +390,7 @@ All floating-point conversion instructions set the Inexact exception flag if the rounded result differs from the operand value and the Invalid exception flag is not set. -include::images/wavedrom/spfloat-cn-cmp.adoc[] +include::images/wavedrom/spfloat-cn-cmp.edn[] [[fcvt]] //.SP float convert and move @@ -405,13 +406,13 @@ FSGNJN.S _rx, ry, ry_ moves the negation of _ry_ to _rx_ (assembler pseudoinstruction FNEG.S _rx, ry_); and FSGNJX.S _rx, ry, ry_ moves the absolute value of _ry_ to _rx_ (assembler pseudoinstruction FABS.S _rx, ry_). -include::images/wavedrom/spfloat-sign-inj.adoc[] +include::images/wavedrom/spfloat-sign-inj.edn[] [[inj]] [NOTE] ==== The sign-injection instructions provide floating-point MV, ABS, and NEG, as well as supporting a few other operations, including the IEEE -copySign operation and sign manipulation in transcendental math function libraries. 
Although MV, ABS, and NEG only need a single register operand, whereas FSGNJ instructions need two, it is unlikely most microarchitectures would add optimizations to benefit from the reduced number of register reads for these relatively infrequent instructions. Even in this case, a microarchitecture can simply detect when both source registers are the same for FSGNJ instructions and only read a single copy. +`copySign` operation and sign manipulation in transcendental math function libraries. Although MV, ABS, and NEG only need a single register operand, whereas FSGNJ instructions need two, it is unlikely most microarchitectures would add optimizations to benefit from the reduced number of register reads for these relatively infrequent instructions. Even in this case, a microarchitecture can simply detect when both source registers are the same for FSGNJ instructions and only read a single copy. ==== Instructions are provided to move bit patterns between the @@ -428,11 +429,11 @@ preserved. The FMV.W.X and FMV.X.W instructions were previously called FMV.S.X and FMV.X.S. The use of W is more consistent with their semantics as an instruction that moves 32 bits without interpreting them. This became clearer after defining NaN-boxing. To avoid disturbing existing code, both the W and S versions will be supported by tools. ==== -include::images/wavedrom/spfloat-mv.adoc[] +include::images/wavedrom/spfloat-mv.edn[] [[spfloat-mv]] //.SP floating point move -[TIP] +[NOTE] ==== The base floating-point ISA was defined so as to allow implementations to employ an internal recoding of the floating-point format in registers to simplify handling of subnormal values and possibly to reduce functional unit latency. To this end, the F extension avoids @@ -454,7 +455,7 @@ _signaling_ comparisons: that is, they set the invalid operation exception flag if either input is NaN. 
FEQ.S performs a _quiet_ comparison: it only sets the invalid operation exception flag if either input is a signaling NaN. For all three instructions, the result is 0 if either operand is NaN. -include::images/wavedrom/spfloat-comp.adoc[] +include::images/wavedrom/spfloat-comp.edn[] [[spfloat-comp]] //.SP floating point compare @@ -478,7 +479,7 @@ _rd_ are cleared. Note that exactly one bit in _rd_ will be set. FCLASS.S does not set the floating-point exception flags. (((floating-point, classification))) -include::images/wavedrom/spfloat-classify.adoc[] +include::images/wavedrom/spfloat-classify.edn[] [[spfloat-classify]] //.SP floating point classify @@ -498,4 +499,3 @@ include::images/wavedrom/spfloat-classify.adoc[] |8 |_rs1_ is a signaling NaN. |9 |_rs1_ is a quiet NaN. |=== - diff --git a/src/fraclmul.adoc b/src/fraclmul.adoc index 6f12f58..32e9b81 100644 --- a/src/fraclmul.adoc +++ b/src/fraclmul.adoc @@ -5,6 +5,7 @@ compilers can make good use of the fractional LMUL feature. Consider the following (admittedly contrived) loop written in C: +[source,c] ---- void add_ref(long N, signed char *restrict c_c, signed char *restrict c_a, signed char *restrict c_b, @@ -31,86 +32,86 @@ the loop: an 8-bit 'char' and a 64-bit 'long *'. Without fractional LMUL, the compiler would be forced to use LMUL=1 for the 8-bit computation and LMUL=8 for the 64-bit computation(s), to have equal number of elements on all computations within the same loop iteration. Under LMUL=8, only 4 registers are available -to the register allocator. Given the large number of 64-bit variables and -temporaries required in this loop, the compiler ends up generating a lot of +to the register allocator. Given the large number of 64-bit variables and +temporaries required in this loop, the compiler ends up generating a lot of spill code. 
The code below demonstrates this effect: ---- .LBB0_4: # %vector.body # =>This Inner Loop Header: Depth=1 - add s9, a2, s6 - vsetvli s1, zero, e8,m1,ta,mu - vle8.v v25, (s9) - add s1, a3, s6 - vle8.v v26, (s1) - vadd.vv v25, v26, v25 - add s1, a1, s6 - vse8.v v25, (s1) - add s9, a5, s10 - vsetvli s1, zero, e64,m8,ta,mu - vle64.v v8, (s9) - add s1, a6, s10 - vle64.v v16, (s1) - add s1, a7, s10 - vle64.v v24, (s1) - add s1, s3, s10 - vle64.v v0, (s1) - sd a0, -112(s0) - ld a0, -128(s0) - vs8r.v v0, (a0) # Spill LMUL=8 - add s9, t6, s10 - add s11, t5, s10 - add ra, t2, s10 - add s1, t3, s10 - vle64.v v0, (s9) - ld s9, -136(s0) - vs8r.v v0, (s9) # Spill LMUL=8 - vle64.v v0, (s11) - ld s9, -144(s0) - vs8r.v v0, (s9) # Spill LMUL=8 - vle64.v v0, (ra) - ld s9, -160(s0) - vs8r.v v0, (s9) # Spill LMUL=8 - vle64.v v0, (s1) - ld s1, -152(s0) - vs8r.v v0, (s1) # Spill LMUL=8 - vadd.vv v16, v16, v8 - ld s1, -128(s0) - vl8r.v v8, (s1) # Reload LMUL=8 - vadd.vv v8, v8, v24 - ld s1, -136(s0) - vl8r.v v24, (s1) # Reload LMUL=8 - ld s1, -144(s0) - vl8r.v v0, (s1) # Reload LMUL=8 - vadd.vv v24, v0, v24 - ld s1, -128(s0) - vs8r.v v24, (s1) # Spill LMUL=8 - ld s1, -152(s0) - vl8r.v v0, (s1) # Reload LMUL=8 - ld s1, -160(s0) - vl8r.v v24, (s1) # Reload LMUL=8 - vadd.vv v0, v0, v24 - add s1, a4, s10 - vse64.v v16, (s1) - add s1, s2, s10 - vse64.v v8, (s1) - vadd.vv v8, v8, v16 - add s1, t4, s10 - ld s9, -128(s0) - vl8r.v v16, (s9) # Reload LMUL=8 - vse64.v v16, (s1) - add s9, t0, s10 - vadd.vv v8, v8, v16 - vle64.v v16, (s9) - add s1, t1, s10 - vse64.v v0, (s1) - vadd.vv v8, v8, v0 - vsll.vi v16, v16, 1 - vadd.vv v8, v8, v16 - vse64.v v8, (s9) - add s6, s6, s7 - add s10, s10, s8 - bne s6, s4, .LBB0_4 + add s9, a2, s6 + vsetvli s1, zero, e8,m1,ta,mu + vle8.v v25, (s9) + add s1, a3, s6 + vle8.v v26, (s1) + vadd.vv v25, v26, v25 + add s1, a1, s6 + vse8.v v25, (s1) + add s9, a5, s10 + vsetvli s1, zero, e64,m8,ta,mu + vle64.v v8, (s9) + add s1, a6, s10 + vle64.v v16, (s1) + add s1, a7, s10 + 
vle64.v v24, (s1) + add s1, s3, s10 + vle64.v v0, (s1) + sd a0, -112(s0) + ld a0, -128(s0) + vs8r.v v0, (a0) # Spill LMUL=8 + add s9, t6, s10 + add s11, t5, s10 + add ra, t2, s10 + add s1, t3, s10 + vle64.v v0, (s9) + ld s9, -136(s0) + vs8r.v v0, (s9) # Spill LMUL=8 + vle64.v v0, (s11) + ld s9, -144(s0) + vs8r.v v0, (s9) # Spill LMUL=8 + vle64.v v0, (ra) + ld s9, -160(s0) + vs8r.v v0, (s9) # Spill LMUL=8 + vle64.v v0, (s1) + ld s1, -152(s0) + vs8r.v v0, (s1) # Spill LMUL=8 + vadd.vv v16, v16, v8 + ld s1, -128(s0) + vl8r.v v8, (s1) # Reload LMUL=8 + vadd.vv v8, v8, v24 + ld s1, -136(s0) + vl8r.v v24, (s1) # Reload LMUL=8 + ld s1, -144(s0) + vl8r.v v0, (s1) # Reload LMUL=8 + vadd.vv v24, v0, v24 + ld s1, -128(s0) + vs8r.v v24, (s1) # Spill LMUL=8 + ld s1, -152(s0) + vl8r.v v0, (s1) # Reload LMUL=8 + ld s1, -160(s0) + vl8r.v v24, (s1) # Reload LMUL=8 + vadd.vv v0, v0, v24 + add s1, a4, s10 + vse64.v v16, (s1) + add s1, s2, s10 + vse64.v v8, (s1) + vadd.vv v8, v8, v16 + add s1, t4, s10 + ld s9, -128(s0) + vl8r.v v16, (s9) # Reload LMUL=8 + vse64.v v16, (s1) + add s9, t0, s10 + vadd.vv v8, v8, v16 + vle64.v v16, (s9) + add s1, t1, s10 + vse64.v v0, (s1) + vadd.vv v8, v8, v0 + vsll.vi v16, v16, 1 + vadd.vv v8, v8, v16 + vse64.v v8, (s9) + add s6, s6, s7 + add s10, s10, s8 + bne s6, s4, .LBB0_4 ---- If instead of using LMUL=1 for the 8-bit computation, the compiler is allowed @@ -123,52 +124,52 @@ shown in the loop below: ---- .LBB0_4: # %vector.body # =>This Inner Loop Header: Depth=1 - add s9, a2, s6 - vsetvli s1, zero, e8,mf2,ta,mu // LMUL=1/2 ! 
- vle8.v v25, (s9) - add s1, a3, s6 - vle8.v v26, (s1) - vadd.vv v25, v26, v25 - add s1, a1, s6 - vse8.v v25, (s1) - add s9, a5, s10 - vsetvli s1, zero, e64,m4,ta,mu // LMUL=4 - vle64.v v28, (s9) - add s1, a6, s10 - vle64.v v8, (s1) - vadd.vv v28, v8, v28 - add s1, a7, s10 - vle64.v v8, (s1) - add s1, s3, s10 - vle64.v v12, (s1) - add s1, t6, s10 - vle64.v v16, (s1) - add s1, t5, s10 - vle64.v v20, (s1) - add s1, a4, s10 - vse64.v v28, (s1) - vadd.vv v8, v12, v8 - vadd.vv v12, v20, v16 - add s1, t2, s10 - vle64.v v16, (s1) - add s1, t3, s10 - vle64.v v20, (s1) - add s1, s2, s10 - vse64.v v8, (s1) - add s9, t4, s10 - vadd.vv v16, v20, v16 - add s11, t0, s10 - vle64.v v20, (s11) - vse64.v v12, (s9) - add s1, t1, s10 - vse64.v v16, (s1) - vsll.vi v20, v20, 1 - vadd.vv v28, v8, v28 - vadd.vv v28, v28, v12 - vadd.vv v28, v28, v16 - vadd.vv v28, v28, v20 - vse64.v v28, (s11) - add s6, s6, s7 - add s10, s10, s8 - bne s6, s4, .LBB0_4 + add s9, a2, s6 + vsetvli s1, zero, e8,mf2,ta,mu // LMUL=1/2 ! 
+ vle8.v v25, (s9) + add s1, a3, s6 + vle8.v v26, (s1) + vadd.vv v25, v26, v25 + add s1, a1, s6 + vse8.v v25, (s1) + add s9, a5, s10 + vsetvli s1, zero, e64,m4,ta,mu // LMUL=4 + vle64.v v28, (s9) + add s1, a6, s10 + vle64.v v8, (s1) + vadd.vv v28, v8, v28 + add s1, a7, s10 + vle64.v v8, (s1) + add s1, s3, s10 + vle64.v v12, (s1) + add s1, t6, s10 + vle64.v v16, (s1) + add s1, t5, s10 + vle64.v v20, (s1) + add s1, a4, s10 + vse64.v v28, (s1) + vadd.vv v8, v12, v8 + vadd.vv v12, v20, v16 + add s1, t2, s10 + vle64.v v16, (s1) + add s1, t3, s10 + vle64.v v20, (s1) + add s1, s2, s10 + vse64.v v8, (s1) + add s9, t4, s10 + vadd.vv v16, v20, v16 + add s11, t0, s10 + vle64.v v20, (s11) + vse64.v v12, (s9) + add s1, t1, s10 + vse64.v v16, (s1) + vsll.vi v20, v20, 1 + vadd.vv v28, v8, v28 + vadd.vv v28, v28, v12 + vadd.vv v28, v28, v16 + vadd.vv v28, v28, v20 + vse64.v v28, (s11) + add s6, s6, s7 + add s10, s10, s8 + bne s6, s4, .LBB0_4 ---- diff --git a/src/history.adoc b/src/history.adoc deleted file mode 100644 index 7995d01..0000000 --- a/src/history.adoc +++ /dev/null @@ -1,362 +0,0 @@ -[[history]] -== History and Acknowledgments - -=== "Why Develop a new ISA?" Rationale from Berkeley Group - -We developed RISC-V to support our own needs in research and education, -where our group is particularly interested in actual hardware -implementations of research ideas (we have completed eleven different -silicon fabrications of RISC-V since the first edition of this -specification), and in providing real implementations for students to -explore in classes (RISC-V processor RTL designs have been used in -multiple undergraduate and graduate classes at Berkeley). In our current -research, we are especially interested in the move towards specialized -and heterogeneous accelerators, driven by the power constraints imposed -by the end of conventional transistor scaling. We wanted a highly -flexible and extensible base ISA around which to build our research -effort. 
- -A question we have been repeatedly asked is "Why develop a new ISA?" -The biggest obvious benefit of using an existing commercial ISA is the -large and widely supported software ecosystem, both development tools -and ported applications, which can be leveraged in research and -teaching. Other benefits include the existence of large amounts of -documentation and tutorial examples. However, our experience of using -commercial instruction sets for research and teaching is that these -benefits are smaller in practice, and do not outweigh the disadvantages: - -* *Commercial ISAs are proprietary.* Except for SPARC V8, which is an -open IEEE standard cite:[sparcieee1994] , most owners of commercial ISAs carefully guard -their intellectual property and do not welcome freely available -competitive implementations. This is much less of an issue for academic -research and teaching using only software simulators, but has been a -major concern for groups wishing to share actual RTL implementations. It -is also a major concern for entities who do not want to trust the few -sources of commercial ISA implementations, but who are prohibited from -creating their own clean room implementations. We cannot guarantee that -all RISC-V implementations will be free of third-party patent -infringements, but we can guarantee we will not attempt to sue a RISC-V -implementor. -* *Commercial ISAs are only popular in certain market domains.* The most -obvious examples at time of writing are that the ARM architecture is not -well supported in the server space, and the Intel x86 architecture (or -for that matter, almost every other architecture) is not well supported -in the mobile space, though both Intel and ARM are attempting to enter -each other's market segments. Another example is ARC and Tensilica, -which provide extensible cores but are focused on the embedded space. 
-This market segmentation dilutes the benefit of supporting a particular -commercial ISA as in practice the software ecosystem only exists for -certain domains, and has to be built for others. -* *Commercial ISAs come and go.* Previous research infrastructures have -been built around commercial ISAs that are no longer popular (SPARC, -MIPS) or even no longer in production (Alpha). These lose the benefit of -an active software ecosystem, and the lingering intellectual property -issues around the ISA and supporting tools interfere with the ability of -interested third parties to continue supporting the ISA. An open ISA -might also lose popularity, but any interested party can continue using -and developing the ecosystem. -* *Popular commercial ISAs are complex.* The dominant commercial ISAs -(x86 and ARM) are both very complex to implement in hardware to the -level of supporting common software stacks and operating systems. Worse, -nearly all the complexity is due to bad, or at least outdated, ISA -design decisions rather than features that truly improve efficiency. -* *Commercial ISAs alone are not enough to bring up applications.* Even -if we expend the effort to implement a commercial ISA, this is not -enough to run existing applications for that ISA. Most applications need -a complete ABI (application binary interface) to run, not just the -user-level ISA. Most ABIs rely on libraries, which in turn rely on -operating system support. To run an existing operating system requires -implementing the supervisor-level ISA and device interfaces expected by -the OS. These are usually much less well-specified and considerably more -complex to implement than the user-level ISA. -* *Popular commercial ISAs were not designed for extensibility.* The -dominant commercial ISAs were not particularly designed for -extensibility, and as a consequence have added considerable instruction -encoding complexity as their instruction sets have grown. 
Companies such -as Tensilica (acquired by Cadence) and ARC (acquired by Synopsys) have -built ISAs and toolchains around extensibility, but have focused on -embedded applications rather than general-purpose computing systems. -* *A modified commercial ISA is a new ISA.* One of our main goals is to -support architecture research, including major ISA extensions. Even -small extensions diminish the benefit of using a standard ISA, as -compilers have to be modified and applications rebuilt from source code -to use the extension. Larger extensions that introduce new architectural -state also require modifications to the operating system. Ultimately, -the modified commercial ISA becomes a new ISA, but carries along all the -legacy baggage of the base ISA. - -Our position is that the ISA is perhaps the most important interface in -a computing system, and there is no reason that such an important -interface should be proprietary. The dominant commercial ISAs are based -on instruction-set concepts that were already well known over 30 years -ago. Software developers should be able to target an open standard -hardware target, and commercial processor designers should compete on -implementation quality. - -We are far from the first to contemplate an open ISA design suitable for -hardware implementation. We also considered other existing open ISA -designs, of which the closest to our goals was the OpenRISC -architecture cite:[openriscarch]. We decided against adopting the OpenRISC ISA for several -technical reasons: - -* OpenRISC has condition codes and branch delay slots, which complicate -higher performance implementations. -* OpenRISC uses a fixed 32-bit encoding and 16-bit immediates, which -precludes a denser instruction encoding and limits space for later -expansion of the ISA. -* OpenRISC does not support the 2008 revision to the IEEE 754 -floating-point standard. -* The OpenRISC 64-bit design had not been completed when we began. 
- -By starting from a clean slate, we could design an ISA that met all of -our goals, though of course, this took far more effort than we had -planned at the outset. We have now invested considerable effort in -building up the RISC-V ISA infrastructure, including documentation, -compiler tool chains, operating system ports, reference ISA simulators, -FPGA implementations, efficient ASIC implementations, architecture test -suites, and teaching materials. Since the last edition of this manual, -there has been considerable uptake of the RISC-V ISA in both academia -and industry, and we have created the non-profit RISC-V Foundation to -protect and promote the standard. The RISC-V Foundation website at -https://riscv.org contains the latest information on the Foundation -membership and various open-source projects using RISC-V. - -=== History from Revision 1.0 of ISA manual - -The RISC-V ISA and instruction-set manual builds upon several earlier -projects. Several aspects of the supervisor-level machine and the -overall format of the manual date back to the T0 (Torrent-0) vector -microprocessor project at UC Berkeley and ICSI, begun in 1992. T0 was a -vector processor based on the MIPS-II ISA, with Krste Asanović as main -architect and RTL designer, and Brian Kingsbury and Bertrand Irrisou as -principal VLSI implementors. David Johnson at ICSI was a major -contributor to the T0 ISA design, particularly supervisor mode, and to -the manual text. John Hauser also provided considerable feedback on the -T0 ISA design. - -The Scale (Software-Controlled Architecture for Low Energy) project at -MIT, begun in 2000, built upon the T0 project infrastructure, refined -the supervisor-level interface, and moved away from the MIPS scalar ISA -by dropping the branch delay slot. Ronny Krashinsky and Christopher -Batten were the principal architects of the Scale Vector-Thread -processor at MIT, while Mark Hampton ported the GCC-based compiler -infrastructure and tools for Scale. 
- -A lightly edited version of the T0 MIPS scalar processor specification -(MIPS-6371) was used in teaching a new version of the MIT 6.371 -Introduction to VLSI Systems class in the Fall 2002 semester, with Chris -Terman and Krste Asanović as lecturers. Chris Terman contributed most of -the lab material for the class (there was no TA!). The 6.371 class -evolved into the trial 6.884 Complex Digital Design class at MIT, taught -by Arvind and Krste Asanović in Spring 2005, which became a regular -Spring class 6.375. A reduced version of the Scale MIPS-based scalar -ISA, named SMIPS, was used in 6.884/6.375. Christopher Batten was the TA -for the early offerings of these classes and developed a considerable -amount of documentation and lab material based around the SMIPS ISA. -This same SMIPS lab material was adapted and enhanced by TA Yunsup Lee -for the UC Berkeley Fall 2009 CS250 VLSI Systems Design class taught by -John Wawrzynek, Krste Asanović, and John Lazzaro. - -The Maven (Malleable Array of Vector-thread ENgines) project was a -second-generation vector-thread architecture. Its design was led by -Christopher Batten when he was an Exchange Scholar at UC Berkeley -starting in summer 2007. Hidetaka Aoki, a visiting industrial fellow -from Hitachi, gave considerable feedback on the early Maven ISA and -microarchitecture design. The Maven infrastructure was based on the -Scale infrastructure but the Maven ISA moved further away from the MIPS -ISA variant defined in Scale, with a unified floating-point and integer -register file. Maven was designed to support experimentation with -alternative data-parallel accelerators. Yunsup Lee was the main -implementor of the various Maven vector units, while Rimas Avižienis was -the main implementor of the various Maven scalar units. Yunsup Lee and -Christopher Batten ported GCC to work with the new Maven ISA. -Christopher Celio provided the initial definition of a traditional -vector instruction set ("Flood") variant of Maven. 
- -Based on experience with all these previous projects, the RISC-V ISA -definition was begun in Summer 2010, with Andrew Waterman, Yunsup Lee, -Krste Asanović, and David Patterson as principal designers. An initial -version of the RISC-V 32-bit instruction subset was used in the UC -Berkeley Fall 2010 CS250 VLSI Systems Design class, with Yunsup Lee as -TA. RISC-V is a clean break from the earlier MIPS-inspired designs. John -Hauser contributed to the floating-point ISA definition, including the -sign-injection instructions and a register encoding scheme that permits -internal recoding of floating-point values. - -=== History from Revision 2.0 of ISA manual - -Multiple implementations of RISC-V processors have been completed, -including several silicon fabrications, as shown in -<<silicon, Fabricated RISC-V testchips table>>. - -[[silicon]] -[%autowidth,float="center",align="center",cols="^,^,^,^",options="header",] -|=== -|Name |Tapeout Date |Process |ISA -|Raven-1 |May 29, 2011 |ST 28nm FDSOI |RV64G1_Xhwacha1 -|EOS14 |April 1, 2012 |IBM 45nm SOI |RV64G1p1_Xhwacha2 -|EOS16 |August 17, 2012 |IBM 45nm SOI |RV64G1p1_Xhwacha2 -|Raven-2 |August 22, 2012 |ST 28nm FDSOI |RV64G1p1_Xhwacha2 -|EOS18 |February 6, 2013 |IBM 45nm SOI |RV64G1p1_Xhwacha2 -|EOS20 |July 3, 2013 |IBM 45nm SOI |RV64G1p99_Xhwacha2 -|Raven-3 |September 26, 2013 |ST 28nm SOI |RV64G1p99_Xhwacha2 -|EOS22 |March 7, 2014 |IBM 45nm SOI |RV64G1p9999_Xhwacha3 -|=== - -The first RISC-V processors to be fabricated were written in Verilog and -manufactured in a pre-production FDSOI technology from ST as the Raven-1 -testchip in 2011. Two cores were developed by Yunsup Lee and Andrew -Waterman, advised by Krste Asanović, and fabricated together: 1) an RV64 -scalar core with error-detecting flip-flops, and 2) an RV64 core with an -attached 64-bit floating-point vector unit. 
The first microarchitecture -was informally known as "TrainWreck", due to the short time available -to complete the design with immature design libraries. - -Subsequently, a clean microarchitecture for an in-order decoupled RV64 -core was developed by Andrew Waterman, Rimas Avižienis, and Yunsup Lee, -advised by Krste Asanović, and, continuing the railway theme, was -codenamed "Rocket" after George Stephenson's successful steam -locomotive design. Rocket was written in Chisel, a new hardware design -language developed at UC Berkeley. The IEEE floating-point units used in -Rocket were developed by John Hauser, Andrew Waterman, and Brian -Richards. Rocket has since been refined and developed further, and has -been fabricated two more times in FDSOI (Raven-2, Raven-3), and five -times in IBM SOI technology (EOS14, EOS16, EOS18, EOS20, EOS22) for a -photonics project. Work is ongoing to make the Rocket design available -as a parameterized RISC-V processor generator. - -EOS14-EOS22 chips include early versions of Hwacha, a 64-bit IEEE -floating-point vector unit, developed by Yunsup Lee, Andrew Waterman, -Huy Vo, Albert Ou, Quan Nguyen, and Stephen Twigg, advised by Krste -Asanović. EOS16-EOS22 chips include dual cores with a cache-coherence -protocol developed by Henry Cook and Andrew Waterman, advised by Krste -Asanović. EOS14 silicon has successfully run at 1.25 GHz. EOS16 silicon suffered -from a bug in the IBM pad libraries. EOS18 and EOS20 have successfully -run at 1.35 GHz. - -Contributors to the Raven testchips include Yunsup Lee, Andrew Waterman, -Rimas Avižienis, Brian Zimmer, Jaehwa Kwak, Ruzica Jevtić, Milovan -Blagojević, Alberto Puggelli, Steven Bailey, Ben Keller, Pi-Feng Chiu, -Brian Richards, Borivoje Nikolić, and Krste Asanović. - -Contributors to the EOS testchips include Yunsup Lee, Rimas Avižienis, -Andrew Waterman, Henry Cook, Huy Vo, Daiwei Li, Chen Sun, Albert Ou, -Quan Nguyen, Stephen Twigg, Vladimir Stojanović, and Krste Asanović. 
- -Andrew Waterman and Yunsup Lee developed the C++ ISA simulator -"Spike", used as a golden model in development and named after the -golden spike used to celebrate completion of the US transcontinental -railway. Spike has been made available as a BSD open-source project. - -Andrew Waterman completed a Master's thesis with a preliminary design of -the RISC-V compressed instruction set cite:[waterman-ms]. - -Various FPGA implementations of the RISC-V have been completed, -primarily as part of integrated demos for the Par Lab project research -retreats. The largest FPGA design has 3 cache-coherent RV64IMA -processors running a research operating system. Contributors to the FPGA -implementations include Andrew Waterman, Yunsup Lee, Rimas Avižienis, -and Krste Asanović. - -RISC-V processors have been used in several classes at UC Berkeley. -Rocket was used in the Fall 2011 offering of CS250 as a basis for class -projects, with Brian Zimmer as TA. For the undergraduate CS152 class in -Spring 2012, Christopher Celio used Chisel to write a suite of -educational RV32 processors, named "Sodor" after the island on which -"Thomas the Tank Engine" and friends live. The suite includes a -microcoded core, an unpipelined core, and 2, 3, and 5-stage pipelined -cores, and is publicly available under a BSD license. The suite was -subsequently updated and used again in CS152 in Spring 2013, with Yunsup -Lee as TA, and in Spring 2014, with Eric Love as TA. Christopher Celio -also developed an out-of-order RV64 design known as BOOM (Berkeley -Out-of-Order Machine), with accompanying pipeline visualizations, that -was used in the CS152 classes. The CS152 classes also used -cache-coherent versions of the Rocket core developed by Andrew Waterman -and Henry Cook. - -Over the summer of 2013, the RoCC (Rocket Custom Coprocessor) interface -was defined to simplify adding custom accelerators to the Rocket core. 
-Rocket and the RoCC interface were used extensively in the Fall 2013 -CS250 VLSI class taught by Jonathan Bachrach, with several student -accelerator projects built to the RoCC interface. The Hwacha vector unit -has been rewritten as a RoCC coprocessor. - -Two Berkeley undergraduates, Quan Nguyen and Albert Ou, have -successfully ported Linux to run on RISC-V in Spring 2013. - -Colin Schmidt successfully completed an LLVM backend for RISC-V 2.0 in -January 2014. - -Darius Rad at Bluespec contributed soft-float ABI support to the GCC -port in March 2014. - -John Hauser contributed the definition of the floating-point -classification instructions. - -We are aware of several other RISC-V core implementations, including one -in Verilog by Tommy Thorn, and one in Bluespec by Rishiyur Nikhil. - -=== Acknowledgments - -Thanks to Christopher F. Batten, Preston Briggs, Christopher Celio, -David Chisnall, Stefan Freudenberger, John Hauser, Ben Keller, Rishiyur -Nikhil, Michael Taylor, Tommy Thorn, and Robert Watson for comments on -the draft ISA version 2.0 specification. - -=== History from Revision 2.1 - -Uptake of the RISC-V ISA has been very rapid since the introduction of -the frozen version 2.0 in May 2014, with too much activity to record in -a short history section such as this. Perhaps the most important single -event was the formation of the non-profit RISC-V Foundation in August -2015. The Foundation will now take over stewardship of the official -RISC-V ISA standard, and the official website `riscv.org` is the best -place to obtain news and updates on the RISC-V standard. - -=== Acknowledgments - -Thanks to Scott Beamer, Allen J. Baum, Christopher Celio, David -Chisnall, Paul Clayton, Palmer Dabbelt, Jan Gray, Michael Hamburg, and -John Hauser for comments on the version 2.0 specification. 
- -=== History from Revision 2.2 - -=== Acknowledgments - -Thanks to Jacob Bachmeyer, Alex Bradbury, David Horner, Stefan O’Rear, -and Joseph Myers for comments on the version 2.1 specification. - -=== History for Revision 2.3 - -Uptake of RISC-V continues at a breakneck pace. - -John Hauser and Andrew Waterman contributed a hypervisor ISA extension -based upon a proposal from Paolo Bonzini. - -Daniel Lustig, Arvind, Krste Asanović, Shaked Flur, Paul Loewenstein, -Yatin Manerkar, Luc Maranget, Margaret Martonosi, Vijayanand Nagarajan, -Rishiyur Nikhil, Jonas Oberhauser, Christopher Pulte, Jose Renau, Peter -Sewell, Susmit Sarkar, Caroline Trippel, Muralidaran Vijayaraghavan, -Andrew Waterman, Derek Williams, Andrew Wright, and Sizhuo Zhang -contributed the memory consistency model. - -=== Funding - -Development of the RISC-V architecture and implementations has been -partially funded by the following sponsors. - -* *Par Lab:* Research supported by Microsoft (Award # 024263) and Intel -(Award # 024894) funding and by matching funding by U.C. Discovery (Award -# DIG07-10227). Additional support came from Par Lab affiliates Nokia, -NVIDIA, Oracle, and Samsung. -* *Project Isis:* DoE Award DE-SC0003624. -* *ASPIRE Lab*: DARPA PERFECT program, Award HR0011-12-2-0016. DARPA -POEM program Award HR0011-11-C-0100. The Center for Future Architectures -Research (C-FAR), a STARnet center funded by the Semiconductor Research -Corporation. Additional support from ASPIRE industrial sponsor, Intel, -and ASPIRE affiliates, Google, Hewlett Packard Enterprise, Huawei, -Nokia, NVIDIA, Oracle, and Samsung. - -The content of this paper does not necessarily reflect the position or -the policy of the US government and no official endorsement should be -inferred. 
diff --git a/src/hypervisor.adoc b/src/hypervisor.adoc index d8a77e0..9887cb0 100644 --- a/src/hypervisor.adoc +++ b/src/hypervisor.adoc @@ -158,7 +158,8 @@ In this chapter, we use the term _HSXLEN_ to refer to the effective XLEN when executing in HS-mode, and _VSXLEN_ to refer to the effective XLEN when executing in VS-mode. -==== Hypervisor Status (`hstatus`) Register +[[sec:hstatus]] +==== Hypervisor Status (`hstatus`) Register The `hstatus` register is an HSXLEN-bit read/write register formatted as shown in <<hstatusreg-rv32>> when HSXLEN=32 @@ -167,19 +168,56 @@ register provides facilities analogous to the `mstatus` register for tracking and controlling the exception behavior of a VS-mode guest. [[hstatusreg-rv32]] -.Hypervisor status register (`hstatus`) when HSLEN=32 -include::images/bytefield/hstatusreg-rv32.edn[] +.Hypervisor status register (`hstatus`) when HSXLEN=32 +[wavedrom,, svg] +.... +{reg: [ + {bits: 5, name: 'WPRI'}, + {bits: 1, name: 'VSBE'}, + {bits: 1, name: 'GVA'}, + {bits: 1, name: 'SPV'}, + {bits: 1, name: 'SPVP'}, + {bits: 1, name: 'HU'}, + {bits: 2, name: 'WPRI'}, + {bits: 6, name: 'VGEIN'}, + {bits: 2, name: 'WPRI'}, + {bits: 1, name: 'VTVM'}, + {bits: 1, name: 'VTW'}, + {bits: 1, name: 'VTSR'}, + {bits: 9, name: 'WPRI'}, +], config:{lanes: 2, hspace:1024}} +.... [[hstatusreg]] .Hypervisor status register (`hstatus`) when HSXLEN=64. -include::images/bytefield/hstatusreg.edn[] - +[wavedrom,, svg] +.... +{reg: [ + {bits: 5, name: 'WPRI'}, + {bits: 1, name: 'VSBE'}, + {bits: 1, name: 'GVA'}, + {bits: 1, name: 'SPV'}, + {bits: 1, name: 'SPVP'}, + {bits: 1, name: 'HU'}, + {bits: 2, name: 'WPRI'}, + {bits: 6, name: 'VGEIN'}, + {bits: 2, name: 'WPRI'}, + {bits: 1, name: 'VTVM'}, + {bits: 1, name: 'VTW'}, + {bits: 1, name: 'VTSR'}, + {bits: 9, name: 'WPRI'}, + {bits: 2, name: 'VSXL'}, + {bits: 14, name: 'WPRI'}, + {bits: 2, name: 'HUPMM'}, + {bits: 14, name: 'WPRI'}, +], config:{lanes: 4, hspace:1024}} +.... 
The VSXL field controls the effective XLEN for VS-mode (known as VSXLEN), which may differ from the XLEN for HS-mode (HSXLEN). When HSXLEN=32, the VSXL field does not exist, and VSXLEN=32. When HSXLEN=64, VSXL is a *WARL* field that is encoded the same as the MXL field of `misa`, -shown in <<misabase>> on page <<misabase, 19>>. In particular, an +shown in <<misabase>>. In particular, an implementation may make VSXL be a read-only field whose value always ensures that VSXLEN=HSXLEN. @@ -270,7 +308,7 @@ to VS-level memory management data structures, such as page tables. An implementation may make VSBE a read-only field that always specifies the same endianness as HS-mode. -==== Hypervisor Trap Delegation (`hedeleg` and `hideleg`) Registers +==== Hypervisor Trap Delegation (`hedeleg` and `hideleg`) Registers Register `hedeleg` is a 64-bit read/write register, formatted as shown in <<hedelegreg>>. @@ -287,7 +325,7 @@ to a VS-mode guest; their layout is the same as `medeleg` and `mideleg`. include::images/bytefield/hedelegreg.edn[] [[hidelegreg]] -.Hypervisor exception delegation register (`hideleg`). +.Hypervisor interrupt delegation register (`hideleg`). include::images/bytefield/hidelegreg.edn[] A synchronous trap that has been delegated to HS-mode (using `medeleg`) @@ -296,7 +334,7 @@ corresponding `hedeleg` bit is set. Each bit of `hedeleg` shall be either writable or read-only zero. Many bits of `hedeleg` are required specifically to be writable or zero, as enumerated in <<hedeleg-bits>>. Bit 0, corresponding to -instruction address misaligned exceptions, must be writable if +instruction address-misaligned exceptions, must be writable if IALIGN=32. 
[NOTE] @@ -399,7 +437,7 @@ Store/AMO guest-page fault |=== [[hinterruptregs]] -==== Hypervisor Interrupt (`hvip`, `hip`, and `hie`) Registers +==== Hypervisor Interrupt (`hvip`, `hip`, and `hie`) Registers Register `hvip` is an HSXLEN-bit read/write register that a hypervisor can write to indicate virtual interrupts intended for VS-mode. Bits of @@ -501,7 +539,7 @@ is an alias (writable) of the same bit in `hvip`. Multiple simultaneous interrupts destined for HS-mode are handled in the following decreasing priority order: SEI, SSI, STI, SGEI, VSEI, VSSI, -VSTI. +VSTI, LCOFI. [[hgeinterruptregs]] ==== Hypervisor Guest External Interrupt Registers (`hgeip` and `hgeie`) @@ -651,9 +689,7 @@ The definition of the CBZE field is furnished by the Zicboz extension. The definitions of the CBCFE and CBIE fields are furnished by the Zicbom extension. -The definition of the PMM field will be furnished by the forthcoming -Ssnpm extension. Its allocation within `henvcfg` may change prior to the -ratification of that extension. +The definition of the PMM field is furnished by the Ssnpm extension. The Zicfilp extension adds the `LPE` field in `henvcfg`. When the `LPE` field is set to 1, the Zicfilp extension is enabled in VS-mode. When the `LPE` field @@ -666,13 +702,13 @@ apply to VS-mode: The Zicfiss extension adds the `SSE` field in `henvcfg`. If the `SSE` field is set to 1, the Zicfiss extension is activated in VS-mode. When the `SSE` field is 0, the Zicfiss extension remains inactive in VS-mode, and the following rules -apply when `V=1`: +apply when `V=1`: * 32-bit Zicfiss instructions will revert to their behavior as defined by Zimop. * 16-bit Zicfiss instructions will revert to their behavior as defined by Zcmop. * The `pte.xwr=010b` encoding in VS-stage page tables becomes reserved. * The `senvcfg.SSE` field will read as zero and is read-only. 
-* When `menvcfg.SSE` is one, `SSAMOSWAP.W/D` raises a virtual instruction +* When `menvcfg.SSE` is one, `SSAMOSWAP.W/D` raises a virtual-instruction exception. The Ssdbltrp extension adds the double-trap-enable (`DTE`) field in `henvcfg`. @@ -684,7 +720,7 @@ When XLEN=32, `henvcfgh` is a of `henvcfg`. Register `henvcfgh` does not exist when XLEN=64. -==== Hypervisor Counter-Enable (`hcounteren`) Register +==== Hypervisor Counter-Enable (`hcounteren`) Register The counter-enable register `hcounteren` is a 32-bit register that controls the availability of the hardware performance monitoring @@ -706,7 +742,7 @@ readable unless the applicable bits are set in both `hcounteren` and read-only zero, indicating reads to the corresponding counter will cause an exception when V=1. Hence, they are effectively *WARL* fields. -==== Hypervisor Time Delta (`htimedelta`) Register +==== Hypervisor Time Delta (`htimedelta`) Register The `htimedelta` CSR is a 64-bit read/write register that contains the delta between the value of the `time` CSR and the value returned in VS-mode or @@ -726,7 +762,10 @@ When XLEN=32, `htimedeltah` is a 32-bit read/write register that aliases bits 63:32 of `htimedelta`. Register `htimedeltah` does not exist when XLEN=64. -==== Hypervisor Trap Value (`htval`) Register +If the `time` CSR is implemented, `htimedelta` (and `htimedeltah` for XLEN=32) +must be implemented. + +==== Hypervisor Trap Value (`htval`) Register The `htval` register is an HSXLEN-bit read/write register formatted as shown in <<htvalreg>>. When a trap is taken into @@ -787,7 +826,7 @@ software that writes a value to `htval` should read back from `htval` to confirm the stored value. ==== -==== Hypervisor Trap Instruction (`htinst`) Register +==== Hypervisor Trap Instruction (`htinst`) Register The `htinst` register is an HSXLEN-bit read/write register formatted as shown in <<htinstreg>>. 
When a trap is taken into @@ -804,7 +843,7 @@ include::images/bytefield/htinstreg.edn[] the implementation may automatically write to it on a trap. [[hgatp]] -==== Hypervisor Guest Address Translation and Protection (`hgatp`) Register +==== Hypervisor Guest Address Translation and Protection (`hgatp`) Register The `hgatp` register is an HSXLEN-bit read/write register, formatted as shown in <<rv32hgatp>> for HSXLEN=32 and @@ -824,7 +863,7 @@ executing in HS-mode will raise an illegal-instruction exception. include::images/bytefield/rv32hgatp.edn[] [[rv64hgatp]] -.Hypervisor guest address translation and protection register `hgatp` when HSXLEN=64 for MODE values Bare, Sv39x4, and Sv57x4. +.Hypervisor guest address translation and protection register `hgatp` when HSXLEN=64 for MODE values Bare, Sv39x4, Sv48x4, and Sv57x4. include::images/bytefield/rv64hgatp.edn[] <<hgatp-mode>> shows the encodings of the MODE field when @@ -832,7 +871,10 @@ HSXLEN=32 and HSXLEN=64. When MODE=Bare, guest physical addresses are equal to supervisor physical addresses, and there is no further memory protection for a guest virtual machine beyond the physical memory protection scheme described in <<pmp>>. In this -case, the remaining fields in `hgatp` must be set to zeros. +case, software must write zero to the remaining fields in `hgatp`. +Attempting to select MODE=Bare with a nonzero pattern in the remaining fields +has an UNSPECIFIED effect on the value that the remaining fields assume and an +UNSPECIFIED effect on G-stage address translation and protection behavior. When HSXLEN=32, the only other valid setting for MODE is Sv32x4, which is a modification of the usual Sv32 paged virtual-memory scheme, @@ -925,7 +967,7 @@ HFENCE.GVMA instruction (see <<hfence.vma>>) before or after writing `hgatp`. 
[[vsstatus]] -==== Virtual Supervisor Status (`vsstatus`) Register +==== Virtual Supervisor Status (`vsstatus`) Register The `vsstatus` register is a VSXLEN-bit read/write register that is VS-mode’s version of supervisor register `sstatus`, formatted as shown @@ -935,7 +977,7 @@ in <<vsstatusreg-rv32>> when VSXLEN=32 and normally read or modify `sstatus` actually access `vsstatus` instead. [[vsstatusreg-rv32]] -.Virtual supervisor status (`vstatus`) register when VSXLEN=32. +.Virtual supervisor status (`vsstatus`) register when VSXLEN=32. [wavedrom, ,svg] .... {reg: [ @@ -1045,7 +1087,7 @@ encoded as follows: The Ssdbltrp adds an S-mode-disable-trap (`SDT`) field extension to address double trap (See <<supv-double-trap>>) in VS-mode. -==== Virtual Supervisor Interrupt (`vsip` and `vsie`) Registers +==== Virtual Supervisor Interrupt (`vsip` and `vsie`) Registers The `vsip` and `vsie` registers are VSXLEN-bit read/write registers that are VS-mode’s versions of supervisor CSRs `sip` and `sie`, formatted as @@ -1198,7 +1240,7 @@ execution, unless the effective privilege mode is VS or VU. [NOTE] ==== In particular, virtual-machine load/store (HLV, HLVX, or HSV) -instructions that are misspeculatively executed must not cause VS-stage +instructions that are mispredicted must not cause VS-stage A bits to be set. ==== @@ -1223,14 +1265,12 @@ include::images/wavedrom/hypv-virt-load-and-store.edn[] The hypervisor virtual-machine load and store instructions are valid only in M-mode or HS-mode, or in U-mode when `hstatus`.HU=1. Each -instruction performs an explicit memory access as though V=1; i.e., with -the address translation and protection, and the endianness, that apply -to memory accesses in either VS-mode or VU-mode. Field SPVP of `hstatus` -controls the privilege level of the access. The explicit memory access -is done as though in VU-mode when SPVP=0, and as though in VS-mode when -SPVP=1. 
As usual when V=1, two-stage address translation is applied, and +instruction performs an explicit memory access with an effective privilege mode +of VS or VU. The effective privilege mode of the explicit memory access is VU +when `hstatus`.SPVP=0, and VS when `hstatus`.SPVP=1. As usual for VS-mode and +VU-mode, two-stage address translation is applied, and the HS-level `sstatus`.SUM is ignored. HS-level `sstatus`.MXR makes -execute-only pages readable for both stages of address translation +execute-only pages readable by explicit loads for both stages of address translation (VS-stage and G-stage), whereas `vsstatus`.MXR affects only the first translation stage (VS-stage). @@ -1598,7 +1638,7 @@ there is no option to disable two-stage address translation when V=1, either stage of translation can be effectively disabled by zeroing the corresponding `vsatp` or `hgatp` register. -The `vsstatus` field MXR, which makes execute-only pages readable, only +The `vsstatus` field MXR, which makes execute-only pages readable by explicit loads, only overrides VS-stage page protection. Setting MXR at VS-level does not override guest-physical page protections. Setting MXR at HS-level, however, overrides both VS-stage and G-stage execute-only permissions. @@ -1842,7 +1882,7 @@ usual. .Machine and supervisor cause register (`mcause` and `scause`) values when the hypervisor extension is implemented. 
[%autowidth,float="center",align="center",cols=">,>,<",options="header"] |=== -|Interrupt |Exception Code |Description +|Interrupt |Exception Code |Description |1 + 1 + 1 + @@ -1866,7 +1906,7 @@ Machine software interrupt |_Reserved_ + Supervisor timer interrupt + Virtual supervisor timer interrupt + -Machine timer interrupt +Machine timer interrupt |1 + 1 + 1 + @@ -1878,7 +1918,7 @@ Machine timer interrupt |_Reserved_ + Supervisor external interrupt + Virtual supervisor external interrupt + -Machine external interrupt +Machine external interrupt |1 + 1 + 1 + @@ -1916,6 +1956,9 @@ _Designated for platform use_ 0 + 0 + 0 + +0 + +0 + +0 + |0 + 1 + 2 + @@ -1932,7 +1975,10 @@ _Designated for platform use_ 13 + 14 + 15 + -16-19 + +16 + +17 + +18 + +19 + 20 + 21 + 22 + @@ -1957,7 +2003,10 @@ Instruction page fault + Load page fault + _Reserved_ + Store/AMO page fault + +Double trap + _Reserved_ + +Software check + +Hardware error + Instruction guest-page fault + Load guest-page fault + Virtual instruction + @@ -1965,7 +2014,7 @@ Store/AMO guest-page fault + _Designated for custom use_ + _Reserved_ + _Designated for custom use_ + -_Reserved_ +_Reserved_ |=== HS-mode and VS-mode ECALLs use different cause values so they can be @@ -2112,7 +2161,7 @@ or access fault | .>|5, 7 |With physical address for an explicit memory access: +    Load/store/AMO access fault -.>|_Lowest_ .>|4, 6 |If not higher priority: + +.>|_Lowest_ .>|4, 6 |If not higher priority: +    Load/store/AMO address misaligned |=== @@ -2140,7 +2189,7 @@ MPIE, and MIE in `mstatus`/`mstatush` and writes CSRs `mepc`, `mcause`, [%autowidth,float="center",align="center",cols="<,^,^",options="header"] |=== |Previous Mode |MPV |MPP -|U-mode + +|U-mode + HS-mode + M-mode|0 + 0 + @@ -2391,7 +2440,7 @@ transformed instruction has the format shown in <<transformedloadinst>>. [[transformedloadinst]] -.Transformed noncompressed load instruction (LB, LBU, LH, LHU, LW, LWU, LD, FLW, FLD, FLQ, or FLH). 
Fields funct3, rd, and opcode are the same as the trapping load instruction. +.Transformed load instruction (LB, LBU, LH, LHU, LW, LWU, LD, FLW, FLD, FLQ, or FLH). Fields funct3, rd, and opcode are the same as the trapping load instruction. include::images/wavedrom/transformedloadinst.edn[] For a standard store instruction that is not a compressed instruction @@ -2400,13 +2449,13 @@ instruction has the format shown in <<transformedstoreinst>>. [[transformedstoreinst]] -.Transformed noncompressed store instruction (SB, SH, SW, SD, FSW, FSD, FSQ, or FSH). Fields rs2, funct3, and opcode are the same as the trapping store instruction. +.Transformed store instruction (SB, SH, SW, SD, FSW, FSD, FSQ, or FSH). Fields rs2, funct3, and opcode are the same as the trapping store instruction. include::images/wavedrom/transformedstoreinst.edn[] For a standard atomic instruction (load-reserved, store-conditional, or AMO instruction), the transformed instruction has the format shown in <<transformedatomicinst>>. [[transformedatomicinst]] -.Transformed atomic instruction (load-reserved, store-conditional, or AMO instruc-tion). All fields are the same as the trapping instruction except bits 19:15, Addr. Offset. +.Transformed atomic instruction (load-reserved, store-conditional, or AMO instruction). All fields are the same as the trapping instruction except bits 19:15, Addr. Offset. include::images/wavedrom/transformedatomicinst.edn[] For a standard virtual-machine load/store instruction (HLV, HLVX, or HSV), the transformed instruction has the format shown in <<transformedvmaccessinst>>. @@ -2478,7 +2527,7 @@ with the encodings of basic loads and stores, as illustrated by <<pseudoinsts-basis>>. [[pseudoinsts-basis]] -.Standard instructions corresponding to the special psudoinstructions of <<pseudoinsts>>. +.Standard instructions corresponding to the special pseudoinstructions of <<pseudoinsts>>. 
[%autowidth,float="center",align="center",cols="<,<",options="header"] |=== |Encoding |Instruction diff --git a/src/images/bytefield/counteren.adoc b/src/images/bytefield/counteren.edn index 9759ca5..1fceba1 100644 --- a/src/images/bytefield/counteren.adoc +++ b/src/images/bytefield/counteren.edn @@ -27,4 +27,4 @@ (draw-box "1" {:span 1 :borders {}}) (draw-box "1" {:span 1 :borders {}}) (draw-box "1" {:span 1 :borders {}}) -----
\ No newline at end of file
+----
diff --git a/src/images/bytefield/counterinh.adoc b/src/images/bytefield/counterinh.edn
index 3b3e956..48ebc10 100644
--- a/src/images/bytefield/counterinh.adoc
+++ b/src/images/bytefield/counterinh.edn
@@ -27,4 +27,4 @@
 (draw-box "1" {:span 1 :borders {}})
 (draw-box "1" {:span 1 :borders {}})
 (draw-box "1" {:span 1 :borders {}})
-----
\ No newline at end of file
+----
diff --git a/src/images/bytefield/cust-sys-instr.adoc b/src/images/bytefield/cust-sys-instr.edn
index 07db770..a4ce653 100644
--- a/src/images/bytefield/cust-sys-instr.adoc
+++ b/src/images/bytefield/cust-sys-instr.edn
@@ -77,4 +77,4 @@
 (draw-box (text "custom" {:font-style "italic"}) {:span 3 :borders {}})
 (draw-box "SYSTEM" {:span 3 :borders {}})
 (draw-box (text "Machine-Level" {}) {:span 7 :text-anchor "middle" :borders {}})
-----
\ No newline at end of file +---- diff --git a/src/images/bytefield/epcreg.edn b/src/images/bytefield/epcreg.edn index 9486709..567db54 100644 --- a/src/images/bytefield/epcreg.edn +++ b/src/images/bytefield/epcreg.edn @@ -13,4 +13,4 @@ (draw-box "sepc" {:span 32}) (draw-box "SXLEN" {:span 32 :borders {}}) -----
\ No newline at end of file
+----
diff --git a/src/images/bytefield/hcounterenreg.edn b/src/images/bytefield/hcounterenreg.edn
index 177e64d..71a195b 100644
--- a/src/images/bytefield/hcounterenreg.edn
+++ b/src/images/bytefield/hcounterenreg.edn
@@ -40,4 +40,4 @@
 (draw-box "1" {:borders {}})
 (draw-box "1" {:borders {}})
 (draw-box "1" {:borders {}})
-----
\ No newline at end of file
+----
diff --git a/src/images/bytefield/hgeiereg.edn b/src/images/bytefield/hgeiereg.edn
index 251bae0..76540e1 100644
--- a/src/images/bytefield/hgeiereg.edn
+++ b/src/images/bytefield/hgeiereg.edn
@@ -13,4 +13,4 @@
 (draw-box "0" )
 (draw-box "HSXLEN" {:font-size 24 :span 31 :borders {}})
 (draw-box "1" {:borders {}})
-----
\ No newline at end of file
+----
diff --git a/src/images/bytefield/hgeipreg.edn b/src/images/bytefield/hgeipreg.edn
index 11eac20..d821c76 100644
--- a/src/images/bytefield/hgeipreg.edn
+++ b/src/images/bytefield/hgeipreg.edn
@@ -12,4 +12,4 @@
 (draw-box "0" )
 (draw-box "HSXLEN" {:span 31 :borders {}})
 (draw-box "1" {:borders {}})
-----
\ No newline at end of file +---- diff --git a/src/images/bytefield/hidelegreg.edn b/src/images/bytefield/hidelegreg.edn index 472c6e5..a5730ec 100644 --- a/src/images/bytefield/hidelegreg.edn +++ b/src/images/bytefield/hidelegreg.edn @@ -14,4 +14,4 @@ (draw-box (text "(WARL)" { :font-weight "bold" :font-size 24}) {:span 16 :text-anchor "start" :borders{:top :border-unrelated :bottom :border-unrelated :right :border-unrelated}}) (draw-box "HSXLEN" {:span 32 :borders {}}) -----
\ No newline at end of file
+----
diff --git a/src/images/bytefield/hiereg-standard.edn b/src/images/bytefield/hiereg-standard.edn
index 413d764..d1b79cb 100644
--- a/src/images/bytefield/hiereg-standard.edn
+++ b/src/images/bytefield/hiereg-standard.edn
@@ -40,4 +40,4 @@
 (draw-box "3" {:span 2 :borders {}})
 (draw-box "1" {:span 4 :borders {}})
 (draw-box "2" {:span 2 :borders {}})
-----
\ No newline at end of file +---- diff --git a/src/images/bytefield/hiereg.edn b/src/images/bytefield/hiereg.edn index 472c6e5..a5730ec 100644 --- a/src/images/bytefield/hiereg.edn +++ b/src/images/bytefield/hiereg.edn @@ -14,4 +14,4 @@ (draw-box (text "(WARL)" { :font-weight "bold" :font-size 24}) {:span 16 :text-anchor "start" :borders{:top :border-unrelated :bottom :border-unrelated :right :border-unrelated}}) (draw-box "HSXLEN" {:span 32 :borders {}}) -----
\ No newline at end of file
+----
diff --git a/src/images/bytefield/hipreg-standard.edn b/src/images/bytefield/hipreg-standard.edn
index 082d524..a746582 100644
--- a/src/images/bytefield/hipreg-standard.edn
+++ b/src/images/bytefield/hipreg-standard.edn
@@ -40,4 +40,4 @@
 (draw-box "3" {:span 2 :borders {}})
 (draw-box "1" {:span 4 :borders {}})
 (draw-box "2" {:span 2 :borders {}})
-----
\ No newline at end of file +---- diff --git a/src/images/bytefield/hipreg.edn b/src/images/bytefield/hipreg.edn index 472c6e5..a5730ec 100644 --- a/src/images/bytefield/hipreg.edn +++ b/src/images/bytefield/hipreg.edn @@ -14,4 +14,4 @@ (draw-box (text "(WARL)" { :font-weight "bold" :font-size 24}) {:span 16 :text-anchor "start" :borders{:top :border-unrelated :bottom :border-unrelated :right :border-unrelated}}) (draw-box "HSXLEN" {:span 32 :borders {}}) -----
\ No newline at end of file +---- diff --git a/src/images/bytefield/hpmevents.adoc b/src/images/bytefield/hpmevents.edn index 8556b8b..8556b8b 100644 --- a/src/images/bytefield/hpmevents.adoc +++ b/src/images/bytefield/hpmevents.edn diff --git a/src/images/bytefield/hstatusreg-rv32.edn b/src/images/bytefield/hstatusreg-rv32.edn deleted file mode 100644 index 2762ce6..0000000 --- a/src/images/bytefield/hstatusreg-rv32.edn +++ /dev/null @@ -1,59 +0,0 @@ -[bytefield] ----- -(defattrs :plain [:plain {:font-family "M+ 1p Fallback" :font-size 24}]) -(def row-height 40 ) -(def row-header-fn nil) -(def left-margin 30) -(def right-margin 30) -(def boxes-per-row 32) - -(draw-box nil {:borders {}}) -(draw-box "31" {:span 2 :borders {} :text-anchor "start"}) -(draw-box "23" {:borders {}}) -(draw-box "22" {:span 2 :borders {}}) -(draw-box "21" {:span 2 :borders {}}) -(draw-box "20" {:span 2 :borders {}}) -(draw-box "19" {:borders {}}) -(draw-box "18" {:borders {}}) -(draw-box "17" {:span 2 :borders {} :text-anchor "start"}) -(draw-box "12" {:span 3:borders {}}) -(draw-box "11" {:borders {}}) -(draw-box "10" {:borders {}}) -(draw-box "9" {:borders {}}) -(draw-box "8" {:span 2 :borders {}}) -(draw-box "7" {:span 3 :borders {}}) -(draw-box "6" {:span 3 :borders {}}) -(draw-box "5" {:span 2 :borders {}}) -(draw-box "4" {:borders {}}) -(draw-box "0" {:borders {}}) - -(draw-box nil {:borders {}}) -(draw-box (text "WPRI" {:font-weight "bold" :font-size 24}) {:span 3}) -(draw-box "VTSR" {:span 2}) -(draw-box "VTW" {:span 2}) -(draw-box "VTVM" {:span 2}) -(draw-box (text "WPRI" {:font-weight "bold" :font-size 24}) {:span 2}) -(draw-box "VGEIN[5:0]" {:span 5}) -(draw-box (text "WPRI" {:font-weight "bold" :font-size 24}) {:span 2}) -(draw-box "HU") -(draw-box "SPVP" {:span 2}) -(draw-box "SPV" {:span 3}) -(draw-box "GVA" {:span 3}) -(draw-box "VSBE" {:span 2}) -(draw-box (text "WPRI" {:font-weight "bold" :font-size 24}) {:span 2}) - -(draw-box nil {:borders {}}) -(draw-box "9" {:span 3 
:borders {}}) -(draw-box "1" {:span 2 :borders {}}) -(draw-box "1" {:span 2 :borders {}}) -(draw-box "1" {:span 2 :borders {}}) -(draw-box "2" {:span 2 :borders {}}) -(draw-box "6" {:span 5 :borders {}}) -(draw-box "2" {:span 2 :borders {}}) -(draw-box "1" {:borders {}}) -(draw-box "1" {:span 2 :borders {}}) -(draw-box "1" {:span 3 :borders {}}) -(draw-box "1" {:span 3 :borders {}}) -(draw-box "1" {:span 2 :borders {}}) -(draw-box "5" {:span 2 :borders {}}) ----- diff --git a/src/images/bytefield/hstatusreg.edn b/src/images/bytefield/hstatusreg.edn deleted file mode 100644 index cce601e..0000000 --- a/src/images/bytefield/hstatusreg.edn +++ /dev/null @@ -1,86 +0,0 @@ -[bytefield] ----- -(defattrs :plain [:plain {:font-family "M+ 1p Fallback" :font-size 24}]) -(def row-height 40 ) -(def row-header-fn nil) -(def left-margin 30) -(def right-margin 30) -(def boxes-per-row 32) - -(draw-box nil {:span 3 :borders {}}) -(draw-box "63" {:span 8 :borders {} :text-anchor "start"}) -(draw-box "34" {:borders {}}) -(draw-box "33" {:span 2 :borders {} :text-anchor "start"}) -(draw-box "32" {:span 2 :borders {} :text-anchor "end"}) -(draw-box "31" {:span 3 :borders {} :text-anchor "start"}) -(draw-box "23" {:span 3 :borders {} :text-anchor "end"}) -(draw-box "22" {:span 2:borders {}}) -(draw-box "21" {:span 2 :borders {}}) -(draw-box "20" {:span 2:borders {}}) -(draw-box nil {:borders {}}) -(draw-box nil {:span 3 :borders {}}) - -(draw-box nil {:span 3 :borders {}}) -(draw-box (text "WPRI" {:font-weight "bold" :font-size 24}) {:span 9}) -(draw-box "VSXL[1:0]" {:span 4}) -(draw-box (text "WPRI" {:font-weight "bold" :font-size 24}) {:span 6}) -(draw-box "VTSR" {:span 2}) -(draw-box "VTW" {:span 2}) -(draw-box "VTVM" {:span 2}) -(draw-box nil {:borders {:top :border-unrelated :bottom :border-unrelated}}) -(draw-box nil {:span 3 :borders {}}) - -(draw-box nil {:span 3 :borders {}}) -(draw-box "30" {:span 9 :borders {}}) -(draw-box "2" {:span 4 :borders {}}) -(draw-box "9" {:span 6 
:borders {}}) -(draw-box "1" {:span 2 :borders {}}) -(draw-box "1" {:span 2 :borders {}}) -(draw-box "1" {:span 2 :borders {}}) -(draw-box nil {:span 4 :borders {}}) - -(draw-box nil {:span 32 :borders {}}) - -(draw-box nil {:span 6 :borders {}}) -(draw-box nil {:borders {}}) -(draw-box "19" {:borders {}}) -(draw-box "18" {:borders {}}) -(draw-box "17" {:span 2 :borders {} :text-anchor "start"}) -(draw-box "12" {:span 2 :borders {} :text-anchor "end"}) -(draw-box "11" {:borders {}}) -(draw-box "10" {:borders {}}) -(draw-box "9" {:borders {}}) -(draw-box "8" {:span 2 :borders {}}) -(draw-box "7" {:span 2 :borders {}}) -(draw-box "6" {:span 2:borders {}}) -(draw-box "5" {:span 2 :borders {}}) -(draw-box "4" {:span 2 :borders {} :text-anchor "start"}) -(draw-box "0" {:span 2 :borders {} :text-anchor "end"}) -(draw-box nil {:span 4 :borders {}}) - -(draw-box nil {:span 6 :borders {}}) -(draw-box nil {:borders {:top :border-unrelated :bottom :border-unrelated}}) -(draw-box (text "WPRI" {:font-weight "bold" :font-size 24}) {:span 2}) -(draw-box "VGEIN[5:0]" {:span 4}) -(draw-box (text "WPRI" {:font-weight "bold" :font-size 24}) {:span 2}) -(draw-box "HU") -(draw-box "SPVP" {:span 2}) -(draw-box "SPV" {:span 2}) -(draw-box "GVA" {:span 2}) -(draw-box "VSBE" {:span 2}) -(draw-box (text "WPRI" {:font-weight "bold" :font-size 24}) {:span 4}) -(draw-box nil {:span 4 :borders {}}) - -(draw-box nil {:span 7 :borders {}}) -(draw-box "2" {:span 2 :borders {}}) -(draw-box "6" {:span 4 :borders {}}) -(draw-box "2" {:span 2 :borders {}}) -(draw-box "1" {:borders {}}) -(draw-box "1" {:span 2 :borders {}}) -(draw-box "1" {:span 2 :borders {}}) -(draw-box "1" {:span 2 :borders {}}) -(draw-box "1" {:span 2 :borders {}}) -(draw-box "5" {:span 4 :borders {}}) -(draw-box nil {:span 4 :borders {}}) - -----
\ No newline at end of file diff --git a/src/images/bytefield/htimedelta.edn b/src/images/bytefield/htimedelta.edn index 946778b..6853c65 100644 --- a/src/images/bytefield/htimedelta.edn +++ b/src/images/bytefield/htimedelta.edn @@ -13,4 +13,4 @@ (draw-box "htimedelta" {:font-size 20 :span 32}) (draw-box "64" {:span 32 :borders {}}) -----
\ No newline at end of file +---- diff --git a/src/images/bytefield/htimedeltah.edn b/src/images/bytefield/htimedeltah.edn index 19ecdae..aab3ef4 100644 --- a/src/images/bytefield/htimedeltah.edn +++ b/src/images/bytefield/htimedeltah.edn @@ -14,4 +14,4 @@ (draw-box "htimedeltah" {:font-size 20 :span 32}) (draw-box "32" {:span 32 :borders {}}) -----
\ No newline at end of file +---- diff --git a/src/images/bytefield/htinstreg.edn b/src/images/bytefield/htinstreg.edn index d9852bd..6d830ea 100644 --- a/src/images/bytefield/htinstreg.edn +++ b/src/images/bytefield/htinstreg.edn @@ -13,4 +13,4 @@ (draw-box "htinst" {:font-size 20 :span 32}) (draw-box "HSXLEN" {:span 32 :borders {}}) -----
\ No newline at end of file +---- diff --git a/src/images/bytefield/htvalreg.edn b/src/images/bytefield/htvalreg.edn index e35a054..31b81d3 100644 --- a/src/images/bytefield/htvalreg.edn +++ b/src/images/bytefield/htvalreg.edn @@ -13,4 +13,4 @@ (draw-box "htval" {:font-size 20 :span 32}) (draw-box "HSXLEN" {:span 32 :borders {}}) -----
\ No newline at end of file
+----
diff --git a/src/images/bytefield/hvipreg-standard.edn b/src/images/bytefield/hvipreg-standard.edn
index 37c80ad..969c342 100644
--- a/src/images/bytefield/hvipreg-standard.edn
+++ b/src/images/bytefield/hvipreg-standard.edn
@@ -34,4 +34,4 @@
 (draw-box "3" {:span 4 :borders {}})
 (draw-box "1" {:span 4 :borders {}})
 (draw-box "2" {:span 4 :borders {}})
-----
\ No newline at end of file +---- diff --git a/src/images/bytefield/hvipreg.edn b/src/images/bytefield/hvipreg.edn index 1df8d40..735068d 100644 --- a/src/images/bytefield/hvipreg.edn +++ b/src/images/bytefield/hvipreg.edn @@ -14,4 +14,4 @@ (draw-box (text "(WARL)" { :font-weight "bold" :font-size 24}) {:span 14 :text-anchor "start" :borders{:top :border-unrelated :bottom :border-unrelated :right :border-unrelated}}) (draw-box "HSXLEN" {:span 32 :borders {}}) -----
\ No newline at end of file +---- diff --git a/src/images/bytefield/hypv-miereg-standard.edn b/src/images/bytefield/hypv-miereg-standard.edn index 154983d..820faee 100644 --- a/src/images/bytefield/hypv-miereg-standard.edn +++ b/src/images/bytefield/hypv-miereg-standard.edn @@ -8,9 +8,9 @@ (def boxes-per-row 32) (draw-box "15" {:borders {}}) -(draw-box nil {:span 2 :borders {}}) -(draw-box "13" {:borders {}}) -(draw-box "12" {:span 3 :borders {}}) +(draw-box "14" {:borders {}}) +(draw-box "13" {:span 3 :borders {}}) +(draw-box "12" {:span 2 :borders {}}) (draw-box "11" {:span 2 :borders {}}) (draw-box "10" {:span 3 :borders {}}) (draw-box "9" {:span 2 :borders {}}) @@ -24,8 +24,9 @@ (draw-box "1" {:span 2 :borders {}}) (draw-box "0" {:span 2 :borders {}}) -(draw-box "0" {:span 4}) -(draw-box "SGEIE" {:span 3}) +(draw-box "0" {:span 2}) +(draw-box "LCOFIE" {:span 3}) +(draw-box "SGEIE" {:span 2}) (draw-box "MEIE" {:span 2}) (draw-box "VSEIE" {:span 3}) (draw-box "SEIE" {:span 2}) @@ -39,8 +40,9 @@ (draw-box "SSIE" {:span 2}) (draw-box "0" {:span 2}) -(draw-box "3" {:span 4 :borders {}}) -(draw-box "1" {:span 3:borders {}}) +(draw-box "2" {:span 2 :borders {}}) +(draw-box "1" {:span 3 :borders {}}) +(draw-box "1" {:span 2 :borders {}}) (draw-box "1" {:span 2 :borders {}}) (draw-box "1" {:span 3 :borders {}}) (draw-box "1" {:span 2 :borders {}}) @@ -53,4 +55,4 @@ (draw-box "1" {:span 2 :borders {}}) (draw-box "1" {:span 2 :borders {}}) (draw-box "1" {:span 2 :borders {}}) -----
\ No newline at end of file +---- diff --git a/src/images/bytefield/hypv-mipreg-standard.edn b/src/images/bytefield/hypv-mipreg-standard.edn index c75ec02..968bfb2 100644 --- a/src/images/bytefield/hypv-mipreg-standard.edn +++ b/src/images/bytefield/hypv-mipreg-standard.edn @@ -8,9 +8,9 @@ (def boxes-per-row 32) (draw-box "15" {:borders {}}) -(draw-box nil {:span 2 :borders {}}) -(draw-box "13" {:borders {}}) -(draw-box "12" {:span 3 :borders {}}) +(draw-box "14" {:borders {}}) +(draw-box "13" {:span 3 :borders {}}) +(draw-box "12" {:span 2 :borders {}}) (draw-box "11" {:span 2 :borders {}}) (draw-box "10" {:span 3 :borders {}}) (draw-box "9" {:span 2 :borders {}}) @@ -24,8 +24,9 @@ (draw-box "1" {:span 2 :borders {}}) (draw-box "0" {:span 2 :borders {}}) -(draw-box "0" {:span 4}) -(draw-box "SGEIP" {:span 3}) +(draw-box "0" {:span 2}) +(draw-box "LCOFIP" {:span 3}) +(draw-box "SGEIP" {:span 2}) (draw-box "MEIP" {:span 2}) (draw-box "VSEIP" {:span 3}) (draw-box "SEIP" {:span 2}) @@ -39,8 +40,9 @@ (draw-box "SSIP" {:span 2}) (draw-box "0" {:span 2}) -(draw-box "3" {:span 4 :borders {}}) -(draw-box "1" {:span 3:borders {}}) +(draw-box "2" {:span 2 :borders {}}) +(draw-box "1" {:span 3 :borders {}}) +(draw-box "1" {:span 2 :borders {}}) (draw-box "1" {:span 2 :borders {}}) (draw-box "1" {:span 3 :borders {}}) (draw-box "1" {:span 2 :borders {}}) @@ -53,4 +55,4 @@ (draw-box "1" {:span 2 :borders {}}) (draw-box "1" {:span 2 :borders {}}) (draw-box "1" {:span 2 :borders {}}) -----
\ No newline at end of file +---- diff --git a/src/images/bytefield/hypv-mstatus.edn b/src/images/bytefield/hypv-mstatus.edn index 885dc00..3dc0861 100644 --- a/src/images/bytefield/hypv-mstatus.edn +++ b/src/images/bytefield/hypv-mstatus.edn @@ -126,4 +126,4 @@ (draw-box "1" {:span 3 :borders {}}) (draw-box "1" {:span 3 :borders {}}) -----
\ No newline at end of file +---- diff --git a/src/images/bytefield/hypv-mstatush.edn b/src/images/bytefield/hypv-mstatush.edn index 319484d..58d650b 100644 --- a/src/images/bytefield/hypv-mstatush.edn +++ b/src/images/bytefield/hypv-mstatush.edn @@ -29,4 +29,4 @@ (draw-box "1" {:span 3 :borders {}}) (draw-box "1" {:span 3 :borders {}}) (draw-box "4" {:span 4 :borders {}}) -----
\ No newline at end of file +---- diff --git a/src/images/bytefield/marchid.adoc b/src/images/bytefield/marchid.edn index 8b6319c..dafe5b5 100644 --- a/src/images/bytefield/marchid.adoc +++ b/src/images/bytefield/marchid.edn @@ -11,4 +11,4 @@ (draw-box "Architecture ID" {:span 32 :vertical-align "middle"}) (draw-box "MXLEN" {:span 32 :borders {}}) -----
\ No newline at end of file +---- diff --git a/src/images/bytefield/mcausereg.adoc b/src/images/bytefield/mcausereg.edn index 10ace3e..f0117eb 100644 --- a/src/images/bytefield/mcausereg.adoc +++ b/src/images/bytefield/mcausereg.edn @@ -17,4 +17,4 @@ (draw-box "1" {:span 4 :borders {}}) (draw-box "MXLEN-1" {:font-size 24 :span 28 :borders {}}) -----
\ No newline at end of file +---- diff --git a/src/images/bytefield/mconfigptrreg.adoc b/src/images/bytefield/mconfigptrreg.edn index 8cabb11..8c427bf 100644 --- a/src/images/bytefield/mconfigptrreg.adoc +++ b/src/images/bytefield/mconfigptrreg.edn @@ -10,4 +10,4 @@ (draw-box "mconfigptr" {:span 32}) (draw-box "MXLEN" {:font-size 24 :span 32 :borders {}}) -----
\ No newline at end of file +---- diff --git a/src/images/bytefield/medeleg.adoc b/src/images/bytefield/medeleg.edn index a63156d..a63156d 100644 --- a/src/images/bytefield/medeleg.adoc +++ b/src/images/bytefield/medeleg.edn diff --git a/src/images/bytefield/mepcreg.adoc b/src/images/bytefield/mepcreg.edn index 7b3de31..de38623 100644 --- a/src/images/bytefield/mepcreg.adoc +++ b/src/images/bytefield/mepcreg.edn @@ -12,4 +12,4 @@ (draw-box "mepc" {:span 32}) (draw-box "MXLEN" {:font-size 24 :span 32 :borders {}}) -----
\ No newline at end of file +---- diff --git a/src/images/bytefield/mhartid.adoc b/src/images/bytefield/mhartid.edn index 2c73982..6934d74 100644 --- a/src/images/bytefield/mhartid.adoc +++ b/src/images/bytefield/mhartid.edn @@ -11,4 +11,4 @@ (draw-box "Hart ID" {:span 32 :vertical-align "middle"}) (draw-box "MXLEN" {:span 32 :borders {}}) -----
\ No newline at end of file +---- diff --git a/src/images/bytefield/mideleg.adoc b/src/images/bytefield/mideleg.edn index 08e75b6..e6073e3 100644 --- a/src/images/bytefield/mideleg.adoc +++ b/src/images/bytefield/mideleg.edn @@ -14,4 +14,4 @@ (draw-box (text "(WARL)" {:font-weight "bold"}) {:font-size 18 :span 15 :text-anchor "start" :borders {:top :border-unrelated :bottom :border-unrelated :right :border-unrelated}}) (draw-box "MXLEN" {:font-size 24 :span 32 :borders {}}) -----
\ No newline at end of file +---- diff --git a/src/images/bytefield/miereg-standard.adoc b/src/images/bytefield/miereg-standard.edn index 680fb1c..680fb1c 100644 --- a/src/images/bytefield/miereg-standard.adoc +++ b/src/images/bytefield/miereg-standard.edn diff --git a/src/images/bytefield/mimpid.adoc b/src/images/bytefield/mimpid.edn index 5296d58..2a955ef 100644 --- a/src/images/bytefield/mimpid.adoc +++ b/src/images/bytefield/mimpid.edn @@ -11,4 +11,4 @@ (draw-box "Implementation" {:span 32}) (draw-box "MXLEN" {:span 32 :borders {}}) -----
\ No newline at end of file +---- diff --git a/src/images/bytefield/mipreg-standard.adoc b/src/images/bytefield/mipreg-standard.edn index e32e302..e32e302 100644 --- a/src/images/bytefield/mipreg-standard.adoc +++ b/src/images/bytefield/mipreg-standard.edn diff --git a/src/images/bytefield/misareg.edn b/src/images/bytefield/misareg.edn index 71da62a..ff445c3 100644 --- a/src/images/bytefield/misareg.edn +++ b/src/images/bytefield/misareg.edn @@ -24,4 +24,4 @@ (draw-box "2" {:span 12 :borders {}}) (draw-box "MXLEN-28" {:span 6 :borders {}}) (draw-box "26" {:span 14 :borders {}}) -----
\ No newline at end of file +---- diff --git a/src/images/bytefield/mnepc.edn b/src/images/bytefield/mnepc.edn index 7d0394b..37a84e9 100644 --- a/src/images/bytefield/mnepc.edn +++ b/src/images/bytefield/mnepc.edn @@ -11,4 +11,4 @@ (draw-box (text "mnepc " {:font-size 24}) {:span 16 :text-anchor "end" :borders {:top :border-unrelated :bottom :border-unrelated :left :border-unrelated}}) (draw-box (text "(WARL)"{:font-weight "bold" :font-size 24}) {:span 16 :text-anchor "start" :borders {:top :border-unrelated :right :border-unrelated :bottom :border-unrelated}}) (draw-box "MXLEN" {:span 32 :borders {}}) -----
\ No newline at end of file +---- diff --git a/src/images/bytefield/mnscratch.adoc b/src/images/bytefield/mnscratch.edn index 90e3a65..79294f8 100644 --- a/src/images/bytefield/mnscratch.adoc +++ b/src/images/bytefield/mnscratch.edn @@ -10,4 +10,4 @@ (draw-box (text "mnscratch" {:font-size 24}) {:span 32}) (draw-box "MXLEN" {:font-size 24 :span 32 :borders {}}) -----
\ No newline at end of file +---- diff --git a/src/images/bytefield/mscratch.adoc b/src/images/bytefield/mscratch.edn index 1e83f2c..3148240 100644 --- a/src/images/bytefield/mscratch.adoc +++ b/src/images/bytefield/mscratch.edn @@ -13,4 +13,4 @@ (draw-box "mscratch" {:font-size 18 :span 32}) (draw-box "MXLEN" {:font-size 24 :span 32 :borders {}}) -----
\ No newline at end of file +---- diff --git a/src/images/bytefield/mseccfg.adoc b/src/images/bytefield/mseccfg.adoc deleted file mode 100644 index 1c8cc35..0000000 --- a/src/images/bytefield/mseccfg.adoc +++ /dev/null @@ -1,28 +0,0 @@ -[bytefield] ----- -(defattrs :plain [:plain {:font-family "M+ 1p Fallback"}]) -(def row-height 45) -(def row-header-fn nil) -(def boxes-per-row 32) -(draw-column-headers {:height 20 :font-size 18 :labels (reverse ["" "0" "" "1" "" "2" "" "3" "" "7" "" "8" "" "" "9" "" "10" "" "" "" "31" "32" "" "33" "34" "" "" "" "" "" "" "63"])}) - -(draw-box (text "WPRI" {:font-weight "bold"}) {:span 8}) -(draw-box "PMM" {:span 3}) -(draw-box (text "WPRI" {:font-weight "bold"}) {:span 5}) -(draw-box "SSEED" {:span 3}) -(draw-box "USEED" {:span 3}) -(draw-box (text "WPRI" {:font-weight "bold"}) {:span 3}) -(draw-box "RLB" {:span 2}) -(draw-box "MMWP" {:span 3}) -(draw-box "MML" {:span 2}) - -(draw-box "30" {:span 8 :borders {}}) -(draw-box "2" {:span 3 :borders {}}) -(draw-box "22" {:span 5 :borders {}}) -(draw-box "1" {:span 3 :borders {}}) -(draw-box "1" {:span 3 :borders {}}) -(draw-box "5" {:span 3 :borders {}}) -(draw-box "1" {:span 2 :borders {}}) -(draw-box "1" {:span 3 :borders {}}) -(draw-box "1" {:span 2 :borders {}}) ----- diff --git a/src/images/bytefield/mtime.adoc b/src/images/bytefield/mtime.edn index 250c0f3..811ef47 100644 --- a/src/images/bytefield/mtime.adoc +++ b/src/images/bytefield/mtime.edn @@ -10,4 +10,4 @@ (draw-box "mtime" {:span 32}) (draw-box "64" {:font-size 24 :span 32 :borders {}}) -----
\ No newline at end of file +---- diff --git a/src/images/bytefield/mtimecmp.adoc b/src/images/bytefield/mtimecmp.edn index 222ef3d..89773ad 100644 --- a/src/images/bytefield/mtimecmp.adoc +++ b/src/images/bytefield/mtimecmp.edn @@ -10,4 +10,4 @@ (draw-box "mtimecmp" {:span 32}) (draw-box "64" {:font-size 24 :span 32 :borders {}}) -----
\ No newline at end of file +---- diff --git a/src/images/bytefield/mtinstreg.edn b/src/images/bytefield/mtinstreg.edn index d733690..2ee51c6 100644 --- a/src/images/bytefield/mtinstreg.edn +++ b/src/images/bytefield/mtinstreg.edn @@ -11,4 +11,4 @@ (draw-box "0" {:span 16 :text-anchor "end" :borders {}}) (draw-box "mtinst" {:span 32}) (draw-box "MXLEN" {:span 32 :borders {}}) -----
\ No newline at end of file +---- diff --git a/src/images/bytefield/mtval2reg.edn b/src/images/bytefield/mtval2reg.edn index ca4f10a..2d64f27 100644 --- a/src/images/bytefield/mtval2reg.edn +++ b/src/images/bytefield/mtval2reg.edn @@ -11,4 +11,4 @@ (draw-box "0" {:span 16 :text-anchor "end" :borders {}}) (draw-box "mtval2" {:span 32}) (draw-box "MXLEN" {:span 32 :borders {}}) -----
\ No newline at end of file +---- diff --git a/src/images/bytefield/mtvalreg.adoc b/src/images/bytefield/mtvalreg.edn index 392031c..a74226e 100644 --- a/src/images/bytefield/mtvalreg.adoc +++ b/src/images/bytefield/mtvalreg.edn @@ -10,4 +10,4 @@ (draw-box "mtval" {:span 32}) (draw-box "MXLEN" {:font-size 24 :span 32 :borders {}}) -----
\ No newline at end of file +---- diff --git a/src/images/bytefield/mtvec.adoc b/src/images/bytefield/mtvec.edn index a6d64ed..c625d40 100644 --- a/src/images/bytefield/mtvec.adoc +++ b/src/images/bytefield/mtvec.edn @@ -15,4 +15,4 @@ (draw-box "MXLEN-2" {:span 24 :borders {}}) (draw-box "2" {:span 8 :borders {}}) -----
\ No newline at end of file +---- diff --git a/src/images/bytefield/mvendorid.adoc b/src/images/bytefield/mvendorid.edn index fa8116a..588470d 100644 --- a/src/images/bytefield/mvendorid.adoc +++ b/src/images/bytefield/mvendorid.edn @@ -12,4 +12,4 @@ (draw-box "25" {:span 25 :borders {}}) (draw-box "7" {:span 7 :borders{}}) -----
\ No newline at end of file +---- diff --git a/src/images/bytefield/pmp-rv32.adoc b/src/images/bytefield/pmp-rv32.edn index 8599082..420eb76 100644 --- a/src/images/bytefield/pmp-rv32.adoc +++ b/src/images/bytefield/pmp-rv32.edn @@ -76,4 +76,4 @@ (draw-box "8" {:span 4 :borders {}}) (draw-box "8" {:span 4 :borders {}}) (draw-box nil {:span 4 :borders {}}) -----
\ No newline at end of file +---- diff --git a/src/images/bytefield/pmp-rv64.adoc b/src/images/bytefield/pmp-rv64.edn index ab6e692..42dbbf1 100644 --- a/src/images/bytefield/pmp-rv64.adoc +++ b/src/images/bytefield/pmp-rv64.edn @@ -128,4 +128,4 @@ (draw-box "8" {:span 4 :borders {}}) (draw-box "8" {:span 4 :borders {}}) (draw-box nil {:span 8 :borders {}}) -----
\ No newline at end of file +---- diff --git a/src/images/bytefield/pmpaddr-rv32.adoc b/src/images/bytefield/pmpaddr-rv32.edn index daeef0f..7d9bc96 100644 --- a/src/images/bytefield/pmpaddr-rv32.adoc +++ b/src/images/bytefield/pmpaddr-rv32.edn @@ -12,4 +12,4 @@ (draw-box (text "(WARL)" {:font-size 24 :font-weight "bold"}) {:span 16 :text-anchor "start" :borders{:top :border-unrelated :bottom :border-unrelated :right :border-unrelated}}) (draw-box "32" {:font-size 24 :span 32 :borders {}}) -----
\ No newline at end of file +---- diff --git a/src/images/bytefield/pmpaddr-rv64.adoc b/src/images/bytefield/pmpaddr-rv64.edn index e0f15f6..672eb69 100644 --- a/src/images/bytefield/pmpaddr-rv64.adoc +++ b/src/images/bytefield/pmpaddr-rv64.edn @@ -15,4 +15,4 @@ (draw-box "10" {:font-size 24 :span 8 :borders {}}) (draw-box "54" {:font-size 24 :span 24 :borders {}}) -----
\ No newline at end of file +---- diff --git a/src/images/bytefield/pmpcfg.adoc b/src/images/bytefield/pmpcfg.edn index 9b010c4..9b010c4 100644 --- a/src/images/bytefield/pmpcfg.adoc +++ b/src/images/bytefield/pmpcfg.edn diff --git a/src/images/bytefield/rv32hgatp.edn b/src/images/bytefield/rv32hgatp.edn index 076c02a..ff90565 100644 --- a/src/images/bytefield/rv32hgatp.edn +++ b/src/images/bytefield/rv32hgatp.edn @@ -27,4 +27,4 @@ (draw-box "2" {:span 7 :borders {}}) (draw-box "7" {:span 7 :borders {}}) (draw-box "22" {:span 15 :borders {}}) -----
\ No newline at end of file +---- diff --git a/src/images/bytefield/rv32satp.edn b/src/images/bytefield/rv32satp.edn index 5d388a5..5d8552b 100644 --- a/src/images/bytefield/rv32satp.edn +++ b/src/images/bytefield/rv32satp.edn @@ -23,4 +23,4 @@ (draw-box "1" {:span 8 :borders {}}) (draw-box "9" {:span 8 :borders {}}) (draw-box "22" {:span 16 :borders {}}) -----
\ No newline at end of file +---- diff --git a/src/images/bytefield/rv32vsatpreg.edn b/src/images/bytefield/rv32vsatpreg.edn index b5ccb0b..2af138d 100644 --- a/src/images/bytefield/rv32vsatpreg.edn +++ b/src/images/bytefield/rv32vsatpreg.edn @@ -23,4 +23,4 @@ (draw-box "1" {:span 8 :borders {}}) (draw-box "9" {:span 8 :borders {}}) (draw-box "22" {:span 16 :borders {}}) -----
\ No newline at end of file +---- diff --git a/src/images/bytefield/rv64hgatp.edn b/src/images/bytefield/rv64hgatp.edn index bc79235..be294f5 100644 --- a/src/images/bytefield/rv64hgatp.edn +++ b/src/images/bytefield/rv64hgatp.edn @@ -29,4 +29,4 @@ (draw-box "2" {:span 6 :borders {}}) (draw-box "14" {:span 8 :borders {}}) (draw-box "44" {:span 10 :borders {}}) -----
\ No newline at end of file +---- diff --git a/src/images/bytefield/rv64satp.edn b/src/images/bytefield/rv64satp.edn index ccc1d31..8da3490 100644 --- a/src/images/bytefield/rv64satp.edn +++ b/src/images/bytefield/rv64satp.edn @@ -24,4 +24,4 @@ (draw-box "4" {:span 10 :borders {}}) (draw-box "16" {:span 10 :borders {}}) (draw-box "44" {:span 12 :borders {}}) -----
\ No newline at end of file +---- diff --git a/src/images/bytefield/rv64vsatpreg.edn b/src/images/bytefield/rv64vsatpreg.edn index 7f61f70..73ad5de 100644 --- a/src/images/bytefield/rv64vsatpreg.edn +++ b/src/images/bytefield/rv64vsatpreg.edn @@ -24,4 +24,4 @@ (draw-box "4" {:span 8 :borders {}}) (draw-box "16" {:span 8 :borders {}}) (draw-box "44" {:span 16 :borders {}}) -----
\ No newline at end of file +---- diff --git a/src/images/bytefield/rvc-instr-quad0.adoc b/src/images/bytefield/rvc-instr-quad0.edn index 3a34205..5b95168 100644 --- a/src/images/bytefield/rvc-instr-quad0.adoc +++ b/src/images/bytefield/rvc-instr-quad0.edn @@ -17,7 +17,7 @@ (draw-box "uimm[5:4|9:6|2|3]" {:span 8}) (draw-box (text "rd′" [:plain {:font-family "M+ 1p Fallback"}]) {:span 3}) (draw-box "00" {:span 2}) -(draw-box (text "C.ADDI4SPN" :math [:sub "RES, uimm=0"]) {:span 7 :text-anchor "start" :borders {}}) +(draw-box (text "C.ADDI4SPN" :math [:sub "(RES, uimm=0)"]) {:span 7 :text-anchor "start" :borders {}}) (draw-box "001" {:span 3}) (draw-box "uimm[5:3]" {:span 3}) @@ -25,15 +25,7 @@ (draw-box (text "uimm[7:6]" {:font-size 16}) {:span 2}) (draw-box "rd′" {:span 3}) (draw-box "00" {:span 2}) -(draw-box (text "C.FLD" :math [:sub "(RV32/64)"]) {:span 7 :text-anchor "start" :borders {}}) - -(draw-box "001" {:span 3}) -(draw-box "uimm[5:4|8]" {:span 3}) -(draw-box "rs1′" {:span 3}) -(draw-box (text "uimm[7:6]" {:font-size 16}) {:span 2}) -(draw-box "rd′" {:span 3}) -(draw-box "00" {:span 2}) -(draw-box (text "C.LQ" :math [:sub "(RV128)"]) {:span 7 :text-anchor "start" :borders {}}) +(draw-box "C.FLD" {:span 7 :text-anchor "start" :borders {}}) (draw-box "010" {:span 3}) (draw-box "uimm[5:3]" {:span 3}) @@ -57,7 +49,7 @@ (draw-box (text "uimm[7:6]" {:font-size 16}) {:span 2}) (draw-box "rd′" {:span 3}) (draw-box "00" {:span 2}) -(draw-box (text "C.LD" :math [:sub "(RV64/128)"]) {:span 7 :text-anchor "start" :borders {}}) +(draw-box (text "C.LD" :math [:sub "(RV64)"]) {:span 7 :text-anchor "start" :borders {}}) (draw-box "100" {:span 3}) (draw-box "---" {:span 11}) @@ -70,15 +62,7 @@ (draw-box (text "uimm[7:6]" {:font-size 16}) {:span 2}) (draw-box "rs2′" {:span 3}) (draw-box "00" {:span 2}) -(draw-box (text "C.FSD" :math [:sub "(RV32/64)"]) {:span 7 :text-anchor "start" :borders {}}) - -(draw-box "101" {:span 3}) -(draw-box "uimm[5:4|8]" {:span 3}) -(draw-box 
"rs1′" {:span 3}) -(draw-box (text "uimm[7:6]" {:font-size 16}) {:span 2}) -(draw-box "rs2′" {:span 3}) -(draw-box "00" {:span 2}) -(draw-box (text "C.SQ" :math [:sub "(RV128)"]) {:span 7 :text-anchor "start" :borders {}}) +(draw-box "C.FSD" {:span 7 :text-anchor "start" :borders {}}) (draw-box "110" {:span 3}) (draw-box "uimm[5:3]" {:span 3}) @@ -102,5 +86,5 @@ (draw-box (text "uimm[7:6]" {:font-size 16}) {:span 2}) (draw-box "rs2′" {:span 3}) (draw-box "00" {:span 2}) -(draw-box (text "C.SD" :math [:sub "(RV64/128)"]) {:span 7 :text-anchor "start" :borders {}}) -----
\ No newline at end of file +(draw-box (text "C.SD" :math [:sub "(RV64)"]) {:span 7 :text-anchor "start" :borders {}}) +---- diff --git a/src/images/bytefield/rvc-instr-quad1.adoc b/src/images/bytefield/rvc-instr-quad1.edn index e0f6073..a7aaacf 100644 --- a/src/images/bytefield/rvc-instr-quad1.adoc +++ b/src/images/bytefield/rvc-instr-quad1.edn @@ -14,7 +14,7 @@ (draw-box "0" {:span 5}) (draw-box "imm[4:0]" {:span 5}) (draw-box "01" {:span 2}) -(draw-box (text "C.NOP" :math [:sub "(HINT, imm=0)"]) {:span 3 :text-anchor "start" :borders {}}) +(draw-box (text "C.NOP" :math [:sub "(HINT, imm≠0)"]) {:span 3 :text-anchor "start" :borders {}}) (draw-box "000" {:span 3}) (draw-box "imm[5]") {:span 1} @@ -33,7 +33,7 @@ (draw-box "rs1/rd≠0" {:span 5}) (draw-box "imm[4:0]" {:span 5}) (draw-box "01" {:span 2}) -(draw-box (text "C.ADDIW" :math [:sub "(RV64/128; RES, rd=0)"]) {:span 3 :text-anchor "start" :borders {}}) +(draw-box (text "C.ADDIW" :math [:sub "(RV64; RES, rd=0)"]) {:span 3 :text-anchor "start" :borders {}}) (draw-box "010" {:span 3}) (draw-box "imm[5]" {:span 1}) @@ -51,7 +51,7 @@ (draw-box "011" {:span 3}) (draw-box (text "imm[17]" {:font-width 11}) {:span 1}) -(draw-box "rd̸={0, 2}" {:span 5}) +(draw-box "rd≠{0, 2}" {:span 5}) (draw-box "imm[16:12]" {:span 5}) (draw-box "01" {:span 2}) (draw-box (text "C.LUI" :math [:sub "(RES, imm=0; HINT, rd=0)"]) {:span 3 :text-anchor "start" :borders {}}) @@ -62,15 +62,7 @@ (draw-box "rs1ʹ/rdʹ" {:span 3}) (draw-box "uimm[4:0]" {:span 5}) (draw-box "01" {:span 2}) -(draw-box (text "C.SRLI" :math [:sub "(RV32 Custom, uimm[5]=1)"]) {:span 3 :text-anchor "start" :borders {}}) - -(draw-box "100" {:span 3}) -(draw-box "0" {:span 1}) -(draw-box "00" {:span 2}) -(draw-box "rs1ʹ/rdʹ" {:span 3}) -(draw-box "0" {:span 5}) -(draw-box "01" {:span 2}) -(draw-box (text "C.SRLI64" :math [:sub "(RV128; RV32/64 HINT)"]) {:span 3 :text-anchor "start" :borders {}}) +(draw-box (text "C.SRLI" :math [:sub "(HINT, uimm=0)"]) {:span 3 :text-anchor 
"start" :borders {}}) (draw-box "100" {:span 3}) (draw-box (text "uimm[5]" {:font-width 11}) {:span 1}) @@ -78,15 +70,7 @@ (draw-box "rs1ʹ/rdʹ" {:span 3}) (draw-box "uimm[4:0]" {:span 5}) (draw-box "01" {:span 2}) -(draw-box (text "C.SRAI" :math [:sub "(RV32 Custom, uimm[5]=1)"]) {:span 3 :text-anchor "start" :borders {}}) - -(draw-box "100" {:span 3}) -(draw-box "0" {:span 1}) -(draw-box "01" {:span 2}) -(draw-box "rs1ʹ/rdʹ" {:span 3}) -(draw-box "0" {:span 5}) -(draw-box "01" {:span 2}) -(draw-box (text "C.SRAI64" :math [:sub "(RV128; RV32/64 HINT)"]) {:span 3 :text-anchor "start" :borders {}}) +(draw-box (text "C.SRAI" :math [:sub "(HINT, uimm=0)"]) {:span 3 :text-anchor "start" :borders {}}) (draw-box "100" {:span 3}) (draw-box "imm[5]" {:span 1}) @@ -139,7 +123,7 @@ (draw-box "00" {:span 2}) (draw-box "rs2′" {:span 3}) (draw-box "01" {:span 2}) -(draw-box (text "C.SUBW" :math [:sub "(RV64/128; RV32 RES)"]) {:span 3 :text-anchor "start" :borders {}}) +(draw-box (text "C.SUBW" :math [:sub "(RV64; RV32 RES)"]) {:span 3 :text-anchor "start" :borders {}}) (draw-box "100" {:span 3}) (draw-box "1" {:span 1}) @@ -148,7 +132,7 @@ (draw-box "01" {:span 2}) (draw-box "rs2′" {:span 3}) (draw-box "01" {:span 2}) -(draw-box (text "C.ADDW" :math [:sub "(RV64/128; RV32 RES)"]) {:span 3 :text-anchor "start" :borders {}}) +(draw-box (text "C.ADDW" :math [:sub "(RV64; RV32 RES)"]) {:span 3 :text-anchor "start" :borders {}}) (draw-box "100" {:span 3}) (draw-box "1" {:span 1}) @@ -186,4 +170,4 @@ (draw-box "imm[7:6|2:1|5]" {:span 5}) (draw-box "01" {:span 2}) (draw-box "C.BNEZ" {:span 3 :text-anchor "start" :borders {}}) -----
\ No newline at end of file +---- diff --git a/src/images/bytefield/rvc-instr-quad2.adoc b/src/images/bytefield/rvc-instr-quad2.edn index 03c6e3d..10471c0 100644 --- a/src/images/bytefield/rvc-instr-quad2.adoc +++ b/src/images/bytefield/rvc-instr-quad2.edn @@ -10,32 +10,18 @@ (draw-column-headers {:labels (reverse ["" "" "" "" "" "" "0" "1" "2" "3" "4" "5" "6" "7" "8" "9" "10" "11" "12" "13" "14" "15"])}) (draw-box "000" {:span 3}) -(draw-box (text "uimm[5]" {:font-size 16}) {:span 1}) +(draw-box (text "nzuimm[5]" {:font-size 16}) {:span 1}) (draw-box "rs1/rd≠0" {:span 5}) -(draw-box "uimm[4:0]" {:span 5}) +(draw-box "nzuimm[4:0]" {:span 5}) (draw-box "10" {:span 2}) -(draw-box (text "C.SLLI" :math [:sub "(HINT, rd=0; RV32 Custom, uimm[5]=1)"]) {:span 6 :text-anchor "start" :borders {}}) - -(draw-box "000" {:span 3}) -(draw-box "0" {:span 1}) -(draw-box "rs1/rd≠0" {:span 5}) -(draw-box "0" {:span 5}) -(draw-box "10" {:span 2}) -(draw-box (text "C.SLLI64" :math [:sub "(RV128; RV32/64 HINT; HINT, rd=0)"]) {:span 6 :text-anchor "start" :borders {}}) +(draw-box (text "C.SLLI" :math [:sub "(HINT, rd=0 or imm=0)"]) {:span 6 :text-anchor "start" :borders {}}) (draw-box "001" {:span 3}) (draw-box (text "uimm[5]" {:font-size 16}) {:span 1}) (draw-box "rd" {:span 5}) (draw-box "uimm[4:3|8:6]" {:span 5}) (draw-box "10" {:span 2}) -(draw-box (text "C.FLDSP" :math [:sub "(RV32/64)"]) {:span 6 :text-anchor "start" :borders {}}) - -(draw-box "001" {:span 3}) -(draw-box (text "uimm[5]" {:font-size 16}) {:span 1}) -(draw-box "rd≠0" {:span 5}) -(draw-box "uimm[4|9:6]" {:span 5}) -(draw-box "10" {:span 2}) -(draw-box (text "C.LQSP" :math [:sub "(RV128; RES, rd=0)"]) {:span 6 :text-anchor "start" :borders {}}) +(draw-box "C.FLDSP" {:span 6 :text-anchor "start" :borders {}}) (draw-box "010" {:span 3}) (draw-box (text "uimm[5]" {:font-size 16}) {:span 1}) @@ -56,7 +42,7 @@ (draw-box "rd≠0" {:span 5}) (draw-box "uimm[4:3|8:6]" {:span 5}) (draw-box "10" {:span 2}) -(draw-box (text 
"C.LDSP" :math [:sub "(RV64/128; RES, rd=0)"]) {:span 6 :text-anchor "start" :borders {}}) +(draw-box (text "C.LDSP" :math [:sub "(RV64; RES, rd=0)"]) {:span 6 :text-anchor "start" :borders {}}) (draw-box "100" {:span 3}) (draw-box "0" {:span 1}) @@ -97,13 +83,7 @@ (draw-box "uimm[5:3|8:6]" {:span 6}) (draw-box "rs2" {:span 5}) (draw-box "10" {:span 2}) -(draw-box (text "C.FSDSP" :math [:sub "(RV32/64)"]) {:span 6 :text-anchor "start" :borders {}}) - -(draw-box "101" {:span 3}) -(draw-box "uimm[5:4|9:6]" {:span 6}) -(draw-box "rs2" {:span 5}) -(draw-box "10" {:span 2}) -(draw-box (text "C.SQSP" :math [:sub "(RV128)"]) {:span 6 :text-anchor "start" :borders {}}) +(draw-box "C.FSDSP" {:span 6 :text-anchor "start" :borders {}}) (draw-box "110" {:span 3}) (draw-box "uimm[5:2|7:6]" {:span 6}) @@ -121,5 +101,5 @@ (draw-box "uimm[5:3|8:6]" {:span 6}) (draw-box "rs2" {:span 5}) (draw-box "10" {:span 2}) -(draw-box (text "C.SDSP" :math [:sub "(RV64/128)"]) {:span 6 :text-anchor "start" :borders {}}) -----
\ No newline at end of file +(draw-box (text "C.SDSP" :math [:sub "(RV64)"]) {:span 6 :text-anchor "start" :borders {}}) +---- diff --git a/src/images/bytefield/scausereg.edn b/src/images/bytefield/scausereg.edn index 0dca01e..cbdcb91 100644 --- a/src/images/bytefield/scausereg.edn +++ b/src/images/bytefield/scausereg.edn @@ -18,4 +18,4 @@ (draw-box "1" {:span 6 :borders {}}) (draw-box "SXLEN-1" {:span 26 :borders {}}) -----
\ No newline at end of file +---- diff --git a/src/images/bytefield/scounteren.edn b/src/images/bytefield/scounteren.edn index e179fe9..52e78e2 100644 --- a/src/images/bytefield/scounteren.edn +++ b/src/images/bytefield/scounteren.edn @@ -40,4 +40,4 @@ (draw-box "1" {:span 1 :borders {}}) (draw-box "1" {:span 1 :borders {}}) (draw-box "1" {:span 1 :borders {}}) -----
\ No newline at end of file +---- diff --git a/src/images/bytefield/sie.edn b/src/images/bytefield/sie.edn index 7188a1e..50be75e 100644 --- a/src/images/bytefield/sie.edn +++ b/src/images/bytefield/sie.edn @@ -14,4 +14,4 @@ (draw-box (text "(WARL)" {:font-weight "bold" :font-size 24}) {:span 16 :text-anchor "start" :borders {:top :border-unrelated :right :border-unrelated :bottom :border-unrelated}}) (draw-box "SXLEN" {:span 32 :borders {}}) -----
\ No newline at end of file +---- diff --git a/src/images/bytefield/sip.edn b/src/images/bytefield/sip.edn index 7188a1e..50be75e 100644 --- a/src/images/bytefield/sip.edn +++ b/src/images/bytefield/sip.edn @@ -14,4 +14,4 @@ (draw-box (text "(WARL)" {:font-weight "bold" :font-size 24}) {:span 16 :text-anchor "start" :borders {:top :border-unrelated :right :border-unrelated :bottom :border-unrelated}}) (draw-box "SXLEN" {:span 32 :borders {}}) -----
\ No newline at end of file +---- diff --git a/src/images/bytefield/sscratch.edn b/src/images/bytefield/sscratch.edn index 81c5fa4..177660f 100644 --- a/src/images/bytefield/sscratch.edn +++ b/src/images/bytefield/sscratch.edn @@ -13,4 +13,4 @@ (draw-box "sscratch" {:span 32}) (draw-box "SXLEN" {:span 32 :borders {}}) -----
\ No newline at end of file +---- diff --git a/src/images/bytefield/stvalreg.edn b/src/images/bytefield/stvalreg.edn index eb1fe32..66c1e20 100644 --- a/src/images/bytefield/stvalreg.edn +++ b/src/images/bytefield/stvalreg.edn @@ -13,4 +13,4 @@ (draw-box "stval" {:span 32}) (draw-box "SXLEN" {:span 32 :borders {}}) -----
\ No newline at end of file +---- diff --git a/src/images/bytefield/stvec.edn b/src/images/bytefield/stvec.edn index 2a3502f..8e1ce10 100644 --- a/src/images/bytefield/stvec.edn +++ b/src/images/bytefield/stvec.edn @@ -25,4 +25,4 @@ (draw-box "SXLEN-2" {:span 14 :borders {}}) (draw-box "2" {:span 6 :borders {}}) (draw-box nil {:span 6 :borders {}}) -----
\ No newline at end of file +---- diff --git a/src/images/bytefield/sv32pa.edn b/src/images/bytefield/sv32pa.edn index e0a827c..887d1d2 100644 --- a/src/images/bytefield/sv32pa.edn +++ b/src/images/bytefield/sv32pa.edn @@ -21,4 +21,4 @@ (draw-box "12" {:span 10 :borders {}}) (draw-box "10" {:span 10 :borders {}}) (draw-box "12" {:span 12 :borders {}}) -----
\ No newline at end of file +---- diff --git a/src/images/bytefield/sv32pte.edn b/src/images/bytefield/sv32pte.edn index c1dbb69..9710587 100644 --- a/src/images/bytefield/sv32pte.edn +++ b/src/images/bytefield/sv32pte.edn @@ -45,4 +45,4 @@ (draw-box "1" {:borders {}}) (draw-box "1" {:borders {}}) (draw-box "1" {:borders {}}) -----
\ No newline at end of file +---- diff --git a/src/images/bytefield/sv32va.edn b/src/images/bytefield/sv32va.edn index d378444..231f0cd 100644 --- a/src/images/bytefield/sv32va.edn +++ b/src/images/bytefield/sv32va.edn @@ -21,4 +21,4 @@ (draw-box "10" {:span 10 :borders {}}) (draw-box "10" {:span 10 :borders {}}) (draw-box "12" {:span 12 :borders {}}) -----
\ No newline at end of file +---- diff --git a/src/images/bytefield/sv32x4va.edn b/src/images/bytefield/sv32x4va.edn index 379253e..dcc1f3f 100644 --- a/src/images/bytefield/sv32x4va.edn +++ b/src/images/bytefield/sv32x4va.edn @@ -27,4 +27,4 @@ (draw-box "10" {:span 6 :borders {}}) (draw-box "12" {:span 8 :borders {}}) (draw-box nil {:span 6 :borders {}}) -----
\ No newline at end of file +---- diff --git a/src/images/bytefield/sv39pa.edn b/src/images/bytefield/sv39pa.edn index 749a851..0c3e734 100644 --- a/src/images/bytefield/sv39pa.edn +++ b/src/images/bytefield/sv39pa.edn @@ -25,4 +25,4 @@ (draw-box "9" {:span 8 :borders {}}) (draw-box "9" {:span 8 :borders {}}) (draw-box "12" {:span 8 :borders {}}) -----
\ No newline at end of file +---- diff --git a/src/images/bytefield/sv39pte.edn b/src/images/bytefield/sv39pte.edn index 45ef38a..1aeb5a3 100644 --- a/src/images/bytefield/sv39pte.edn +++ b/src/images/bytefield/sv39pte.edn @@ -61,4 +61,4 @@ (draw-box "1" {:borders {}}) (draw-box "1" {:borders {}}) -----
\ No newline at end of file +---- diff --git a/src/images/bytefield/sv39va.edn b/src/images/bytefield/sv39va.edn index 89ee6cb..285565d 100644 --- a/src/images/bytefield/sv39va.edn +++ b/src/images/bytefield/sv39va.edn @@ -25,4 +25,4 @@ (draw-box "9" {:span 6 :borders {}}) (draw-box "9" {:span 6 :borders {}}) (draw-box "12" {:span 14 :borders {}}) -----
\ No newline at end of file +---- diff --git a/src/images/bytefield/sv39x4va.edn b/src/images/bytefield/sv39x4va.edn index 06b04f2..fe0bc89 100644 --- a/src/images/bytefield/sv39x4va.edn +++ b/src/images/bytefield/sv39x4va.edn @@ -31,4 +31,4 @@ (draw-box "9" {:span 4 :borders {}}) (draw-box "12" {:span 8 :borders {}}) (draw-box nil {:span 6 :borders {}}) -----
\ No newline at end of file +---- diff --git a/src/images/bytefield/sv48pa.edn b/src/images/bytefield/sv48pa.edn index fee1617..3d4eaa3 100644 --- a/src/images/bytefield/sv48pa.edn +++ b/src/images/bytefield/sv48pa.edn @@ -29,4 +29,4 @@ (draw-box "9" {:span 6 :borders {}}) (draw-box "9" {:span 6 :borders {}}) (draw-box "12" {:span 6 :borders {}}) -----
\ No newline at end of file +---- diff --git a/src/images/bytefield/sv48pte.edn b/src/images/bytefield/sv48pte.edn index e01f83d..292aeee 100644 --- a/src/images/bytefield/sv48pte.edn +++ b/src/images/bytefield/sv48pte.edn @@ -65,4 +65,4 @@ (draw-box "1" {:borders {}}) (draw-box "1" {:borders {}}) -----
\ No newline at end of file +---- diff --git a/src/images/bytefield/sv48va.edn b/src/images/bytefield/sv48va.edn index 2ef5054..520ed15 100644 --- a/src/images/bytefield/sv48va.edn +++ b/src/images/bytefield/sv48va.edn @@ -35,4 +35,4 @@ (draw-box "9" {:span 6 :borders {}}) (draw-box "12" {:span 6 :borders {}}) (draw-box nil {:span 1 :borders {}}) -----
\ No newline at end of file +---- diff --git a/src/images/bytefield/sv48x4va.edn b/src/images/bytefield/sv48x4va.edn index e368e51..7be14a3 100644 --- a/src/images/bytefield/sv48x4va.edn +++ b/src/images/bytefield/sv48x4va.edn @@ -35,4 +35,4 @@ (draw-box "9" {:span 4 :borders {}}) (draw-box "12" {:span 4 :borders {}}) (draw-box nil {:span 6 :borders {}}) -----
\ No newline at end of file +---- diff --git a/src/images/bytefield/sv57pa.edn b/src/images/bytefield/sv57pa.edn index 511b9d6..39e4b08 100644 --- a/src/images/bytefield/sv57pa.edn +++ b/src/images/bytefield/sv57pa.edn @@ -33,4 +33,4 @@ (draw-box "9" {:span 5 :borders {}}) (draw-box "9" {:span 6 :borders {}}) (draw-box "12" {:span 6 :borders {}}) -----
\ No newline at end of file +---- diff --git a/src/images/bytefield/sv57pte.edn b/src/images/bytefield/sv57pte.edn index 0f8e2a8..f3bb80c 100644 --- a/src/images/bytefield/sv57pte.edn +++ b/src/images/bytefield/sv57pte.edn @@ -84,4 +84,4 @@ (draw-box "9" {:span 4 :borders {}}) (draw-box nil {:span 6 :borders {}}) -----
\ No newline at end of file +---- diff --git a/src/images/bytefield/sv57va.edn b/src/images/bytefield/sv57va.edn index a7f525a..2610e3b 100644 --- a/src/images/bytefield/sv57va.edn +++ b/src/images/bytefield/sv57va.edn @@ -33,4 +33,4 @@ (draw-box "9" {:span 5 :borders {}}) (draw-box "9" {:span 5 :borders {}}) (draw-box "12" {:span 7 :borders {}}) -----
\ No newline at end of file +---- diff --git a/src/images/bytefield/sv57x4va.edn b/src/images/bytefield/sv57x4va.edn index 37f401c..f0c7369 100644 --- a/src/images/bytefield/sv57x4va.edn +++ b/src/images/bytefield/sv57x4va.edn @@ -39,4 +39,4 @@ (draw-box "9" {:span 4 :borders {}}) (draw-box "12" {:span 4 :borders {}}) (draw-box nil {:span 4 :borders {}}) -----
\ No newline at end of file +---- diff --git a/src/images/bytefield/vscausereg.edn b/src/images/bytefield/vscausereg.edn index 8716d87..74599e2 100644 --- a/src/images/bytefield/vscausereg.edn +++ b/src/images/bytefield/vscausereg.edn @@ -15,4 +15,4 @@ (draw-box (text "(WLRL)" {:font-weight "bold" :font-size 24}) {:span 14 :text-anchor "start" :borders {:top :border-unrelated :bottom :border-unrelated :right :border-unrelated}}) (draw-box "1" {:span 4 :borders {}}) (draw-box "VSXLEN-1" {:span 28 :borders {}}) -----
\ No newline at end of file +---- diff --git a/src/images/bytefield/vsepcreg.edn b/src/images/bytefield/vsepcreg.edn index fb6c757..f604f54 100644 --- a/src/images/bytefield/vsepcreg.edn +++ b/src/images/bytefield/vsepcreg.edn @@ -11,4 +11,4 @@ (draw-box "0" {:span 16 :text-anchor "end" :borders {}}) (draw-box "vsepc" {:span 32}) (draw-box "VSXLEN" {:span 32 :borders {}}) -----
\ No newline at end of file +---- diff --git a/src/images/bytefield/vsiereg.edn b/src/images/bytefield/vsiereg.edn index aad78d7..53577ae 100644 --- a/src/images/bytefield/vsiereg.edn +++ b/src/images/bytefield/vsiereg.edn @@ -14,4 +14,4 @@ (draw-box (text "(WARL)" {:font-weight "bold" :font-size 24}) {:span 16 :text-anchor "start" :borders {:right :border-unrelated :top :border-unrelated :bottom :border-unrelated}}) (draw-box "VSXLEN" {:span 32 :borders {}}) -----
\ No newline at end of file +---- diff --git a/src/images/bytefield/vsipreg.edn b/src/images/bytefield/vsipreg.edn index aad78d7..53577ae 100644 --- a/src/images/bytefield/vsipreg.edn +++ b/src/images/bytefield/vsipreg.edn @@ -14,4 +14,4 @@ (draw-box (text "(WARL)" {:font-weight "bold" :font-size 24}) {:span 16 :text-anchor "start" :borders {:right :border-unrelated :top :border-unrelated :bottom :border-unrelated}}) (draw-box "VSXLEN" {:span 32 :borders {}}) -----
\ No newline at end of file +---- diff --git a/src/images/bytefield/vsscratchreg.edn b/src/images/bytefield/vsscratchreg.edn index 8eb14d4..5464300 100644 --- a/src/images/bytefield/vsscratchreg.edn +++ b/src/images/bytefield/vsscratchreg.edn @@ -11,4 +11,4 @@ (draw-box "0" {:span 16 :text-anchor "end" :borders {}}) (draw-box "vsscratch" {:span 32}) (draw-box "VSXLEN" {:span 32 :borders {}}) -----
\ No newline at end of file +---- diff --git a/src/images/bytefield/vsstatusreg-rv32.edn b/src/images/bytefield/vsstatusreg-rv32.edn index 88a3c16..5b3a2b7 100644 --- a/src/images/bytefield/vsstatusreg-rv32.edn +++ b/src/images/bytefield/vsstatusreg-rv32.edn @@ -83,4 +83,4 @@ (draw-box "1" {:span 2 :borders {}}) (draw-box "1" {:span 2 :borders {}}) (draw-box nil {:span 2 :borders {}}) -----
\ No newline at end of file +---- diff --git a/src/images/bytefield/vsstatusreg.edn b/src/images/bytefield/vsstatusreg.edn index 95780a6..cf8b828 100644 --- a/src/images/bytefield/vsstatusreg.edn +++ b/src/images/bytefield/vsstatusreg.edn @@ -87,4 +87,4 @@ (draw-box "1" {:span 2 :borders {}}) (draw-box "1" {:span 2 :borders {}}) (draw-box nil {:span 3 :borders {}}) -----
\ No newline at end of file +---- diff --git a/src/images/bytefield/vstvalreg.edn b/src/images/bytefield/vstvalreg.edn index 269653f..d097b0f 100644 --- a/src/images/bytefield/vstvalreg.edn +++ b/src/images/bytefield/vstvalreg.edn @@ -11,4 +11,4 @@ (draw-box "0" {:span 16 :text-anchor "end" :borders {}}) (draw-box "vstval" {:span 32}) (draw-box "VSXLEN" {:span 32 :borders {}}) -----
\ No newline at end of file +---- diff --git a/src/images/bytefield/vstvecreg.edn b/src/images/bytefield/vstvecreg.edn index 26d1315..c93d6e0 100644 --- a/src/images/bytefield/vstvecreg.edn +++ b/src/images/bytefield/vstvecreg.edn @@ -19,4 +19,4 @@ (draw-box "VSXLEN-2" {:span 20 :borders {}}) (draw-box "2" {:span 12 :borders {}}) -----
\ No newline at end of file +---- diff --git a/src/images/es_dataflow.svg b/src/images/es_dataflow.svg index 32d16ce..c9d3306 100644 --- a/src/images/es_dataflow.svg +++ b/src/images/es_dataflow.svg @@ -178,4 +178,4 @@ <path d='M108.5018 10.148284C108.7512 10.148284 108.9902 10.187971 109.2243 10.262971C109.4587 10.337659 109.6627 10.452034 109.8421 10.601721L109.4734 11.199534C109.4487 11.23422 109.4237 11.26422 109.3987 11.28422C109.3737 11.29922 109.344 11.30922 109.309 11.30922C109.2793 11.30922 109.2493 11.29922 109.2193 11.28422C109.1896 11.26422 109.1546 11.24422 109.1196 11.224534C109.0799 11.204534 109.0349 11.184534 108.9802 11.164534C108.9306 11.149534 108.8656 11.139846 108.7909 11.139846C108.6565 11.139846 108.5468 11.184534 108.4721 11.27422C108.3924 11.36391 108.3524 11.4886 108.3524 11.64797V14.41797H109.7177V15.29485H108.3524V16.90391H107.7149C107.6399 16.90391 107.5802 16.88422 107.5356 16.84922C107.4906 16.81422 107.4556 16.75953 107.4409 16.69485L107.1818 15.29985L106.3596 15.16516V14.67703C106.3596 14.59235 106.3846 14.52766 106.4293 14.48266C106.4743 14.43797 106.534 14.41797 106.5987 14.41797H107.1218V11.56328C107.1218 11.119846 107.2415 10.776096 107.4756 10.527034C107.7149 10.272659 108.0587 10.148284 108.5018 10.148284ZM111.7794 15.33953H110.5441V10.227971H111.7794V15.33953ZM111.9488 16.82922C111.9488 16.93891 111.9291 17.0436 111.8841 17.13828C111.8394 17.23297 111.7844 17.31766 111.7097 17.38735C111.64 17.45703 111.5553 17.51172 111.4606 17.55172C111.3659 17.59641 111.2613 17.61641 111.1519 17.61641C111.0472 17.61641 110.9425 17.59641 110.8528 17.55172C110.7531 17.51172 110.6734 17.45703 110.6038 17.38735C110.5341 17.31766 110.4791 17.23297 110.4344 17.13828C110.3944 17.0436 110.3747 16.93891 110.3747 16.82922C110.3747 16.72453 110.3944 16.62516 110.4344 16.53047C110.4791 16.43578 110.5341 16.3561 110.6038 16.2861C110.6734 16.21641 110.7531 16.16172 110.8528 16.11672C110.9425 16.07703 111.0472 16.05703 111.1519 16.05703C111.2613 16.05703 
111.3659 16.07703 111.4606 16.11672C111.5553 16.16172 111.64 16.21641 111.7097 16.2861C111.7844 16.3561 111.8394 16.43578 111.8841 16.53047C111.9291 16.62516 111.9488 16.72453 111.9488 16.82922ZM115.2964 15.41922C114.9127 15.41922 114.5689 15.35953 114.2552 15.23516C113.9414 15.11047 113.6774 14.9361 113.453 14.71203C113.2339 14.48766 113.0645 14.2086 112.9449 13.88485C112.8202 13.5611 112.7605 13.19735 112.7605 12.79391C112.7605 12.39016 112.8202 12.02141 112.9449 11.69766C113.0645 11.37391 113.2339 11.094846 113.453 10.865784C113.6774 10.636409 113.9414 10.462034 114.2552 10.337659C114.5689 10.212971 114.9127 10.153284 115.2964 10.153284C115.6752 10.153284 116.0189 10.212971 116.3327 10.337659C116.6417 10.462034 116.9058 10.636409 117.1299 10.865784C117.3492 11.094846 117.5186 11.37391 117.638 11.69766C117.7577 12.02141 117.8174 12.39016 117.8174 12.79391C117.8174 13.19735 117.7577 13.5611 117.638 13.88485C117.5186 14.2086 117.3492 14.48766 117.1299 14.71203C116.9058 14.9361 116.6417 15.11047 116.3327 15.23516C116.0189 15.35953 115.6752 15.41922 115.2964 15.41922ZM115.2964 11.099846C114.863 11.099846 114.5442 11.24422 114.3398 11.53328C114.1305 11.81735 114.0261 12.23578 114.0261 12.78391C114.0261 13.33172 114.1305 13.75047 114.3398 14.04422C114.5442 14.33328 114.863 14.47766 115.2964 14.47766C115.7199 14.47766 116.0339 14.33328 116.243 14.04422C116.4424 13.75547 116.547 13.33672 116.547 12.78391C116.547 12.23078 116.4424 11.81235 116.243 11.52828C116.0339 11.24422 115.7199 11.099846 115.2964 11.099846ZM119.9089 14.71203L119.8242 15.11547C119.7742 15.26485 119.6699 15.33953 119.5102 15.33953H118.758V10.227971H119.9886V13.92485C120.153 14.08922 120.3323 14.2236 120.5167 14.31828C120.7011 14.41297 120.9002 14.46266 121.1145 14.46266C121.4086 14.46266 121.6227 14.37797 121.7674 14.2036C121.9117 14.03422 121.9814 13.79516 121.9814 13.48141V10.227971H123.212V13.48141C123.212 13.76516 123.172 14.02922 123.1024 14.2636C123.0227 14.50266 122.913 14.70703 122.7736 
14.88141C122.6242 15.05078 122.4449 15.18016 122.2305 15.27985C122.0164 15.36953 121.7674 15.41922 121.4883 15.41922C121.3139 15.41922 121.1545 15.39953 121.0099 15.36953C120.8655 15.32953 120.7258 15.28485 120.6014 15.22016C120.4717 15.16016 120.3524 15.08547 120.2327 15.00078C120.123 14.9111 120.0136 14.81641 119.9089 14.71203Z'/> </g> </g> -</svg>
\ No newline at end of file +</svg> diff --git a/src/images/es_noisetest.svg b/src/images/es_noisetest.svg index 806d3a2..e38dcec 100644 --- a/src/images/es_noisetest.svg +++ b/src/images/es_noisetest.svg @@ -119,4 +119,4 @@ <path d='M219.4898 74.80009C219.1535 74.80009 218.8529 74.74603 218.5842 74.63415C218.3195 74.52197 218.0907 74.36509 217.9026 74.16322C217.7142 73.96134 217.5707 73.71478 217.472 73.42353C217.3688 73.13634 217.3195 72.81353 217.3195 72.45947C217.3195 72.10072 217.3688 71.7779 217.472 71.49072C217.5707 71.20384 217.7142 70.95728 217.9026 70.755405C218.0907 70.553842 218.3195 70.396655 218.5842 70.28478C218.8529 70.176967 219.1535 70.123217 219.4898 70.123217C219.8217 70.123217 220.122 70.176967 220.3867 70.28478C220.651 70.396655 220.8798 70.553842 221.0682 70.755405C221.2563 70.95728 221.3998 71.20384 221.4985 71.49072C221.6017 71.7779 221.651 72.10072 221.651 72.45947C221.651 72.81353 221.6017 73.13634 221.4985 73.42353C221.3998 73.71478 221.2563 73.96134 221.0682 74.16322C220.8798 74.36509 220.651 74.52197 220.3867 74.63415C220.122 74.74603 219.8217 74.80009 219.4898 74.80009ZM219.4898 70.746655C219.037 70.746655 218.696 70.894467 218.4763 71.19947C218.2523 71.4954 218.1401 71.91697 218.1401 72.45478C218.1401 72.72384 218.167 72.96603 218.2254 73.18134C218.2792 73.39197 218.3598 73.57134 218.4763 73.71947C218.5842 73.86728 218.7276 73.97947 218.8979 74.05572C219.0638 74.13197 219.261 74.17228 219.4898 74.17228C219.9382 74.17228 220.2745 74.01978 220.4942 73.71947C220.7138 73.41884 220.826 72.99759 220.826 72.45478C220.826 71.91697 220.7138 71.4954 220.4942 71.19947C220.2745 70.894467 219.9382 70.746655 219.4898 70.746655ZM226.394 74.72822H225.5959V71.38322C225.4212 71.18603 225.2328 71.029155 225.0312 70.921342C224.8246 70.80478 224.6049 70.75103 224.3674 70.75103C224.0446 70.75103 223.8024 70.845092 223.6499 71.03353C223.4887 71.22197 223.4124 71.48634 223.4124 71.83165V74.72822H222.6143V71.83165C222.6143 71.57603 222.6456 71.33853 222.7084 
71.127592C222.7756 70.916967 222.8699 70.733217 222.9999 70.585092C223.1299 70.43728 223.2868 70.320405 223.4796 70.23978C223.6681 70.15478 223.8878 70.114155 224.1343 70.114155C224.4528 70.114155 224.7306 70.181655 224.9771 70.311655C225.224 70.44603 225.4437 70.62103 225.6409 70.840717L225.7037 70.351967C225.7349 70.23978 225.8068 70.18603 225.919 70.18603H226.394V74.72822ZM229.0524 70.114155C229.2587 70.114155 229.4603 70.150092 229.6531 70.221967C229.8459 70.298217 230.0118 70.396655 230.1509 70.526655L229.9178 70.907905C229.8862 70.95728 229.8459 70.984155 229.8056 70.984155C229.7831 70.984155 229.7518 70.975092 229.7249 70.948217C229.689 70.930405 229.6531 70.90353 229.6084 70.876655C229.5634 70.84978 229.5096 70.822592 229.4468 70.80478C229.3887 70.777905 229.3168 70.768842 229.2315 70.768842C229.0837 70.768842 228.9671 70.813842 228.8731 70.907905C228.7787 71.001967 228.734 71.14103 228.734 71.33384V74.06009H230.0478V74.63853H228.734V76.20353H228.3303C228.2812 76.20353 228.2362 76.19009 228.2093 76.15853C228.1734 76.13165 228.1556 76.09572 228.1465 76.05103L227.9628 74.64759L227.2184 74.55322V74.23509C227.2184 74.17665 227.2362 74.13197 227.2678 74.10509C227.299 74.07353 227.3393 74.06009 227.3887 74.06009H227.9359V71.28009C227.9359 70.90353 228.0299 70.616342 228.2271 70.41478C228.4156 70.212905 228.6937 70.114155 229.0524 70.114155ZM231.675 71.28009V73.47728C231.8406 73.69697 232.02 73.8629 232.2128 73.98384C232.4103 74.10509 232.6388 74.16322 232.8991 74.16322C233.0828 74.16322 233.2444 74.13197 233.3878 74.06915C233.5313 74.00634 233.6522 73.90759 233.751 73.77322C233.8497 73.63415 233.926 73.45915 233.9797 73.24853C234.0291 73.0379 234.056 72.78228 234.056 72.48634C234.056 71.92572 233.9438 71.4954 233.715 71.19478C233.4863 70.894467 233.1635 70.741967 232.7375 70.741967C232.5222 70.741967 232.3294 70.78228 232.1547 70.863217C231.9797 70.943842 231.8228 71.082905 231.675 71.28009ZM231.6344 74.02415L231.5628 74.56228C231.536 74.67447 231.4641 74.72822 
231.3519 74.72822H230.8722V68.6479H231.675V70.656967C231.8272 70.49103 232.0022 70.356342 232.2041 70.266655C232.4013 70.168217 232.6388 70.123217 232.9169 70.123217C233.2219 70.123217 233.4953 70.18603 233.7375 70.302592C233.9841 70.428217 234.1903 70.589467 234.3563 70.80478C234.5266 71.01103 234.6566 71.26228 234.7463 71.54915C234.836 71.84072 234.881 72.15009 234.881 72.48634C234.881 72.8629 234.8406 73.19478 234.7597 73.48634C234.6791 73.77759 234.5625 74.01978 234.4147 74.21259C234.2666 74.40978 234.0828 74.5579 233.8719 74.66103C233.6613 74.75978 233.4235 74.80884 233.1591 74.80884C232.8363 74.80884 232.5494 74.73728 232.2981 74.59384C232.0469 74.45009 231.8272 74.26197 231.6344 74.02415ZM239.6328 74.72822H238.8347V71.38322C238.66 71.18603 238.4716 71.029155 238.27 70.921342C238.0635 70.80478 237.8438 70.75103 237.6063 70.75103C237.2835 70.75103 237.0413 70.845092 236.8888 71.03353C236.7275 71.22197 236.6513 71.48634 236.6513 71.83165V74.72822H235.8531V71.83165C235.8531 71.57603 235.8844 71.33853 235.9472 71.127592C236.0144 70.916967 236.1088 70.733217 236.2388 70.585092C236.3688 70.43728 236.5256 70.320405 236.7185 70.23978C236.9069 70.15478 237.1266 70.114155 237.3731 70.114155C237.6916 70.114155 237.9694 70.181655 238.216 70.311655C238.4628 70.44603 238.6825 70.62103 238.8797 70.840717L238.9425 70.351967C238.9738 70.23978 239.0456 70.18603 239.1578 70.18603H239.6328V74.72822ZM242.2913 70.114155C242.4975 70.114155 242.6991 70.150092 242.8919 70.221967C243.0847 70.298217 243.2506 70.396655 243.3897 70.526655L243.1566 70.907905C243.125 70.95728 243.0847 70.984155 243.0444 70.984155C243.0219 70.984155 242.9906 70.975092 242.9638 70.948217C242.9278 70.930405 242.8919 70.90353 242.8472 70.876655C242.8022 70.84978 242.7485 70.822592 242.6856 70.80478C242.6275 70.777905 242.5556 70.768842 242.4703 70.768842C242.3225 70.768842 242.206 70.813842 242.1119 70.907905C242.0175 71.001967 241.9728 71.14103 241.9728 
71.33384V74.06009H243.2866V74.63853H241.9728V76.20353H241.5691C241.52 76.20353 241.475 76.19009 241.4481 76.15853C241.4122 76.13165 241.3944 76.09572 241.3853 76.05103L241.2016 74.64759L240.4572 74.55322V74.23509C240.4572 74.17665 240.475 74.13197 240.5066 74.10509C240.5378 74.07353 240.5781 74.06009 240.6275 74.06009H241.1747V71.28009C241.1747 70.90353 241.2688 70.616342 241.466 70.41478C241.6544 70.212905 241.9325 70.114155 242.2913 70.114155ZM243.9767 70.679155C243.9767 70.59853 243.9901 70.526655 244.017 70.455092C244.0482 70.39228 244.0885 70.329467 244.1335 70.280092C244.1873 70.230717 244.2457 70.190405 244.3129 70.159155C244.3801 70.13228 244.4563 70.118842 244.5326 70.118842C244.6088 70.118842 244.6851 70.13228 244.7523 70.159155C244.8195 70.190405 244.8823 70.230717 244.9317 70.280092C244.981 70.329467 245.0213 70.39228 245.0526 70.455092C245.0842 70.526655 245.0976 70.59853 245.0976 70.679155C245.0976 70.755405 245.0842 70.831655 245.0526 70.898842C245.0213 70.966342 244.981 71.029155 244.9317 71.078217C244.8823 71.127592 244.8195 71.167905 244.7523 71.19947C244.6851 71.23072 244.6088 71.24415 244.5326 71.24415C244.4563 71.24415 244.3801 71.23072 244.3129 71.19947C244.2457 71.167905 244.1873 71.127592 244.1335 71.078217C244.0885 71.029155 244.0482 70.966342 244.017 70.898842C243.9901 70.831655 243.9767 70.755405 243.9767 70.679155Z'/> </g> </g> -</svg>
\ No newline at end of file +</svg> diff --git a/src/images/es_state.svg b/src/images/es_state.svg index 31c5613..d50e2fd 100644 --- a/src/images/es_state.svg +++ b/src/images/es_state.svg @@ -68,4 +68,4 @@ <path d='M65.589851 156.285522C64.578129 156.668345 62.980472 157.558956 61.925786 158.453489L61.406255 155.512087C62.703132 155.988646 64.507821 156.273806 65.589851 156.285522' fill='#00f'/> </g> </g> -</svg>
\ No newline at end of file +</svg> diff --git a/src/images/graphviz/litmus_ppoca.txt b/src/images/graphviz/litmus_ppoca.txt index f435ec1..fbf6d00 100644 --- a/src/images/graphviz/litmus_ppoca.txt +++ b/src/images/graphviz/litmus_ppoca.txt @@ -32,5 +32,3 @@ eiid3 -> eiid4 [label=<<font color="red">rf</font>>, color="red", fontsize=11, p eiid4 -> eiid5 [label=<<font color="indigo">addr</font><font color="indigo">ppo</font>>, color="indigo", fontsize=11, penwidth="3.000000", arrowsize="0.666700"]; eiid5 -> eiid0 [label=<<font color="#ffa040">fr</font>>, color="#ffa040", fontsize=11, penwidth="3.000000", arrowsize="0.666700"]; } - - diff --git a/src/images/graphviz/litmus_sb_fwd.txt b/src/images/graphviz/litmus_sb_fwd.txt index 428c212..1bcd5ff 100644 --- a/src/images/graphviz/litmus_sb_fwd.txt +++ b/src/images/graphviz/litmus_sb_fwd.txt @@ -30,5 +30,3 @@ eiid3 -> eiid4 [label=<<font color="red">rf</font>>, color="red", fontsize=11, p eiid4 -> eiid5 [label=<<font color="darkgreen">fence</font><font color="indigo">ppo</font>>, color="darkgreen:indigo", fontsize=11, penwidth="3.000000", arrowsize="0.666700"]; eiid5 -> eiid0 [label=<<font color="#ffa040">fr</font>>, color="#ffa040", fontsize=11, penwidth="3.000000", arrowsize="0.666700"]; } - - diff --git a/src/images/riscv-horizontal-color.svg b/src/images/riscv-horizontal-color.svg index be6e6b9..8b6e13b 100644 --- a/src/images/riscv-horizontal-color.svg +++ b/src/images/riscv-horizontal-color.svg @@ -1,36 +1,36 @@ <?xml version="1.0" encoding="utf-8"?> <!-- Generator: Adobe Illustrator 26.4.0, SVG Export Plug-In . 
SVG Version: 6.00 Build 0) --> <svg version="1.1" id="Layer_1" xmlns="http://www.w3.org/2000/svg" xmlns:xlink="http://www.w3.org/1999/xlink" x="0px" y="0px" - viewBox="0 0 1000 175.4" style="enable-background:new 0 0 1000 175.4;" xml:space="preserve"> + viewBox="0 0 1000 175.4" style="enable-background:new 0 0 1000 175.4;" xml:space="preserve"> <style type="text/css"> - .st0{fill:#E6AC2C;} - .st1{fill:#2C356D;} + .st0{fill:#E6AC2C;} + .st1{fill:#2C356D;} </style> <g> - <g> - <path class="st0" d="M91.6,58.2c0,22.9-13.9,43.7-40.9,48.6l38.1,45.1l3.4-4.8L161,49.9V9H49.3C77.7,11.7,91.6,35.3,91.6,58.2z"/> - <path class="st1" d="M13.2,85.9h26.4C59,85.9,68.7,72,68.7,58.2c0-13.9-9.7-27.1-29.1-27.1H3.5v135.3h67.3L13.2,97.1V85.9z - M111,159.5l50.6-70.8v77.7H106L111,159.5z"/> - <rect x="392.7" y="35.3" class="st1" width="22.9" height="107.5"/> - <path class="st1" d="M552,119.3l-119.1-0.1v23.6h120c9,0,16.7-3.5,22.9-9.7s9.7-13.9,9.7-22.9s-3.5-16.6-9.7-22.9 - c-6.2-6.2-13.9-9.7-22.9-9.7L465.6,77c-5.1,0-9.3-4.2-9.4-9.3l0,0l0,0c0-5.2,4.2-9.3,9.4-9.3l119.9-0.1v-23h-120 - c-9,0-16.7,3.5-22.9,9.7c-6.2,6.2-9.7,13.9-9.7,22.9s3.5,16.6,9.7,22.9c6.2,6.2,13.9,9,22.9,9h87.2c5.2,0,9.4,4.2,9.3,9.4l0,0l0,0 - C562.1,114.8,557.6,119.3,552,119.3z"/> - <path class="st1" d="M650,35.3h99.2v22.9H650c-8.3,0-15.2,2.7-21.5,9c-6.2,6.2-9,13.2-9,21.5s2.8,15.2,9,21.5 - c6.3,6.2,13.2,9,21.5,9h99.2v23.6H650c-14.6,0-27.1-5.6-37.5-16s-15.2-22.9-15.2-37.5s4.8-27.1,15.2-37.5 - C623,40.9,635.4,35.3,650,35.3z"/> - <path class="st1" d="M342,78.5l-95.7-0.2V58.9l96.7-0.2c5.2,0,9.5,4.2,9.5,9.4l0,0l0,0C352.4,73.9,347.8,78.5,342,78.5z - M376,142.8l-30.5-42.3c8.3-0.7,15.2-3.5,20.8-9.7c6.2-6.3,9.7-13.9,9.7-22.9s-3.5-16.7-9.7-22.9s-13.9-9.7-22.9-9.7h-120v107.5 - h22.9v-42.3h70.8l30.5,42.3H376z"/> - <polyline class="st0" points="863.7,142.8 800.8,35.3 827.7,35.3 876.8,120.6 926,35.3 952.5,35.3 890,142.8 "/> - <rect x="763.1" y="79" class="st0" width="45.1" height="20.8"/> - </g> - <g> - <path class="st1" 
d="M996.5,52.5c0,9.4-7.3,16.7-16.9,16.7c-9.5,0-17-7.3-17-16.7c0-9.2,7.5-16.5,17-16.5 - C989.1,36,996.5,43.3,996.5,52.5z M966.8,52.5c0,7.3,5.4,13.2,12.9,13.2c7.2,0,12.6-5.8,12.6-13.1s-5.3-13.3-12.7-13.3 - S966.8,45.2,966.8,52.5z M977,61.1h-3.8V44.6c1.5-0.3,3.6-0.5,6.3-0.5c3.1,0,4.5,0.5,5.7,1.2c0.9,0.7,1.6,2,1.6,3.6 - c0,1.8-1.4,3.2-3.4,3.8V53c1.6,0.6,2.5,1.8,3,4c0.5,2.5,0.8,3.5,1.2,4.1h-4.1c-0.5-0.6-0.8-2.1-1.3-4c-0.3-1.8-1.3-2.6-3.4-2.6 - H977V61.1L977,61.1z M977,51.8h1.8c2.1,0,3.8-0.7,3.8-2.4c0-1.5-1.1-2.5-3.5-2.5c-1,0-1.7,0.1-2.1,0.2V51.8z"/> - </g> + <g> + <path class="st0" d="M91.6,58.2c0,22.9-13.9,43.7-40.9,48.6l38.1,45.1l3.4-4.8L161,49.9V9H49.3C77.7,11.7,91.6,35.3,91.6,58.2z"/> + <path class="st1" d="M13.2,85.9h26.4C59,85.9,68.7,72,68.7,58.2c0-13.9-9.7-27.1-29.1-27.1H3.5v135.3h67.3L13.2,97.1V85.9z + M111,159.5l50.6-70.8v77.7H106L111,159.5z"/> + <rect x="392.7" y="35.3" class="st1" width="22.9" height="107.5"/> + <path class="st1" d="M552,119.3l-119.1-0.1v23.6h120c9,0,16.7-3.5,22.9-9.7s9.7-13.9,9.7-22.9s-3.5-16.6-9.7-22.9 + c-6.2-6.2-13.9-9.7-22.9-9.7L465.6,77c-5.1,0-9.3-4.2-9.4-9.3l0,0l0,0c0-5.2,4.2-9.3,9.4-9.3l119.9-0.1v-23h-120 + c-9,0-16.7,3.5-22.9,9.7c-6.2,6.2-9.7,13.9-9.7,22.9s3.5,16.6,9.7,22.9c6.2,6.2,13.9,9,22.9,9h87.2c5.2,0,9.4,4.2,9.3,9.4l0,0l0,0 + C562.1,114.8,557.6,119.3,552,119.3z"/> + <path class="st1" d="M650,35.3h99.2v22.9H650c-8.3,0-15.2,2.7-21.5,9c-6.2,6.2-9,13.2-9,21.5s2.8,15.2,9,21.5 + c6.3,6.2,13.2,9,21.5,9h99.2v23.6H650c-14.6,0-27.1-5.6-37.5-16s-15.2-22.9-15.2-37.5s4.8-27.1,15.2-37.5 + C623,40.9,635.4,35.3,650,35.3z"/> + <path class="st1" d="M342,78.5l-95.7-0.2V58.9l96.7-0.2c5.2,0,9.5,4.2,9.5,9.4l0,0l0,0C352.4,73.9,347.8,78.5,342,78.5z + M376,142.8l-30.5-42.3c8.3-0.7,15.2-3.5,20.8-9.7c6.2-6.3,9.7-13.9,9.7-22.9s-3.5-16.7-9.7-22.9s-13.9-9.7-22.9-9.7h-120v107.5 + h22.9v-42.3h70.8l30.5,42.3H376z"/> + <polyline class="st0" points="863.7,142.8 800.8,35.3 827.7,35.3 876.8,120.6 926,35.3 952.5,35.3 890,142.8 "/> + <rect x="763.1" y="79" 
class="st0" width="45.1" height="20.8"/> + </g> + <g> + <path class="st1" d="M996.5,52.5c0,9.4-7.3,16.7-16.9,16.7c-9.5,0-17-7.3-17-16.7c0-9.2,7.5-16.5,17-16.5 + C989.1,36,996.5,43.3,996.5,52.5z M966.8,52.5c0,7.3,5.4,13.2,12.9,13.2c7.2,0,12.6-5.8,12.6-13.1s-5.3-13.3-12.7-13.3 + S966.8,45.2,966.8,52.5z M977,61.1h-3.8V44.6c1.5-0.3,3.6-0.5,6.3-0.5c3.1,0,4.5,0.5,5.7,1.2c0.9,0.7,1.6,2,1.6,3.6 + c0,1.8-1.4,3.2-3.4,3.8V53c1.6,0.6,2.5,1.8,3,4c0.5,2.5,0.8,3.5,1.2,4.1h-4.1c-0.5-0.6-0.8-2.1-1.3-4c-0.3-1.8-1.3-2.6-3.4-2.6 + H977V61.1L977,61.1z M977,51.8h1.8c2.1,0,3.8-0.7,3.8-2.4c0-1.5-1.1-2.5-3.5-2.5c-1,0-1.7,0.1-2.1,0.2V51.8z"/> + </g> </g> </svg> diff --git a/src/images/wavedrom/atomic-mem.adoc b/src/images/wavedrom/atomic-mem.adoc deleted file mode 100644 index ef66028..0000000 --- a/src/images/wavedrom/atomic-mem.adoc +++ /dev/null @@ -1,15 +0,0 @@ -//## 9.4 Atomic Memory Operations - -[wavedrom, ,svg] -.... -{reg: [ - {bits: 7, name: 'opcode', type: 8, attr: ['7','AMO','AMO','AMO','AMO','AMO','AMO','AMO']}, - {bits: 5, name: 'rd', type: 2, attr: ['5','dest','dest','dest','dest','dest','dest','dest']}, - {bits: 3, name: 'funct3', type: 8, attr: ['3','width','width','width','width','width','width','width']}, - {bits: 5, name: 'rs1', type: 4, attr: ['5','addr','addr','addr','addr','addr','addr','addr']}, - {bits: 5, name: 'rs2', type: 4, attr: ['5','src','src','src','src','src','src','src']}, - {bits: 1, name: 'rl', type: 8, attr: ['1']}, - {bits: 1, name: 'aq', type: 8, attr: ['1']}, - {bits: 6, name: 'funct5', type: 8, attr: ['5','AMOSWAP.W/D', 'AMOADD.W/D', 'AMOAND.W/D', 'AMOOR.W/D', 'AMOXOR.W/D', 'AMOMAX[U].W/D','AMOMIN[U].W/D']}, -], config: {bits: 32}} -.... diff --git a/src/images/wavedrom/atomic-mem.edn b/src/images/wavedrom/atomic-mem.edn new file mode 100644 index 0000000..1e95eb4 --- /dev/null +++ b/src/images/wavedrom/atomic-mem.edn @@ -0,0 +1,15 @@ +//## 9.4 Atomic Memory Operations + +[wavedrom, ,svg] +.... 
+{reg: [ + {bits: 7, name: 'opcode', attr: ['7','AMO','AMO','AMO','AMO','AMO','AMO','AMO']}, + {bits: 5, name: 'rd', attr: ['5','dest','dest','dest','dest','dest','dest','dest']}, + {bits: 3, name: 'funct3', attr: ['3','width','width','width','width','width','width','width']}, + {bits: 5, name: 'rs1', attr: ['5','addr','addr','addr','addr','addr','addr','addr']}, + {bits: 5, name: 'rs2', attr: ['5','src','src','src','src','src','src','src']}, + {bits: 1, name: 'rl', attr: ['1']}, + {bits: 1, name: 'aq', attr: ['1']}, + {bits: 6, name: 'funct5', attr: ['5','AMOSWAP.W/D', 'AMOADD.W/D', 'AMOAND.W/D', 'AMOOR.W/D', 'AMOXOR.W/D', 'AMOMAX[U].W/D','AMOMIN[U].W/D']}, +], config: {bits: 32}} +.... diff --git a/src/images/wavedrom/b-immediate.edn b/src/images/wavedrom/b-immediate.edn new file mode 100644 index 0000000..138b0aa --- /dev/null +++ b/src/images/wavedrom/b-immediate.edn @@ -0,0 +1,12 @@ +//#### B-immediate + +[wavedrom, ,svg] +.... +{reg: [ + {bits: 1, name: '0'}, + {bits: 4, name: 'inst[11:8]'}, + {bits: 6, name: 'inst[30:25]'}, + {bits: 1, name: '[7]'}, + {bits: 20, name: '— inst[31] —'}, +], config:{fontsize: 12, label:{right: 'B-immediate'}}} +.... diff --git a/src/images/wavedrom/c-andi.adoc b/src/images/wavedrom/c-andi.adoc deleted file mode 100644 index 5eca644..0000000 --- a/src/images/wavedrom/c-andi.adoc +++ /dev/null @@ -1,13 +0,0 @@ -//c-andi.adoc - -[wavedrom, ,svg] -.... -{reg: [ - {bits: 2, name: 'op', type: 5, attr: ['2','C1'],}, - {bits: 5, name: 'imm[4:0]', type: 5, attr: ['5','imm[4:0]']}, - {bits: 3, name: 'rd′/rs1′', type: 5, attr: ['3','dest'],}, - {bits: 2, name: 'funct2', type: 5, attr: ['2','C.ANDI'],}, - {bits: 1, name: 'imm[5]', type: 1, attr: ['1','imm[5]'],}, - {bits: 3, name: 'funct3', type: 5, attr: ['3','C.ANDI'],}, -]} -....
\ No newline at end of file diff --git a/src/images/wavedrom/c-andi.edn b/src/images/wavedrom/c-andi.edn new file mode 100644 index 0000000..e0ab053 --- /dev/null +++ b/src/images/wavedrom/c-andi.edn @@ -0,0 +1,13 @@ +//c-andi.adoc + +[wavedrom, ,svg] +.... +{reg: [ + {bits: 2, name: 'op', attr: ['2','C1'],}, + {bits: 5, name: 'imm[4:0]', attr: ['5','imm[4:0]']}, + {bits: 3, name: 'rd′/rs1′', attr: ['3','dest'],}, + {bits: 2, name: 'funct2', attr: ['2','C.ANDI'],}, + {bits: 1, name: 'imm[5]', attr: ['1','imm[5]'],}, + {bits: 3, name: 'funct3', attr: ['3','C.ANDI'],}, +]} +.... diff --git a/src/images/wavedrom/c-breakpoint-instr.adoc b/src/images/wavedrom/c-breakpoint-instr.adoc deleted file mode 100644 index 99ae2d5..0000000 --- a/src/images/wavedrom/c-breakpoint-instr.adoc +++ /dev/null @@ -1,11 +0,0 @@ -// - -[wavedrom, ,svg] - -.... -{reg: [ - {bits: 2, name: 'op', type: 8, attr: ['2','C2'],}, - {bits: 10, name: '0', type: 4, attr: ['10','0'],}, - {bits: 4, name: 'funct4', type: 8, attr: ['4','C.EBREAK'],}, -], config: {bits: 16}} -....
\ No newline at end of file diff --git a/src/images/wavedrom/c-breakpoint-instr.edn b/src/images/wavedrom/c-breakpoint-instr.edn new file mode 100644 index 0000000..25e245e --- /dev/null +++ b/src/images/wavedrom/c-breakpoint-instr.edn @@ -0,0 +1,11 @@ +// + +[wavedrom, ,svg] + +.... +{reg: [ + {bits: 2, name: 'op', attr: ['2','C2'],}, + {bits: 10, name: '0', attr: ['10','0'],}, + {bits: 4, name: 'funct4', attr: ['4','C.EBREAK'],}, +], config: {bits: 16}} +.... diff --git a/src/images/wavedrom/c-cb-format-ls.adoc b/src/images/wavedrom/c-cb-format-ls.adoc deleted file mode 100644 index daf2248..0000000 --- a/src/images/wavedrom/c-cb-format-ls.adoc +++ /dev/null @@ -1,13 +0,0 @@ -//c-cb-format-ls - -[wavedrom, ,svg] -.... -{reg: [ - {bits: 2, name: 'op', type: 8, attr: ['2','C1', 'C1']}, - {bits: 5, name: 'imm', type: 3, attr: ['5','offset[7:6|2:1|5]', 'offset[7:6|2:1|5]']}, - {bits: 3, name: 'rs1′', type: 4, attr: ['3','src', 'src']}, - {bits: 3, name: 'imm', type: 3, attr: ['3','offset[8|4:3]', 'offset[8|4:3]'],}, - {bits: 3, name: 'funct3', type: 8, attr: ['3','C.BEQZ', 'C.BNEZ'],}, -], config: {bits: 16}} -.... - diff --git a/src/images/wavedrom/c-cb-format-ls.edn b/src/images/wavedrom/c-cb-format-ls.edn new file mode 100644 index 0000000..3ffd7a0 --- /dev/null +++ b/src/images/wavedrom/c-cb-format-ls.edn @@ -0,0 +1,12 @@ +//c-cb-format-ls + +[wavedrom, ,svg] +.... +{reg: [ + {bits: 2, name: 'op', attr: ['2','C1', 'C1']}, + {bits: 5, name: 'imm', attr: ['5','offset[7:6|2:1|5]', 'offset[7:6|2:1|5]']}, + {bits: 3, name: 'rs1′', attr: ['3','src', 'src']}, + {bits: 3, name: 'imm', attr: ['3','offset[8|4:3]', 'offset[8|4:3]'],}, + {bits: 3, name: 'funct3', attr: ['3','C.BEQZ', 'C.BNEZ'],}, +], config: {bits: 16}} +.... diff --git a/src/images/wavedrom/c-ci.adoc b/src/images/wavedrom/c-ci.adoc deleted file mode 100644 index 7dae51e..0000000 --- a/src/images/wavedrom/c-ci.adoc +++ /dev/null @@ -1,13 +0,0 @@ -// - -[wavedrom, ,svg] -.... 
-{reg: [ - {bits: 2, name: 'op', type: 3, attr: ['2', 'C2']}, - {bits: 5, name: 'shamt[4:0]', type: 1, attr: ['5', 'shamt[4:0]']}, - {bits: 5, name: 'rd/rs1', type: 5, attr: ['5', 'dest != 0']}, - {bits: 1, name: 'shamt[5]', type: 5, attr: ['1', 'shamt[5]']}, - {bits: 3, name: 'funct3', type: 5, attr: ['3', 'C.SLLI']}, -]} -.... - diff --git a/src/images/wavedrom/c-ci.edn b/src/images/wavedrom/c-ci.edn new file mode 100644 index 0000000..4f36a63 --- /dev/null +++ b/src/images/wavedrom/c-ci.edn @@ -0,0 +1,12 @@ +// + +[wavedrom, ,svg] +.... +{reg: [ + {bits: 2, name: 'op', attr: ['2', 'C2']}, + {bits: 5, name: 'shamt[4:0]', attr: ['5', 'shamt[4:0]']}, + {bits: 5, name: 'rd/rs1', attr: ['5', 'dest != 0']}, + {bits: 1, name: 'shamt[5]', attr: ['1', 'shamt[5]']}, + {bits: 3, name: 'funct3', attr: ['3', 'C.SLLI']}, +]} +.... diff --git a/src/images/wavedrom/c-ciw.adoc b/src/images/wavedrom/c-ciw.adoc deleted file mode 100644 index 111b272..0000000 --- a/src/images/wavedrom/c-ciw.adoc +++ /dev/null @@ -1,12 +0,0 @@ -//c-ciw.adoc - -[wavedrom, ,svg] -.... -{reg: [ - {bits: 2, name: 'op', type: 3, attr: ['2','C0'],}, - {bits: 3, name: 'rd′', type: 5, attr: ['3','dest'],}, - {bits: 8, name: 'imm', type: 5, attr: ['8','nzuimm[5:4|9:6|2|3]']}, - {bits: 3, name: 'funct3', type: 5, attr: ['3','C.ADDI4SPN']}, -], config: {bits: 16}} -.... - diff --git a/src/images/wavedrom/c-ciw.edn b/src/images/wavedrom/c-ciw.edn new file mode 100644 index 0000000..5486c78 --- /dev/null +++ b/src/images/wavedrom/c-ciw.edn @@ -0,0 +1,11 @@ +//c-ciw.adoc + +[wavedrom, ,svg] +.... +{reg: [ + {bits: 2, name: 'op', attr: ['2','C0'],}, + {bits: 3, name: 'rd′', attr: ['3','dest'],}, + {bits: 8, name: 'imm', attr: ['8','nzuimm[5:4|9:6|2|3]']}, + {bits: 3, name: 'funct3', attr: ['3','C.ADDI4SPN']}, +], config: {bits: 16}} +.... 
diff --git a/src/images/wavedrom/c-cj-format-ls.adoc b/src/images/wavedrom/c-cj-format-ls.adoc deleted file mode 100644 index 1ecbd35..0000000 --- a/src/images/wavedrom/c-cj-format-ls.adoc +++ /dev/null @@ -1,23 +0,0 @@ -//c-cj-format-ls - -//[wavedrom, ,svg] -//.... -//{reg: [ -// {bits: 2, name: 'op', type: 4, attr: ['2','CI','CI']}, -// {bits: 10, name: 'imm', type: 2, }, -// {bits: 4, name: 'funct3' type: 4, attr:['3','CJ','CJAL']}, -//] config: {bits: 16}} -//.... - - -[wavedrom, ,svg] -.... -{reg: [ - {bits: 2, name: 'op', type: 8, attr: ['2','C1','C1']}, - {bits: 11, name: 'imm', type: 2, attr: ['11','offset[11|4|9:8|10|6|7|3:1|5]','offset[11|4|9:8|10|6|7|3:1|5]']}, - {bits: 3, name: 'funct3', type: 8, attr: ['3','C.J','C.JAL']}, -], config: {bits: 16}} -.... - - - diff --git a/src/images/wavedrom/c-cj-format-ls.edn b/src/images/wavedrom/c-cj-format-ls.edn new file mode 100644 index 0000000..b6e3a98 --- /dev/null +++ b/src/images/wavedrom/c-cj-format-ls.edn @@ -0,0 +1,20 @@ +//c-cj-format-ls + +//[wavedrom, ,svg] +//.... +//{reg: [ +// {bits: 2, name: 'op', attr: ['2','CI','CI']}, +// {bits: 10, name: 'imm'}, +// {bits: 4, name: 'funct3' attr:['3','CJ','CJAL']}, +//] config: {bits: 16}} +//.... + + +[wavedrom, ,svg] +.... +{reg: [ + {bits: 2, name: 'op', attr: ['2','C1','C1']}, + {bits: 11, name: 'imm', attr: ['11','offset[11|4|9:8|10|6|7|3:1|5]','offset[11|4|9:8|10|6|7|3:1|5]']}, + {bits: 3, name: 'funct3', attr: ['3','C.J','C.JAL']}, +], config: {bits: 16}} +.... diff --git a/src/images/wavedrom/c-cr-format-ls.adoc b/src/images/wavedrom/c-cr-format-ls.adoc deleted file mode 100644 index 0329261..0000000 --- a/src/images/wavedrom/c-cr-format-ls.adoc +++ /dev/null @@ -1,12 +0,0 @@ -//These instructions use the CR format. - -[wavedrom, ,svg] -.... 
-{reg: [ - {bits: 2, name: 'op', type: 8, attr: ['2','C2', 'C2']}, - {bits: 5, name: 'rs2', type: 4, attr: ['5','0', '0']}, - {bits: 5, name: 'rs1', type: 4, attr: ['5','src≠0', 'src≠0']}, - {bits: 4, name: 'funct4', type: 8, attr: ['4','C.JR', 'C.JALR']}, -], config: {bits: 16}} -.... - diff --git a/src/images/wavedrom/c-cr-format-ls.edn b/src/images/wavedrom/c-cr-format-ls.edn new file mode 100644 index 0000000..a584426 --- /dev/null +++ b/src/images/wavedrom/c-cr-format-ls.edn @@ -0,0 +1,11 @@ +//These instructions use the CR format. + +[wavedrom, ,svg] +.... +{reg: [ + {bits: 2, name: 'op', attr: ['2','C2', 'C2']}, + {bits: 5, name: 'rs2', attr: ['5','0', '0']}, + {bits: 5, name: 'rs1', attr: ['5','src≠0', 'src≠0']}, + {bits: 4, name: 'funct4', attr: ['4','C.JR', 'C.JALR']}, +], config: {bits: 16}} +.... diff --git a/src/images/wavedrom/c-cs-format-ls.adoc b/src/images/wavedrom/c-cs-format-ls.adoc deleted file mode 100644 index 1f759a7..0000000 --- a/src/images/wavedrom/c-cs-format-ls.adoc +++ /dev/null @@ -1,16 +0,0 @@ -//## 16.X Load and Store Instructions -//### c-cs-format-ls - -[wavedrom, ,svg] -.... -{reg: [ - {bits: 2, name: 'op', type: 8, attr: ['2', 'C0','C0','C0','C0','C0']}, - {bits: 3, name: 'rs2ʹ', type: 3, attr: ['3', 'src','src','src','src','src']}, - {bits: 2, name: 'imm', type: 2, attr: ['2', 'offset[2|6]','offset[7:6]','offset[7:6]','offset[2|6]','offset[7:6]']}, - {bits: 3, name: 'rs1ʹ', type: 3, attr: ['3', 'base','base','base','base','base']}, - {bits: 3, name: 'imm', type: 3, attr: ['3', 'offset[5:3]','offset[5:3]','offset[5|4|8]','offset[5:3]','offset[5:3]']}, - {bits: 3, name: 'funct3', type: 8, attr: ['3', 'C.SW','C.SD','C.SQ','C.FSW','C.FSD']}, -], config: {bits: 16}} -.... 
- - diff --git a/src/images/wavedrom/c-cs-format-ls.edn b/src/images/wavedrom/c-cs-format-ls.edn new file mode 100644 index 0000000..e9f8726 --- /dev/null +++ b/src/images/wavedrom/c-cs-format-ls.edn @@ -0,0 +1,14 @@ +//## 16.X Load and Store Instructions +//### c-cs-format-ls + +[wavedrom, ,svg] +.... +{reg: [ + {bits: 2, name: 'op', attr: ['2', 'C0','C0','C0','C0','C0']}, + {bits: 3, name: 'rs2ʹ', attr: ['3', 'src','src','src','src','src']}, + {bits: 2, name: 'imm', attr: ['2', 'offset[2|6]','offset[7:6]','offset[7:6]','offset[2|6]','offset[7:6]']}, + {bits: 3, name: 'rs1ʹ', attr: ['3', 'base','base','base','base','base']}, + {bits: 3, name: 'imm', attr: ['3', 'offset[5:3]','offset[5:3]','offset[5|4|8]','offset[5:3]','offset[5:3]']}, + {bits: 3, name: 'funct3', attr: ['3', 'C.SW','C.SD','C.SQ','C.FSW','C.FSD']}, +], config: {bits: 16}} +.... diff --git a/src/images/wavedrom/c-def-illegal-inst.adoc b/src/images/wavedrom/c-def-illegal-inst.adoc deleted file mode 100644 index add949d..0000000 --- a/src/images/wavedrom/c-def-illegal-inst.adoc +++ /dev/null @@ -1,13 +0,0 @@ -// - -[wavedrom, ,svg] - -.... -{reg: [ - {bits: 2, name: '0', type: 8, attr: ['2','0'],}, - {bits: 5, name: '0', type: 4, attr: ['5','0'],}, - {bits: 5, name: '0', type: 8, attr: ['5','0'],}, - {bits: 1, name: '0', type: 8, attr: ['1','0'],}, - {bits: 3, name: '0', type: 8, attr: ['3','0'],}, -], config: {bits: 16}} -....
\ No newline at end of file diff --git a/src/images/wavedrom/c-def-illegal-inst.edn b/src/images/wavedrom/c-def-illegal-inst.edn new file mode 100644 index 0000000..5d05eb7 --- /dev/null +++ b/src/images/wavedrom/c-def-illegal-inst.edn @@ -0,0 +1,13 @@ +// + +[wavedrom, ,svg] + +.... +{reg: [ + {bits: 2, name: '0', attr: ['2','0'],}, + {bits: 5, name: '0', attr: ['5','0'],}, + {bits: 5, name: '0', attr: ['5','0'],}, + {bits: 1, name: '0', attr: ['1','0'],}, + {bits: 3, name: '0', attr: ['3','0'],}, +], config: {bits: 16}} +.... diff --git a/src/images/wavedrom/c-int-reg-immed.adoc b/src/images/wavedrom/c-int-reg-immed.adoc deleted file mode 100644 index 45168d7..0000000 --- a/src/images/wavedrom/c-int-reg-immed.adoc +++ /dev/null @@ -1,12 +0,0 @@ -//c-int-reg-immed.adoc - -[wavedrom, ,svg] -.... -{reg: [ - {bits: 2, name: 'op', type: 3, attr: ['2','C1', 'C1', 'C1']}, - {bits: 5, name: 'imm[4:]', type: 1, attr: ['5','nzimm[4:0]', 'imm[4:0]', 'nzimm[4|6|8:7|5]']}, - {bits: 5, name: 'rd/rs1', type: 5, attr: ['5','dest != 0', 'dest != 0', '2']}, - {bits: 1, name: 'imm[5]', type: 5, attr: ['1','nzimm[5]', 'imm[5]', 'nzimm[9]']}, - {bits: 3, name: 'funct3', type: 5, attr: ['3','C.ADDI', 'C.ADDIW', 'C.ADDI16SP']}, -], config: {bits: 16}} -.... diff --git a/src/images/wavedrom/c-int-reg-immed.edn b/src/images/wavedrom/c-int-reg-immed.edn new file mode 100644 index 0000000..f509065 --- /dev/null +++ b/src/images/wavedrom/c-int-reg-immed.edn @@ -0,0 +1,12 @@ +//c-int-reg-immed.adoc + +[wavedrom, ,svg] +.... +{reg: [ + {bits: 2, name: 'op', attr: ['2','C1', 'C1', 'C1']}, + {bits: 5, name: 'imm[4:]', attr: ['5','nzimm[4:0]', 'imm[4:0]', 'nzimm[4|6|8:7|5]']}, + {bits: 5, name: 'rd/rs1', attr: ['5','dest != 0', 'dest != 0', '2']}, + {bits: 1, name: 'imm[5]', attr: ['1','nzimm[5]', 'imm[5]', 'nzimm[9]']}, + {bits: 3, name: 'funct3', attr: ['3','C.ADDI', 'C.ADDIW', 'C.ADDI16SP']}, +], config: {bits: 16}} +.... 
diff --git a/src/images/wavedrom/c-int-reg-to-reg-ca-format.adoc b/src/images/wavedrom/c-int-reg-to-reg-ca-format.adoc deleted file mode 100644 index b2cf982..0000000 --- a/src/images/wavedrom/c-int-reg-to-reg-ca-format.adoc +++ /dev/null @@ -1,13 +0,0 @@ -// - -[wavedrom, ,svg] - -.... -{reg: [ - {bits: 2, name: 'op', type: 8, attr: ['2', 'C1', 'C1', 'C1', 'C1', 'C1', 'C1'],}, - {bits: 3, name: 'rs2′', type: 4, attr: ['3', 'src', 'src', 'src', 'src', 'src', 'src'],}, - {bits: 2, name: 'funct2', type: 8, attr: ['2', 'C.AND', 'C.OR', 'C.XOR', 'C.SUB', 'C.ADDW', 'C.SUBW'],}, - {bits: 3, name: 'rd′/rs1′', type: 7, attr: ['3', 'dest', 'dest', 'dest', 'dest', 'dest', 'dest'],}, - {bits: 6, name: 'funct6', type: 8, attr: ['6', 'C.AND', 'C.OR', 'C.XOR', 'C.SUB', 'C.ADDW', 'C.SUBW'],}, -]} -....
\ No newline at end of file diff --git a/src/images/wavedrom/c-int-reg-to-reg-ca-format.edn b/src/images/wavedrom/c-int-reg-to-reg-ca-format.edn new file mode 100644 index 0000000..33749e6 --- /dev/null +++ b/src/images/wavedrom/c-int-reg-to-reg-ca-format.edn @@ -0,0 +1,13 @@ +// + +[wavedrom, ,svg] + +.... +{reg: [ + {bits: 2, name: 'op', attr: ['2', 'C1', 'C1', 'C1', 'C1', 'C1', 'C1'],}, + {bits: 3, name: 'rs2′', attr: ['3', 'src', 'src', 'src', 'src', 'src', 'src'],}, + {bits: 2, name: 'funct2', attr: ['2', 'C.AND', 'C.OR', 'C.XOR', 'C.SUB', 'C.ADDW', 'C.SUBW'],}, + {bits: 3, name: 'rd′/rs1′', attr: ['3', 'dest', 'dest', 'dest', 'dest', 'dest', 'dest'],}, + {bits: 6, name: 'funct6', attr: ['6', 'C.AND', 'C.OR', 'C.XOR', 'C.SUB', 'C.ADDW', 'C.SUBW'],}, +]} +.... diff --git a/src/images/wavedrom/c-int-reg-to-reg-cr-format.adoc b/src/images/wavedrom/c-int-reg-to-reg-cr-format.adoc deleted file mode 100644 index 5e607f8..0000000 --- a/src/images/wavedrom/c-int-reg-to-reg-cr-format.adoc +++ /dev/null @@ -1,12 +0,0 @@ -// - -[wavedrom, ,svg] - -.... -{reg: [ - {bits: 2, name: 'op', type: 8, attr: ['2', 'C2', 'C2'],}, - {bits: 5, name: 'rs2', type: 4, attr: ['5', 'src≠0', 'src≠0'],}, - {bits: 5, name: 'rd/rs1', type: 7, attr: ['5', 'dest≠0', 'dest≠0'],}, - {bits: 4, name: 'funct4', type: 8, attr: ['4', 'C.MV', 'C.ADD'],}, -], config: {bits: 16}} -....
\ No newline at end of file diff --git a/src/images/wavedrom/c-int-reg-to-reg-cr-format.edn b/src/images/wavedrom/c-int-reg-to-reg-cr-format.edn new file mode 100644 index 0000000..3cb28c7 --- /dev/null +++ b/src/images/wavedrom/c-int-reg-to-reg-cr-format.edn @@ -0,0 +1,12 @@ +// + +[wavedrom, ,svg] + +.... +{reg: [ + {bits: 2, name: 'op', attr: ['2', 'C2', 'C2'],}, + {bits: 5, name: 'rs2', attr: ['5', 'src≠0', 'src≠0'],}, + {bits: 5, name: 'rd/rs1', attr: ['5', 'dest≠0', 'dest≠0'],}, + {bits: 4, name: 'funct4', attr: ['4', 'C.MV', 'C.ADD'],}, +], config: {bits: 16}} +.... diff --git a/src/images/wavedrom/c-integer-const-gen.adoc b/src/images/wavedrom/c-integer-const-gen.adoc deleted file mode 100644 index 732961b..0000000 --- a/src/images/wavedrom/c-integer-const-gen.adoc +++ /dev/null @@ -1,13 +0,0 @@ -//c-integer-const-gen - -[wavedrom, ,svg] -.... -{reg: [ - {bits: 2, name: 'op', type: 3, attr: ['2','C1', 'C1']}, - {bits: 5, name: 'imm[4:0]', type: 1, attr: ['5','imm[4:0]','imm[16:12]']}, - {bits: 5, name: 'rd', type: 5, attr: ['5','dest != 0', 'dest != {0, 2}']}, - {bits: 1, name: 'imm[5]', type: 5, attr: ['1','imm[5]', 'nzimm[17]'],}, - {bits: 3, name: 'funct3', type: 5, attr: ['3','C.LI', 'C.LUI'],}, -], config: {bits: 16}} -.... - diff --git a/src/images/wavedrom/c-integer-const-gen.edn b/src/images/wavedrom/c-integer-const-gen.edn new file mode 100644 index 0000000..159e462 --- /dev/null +++ b/src/images/wavedrom/c-integer-const-gen.edn @@ -0,0 +1,12 @@ +//c-integer-const-gen + +[wavedrom, ,svg] +.... +{reg: [ + {bits: 2, name: 'op', attr: ['2','C1', 'C1']}, + {bits: 5, name: 'imm[4:0]', attr: ['5','imm[4:0]','nzimm[16:12]']}, + {bits: 5, name: 'rd', attr: ['5','dest != 0', 'dest != {0, 2}']}, + {bits: 1, name: 'imm[5]', attr: ['1','imm[5]', 'nzimm[17]'],}, + {bits: 3, name: 'funct3', attr: ['3','C.LI', 'C.LUI'],}, +], config: {bits: 16}} +.... 
diff --git a/src/images/wavedrom/c-mop.adoc b/src/images/wavedrom/c-mop.adoc deleted file mode 100644 index 0aee8e4..0000000 --- a/src/images/wavedrom/c-mop.adoc +++ /dev/null @@ -1,12 +0,0 @@ -[wavedrom, ,svg] -.... -{reg:[ - { bits: 2, name: 0x1, type: 8 }, - { bits: 5, name: 0x0 }, - { bits: 1, name: 0x1, type: 4 }, - { bits: 3, name: 'n[3:1]', type: 4 }, - { bits: 1, name: 0x0, type: 4 }, - { bits: 1, name: 0x0 }, - { bits: 3, name: 0x3 }, -]} -.... diff --git a/src/images/wavedrom/c-mop.edn b/src/images/wavedrom/c-mop.edn new file mode 100644 index 0000000..9b850a5 --- /dev/null +++ b/src/images/wavedrom/c-mop.edn @@ -0,0 +1,12 @@ +[wavedrom, ,svg] +.... +{reg:[ + { bits: 2, name: 0x1 }, + { bits: 5, name: 0x0 }, + { bits: 1, name: 0x1 }, + { bits: 3, name: 'n[3:1]' }, + { bits: 1, name: 0x0 }, + { bits: 1, name: 0x0 }, + { bits: 3, name: 0x3 }, +]} +.... diff --git a/src/images/wavedrom/c-nop-instr.adoc b/src/images/wavedrom/c-nop-instr.adoc deleted file mode 100644 index e3fada1..0000000 --- a/src/images/wavedrom/c-nop-instr.adoc +++ /dev/null @@ -1,13 +0,0 @@ -// - -[wavedrom, ,svg] - -.... -{reg: [ - {bits: 2, name: 'op', type: 8, attr: ['2','C1'],}, - {bits: 5, name: 'imm[4:0]', type: 4, attr: ['5','0'],}, - {bits: 5, name: 'rd/rs1', type: 8, attr: ['5','0'],}, - {bits: 1, name: 'imm[5]', type: 8, attr: ['1','0'],}, - {bits: 3, name: 'funct3', type: 8, attr: ['3','C.NOP'],}, -]} -....
\ No newline at end of file diff --git a/src/images/wavedrom/c-nop-instr.edn b/src/images/wavedrom/c-nop-instr.edn new file mode 100644 index 0000000..d6770dc --- /dev/null +++ b/src/images/wavedrom/c-nop-instr.edn @@ -0,0 +1,13 @@ +// + +[wavedrom, ,svg] + +.... +{reg: [ + {bits: 2, name: 'op', attr: ['2','C1'],}, + {bits: 5, name: 'imm[4:0]', attr: ['5','0'],}, + {bits: 5, name: 'rd/rs1', attr: ['5','0'],}, + {bits: 1, name: 'imm[5]', attr: ['1','0'],}, + {bits: 3, name: 'funct3', attr: ['3','C.NOP'],}, +]} +.... diff --git a/src/images/wavedrom/c-sp-load-store-css.adoc b/src/images/wavedrom/c-sp-load-store-css.adoc deleted file mode 100644 index 2cafcd8..0000000 --- a/src/images/wavedrom/c-sp-load-store-css.adoc +++ /dev/null @@ -1,14 +0,0 @@ -//c-sp load and store, css format--is this correct? - -[wavedrom, ,svg] -.... -{reg: [ - {bits: 2, name: 'op', type: 8, attr: ['2','C2','C2','C2','C2','C2']}, - {bits: 5, name: 'rs2', type: 4, attr: ['5','src', 'src', 'src', 'src', 'src']}, - {bits: 6, name: 'imm', type: 3, attr: ['6','offset[5:2|7:6]', 'offset[5:3|8:6]', 'offset[5:4|9:6]', 'offset[5:2|7:6]','offset[5:3|8:6]']}, - {bits: 3, name: 'funct3', type: 8, attr: ['3','C.SWSP', 'C.SDSP', 'C.SQSP', 'C.FSWSP', 'C.FSDSP']}, -], config: {bits: 16}} -.... - - - diff --git a/src/images/wavedrom/c-sp-load-store-css.edn b/src/images/wavedrom/c-sp-load-store-css.edn new file mode 100644 index 0000000..3bca126 --- /dev/null +++ b/src/images/wavedrom/c-sp-load-store-css.edn @@ -0,0 +1,11 @@ +//c-sp load and store, css format--is this correct? + +[wavedrom, ,svg] +.... +{reg: [ + {bits: 2, name: 'op', attr: ['2','C2','C2','C2','C2','C2']}, + {bits: 5, name: 'rs2', attr: ['5','src', 'src', 'src', 'src', 'src']}, + {bits: 6, name: 'imm', attr: ['6','offset[5:2|7:6]', 'offset[5:3|8:6]', 'offset[5:4|9:6]', 'offset[5:2|7:6]','offset[5:3|8:6]']}, + {bits: 3, name: 'funct3', attr: ['3','C.SWSP', 'C.SDSP', 'C.SQSP', 'C.FSWSP', 'C.FSDSP']}, +], config: {bits: 16}} +.... 
diff --git a/src/images/wavedrom/c-sp-load-store.adoc b/src/images/wavedrom/c-sp-load-store.adoc deleted file mode 100644 index c39f2f6..0000000 --- a/src/images/wavedrom/c-sp-load-store.adoc +++ /dev/null @@ -1,15 +0,0 @@ -//## 16.3 Load and Store Instructions -//### Stack-Pointer-Based Loads and Stores - -[wavedrom, ,svg] -.... -{reg: [ - {bits: 2, name: 'op', type: 8, attr: ['2','C2','C2','C2','C2','C2']}, - {bits: 5, name: 'imm', type: 5, attr: ['5','offset[4:2|7:6]', 'offset[4:3|8:6]', 'offset[4|9:6]', 'offset[4:2|7:6]', 'offset[4:3|8:6]']}, - {bits: 5, name: 'rd', type: 5, attr: ['5','dest≠0', 'dest≠0', 'dest≠0', 'dest', 'dest']}, - {bits: 1, name: 'imm', type: 1, attr: ['1','offset[5]','offset[5]','offset[5]','offset[5]','offset[5]']}, - {bits: 3, name: 'funct3', type: 3, attr: ['3','C.LWSP', 'C.LDSP', 'C.LQSP', 'C.FLWSP', 'C.FLDSP']}, -], config: {bits: 16}} -.... - - diff --git a/src/images/wavedrom/c-sp-load-store.edn b/src/images/wavedrom/c-sp-load-store.edn new file mode 100644 index 0000000..36a497a --- /dev/null +++ b/src/images/wavedrom/c-sp-load-store.edn @@ -0,0 +1,13 @@ +//## 16.3 Load and Store Instructions +//### Stack-Pointer-Based Loads and Stores + +[wavedrom, ,svg] +.... +{reg: [ + {bits: 2, name: 'op', attr: ['2','C2','C2','C2','C2','C2']}, + {bits: 5, name: 'imm', attr: ['5','offset[4:2|7:6]', 'offset[4:3|8:6]', 'offset[4|9:6]', 'offset[4:2|7:6]', 'offset[4:3|8:6]']}, + {bits: 5, name: 'rd', attr: ['5','dest≠0', 'dest≠0', 'dest≠0', 'dest', 'dest']}, + {bits: 1, name: 'imm', attr: ['1','offset[5]','offset[5]','offset[5]','offset[5]','offset[5]']}, + {bits: 3, name: 'funct3', attr: ['3','C.LWSP', 'C.LDSP', 'C.LQSP', 'C.FLWSP', 'C.FLDSP']}, +], config: {bits: 16}} +.... diff --git a/src/images/wavedrom/c-srli-srai.adoc b/src/images/wavedrom/c-srli-srai.adoc deleted file mode 100644 index 557bb39..0000000 --- a/src/images/wavedrom/c-srli-srai.adoc +++ /dev/null @@ -1,13 +0,0 @@ -//c-srli-srai.adoc - -[wavedrom, ,svg] -.... 
-{reg: [ - {bits: 2, name: 'op', type: 3, attr: ['2','C1', 'C1'],}, - {bits: 5, name: 'shamt[4:0]', type: 1, attr: ['5','shamt[4:0]', 'shamt[4:0]'],}, - {bits: 3, name: 'rd′/rs1′', type: 5, attr: ['3','dest', 'dest'],}, - {bits: 2, name: 'funct2', type: 5, attr: ['2','C.SRLI', 'C.SRAI'],}, - {bits: 1, name: 'shamt[5]', type: 5, attr: ['1','shamt[5]', 'shamt[5]'],}, - {bits: 3, name: 'funct3', type: 5, attr: ['3','C.SRLI', 'C.SRAI'],}, -]} -....
\ No newline at end of file diff --git a/src/images/wavedrom/c-srli-srai.edn b/src/images/wavedrom/c-srli-srai.edn new file mode 100644 index 0000000..fc31dfe --- /dev/null +++ b/src/images/wavedrom/c-srli-srai.edn @@ -0,0 +1,13 @@ +//c-srli-srai.adoc + +[wavedrom, ,svg] +.... +{reg: [ + {bits: 2, name: 'op', attr: ['2','C1', 'C1'],}, + {bits: 5, name: 'shamt[4:0]', attr: ['5','shamt[4:0]', 'shamt[4:0]'],}, + {bits: 3, name: 'rd′/rs1′', attr: ['3','dest', 'dest'],}, + {bits: 2, name: 'funct2', attr: ['2','C.SRLI', 'C.SRAI'],}, + {bits: 1, name: 'shamt[5]', attr: ['1','shamt[5]', 'shamt[5]'],}, + {bits: 3, name: 'funct3', attr: ['3','C.SRLI', 'C.SRAI'],}, +]} +.... diff --git a/src/images/wavedrom/counters-diag.adoc b/src/images/wavedrom/counters-diag.edn index 8668162..5c99cb3 100644 --- a/src/images/wavedrom/counters-diag.adoc +++ b/src/images/wavedrom/counters-diag.edn @@ -4,11 +4,10 @@ [wavedrom, ,svg] .... {reg: [ - {bits: 7, name: 'opcode', attr: ['7','SYSTEM','SYSTEM','SYSTEM'], type: 8}, - {bits: 5, name: 'rd', attr: ['5','dest','dest','dest'], type: 2}, - {bits: 3, name: 'funct3', attr: ['3','CSRRS','CSRRS','CSRRS'], type: 8}, - {bits: 5, name: 'rs1', attr: ['5','0','0','0'], type: 8}, - {bits: 12, name: 'csr', attr: ['12','RDCYCLE[H]', 'RDTIME[H]','RDINSTRET[H]'], type: 4}, + {bits: 7, name: 'opcode', attr: ['7','SYSTEM','SYSTEM','SYSTEM']}, + {bits: 5, name: 'rd', attr: ['5','dest','dest','dest']}, + {bits: 3, name: 'funct3', attr: ['3','CSRRS','CSRRS','CSRRS']}, + {bits: 5, name: 'rs1', attr: ['5','0','0','0']}, + {bits: 12, name: 'csr', attr: ['12','RDCYCLE[H]', 'RDTIME[H]','RDINSTRET[H]']}, ]} .... 
- diff --git a/src/images/wavedrom/cr-register.adoc b/src/images/wavedrom/cr-register.adoc deleted file mode 100644 index 63286e4..0000000 --- a/src/images/wavedrom/cr-register.adoc +++ /dev/null @@ -1,112 +0,0 @@ -//# 16 "C" Standard Extension for Compressed Instructions, Version 2.0 -//## 16.2 Compressed Instruction Formats -//Table 16.1: Compressed 16-bit RVC instruction formats. -//### CR : Register - -[wavedrom, ,svg] -.... -{reg: [ - {bits: 2, name: 'op', type: 8}, - {bits: 5, name: 'rs2', type: 4}, - {bits: 5, name: 'rd/rs1', type: 7}, - {bits: 4, name: 'funct4', type: 8}, - ]} -.... - -[wavedrom, ,svg] -.... -{reg: [ - {bits: 2, name: 'op', type: 8}, - {bits: 5, name: 'imm', type: 3}, - {bits: 5, name: 'rd/rs1', type: 7}, - {bits: 1, name: 'imm', type: 3}, - {bits: 3, name: 'funct3', type: 8}, -]} -.... - -[wavedrom, ,svg] -.... -{reg: [ - {bits: 2, name: 'op', type: 8}, - {bits: 5, name: 'rs2', type: 4}, - {bits: 6, name: 'imm', type: 3}, - {bits: 3, name: 'funct3', type: 8}, -]} -.... - -[wavedrom, ,svg] -.... -{reg: [ - {bits: 2, name: 'op', type: 8}, - {bits: 3, name: 'rdʹ', type: 2}, - {bits: 8, name: 'imm', type: 3}, - {bits: 3, name: 'funct3', type: 8}, -]} -.... - -[wavedrom, ,svg] -.... -{reg: [ - {bits: 2, name: 'op', type: 8}, - {bits: 3, name: 'rdʹ', type: 2}, - {bits: 2, name: 'imm', type: 3}, - {bits: 3, name: 'rs1ʹ', type: 4}, - {bits: 3, name: 'imm', type: 3}, - {bits: 3, name: 'funct3', type: 8}, -]} -.... - -[wavedrom, ,svg] -.... -{reg: [ - {bits: 2, name: 'op', type: 8}, - {bits: 3, name: 'rs2ʹ', type: 4}, - {bits: 2, name: 'imm', type: 3}, - {bits: 3, name: 'rs1ʹ', type: 4}, - {bits: 3, name: 'imm', type: 3}, - {bits: 3, name: 'funct3', type: 8}, -]} -.... - -[wavedrom, ,svg] -.... -{reg: [ - {bits: 2, name: 'op', type: 8}, - {bits: 3, name: 'rs2ʹ', type: 4}, - {bits: 2, name: 'funct2', type: 8}, - {bits: 3, name: 'rdʹ/rs1ʹ', type: 7}, - {bits: 6, name: 'funct6', type: 8}, -]} -.... - -[wavedrom, ,svg] -.... 
-{reg: [ - {bits: 2, name: 'op', type: 8}, - {bits: 5, name: 'offset', type: 3}, - {bits: 3, name: 'rdʹ/rs1ʹ', type: 7}, - {bits: 3, name: 'offset', type: 3}, - {bits: 3, name: 'funct3', type: 8}, -]} -.... - -[wavedrom, ,svg] -.... -{reg: [ - {bits: 2, name: 'op', type: 8}, - {bits: 11, name: 'jmp trgt', type: 3}, - {bits: 3, name: 'funct3', type: 8}, -]} -.... - -//the following configuration broke the build. -//config: { -// hflip: true, -// compact: true, -// bits: 16 * 9, lanes: 9, -// margin: {right: width / 4}, -// label: {right: ['CR : Register', 'CI : Immediate', 'CSS : Stack-relative Store', 'CIW : Wide Immediate', 'CL : Load', 'CS //: Store', 'CA : //Arithmetic', 'CB : Branch/Arithmetic', 'CJ : Jump']} -//} - - - diff --git a/src/images/wavedrom/cr-register.edn b/src/images/wavedrom/cr-register.edn new file mode 100644 index 0000000..2eee07f --- /dev/null +++ b/src/images/wavedrom/cr-register.edn @@ -0,0 +1,109 @@ +//# 16 "C" Standard Extension for Compressed Instructions, Version 2.0 +//## 16.2 Compressed Instruction Formats +//Table 16.1: Compressed 16-bit RVC instruction formats. +//### CR : Register + +[wavedrom, ,svg] +.... +{reg: [ + {bits: 2, name: 'op' }, + {bits: 5, name: 'rs2' }, + {bits: 5, name: 'rd/rs1' }, + {bits: 4, name: 'funct4' }, + ]} +.... + +[wavedrom, ,svg] +.... +{reg: [ + {bits: 2, name: 'op' }, + {bits: 5, name: 'imm' }, + {bits: 5, name: 'rd/rs1' }, + {bits: 1, name: 'imm' }, + {bits: 3, name: 'funct3' }, +]} +.... + +[wavedrom, ,svg] +.... +{reg: [ + {bits: 2, name: 'op' }, + {bits: 5, name: 'rs2' }, + {bits: 6, name: 'imm' }, + {bits: 3, name: 'funct3' }, +]} +.... + +[wavedrom, ,svg] +.... +{reg: [ + {bits: 2, name: 'op' }, + {bits: 3, name: 'rdʹ' }, + {bits: 8, name: 'imm' }, + {bits: 3, name: 'funct3' }, +]} +.... + +[wavedrom, ,svg] +.... +{reg: [ + {bits: 2, name: 'op' }, + {bits: 3, name: 'rdʹ' }, + {bits: 2, name: 'imm' }, + {bits: 3, name: 'rs1ʹ' }, + {bits: 3, name: 'imm' }, + {bits: 3, name: 'funct3' }, +]} +.... 
+ +[wavedrom, ,svg] +.... +{reg: [ + {bits: 2, name: 'op' }, + {bits: 3, name: 'rs2ʹ' }, + {bits: 2, name: 'imm' }, + {bits: 3, name: 'rs1ʹ' }, + {bits: 3, name: 'imm' }, + {bits: 3, name: 'funct3' }, +]} +.... + +[wavedrom, ,svg] +.... +{reg: [ + {bits: 2, name: 'op' }, + {bits: 3, name: 'rs2ʹ' }, + {bits: 2, name: 'funct2' }, + {bits: 3, name: 'rdʹ/rs1ʹ' }, + {bits: 6, name: 'funct6' }, +]} +.... + +[wavedrom, ,svg] +.... +{reg: [ + {bits: 2, name: 'op' }, + {bits: 5, name: 'offset' }, + {bits: 3, name: 'rdʹ/rs1ʹ' }, + {bits: 3, name: 'offset' }, + {bits: 3, name: 'funct3' }, +]} +.... + +[wavedrom, ,svg] +.... +{reg: [ + {bits: 2, name: 'op' }, + {bits: 11, name: 'jmp trgt' }, + {bits: 3, name: 'funct3' }, +]} +.... + +//the following configuration broke the build. +//config: { +// hflip: true, +// compact: true, +// bits: 16 * 9, lanes: 9, +// margin: {right: width / 4}, +// label: {right: ['CR : Register', 'CI : Immediate', 'CSS : Stack-relative Store', 'CIW : Wide Immediate', 'CL : Load', 'CS //: Store', 'CA : //Arithmetic', 'CB : Branch/Arithmetic', 'CJ : Jump']} +//} diff --git a/src/images/wavedrom/cr-registers-new.adoc b/src/images/wavedrom/cr-registers-new.adoc deleted file mode 100644 index 46a34e6..0000000 --- a/src/images/wavedrom/cr-registers-new.adoc +++ /dev/null @@ -1,62 +0,0 @@ -[wavedrom, ,svg] -.... 
-### CR : Register -${wd({reg: [ - {bits: 2, name: 'op', type: 8}, - {bits: 5, name: 'rs2', type: 4}, - {bits: 5, name: 'rd / rs1ʹ, type: 7}, - {bits: 4, name: 'funct4', type: 8}, - - {bits: 2, name: 'op', type: 8}, - {bits: 5, name: 'imm', type: 3}, - {bits: 5, name: 'rd / rs1', type: 7}, - {bits: 1, name: 'imm', type: 3}, - {bits: 3, name: 'funct3', type: 8}, - - {bits: 2, name: 'op', type: 8}, - {bits: 5, name: 'rs2', type: 4}, - {bits: 6, name: 'imm', type: 3}, - {bits: 3, name: 'funct3', type: 8}, - - {bits: 2, name: 'op', type: 8}, - {bits: 3, name: 'rdʹ', type: 2}, - {bits: 8, name: 'imm', type: 3}, - {bits: 3, name: 'funct3', type: 8}, - - {bits: 2, name: 'op', type: 8}, - {bits: 3, name: 'rdʹ', type: 2}, - {bits: 2, name: 'imm', type: 3}, - {bits: 3, name: 'rs1ʹ', type: 4}, - {bits: 3, name: 'imm', type: 3}, - {bits: 3, name: 'funct3', type: 8}, - - {bits: 2, name: 'op', type: 8}, - {bits: 3, name: 'rs2ʹ', type: 4}, - {bits: 2, name: 'imm', type: 3}, - {bits: 3, name: 'rs1ʹ', type: 4}, - {bits: 3, name: 'imm', type: 3}, - {bits: 3, name: 'funct3', type: 8}, - - {bits: 2, name: 'op', type: 8}, - {bits: 3, name: 'rs2ʹ', type: 4}, - {bits: 2, name: 'funct2', type: 8}, - {bits: 3, name: 'rd` / rs1ʹ', type: 7}, - {bits: 6, name: 'funct6', type: 8}, - - {bits: 2, name: 'op', type: 8}, - {bits: 5, name: 'offset', type: 3}, - {bits: 3, name: 'rd` / rs1ʹ', type: 7}, - {bits: 3, name: 'offset', type: 3}, - {bits: 3, name: 'funct3', type: 8}, - - {bits: 2, name: 'op', type: 8}, - {bits: 11, name: 'jump target', type: 3}, - {bits: 3, name: 'funct3', type: 8}, -], config: { - hflip: true, - compact: true, - bits: 16 * 9, lanes: 9, - margin: {right: width / 4}, - label: {right: ['CR : Register', 'CI : Immediate', 'CSS : Stack-relative Store', 'CIW : Wide Immediate', 'CL : Load', 'CS : Store', 'CA : Arithmetic', 'CB : Branch/Arithmetic', 'CJ : Jump']} -}})} -.... 
diff --git a/src/images/wavedrom/csr-instr.adoc b/src/images/wavedrom/csr-instr.edn index 93022be..dc50b27 100644 --- a/src/images/wavedrom/csr-instr.adoc +++ b/src/images/wavedrom/csr-instr.edn @@ -4,21 +4,21 @@ [wavedrom, ,svg] .... {reg: [ - {bits: 7, name: 'opcode', attr: ['7', 'SYSTEM', 'SYSTEM', 'SYSTEM', 'SYSTEM', 'SYSTEM', 'SYSTEM'], type: 8}, - {bits: 5, name: 'rd', attr: ['5', 'dest', 'dest', 'dest', 'dest', 'dest', 'dest'], type: 2}, - {bits: 3, name: 'funct3', attr: ['3', 'CSRRW', 'CSRRS', 'CSRRC', 'CSRRWI', 'CSRRSI', 'CSRRCI'], type: 8}, - {bits: 5, name: 'rs1', attr: ['5', 'source', 'source', 'source', 'uimm[4:0]', 'uimm[4:0]', 'uimm[4:0]'], type: 4}, - {bits: 12, name: 'csr', attr: ['12', 'source/dest', 'source/dest', 'source/dest', 'source/dest', 'source/dest', 'source/dest'], type: 4}, + {bits: 7, name: 'opcode', attr: ['7', 'SYSTEM', 'SYSTEM', 'SYSTEM', 'SYSTEM', 'SYSTEM', 'SYSTEM'] }, + {bits: 5, name: 'rd', attr: ['5', 'dest', 'dest', 'dest', 'dest', 'dest', 'dest'] }, + {bits: 3, name: 'funct3', attr: ['3', 'CSRRW', 'CSRRS', 'CSRRC', 'CSRRWI', 'CSRRSI', 'CSRRCI'] }, + {bits: 5, name: 'rs1', attr: ['5', 'source', 'source', 'source', 'uimm[4:0]', 'uimm[4:0]', 'uimm[4:0]'] }, + {bits: 12, name: 'csr', attr: ['12', 'source/dest', 'source/dest', 'source/dest', 'source/dest', 'source/dest', 'source/dest'], }, ]} .... //[wavedrom, ,] //.... 
//{reg: [ -// {bits: 7, name: 'opcode', attr: ['7', 'SYSTEM','SYSTEM','SYSTEM'], type: 8}, -// {bits: 5, name: 'rd', attr: ['3', 'dest','dest', 'dest' ], type: 2}, -// {bits: 3, name: 'funct3', attr: ['3', 'CSRRWI', 'CSRRSI', 'CSRRCI'], type: 8}, -// {bits: 5, name: 'rs1', attr: ['5', 'uimm[4:0]','uimm[4:0]', 'uimm[4:0]'], type: 3}, -// {bits: 12, name: 'csr', attr: ['12', 'source/dest','source/dest','source/dest'], type: 4}, +// {bits: 7, name: 'opcode', attr: ['7', 'SYSTEM','SYSTEM','SYSTEM'] }, +// {bits: 5, name: 'rd', attr: ['3', 'dest','dest', 'dest' ] }, +// {bits: 3, name: 'funct3', attr: ['3', 'CSRRWI', 'CSRRSI', 'CSRRCI'] }, +// {bits: 5, name: 'rs1', attr: ['5', 'uimm[4:0]','uimm[4:0]', 'uimm[4:0]'] }, +// {bits: 12, name: 'csr', attr: ['12', 'source/dest','source/dest','source/dest'] }, //]} //.... diff --git a/src/images/wavedrom/ct-conditional.adoc b/src/images/wavedrom/ct-conditional.edn index b886d7c..e021907 100644 --- a/src/images/wavedrom/ct-conditional.adoc +++ b/src/images/wavedrom/ct-conditional.edn @@ -3,11 +3,11 @@ [wavedrom, ,svg] .... 
{reg: [ - {bits: 7, name: 'opcode', attr: ['7', 'BRANCH', 'BRANCH', 'BRANCH'], type: 8}, - {bits: 5, name: 'imm[4:1|11]', attr: ['5', 'offset[4:1|11]', 'offset[4:1|11]', 'offset[4:1|11]'], type: 3}, - {bits: 3, name: 'funct3', attr: ['3', 'BEQ/BNE', 'BLT[U]', 'BGE[U]'], type: 8}, - {bits: 5, name: 'rs1', attr: ['5', 'src1', 'src1', 'src1'], type: 4}, - {bits: 5, name: 'rs2', attr: ['5', 'src2','src2', 'src2'], type: 4}, - {bits: 7, name: 'imm[12|10:5]', attr: ['7', 'offset[12|10:5]', 'offset[12|10:5]', 'offset[12|10:5]'], type: 3}, + {bits: 7, name: 'opcode', attr: ['7', 'BRANCH', 'BRANCH', 'BRANCH'] }, + {bits: 5, name: 'imm[4:1|11]', attr: ['5', 'offset[4:1|11]', 'offset[4:1|11]', 'offset[4:1|11]'] }, + {bits: 3, name: 'funct3', attr: ['3', 'BEQ/BNE', 'BLT[U]', 'BGE[U]'] }, + {bits: 5, name: 'rs1', attr: ['5', 'src1', 'src1', 'src1'] }, + {bits: 5, name: 'rs2', attr: ['5', 'src2','src2', 'src2'] }, + {bits: 7, name: 'imm[12|10:5]', attr: ['7', 'offset[12|10:5]', 'offset[12|10:5]', 'offset[12|10:5]'] }, ], config:{fontsize: 10}} .... diff --git a/src/images/wavedrom/ct-unconditional-2.adoc b/src/images/wavedrom/ct-unconditional-2.adoc deleted file mode 100644 index 4dda824..0000000 --- a/src/images/wavedrom/ct-unconditional-2.adoc +++ /dev/null @@ -1,12 +0,0 @@ -//ct-unconditional-2 - -[wavedrom, ,svg] -.... -{reg: [ - {bits: 7, name: 'opcode', attr: ['7', 'JALR'], type: 8}, - {bits: 5, name: 'rd', attr: ['5', 'dest'], type: 2}, - {bits: 3, name: 'funct3', attr: ['3', '0'], type: 8}, - {bits: 5, name: 'rs1', attr: ['5', 'base'], type: 4}, - {bits: 12, name: 'imm[11:0]', attr: ['12', 'offset[11:0]'], type: 3}, -]} -.... diff --git a/src/images/wavedrom/ct-unconditional-2.edn b/src/images/wavedrom/ct-unconditional-2.edn new file mode 100644 index 0000000..95f103e --- /dev/null +++ b/src/images/wavedrom/ct-unconditional-2.edn @@ -0,0 +1,12 @@ +//ct-unconditional-2 + +[wavedrom, ,svg] +.... 
+{reg: [ + {bits: 7, name: 'opcode', attr: ['7', 'JALR'] }, + {bits: 5, name: 'rd', attr: ['5', 'dest'] }, + {bits: 3, name: 'funct3', attr: ['3', '0'] }, + {bits: 5, name: 'rs1', attr: ['5', 'base'] }, + {bits: 12, name: 'imm[11:0]', attr: ['12', 'offset[11:0]'] }, +]} +.... diff --git a/src/images/wavedrom/ct-unconditional.adoc b/src/images/wavedrom/ct-unconditional.adoc deleted file mode 100644 index 756108f..0000000 --- a/src/images/wavedrom/ct-unconditional.adoc +++ /dev/null @@ -1,15 +0,0 @@ -//## 2.5 Control Transfer Instructions -//### Unconditional Jumps - -[wavedrom, ,svg] -.... -{reg: [ - {bits: 7, name: 'opcode', attr: ['7', 'JAL'], type: 8}, - {bits: 5, name: 'rd', attr: ['5', 'dest'], type: 2}, - {bits: 8, name: 'imm[19:12]', attr: ['8'], type: 3}, - {bits: 1, name: '[11]', attr: ['1'], type: 3}, - {bits: 10, name: 'imm[10:1]', attr: ['10', 'offset[20:1]'], type: 3}, - {bits: 1, name: '[20]', attr: ['1'], type: 3}, -], config:{fontsize: 12}} -.... - diff --git a/src/images/wavedrom/ct-unconditional.edn b/src/images/wavedrom/ct-unconditional.edn new file mode 100644 index 0000000..bc2213b --- /dev/null +++ b/src/images/wavedrom/ct-unconditional.edn @@ -0,0 +1,14 @@ +//## 2.5 Control Transfer Instructions +//### Unconditional Jumps + +[wavedrom, ,svg] +.... +{reg: [ + {bits: 7, name: 'opcode', attr: ['7', 'JAL']}, + {bits: 5, name: 'rd', attr: ['5', 'dest']}, + {bits: 8, name: 'imm[19:12]', attr: ['8']}, + {bits: 1, name: '[11]', attr: ['1']}, + {bits: 10, name: 'imm[10:1]', attr: ['10', 'offset[20:1]']}, + {bits: 1, name: '[20]', attr: ['1']}, +], config:{fontsize: 12}} +.... diff --git a/src/images/wavedrom/d-xwwx.adoc b/src/images/wavedrom/d-xwwx.adoc deleted file mode 100644 index 5965715..0000000 --- a/src/images/wavedrom/d-xwwx.adoc +++ /dev/null @@ -1,19 +0,0 @@ -//xw-wx - -[wavedrom, ,svg] -.... 
-{reg: [ - {bits: 7, name: 'opcode', attr: ['7','OP-FP','OP-FP'], type: 8}, - {bits: 5, name: 'rd', attr: ['5','dest','dest'], type: 2}, - {bits: 3, name: 'rm', attr: ['3','000','000'], type: 8}, - {bits: 5, name: 'rs1', attr: ['5','src','src'], type: 4}, - {bits: 5, name: 'rs2', attr: ['5','0','0'], type: 4}, - {bits: 2, name: 'fmt', attr: ['2','D','D'], type: 8}, - {bits: 5, name: 'funct5', attr: ['5','FMV.X.D','FMV.D.X'], type: 8}, -]} -.... - - - - - diff --git a/src/images/wavedrom/d-xwwx.edn b/src/images/wavedrom/d-xwwx.edn new file mode 100644 index 0000000..cc28715 --- /dev/null +++ b/src/images/wavedrom/d-xwwx.edn @@ -0,0 +1,14 @@ +//xw-wx + +[wavedrom, ,svg] +.... +{reg: [ + {bits: 7, name: 'opcode', attr: ['7','OP-FP','OP-FP']}, + {bits: 5, name: 'rd', attr: ['5','dest','dest']}, + {bits: 3, name: 'rm', attr: ['3','000','000']}, + {bits: 5, name: 'rs1', attr: ['5','src','src']}, + {bits: 5, name: 'rs2', attr: ['5','0','0']}, + {bits: 2, name: 'fmt', attr: ['2','D','D']}, + {bits: 5, name: 'funct5', attr: ['5','FMV.X.D','FMV.D.X']}, +]} +.... diff --git a/src/images/wavedrom/division-op.adoc b/src/images/wavedrom/division-op.adoc deleted file mode 100644 index fabdac1..0000000 --- a/src/images/wavedrom/division-op.adoc +++ /dev/null @@ -1,25 +0,0 @@ -//## 8.2 Division Operations - -[wavedrom, ,svg] -.... -{reg: [ - {bits: 7, name: 'opcode', attr: ['7', 'OP', 'OP-32'], type: 8}, - {bits: 5, name: 'rd', attr: ['5', 'dest', 'dest'], type: 2}, - {bits: 3, name: 'funct3', attr: ['3','DIV[U]/REM[U]', 'DIV[U]W/REM[U]W'], type: 8}, - {bits: 5, name: 'rs1', attr: ['5', 'dividend', 'dividend'], type: 4}, - {bits: 5, name: 'rs2', attr: ['5', 'divisor', 'divisor'], type: 4}, - {bits: 7, name: 'funct7', attr: ['7', 'MULDIV', 'MULDIV'], type: 8}, -]} -.... - -//[wavedrom, ,svg] -//.... 
-//{reg: [ -// {bits: 7, name: 'opcode', attr: 'OP-32', type: 8}, -// {bits: 5, name: 'rd', attr: 'dest', type: 2}, -// {bits: 3, name: 'funct3', attr: ['DIVW', 'DIVUW', 'REMW', 'REMUW'], type: 8}, -// {bits: 5, name: 'rs1', attr: 'dividend', type: 4}, -// {bits: 5, name: 'rs2', attr: 'divisor', type: 4}, -// {bits: 7, name: 'funct7', attr: 'MULDIV', type: 8}, -//]} -//.... diff --git a/src/images/wavedrom/division-op.edn b/src/images/wavedrom/division-op.edn new file mode 100644 index 0000000..0dff0e3 --- /dev/null +++ b/src/images/wavedrom/division-op.edn @@ -0,0 +1,25 @@ +//## 8.2 Division Operations + +[wavedrom, ,svg] +.... +{reg: [ + {bits: 7, name: 'opcode', attr: ['7', 'OP', 'OP-32']}, + {bits: 5, name: 'rd', attr: ['5', 'dest', 'dest']}, + {bits: 3, name: 'funct3', attr: ['3','DIV[U]/REM[U]', 'DIV[U]W/REM[U]W']}, + {bits: 5, name: 'rs1', attr: ['5', 'dividend', 'dividend']}, + {bits: 5, name: 'rs2', attr: ['5', 'divisor', 'divisor']}, + {bits: 7, name: 'funct7', attr: ['7', 'MULDIV', 'MULDIV']}, +]} +.... + +//[wavedrom, ,svg] +//.... +//{reg: [ +// {bits: 7, name: 'opcode', attr: 'OP-32'}, +// {bits: 5, name: 'rd', attr: 'dest'}, +// {bits: 3, name: 'funct3', attr: ['DIVW', 'DIVUW', 'REMW', 'REMUW']}, +// {bits: 5, name: 'rs1', attr: 'dividend'}, +// {bits: 5, name: 'rs2', attr: 'divisor'}, +// {bits: 7, name: 'funct7', attr: 'MULDIV'}, +//]} +//.... diff --git a/src/images/wavedrom/double-fl-class.adoc b/src/images/wavedrom/double-fl-class.adoc deleted file mode 100644 index 143ff5e..0000000 --- a/src/images/wavedrom/double-fl-class.adoc +++ /dev/null @@ -1,15 +0,0 @@ -//## 13.7 Double-Precision Floating-Point Classify Instruction - -[wavedrom, ,svg] -.... 
-{reg: [ - {bits: 7, name: 'opcode', attr: ['7','OP-FP'], type: 8}, - {bits: 5, name: 'rd', attr: ['5','dest'], type: 2}, - {bits: 3, name: 'rm', attr: ['3','1'], type: 8}, - {bits: 5, name: 'rs1', attr: ['5','src'], type: 4}, - {bits: 5, name: 'rs2', attr: ['5','0'], type: 8}, - {bits: 2, name: 'fmt', attr: ['2','D'], type: 8}, - {bits: 5, name: 'funct5', attr: ['5','FCLASS'], type: 8}, -]} -.... - diff --git a/src/images/wavedrom/double-fl-class.edn b/src/images/wavedrom/double-fl-class.edn new file mode 100644 index 0000000..664ff9d --- /dev/null +++ b/src/images/wavedrom/double-fl-class.edn @@ -0,0 +1,14 @@ +//## 13.7 Double-Precision Floating-Point Classify Instruction + +[wavedrom, ,svg] +.... +{reg: [ + {bits: 7, name: 'opcode', attr: ['7','OP-FP']}, + {bits: 5, name: 'rd', attr: ['5','dest']}, + {bits: 3, name: 'rm', attr: ['3','1']}, + {bits: 5, name: 'rs1', attr: ['5','src']}, + {bits: 5, name: 'rs2', attr: ['5','0']}, + {bits: 2, name: 'fmt', attr: ['2','D']}, + {bits: 5, name: 'funct5', attr: ['5','FCLASS']}, +]} +.... diff --git a/src/images/wavedrom/double-fl-compare.adoc b/src/images/wavedrom/double-fl-compare.adoc deleted file mode 100644 index 8403734..0000000 --- a/src/images/wavedrom/double-fl-compare.adoc +++ /dev/null @@ -1,15 +0,0 @@ -//## 13.6 Double-Precision Floating-Point Compare Instructions - -[wavedrom, ,svg] -.... -{reg: [ - {bits: 7, name: 'opcode', attr: ['7','OP-FP'], type: 8}, - {bits: 5, name: 'rd', attr: ['5','dest'], type: 2}, - {bits: 3, name: 'rm', attr: ['3','EQ/LT/LE'], type: 8}, - {bits: 5, name: 'rs1', attr: ['5','src1'], type: 4}, - {bits: 5, name: 'rs2', attr: ['5','src2'], type: 4}, - {bits: 2, name: 'fmt', attr: ['2','D'], type: 8}, - {bits: 5, name: 'funct5', attr: ['5','FCMP'], type: 8}, -]} -.... 
- diff --git a/src/images/wavedrom/double-fl-compare.edn b/src/images/wavedrom/double-fl-compare.edn new file mode 100644 index 0000000..bd381c8 --- /dev/null +++ b/src/images/wavedrom/double-fl-compare.edn @@ -0,0 +1,14 @@ +//## 13.6 Double-Precision Floating-Point Compare Instructions + +[wavedrom, ,svg] +.... +{reg: [ + {bits: 7, name: 'opcode', attr: ['7','OP-FP']}, + {bits: 5, name: 'rd', attr: ['5','dest']}, + {bits: 3, name: 'rm', attr: ['3','EQ/LT/LE']}, + {bits: 5, name: 'rs1', attr: ['5','src1']}, + {bits: 5, name: 'rs2', attr: ['5','src2']}, + {bits: 2, name: 'fmt', attr: ['2','D']}, + {bits: 5, name: 'funct5', attr: ['5','FCMP']}, +]} +.... diff --git a/src/images/wavedrom/double-fl-compute.adoc b/src/images/wavedrom/double-fl-compute.adoc deleted file mode 100644 index 4ce3b71..0000000 --- a/src/images/wavedrom/double-fl-compute.adoc +++ /dev/null @@ -1,54 +0,0 @@ -//## 13.4 Double-Precision Floating-Point Computational Instructions - -[wavedrom, ,svg] -.... -{reg: [ - {bits: 7, name: 'opcode', attr: ['7','OP-FP','OP-FP','OP-FP','OP-FP'], type: 8}, - {bits: 5, name: 'rd', attr: ['5','dest','dest','dest','dest'], type: 2}, - {bits: 3, name: 'rm', attr: ['3','RM','RM','MIN/MAX','RM'], type: 8}, - {bits: 5, name: 'rs1', attr: ['5','src1','src1','src1','src'], type: 4}, - {bits: 5, name: 'rs2', attr: ['5','src2','src2','src2','0'], type: 4}, - {bits: 2, name: 'fmt', attr: ['2','D','D','D','D'], type: 8}, - {bits: 5, name: 'funct5', attr: ['5','FADD/FSUB', 'FMUL/FDIV', 'FMIN-MAX', 'FSQRT'], type: 8}, -]} -.... - -[wavedrom, ,svg] -.... -{reg: [ - {bits: 7, name: 'opcode', attr: ['7','F[N]MADD/F[N]MSUB'], type: 8}, - {bits: 5, name: 'rd', attr: ['5','dest'], type: 2}, - {bits: 3, name: 'rm', attr: ['3','RM'], type: 8}, - {bits: 5, name: 'rs1', attr: ['5','src'], type: 4}, - {bits: 5, name: 'rs2', attr: ['5','src2'], type: 4}, - {bits: 2, name: 'fmt', attr: ['2','D'], type: 8}, - {bits: 5, name: 'rs3', attr: ['5','src3'], type: 8}, -]} -.... 
- -//[wavedrom, ,] -//.... -//{reg: [ -// {bits: 7, name: 'opcode', attr: 'OP-FP', type: 8}, -// {bits: 5, name: 'rd', attr: 'dest', type: 2}, -// {bits: 3, name: 'funct3', attr: ['MIN', 'MAX'], type: 8}, -// {bits: 5, name: 'rs1', attr: 'src1', type: 4}, -// {bits: 5, name: 'rs2', attr: 'src2', type: 4}, -// {bits: 2, name: 'fmt', attr: 'D', type: 8}, -// {bits: 5, name: 'funct5', attr: 'FMIN-MAX', type: 8}, -//]} -//.... - -//[wavedrom, ,] -//.... -//{reg: [ -// {bits: 7, name: 'opcode', attr: ['FMADD', 'FNMADD', 'FMSUB', 'FNMSUB'], type: 8}, -// {bits: 5, name: 'rd', attr: 'dest', type: 2}, -// {bits: 3, name: 'funct3', attr: 'RM', type: 8}, -// {bits: 5, name: 'rs1', attr: 'src1', type: 4}, -// {bits: 5, name: 'rs2', attr: 'src2', type: 4}, -// {bits: 2, name: 'fmt', attr: 'D', type: 8}, -// {bits: 5, name: 'rs3', attr: 'src3', type: 4}, -//]} -//.... - diff --git a/src/images/wavedrom/double-fl-compute.edn b/src/images/wavedrom/double-fl-compute.edn new file mode 100644 index 0000000..d074fea --- /dev/null +++ b/src/images/wavedrom/double-fl-compute.edn @@ -0,0 +1,53 @@ +//## 13.4 Double-Precision Floating-Point Computational Instructions + +[wavedrom, ,svg] +.... +{reg: [ + {bits: 7, name: 'opcode', attr: ['7','OP-FP','OP-FP','OP-FP','OP-FP']}, + {bits: 5, name: 'rd', attr: ['5','dest','dest','dest','dest']}, + {bits: 3, name: 'rm', attr: ['3','RM','RM','MIN/MAX','RM']}, + {bits: 5, name: 'rs1', attr: ['5','src1','src1','src1','src']}, + {bits: 5, name: 'rs2', attr: ['5','src2','src2','src2','0']}, + {bits: 2, name: 'fmt', attr: ['2','D','D','D','D']}, + {bits: 5, name: 'funct5', attr: ['5','FADD/FSUB', 'FMUL/FDIV', 'FMIN-MAX', 'FSQRT']}, +]} +.... + +[wavedrom, ,svg] +.... 
+{reg: [ + {bits: 7, name: 'opcode', attr: ['7','F[N]MADD/F[N]MSUB']}, + {bits: 5, name: 'rd', attr: ['5','dest']}, + {bits: 3, name: 'rm', attr: ['3','RM']}, + {bits: 5, name: 'rs1', attr: ['5','src']}, + {bits: 5, name: 'rs2', attr: ['5','src2']}, + {bits: 2, name: 'fmt', attr: ['2','D']}, + {bits: 5, name: 'rs3', attr: ['5','src3']}, +]} +.... + +//[wavedrom, ,] +//.... +//{reg: [ +// {bits: 7, name: 'opcode', attr: 'OP-FP'}, +// {bits: 5, name: 'rd', attr: 'dest'}, +// {bits: 3, name: 'funct3', attr: ['MIN', 'MAX']}, +// {bits: 5, name: 'rs1', attr: 'src1'}, +// {bits: 5, name: 'rs2', attr: 'src2'}, +// {bits: 2, name: 'fmt', attr: 'D'}, +// {bits: 5, name: 'funct5', attr: 'FMIN-MAX'}, +//]} +//.... + +//[wavedrom, ,] +//.... +//{reg: [ +// {bits: 7, name: 'opcode', attr: ['FMADD', 'FNMADD', 'FMSUB', 'FNMSUB']}, +// {bits: 5, name: 'rd', attr: 'dest'}, +// {bits: 3, name: 'funct3', attr: 'RM'}, +// {bits: 5, name: 'rs1', attr: 'src1'}, +// {bits: 5, name: 'rs2', attr: 'src2'}, +// {bits: 2, name: 'fmt', attr: 'D'}, +// {bits: 5, name: 'rs3', attr: 'src3'}, +//]} +//.... diff --git a/src/images/wavedrom/double-fl-convert-mv.adoc b/src/images/wavedrom/double-fl-convert-mv.adoc deleted file mode 100644 index fb23b08..0000000 --- a/src/images/wavedrom/double-fl-convert-mv.adoc +++ /dev/null @@ -1,16 +0,0 @@ -//## 13.5 Double-Precision Floating-Point Conversion and Move Instructions - - -[wavedrom, ,svg] -.... -{reg: [ - {bits: 7, name: 'opcode', attr: ['7','OP-FP','OP-FP'], type: 8}, - {bits: 5, name: 'rd', attr: ['5','dest','dest'], type: 2}, - {bits: 3, name: 'rm', attr: ['3','RM','RM'], type: 8}, - {bits: 5, name: 'rs1', attr: ['5','src','src'], type: 4}, - {bits: 5, name: 'rs2', attr: ['5','W[U]/L[U]','W[U]/L[U]'], type: 4}, - {bits: 2, name: 'fmt', attr: ['2','D','D'], type: 8}, - {bits: 5, name: 'funct5', attr: ['5','FCVT.int.D','FCVT.D.int'], type: 8}, -]} -.... 
- diff --git a/src/images/wavedrom/double-fl-convert-mv.edn b/src/images/wavedrom/double-fl-convert-mv.edn new file mode 100644 index 0000000..711ce0c --- /dev/null +++ b/src/images/wavedrom/double-fl-convert-mv.edn @@ -0,0 +1,15 @@ +//## 13.5 Double-Precision Floating-Point Conversion and Move Instructions + + +[wavedrom, ,svg] +.... +{reg: [ + {bits: 7, name: 'opcode', attr: ['7','OP-FP','OP-FP']}, + {bits: 5, name: 'rd', attr: ['5','dest','dest']}, + {bits: 3, name: 'rm', attr: ['3','RM','RM']}, + {bits: 5, name: 'rs1', attr: ['5','src','src']}, + {bits: 5, name: 'rs2', attr: ['5','W[U]/L[U]','W[U]/L[U]']}, + {bits: 2, name: 'fmt', attr: ['2','D','D']}, + {bits: 5, name: 'funct5', attr: ['5','FCVT.int.D','FCVT.D.int']}, +]} +.... diff --git a/src/images/wavedrom/double-ls.adoc b/src/images/wavedrom/double-ls.adoc deleted file mode 100644 index 0c6f4dd..0000000 --- a/src/images/wavedrom/double-ls.adoc +++ /dev/null @@ -1,28 +0,0 @@ -//# "D" Standard Extension for Double-Precision Floating-Point, Version 2.2 -//## 13.3 Double-Precision Load and Store Instructions - -[wavedrom, ,svg] -.... -{reg: [ - {bits: 7, name: 'opcode', attr: ['7','LOAD-FP'], type: 8}, - {bits: 5, name: 'rd', attr: ['5','dest'], type: 2}, - {bits: 3, name: 'width', attr: ['3','D'], type: 8}, - {bits: 5, name: 'rs1', attr: ['5','base'], type: 4}, - {bits: 12, name: 'imm[11:0]', attr: ['12','offset[11:0]'], type: 3}, -]} -.... - -[wavedrom, ,svg] -.... -{reg: [ - {bits: 7, name: 'opcode', attr: ['7','STORE-FP'], type: 8}, - {bits: 5, name: 'imm[4:0]', attr: ['5','offset[4:0]'], type: 3}, - {bits: 3, name: 'width', attr: ['3','D'], type: 8}, - {bits: 5, name: 'rs1', attr: ['5','base'], type: 4}, - {bits: 5, name: 'rs2', attr: ['5','src'], type: 4}, - {bits: 7, name: 'imm[11:5]', attr: ['7','offset[11:5]'], type: 3}, -]} -.... 
- - - diff --git a/src/images/wavedrom/double-ls.edn b/src/images/wavedrom/double-ls.edn new file mode 100644 index 0000000..97306a9 --- /dev/null +++ b/src/images/wavedrom/double-ls.edn @@ -0,0 +1,25 @@ +//# "D" Standard Extension for Double-Precision Floating-Point, Version 2.2 +//## 13.3 Double-Precision Load and Store Instructions + +[wavedrom, ,svg] +.... +{reg: [ + {bits: 7, name: 'opcode', attr: ['7','LOAD-FP']}, + {bits: 5, name: 'rd', attr: ['5','dest']}, + {bits: 3, name: 'width', attr: ['3','D']}, + {bits: 5, name: 'rs1', attr: ['5','base']}, + {bits: 12, name: 'imm[11:0]', attr: ['12','offset[11:0]']}, +]} +.... + +[wavedrom, ,svg] +.... +{reg: [ + {bits: 7, name: 'opcode', attr: ['7','STORE-FP']}, + {bits: 5, name: 'imm[4:0]', attr: ['5','offset[4:0]']}, + {bits: 3, name: 'width', attr: ['3','D']}, + {bits: 5, name: 'rs1', attr: ['5','base']}, + {bits: 5, name: 'rs2', attr: ['5','src']}, + {bits: 7, name: 'imm[11:5]', attr: ['7','offset[11:5]']}, +]} +.... diff --git a/src/images/wavedrom/env-call-breakpoint.edn b/src/images/wavedrom/env-call-breakpoint.edn new file mode 100644 index 0000000..5814faf --- /dev/null +++ b/src/images/wavedrom/env-call-breakpoint.edn @@ -0,0 +1,12 @@ +//## 2.8 Environment Call and Breakpoints + +[wavedrom, ,svg] +.... +{reg: [ + {bits: 7, name: 'opcode', attr: ['7', 'SYSTEM', 'SYSTEM']}, + {bits: 5, name: 'rd', attr: ['5', '0', '0']}, + {bits: 3, name: 'funct3', attr: ['3', 'PRIV', 'PRIV']}, + {bits: 5, name: 'rs1', attr: ['5', '0', '0']}, + {bits: 12, name: 'func12', attr: ['12', 'ECALL', 'EBREAK']}, +]} +.... diff --git a/src/images/wavedrom/env_call-breakpoint.adoc b/src/images/wavedrom/env_call-breakpoint.adoc deleted file mode 100644 index 7812687..0000000 --- a/src/images/wavedrom/env_call-breakpoint.adoc +++ /dev/null @@ -1,12 +0,0 @@ -//## 2.8 Environment Call and Breakpoints - -[wavedrom, ,svg] -.... 
-{reg: [ - {bits: 7, name: 'opcode', attr: ['7', 'SYSTEM', 'SYSTEM'], type: 8}, - {bits: 5, name: 'rd', attr: ['5', '0', '0'], type: 2}, - {bits: 3, name: 'funct3', attr: ['3', 'PRIV', 'PRIV'], type: 8}, - {bits: 5, name: 'rs1', attr: ['5', '0', '0'], type: 4}, - {bits: 12, name: 'func12', attr: ['12', 'ECALL', 'EBREAK'], type: 8}, -]} -.... diff --git a/src/images/wavedrom/fcvt-sd-ds.adoc b/src/images/wavedrom/fcvt-sd-ds.adoc deleted file mode 100644 index 5b68a54..0000000 --- a/src/images/wavedrom/fcvt-sd-ds.adoc +++ /dev/null @@ -1,16 +0,0 @@ -//FCVT.S.D and FCVT.D.S - -[wavedrom, ,svg] -.... -{reg: [ - {bits: 7, name: 'opcode', attr: ['7','OP-FP','OP-FP'], type: 8}, - {bits: 5, name: 'rd', attr: ['5','dest','dest'], type: 2}, - {bits: 3, name: 'rm', attr: ['3','RM','RM'], type: 8}, - {bits: 5, name: 'rs1', attr: ['5','src','src'], type: 4}, - {bits: 5, name: 'rs2', attr: ['5','D', 'S'], type: 4}, - {bits: 2, name: 'fmt', attr: ['2','S','D'], type: 8}, - {bits: 5, name: 'funct5', attr: ['5','FCVT.S.D', 'FCVT.D.S'], type: 8}, -]} -.... - - diff --git a/src/images/wavedrom/fcvt-sd-ds.edn b/src/images/wavedrom/fcvt-sd-ds.edn new file mode 100644 index 0000000..dda6234 --- /dev/null +++ b/src/images/wavedrom/fcvt-sd-ds.edn @@ -0,0 +1,14 @@ +//FCVT.S.D and FCVT.D.S + +[wavedrom, ,svg] +.... +{reg: [ + {bits: 7, name: 'opcode', attr: ['7','OP-FP','OP-FP']}, + {bits: 5, name: 'rd', attr: ['5','dest','dest']}, + {bits: 3, name: 'rm', attr: ['3','RM','RM']}, + {bits: 5, name: 'rs1', attr: ['5','src','src']}, + {bits: 5, name: 'rs2', attr: ['5','D', 'S']}, + {bits: 2, name: 'fmt', attr: ['2','S','D']}, + {bits: 5, name: 'funct5', attr: ['5','FCVT.S.D', 'FCVT.D.S']}, +]} +.... 
diff --git a/src/images/wavedrom/float-csr.adoc b/src/images/wavedrom/float-csr.adoc deleted file mode 100644 index 7b2cf24..0000000 --- a/src/images/wavedrom/float-csr.adoc +++ /dev/null @@ -1,17 +0,0 @@ -//# "F" Standard Extension for Single-Precision Floating-Point, Version 2.2 -//## 12.2 Floating-Point Control and Status Register -//### Figure 12.2: Floating-point control and status register. - -[wavedrom, ,svg] -.... -{reg: [ - {bits: 1, name: 'NX', attr: ['1'], type: 5}, - {bits: 1, name: 'UF', attr: ['1'], type: 5}, - {bits: 1, name: 'OF', attr: ['1'], type: 5}, - {bits: 1, name: 'DZ', attr: ['1'], type: 5}, - {bits: 1, name: 'NV', attr: ['1'], type: 5}, - {bits: 3, name: 'Rounding Mode', attr:['3'], type: 6}, - {bits: 24, name: 'Reserved', attr:['24'], type: 7}, -], config: {fontsize: 10}} -.... - diff --git a/src/images/wavedrom/float-csr.edn b/src/images/wavedrom/float-csr.edn new file mode 100644 index 0000000..ed51932 --- /dev/null +++ b/src/images/wavedrom/float-csr.edn @@ -0,0 +1,16 @@ +//# "F" Standard Extension for Single-Precision Floating-Point, Version 2.2 +//## 12.2 Floating-Point Control and Status Register +//### Figure 12.2: Floating-point control and status register. + +[wavedrom, ,svg] +.... +{reg: [ + {bits: 1, name: 'NX', attr: ['1']}, + {bits: 1, name: 'UF', attr: ['1']}, + {bits: 1, name: 'OF', attr: ['1']}, + {bits: 1, name: 'DZ', attr: ['1']}, + {bits: 1, name: 'NV', attr: ['1']}, + {bits: 3, name: 'Rounding Mode', attr:['3']}, + {bits: 24, name: 'Reserved', attr:['24']}, +], config: {fontsize: 10}} +.... diff --git a/src/images/wavedrom/flt-pt-to-int-move.adoc b/src/images/wavedrom/flt-pt-to-int-move.adoc deleted file mode 100644 index fc2a95a..0000000 --- a/src/images/wavedrom/flt-pt-to-int-move.adoc +++ /dev/null @@ -1,14 +0,0 @@ -// 16.3 Instructions for moving bit patterns between floating-point and integer registers. - -[wavedrom, ,svg] -.... 
-{reg: [ - {bits: 7, name: 'opcode', attr: ['7','OP-FP','OP-FP'], type: 8}, - {bits: 5, name: 'rd', attr: ['5','dest','dest'], type: 2}, - {bits: 3, name: 'rm', attr: ['3','000','000'], type: 8}, - {bits: 5, name: 'rs1', attr: ['5','src','src'], type: 4}, - {bits: 5, name: 'rs2', attr: ['5','0','0'], type: 4}, - {bits: 2, name: 'fmt', attr: ['2','H','H'], type: 8}, - {bits: 5, name: 'funct5', attr: ['5','FMV.X.H','FMV.H.X'], type: 8}, -]} -....
\ No newline at end of file
diff --git a/src/images/wavedrom/flt-pt-to-int-move.edn b/src/images/wavedrom/flt-pt-to-int-move.edn
new file mode 100644
index 0000000..ed33285
--- /dev/null
+++ b/src/images/wavedrom/flt-pt-to-int-move.edn
@@ -0,0 +1,14 @@
+// 16.3 Instructions for moving bit patterns between floating-point and integer registers.
+
+[wavedrom, ,svg]
+....
+{reg: [
+ {bits: 7, name: 'opcode', attr: ['7','OP-FP','OP-FP']},
+ {bits: 5, name: 'rd', attr: ['5','dest','dest']},
+ {bits: 3, name: 'rm', attr: ['3','000','000']},
+ {bits: 5, name: 'rs1', attr: ['5','src','src']},
+ {bits: 5, name: 'rs2', attr: ['5','0','0']},
+ {bits: 2, name: 'fmt', attr: ['2','H','H']},
+ {bits: 5, name: 'funct5', attr: ['5','FMV.X.H','FMV.H.X']},
+]}
+....
diff --git a/src/images/wavedrom/flt-to-flt-sgn-inj-instr.adoc b/src/images/wavedrom/flt-to-flt-sgn-inj-instr.adoc
deleted file mode 100644
index 43250a4..0000000
--- a/src/images/wavedrom/flt-to-flt-sgn-inj-instr.adoc
+++ /dev/null
@@ -1,14 +0,0 @@
-// 16.3 Floating point to floating point sign injection instructions.
-
-[wavedrom, ,svg]
-....
-{reg: [
- {bits: 7, name: 'opcode', attr: ['7','OP-FP'], type: 8},
- {bits: 5, name: 'rd', attr: ['5','dest'], type: 2},
- {bits: 3, name: 'funct3', attr: ['3','J[N]/JX'], type: 8},
- {bits: 5, name: 'rs1', attr: ['5','src1'], type: 4},
- {bits: 5, name: 'rs2', attr: ['5','src2'], type: 4},
- {bits: 2, name: 'fmt', attr: ['2','H'], type: 8},
- {bits: 5, name: 'funct5', attr: ['5','FSGNJ'], type: 8},
-]}
-....
\ No newline at end of file
diff --git a/src/images/wavedrom/flt-to-flt-sgn-inj-instr.edn b/src/images/wavedrom/flt-to-flt-sgn-inj-instr.edn
new file mode 100644
index 0000000..60480cf
--- /dev/null
+++ b/src/images/wavedrom/flt-to-flt-sgn-inj-instr.edn
@@ -0,0 +1,14 @@
+// 16.3 Floating point to floating point sign injection instructions.
+
+[wavedrom, ,svg]
+....
+{reg: [
+ {bits: 7, name: 'opcode', attr: ['7','OP-FP']},
+ {bits: 5, name: 'rd', attr: ['5','dest']},
+ {bits: 3, name: 'funct3', attr: ['3','J[N]/JX']},
+ {bits: 5, name: 'rs1', attr: ['5','src1']},
+ {bits: 5, name: 'rs2', attr: ['5','src2']},
+ {bits: 2, name: 'fmt', attr: ['2','H']},
+ {bits: 5, name: 'funct5', attr: ['5','FSGNJ']},
+]}
+....
diff --git a/src/images/wavedrom/fnmaddsub.adoc b/src/images/wavedrom/fnmaddsub.adoc
deleted file mode 100644
index e8bda1b..0000000
--- a/src/images/wavedrom/fnmaddsub.adoc
+++ /dev/null
@@ -1,16 +0,0 @@
-
-//FNMSUP and FNMADD
-
-[wavedrom, ,svg]
-....
-{reg: [
- {bits: 7, name: 'opcode', attr: ['FMADD', 'FNMADD', 'FMSUB', 'FNMSUB'], type: 8},
- {bits: 5, name: 'rd', attr: 'dest', type: 2},
- {bits: 3, name: 'funct3', attr: 'RM', type: 8},
- {bits: 5, name: 'rs1', attr: 'src1', type: 4},
- {bits: 5, name: 'rs2', attr: 'src2', type: 4},
- {bits: 2, name: 'fmt', attr: 'S', type: 8},
- {bits: 5, name: 'rs3', attr: 'src3', type: 4},
-]}
-....
-
diff --git a/src/images/wavedrom/fsjgnjnx-d.adoc b/src/images/wavedrom/fsjgnjnx-d.adoc
deleted file mode 100644
index fff7808..0000000
--- a/src/images/wavedrom/fsjgnjnx-d.adoc
+++ /dev/null
@@ -1,15 +0,0 @@
-//FSGNJ.D, FSGNJN.D, and FSGNJX.D
-
-[wavedrom, ,svg]
-....
-{reg: [ - {bits: 7, name: 'opcode', attr: ['7','OP-FP'], type: 8}, - {bits: 5, name: 'rd', attr: ['5','dest'], type: 2}, - {bits: 3, name: 'rm', attr: ['3','J[N]/JX'], type: 8}, - {bits: 5, name: 'rs1', attr: ['5','src1'], type: 4}, - {bits: 5, name: 'rs2', attr: ['5','src2'], type: 4}, - {bits: 2, name: 'fmt', attr: ['2','D'], type: 8}, - {bits: 5, name: 'funct5', attr: ['5','FSGNJ'], type: 8}, -]} -.... - diff --git a/src/images/wavedrom/fsjgnjnx-d.edn b/src/images/wavedrom/fsjgnjnx-d.edn new file mode 100644 index 0000000..02ab6a7 --- /dev/null +++ b/src/images/wavedrom/fsjgnjnx-d.edn @@ -0,0 +1,14 @@ +//FSGNJ.D, FSGNJN.D, and FSGNJX.D + +[wavedrom, ,svg] +.... +{reg: [ + {bits: 7, name: 'opcode', attr: ['7','OP-FP']}, + {bits: 5, name: 'rd', attr: ['5','dest']}, + {bits: 3, name: 'rm', attr: ['3','J[N]/JX']}, + {bits: 5, name: 'rs1', attr: ['5','src1']}, + {bits: 5, name: 'rs2', attr: ['5','src2']}, + {bits: 2, name: 'fmt', attr: ['2','D']}, + {bits: 5, name: 'funct5', attr: ['5','FSGNJ']}, +]} +.... diff --git a/src/images/wavedrom/half-ls.adoc b/src/images/wavedrom/half-ls.adoc deleted file mode 100644 index fb26d9b..0000000 --- a/src/images/wavedrom/half-ls.adoc +++ /dev/null @@ -1,14 +0,0 @@ -//## 15.1 Half-Precision Load and Store Instructions - -[wavedrom, ,svg] -.... -{reg: [ - {bits: 7, name: 'opcode', attr: 'LOAD-FP', type: 8}, - {bits: 5, name: 'rd', attr: 'dest', type: 2}, - {bits: 3, name: 'width', attr: 'H', type: 8}, - {bits: 5, name: 'rs1', attr: 'base', type: 4}, - {bits: 12, name: 'imm[11:0]', attr: 'offset', type: 3}, -]} - -.... - diff --git a/src/images/wavedrom/half-ls.edn b/src/images/wavedrom/half-ls.edn new file mode 100644 index 0000000..be24c0d --- /dev/null +++ b/src/images/wavedrom/half-ls.edn @@ -0,0 +1,13 @@ +//## 15.1 Half-Precision Load and Store Instructions + +[wavedrom, ,svg] +.... 
+{reg: [ + {bits: 7, name: 'opcode', attr: 'LOAD-FP'}, + {bits: 5, name: 'rd', attr: 'dest'}, + {bits: 3, name: 'width', attr: 'H'}, + {bits: 5, name: 'rs1', attr: 'base'}, + {bits: 12, name: 'imm[11:0]', attr: 'offset'}, +]} + +.... diff --git a/src/images/wavedrom/half-pr-flt-pt-class.adoc b/src/images/wavedrom/half-pr-flt-pt-class.adoc deleted file mode 100644 index 5490f5e..0000000 --- a/src/images/wavedrom/half-pr-flt-pt-class.adoc +++ /dev/null @@ -1,14 +0,0 @@ -//## 15.5 Half-Precision Floating-Point Classify Instruction - -[wavedrom, ,svg] -.... -{reg: [ - {bits: 7, name: 'opcode', attr: ['7', 'OP-FP'], type: 8}, - {bits: 5, name: 'rd', attr: ['5','dest'], type: 2}, - {bits: 3, name: 'rm', attr: ['3','001'], type: 8}, - {bits: 5, name: 'rs1', attr: ['5', 'src'], type: 4}, - {bits: 5, name: 'rs2', attr: ['5','0'], type: 8}, - {bits: 2, name: 'fmt', attr: ['2','H'], type: 8}, - {bits: 5, name: 'funct5', attr: ['5','FCLASS'], type: 8}, -]} -....
\ No newline at end of file
diff --git a/src/images/wavedrom/half-pr-flt-pt-class.edn b/src/images/wavedrom/half-pr-flt-pt-class.edn
new file mode 100644
index 0000000..e12608c
--- /dev/null
+++ b/src/images/wavedrom/half-pr-flt-pt-class.edn
@@ -0,0 +1,14 @@
+//## 15.5 Half-Precision Floating-Point Classify Instruction
+
+[wavedrom, ,svg]
+....
+{reg: [
+ {bits: 7, name: 'opcode', attr: ['7', 'OP-FP']},
+ {bits: 5, name: 'rd', attr: ['5','dest']},
+ {bits: 3, name: 'rm', attr: ['3','001']},
+ {bits: 5, name: 'rs1', attr: ['5', 'src']},
+ {bits: 5, name: 'rs2', attr: ['5','0']},
+ {bits: 2, name: 'fmt', attr: ['2','H']},
+ {bits: 5, name: 'funct5', attr: ['5','FCLASS']},
+]}
+....
diff --git a/src/images/wavedrom/half-pr-flt-pt-compare.adoc b/src/images/wavedrom/half-pr-flt-pt-compare.adoc
deleted file mode 100644
index 78033c1..0000000
--- a/src/images/wavedrom/half-pr-flt-pt-compare.adoc
+++ /dev/null
@@ -1,14 +0,0 @@
-// 16.4 Half-Precision Floating-Point Compare Instructions.
-
-[wavedrom, ,svg]
-....
-{reg: [
- {bits: 7, name: 'opcode', attr: ['7','OP-FP'], type: 8},
- {bits: 5, name: 'rd', attr: ['5','dest'], type: 2},
- {bits: 3, name: 'rm', attr: ['3','EQ/LT/LE'], type: 8},
- {bits: 5, name: 'rs1', attr: ['5','src1'], type: 4},
- {bits: 5, name: 'rs2', attr: ['5','src2'], type: 4},
- {bits: 2, name: 'fmt', attr: ['2','H'], type: 8},
- {bits: 5, name: 'funct5', attr: ['5','FCMP'], type: 8},
-]}
-....
\ No newline at end of file
diff --git a/src/images/wavedrom/half-pr-flt-pt-compare.edn b/src/images/wavedrom/half-pr-flt-pt-compare.edn
new file mode 100644
index 0000000..b05b77f
--- /dev/null
+++ b/src/images/wavedrom/half-pr-flt-pt-compare.edn
@@ -0,0 +1,14 @@
+// 16.4 Half-Precision Floating-Point Compare Instructions.
+
+[wavedrom, ,svg]
+....
+{reg: [
+ {bits: 7, name: 'opcode', attr: ['7','OP-FP']},
+ {bits: 5, name: 'rd', attr: ['5','dest']},
+ {bits: 3, name: 'rm', attr: ['3','EQ/LT/LE']},
+ {bits: 5, name: 'rs1', attr: ['5','src1']},
+ {bits: 5, name: 'rs2', attr: ['5','src2']},
+ {bits: 2, name: 'fmt', attr: ['2','H']},
+ {bits: 5, name: 'funct5', attr: ['5','FCMP']},
+]}
+....
diff --git a/src/images/wavedrom/half-prec-conv-and-mv.adoc b/src/images/wavedrom/half-prec-conv-and-mv.adoc
deleted file mode 100644
index 013f1b9..0000000
--- a/src/images/wavedrom/half-prec-conv-and-mv.adoc
+++ /dev/null
@@ -1,15 +0,0 @@
-//## 16.3 Half-Precision Conversion and Move Instructions
-
-
-[wavedrom, ,svg]
-....
-{reg: [
- {bits: 7, name: 'opcode', attr: ['7','OP-FP','OP-FP'], type: 8},
- {bits: 5, name: 'rd', attr: ['5','dest','dest'], type: 2},
- {bits: 3, name: 'rm', attr: ['3','RM','RM'], type: 8},
- {bits: 5, name: 'rs1', attr: ['5','src','src'], type: 4},
- {bits: 5, name: 'rs2', attr: ['5','W[U]/L[U]','W[U]/L[U]'], type: 3},
- {bits: 2, name: 'fmt', attr: ['2','H', 'H'], type: 2},
- {bits: 5, name: 'funct5', attr: ['5','FCVT.int.H','FCVT.H.int'], type: 8},
-]}
-....
\ No newline at end of file
diff --git a/src/images/wavedrom/half-prec-conv-and-mv.edn b/src/images/wavedrom/half-prec-conv-and-mv.edn
new file mode 100644
index 0000000..443afd3
--- /dev/null
+++ b/src/images/wavedrom/half-prec-conv-and-mv.edn
@@ -0,0 +1,15 @@
+//## 16.3 Half-Precision Conversion and Move Instructions
+
+
+[wavedrom, ,svg]
+....
+{reg: [
+ {bits: 7, name: 'opcode', attr: ['7','OP-FP','OP-FP']},
+ {bits: 5, name: 'rd', attr: ['5','dest','dest']},
+ {bits: 3, name: 'rm', attr: ['3','RM','RM']},
+ {bits: 5, name: 'rs1', attr: ['5','src','src']},
+ {bits: 5, name: 'rs2', attr: ['5','W[U]/L[U]','W[U]/L[U]']},
+ {bits: 2, name: 'fmt', attr: ['2','H', 'H']},
+ {bits: 5, name: 'funct5', attr: ['5','FCVT.int.H','FCVT.H.int']},
+]}
+....
diff --git a/src/images/wavedrom/half-prec-flpt-to-flpt-conv.adoc b/src/images/wavedrom/half-prec-flpt-to-flpt-conv.edn
index c42038c..bfd1f8b 100644
--- a/src/images/wavedrom/half-prec-flpt-to-flpt-conv.adoc
+++ b/src/images/wavedrom/half-prec-flpt-to-flpt-conv.edn
@@ -3,12 +3,12 @@
 [wavedrom, ,svg]
 ....
{reg: [ - {bits: 7, name: 'opcode', attr: ['7','OP-FP','OP-FP','OP-FP','OP-FP','OP-FP','OP-FP'], type: 8}, - {bits: 5, name: 'rd', attr: ['5','dest','dest','dest','dest','dest','dest'], type: 2}, - {bits: 3, name: 'rm', attr: ['3','RM','RM','RM','RM','RM','RM'], type: 8}, - {bits: 5, name: 'rs1', attr: ['5','src','src','src','src','src','SRC'], type: 4}, - {bits: 5, name: 'rs2', attr: ['5','H','S','H','D','H','Q'], type: 3}, - {bits: 2, name: 'fmt', attr: ['2','S','H','D','H','Q','H'], type: 2}, - {bits: 5, name: 'funct5', attr: ['5','FCVT.S.H','FCVT.H.S','FCVT.D.H','FCVT.H.D','FCVT.Q.H','FCVT.H.Q'], type: 8}, + {bits: 7, name: 'opcode', attr: ['7','OP-FP','OP-FP','OP-FP','OP-FP','OP-FP','OP-FP']}, + {bits: 5, name: 'rd', attr: ['5','dest','dest','dest','dest','dest','dest']}, + {bits: 3, name: 'rm', attr: ['3','RM','RM','RM','RM','RM','RM']}, + {bits: 5, name: 'rs1', attr: ['5','src','src','src','src','src','SRC']}, + {bits: 5, name: 'rs2', attr: ['5','H','S','H','D','H','Q']}, + {bits: 2, name: 'fmt', attr: ['2','S','H','D','H','Q','H']}, + {bits: 5, name: 'funct5', attr: ['5','FCVT.S.H','FCVT.H.S','FCVT.D.H','FCVT.H.D','FCVT.Q.H','FCVT.H.Q']}, ]} -....
\ No newline at end of file
+....
diff --git a/src/images/wavedrom/half-store.adoc b/src/images/wavedrom/half-store.adoc
deleted file mode 100644
index fb0d18c..0000000
--- a/src/images/wavedrom/half-store.adoc
+++ /dev/null
@@ -1,11 +0,0 @@
-[wavedrom, ,svg]
-....
-{reg: [
- {bits: 7, name: 'opcode', attr: 'STORE-FP', type: 8},
- {bits: 5, name: 'imm[4:0]', attr: 'offset', type: 3},
- {bits: 3, name: 'width', attr: 'H', type: 8},
- {bits: 5, name: 'rs1', attr: 'base', type: 4},
- {bits: 5, name: 'rs2', attr: 'src', type: 4},
- {bits: 12, name: 'imm[11:5]', attr: 'offset', type: 3},
-]}
-....
\ No newline at end of file diff --git a/src/images/wavedrom/hint-nopv_rv32i.adoc b/src/images/wavedrom/hint-nopv_rv32i.adoc deleted file mode 100644 index b26a6d1..0000000 --- a/src/images/wavedrom/hint-nopv_rv32i.adoc +++ /dev/null @@ -1,55 +0,0 @@ -//### RV32I -//These instructions reserved as HINTs in the latest spec: https://github.com/riscv/riscv-isa-manual/releases (2.9) -//{ADDI, SLTI, SLTIU, XORI, ORI, ANDI} x0, ? ( ${ 6 * 1 << 17} ) -[wavedrom, ,svg] -.... -{reg: [ - {name: 'OP-IMM', bits: 7, attr: 0b0010011}, - {name: 'rd', bits: 5, attr: 0}, - {name: 'funct3', bits: 3, attr: ['ADDI', 'SLTI', 'SLTIU', 'XORI', 'ORI', 'ANDI']}, - {bits: 17} -], config: {hspace: width}} -.... -//{SLLI, SRLI, SRAI} x0, ? ( ${ 3 * 1 << 10} ) - -[wavedrom, ,svg] -.... -{reg:[ - {name: 'OP-IMM', bits: 7, attr: 0b0010011}, - {name: 'rd', bits: 5, attr: 0}, - {name: 'funct3', bits: 3, attr: ['SLLI', 'SRLI', 'SRAI']}, - {bits: 10}, - {name: 'imm?', bits: 7, attr: [0, 0, 32]} -], config: {hspace: width}} -.... -//{LUI, AUIPC} x0, ? ( ${ 2 * (1 << 20) } ) - -[wavedrom, ,svg] -.... -{reg:[ - {name: 'opcode', bits: 7, attr: ['AUIPC', 'LUI']}, - {name: 'rd', bits: 5, attr: 0}, - {bits: 20} -], config: {hspace: width}} -.... -//{ADD, SUB, SLL, SLT, SLTU, XOR, SRL, SRA, OR, AND} x0, ?, ? ( ${ 10 * 1 << 10} ) - -[wavedrom, ,svg] -.... -{reg:[ - {name: 'OP', bits: 7, attr: 0b0110011}, - {name: 'rd', bits: 5, attr: 0}, - {name: 'funct3', bits: 3, attr: 'ADD SUB SLL SLT SLTU XOR SRL SRA OR AND'.split(' ', - {bits: 10}, - {name: 'funct7', bits: 7, attr: [0, 0, 0, 0, 0, 0, 32, 32, 0, 0]} -], config: {hspace: width}} -.... 
- -//RV32I_extra = ( -// 3 * 31 + -// 31 + -// 7 * 31 + -// 3 * 31 + -// 2 * 31 -//) - diff --git a/src/images/wavedrom/hint-nopv_rv64i.adoc b/src/images/wavedrom/hint-nopv_rv64i.adoc deleted file mode 100644 index ee78cf8..0000000 --- a/src/images/wavedrom/hint-nopv_rv64i.adoc +++ /dev/null @@ -1,57 +0,0 @@ -//### RV64I -//These instructions reserved as HINTs in the latest spec: https://github.com/riscv/riscv-isa-manual/releases (4.4) -//All RV32I NOPs plus: -//ADDIW x0, ? ( ${ 1 << 17 } ) -[wavedrom, ,svg] -.... -{reg:[ - {name: 'OP-IMM-32', bits: 7, attr: 0b0011011}, - {name: 'rd', bits: 5, attr: 0}, - {name: 'funct3', bits: 3, attr: 'ADDIW'}, - {bits: 17} -], config: {hspace: width}} -.... -//Extra bit for the shift ammont: -//{SLLI, SRLI, SRAI} x0, ? ( ${ 3 * 1 << 10} ) - -[wavedrom, ,svg] -.... -{reg: [ - {name: 'OP-IMM', bits: 7, attr: 0b0010011}, - {name: 'rd', bits: 5, attr: 0}, - {name: 'funct3', bits: 3, attr: ['SLLI', 'SRLI', 'SRAI']}, - {bits: 10}, - {name: 'imm?', bits: 7, attr: [1, 33, 33]} -], config: {hspace: width}} -.... -//{SLLIW, SRLIW, SRAIW} x0, ?( ${ 3 * 1 << 10} ) - -[wavedrom, ,svg] -.... -{reg:[ - {name: 'OP-IMM-32', bits: 7, attr: 0b0011011}, - {name: 'rd', bits: 5, attr: 0}, - {name: 'funct3', bits: 3, attr: ['SLLIW', 'SRLIW', 'SRAIW']}, - {bits: 10}, - {name: 'imm?', bits: 7, attr: [0, 32, 32]} -], config: {hspace: width}} -.... -//SLL, SLT, SRA ( ??? ) -//{ADDW, SLLW, SRLW, SUBW, SRAW} x0, ?, ? ( ${ 5 * 1 << 10 } ) - -[wavedrom, ,svg] -.... -{reg:[ - {name: 'OP-32', bits: 7, attr: 0b0111011}, - {name: 'rd', bits: 5, attr: 0}, - {name: 'funct3', bits: 3, attr: ['ADDW', 'SLLW', 'SRLW', 'SUBW', 'SRAW']}, - {bits: 10}, - {name: 'funct7', bits: 7, attr: [0, 0, 32, 0, 32]} -], config: {hspace: width}} -.... 
- -//RV64I_extra = ( -// 4 * 31 + -// 5 * 31 + -// 31 -//` diff --git a/src/images/wavedrom/hinvalgvma.edn b/src/images/wavedrom/hinvalgvma.edn index ab1a0cd..6d1b134 100644 --- a/src/images/wavedrom/hinvalgvma.edn +++ b/src/images/wavedrom/hinvalgvma.edn @@ -1,11 +1,11 @@ [wavedrom, ,svg] .... {reg: [ - {bits: 7, name: 'opcode', attr: ['7', 'SYSTEM'], type: 8}, - {bits: 5, name: 'rd', attr: ['5', '0'], type: 2}, - {bits: 3, name: 'funct3', attr: ['3', 'PRIV'], type: 8}, - {bits: 5, name: 'rs1', attr: ['5', 'gaddr'], type: 4}, - {bits: 5, name: 'rs2', attr: ['5', 'vmid'], type: 4}, - {bits: 7, name: 'funct7', attr: ['7', 'HINVAL.GVMA'], type: 8}, + {bits: 7, name: 'opcode', attr: ['7', 'SYSTEM']}, + {bits: 5, name: 'rd', attr: ['5', '0']}, + {bits: 3, name: 'funct3', attr: ['3', 'PRIV']}, + {bits: 5, name: 'rs1', attr: ['5', 'gaddr']}, + {bits: 5, name: 'rs2', attr: ['5', 'vmid']}, + {bits: 7, name: 'funct7', attr: ['7', 'HINVAL.GVMA']}, ]} -....
\ No newline at end of file
+....
diff --git a/src/images/wavedrom/hinvalvvma.edn b/src/images/wavedrom/hinvalvvma.edn
index 0b93b9f..c339d86 100644
--- a/src/images/wavedrom/hinvalvvma.edn
+++ b/src/images/wavedrom/hinvalvvma.edn
@@ -1,11 +1,11 @@
 [wavedrom, ,svg]
 ....
 {reg: [
- {bits: 7, name: 'opcode', attr: ['7', 'SYSTEM'], type: 8},
- {bits: 5, name: 'rd', attr: ['5', '0'], type: 2},
- {bits: 3, name: 'funct3', attr: ['3', 'PRIV'], type: 8},
- {bits: 5, name: 'rs1', attr: ['5', 'vaddr'], type: 4},
- {bits: 5, name: 'rs2', attr: ['5', 'asid'], type: 4},
- {bits: 7, name: 'funct7', attr: ['7', 'HINVAL.VVMA'], type: 8},
+ {bits: 7, name: 'opcode', attr: ['7', 'SYSTEM']},
+ {bits: 5, name: 'rd', attr: ['5', '0']},
+ {bits: 3, name: 'funct3', attr: ['3', 'PRIV']},
+ {bits: 5, name: 'rs1', attr: ['5', 'vaddr']},
+ {bits: 5, name: 'rs2', attr: ['5', 'asid']},
+ {bits: 7, name: 'funct7', attr: ['7', 'HINVAL.VVMA']},
 ]}
-....
\ No newline at end of file
+....
diff --git a/src/images/wavedrom/hypv-mm-fence.edn b/src/images/wavedrom/hypv-mm-fence.edn
index 2840b1a..1109c1b 100644
--- a/src/images/wavedrom/hypv-mm-fence.edn
+++ b/src/images/wavedrom/hypv-mm-fence.edn
@@ -3,11 +3,11 @@
 [wavedrom, ,svg]
 ....
 {reg: [
- {bits: 7, name: 'opcode', type: 3, attr: ['7', 'SYSTEM', 'SYSTEM']},
- {bits: 5, name: 'rd', type: 5, attr: ['5','0', '0']},
- {bits: 3, name: 'funct3', type: 5, attr: ['3','PRIV', 'PRIV']},
- {bits: 5, name: 'rs1', type: 4, attr: ['5','vaddr', 'gaddr']},
- {bits: 5, name: 'rs2', type: 4, attr: ['5','asid', 'vmid']},
- {bits: 7, name: 'funct7', type: 5, attr: ['7','HFENCE.VVMA', 'HFENCE.GVMA']},
+ {bits: 7, name: 'opcode', attr: ['7', 'SYSTEM', 'SYSTEM']},
+ {bits: 5, name: 'rd', attr: ['5','0', '0']},
+ {bits: 3, name: 'funct3', attr: ['3','PRIV', 'PRIV']},
+ {bits: 5, name: 'rs1', attr: ['5','vaddr', 'gaddr']},
+ {bits: 5, name: 'rs2', attr: ['5','asid', 'vmid']},
+ {bits: 7, name: 'funct7', attr: ['7','HFENCE.VVMA', 'HFENCE.GVMA']},
 ], config: {bits: 32}}
-....
\ No newline at end of file
+....
diff --git a/src/images/wavedrom/hypv-virt-load-and-store.edn b/src/images/wavedrom/hypv-virt-load-and-store.edn
index d0e1d9e..0f8b802 100644
--- a/src/images/wavedrom/hypv-virt-load-and-store.edn
+++ b/src/images/wavedrom/hypv-virt-load-and-store.edn
@@ -3,11 +3,11 @@
 [wavedrom, ,svg]
 ....
 {reg: [
- {bits: 7, name: 'opcode', type: 3, attr: ['7','SYSTEM', 'SYSTEM', 'SYSTEM']},
- {bits: 5, name: 'rd', type: 5, attr: ['5','dest', 'dest', '0']},
- {bits: 3, name: 'funct3', type: 5, attr: ['3','PRIVM', 'PRIVM', 'PRIVM']},
- {bits: 5, name: 'rs1', type: 4, attr: ['5','addr', 'addr', 'addr']},
- {bits: 5, name: 'rs2', type: 4, attr: ['5','[U]', 'HLVX', 'src']},
- {bits: 7, name: 'funct7', type: 5, attr: ['7','HLV.width', 'HLVX.HU/WU', 'HSV.width']},
+ {bits: 7, name: 'opcode', attr: ['7','SYSTEM', 'SYSTEM', 'SYSTEM']},
+ {bits: 5, name: 'rd', attr: ['5','dest', 'dest', '0']},
+ {bits: 3, name: 'funct3', attr: ['3','PRIVM', 'PRIVM', 'PRIVM']},
+ {bits: 5, name: 'rs1', attr: ['5','addr', 'addr', 'addr']},
+ {bits: 5, name: 'rs2', attr: ['5','[U]', 'HLVX', 'src']},
+ {bits: 7, name: 'funct7', attr: ['7','HLV.width', 'HLVX.HU/WU', 'HSV.width']},
 ], config: {bits: 32}}
-....
\ No newline at end of file +.... diff --git a/src/images/wavedrom/i-immediate.edn b/src/images/wavedrom/i-immediate.edn new file mode 100644 index 0000000..adb5622 --- /dev/null +++ b/src/images/wavedrom/i-immediate.edn @@ -0,0 +1,13 @@ +//### Figure 2.4 +//Types of immediate produced by RISC-V instructions. The fields are labeled with the instruction bits used to construct their value. Sign extension always uses inst[31]. +//#### I-immediate + +[wavedrom, ,svg] +.... +{reg: [ + {bits: 1, name: '[20]'}, + {bits: 4, name: 'inst[24:21]'}, + {bits: 6, name: 'inst[30:25]'}, + {bits: 21, name: '— inst[31] —'}, +], config:{fontsize: 12, label:{right: 'I-immediate'}}} +.... diff --git a/src/images/wavedrom/immediate_variants.adoc b/src/images/wavedrom/immediate-variants.edn index c1f8335..5fc3a73 100644 --- a/src/images/wavedrom/immediate_variants.adoc +++ b/src/images/wavedrom/immediate-variants.edn @@ -21,7 +21,7 @@ {bits: 5, name: 'rd'}, {bits: 3, name: 'funct3'}, {bits: 5, name: 'rs1'}, - {bits: 12, name: 'imm[11:0]', type: 3}, + {bits: 12, name: 'imm[11:0]'}, ], config: {label: {right: 'I-Type'}}} .... @@ -29,11 +29,11 @@ .... {reg: [ {bits: 7, name: 'opcode'}, - {bits: 5, name: 'imm[4:0]', type: 3}, + {bits: 5, name: 'imm[4:0]'}, {bits: 3, name: 'funct3'}, {bits: 5, name: 'rs1'}, {bits: 5, name: 'rs2'}, - {bits: 7, name: 'imm[11:5]', type: 3} + {bits: 7, name: 'imm[11:5]'} ], config: {label: {right: 'S-Type'}}} .... @@ -41,13 +41,13 @@ .... {reg: [ {bits: 7, name: 'opcode'}, - {bits: 1, name: '[11]', type: 3}, - {bits: 4, name: 'imm[4:1]', type: 3}, + {bits: 1, name: '[11]'}, + {bits: 4, name: 'imm[4:1]'}, {bits: 3, name: 'funct3'}, {bits: 5, name: 'rs1'}, {bits: 5, name: 'rs2'}, - {bits: 6, name: 'imm[10:5]', type: 3}, - {bits: 1, name: '[12]', type: 3} + {bits: 6, name: 'imm[10:5]'}, + {bits: 1, name: '[12]'} ], config: {fontsize: 12, label: {right: 'B-Type'}}} .... 
@@ -56,7 +56,7 @@ {reg: [ {bits: 7, name: 'opcode'}, {bits: 5, name: 'rd'}, - {bits: 20, name: 'imm[31:12]', type: 3} + {bits: 20, name: 'imm[31:12]'} ], config: {label: {right: 'U-Type'}}} .... @@ -65,11 +65,9 @@ {reg: [ {bits: 7, name: 'opcode'}, {bits: 5, name: 'rd'}, - {bits: 8, name: 'imm[19:12]', type: 3}, - {bits: 1, name: '[11]', type: 3}, - {bits: 10, name: 'imm[10:1]', type: 3}, - {bits: 1, name: '[20]', type: 3} + {bits: 8, name: 'imm[19:12]'}, + {bits: 1, name: '[11]'}, + {bits: 10, name: 'imm[10:1]'}, + {bits: 1, name: '[20]'} ], config: {fontsize: 12, label: {right: 'J-Type'}}} .... - - diff --git a/src/images/wavedrom/immediate.adoc b/src/images/wavedrom/immediate.adoc deleted file mode 100644 index c6fb00d..0000000 --- a/src/images/wavedrom/immediate.adoc +++ /dev/null @@ -1,60 +0,0 @@ -//### Figure 2.4 -//Types of immediate produced by RISC-V instructions. The fields are labeled with the instruction bits used to construct their value. Sign extension always uses inst[31]. -//#### I-immediate - -[wavedrom, ,svg] -.... -{reg: [ - {bits: 1, name: '[20]'}, - {bits: 4, name: 'inst[24:21]'}, - {bits: 6, name: 'inst[30:25]'}, - {bits: 21, name: '— inst[31] —', type: 7}, -], config:{fontsize: 12, label:{right: 'I-immediate'}}} -.... -//#### S-immediate - -[wavedrom, ,svg] -.... -{reg: [ - {bits: 1, name: '[7]'}, - {bits: 4, name: 'inst[11:8]'}, - {bits: 6, name: 'inst[30:25]'}, - {bits: 21, name: '— inst[31] —', type: 7}, -], config:{fontsize: 12, label:{right: 'S-immediate'}}} -.... -//#### B-immediate - -[wavedrom, ,svg] -.... -{reg: [ - {bits: 1, name: '0', type: 5}, - {bits: 4, name: 'inst[11:8]'}, - {bits: 6, name: 'inst[30:25]'}, - {bits: 1, name: '[7]'}, - {bits: 20, name: '— inst[31] —', type: 7}, -], config:{fontsize: 12, label:{right: 'B-immediate'}}} -.... -//#### U-immediate - -[wavedrom, ,svg] -.... 
-{reg: [ - {bits: 12, name: '0', type: 5}, - {bits: 8, name: 'inst[19:12]'}, - {bits: 11, name: 'inst[30:20]'}, - {bits: 1, name: '[31]', type: 7}, -], config:{fontsize: 12, label:{right: 'U-immediate'}}} -.... -//#### J-immediate - -[wavedrom, ,svg] -.... -{reg: [ - {bits: 1, name: '0', type: 5}, - {bits: 4, name: 'inst[24:21]'}, - {bits: 6, name: 'inst[30:25]'}, - {bits: 1, name: '[20]'}, - {bits: 8, name: 'inst[19:12]'}, - {bits: 12, name: '— inst[31] —', type: 7}, -], config:{fontsize: 12, label:{right: 'J-immediate'}}} -.... diff --git a/src/images/wavedrom/immediate.edn b/src/images/wavedrom/immediate.edn new file mode 100644 index 0000000..adb5622 --- /dev/null +++ b/src/images/wavedrom/immediate.edn @@ -0,0 +1,13 @@ +//### Figure 2.4 +//Types of immediate produced by RISC-V instructions. The fields are labeled with the instruction bits used to construct their value. Sign extension always uses inst[31]. +//#### I-immediate + +[wavedrom, ,svg] +.... +{reg: [ + {bits: 1, name: '[20]'}, + {bits: 4, name: 'inst[24:21]'}, + {bits: 6, name: 'inst[30:25]'}, + {bits: 21, name: '— inst[31] —'}, +], config:{fontsize: 12, label:{right: 'I-immediate'}}} +.... diff --git a/src/images/wavedrom/immediate_variants2.adoc b/src/images/wavedrom/immediate_variants2.adoc deleted file mode 100644 index 498b282..0000000 --- a/src/images/wavedrom/immediate_variants2.adoc +++ /dev/null @@ -1,56 +0,0 @@ -## 2.3 Immediate Encoding Variants -### Figure 2.3 - -RISC-V base instruction formats showing immediate variants. 
- -${wd({reg: [ - {bits: 7, name: 'opcode'}, - {bits: 5, name: 'rd'}, - {bits: 3, name: 'func3'}, - {bits: 5, name: 'rs1'}, - {bits: 5, name: 'rs2'}, - {bits: 7, name: 'funct7'} -], config: {label: {right: 'R-Type'}}})} - -${wd({reg: [ - {bits: 7, name: 'opcode'}, - {bits: 5, name: 'rd'}, - {bits: 3, name: 'func3'}, - {bits: 5, name: 'rs1'}, - {bits: 12, name: 'imm[11:0]', type: 3}, -], config: {label: {right: 'I-Type'}}})} - -${wd({reg: [ - {bits: 7, name: 'opcode'}, - {bits: 5, name: 'imm[4:0]', type: 3}, - {bits: 3, name: 'func3'}, - {bits: 5, name: 'rs1'}, - {bits: 5, name: 'rs2'}, - {bits: 7, name: 'imm[11:5]', type: 3} -], config: {label: {right: 'S-Type'}}})} - -${wd({reg: [ - {bits: 7, name: 'opcode'}, - {bits: 1, name: '[11]', type: 3}, - {bits: 4, name: 'imm[4:1]', type: 3}, - {bits: 3, name: 'func3'}, - {bits: 5, name: 'rs1'}, - {bits: 5, name: 'rs2'}, - {bits: 6, name: 'imm[10:5]', type: 3}, - {bits: 1, name: '[12]', type: 3} -], config: {label: {right: 'B-Type'}}})} - -${wd({reg: [ - {bits: 7, name: 'opcode'}, - {bits: 5, name: 'rd'}, - {bits: 20, name: 'imm[31:12]', type: 3} -], config: {label: {right: 'U-Type'}}})} - -${wd({reg: [ - {bits: 7, name: 'opcode'}, - {bits: 5, name: 'rd'}, - {bits: 8, name: 'imm[19:12]', type: 3}, - {bits: 1, name: '[11]', type: 3}, - {bits: 10, name: 'imm[10:1]', type: 3}, - {bits: 1, name: '[20]', type: 3} -], config: {label: {right: 'J-Type'}}})}
\ No newline at end of file diff --git a/src/images/wavedrom/instruction-formats.edn b/src/images/wavedrom/instruction-formats.edn new file mode 100644 index 0000000..5cd5f8c --- /dev/null +++ b/src/images/wavedrom/instruction-formats.edn @@ -0,0 +1,47 @@ +//### Figure 2.2 + +//RISC-V base instruction formats. Each immediate subfield is labeled with the bit position (imm[x]) in the immediate value being produced, rather than the bit position within the instruction’s immediate field as is usually done. + +[wavedrom, ,svg] +.... +{reg: [ + {bits: 7, name: 'opcode'}, + {bits: 5, name: 'rd'}, + {bits: 3, name: 'funct3'}, + {bits: 5, name: 'rs1'}, + {bits: 5, name: 'rs2'}, + {bits: 7, name: 'funct7'} +], config: {label: {right: 'R-Type'}}} +.... + +[wavedrom, ,svg] +.... +{reg: [ + {bits: 7, name: 'opcode'}, + {bits: 5, name: 'rd'}, + {bits: 3, name: 'funct3'}, + {bits: 5, name: 'rs1'}, + {bits: 12, name: 'imm[11:0]'}, +], config: {label: {right: 'I-Type'}}} +.... + +[wavedrom, ,svg] +.... +{reg: [ + {bits: 7, name: 'opcode'}, + {bits: 5, name: 'imm[4:0]'}, + {bits: 3, name: 'funct3'}, + {bits: 5, name: 'rs1'}, + {bits: 5, name: 'rs2'}, + {bits: 7, name: 'imm[11:5]'} +], config: {label: {right: 'S-Type'}}} +.... + +[wavedrom, ,svg] +.... +{reg: [ + {bits: 7, name: 'opcode'}, + {bits: 5, name: 'rd'}, + {bits: 20, name: 'imm[31:12]'} +], config: {label: {right: 'U-Type'}}} +.... diff --git a/src/images/wavedrom/instruction_formats.adoc b/src/images/wavedrom/instruction_formats.adoc deleted file mode 100644 index 442e27d..0000000 --- a/src/images/wavedrom/instruction_formats.adoc +++ /dev/null @@ -1,48 +0,0 @@ -//### Figure 2.2 - -//RISC-V base instruction formats. Each immediate subfield is labeled with the bit position (imm[x]) in the immediate value being produced, rather than the bit position within the instruction’s immediate field as is usually done. - -[wavedrom, ,svg] -.... 
-{reg: [ - {bits: 7, name: 'opcode', type: 8}, - {bits: 5, name: 'rd', type: 2}, - {bits: 3, name: 'funct3', type: 8}, - {bits: 5, name: 'rs1', type: 4}, - {bits: 5, name: 'rs2', type: 4}, - {bits: 7, name: 'funct7', type: 8} -], config: {label: {right: 'R-Type'}}} -.... - -[wavedrom, ,svg] -.... -{reg: [ - {bits: 7, name: 'opcode', type: 8}, - {bits: 5, name: 'rd', type: 2}, - {bits: 3, name: 'funct3', type: 8}, - {bits: 5, name: 'rs1', type: 4}, - {bits: 12, name: 'imm[11:0]', type: 3}, -], config: {label: {right: 'I-Type'}}} -.... - -[wavedrom, ,svg] -.... -{reg: [ - {bits: 7, name: 'opcode', type: 8}, - {bits: 5, name: 'imm[4:0]', type: 3}, - {bits: 3, name: 'funct3', type: 8}, - {bits: 5, name: 'rs1', type: 4}, - {bits: 5, name: 'rs2', type: 4}, - {bits: 7, name: 'imm[11:5]', type: 3} -], config: {label: {right: 'S-Type'}}} -.... - -[wavedrom, ,svg] -.... -{reg: [ - {bits: 7, name: 'opcode', type: 8}, - {bits: 5, name: 'rd', type: 2}, - {bits: 20, name: 'imm[31:12]', type: 3} -], config: {label: {right: 'U-Type'}}} -.... - diff --git a/src/images/wavedrom/int-comp-lui-aiupc.adoc b/src/images/wavedrom/int-comp-lui-aiupc.edn index c3dbf95..dfb77d1 100644 --- a/src/images/wavedrom/int-comp-lui-aiupc.adoc +++ b/src/images/wavedrom/int-comp-lui-aiupc.edn @@ -5,8 +5,8 @@ [wavedrom, ,svg] .... {reg: [ - {bits: 7, name: 'opcode', attr: ['7', 'LUI', 'AUIPC'], type: 8}, - {bits: 5, name: 'rd', attr: ['5', 'dest', 'dest'], type: 2}, - {bits: 20, name: 'imm[31:12]', attr: ['20', 'U-immediate[31:12]', 'U-immediate[31:12]'], type: 3} + {bits: 7, name: 'opcode', attr: ['7', 'LUI', 'AUIPC']}, + {bits: 5, name: 'rd', attr: ['5', 'dest', 'dest']}, + {bits: 20, name: 'imm[31:12]', attr: ['20', 'U-immediate[31:12]', 'U-immediate[31:12]']} ]} .... 
diff --git a/src/images/wavedrom/int-comp-slli-srli-srai.adoc b/src/images/wavedrom/int-comp-slli-srli-srai.edn index 3fa49a4..3d23dfb 100644 --- a/src/images/wavedrom/int-comp-slli-srli-srai.adoc +++ b/src/images/wavedrom/int-comp-slli-srli-srai.edn @@ -5,13 +5,11 @@ [wavedrom, ,svg] .... {reg: [ - {bits: 7, name: 'opcode', attr: ['7', 'OP-IMM', 'OP-IMM', 'OP-IMM'], type: 8}, - {bits: 5, name: 'rd', attr: ['5', 'dest', 'dest', 'dest'], type: 2}, - {bits: 3, name: 'funct3', attr: ['3', 'SLLI', 'SRLI', 'SRAI'], type: 8}, - {bits: 5, name: 'rs1', attr: ['5', 'src', 'src', 'src'], type: 4}, - {bits: 5, name: 'imm[4:0]', attr: ['5', 'shamt[4:0]', 'shamt[4:0]', 'shamt[4:0]'], type: 3}, - {bits: 7, name: 'imm[11:5]', attr: ['7', 0, 0, 32], type: 8} + {bits: 7, name: 'opcode', attr: ['7', 'OP-IMM', 'OP-IMM', 'OP-IMM']}, + {bits: 5, name: 'rd', attr: ['5', 'dest', 'dest', 'dest']}, + {bits: 3, name: 'funct3', attr: ['3', 'SLLI', 'SRLI', 'SRAI']}, + {bits: 5, name: 'rs1', attr: ['5', 'src', 'src', 'src']}, + {bits: 5, name: 'imm[4:0]', attr: ['5', 'shamt[4:0]', 'shamt[4:0]', 'shamt[4:0]']}, + {bits: 7, name: 'imm[11:5]', attr: ['7', 0, 0, 32]} ]} .... - - diff --git a/src/images/wavedrom/int_reg-reg.adoc b/src/images/wavedrom/int-reg-reg.edn index 1ec0c17..3fd19f7 100644 --- a/src/images/wavedrom/int_reg-reg.adoc +++ b/src/images/wavedrom/int-reg-reg.edn @@ -3,11 +3,11 @@ [wavedrom, ,svg] .... 
{reg: [ - {bits: 7, name: 'opcode', attr: ['7', 'OP', 'OP', 'OP', 'OP'], type: 8}, - {bits: 5, name: 'rd', attr: ['5', 'dest', 'dest', 'dest','dest'], type: 2}, - {bits: 3, name: 'funct3', attr: ['3', 'ADD/SLT[U]', 'AND/OR/XOR', 'SLL/SRL', 'SUB/SRA'], type: 8}, - {bits: 5, name: 'rs1', attr: ['5', 'src1', 'src1', 'src1', 'src1'], type: 4}, - {bits: 5, name: 'rs2', attr: ['5', 'src2', 'src2', 'src2', 'src2'], type: 4}, - {bits: 7, name: 'funct7', attr: ['7', 0, 0, 0, 32], type: 8} + {bits: 7, name: 'opcode', attr: ['7', 'OP', 'OP', 'OP', 'OP']}, + {bits: 5, name: 'rd', attr: ['5', 'dest', 'dest', 'dest','dest']}, + {bits: 3, name: 'funct3', attr: ['3', 'ADD/SLT[U]', 'AND/OR/XOR', 'SLL/SRL', 'SUB/SRA']}, + {bits: 5, name: 'rs1', attr: ['5', 'src1', 'src1', 'src1', 'src1']}, + {bits: 5, name: 'rs2', attr: ['5', 'src2', 'src2', 'src2', 'src2']}, + {bits: 7, name: 'funct7', attr: ['7', 0, 0, 0, 32]} ]} .... diff --git a/src/images/wavedrom/integer-computational.edn b/src/images/wavedrom/integer-computational.edn new file mode 100644 index 0000000..707f06f --- /dev/null +++ b/src/images/wavedrom/integer-computational.edn @@ -0,0 +1,15 @@ +//## 2.4 Integer Computational Instructions +//### Integer Register-Immediate Instructions + +[wavedrom, ,svg] +.... +{reg: [ + {bits: 7, name: 'opcode', attr: ['7', 'OP-IMM', 'OP-IMM']}, + {bits: 5, name: 'rd', attr: ['5', 'dest', 'dest']}, + {bits: 3, name: 'funct3', attr: ['3', 'ADDI/SLTI[U]', 'ANDI/ORI/XORI']}, + {bits: 5, name: 'rs1', attr: ['5', 'src', 'src']}, + {bits: 12, name: 'imm[11:0]', attr: ['12', 'I-immediate[11:0]', 'I-immediate[11:0]']} +]} +.... + +//<snio> diff --git a/src/images/wavedrom/integer_computational.adoc b/src/images/wavedrom/integer_computational.adoc deleted file mode 100644 index 5172d4e..0000000 --- a/src/images/wavedrom/integer_computational.adoc +++ /dev/null @@ -1,15 +0,0 @@ -//## 2.4 Integer Computational Instructions -//### Integer Register-Immediate Instructions - -[wavedrom, ,svg] -.... 
-{reg: [ - {bits: 7, name: 'opcode', attr: ['7', 'OP-IMM', 'OP-IMM'], type: 8}, - {bits: 5, name: 'rd', attr: ['5', 'dest', 'dest'], type: 2}, - {bits: 3, name: 'funct3', attr: ['3', 'ADDI/SLTI[U]', 'ANDI/ORI/XORI'], type: 8}, - {bits: 5, name: 'rs1', attr: ['5', 'src', 'src'], type: 4}, - {bits: 12, name: 'imm[11:0]', attr: ['12', 'I-immediate[11:0]', 'I-immediate[11:0]'], type: 3} -]} -.... - -//<snio> diff --git a/src/images/wavedrom/j-immediate.edn b/src/images/wavedrom/j-immediate.edn new file mode 100644 index 0000000..6bebec3 --- /dev/null +++ b/src/images/wavedrom/j-immediate.edn @@ -0,0 +1,13 @@ +//#### J-immediate + +[wavedrom, ,svg] +.... +{reg: [ + {bits: 1, name: '0'}, + {bits: 4, name: 'inst[24:21]'}, + {bits: 6, name: 'inst[30:25]'}, + {bits: 1, name: '[20]'}, + {bits: 8, name: 'inst[19:12]'}, + {bits: 12, name: '— inst[31] —'}, +], config:{fontsize: 12, label:{right: 'J-immediate'}}} +.... diff --git a/src/images/wavedrom/load-reserve-st-conditional.adoc b/src/images/wavedrom/load-reserve-st-conditional.adoc deleted file mode 100644 index 355342c..0000000 --- a/src/images/wavedrom/load-reserve-st-conditional.adoc +++ /dev/null @@ -1,19 +0,0 @@ -//# 9 "A" Standard Extension for Atomic Instructions, Version 2.1 -//## 9.2 Load-Reserved/Store-Conditional Instructions - - -[wavedrom, ,svg] -.... -{reg: [ - {bits: 7, name: 'opcode', attr: ['7', 'AMO', 'AMO'], type: 8}, - {bits: 5, name: 'rd', attr: ['5', 'dest', 'dest'], type: 2}, - {bits: 3, name: 'funct3', attr: ['3', 'width', 'width'], type: 8}, - {bits: 5, name: 'rs1', attr: ['5', 'addr', 'addr'], type: 4}, - {bits: 5, name: 'rs2', attr: ['5', '0', 'src'], type: 4}, - {bits: 1, name: 'rl', attr: ['1', 'ring', 'ring'], type: 8}, - {bits: 1, name: 'aq', attr: ['1', 'orde', 'orde'], type: 8}, - {bits: 5, name: 'funct5', attr: ['5', 'LR.W/D', 'SC.W/D'], type: 8}, -]} -.... 
- - diff --git a/src/images/wavedrom/load-reserve-st-conditional.edn b/src/images/wavedrom/load-reserve-st-conditional.edn new file mode 100644 index 0000000..1bd3814 --- /dev/null +++ b/src/images/wavedrom/load-reserve-st-conditional.edn @@ -0,0 +1,17 @@ +//# 9 "A" Standard Extension for Atomic Instructions, Version 2.1 +//## 9.2 Load-Reserved/Store-Conditional Instructions + + +[wavedrom, ,svg] +.... +{reg: [ + {bits: 7, name: 'opcode', attr: ['7', 'AMO', 'AMO']}, + {bits: 5, name: 'rd', attr: ['5', 'dest', 'dest']}, + {bits: 3, name: 'funct3', attr: ['3', 'width', 'width']}, + {bits: 5, name: 'rs1', attr: ['5', 'addr', 'addr']}, + {bits: 5, name: 'rs2', attr: ['5', '0', 'src']}, + {bits: 1, name: 'rl', attr: ['1', 'ring', 'ring']}, + {bits: 1, name: 'aq', attr: ['1', 'orde', 'orde']}, + {bits: 5, name: 'funct5', attr: ['5', 'LR.W/D', 'SC.W/D']}, +]} +.... diff --git a/src/images/wavedrom/load-store.edn b/src/images/wavedrom/load-store.edn new file mode 100644 index 0000000..ac23d35 --- /dev/null +++ b/src/images/wavedrom/load-store.edn @@ -0,0 +1,24 @@ +//## 2.6 Load and Store Instructions + +[wavedrom, ,svg] +.... +{reg: [ + {bits: 7, name: 'opcode', attr: ['7', 'LOAD']}, + {bits: 5, name: 'rd', attr: ['5', 'dest']}, + {bits: 3, name: 'funct3', attr: ['3', 'width']}, + {bits: 5, name: 'rs1', attr: ['5', 'base']}, + {bits: 12, name: 'imm[11:0]', attr: ['12', 'offset[11:0]']}, +]} +.... + +[wavedrom, ,svg] +.... +{reg: [ + {bits: 7, name: 'opcode', attr: ['7', 'STORE']}, + {bits: 5, name: 'imm[4:0]', attr: ['5', 'offset[4:0]']}, + {bits: 3, name: 'funct3', attr: ['3', 'width']}, + {bits: 5, name: 'rs1', attr: ['5', 'base']}, + {bits: 5, name: 'rs2', attr: ['5', 'src']}, + {bits: 7, name: 'imm[11:5]', attr: ['7', 'offset[11:5]']}, +]} +.... 
diff --git a/src/images/wavedrom/load_store.adoc b/src/images/wavedrom/load_store.adoc deleted file mode 100644 index f9de4d1..0000000 --- a/src/images/wavedrom/load_store.adoc +++ /dev/null @@ -1,24 +0,0 @@ -//## 2.6 Load and Store Instructions - -[wavedrom, ,svg] -.... -{reg: [ - {bits: 7, name: 'opcode', attr: ['7', 'LOAD'], type: 8}, - {bits: 5, name: 'rd', attr: ['5', 'dest'], type: 2}, - {bits: 3, name: 'funct3', attr: ['3', 'width'], type: 8}, - {bits: 5, name: 'rs1', attr: ['5', 'base'], type: 4}, - {bits: 12, name: 'imm[11:0]', attr: ['12', 'offset[11:0]'], type: 3}, -]} -.... - -[wavedrom, ,svg] -.... -{reg: [ - {bits: 7, name: 'opcode', attr: ['7', 'STORE'], type: 8}, - {bits: 5, name: 'imm[4:0]', attr: ['5', 'offset[4:0]'], type: 3}, - {bits: 3, name: 'funct3', attr: ['3', 'width'], type: 8}, - {bits: 5, name: 'rs1', attr: ['5', 'base'], type: 4}, - {bits: 5, name: 'rs2', attr: ['5', 'src'], type: 4}, - {bits: 7, name: 'imm[11:5]', attr: ['7', 'offset[11:5]'], type: 3}, -]} -.... diff --git a/src/images/wavedrom/m-st-ext-for-int-mult.adoc b/src/images/wavedrom/m-st-ext-for-int-mult.adoc deleted file mode 100644 index 520951c..0000000 --- a/src/images/wavedrom/m-st-ext-for-int-mult.adoc +++ /dev/null @@ -1,28 +0,0 @@ -//# 8 "M" Standard Extension for Integer Multiplication and Division, Version 2.0 -//## 8.1 Multiplication Operations - -[wavedrom, ,svg] -.... -{reg: [ - {bits: 7, name: 'opcode', attr: ['7', 'OP', 'OP-32'], type: 8}, - {bits: 5, name: 'rd', attr: ['5', 'dest', 'dest'], type: 2}, - {bits: 3, name: 'funct3', attr: ['3', 'MUL/MULH[[S]U]', 'MULW'], type: 8}, - {bits: 5, name: 'rs1', attr: ['5', 'multiplicand', 'multiplicand'], type: 4}, - {bits: 5, name: 'rs2', attr: ['5', 'multiplier', 'multiplier'], type: 4}, - {bits: 7, name: 'funct7', attr: ['7', 'MULDIV', 'MULDIV'], type: 8}, -]} -.... - -//[wavedrom, ,] -//.... 
-//{reg: [ -// {bits: 7, name: 'opcode', attr: 'OP-32', type: 8}, -// {bits: 5, name: 'rd', attr: 'dest', type: 2}, -// {bits: 3, name: 'funct3', attr: 'MULW', type: 8}, -// {bits: 5, name: 'rs1', attr: 'multiplicand', type: 4}, -// {bits: 5, name: 'rs2', attr: 'multiplier', type: 4}, -// {bits: 7, name: 'funct7', attr: 'MULDIV', type: 8}, -//]} -//.... - - diff --git a/src/images/wavedrom/m-st-ext-for-int-mult.edn b/src/images/wavedrom/m-st-ext-for-int-mult.edn new file mode 100644 index 0000000..4adcda4 --- /dev/null +++ b/src/images/wavedrom/m-st-ext-for-int-mult.edn @@ -0,0 +1,26 @@ +//# 8 "M" Standard Extension for Integer Multiplication and Division, Version 2.0 +//## 8.1 Multiplication Operations + +[wavedrom, ,svg] +.... +{reg: [ + {bits: 7, name: 'opcode', attr: ['7', 'OP', 'OP-32']}, + {bits: 5, name: 'rd', attr: ['5', 'dest', 'dest']}, + {bits: 3, name: 'funct3', attr: ['3', 'MUL/MULH[[S]U]', 'MULW']}, + {bits: 5, name: 'rs1', attr: ['5', 'multiplicand', 'multiplicand']}, + {bits: 5, name: 'rs2', attr: ['5', 'multiplier', 'multiplier']}, + {bits: 7, name: 'funct7', attr: ['7', 'MULDIV', 'MULDIV']}, +]} +.... + +//[wavedrom, ,] +//.... +//{reg: [ +// {bits: 7, name: 'opcode', attr: 'OP-32'}, +// {bits: 5, name: 'rd', attr: 'dest'}, +// {bits: 3, name: 'funct3', attr: 'MULW'}, +// {bits: 5, name: 'rs1', attr: 'multiplicand'}, +// {bits: 5, name: 'rs2', attr: 'multiplier'}, +// {bits: 7, name: 'funct7', attr: 'MULDIV'}, +//]} +//.... diff --git a/src/images/wavedrom/mem_order.adoc b/src/images/wavedrom/mem-order.edn index 75b5ab0..c7e0ba4 100644 --- a/src/images/wavedrom/mem_order.adoc +++ b/src/images/wavedrom/mem-order.edn @@ -3,10 +3,10 @@ [wavedrom,mem-order ,] .... 
 {reg: [
- {bits: 7, name: 'opcode', attr: ['7', 'MISC-MEM'], type: 8},
- {bits: 5, name: 'rd', attr: ['5', '0'], type: 2},
- {bits: 3, name: 'funct3', attr: ['3', 'FENCE'], type: 8},
- {bits: 5, name: 'rs1', attr: ['5', '0'], type: 4},
+ {bits: 7, name: 'opcode', attr: ['7', 'MISC-MEM']},
+ {bits: 5, name: 'rd', attr: ['5', '0']},
+ {bits: 3, name: 'funct3', attr: ['3', 'FENCE']},
+ {bits: 5, name: 'rs1', attr: ['5', '0']},
 {bits: 1, name: 'SW', attr: 1},
 {bits: 1, name: 'SR', attr: 1},
 {bits: 1, name: 'SO', attr: 1},
@@ -15,6 +15,6 @@
 {bits: 1, name: 'PR', attr: 1},
 {bits: 1, name: 'PO', attr: 1},
 {bits: 1, name: 'PI', attr: 1},
- {bits: 4, name: 'fm', attr: ['4', 'FM'], type: 8},
+ {bits: 4, name: 'fm', attr: ['4', 'FM']},
 ]}
 ....
diff --git a/src/images/wavedrom/menvcfgreg.edn b/src/images/wavedrom/menvcfgreg.edn
new file mode 100644
index 0000000..5ed6fb6
--- /dev/null
+++ b/src/images/wavedrom/menvcfgreg.edn
@@ -0,0 +1,21 @@
+//.Machine environment configuration (`menvcfg`) register.
+[wavedrom, ,svg]
+....
+{reg: [
+ {bits: 1, name: 'FIOM'},
+ {bits: 1, name: 'WPRI'},
+ {bits: 1, name: 'LPE'},
+ {bits: 1, name: 'SSE'},
+ {bits: 2, name: 'CBIE'},
+ {bits: 1, name: 'CBCFE'},
+ {bits: 1, name: 'CBZE'},
+ {bits: 24, name: 'WPRI'},
+ {bits: 2, name: 'PMM'},
+ {bits: 25, name: 'WPRI'},
+ {bits: 1, name: 'DTE'},
+ {bits: 1, name: 'CDE'},
+ {bits: 1, name: 'ADUE'},
+ {bits: 1, name: 'PBMTE'},
+ {bits: 1, name: 'STCE'},
+], config:{lanes: 4, hspace:1024}}
+....
diff --git a/src/images/wavedrom/mm-env-call.adoc b/src/images/wavedrom/mm-env-call.adoc
deleted file mode 100644
index 9838230..0000000
--- a/src/images/wavedrom/mm-env-call.adoc
+++ /dev/null
@@ -1,13 +0,0 @@
-//
-
-[wavedrom, ,svg]
-
-....
-{reg: [
- {bits: 7, name: 'opcode', type: 8, attr: ['7','SYSTEM','SYSTEM'],},
- {bits: 5, name: 'rd', type: 2, attr: ['5','0','0'],},
- {bits: 3, name: 'funct3', type: 8, attr: ['3','PRIV','PRIV'],},
- {bits: 5, name: 'rs1', type: 4, attr: ['5','0','0'],},
- {bits: 12, name: 'funct12', type: 8, attr: ['12','ECALL','EBREAK',]},
-], config: {bits: 32}}
-....
\ No newline at end of file diff --git a/src/images/wavedrom/mm-env-call.edn b/src/images/wavedrom/mm-env-call.edn new file mode 100644 index 0000000..0b8a378 --- /dev/null +++ b/src/images/wavedrom/mm-env-call.edn @@ -0,0 +1,13 @@ +// + +[wavedrom, ,svg] + +.... +{reg: [ + {bits: 7, name: 'opcode', attr: ['7','SYSTEM','SYSTEM'],}, + {bits: 5, name: 'rd', attr: ['5','0','0'],}, + {bits: 3, name: 'funct3', attr: ['3','PRIV','PRIV'],}, + {bits: 5, name: 'rs1', attr: ['5','0','0'],}, + {bits: 12, name: 'funct12', attr: ['12','ECALL','EBREAK',]}, +], config: {bits: 32}} +.... diff --git a/src/images/wavedrom/mop-r.adoc b/src/images/wavedrom/mop-r.edn index 713b37c..55347e0 100644 --- a/src/images/wavedrom/mop-r.adoc +++ b/src/images/wavedrom/mop-r.edn @@ -1,10 +1,10 @@ [wavedrom, ,svg] .... {reg:[ - { bits: 7, name: 0x73, attr: ['SYSTEM'], type: 8 }, - { bits: 5, name: 'rd', type: 2 }, + { bits: 7, name: 0x73, attr: ['SYSTEM']}, + { bits: 5, name: 'rd'}, { bits: 3, name: 0x4 }, - { bits: 5, name: 'rs1', type: 4 }, + { bits: 5, name: 'rs1'}, { bits: 2, name: 'n[1:0]' }, { bits: 4, name: 0x7 }, { bits: 2, name: 'n[3:2]' }, diff --git a/src/images/wavedrom/mop-rr.adoc b/src/images/wavedrom/mop-rr.edn index b70f854..879e372 100644 --- a/src/images/wavedrom/mop-rr.adoc +++ b/src/images/wavedrom/mop-rr.edn @@ -1,11 +1,11 @@ [wavedrom, ,svg] .... 
{reg:[ - { bits: 7, name: 0x73, attr: ['SYSTEM'], type: 8 }, - { bits: 5, name: 'rd', type: 2 }, + { bits: 7, name: 0x73, attr: ['SYSTEM']}, + { bits: 5, name: 'rd'}, { bits: 3, name: 0x4 }, - { bits: 5, name: 'rs1', type: 4 }, - { bits: 5, name: 'rs2', type: 4 }, + { bits: 5, name: 'rs1'}, + { bits: 5, name: 'rs2'}, { bits: 1, name: 0x1 }, { bits: 2, name: 'n[1:0]' }, { bits: 2, name: 0x0 }, diff --git a/src/images/wavedrom/mseccfg.edn b/src/images/wavedrom/mseccfg.edn new file mode 100644 index 0000000..7343fb3 --- /dev/null +++ b/src/images/wavedrom/mseccfg.edn @@ -0,0 +1,16 @@ +//.Machine security configuration (`mseccfg`) register. +[wavedrom, ,svg] +.... +{reg: [ + {bits: 1, name: 'MML'}, + {bits: 1, name: 'MMWP'}, + {bits: 1, name: 'RLB'}, + {bits: 5, name: 'WPRI'}, + {bits: 1, name: 'USEED'}, + {bits: 1, name: 'SSEED'}, + {bits: 1, name: 'MLPE'}, + {bits: 21, name: 'WPRI'}, + {bits: 2, name: 'PMM'}, + {bits: 30, name: 'WPRI'}, +], config:{lanes: 4, hspace:1024}} +.... diff --git a/src/images/wavedrom/mstatushreg.edn b/src/images/wavedrom/mstatushreg.edn new file mode 100644 index 0000000..702ea11 --- /dev/null +++ b/src/images/wavedrom/mstatushreg.edn @@ -0,0 +1,15 @@ +//.Additional machine-mode status (`mstatush`) register for RV32. +[wavedrom, ,svg] +.... +{reg: [ + {bits: 4, name: 'WPRI'}, + {bits: 1, name: 'SBE'}, + {bits: 1, name: 'MBE'}, + {bits: 1, name: 'GVA'}, + {bits: 1, name: 'MPV'}, + {bits: 1, name: 'WPRI'}, + {bits: 1, name: 'MPELP'}, + {bits: 1, name: 'MDT'}, + {bits: 21, name: 'WPRI'}, +], config:{lanes: 2, hspace:1024}} +.... diff --git a/src/images/wavedrom/mstatusreg-rv321.edn b/src/images/wavedrom/mstatusreg-rv321.edn new file mode 100644 index 0000000..cc77fc2 --- /dev/null +++ b/src/images/wavedrom/mstatusreg-rv321.edn @@ -0,0 +1,29 @@ +//.Machine-mode status (`mstatus`) register for RV32 +[wavedrom, ,svg] +.... 
+{reg: [ + {bits: 1, name: 'WPRI'}, + {bits: 1, name: 'SIE'}, + {bits: 1, name: 'WPRI'}, + {bits: 1, name: 'MIE'}, + {bits: 1, name: 'WPRI'}, + {bits: 1, name: 'SPIE'}, + {bits: 1, name: 'UBE'}, + {bits: 1, name: 'MPIE'}, + {bits: 1, name: 'SPP'}, + {bits: 2, name: 'VS[1:0]'}, + {bits: 2, name: 'MPP[1:0]'}, + {bits: 2, name: 'FS[1:0]'}, + {bits: 2, name: 'XS[1:0]'}, + {bits: 1, name: 'MPRV'}, + {bits: 1, name: 'SUM'}, + {bits: 1, name: 'MXR'}, + {bits: 1, name: 'TVM'}, + {bits: 1, name: 'TW'}, + {bits: 1, name: 'TSR'}, + {bits: 1, name: 'SPELP'}, + {bits: 1, name: 'SDT'}, + {bits: 6, name: 'WPRI'}, + {bits: 1, name: 'SD'}, +], config:{lanes: 2, hspace:1024}} +.... diff --git a/src/images/wavedrom/mstatusreg.edn b/src/images/wavedrom/mstatusreg.edn new file mode 100644 index 0000000..db24626 --- /dev/null +++ b/src/images/wavedrom/mstatusreg.edn @@ -0,0 +1,39 @@ +//.Machine-mode status (`mstatus`) register for RV64 +[wavedrom, ,svg] +.... +{reg: [ + {bits: 1, name: 'WPRI'}, + {bits: 1, name: 'SIE'}, + {bits: 1, name: 'WPRI'}, + {bits: 1, name: 'MIE'}, + {bits: 1, name: 'WPRI'}, + {bits: 1, name: 'SPIE'}, + {bits: 1, name: 'UBE'}, + {bits: 1, name: 'MPIE'}, + {bits: 1, name: 'SPP'}, + {bits: 2, name: 'VS[1:0]'}, + {bits: 2, name: 'MPP[1:0]'}, + {bits: 2, name: 'FS[1:0]'}, + {bits: 2, name: 'XS[1:0]'}, + {bits: 1, name: 'MPRV'}, + {bits: 1, name: 'SUM'}, + {bits: 1, name: 'MXR'}, + {bits: 1, name: 'TVM'}, + {bits: 1, name: 'TW'}, + {bits: 1, name: 'TSR'}, + {bits: 1, name: 'SPELP'}, + {bits: 1, name: 'SDT'}, + {bits: 7, name: 'WPRI'}, + {bits: 2, name: 'UXL[1:0]'}, + {bits: 2, name: 'SXL[1:0]'}, + {bits: 1, name: 'SBE'}, + {bits: 1, name: 'MBE'}, + {bits: 1, name: 'GVA'}, + {bits: 1, name: 'MPV'}, + {bits: 1, name: 'WPRI'}, + {bits: 1, name: 'MPELP'}, + {bits: 1, name: 'MDT'}, + {bits: 20, name: 'WPRI'}, + {bits: 1, name: 'SD'}, +], config:{lanes: 4, hspace:1024}} +.... 
diff --git a/src/images/wavedrom/nop-v.adoc b/src/images/wavedrom/nop-v.adoc deleted file mode 100644 index 0c990e4..0000000 --- a/src/images/wavedrom/nop-v.adoc +++ /dev/null @@ -1,29 +0,0 @@ -//# NOP-V - -The RISC-V [User-Level ISA Specification](https://riscv.org/specifications/) defines NOP instruction as follows: - -* The NOP instruction does not change any user-visible state, except for advancing the pc. -* NOP is encoded as \`ADDI x0, x0, 0\`. - -[wavedrom, , ] ----- -{reg:[ - {name: 'opcode', bits: 7, attr: 0b0010011}, - {name: 'rd', bits: 5, attr: 0}, - {name: 'funct3', bits: 3, attr: 0}, - {name: 'rs1', bits: 5, attr: 0}, - {name: 'imm', bits: 12, attr: 0} -], config: {hspace: width}} ----- - - -NOTE: NOPs can be used to align code segments to microarchitecturally significant address boundaries, or to leave space for inline code modifications. Although **there are many possible ways** to encode a NOP, we define a canonical NOP encoding to allow microarchitectural optimizations as well as for more readable disassembly output. - -How many other possible ways to encode NOP? ----- -rd = 0 ----- - -Any Integer Computational instruction writing into \`x0\` is NOP. - -` diff --git a/src/images/wavedrom/nop.adoc b/src/images/wavedrom/nop.adoc deleted file mode 100644 index 34ad70e..0000000 --- a/src/images/wavedrom/nop.adoc +++ /dev/null @@ -1,11 +0,0 @@ -//### NOP Instruction -[wavedrom, ,svg] -.... -{reg: [ - {bits: 7, name: 'opcode', attr: ['7', 'OP-IMM'], type: 8}, - {bits: 5, name: 'rd', attr: ['5', '0'], type: 2}, - {bits: 3, name: 'funct3', attr: ['3', 'ADDI'], type: 8}, - {bits: 5, name: 'rs1', attr: ['5', '0'], type: 4}, - {bits: 12, name: 'imm[11:0]', attr: ['12', '0'], type: 3} -]} -.... diff --git a/src/images/wavedrom/nop.edn b/src/images/wavedrom/nop.edn new file mode 100644 index 0000000..b566909 --- /dev/null +++ b/src/images/wavedrom/nop.edn @@ -0,0 +1,11 @@ +//### NOP Instruction +[wavedrom, ,svg] +.... 
+{reg: [ + {bits: 7, name: 'opcode', attr: ['7', 'OP-IMM']}, + {bits: 5, name: 'rd', attr: ['5', '0']}, + {bits: 3, name: 'funct3', attr: ['3', 'ADDI']}, + {bits: 5, name: 'rs1', attr: ['5', '0']}, + {bits: 12, name: 'imm[11:0]', attr: ['12', '0']} +]} +.... diff --git a/src/images/wavedrom/quad-cnvrt-intch-xqqx.adoc b/src/images/wavedrom/quad-cnvrt-intch-xqqx.adoc deleted file mode 100644 index ba4e224..0000000 --- a/src/images/wavedrom/quad-cnvrt-intch-xqqx.adoc +++ /dev/null @@ -1,15 +0,0 @@ -//quad-cnvrt-intch-xqqx - -[wavedrom, ,svg] -.... -{reg: [ - {bits: 7, name: 'opcode', attr: ['7','OP-FP'], type: 8}, - {bits: 5, name: 'rd', attr: ['5','dest'], type: 2}, - {bits: 3, name: 'rm', attr: ['3', 'J[N]/JX'], type: 8}, - {bits: 5, name: 'rs1', attr: ['5','src1'], type: 4}, - {bits: 5, name: 'rs2', attr: ['5','src2'], type: 4}, - {bits: 2, name: 'fmt', attr: ['2','Q'], type: 8}, - {bits: 5, name: 'funct5', attr: ['5','FSGNJ'], type: 8}, -]} -.... - diff --git a/src/images/wavedrom/quad-cnvrt-intch-xqqx.edn b/src/images/wavedrom/quad-cnvrt-intch-xqqx.edn new file mode 100644 index 0000000..097c839 --- /dev/null +++ b/src/images/wavedrom/quad-cnvrt-intch-xqqx.edn @@ -0,0 +1,14 @@ +//quad-cnvrt-intch-xqqx + +[wavedrom, ,svg] +.... +{reg: [ + {bits: 7, name: 'opcode', attr: ['7','OP-FP']}, + {bits: 5, name: 'rd', attr: ['5','dest']}, + {bits: 3, name: 'rm', attr: ['3', 'J[N]/JX']}, + {bits: 5, name: 'rs1', attr: ['5','src1']}, + {bits: 5, name: 'rs2', attr: ['5','src2']}, + {bits: 2, name: 'fmt', attr: ['2','Q']}, + {bits: 5, name: 'funct5', attr: ['5','FSGNJ']}, +]} +.... diff --git a/src/images/wavedrom/quad-cnvrt-mv.adoc b/src/images/wavedrom/quad-cnvrt-mv.adoc deleted file mode 100644 index 3fc9f86..0000000 --- a/src/images/wavedrom/quad-cnvrt-mv.adoc +++ /dev/null @@ -1,28 +0,0 @@ -//## 14.3 Quad-Precision Convert and Move Instructions - -[wavedrom, ,svg] -.... 
-{reg: [ - {bits: 7, name: 'opcode', attr: ['7','OP-FP','OP-FP'], type: 8}, - {bits: 5, name: 'rd', attr: ['5','dest','dest'], type: 2}, - {bits: 3, name: 'rm', attr: ['3','RM','RM'], type: 8}, - {bits: 5, name: 'rs1', attr: ['5','src','src'], type: 4}, - {bits: 5, name: 'rs2', attr: ['5','W[U]/L[U]', 'W[U]/L[U]'], type: 4}, - {bits: 2, name: 'fmt', attr: ['2','Q','Q'], type: 8}, - {bits: 5, name: 'funct5', attr: ['5','FCVT.int.Q','FCVT.Q.int'], type: 8}, -]} -.... - -//[wavedrom, ,] -//.... -//{reg: [ -// {bits: 7, name: 'opcode', attr: 'OP-FP', type: 8}, -// {bits: 5, name: 'rd', attr: 'dest', type: 2}, -// {bits: 3, name: 'rm', attr: 'RM', type: 8}, -// {bits: 5, name: 'rs1', attr: 'src', type: 4}, -// {bits: 5, name: 'rs2', attr: ['W', 'WU', 'L', 'LU'], type: 4}, -// {bits: 2, name: 'fmt', attr: 'Q', type: 8}, -// {bits: 5, name: 'funct5', attr: 'FCVT.Q.int', type: 8}, -//]} -//.... - diff --git a/src/images/wavedrom/quad-cnvrt-mv.edn b/src/images/wavedrom/quad-cnvrt-mv.edn new file mode 100644 index 0000000..769257e --- /dev/null +++ b/src/images/wavedrom/quad-cnvrt-mv.edn @@ -0,0 +1,27 @@ +//## 14.3 Quad-Precision Convert and Move Instructions + +[wavedrom, ,svg] +.... +{reg: [ + {bits: 7, name: 'opcode', attr: ['7','OP-FP','OP-FP']}, + {bits: 5, name: 'rd', attr: ['5','dest','dest']}, + {bits: 3, name: 'rm', attr: ['3','RM','RM']}, + {bits: 5, name: 'rs1', attr: ['5','src','src']}, + {bits: 5, name: 'rs2', attr: ['5','W[U]/L[U]', 'W[U]/L[U]']}, + {bits: 2, name: 'fmt', attr: ['2','Q','Q']}, + {bits: 5, name: 'funct5', attr: ['5','FCVT.int.Q','FCVT.Q.int']}, +]} +.... + +//[wavedrom, ,] +//.... +//{reg: [ +// {bits: 7, name: 'opcode', attr: 'OP-FP'}, +// {bits: 5, name: 'rd', attr: 'dest'}, +// {bits: 3, name: 'rm', attr: 'RM'}, +// {bits: 5, name: 'rs1', attr: 'src'}, +// {bits: 5, name: 'rs2', attr: ['W', 'WU', 'L', 'LU']}, +// {bits: 2, name: 'fmt', attr: 'Q'}, +// {bits: 5, name: 'funct5', attr: 'FCVT.Q.int'}, +//]} +//.... 
diff --git a/src/images/wavedrom/quad-cnvt-interchange.adoc b/src/images/wavedrom/quad-cnvt-interchange.adoc deleted file mode 100644 index 1178397..0000000 --- a/src/images/wavedrom/quad-cnvt-interchange.adoc +++ /dev/null @@ -1,16 +0,0 @@ -//14 conv-mv - -[wavedrom, ,svg] -.... -{reg: [ - {bits: 7, name: 'opcode', attr: ['7', 'OP-FP', 'OP-FP','OP-FP','OP-FP'], type: 8}, - {bits: 5, name: 'rd', attr: ['5','dest','dest','dest','dest'], type: 2}, - {bits: 3, name: 'rm', attr: ['3','RM','RM','RM','RM'], type: 8}, - {bits: 5, name: 'rs1', attr: ['5','src','src','src','src'], type: 4}, - {bits: 5, name: 'rs2', attr: ['5','Q', 'S', 'Q', 'D'], type: 4}, - {bits: 2, name: 'fmt', attr: ['2','S','Q', 'D', 'Q'], type: 8}, - {bits: 5, name: 'funct5', attr: ['5','FCVT.S.Q', 'FCVT.Q.S', 'FCVT.D.Q', 'FCVT.Q.D'], type: 8}, -]} -.... - - diff --git a/src/images/wavedrom/quad-cnvt-interchange.edn b/src/images/wavedrom/quad-cnvt-interchange.edn new file mode 100644 index 0000000..a1871fa --- /dev/null +++ b/src/images/wavedrom/quad-cnvt-interchange.edn @@ -0,0 +1,14 @@ +//14 conv-mv + +[wavedrom, ,svg] +.... +{reg: [ + {bits: 7, name: 'opcode', attr: ['7', 'OP-FP', 'OP-FP','OP-FP','OP-FP']}, + {bits: 5, name: 'rd', attr: ['5','dest','dest','dest','dest']}, + {bits: 3, name: 'rm', attr: ['3','RM','RM','RM','RM']}, + {bits: 5, name: 'rs1', attr: ['5','src','src','src','src']}, + {bits: 5, name: 'rs2', attr: ['5','Q', 'S', 'Q', 'D']}, + {bits: 2, name: 'fmt', attr: ['2','S','Q', 'D', 'Q']}, + {bits: 5, name: 'funct5', attr: ['5','FCVT.S.Q', 'FCVT.Q.S', 'FCVT.D.Q', 'FCVT.Q.D']}, +]} +.... diff --git a/src/images/wavedrom/quad-compute.adoc b/src/images/wavedrom/quad-compute.adoc deleted file mode 100644 index 6aa3953..0000000 --- a/src/images/wavedrom/quad-compute.adoc +++ /dev/null @@ -1,54 +0,0 @@ -//## 14.2 Quad-Precision Computational Instructions - -[wavedrom, ,svg] -.... 
-{reg: [ - {bits: 7, name: 'opcode', attr: ['7','OP-FP','OP-FP','OP-FP','OP-FP'], type: 8}, - {bits: 5, name: 'rd', attr: ['5','dest','dest','dest','dest'], type: 2}, - {bits: 3, name: 'rm', attr: ['3','RM','RM','MIN/MAX','RM'], type: 8}, - {bits: 5, name: 'rs1', attr: ['5','src1','src1','src1','src'], type: 4}, - {bits: 5, name: 'rs2', attr: ['5','src2','src2','src2','0'], type: 4}, - {bits: 2, name: 'fmt', attr: ['2','Q','Q','Q','Q'], type: 8}, - {bits: 5, name: 'funct5', attr: ['5','FADD/FSUB', 'FMUL/FDIV', 'FMIN-MAX', 'FSQRT'], type: 8}, -]} -.... - -[wavedrom, ,svg] -.... -{reg: [ - {bits: 7, name: 'opcode', attr: ['7','F[N]MADD/F[N]MSUB'], type: 8}, - {bits: 5, name: 'rd', attr: ['5','dest'], type: 2}, - {bits: 3, name: 'rm', attr: ['3','RM'], type: 8}, - {bits: 5, name: 'rs1', attr: ['5','src1'], type: 4}, - {bits: 5, name: 'rs2', attr: ['5','src2'], type: 8}, - {bits: 2, name: 'fmt', attr: ['2','Q'], type: 8}, - {bits: 5, name: 'rs3', attr: ['5','src3'], type: 8}, -]} -.... - -//[wavedrom, ,] -//.... -//{reg: [ -// {bits: 7, name: 'opcode', attr: 'OP-FP', type: 8}, -// {bits: 5, name: 'rd', attr: 'dest', type: 2}, -// {bits: 3, name: 'funct3', attr: ['MIN', 'MAX'], type: 8}, -// {bits: 5, name: 'rs1', attr: 'src1', type: 4}, -// {bits: 5, name: 'rs2', attr: 'src2', type: 4}, -// {bits: 2, name: 'fmt', attr: 'Q', type: 8}, -// {bits: 5, name: 'funct5', attr: 'FMIN-MAX', type: 8}, -//]} -//.... - - -//[wavedrom, ,] -//.... -//{reg: [ -// {bits: 7, name: 'opcode', attr: ['FMADD', 'FNMADD', 'FMSUB', 'FNMSUB'], type: 8}, -// {bits: 5, name: 'rd', attr: 'dest', type: 2}, -// {bits: 3, name: 'funct3', attr: 'RM', type: 8}, -// {bits: 5, name: 'rs1', attr: 'src1', type: 4}, -// {bits: 5, name: 'rs2', attr: 'src2', type: 4}, -// {bits: 2, name: 'fmt', attr: 'Q', type: 8}, -// {bits: 5, name: 'rs3', attr: 'src3', type: 4}, -//]} -//....
\ No newline at end of file diff --git a/src/images/wavedrom/quad-compute.edn b/src/images/wavedrom/quad-compute.edn new file mode 100644 index 0000000..eb5095e --- /dev/null +++ b/src/images/wavedrom/quad-compute.edn @@ -0,0 +1,54 @@ +//## 14.2 Quad-Precision Computational Instructions + +[wavedrom, ,svg] +.... +{reg: [ + {bits: 7, name: 'opcode', attr: ['7','OP-FP','OP-FP','OP-FP','OP-FP']}, + {bits: 5, name: 'rd', attr: ['5','dest','dest','dest','dest']}, + {bits: 3, name: 'rm', attr: ['3','RM','RM','MIN/MAX','RM']}, + {bits: 5, name: 'rs1', attr: ['5','src1','src1','src1','src']}, + {bits: 5, name: 'rs2', attr: ['5','src2','src2','src2','0']}, + {bits: 2, name: 'fmt', attr: ['2','Q','Q','Q','Q']}, + {bits: 5, name: 'funct5', attr: ['5','FADD/FSUB', 'FMUL/FDIV', 'FMIN-MAX', 'FSQRT']}, +]} +.... + +[wavedrom, ,svg] +.... +{reg: [ + {bits: 7, name: 'opcode', attr: ['7','F[N]MADD/F[N]MSUB']}, + {bits: 5, name: 'rd', attr: ['5','dest']}, + {bits: 3, name: 'rm', attr: ['3','RM']}, + {bits: 5, name: 'rs1', attr: ['5','src1']}, + {bits: 5, name: 'rs2', attr: ['5','src2']}, + {bits: 2, name: 'fmt', attr: ['2','Q']}, + {bits: 5, name: 'rs3', attr: ['5','src3']}, +]} +.... + +//[wavedrom, ,] +//.... +//{reg: [ +// {bits: 7, name: 'opcode', attr: 'OP-FP'}, +// {bits: 5, name: 'rd', attr: 'dest'}, +// {bits: 3, name: 'funct3', attr: ['MIN', 'MAX']}, +// {bits: 5, name: 'rs1', attr: 'src1'}, +// {bits: 5, name: 'rs2', attr: 'src2'}, +// {bits: 2, name: 'fmt', attr: 'Q'}, +// {bits: 5, name: 'funct5', attr: 'FMIN-MAX'}, +//]} +//.... + + +//[wavedrom, ,] +//.... +//{reg: [ +// {bits: 7, name: 'opcode', attr: ['FMADD', 'FNMADD', 'FMSUB', 'FNMSUB']}, +// {bits: 5, name: 'rd', attr: 'dest'} +// {bits: 3, name: 'funct3', attr: 'RM'}, +// {bits: 5, name: 'rs1', attr: 'src1'}, +// {bits: 5, name: 'rs2', attr: 'src2'}, +// {bits: 2, name: 'fmt', attr: 'Q'}, +// {bits: 5, name: 'rs3', attr: 'src3'}, +//]} +//.... 
diff --git a/src/images/wavedrom/quad-float-clssfy.adoc b/src/images/wavedrom/quad-float-clssfy.adoc deleted file mode 100644 index 0023c7d..0000000 --- a/src/images/wavedrom/quad-float-clssfy.adoc +++ /dev/null @@ -1,15 +0,0 @@ -//## 14.5 Quad-Precision Floating-Point Classify Instruction - -[wavedrom, ,svg] -.... -{reg: [ - {bits: 7, name: 'opcode', attr: ['7','OP-FP'], type: 8}, - {bits: 5, name: 'rd', attr: ['5','dest'], type: 2}, - {bits: 3, name: 'rm', attr: ['3','001'], type: 8}, - {bits: 5, name: 'rs1', attr: ['5','src'], type: 4}, - {bits: 5, name: 'rs2', attr: ['5','0'], type: 8}, - {bits: 2, name: 'fmt', attr: ['2','Q'], type: 8}, - {bits: 5, name: 'funct5', attr: ['5','FCLASS'], type: 8}, -]} -.... - diff --git a/src/images/wavedrom/quad-float-clssfy.edn b/src/images/wavedrom/quad-float-clssfy.edn new file mode 100644 index 0000000..245209f --- /dev/null +++ b/src/images/wavedrom/quad-float-clssfy.edn @@ -0,0 +1,14 @@ +//## 14.5 Quad-Precision Floating-Point Classify Instruction + +[wavedrom, ,svg] +.... +{reg: [ + {bits: 7, name: 'opcode', attr: ['7','OP-FP']}, + {bits: 5, name: 'rd', attr: ['5','dest']}, + {bits: 3, name: 'rm', attr: ['3','001']}, + {bits: 5, name: 'rs1', attr: ['5','src']}, + {bits: 5, name: 'rs2', attr: ['5','0']}, + {bits: 2, name: 'fmt', attr: ['2','Q']}, + {bits: 5, name: 'funct5', attr: ['5','FCLASS']}, +]} +.... diff --git a/src/images/wavedrom/quad-float-compare.adoc b/src/images/wavedrom/quad-float-compare.adoc deleted file mode 100644 index 2269bc9..0000000 --- a/src/images/wavedrom/quad-float-compare.adoc +++ /dev/null @@ -1,15 +0,0 @@ -//## 14.4 Quad-Precision Floating-Point Compare Instructions - -[wavedrom, ,svg] -.... 
-{reg: [ - {bits: 7, name: 'opcode', attr: ['7','OP-FP'], type: 8}, - {bits: 5, name: 'rd', attr: ['5','dest'], type: 2}, - {bits: 3, name: 'rm', attr: ['3','EQ/LT/LE'], type: 8}, - {bits: 5, name: 'rs1', attr: ['5','src1'], type: 4}, - {bits: 5, name: 'rs2', attr: ['5','src2'], type: 4}, - {bits: 2, name: 'fmt', attr: ['2','Q'], type: 8}, - {bits: 5, name: 'funct5', attr: ['5','FCMP'], type: 8}, -]} -.... - diff --git a/src/images/wavedrom/quad-float-compare.edn b/src/images/wavedrom/quad-float-compare.edn new file mode 100644 index 0000000..7fd45ea --- /dev/null +++ b/src/images/wavedrom/quad-float-compare.edn @@ -0,0 +1,14 @@ +//## 14.4 Quad-Precision Floating-Point Compare Instructions + +[wavedrom, ,svg] +.... +{reg: [ + {bits: 7, name: 'opcode', attr: ['7','OP-FP']}, + {bits: 5, name: 'rd', attr: ['5','dest']}, + {bits: 3, name: 'rm', attr: ['3','EQ/LT/LE']}, + {bits: 5, name: 'rs1', attr: ['5','src1']}, + {bits: 5, name: 'rs2', attr: ['5','src2']}, + {bits: 2, name: 'fmt', attr: ['2','Q']}, + {bits: 5, name: 'funct5', attr: ['5','FCMP']}, +]} +.... diff --git a/src/images/wavedrom/quad-ls.adoc b/src/images/wavedrom/quad-ls.adoc deleted file mode 100644 index 3ba4099..0000000 --- a/src/images/wavedrom/quad-ls.adoc +++ /dev/null @@ -1,26 +0,0 @@ -//## 14.1 Quad-Precision Load and Store Instructions - -[wavedrom, ,svg] -.... -{reg: [ - {bits: 7, name: 'opcode', attr: ['7','LOAD-FP'], type: 8}, - {bits: 5, name: 'rd', attr: ['5','dest'], type: 2}, - {bits: 3, name: 'width', attr: ['3','Q'], type: 8}, - {bits: 5, name: 'rs1', attr: ['5','base'], type: 4}, - {bits: 12, name: 'imm[11:0]', attr: ['12','offset[11:0]'], type: 3}, -]} -.... - -[wavedrom, ,svg] -.... 
-{reg: [ - {bits: 7, name: 'opcode', attr: ['7','STORE-FP'], type: 8}, - {bits: 5, name: 'imm[4:0]', attr: ['5','offset[4:0]'], type: 3}, - {bits: 3, name: 'width', attr: ['3','Q'], type: 8}, - {bits: 5, name: 'rs1', attr: ['5','base'], type: 4}, - {bits: 5, name: 'rs2', attr: ['5','src'], type: 4}, - {bits: 7, name: 'imm[11:5]', attr: ['7','offset[11:5]'], type: 3}, -]} -.... - - diff --git a/src/images/wavedrom/quad-ls.edn b/src/images/wavedrom/quad-ls.edn new file mode 100644 index 0000000..077a79c --- /dev/null +++ b/src/images/wavedrom/quad-ls.edn @@ -0,0 +1,24 @@ +//## 14.1 Quad-Precision Load and Store Instructions + +[wavedrom, ,svg] +.... +{reg: [ + {bits: 7, name: 'opcode', attr: ['7','LOAD-FP']}, + {bits: 5, name: 'rd', attr: ['5','dest']}, + {bits: 3, name: 'width', attr: ['3','Q']}, + {bits: 5, name: 'rs1', attr: ['5','base']}, + {bits: 12, name: 'imm[11:0]', attr: ['12','offset[11:0]']}, +]} +.... + +[wavedrom, ,svg] +.... +{reg: [ + {bits: 7, name: 'opcode', attr: ['7','STORE-FP']}, + {bits: 5, name: 'imm[4:0]', attr: ['5','offset[4:0]']}, + {bits: 3, name: 'width', attr: ['3','Q']}, + {bits: 5, name: 'rs1', attr: ['5','base']}, + {bits: 5, name: 'rs2', attr: ['5','src']}, + {bits: 7, name: 'imm[11:5]', attr: ['7','offset[11:5]']}, +]} +.... diff --git a/src/images/wavedrom/reg-based-ldnstr.adoc b/src/images/wavedrom/reg-based-ldnstr.edn index ea9e245..82afaa7 100644 --- a/src/images/wavedrom/reg-based-ldnstr.adoc +++ b/src/images/wavedrom/reg-based-ldnstr.edn @@ -4,12 +4,11 @@ [wavedrom, ,svg] .... 
{reg: [ - {bits: 2, name: 'op', attr: ['2', 'C0', 'C0', 'C0', 'C0', 'C0'], type: 8}, - {bits: 3, name: 'rdʹ', attr: ['3', 'dest', 'dest','dest','dest','dest'], type: 3}, - {bits: 2, name: 'imm', attr:['2', 'offset[2|6]', 'offset[7:6]', 'offset[7:6]', 'offset[2|6]', 'offset[7:6]'], type: 2}, - {bits: 3, name: 'rs1ʹ', attr: ['3', 'base', 'base', 'base', 'base', 'base'], type: 2}, - {bits: 3, name: 'imm', attr: ['3', 'offset[5:3]', 'offset[5:3]', 'offset[5|4|8]', 'offset[5:3]', 'offset[5:3]'], type: 3}, - {bits: 3, name: 'funct3', attr: ['3', 'C.LW', 'C.LD', 'C.LQ', 'C.FLW', 'C.FLD'], type: 8}, + {bits: 2, name: 'op', attr: ['2', 'C0', 'C0', 'C0', 'C0', 'C0']}, + {bits: 3, name: 'rdʹ', attr: ['3', 'dest', 'dest','dest','dest','dest']}, + {bits: 2, name: 'imm', attr:['2', 'offset[2|6]', 'offset[7:6]', 'offset[7:6]', 'offset[2|6]', 'offset[7:6]']}, + {bits: 3, name: 'rs1ʹ', attr: ['3', 'base', 'base', 'base', 'base', 'base']}, + {bits: 3, name: 'imm', attr: ['3', 'offset[5:3]', 'offset[5:3]', 'offset[5|4|8]', 'offset[5:3]', 'offset[5:3]']}, + {bits: 3, name: 'funct3', attr: ['3', 'C.LW', 'C.LD', 'C.LQ', 'C.FLW', 'C.FLD']}, ], config: {bits: 16}} .... - diff --git a/src/images/wavedrom/rv64-lui-auipc.edn b/src/images/wavedrom/rv64-lui-auipc.edn new file mode 100644 index 0000000..5850133 --- /dev/null +++ b/src/images/wavedrom/rv64-lui-auipc.edn @@ -0,0 +1,10 @@ +//lui-auipc + +[wavedrom, ,svg] +.... +{reg: [ + {bits: 7, name: 'opcode', attr: ['7', 'LUI', 'AUIPC']}, + {bits: 5, name: 'rd', attr: ['5', 'dest', 'dest']}, + {bits: 20, name: 'imm[31:12]', attr: ['20', 'U-immediate[31:12]', 'U-immediate[31:12]']} +]} +.... diff --git a/src/images/wavedrom/rv64_lui-auipc.adoc b/src/images/wavedrom/rv64_lui-auipc.adoc deleted file mode 100644 index 132c770..0000000 --- a/src/images/wavedrom/rv64_lui-auipc.adoc +++ /dev/null @@ -1,10 +0,0 @@ -//lui-auipc - -[wavedrom, ,svg] -.... 
-{reg: [ - {bits: 7, name: 'opcode', attr: ['7', 'LUI', 'AUIPC'], type: 8}, - {bits: 5, name: 'rd', attr: ['5', 'dest', 'dest'], type: 2}, - {bits: 20, name: 'imm[31:12]', attr: ['20', 'U-immediate[31:12]', 'U-immediate[31:12]'], type: 3} -]} -.... diff --git a/src/images/wavedrom/rv64i-base-int.adoc b/src/images/wavedrom/rv64i-base-int.adoc deleted file mode 100644 index e4edaf3..0000000 --- a/src/images/wavedrom/rv64i-base-int.adoc +++ /dev/null @@ -1,15 +0,0 @@ -//# 6 RV64I Base Integer Instruction Set, Version 2.1 -//## 6.2 Integer Computational Instructions -//### Integer Register-Immediate Instructions - -[wavedrom, ,svg] -.... -{reg: [ - {bits: 7, name: 'opcode', attr: ['7', 'OP-IMM-32'], type: 8}, - {bits: 5, name: 'rd', attr: ['5', 'dest'], type: 2}, - {bits: 3, name: 'funct3', attr: ['3', 'ADDIW'], type: 8}, - {bits: 5, name: 'rs1', attr: ['5', 'src'], type: 4}, - {bits: 12, name: 'imm[11:0]', attr: ['12', 'I-immediate[11:0]'], type: 3} -]} -.... - diff --git a/src/images/wavedrom/rv64i-base-int.edn b/src/images/wavedrom/rv64i-base-int.edn new file mode 100644 index 0000000..e5df7ee --- /dev/null +++ b/src/images/wavedrom/rv64i-base-int.edn @@ -0,0 +1,14 @@ +//# 6 RV64I Base Integer Instruction Set, Version 2.1 +//## 6.2 Integer Computational Instructions +//### Integer Register-Immediate Instructions + +[wavedrom, ,svg] +.... +{reg: [ + {bits: 7, name: 'opcode', attr: ['7', 'OP-IMM-32']}, + {bits: 5, name: 'rd', attr: ['5', 'dest']}, + {bits: 3, name: 'funct3', attr: ['3', 'ADDIW']}, + {bits: 5, name: 'rs1', attr: ['5', 'src']}, + {bits: 12, name: 'imm[11:0]', attr: ['12', 'I-immediate[11:0]']} +]} +.... diff --git a/src/images/wavedrom/rv64i-int-reg-reg.edn b/src/images/wavedrom/rv64i-int-reg-reg.edn new file mode 100644 index 0000000..6d29ec7 --- /dev/null +++ b/src/images/wavedrom/rv64i-int-reg-reg.edn @@ -0,0 +1,27 @@ + +//rv64i int-reg-reg +//### Integer Register-Register Operations + +[wavedrom, ,svg] +.... 
+{reg: [ + {bits: 7, name: 'opcode', attr: ['7', 'OP', 'OP', 'OP-32', 'OP-32', 'OP-32']}, + {bits: 5, name: 'rd', attr: ['5', 'dest', 'dest', 'dest', 'dest', 'dest']}, + {bits: 3, name: 'funct3', attr: ['3', 'SLL/SRL', 'SRA', 'ADDW', 'SLLW/SRLW', 'SUBW/SRAW']}, + {bits: 5, name: 'rs1', attr: ['5', 'src1', 'src1', 'src1', 'src1', 'src1']}, + {bits: 5, name: 'rs2', attr: ['5', 'src2', 'src2', 'src2', 'src2', 'src2']}, + {bits: 7, name: 'funct7', attr: ['7', '0000000', '0100000', '0000000', '0000000', '0100000']} +]} +.... + +//[wavedrom, ,svg] +//.... +//{reg: [ +// {bits: 7, name: 'opcode', attr: 'OP-32'}, +// {bits: 5, name: 'rd', attr: 'dest'}, +// {bits: 3, name: 'funct3', attr: ['ADDW', 'SLLW', 'SRLW', 'SUBW', 'SRAW']}, +// {bits: 5, name: 'rs1', attr: 'src1'}, +// {bits: 5, name: 'rs2', attr: 'src2'}, +// {bits: 7, name: 'funct7', attr: [0, 0, 0, 32, 32]} +//]} +//.... diff --git a/src/images/wavedrom/rv64i-slli.adoc b/src/images/wavedrom/rv64i-slli.edn index 038a052..b261564 100644 --- a/src/images/wavedrom/rv64i-slli.adoc +++ b/src/images/wavedrom/rv64i-slli.edn @@ -1,11 +1,11 @@ [wavedrom, ,svg] .... 
{reg: [ - {bits: 7, name: 'opcode', attr: ['7', 'OP-IMM', 'OP-IMM', 'OP-IMM'], type: 8}, - {bits: 5, name: 'rd', attr: ['5', 'dest', 'dest', 'dest'], type: 2}, - {bits: 3, name: 'funct3', attr: ['3', 'SLLI', 'SRLI', 'SRAI'], type: 8}, - {bits: 5, name: 'rs1', attr: ['5', 'src', 'src', 'src'], type: 4}, - {bits: 6, name: 'imm[5:0]', attr: ['6', 'shamt[5:0]', 'shamt[5:0]', 'shamt[5:0]'], type: 3}, - {bits: 6, name: 'imm[11:6]', attr: ['6', '000000', '000000', '010000'], type: 8} + {bits: 7, name: 'opcode', attr: ['7', 'OP-IMM', 'OP-IMM', 'OP-IMM']}, + {bits: 5, name: 'rd', attr: ['5', 'dest', 'dest', 'dest']}, + {bits: 3, name: 'funct3', attr: ['3', 'SLLI', 'SRLI', 'SRAI']}, + {bits: 5, name: 'rs1', attr: ['5', 'src', 'src', 'src']}, + {bits: 6, name: 'imm[5:0]', attr: ['6', 'shamt[5:0]', 'shamt[5:0]', 'shamt[5:0]']}, + {bits: 6, name: 'imm[11:6]', attr: ['6', '000000', '000000', '010000']} ]} .... diff --git a/src/images/wavedrom/rv64i-slliw.adoc b/src/images/wavedrom/rv64i-slliw.edn index bd51e9b..0ca01ba 100644 --- a/src/images/wavedrom/rv64i-slliw.adoc +++ b/src/images/wavedrom/rv64i-slliw.edn @@ -1,12 +1,12 @@ [wavedrom, ,svg] .... 
{reg: [ - {bits: 7, name: 'opcode', attr: ['7', 'OP-IMM-32', 'OP-IMM-32', 'OP-IMM-32'], type: 8}, - {bits: 5, name: 'rd', attr: ['5', 'dest', 'dest', 'dest'], type: 2}, - {bits: 3, name: 'funct3', attr: ['3', 'SLLIW', 'SRLIW', 'SRAIW'], type: 8}, - {bits: 5, name: 'rs1', attr: ['5', 'src', 'src', 'src'], type: 4}, - {bits: 5, name: 'imm[4:0]', attr: ['5', 'shamt[4:0]', 'shamt[4:0]', 'shamt[4:0]'], type: 3}, - {bits: 1, name: '[5]', attr: ['1', '0', '0', '0'], type: 3}, - {bits: 6, name: 'imm[11:6]', attr: ['6', '000000', '000000', '010000'], type: 8} + {bits: 7, name: 'opcode', attr: ['7', 'OP-IMM-32', 'OP-IMM-32', 'OP-IMM-32']}, + {bits: 5, name: 'rd', attr: ['5', 'dest', 'dest', 'dest']}, + {bits: 3, name: 'funct3', attr: ['3', 'SLLIW', 'SRLIW', 'SRAIW']}, + {bits: 5, name: 'rs1', attr: ['5', 'src', 'src', 'src']}, + {bits: 5, name: 'imm[4:0]', attr: ['5', 'shamt[4:0]', 'shamt[4:0]', 'shamt[4:0]']}, + {bits: 1, name: '[5]', attr: ['1', '0', '0', '0']}, + {bits: 6, name: 'imm[11:6]', attr: ['6', '000000', '000000', '010000']} ]} .... diff --git a/src/images/wavedrom/rv64i_int-reg-reg.adoc b/src/images/wavedrom/rv64i_int-reg-reg.adoc deleted file mode 100644 index a69e718..0000000 --- a/src/images/wavedrom/rv64i_int-reg-reg.adoc +++ /dev/null @@ -1,27 +0,0 @@ - -//rv64i int-reg-reg -//### Integer Register-Register Operations - -[wavedrom, ,svg] -.... -{reg: [ - {bits: 7, name: 'opcode', attr: ['7', 'OP', 'OP', 'OP-32', 'OP-32', 'OP-32'], type: 8}, - {bits: 5, name: 'rd', attr: ['5', 'dest', 'dest', 'dest', 'dest', 'dest'], type: 2}, - {bits: 3, name: 'funct3', attr: ['3', 'SLL/SRL', 'SRA', 'ADDW', 'SLLW/SRLW', 'SUBW/SRAW'], type: 8}, - {bits: 5, name: 'rs1', attr: ['5', 'src1', 'src1', 'src1', 'src1', 'src1'], type: 4}, - {bits: 5, name: 'rs2', attr: ['5', 'src2', 'src2', 'src2', 'src2', 'src2'], type: 4}, - {bits: 7, name: 'funct7', attr: ['7', '000000', '010000', '000000', '000000', '010000'], type: 8} -]} -.... - -//[wavedrom, ,svg] -//.... 
-//{reg: [ -// {bits: 7, name: 'opcode', attr: 'OP-32', type: 8}, -// {bits: 5, name: 'rd', attr: 'dest', type: 2}, -// {bits: 3, name: 'funct3', attr: ['ADDW', 'SLLW', 'SRLW', 'SUBW', 'SRAW'], type: 8}, -// {bits: 5, name: 'rs1', attr: 'src1', type: 4}, -// {bits: 5, name: 'rs2', attr: 'src2', type: 4}, -// {bits: 7, name: 'funct7', attr: [0, 0, 0, 32, 32], type: 8} -//]} -//.... diff --git a/src/images/wavedrom/s-immediate.edn b/src/images/wavedrom/s-immediate.edn new file mode 100644 index 0000000..14abede --- /dev/null +++ b/src/images/wavedrom/s-immediate.edn @@ -0,0 +1,11 @@ +//#### S-immediate + +[wavedrom, ,svg] +.... +{reg: [ + {bits: 1, name: '[7]'}, + {bits: 4, name: 'inst[11:8]'}, + {bits: 6, name: 'inst[30:25]'}, + {bits: 21, name: '— inst[31] —'}, +], config:{fontsize: 12, label:{right: 'S-immediate'}}} +.... diff --git a/src/images/wavedrom/sfenceinvalir.edn b/src/images/wavedrom/sfenceinvalir.edn index 639be34..ca237fc 100644 --- a/src/images/wavedrom/sfenceinvalir.edn +++ b/src/images/wavedrom/sfenceinvalir.edn @@ -1,11 +1,11 @@ [wavedrom, ,svg] .... {reg: [ - {bits: 7, name: 'opcode', attr: ['7', 'SYSTEM'], type: 8}, - {bits: 5, name: 'rd', attr: ['5', '0'], type: 2}, - {bits: 3, name: 'funct3', attr: ['3', 'PRIV'], type: 8}, - {bits: 5, name: 'rs1', attr: ['5', '0'], type: 4}, - {bits: 5, name: 'rs2', attr: ['5', '1'], type: 4}, - {bits: 7, name: 'funct7', attr: ['7', 'SFENCE.INVAL.IR'], type: 8}, + {bits: 7, name: 'opcode', attr: ['7', 'SYSTEM']}, + {bits: 5, name: 'rd', attr: ['5', '0']}, + {bits: 3, name: 'funct3', attr: ['3', 'PRIV']}, + {bits: 5, name: 'rs1', attr: ['5', '0']}, + {bits: 5, name: 'rs2', attr: ['5', '1']}, + {bits: 7, name: 'funct7', attr: ['7', 'SFENCE.INVAL.IR']}, ]} -....
\ No newline at end of file +.... diff --git a/src/images/wavedrom/sfencevma.edn b/src/images/wavedrom/sfencevma.edn index a7a7663..bba975e 100644 --- a/src/images/wavedrom/sfencevma.edn +++ b/src/images/wavedrom/sfencevma.edn @@ -1,11 +1,11 @@ [wavedrom, ,svg] .... {reg: [ - {bits: 7, name: 'opcode', attr: ['7', 'SYSTEM'], type: 8}, - {bits: 5, name: 'rd', attr: ['5', '0'], type: 2}, - {bits: 3, name: 'funct3', attr: ['3', 'PRIV'], type: 8}, - {bits: 5, name: 'rs1', attr: ['5', 'vaddr'], type: 4}, - {bits: 5, name: 'rs2', attr: ['5', 'asid'], type: 4}, - {bits: 7, name: 'funct7', attr: ['7', 'SFENCE.VMA'], type: 8}, + {bits: 7, name: 'opcode', attr: ['7', 'SYSTEM']}, + {bits: 5, name: 'rd', attr: ['5', '0']}, + {bits: 3, name: 'funct3', attr: ['3', 'PRIV']}, + {bits: 5, name: 'rs1', attr: ['5', 'vaddr']}, + {bits: 5, name: 'rs2', attr: ['5', 'asid']}, + {bits: 7, name: 'funct7', attr: ['7', 'SFENCE.VMA']}, ]} -....
\ No newline at end of file +.... diff --git a/src/images/wavedrom/sfencewinval.edn b/src/images/wavedrom/sfencewinval.edn index 2973af8..81e6667 100644 --- a/src/images/wavedrom/sfencewinval.edn +++ b/src/images/wavedrom/sfencewinval.edn @@ -1,11 +1,11 @@ [wavedrom, ,svg] .... {reg: [ - {bits: 7, name: 'opcode', attr: ['7', 'SYSTEM'], type: 8}, - {bits: 5, name: 'rd', attr: ['5', '0'], type: 2}, - {bits: 3, name: 'funct3', attr: ['3', 'PRIV'], type: 8}, - {bits: 5, name: 'rs1', attr: ['5', '0'], type: 4}, - {bits: 5, name: 'rs2', attr: ['5', '0'], type: 4}, - {bits: 7, name: 'funct7', attr: ['7', 'SFENCE.W.INVAL'], type: 8}, + {bits: 7, name: 'opcode', attr: ['7', 'SYSTEM']}, + {bits: 5, name: 'rd', attr: ['5', '0']}, + {bits: 3, name: 'funct3', attr: ['3', 'PRIV']}, + {bits: 5, name: 'rs1', attr: ['5', '0']}, + {bits: 5, name: 'rs2', attr: ['5', '0']}, + {bits: 7, name: 'funct7', attr: ['7', 'SFENCE.W.INVAL']}, ]} -....
\ No newline at end of file +.... diff --git a/src/images/wavedrom/sinvalvma.edn b/src/images/wavedrom/sinvalvma.edn index 89d0d40..d29d14c 100644 --- a/src/images/wavedrom/sinvalvma.edn +++ b/src/images/wavedrom/sinvalvma.edn @@ -1,11 +1,11 @@ [wavedrom, ,svg] .... {reg: [ - {bits: 7, name: 'opcode', attr: ['7', 'SYSTEM'], type: 8}, - {bits: 5, name: 'rd', attr: ['5', '0'], type: 2}, - {bits: 3, name: 'funct3', attr: ['3', 'PRIV'], type: 8}, - {bits: 5, name: 'rs1', attr: ['5', 'vaddr'], type: 4}, - {bits: 5, name: 'rs2', attr: ['5', 'asid'], type: 4}, - {bits: 7, name: 'funct7', attr: ['7', 'SINVAL.VMA'], type: 8}, + {bits: 7, name: 'opcode', attr: ['7', 'SYSTEM']}, + {bits: 5, name: 'rd', attr: ['5', '0']}, + {bits: 3, name: 'funct3', attr: ['3', 'PRIV']}, + {bits: 5, name: 'rs1', attr: ['5', 'vaddr']}, + {bits: 5, name: 'rs2', attr: ['5', 'asid']}, + {bits: 7, name: 'funct7', attr: ['7', 'SINVAL.VMA']}, ]} -....
\ No newline at end of file +.... diff --git a/src/images/wavedrom/sp-load-store-2.adoc b/src/images/wavedrom/sp-load-store-2.adoc deleted file mode 100644 index f1025e9..0000000 --- a/src/images/wavedrom/sp-load-store-2.adoc +++ /dev/null @@ -1,24 +0,0 @@ -//## 12.5 Single-Precision Load and Store Instructions - -[wavedrom, ,svg] -.... -{reg: [ - {bits: 7, name: 'opcode', attr: ['7', 'LOAD-FP'], type: 8}, - {bits: 5, name: 'rd', attr: ['5', 'dest'], type: 2}, - {bits: 3, name: 'width', attr: ['3', 'W'], type: 8}, - {bits: 5, name: 'rs1', attr: ['5', 'base'], type: 4}, - {bits: 12, name: 'imm[11:0]', attr: ['12', 'offset[11:0]'], type: 3}, -]} -.... - -[wavedrom, ,svg] -.... -{reg: [ - {bits: 7, name: 'opcode', attr: ['7', 'STORE-FP'], type: 8}, - {bits: 5, name: 'imm[4:0]', attr: ['5', 'offset[4:0]'], type: 3}, - {bits: 3, name: 'width', attr: ['3', 'W'], type: 8}, - {bits: 5, name: 'rs1', attr: ['5', 'base'], type: 4}, - {bits: 5, name: 'rs2', attr: ['5', 'src'], type: 4}, - {bits: 7, name: 'imm[11:5]', attr: ['7', 'offset[11:5]'], type: 3}, -]} -....
\ No newline at end of file diff --git a/src/images/wavedrom/sp-load-store-2.edn b/src/images/wavedrom/sp-load-store-2.edn new file mode 100644 index 0000000..fffc263 --- /dev/null +++ b/src/images/wavedrom/sp-load-store-2.edn @@ -0,0 +1,24 @@ +//## 12.5 Single-Precision Load and Store Instructions + +[wavedrom, ,svg] +.... +{reg: [ + {bits: 7, name: 'opcode', attr: ['7', 'LOAD-FP']}, + {bits: 5, name: 'rd', attr: ['5', 'dest']}, + {bits: 3, name: 'width', attr: ['3', 'W']}, + {bits: 5, name: 'rs1', attr: ['5', 'base']}, + {bits: 12, name: 'imm[11:0]', attr: ['12', 'offset[11:0]']}, +]} +.... + +[wavedrom, ,svg] +.... +{reg: [ + {bits: 7, name: 'opcode', attr: ['7', 'STORE-FP']}, + {bits: 5, name: 'imm[4:0]', attr: ['5', 'offset[4:0]']}, + {bits: 3, name: 'width', attr: ['3', 'W']}, + {bits: 5, name: 'rs1', attr: ['5', 'base']}, + {bits: 5, name: 'rs2', attr: ['5', 'src']}, + {bits: 7, name: 'imm[11:5]', attr: ['7', 'offset[11:5]']}, +]} +.... diff --git a/src/images/wavedrom/sp-load-store.adoc b/src/images/wavedrom/sp-load-store.adoc deleted file mode 100644 index 192626b..0000000 --- a/src/images/wavedrom/sp-load-store.adoc +++ /dev/null @@ -1,25 +0,0 @@ -//## 12.5 Single-Precision Load and Store Instructions - -[wavedrom, ,svg] -.... -{reg: [ - {bits: 7, name: 'opcode', attr: ['7', 'LOAD-FP'], type: 8}, - {bits: 5, name: 'rd', attr: ['5', 'dest'], type: 2}, - {bits: 3, name: 'width', attr: ['3', 'H'], type: 8}, - {bits: 5, name: 'rs1', attr: ['5', 'base'], type: 4}, - {bits: 12, name: 'imm[11:0]', attr: ['12', 'offset[11:0]'], type: 3}, -]} -.... - -[wavedrom, ,svg] -.... -{reg: [ - {bits: 7, name: 'opcode', attr: ['7', 'STORE-FP'], type: 8}, - {bits: 5, name: 'imm[4:0]', attr: ['5', 'offset[4:0]'], type: 3}, - {bits: 3, name: 'width', attr: ['3', 'H'], type: 8}, - {bits: 5, name: 'rs1', attr: ['5', 'base'], type: 4}, - {bits: 5, name: 'rs2', attr: ['5', 'src'], type: 4}, - {bits: 7, name: 'imm[11:5]', attr: ['7', 'offset[11:5]'], type: 3}, -]} -.... 
- diff --git a/src/images/wavedrom/sp-load-store.edn b/src/images/wavedrom/sp-load-store.edn new file mode 100644 index 0000000..e12818a --- /dev/null +++ b/src/images/wavedrom/sp-load-store.edn @@ -0,0 +1,24 @@ +//## 12.5 Single-Precision Load and Store Instructions + +[wavedrom, ,svg] +.... +{reg: [ + {bits: 7, name: 'opcode', attr: ['7', 'LOAD-FP']}, + {bits: 5, name: 'rd', attr: ['5', 'dest']}, + {bits: 3, name: 'width', attr: ['3', 'H']}, + {bits: 5, name: 'rs1', attr: ['5', 'base']}, + {bits: 12, name: 'imm[11:0]', attr: ['12', 'offset[11:0]']}, +]} +.... + +[wavedrom, ,svg] +.... +{reg: [ + {bits: 7, name: 'opcode', attr: ['7', 'STORE-FP']}, + {bits: 5, name: 'imm[4:0]', attr: ['5', 'offset[4:0]']}, + {bits: 3, name: 'width', attr: ['3', 'H']}, + {bits: 5, name: 'rs1', attr: ['5', 'base']}, + {bits: 5, name: 'rs2', attr: ['5', 'src']}, + {bits: 7, name: 'imm[11:5]', attr: ['7', 'offset[11:5]']}, +]} +.... diff --git a/src/images/wavedrom/spfloat-classify.adoc b/src/images/wavedrom/spfloat-classify.adoc deleted file mode 100644 index 236880d..0000000 --- a/src/images/wavedrom/spfloat-classify.adoc +++ /dev/null @@ -1,14 +0,0 @@ -//## 12.9 Single-Precision Floating-Point Classify Instruction - -[wavedrom, ,svg] -.... -{reg: [ - {bits: 7, name: 'opcode', attr: ['7','OP-FP'], type: 8}, - {bits: 5, name: 'rd', attr: ['5','dest'], type: 2}, - {bits: 3, name: 'rm', attr: ['3','001'], type: 8}, - {bits: 5, name: 'rs1', attr: ['5','src'], type: 4}, - {bits: 5, name: 'rs2', attr: ['5','0'], type: 8}, - {bits: 2, name: 'fmt', attr: ['2','S'], type: 8}, - {bits: 5, name: 'funct5', attr: ['5','FCLASS'], type: 8}, -]} -.... diff --git a/src/images/wavedrom/spfloat-classify.edn b/src/images/wavedrom/spfloat-classify.edn new file mode 100644 index 0000000..52ec8bc --- /dev/null +++ b/src/images/wavedrom/spfloat-classify.edn @@ -0,0 +1,14 @@ +//## 12.9 Single-Precision Floating-Point Classify Instruction + +[wavedrom, ,svg] +.... 
+{reg: [ + {bits: 7, name: 'opcode', attr: ['7','OP-FP']}, + {bits: 5, name: 'rd', attr: ['5','dest']}, + {bits: 3, name: 'rm', attr: ['3','001']}, + {bits: 5, name: 'rs1', attr: ['5','src']}, + {bits: 5, name: 'rs2', attr: ['5','0']}, + {bits: 2, name: 'fmt', attr: ['2','S']}, + {bits: 5, name: 'funct5', attr: ['5','FCLASS']}, +]} +.... diff --git a/src/images/wavedrom/spfloat-cn-cmp.adoc b/src/images/wavedrom/spfloat-cn-cmp.adoc deleted file mode 100644 index e46a099..0000000 --- a/src/images/wavedrom/spfloat-cn-cmp.adoc +++ /dev/null @@ -1,16 +0,0 @@ -//sp float convert and compare - -[wavedrom, ,svg] -.... -{reg: [ - {bits: 7, name: 'opcode', attr: ['7','OP-FP', 'OP-FP'], type: 8}, - {bits: 5, name: 'rd', attr: ['5','dest', 'dest'], type: 2}, - {bits: 3, name: 'rm', attr: ['3','RM','RM'], type: 8}, - {bits: 5, name: 'rs1', attr: ['5','src', 'src'], type: 4}, - {bits: 5, name: 'rs2', attr: ['5','W[U]/L[U]D', 'W[U]/L[U]'], type: 4}, - {bits: 2, name: 'fmt', attr: ['2','S','S'], type: 8}, - {bits: 5, name: 'funct5', attr: ['5','FCVT.int.fmt', 'FCVT.fmt.int'], type: 8}, -]} -.... - - diff --git a/src/images/wavedrom/spfloat-cn-cmp.edn b/src/images/wavedrom/spfloat-cn-cmp.edn new file mode 100644 index 0000000..fa6af7d --- /dev/null +++ b/src/images/wavedrom/spfloat-cn-cmp.edn @@ -0,0 +1,14 @@ +//sp float convert and compare + +[wavedrom, ,svg] +.... +{reg: [ + {bits: 7, name: 'opcode', attr: ['7','OP-FP', 'OP-FP']}, + {bits: 5, name: 'rd', attr: ['5','dest', 'dest']}, + {bits: 3, name: 'rm', attr: ['3','RM','RM']}, + {bits: 5, name: 'rs1', attr: ['5','src', 'src']}, + {bits: 5, name: 'rs2', attr: ['5','W[U]/L[U]D', 'W[U]/L[U]']}, + {bits: 2, name: 'fmt', attr: ['2','S','S']}, + {bits: 5, name: 'funct5', attr: ['5','FCVT.int.fmt', 'FCVT.fmt.int']}, +]} +.... 
diff --git a/src/images/wavedrom/spfloat-comp.adoc b/src/images/wavedrom/spfloat-comp.adoc deleted file mode 100644 index 7059e8e..0000000 --- a/src/images/wavedrom/spfloat-comp.adoc +++ /dev/null @@ -1,15 +0,0 @@ -//## 12.8 Single-Precision Floating-Point Compare Instructions - -[wavedrom, ,svg] -.... -{reg: [ - {bits: 7, name: 'opcode', attr: ['7','OP-FP'], type: 8}, - {bits: 5, name: 'rd', attr: ['5','dest'], type: 2}, - {bits: 3, name: 'rm', attr: ['3','EQ', 'LT', 'LE'], type: 8}, - {bits: 5, name: 'rs1', attr: ['5','src1'], type: 4}, - {bits: 5, name: 'rs2', attr: ['5','src2'], type: 4}, - {bits: 2, name: 'fmt', attr: ['2','S'], type: 8}, - {bits: 5, name: 'funct5', attr: ['5','FCMP'], type: 8}, -]} -.... - diff --git a/src/images/wavedrom/spfloat-comp.edn b/src/images/wavedrom/spfloat-comp.edn new file mode 100644 index 0000000..b1d200d --- /dev/null +++ b/src/images/wavedrom/spfloat-comp.edn @@ -0,0 +1,14 @@ +//## 12.8 Single-Precision Floating-Point Compare Instructions + +[wavedrom, ,svg] +.... +{reg: [ + {bits: 7, name: 'opcode', attr: ['7','OP-FP']}, + {bits: 5, name: 'rd', attr: ['5','dest']}, + {bits: 3, name: 'rm', attr: ['3','EQ', 'LT', 'LE']}, + {bits: 5, name: 'rs1', attr: ['5','src1']}, + {bits: 5, name: 'rs2', attr: ['5','src2']}, + {bits: 2, name: 'fmt', attr: ['2','S']}, + {bits: 5, name: 'funct5', attr: ['5','FCMP']}, +]} +.... diff --git a/src/images/wavedrom/spfloat-mv.adoc b/src/images/wavedrom/spfloat-mv.adoc deleted file mode 100644 index d5df81d..0000000 --- a/src/images/wavedrom/spfloat-mv.adoc +++ /dev/null @@ -1,15 +0,0 @@ -//SP flating point move - -[wavedrom, ,svg] -.... 
-{reg: [ - {bits: 7, name: 'opcode', attr: ['7','OP-FP','OP-FP'], type: 8}, - {bits: 5, name: 'rd', attr: ['5','dest','dest'], type: 2}, - {bits: 3, name: 'rm', attr: ['3','000', '000'], type: 8}, - {bits: 5, name: 'rs1', attr: ['5','src','src'], type: 4}, - {bits: 5, name: 'rs2', attr: ['5','0','0'], type: 4}, - {bits: 2, name: 'fmt', attr: ['2','S','S'], type: 8}, - {bits: 5, name: 'funct5', attr: ['5','FMV.X.W','FMV.W.X'], type: 8}, -]} -.... - diff --git a/src/images/wavedrom/spfloat-mv.edn b/src/images/wavedrom/spfloat-mv.edn new file mode 100644 index 0000000..e8e441a --- /dev/null +++ b/src/images/wavedrom/spfloat-mv.edn @@ -0,0 +1,14 @@ +//SP flating point move + +[wavedrom, ,svg] +.... +{reg: [ + {bits: 7, name: 'opcode', attr: ['7','OP-FP','OP-FP']}, + {bits: 5, name: 'rd', attr: ['5','dest','dest']}, + {bits: 3, name: 'rm', attr: ['3','000', '000']}, + {bits: 5, name: 'rs1', attr: ['5','src','src']}, + {bits: 5, name: 'rs2', attr: ['5','0','0']}, + {bits: 2, name: 'fmt', attr: ['2','S','S']}, + {bits: 5, name: 'funct5', attr: ['5','FMV.X.W','FMV.W.X']}, +]} +.... diff --git a/src/images/wavedrom/spfloat-sign-inj.adoc b/src/images/wavedrom/spfloat-sign-inj.adoc deleted file mode 100644 index 74040b7..0000000 --- a/src/images/wavedrom/spfloat-sign-inj.adoc +++ /dev/null @@ -1,14 +0,0 @@ -//sp float sign injection - -[wavedrom, ,svg] -.... -{reg: [ - {bits: 7, name: 'opcode', attr: ['7','OP-FP'], type: 8}, - {bits: 5, name: 'rd', attr: ['5', 'dest'], type: 2}, - {bits: 3, name: 'rm', attr: ['3','J[N]/JX'], type: 8}, - {bits: 5, name: 'rs1', attr: ['5','src1'], type: 4}, - {bits: 5, name: 'rs2', attr: ['5','src2'], type: 4}, - {bits: 2, name: 'fmt', attr: ['2','S'], type: 8}, - {bits: 5, name: 'funct5', attr: ['5','FSGNJ'], type: 8}, -]} -....
\ No newline at end of file diff --git a/src/images/wavedrom/spfloat-sign-inj.edn b/src/images/wavedrom/spfloat-sign-inj.edn new file mode 100644 index 0000000..1511e22 --- /dev/null +++ b/src/images/wavedrom/spfloat-sign-inj.edn @@ -0,0 +1,14 @@ +//sp float sign injection + +[wavedrom, ,svg] +.... +{reg: [ + {bits: 7, name: 'opcode', attr: ['7','OP-FP']}, + {bits: 5, name: 'rd', attr: ['5', 'dest']}, + {bits: 3, name: 'rm', attr: ['3','J[N]/JX']}, + {bits: 5, name: 'rs1', attr: ['5','src1']}, + {bits: 5, name: 'rs2', attr: ['5','src2']}, + {bits: 2, name: 'fmt', attr: ['2','S']}, + {bits: 5, name: 'funct5', attr: ['5','FSGNJ']}, +]} +.... diff --git a/src/images/wavedrom/spfloat-zfh.adoc b/src/images/wavedrom/spfloat-zfh.edn index d53e6bd..123b2ed 100644 --- a/src/images/wavedrom/spfloat-zfh.adoc +++ b/src/images/wavedrom/spfloat-zfh.edn @@ -3,12 +3,12 @@ [wavedrom, ,svg] .... {reg: [ - {bits: 7, name: 'opcode', attr: ['7','OP-FP','OP-FP','OP-FP','OP-FP'], type: 8}, - {bits: 5, name: 'rd', attr: ['5','dest','dest','dest','dest'], type: 2}, - {bits: 3, name: 'rm', attr: ['3','RM', 'RM', 'MIN/MAX', 'RM'], type: 8}, - {bits: 5, name: 'rs1', attr: ['5','src1', 'src1', 'src1', 'src'], type: 4}, - {bits: 5, name: 'rs2', attr: ['5','src2', 'src2', 'src2', '0'], type: 4}, - {bits: 2, name: 'fmt', attr: ['2','H', 'H', 'H', 'H'], type: 8}, - {bits: 5, name: 'funct5', attr: ['5', 'FADD/FSUB', 'FMUL/FDIV', 'FMIN-MAX', 'FSQRT'], type: 8}, + {bits: 7, name: 'opcode', attr: ['7','OP-FP','OP-FP','OP-FP','OP-FP']}, + {bits: 5, name: 'rd', attr: ['5','dest','dest','dest','dest']}, + {bits: 3, name: 'rm', attr: ['3','RM', 'RM', 'MIN/MAX', 'RM']}, + {bits: 5, name: 'rs1', attr: ['5','src1', 'src1', 'src1', 'src']}, + {bits: 5, name: 'rs2', attr: ['5','src2', 'src2', 'src2', '0']}, + {bits: 2, name: 'fmt', attr: ['2','H', 'H', 'H', 'H']}, + {bits: 5, name: 'funct5', attr: ['5', 'FADD/FSUB', 'FMUL/FDIV', 'FMIN-MAX', 'FSQRT']}, ]} -....
\ No newline at end of file
+....
diff --git a/src/images/wavedrom/spfloat.adoc b/src/images/wavedrom/spfloat.edn
index 9384544..27b141f 100644
--- a/src/images/wavedrom/spfloat.adoc
+++ b/src/images/wavedrom/spfloat.edn
@@ -3,14 +3,12 @@
 [wavedrom, ,svg]
 ....
 {reg: [
- {bits: 7, name: 'opcode', attr: ['7','OP-FP','OP-FP','OP-FP','OP-FP'], type: 8},
- {bits: 5, name: 'rd', attr: ['5','dest','dest','dest','dest'], type: 2},
- {bits: 3, name: 'rm', attr: ['3','RM', 'RM', 'RM','MIN/MAX'], type: 8},
- {bits: 5, name: 'rs1', attr: ['5','src1', 'src1', 'src', 'src1'], type: 4},
- {bits: 5, name: 'rs2', attr: ['5','src2', 'src2', '0', 'src2'], type: 4},
- {bits: 2, name: 'fmt', attr: ['2','S', 'S', 'S', 'S'], type: 8},
- {bits: 5, name: 'funct5', attr: ['5', 'FADD/FSUB', 'FMUL/FDIV', 'FSQRT','FMIN-MAX'], type: 8},
+ {bits: 7, name: 'opcode', attr: ['7','OP-FP','OP-FP','OP-FP','OP-FP']},
+ {bits: 5, name: 'rd', attr: ['5','dest','dest','dest','dest']},
+ {bits: 3, name: 'rm', attr: ['3','RM', 'RM', 'RM','MIN/MAX']},
+ {bits: 5, name: 'rs1', attr: ['5','src1', 'src1', 'src', 'src1']},
+ {bits: 5, name: 'rs2', attr: ['5','src2', 'src2', '0', 'src2']},
+ {bits: 2, name: 'fmt', attr: ['2','S', 'S', 'S', 'S']},
+ {bits: 5, name: 'funct5', attr: ['5', 'FADD/FSUB', 'FMUL/FDIV', 'FSQRT','FMIN-MAX']},
 ]}
 ....
-
-
diff --git a/src/images/wavedrom/spfloat2-zfh.adoc b/src/images/wavedrom/spfloat2-zfh.adoc
deleted file mode 100644
index 44789da..0000000
--- a/src/images/wavedrom/spfloat2-zfh.adoc
+++ /dev/null
@@ -1,12 +0,0 @@
-[wavedrom, ,svg]
-....
-{reg: [
- {bits: 7, name: 'opcode', attr: ['7','F[N]MADD/F[N]MSUB'], type: 8},
- {bits: 5, name: 'rd', attr: ['5','dest'], type: 2},
- {bits: 3, name: 'rm', attr: ['3','RM'], type: 8},
- {bits: 5, name: 'rs1', attr: ['5','src1'], type: 4},
- {bits: 5, name: 'rs2', attr: ['5','src2'], type: 4},
- {bits: 2, name: 'fmt', attr: ['2','H'], type: 8},
- {bits: 5, name: 'rs3', attr: ['5','src3'], type: 4},
-]}
-....
\ No newline at end of file
diff --git a/src/images/wavedrom/spfloat2-zfh.edn b/src/images/wavedrom/spfloat2-zfh.edn
new file mode 100644
index 0000000..89fc6bd
--- /dev/null
+++ b/src/images/wavedrom/spfloat2-zfh.edn
@@ -0,0 +1,12 @@
+[wavedrom, ,svg]
+....
+{reg: [
+ {bits: 7, name: 'opcode', attr: ['7','F[N]MADD/F[N]MSUB']},
+ {bits: 5, name: 'rd', attr: ['5','dest']},
+ {bits: 3, name: 'rm', attr: ['3','RM']},
+ {bits: 5, name: 'rs1', attr: ['5','src1']},
+ {bits: 5, name: 'rs2', attr: ['5','src2']},
+ {bits: 2, name: 'fmt', attr: ['2','H']},
+ {bits: 5, name: 'rs3', attr: ['5','src3']},
+]}
+....
diff --git a/src/images/wavedrom/spfloat2.adoc b/src/images/wavedrom/spfloat2.adoc
deleted file mode 100644
index 8c2b976..0000000
--- a/src/images/wavedrom/spfloat2.adoc
+++ /dev/null
@@ -1,12 +0,0 @@
-[wavedrom, ,svg]
-....
-{reg: [
- {bits: 7, name: 'opcode', attr: ['7','F[N]MADD/F[N]MSUB'], type: 8},
- {bits: 5, name: 'rd', attr: ['5','dest'], type: 2},
- {bits: 3, name: 'rm', attr: ['3','RM'], type: 8},
- {bits: 5, name: 'rs1', attr: ['5','src1'], type: 4},
- {bits: 5, name: 'rs2', attr: ['5','src2'], type: 4},
- {bits: 2, name: 'fmt', attr: ['2','S'], type: 8},
- {bits: 5, name: 'rs3', attr: ['5','src3'], type: 4},
-]}
-....
\ No newline at end of file
diff --git a/src/images/wavedrom/spfloat2.edn b/src/images/wavedrom/spfloat2.edn
new file mode 100644
index 0000000..13d152f
--- /dev/null
+++ b/src/images/wavedrom/spfloat2.edn
@@ -0,0 +1,12 @@
+[wavedrom, ,svg]
+....
+{reg: [
+ {bits: 7, name: 'opcode', attr: ['7','F[N]MADD/F[N]MSUB']},
+ {bits: 5, name: 'rd', attr: ['5','dest']},
+ {bits: 3, name: 'rm', attr: ['3','RM']},
+ {bits: 5, name: 'rs1', attr: ['5','src1']},
+ {bits: 5, name: 'rs2', attr: ['5','src2']},
+ {bits: 2, name: 'fmt', attr: ['2','S']},
+ {bits: 5, name: 'rs3', attr: ['5','src3']},
+]}
+....
diff --git a/src/images/wavedrom/sploat2.adoc b/src/images/wavedrom/sploat2.adoc
deleted file mode 100644
index e69de29..0000000
--- a/src/images/wavedrom/sploat2.adoc
+++ /dev/null
diff --git a/src/images/wavedrom/transformedatomicinst.edn b/src/images/wavedrom/transformedatomicinst.edn
index d598bc3..4d9af20 100644
--- a/src/images/wavedrom/transformedatomicinst.edn
+++ b/src/images/wavedrom/transformedatomicinst.edn
@@ -1,13 +1,13 @@
 [wavedrom, ,svg]
 ....
 {reg: [
- {bits: 7, name: 'opcode', type: 8, attr: ['7']},
- {bits: 5, name: 'rd', type: 2, attr: ['5']},
- {bits: 3, name: 'funct3', type: 8, attr: ['3']},
- {bits: 5, name: 'Addr. Offset', type: 4, attr: ['5']},
- {bits: 5, name: 'rs2',type: 4, attr: ['5']},
- {bits: 1, name: 'rl',type: 4, attr: ['1']},
- {bits: 1, name: 'aq',type: 4, attr: ['1']},
- {bits: 5, name: 'funct5', type: 8, attr: ['5']},
+ {bits: 7, name: 'opcode', attr: ['7']},
+ {bits: 5, name: 'rd', attr: ['5']},
+ {bits: 3, name: 'funct3', attr: ['3']},
+ {bits: 5, name: 'Addr. Offset', attr: ['5']},
+ {bits: 5, name: 'rs2', attr: ['5']},
+ {bits: 1, name: 'rl', attr: ['1']},
+ {bits: 1, name: 'aq', attr: ['1']},
+ {bits: 5, name: 'funct5', attr: ['5']},
 ], config: {bits: 32}}
-....
\ No newline at end of file
+....
diff --git a/src/images/wavedrom/transformedloadinst.edn b/src/images/wavedrom/transformedloadinst.edn
index 0d6e5ab..1db0f0f 100644
--- a/src/images/wavedrom/transformedloadinst.edn
+++ b/src/images/wavedrom/transformedloadinst.edn
@@ -1,11 +1,11 @@
 [wavedrom, ,svg]
 ....
 {reg: [
- {bits: 7, name: 'opcode', type: 8, attr: ['7']},
- {bits: 5, name: 'rd', type: 2, attr: ['5']},
- {bits: 3, name: 'funct3', type: 8, attr: ['3']},
- {bits: 5, name: 'Addr. Offset', type: 4, attr: ['5']},
- {bits: 5, name: '0', type: 4, attr: ['5']},
- {bits: 7, name: '0', type: 8, attr: ['7']},
+ {bits: 7, name: 'opcode', attr: ['7']},
+ {bits: 5, name: 'rd', attr: ['5']},
+ {bits: 3, name: 'funct3', attr: ['3']},
+ {bits: 5, name: 'Addr. Offset', attr: ['5']},
+ {bits: 5, name: '0', attr: ['5']},
+ {bits: 7, name: '0', attr: ['7']},
 ], config: {bits: 32}}
-....
\ No newline at end of file
+....
diff --git a/src/images/wavedrom/transformedstoreinst.edn b/src/images/wavedrom/transformedstoreinst.edn
index e807ad5..4a7c09d 100644
--- a/src/images/wavedrom/transformedstoreinst.edn
+++ b/src/images/wavedrom/transformedstoreinst.edn
@@ -1,11 +1,11 @@
 [wavedrom, ,svg]
 ....
 {reg: [
- {bits: 7, name: 'opcode', type: 8, attr: ['7']},
- {bits: 5, name: '0', type: 2, attr: ['5']},
- {bits: 3, name: 'funct3', type: 8, attr: ['3']},
- {bits: 5, name: 'Addr. Offset', type: 4, attr: ['5']},
- {bits: 5, name: 'rs2', type: 4, attr: ['5']},
- {bits: 7, name: '0', type: 8, attr: ['7']},
+ {bits: 7, name: 'opcode', attr: ['7']},
+ {bits: 5, name: '0', attr: ['5']},
+ {bits: 3, name: 'funct3', attr: ['3']},
+ {bits: 5, name: 'Addr. Offset', attr: ['5']},
+ {bits: 5, name: 'rs2', attr: ['5']},
+ {bits: 7, name: '0', attr: ['7']},
 ], config: {bits: 32}}
-....
\ No newline at end of file
+....
diff --git a/src/images/wavedrom/transformedvmaccessinst.edn b/src/images/wavedrom/transformedvmaccessinst.edn
index 9c7e9e3..0eb2739 100644
--- a/src/images/wavedrom/transformedvmaccessinst.edn
+++ b/src/images/wavedrom/transformedvmaccessinst.edn
@@ -1,11 +1,11 @@
 [wavedrom, ,svg]
 ....
 {reg: [
- {bits: 7, name: 'opcode', type: 8, attr: ['7']},
- {bits: 5, name: 'rd', type: 2, attr: ['5']},
- {bits: 3, name: 'funct3', type: 8, attr: ['3']},
- {bits: 5, name: 'Addr. Offset', type: 4, attr: ['5']},
- {bits: 5, name: 'rs2', type: 4, attr: ['5']},
- {bits: 7, name: 'funct7', type: 8, attr: ['7']},
+ {bits: 7, name: 'opcode', attr: ['7']},
+ {bits: 5, name: 'rd', attr: ['5']},
+ {bits: 3, name: 'funct3', attr: ['3']},
+ {bits: 5, name: 'Addr. Offset', attr: ['5']},
+ {bits: 5, name: 'rs2', attr: ['5']},
+ {bits: 7, name: 'funct7', attr: ['7']},
 ], config: {bits: 32}}
-....
\ No newline at end of file
+....
diff --git a/src/images/wavedrom/trap-return.adoc b/src/images/wavedrom/trap-return.adoc
deleted file mode 100644
index 1e15e2b..0000000
--- a/src/images/wavedrom/trap-return.adoc
+++ /dev/null
@@ -1,13 +0,0 @@
-//
-
-[wavedrom, ,svg]
-
-....
-{reg: [
- {bits: 7, name: 'opcode', type: 8, attr: ['7','SYSTEM'],},
- {bits: 5, name: 'rd', type: 2, attr: ['5','0'],},
- {bits: 3, name: 'funct3', type: 8, attr: ['3','PRIV'],},
- {bits: 5, name: 'rs1', type: 4, attr: ['5','0'],},
- {bits: 12, name: 'funct12', type: 8, attr: ['12','MRET/SRET',]},
-], config: {bits: 32}}
-....
\ No newline at end of file diff --git a/src/images/wavedrom/trap-return.edn b/src/images/wavedrom/trap-return.edn new file mode 100644 index 0000000..b0f356a --- /dev/null +++ b/src/images/wavedrom/trap-return.edn @@ -0,0 +1,13 @@ +// + +[wavedrom, ,svg] + +.... +{reg: [ + {bits: 7, name: 'opcode', attr: ['7','SYSTEM'],}, + {bits: 5, name: 'rd', attr: ['5','0'],}, + {bits: 3, name: 'funct3', attr: ['3','PRIV'],}, + {bits: 5, name: 'rs1', attr: ['5','0'],}, + {bits: 12, name: 'funct12', attr: ['12','MRET/SRET',]}, +], config: {bits: 32}} +.... diff --git a/src/images/wavedrom/u-immediate.edn b/src/images/wavedrom/u-immediate.edn new file mode 100644 index 0000000..08f1813 --- /dev/null +++ b/src/images/wavedrom/u-immediate.edn @@ -0,0 +1,11 @@ +//#### U-immediate + +[wavedrom, ,svg] +.... +{reg: [ + {bits: 12, name: '0'}, + {bits: 8, name: 'inst[19:12]'}, + {bits: 11, name: 'inst[30:20]'}, + {bits: 1, name: '[31]'}, +], config:{fontsize: 12, label:{right: 'U-immediate'}}} +.... diff --git a/src/images/wavedrom/v-inst-table.adoc b/src/images/wavedrom/v-inst-table.edn index 0c02220..144510d 100644 --- a/src/images/wavedrom/v-inst-table.adoc +++ b/src/images/wavedrom/v-inst-table.edn @@ -29,7 +29,7 @@ | 001100 |V|X|I| vrgather | 001100 | | | | 001100 | | | | 001101 | | | | | 001101 | | | | 001101 | | | | 001110 | |X|I| vslideup | 001110 | |X| vslide1up | 001110 | |F| vfslide1up -| 001110 |V| | |vrgatherei16| | | | | | | | +| 001110 |V| | |vrgatherei16| | | | | | | | | 001111 | |X|I| vslidedown | 001111 | |X| vslide1down | 001111 | |F| vfslide1down |=== @@ -92,7 +92,7 @@ | 110110 | | | | | 110110 |V|X| vwsubu.w | 110110 |V|F| vfwsub.w | 110111 | | | | | 110111 |V|X| vwsub.w | 110111 | | | | 111000 | | | | | 111000 |V|X| vwmulu | 111000 |V|F| vfwmul -| 111001 | | | | | 111001 | | | | 111001 | | | +| 111001 | | | | | 111001 | | | | 111001 | | | | 111010 | | | | | 111010 |V|X| vwmulsu | 111010 | | | | 111011 | | | | | 111011 |V|X| vwmul | 111011 | | | | 111100 | | | | | 
111100 |V|X| vwmaccu | 111100 |V|F| vfwmacc @@ -206,5 +206,3 @@ | 10000 | viota | 10001 | vid |=== - - diff --git a/src/images/wavedrom/valu-format.adoc b/src/images/wavedrom/valu-format.edn index cdd3447..95732e7 100644 --- a/src/images/wavedrom/valu-format.adoc +++ b/src/images/wavedrom/valu-format.edn @@ -16,10 +16,10 @@ Formats for Vector Arithmetic Instructions under OP-V major opcode .... {reg: [ {bits: 7, name: 0x57, attr: 'OPIVV'}, - {bits: 5, name: 'vd', type: 2}, + {bits: 5, name: 'vd'}, {bits: 3, name: 0}, - {bits: 5, name: 'vs1', type: 2}, - {bits: 5, name: 'vs2', type: 2}, + {bits: 5, name: 'vs1'}, + {bits: 5, name: 'vs2'}, {bits: 1, name: 'vm'}, {bits: 6, name: 'funct6'}, ]} @@ -29,10 +29,10 @@ Formats for Vector Arithmetic Instructions under OP-V major opcode .... {reg: [ {bits: 7, name: 0x57, attr: 'OPFVV'}, - {bits: 5, name: 'vd / rd', type: 7}, + {bits: 5, name: 'vd / rd'}, {bits: 3, name: 1}, - {bits: 5, name: 'vs1', type: 2}, - {bits: 5, name: 'vs2', type: 2}, + {bits: 5, name: 'vs1'}, + {bits: 5, name: 'vs2'}, {bits: 1, name: 'vm'}, {bits: 6, name: 'funct6'}, ]} @@ -42,10 +42,10 @@ Formats for Vector Arithmetic Instructions under OP-V major opcode .... {reg: [ {bits: 7, name: 0x57, attr: 'OPMVV'}, - {bits: 5, name: 'vd / rd', type: 7}, + {bits: 5, name: 'vd / rd'}, {bits: 3, name: 2}, - {bits: 5, name: 'vs1', type: 2}, - {bits: 5, name: 'vs2', type: 2}, + {bits: 5, name: 'vs1'}, + {bits: 5, name: 'vs2'}, {bits: 1, name: 'vm'}, {bits: 6, name: 'funct6'}, ]} @@ -55,10 +55,10 @@ Formats for Vector Arithmetic Instructions under OP-V major opcode .... {reg: [ {bits: 7, name: 0x57, attr: ['OPIVI']}, - {bits: 5, name: 'vd', type: 2}, + {bits: 5, name: 'vd'}, {bits: 3, name: 3}, - {bits: 5, name: 'imm[4:0]', type: 5}, - {bits: 5, name: 'vs2', type: 2}, + {bits: 5, name: 'imm[4:0]'}, + {bits: 5, name: 'vs2'}, {bits: 1, name: 'vm'}, {bits: 6, name: 'funct6'}, ]} @@ -68,10 +68,10 @@ Formats for Vector Arithmetic Instructions under OP-V major opcode .... 
{reg: [ {bits: 7, name: 0x57, attr: 'OPIVX'}, - {bits: 5, name: 'vd', type: 2}, + {bits: 5, name: 'vd'}, {bits: 3, name: 4}, - {bits: 5, name: 'rs1', type: 4}, - {bits: 5, name: 'vs2', type: 2}, + {bits: 5, name: 'rs1'}, + {bits: 5, name: 'vs2'}, {bits: 1, name: 'vm'}, {bits: 6, name: 'funct6'}, ]} @@ -81,10 +81,10 @@ Formats for Vector Arithmetic Instructions under OP-V major opcode .... {reg: [ {bits: 7, name: 0x57, attr: 'OPFVF'}, - {bits: 5, name: 'vd', type: 2}, + {bits: 5, name: 'vd'}, {bits: 3, name: 5}, - {bits: 5, name: 'rs1', type: 4}, - {bits: 5, name: 'vs2', type: 2}, + {bits: 5, name: 'rs1'}, + {bits: 5, name: 'vs2'}, {bits: 1, name: 'vm'}, {bits: 6, name: 'funct6'}, ]} @@ -94,10 +94,10 @@ Formats for Vector Arithmetic Instructions under OP-V major opcode .... {reg: [ {bits: 7, name: 0x57, attr: 'OPMVX'}, - {bits: 5, name: 'vd / rd', type: 7}, + {bits: 5, name: 'vd / rd'}, {bits: 3, name: 6}, - {bits: 5, name: 'rs1', type: 4}, - {bits: 5, name: 'vs2', type: 2}, + {bits: 5, name: 'rs1'}, + {bits: 5, name: 'vs2'}, {bits: 1, name: 'vm'}, {bits: 6, name: 'funct6'}, ]} diff --git a/src/images/wavedrom/vcfg-format.adoc b/src/images/wavedrom/vcfg-format.edn index ac0353c..0219e6b 100644 --- a/src/images/wavedrom/vcfg-format.adoc +++ b/src/images/wavedrom/vcfg-format.edn @@ -12,10 +12,10 @@ Formats for Vector Configuration Instructions under OP-V major opcode .... {reg: [ {bits: 7, name: 0x57, attr: 'vsetvli'}, - {bits: 5, name: 'rd', type: 4}, + {bits: 5, name: 'rd'}, {bits: 3, name: 7}, - {bits: 5, name: 'rs1', type: 4}, - {bits: 11, name: 'vtypei[10:0]', type: 5}, + {bits: 5, name: 'rs1'}, + {bits: 11, name: 'vtypei[10:0]'}, {bits: 1, name: '0'}, ]} .... @@ -24,10 +24,10 @@ Formats for Vector Configuration Instructions under OP-V major opcode .... 
{reg: [ {bits: 7, name: 0x57, attr: 'vsetivli'}, - {bits: 5, name: 'rd', type: 4}, + {bits: 5, name: 'rd'}, {bits: 3, name: 7}, - {bits: 5, name: 'uimm[4:0]', type: 5}, - {bits: 10, name: 'vtypei[9:0]', type: 5}, + {bits: 5, name: 'uimm[4:0]'}, + {bits: 10, name: 'vtypei[9:0]'}, {bits: 1, name: '1'}, {bits: 1, name: '1'}, ]} @@ -37,10 +37,10 @@ Formats for Vector Configuration Instructions under OP-V major opcode .... {reg: [ {bits: 7, name: 0x57, attr: 'vsetvl'}, - {bits: 5, name: 'rd', type: 4}, + {bits: 5, name: 'rd'}, {bits: 3, name: 7}, - {bits: 5, name: 'rs1', type: 4}, - {bits: 5, name: 'rs2', type: 4}, + {bits: 5, name: 'rs1'}, + {bits: 5, name: 'rs2'}, {bits: 6, name: 0x00}, {bits: 1, name: 1}, ]} diff --git a/src/images/wavedrom/vfrec7.adoc b/src/images/wavedrom/vfrec7.edn index d33f44e..d33f44e 100644 --- a/src/images/wavedrom/vfrec7.adoc +++ b/src/images/wavedrom/vfrec7.edn diff --git a/src/images/wavedrom/vfrsqrt7.adoc b/src/images/wavedrom/vfrsqrt7.edn index 8ebc621..ffb7a96 100644 --- a/src/images/wavedrom/vfrsqrt7.adoc +++ b/src/images/wavedrom/vfrsqrt7.edn @@ -38,7 +38,7 @@ | 0| 31 | 20 | 0| 32 | 19 | 0| 33 | 19 -| 0| 34 | 18 +| 0| 34 | 18 | 0| 35 | 17 | 0| 36 | 16 | 0| 37 | 16 @@ -134,4 +134,4 @@ | 1| 62 | 54 | 1| 63 | 53 -|===
\ No newline at end of file +|=== diff --git a/src/images/wavedrom/vmem-format.adoc b/src/images/wavedrom/vmem-format.edn index f9b25ee..58cc6bf 100644 --- a/src/images/wavedrom/vmem-format.adoc +++ b/src/images/wavedrom/vmem-format.edn @@ -12,9 +12,9 @@ Format for Vector Load Instructions under LOAD-FP major opcode .... {reg: [ {bits: 7, name: 0x7, attr: 'VL* unit-stride'}, - {bits: 5, name: 'vd', attr: 'destination of load', type: 2}, + {bits: 5, name: 'vd', attr: 'destination of load'}, {bits: 3, name: 'width'}, - {bits: 5, name: 'rs1', attr: 'base address', type: 4}, + {bits: 5, name: 'rs1', attr: 'base address'}, {bits: 5, name: 'lumop'}, {bits: 1, name: 'vm'}, {bits: 2, name: 'mop'}, @@ -27,10 +27,10 @@ Format for Vector Load Instructions under LOAD-FP major opcode .... {reg: [ {bits: 7, name: 0x7, attr: 'VLS* strided'}, - {bits: 5, name: 'vd', attr: 'destination of load', type: 2}, + {bits: 5, name: 'vd', attr: 'destination of load'}, {bits: 3, name: 'width'}, - {bits: 5, name: 'rs1', attr: 'base address', type: 4}, - {bits: 5, name: 'rs2', attr: 'stride', type: 4}, + {bits: 5, name: 'rs1', attr: 'base address'}, + {bits: 5, name: 'rs2', attr: 'stride'}, {bits: 1, name: 'vm'}, {bits: 2, name: 'mop'}, {bits: 1, name: 'mew'}, @@ -42,10 +42,10 @@ Format for Vector Load Instructions under LOAD-FP major opcode .... {reg: [ {bits: 7, name: 0x7, attr: 'VLX* indexed'}, - {bits: 5, name: 'vd', attr: 'destination of load', type: 2}, + {bits: 5, name: 'vd', attr: 'destination of load'}, {bits: 3, name: 'width'}, - {bits: 5, name: 'rs1', attr: 'base address', type: 4}, - {bits: 5, name: 'vs2', attr: 'address offsets', type: 2}, + {bits: 5, name: 'rs1', attr: 'base address'}, + {bits: 5, name: 'vs2', attr: 'address offsets'}, {bits: 1, name: 'vm'}, {bits: 2, name: 'mop'}, {bits: 1, name: 'mew'}, @@ -66,9 +66,9 @@ Format for Vector Store Instructions under STORE-FP major opcode .... 
{reg: [ {bits: 7, name: 0x27, attr: 'VS* unit-stride'}, - {bits: 5, name: 'vs3', attr: 'store data', type: 2}, + {bits: 5, name: 'vs3', attr: 'store data'}, {bits: 3, name: 'width'}, - {bits: 5, name: 'rs1', attr: 'base address', type: 4}, + {bits: 5, name: 'rs1', attr: 'base address'}, {bits: 5, name: 'sumop'}, {bits: 1, name: 'vm'}, {bits: 2, name: 'mop'}, @@ -81,10 +81,10 @@ Format for Vector Store Instructions under STORE-FP major opcode .... {reg: [ {bits: 7, name: 0x27, attr: 'VSS* strided'}, - {bits: 5, name: 'vs3', attr: 'store data', type: 2}, + {bits: 5, name: 'vs3', attr: 'store data'}, {bits: 3, name: 'width'}, - {bits: 5, name: 'rs1', attr: 'base address', type: 4}, - {bits: 5, name: 'rs2', attr: 'stride', type: 4}, + {bits: 5, name: 'rs1', attr: 'base address'}, + {bits: 5, name: 'rs2', attr: 'stride'}, {bits: 1, name: 'vm'}, {bits: 2, name: 'mop'}, {bits: 1, name: 'mew'}, @@ -96,10 +96,10 @@ Format for Vector Store Instructions under STORE-FP major opcode .... {reg: [ {bits: 7, name: 0x27, attr: 'VSX* indexed'}, - {bits: 5, name: 'vs3', attr: 'store data', type: 2}, + {bits: 5, name: 'vs3', attr: 'store data'}, {bits: 3, name: 'width'}, - {bits: 5, name: 'rs1', attr: 'base address', type: 4}, - {bits: 5, name: 'vs2', attr: 'address offsets', type: 2}, + {bits: 5, name: 'rs1', attr: 'base address'}, + {bits: 5, name: 'vs2', attr: 'address offsets'}, {bits: 1, name: 'vm'}, {bits: 2, name: 'mop'}, {bits: 1, name: 'mew'}, diff --git a/src/images/wavedrom/vtype-format.adoc b/src/images/wavedrom/vtype-format.edn index 9e6ab34..9e6ab34 100644 --- a/src/images/wavedrom/vtype-format.adoc +++ b/src/images/wavedrom/vtype-format.edn diff --git a/src/images/wavedrom/wfi.adoc b/src/images/wavedrom/wfi.adoc deleted file mode 100644 index 4447b9f..0000000 --- a/src/images/wavedrom/wfi.adoc +++ /dev/null @@ -1,13 +0,0 @@ -// - -[wavedrom, ,svg] - -.... 
-{reg: [
- {bits: 7, name: 'opcode', type: 8, attr: ['7','SYSTEM'],},
- {bits: 5, name: 'rd', type: 2, attr: ['5','0'],},
- {bits: 3, name: 'funct3', type: 8, attr: ['3','PRIV'],},
- {bits: 5, name: 'rs1', type: 4, attr: ['5','0'],},
- {bits: 12, name: 'funct12', type: 8, attr: ['12','WFI',]},
-], config: {bits: 32}}
-....
\ No newline at end of file
diff --git a/src/images/wavedrom/wfi.edn b/src/images/wavedrom/wfi.edn
new file mode 100644
index 0000000..01e3c74
--- /dev/null
+++ b/src/images/wavedrom/wfi.edn
@@ -0,0 +1,13 @@
+//
+
+[wavedrom, ,svg]
+
+....
+{reg: [
+ {bits: 7, name: 'opcode', attr: ['7','SYSTEM'],},
+ {bits: 5, name: 'rd', attr: ['5','0'],},
+ {bits: 3, name: 'funct3', attr: ['3','PRIV'],},
+ {bits: 5, name: 'rs1', attr: ['5','0'],},
+ {bits: 12, name: 'funct12', attr: ['12','WFI',]},
+], config: {bits: 32}}
+....
diff --git a/src/images/wavedrom/zifencei-fetch.adoc b/src/images/wavedrom/zifencei-fetch.adoc
deleted file mode 100644
index 42e0d6f..0000000
--- a/src/images/wavedrom/zifencei-fetch.adoc
+++ /dev/null
@@ -1,12 +0,0 @@
-//# 3 "Zifencei" Instruction-Fetch Fence, Version 2.0
-
-[wavedrom, ,svg]
-....
-{reg: [
- {bits: 7, name: 'opcode', attr: 'MISC-MEM', type: 8},
- {bits: 5, name: 'rd', attr: 0},
- {bits: 3, name: 'funct3', attr: 'FENCE.I', type: 8},
- {bits: 5, name: 'rs1', attr: 0},
- {bits: 12, name: 'func12', attr: 0},
-]}
-....
diff --git a/src/images/wavedrom/zifencei-ff.adoc b/src/images/wavedrom/zifencei-ff.adoc
deleted file mode 100644
index 5ccfae0..0000000
--- a/src/images/wavedrom/zifencei-ff.adoc
+++ /dev/null
@@ -1,12 +0,0 @@
-//# 3 "Zifencei" Instruction-Fetch Fence, Version 2.0
-
-[wavedrom, ,svg]
-....
-{reg: [
- {bits: 7, name: 'opcode', attr: ['7', 'MISC-MEM'], type: 8},
- {bits: 5, name: 'rd', attr: ['5', '0'], type: 2},
- {bits: 3, name: 'funct3', attr: ['3', 'FENCE.I'], type: 8},
- {bits: 5, name: 'rs1', attr: ['5', '0'], type: 4},
- {bits: 12, name: 'funct12', attr: ['12', '0'], type: 8},
-]}
-....
diff --git a/src/images/wavedrom/zifencei-ff.edn b/src/images/wavedrom/zifencei-ff.edn
new file mode 100644
index 0000000..24cf87b
--- /dev/null
+++ b/src/images/wavedrom/zifencei-ff.edn
@@ -0,0 +1,12 @@
+//# 3 "Zifencei" Instruction-Fetch Fence, Version 2.0
+
+[wavedrom, ,svg]
+....
+{reg: [ + {bits: 7, name: 'opcode', attr: ['7', 'MISC-MEM']}, + {bits: 5, name: 'rd', attr: ['5', '0']}, + {bits: 3, name: 'funct3', attr: ['3', 'FENCE.I']}, + {bits: 5, name: 'rs1', attr: ['5', '0']}, + {bits: 12, name: 'funct12', attr: ['12', '0']}, +]} +.... diff --git a/src/images/wavedrom/zihintpause-hint.adoc b/src/images/wavedrom/zihintpause-hint.edn index 4c4a2ed..b1812ce 100644 --- a/src/images/wavedrom/zihintpause-hint.adoc +++ b/src/images/wavedrom/zihintpause-hint.edn @@ -3,9 +3,9 @@ [wavedrom, ,svg] .... {reg: [ - {bits: 7, name: 'opcode', attr: 'MISC-MEM', type: 8}, + {bits: 7, name: 'opcode', attr: 'MISC-MEM'}, {bits: 5, name: 'rd', attr: 0}, - {bits: 3, name: 'funct3', attr: 'PAUSE', type: 8}, + {bits: 3, name: 'funct3', attr: 'PAUSE'}, {bits: 5, name: 'rs1', attr: 0}, {bits: 1, name: 'SW', attr: 0}, {bits: 1, name: 'SR', attr: 0}, @@ -18,4 +18,3 @@ {bits: 4, name: 'fm', attr: 0}, ]} .... - diff --git a/src/indirect-csr.adoc b/src/indirect-csr.adoc index 09e040a..ad0a6b7 100644 --- a/src/indirect-csr.adoc +++ b/src/indirect-csr.adoc @@ -115,7 +115,7 @@ which the `miselect` value is allocated. [NOTE] ==== Ordinarily, each `mireg`*_i_* will access register state, access -read-only 0 state, or raise an illegal instruction exception. +read-only 0 state, or raise an illegal-instruction exception. For RV32, if an extension defines an indirectly accessed register as 64 bits wide, it is recommended that the lower 32 bits of the register are accessed through one of `mireg`, `mireg2`, or `mireg3`, while the upper 32 bits are accessed through `mireg4`, `mireg5`, or `mireg6`, respectively. ==== @@ -170,7 +170,7 @@ holds a value that is not implemented at supervisor level, is UNSPECIFIED. [%unbreakable] [NOTE] ==== -It is recommended that implementations raise an illegal instruction +It is recommended that implementations raise an illegal-instruction exception for such accesses, to facilitate possible emulation (by M-mode) of these accesses. 
==== @@ -194,7 +194,7 @@ allocated. ==== Ordinarily, each `sireg`*_i_* will access register state, access read-only 0 state, or, unless executing in a virtual machine (covered in -the next section), raise an illegal instruction exception. +the next section), raise an illegal-instruction exception. ==== Note that the widths of `siselect` and `sireg*` are always the @@ -257,21 +257,21 @@ most-significant bit of `vsiselect` moves to the new position, retaining its value from before. For alias CSRs `sireg*` and `vsireg*`, the hypervisor extension’s usual -rules for when to raise a virtual instruction exception (based on +rules for when to raise a virtual-instruction exception (based on whether an instruction is HS-qualified) are not applicable. The rules given in this section for `sireg` and `vsireg` apply instead, unless overridden by the requirements specified in the section below, which take precedence over this section when extension Smstateen is also implemented. -A virtual instruction exception is raised for attempts from VS-mode or VU-mode to directly access `vsiselect` or `vsireg*`, or attempts from VU-mode to access `siselect` or `sireg*`. - -The behavior upon accessing `vsireg*` from M-mode or HS-mode, or accessing `sireg*` (really `vsireg*`) from VS-mode, while `vsiselect` holds a value that is not implemented at HS level, is UNSPECIFIED. +A virtual-instruction exception is raised for attempts from VS-mode or VU-mode to directly access `vsiselect` or `vsireg*`, or attempts from VU-mode to access `siselect` or `sireg*`. + +The behavior upon accessing `vsireg*` from M-mode or HS-mode, or accessing `sireg*` (really `vsireg*`) from VS-mode, while `vsiselect` holds a value that is not implemented at HS level, is UNSPECIFIED. [%unbreakable] [NOTE] ==== -It is recommended that implementations raise an illegal instruction exception for such accesses, to facilitate possible emulation (by M-mode) of these accesses. 
+It is recommended that implementations raise an illegal-instruction exception for such accesses, to facilitate possible emulation (by M-mode) of these accesses. ==== Otherwise, while `vsiselect` holds a number in a standard-defined and @@ -284,7 +284,7 @@ allocated. [%unbreakable] [NOTE] ==== -Ordinarily, each `vsireg`*_i_* will access register state, access read-only 0 state, or raise an exception (either an illegal instruction exception or, for select accesses from VS-mode, a virtual instruction exception). When `vsiselect` holds a value that is implemented at HS level but not at VS level, attempts to access `sireg*` (really `vsireg*`) from VS-mode will typically raise a virtual instruction exception. But there may be cases specific to an extension where different behavior is more appropriate. +Ordinarily, each `vsireg`*_i_* will access register state, access read-only 0 state, or raise an exception (either an illegal-instruction exception or, for select accesses from VS-mode, a virtual-instruction exception). When `vsiselect` holds a value that is implemented at HS level but not at VS level, attempts to access `sireg*` (really `vsireg*`) from VS-mode will typically raise a virtual-instruction exception. But there may be cases specific to an extension where different behavior is more appropriate. ==== Like `siselect` and `sireg*`, the widths of `vsiselect` and `vsireg*` are always @@ -299,7 +299,7 @@ If extension Smstateen is implemented together with Smcsrind, bit 60 of state-enable register `mstateen0` controls access to `siselect`, `sireg*`, `vsiselect`, and `vsireg*`. When `mstateen0`[60]=0, an attempt to access one of these CSRs from a privilege mode less privileged than M-mode results -in an illegal instruction exception. As always, the state-enable CSRs do +in an illegal-instruction exception. As always, the state-enable CSRs do not affect the accessibility of any state when in M-mode, only in less privileged modes. 
For more explanation, see the documentation for extension @@ -308,7 +308,7 @@ https://github.com/riscv/riscv-state-enable/releases/download/v1.0.0/Smstateen.p Other extensions may specify that certain mstateen bits control access to registers accessed indirectly through `siselect` + `sireg*`, and/or `vsiselect` + `vsireg*`. However, regardless of any other mstateen bits, if -`mstateen0`[60] = 1, a virtual instruction exception is raised as +`mstateen0`[60] = 1, a virtual-instruction exception is raised as described in the previous section for all attempts from VS-mode or VU-mode to directly access `vsiselect` or `vsireg*`, and for all attempts from VU-mode to access `siselect` or `sireg*`. @@ -318,8 +318,8 @@ in hypervisor CSR `hstateen0`, but controls access to only `siselect` and `sireg (really `vsiselect` and `vsireg*`), which is the state potentially accessible to a virtual machine executing in VS or VU-mode. When `hstateen0`[60]=0 and `mstateen0`[60]=1, all attempts from VS or VU-mode to -access `siselect` or `sireg*` raise a virtual instruction exception, not an -illegal instruction exception, regardless of the value of `vsiselect` or +access `siselect` or `sireg*` raise a virtual-instruction exception, not an +illegal-instruction exception, regardless of the value of `vsiselect` or any other mstateen bit. Extension Ssstateen is defined as the supervisor-level view of diff --git a/src/intro.adoc b/src/intro.adoc index 6fc871b..8d254bf 100644 --- a/src/intro.adoc +++ b/src/intro.adoc @@ -33,7 +33,7 @@ efficiency. * An ISA that simplifies experiments with new privileged architecture designs. -[TIP] +[NOTE] ==== Commentary on our design decisions is formatted as in this paragraph. This non-normative text can be skipped if the reader is only interested @@ -64,7 +64,7 @@ volume provides the design of the first ("classic") privileged architecture. The manuals use IEC 80000-13:2008 conventions, with a byte of 8 bits. 
-[TIP] +[NOTE] ==== In the unprivileged ISA design, we tried to remove any dependence on particular microarchitectural features, such as cache line size, or on @@ -144,7 +144,7 @@ environments for guest operating systems. harts on an underlying x86 system, and which can provide either a user-level or a supervisor-level execution environment. -[TIP] +[NOTE] ==== A bare hardware platform can be considered to define an EEI, where the accessible harts, memory, and other devices populate the environment, @@ -172,11 +172,11 @@ responsibility ends if the hart is terminated. The following events constitute forward progress: * The retirement of an instruction. -* A trap, as defined in <<trap-defn, Section 1.6>>. +* A trap, as defined in <<trap-defn>>. * Any other event defined by an extension to constitute forward progress. -[TIP] +[NOTE] ==== The term hart was introduced in the work on Lithe cite:[lithe-pan-hotpar09] and cite:[lithe-pan-pldi10] to provide a term to represent an abstract execution resource as opposed to a software thread @@ -221,24 +221,22 @@ integer variants, RV32I and RV64I, described in <<rv32>> and <<rv64>>, which provide 32-bit or 64-bit address spaces respectively. We use the term XLEN to refer to the width of an integer register in bits (either 32 or 64). -<<rv32e, Chapter 6>> describes the RV32E and RV64E subset variants of the +<<rv32e>> describes the RV32E and RV64E subset variants of the RV32I or RV64I base instruction sets respectively, which have been added to support small microcontrollers, and which have half the number of integer registers. -<<rv128, Chapter 8>> sketches a future RV128I variant of the -base integer instruction set supporting a flat 128-bit address space -(XLEN=128). The base integer instruction sets use a two's-complement +The base integer instruction sets use a two's-complement representation for signed integer values. 
-[TIP] +[NOTE] ==== Although 64-bit address spaces are a requirement for larger systems, we believe 32-bit address spaces will remain adequate for many embedded and client devices for decades to come and will be desirable to lower memory traffic and energy consumption. In addition, 32-bit address spaces are sufficient for educational purposes. A larger flat 128-bit address space -might eventually be required, so we ensured this could be accommodated -within the RISC-V ISA framework. +might eventually be required and could be accommodated with a new RV128I +base ISA within the existing RISC-V ISA framework. ==== [NOTE] @@ -382,7 +380,7 @@ harts may be entirely the same, or entirely different, or may be partly different but sharing some subset of resources, mapped into the same or different address ranges. -[TIP] +[NOTE] ==== For a purely "bare metal" environment, all harts may see an identical address space, accessed entirely by physical addresses. However, when @@ -470,51 +468,11 @@ multiple of IALIGN. For implementations supporting only a base instruction set, ILEN is 32 bits. Implementations supporting longer instructions have larger values of ILEN. -<<instlengthcode>> illustrates the standard -RISC-V instruction-length encoding convention. All the 32-bit +All the 32-bit instructions in the base ISA have their lowest two bits set to `11`. The optional compressed 16-bit instruction-set extensions have their lowest two bits equal to `00`, `01`, or `10`. -==== Expanded Instruction-Length Encoding -A portion of the 32-bit instruction-encoding space has been tentatively -allocated for instructions longer than 32 bits. The entirety of this -space is reserved at this time, and the following proposal for encoding -instructions longer than 32 bits is not considered frozen. 
-(((instruction length encoding))) - -Standard instruction-set extensions encoded with more than 32 bits have -additional low-order bits set to `1`, with the conventions for 48-bit -and 64-bit lengths shown in -<<instlengthcode>>. Instruction lengths -between 80 bits and 176 bits are encoded using a 3-bit field in bits -[14:12] giving the number of 16-bit words in addition to the first -5latexmath:[$\times$]16-bit words. The encoding with bits [14:12] set to -"111" is reserved for future longer instruction encodings. - -[[instlengthcode]] -.RISC-V instruction length encoding. Only the 16-bit and 32-bit encodings are considered frozen at this time. -[%autowidth,cols="^2,^2,^3,^3,<4"] -|=== -||||xxxxxxxxxxxxxxaa |16-bit (aa≠11) - -|||xxxxxxxxxxxxxxxx |xxxxxxxxxxxbbb11 |32-bit (bbb≠111) - -||latexmath:[$\cdot\cdot\cdot$]xxxx |xxxxxxxxxxxxxxxx -|xxxxxxxxxx011111 |48-bit - -||latexmath:[$\cdot\cdot\cdot$]xxxx |xxxxxxxxxxxxxxxx -|xxxxxxxxx0111111 |64-bit - -||latexmath:[$\cdot\cdot\cdot$]xxxx |xxxxxxxxxxxxxxxx -|xnnnxxxxx1111111 |(80+16*nnn)-bit, nnn≠111 - -||latexmath:[$\cdot\cdot\cdot$]xxxx |xxxxxxxxxxxxxxxx -|x111xxxxx1111111 |Reserved for ≥192-bits - -|Byte Address: >|base+4 >|base+2 >|base | -|=== - [NOTE] ==== Given the code size and energy savings of a compressed format, we wanted @@ -541,9 +499,7 @@ As described in <<extending>>, an implementation that does not require support for the standard compressed instruction extension can map 3 additional non-conforming 30-bit instruction spaces into the 32-bit fixed-width format, while preserving support for standard ≥32-bit instruction-set -extensions. Further, if the implementation also does not need -instructions >32-bits in length, it can recover a further -four major opcodes for non-conforming extensions. +extensions. ==== Encodings with bits [15:0] all zeros are defined as illegal @@ -552,7 +508,7 @@ instructions. These instructions are considered to be of minimal length: bits. 
The encoding with bits [ILEN-1:0] all ones is also illegal; this instruction is considered to be ILEN bits long. -[TIP] +[NOTE] ==== We consider it a feature that any length of instruction containing all zero bits is not legal, as this quickly traps erroneous jumps into @@ -587,7 +543,7 @@ instruction specification. (((bi-endian))) (((endian, bi-))) -[TIP] +[NOTE] ==== We originally chose little-endian byte ordering for the RISC-V memory system because little-endian systems are currently dominant commercially diff --git a/src/j-st-ext.adoc b/src/j-st-ext.adoc deleted file mode 100644 index 68c1d0d..0000000 --- a/src/j-st-ext.adoc +++ /dev/null @@ -1,11 +0,0 @@ -[[j-extendj]] -== "J" Extension for Dynamically Translated Languages, Version 0.0 - -This chapter is a placeholder for a future standard extension to support -dynamically translated languages. -[NOTE] -==== -Many popular languages are usually implemented via dynamic translation, -including Java and Javascript. These languages can benefit from -additional ISA support for dynamic checks and garbage collection. -==== diff --git a/src/m-st-ext.adoc b/src/m-st-ext.adoc index fc08be2..1e63dac 100644 --- a/src/m-st-ext.adoc +++ b/src/m-st-ext.adoc @@ -1,11 +1,11 @@ [[mstandard]] -== "M" Extension for Integer Multiplication and Division, Version 2.0 +== `M` Extension for Integer Multiplication and Division, Version 2.0 This chapter describes the standard integer multiplication and division -instruction extension, which is named "M" and contains instructions +instruction extension, which is named `M` and contains instructions that multiply or divide values held in two integer registers. -[TIP] +[NOTE] ==== We separate integer multiply and divide out from the base to simplify low-end implementations, or for applications where integer multiply and @@ -15,7 +15,7 @@ accelerators. 
=== Multiplication Operations -include::images/wavedrom/m-st-ext-for-int-mult.adoc[] +include::images/wavedrom/m-st-ext-for-int-mult.edn[] [[m-st-ext-for-int-mult]] //.Multiplication operation instructions (((MUL, MULH))) @@ -23,12 +23,12 @@ include::images/wavedrom/m-st-ext-for-int-mult.adoc[] (((MUL, MULHSU))) MUL performs an XLEN-bit×XLEN-bit multiplication of -_rs1_ by _rs2_ and places the lower XLEN bits in the destination +`rs1` by `rs2` and places the lower XLEN bits in the destination register. MULH, MULHU, and MULHSU perform the same multiplication but return the upper XLEN bits of the full 2×XLEN-bit product, for signed×signed, -unsigned×unsigned, and _rs1_×unsigned _rs2_ multiplication, respectively. -If both the high and low bits of the same product are required, then the recommended code sequence is: MULH[[S]U] _rdh, rs1, rs2_; MUL _rdl, rs1, rs2_ (source register specifiers must be in same order and _rdh_ cannot be the same as _rs1_ or _rs2_). Microarchitectures can then fuse these into a single multiply operation instead of performing two separate multiplies. +unsigned×unsigned, and `rs1`×unsigned `rs2` multiplication. +If both the high and low bits of the same product are required, then the recommended code sequence is: `MULH[[S]U] rdh, rs1, rs2; MUL rdl, rs1, rs2` (source register specifiers must be in same order and `rdh` cannot be the same as `rs1` or `rs2`). Microarchitectures can then fuse these into a single multiply operation instead of performing two separate multiplies. [NOTE] ==== @@ -52,14 +52,14 @@ to shift both arguments left by 32 bits, then use MULH[[S]U]. === Division Operations -include::images/wavedrom/division-op.adoc[] +include::images/wavedrom/division-op.edn[] [[division-op]] //.Division operation instructions (((MUL, DIV))) (((MUL, DIVU))) DIV and DIVU perform an XLEN bits by XLEN bits signed and unsigned -integer division of _rs1_ by _rs2_, rounding towards zero. 
REM and REMU +integer division of `rs1` by `rs2`, rounding towards zero. REM and REMU provide the remainder of the corresponding division operation. For REM, the sign of a nonzero result equals the sign of the dividend. @@ -71,17 +71,17 @@ latexmath:[$\textrm{dividend} = \textrm{divisor} \times \textrm{quotient} + \tex ==== If both the quotient and remainder are required from the same division, -the recommended code sequence is: DIV[U] _rdq, rs1, rs2_; REM[U] _rdr, -rs1, rs2_ (_rdq_ cannot be the same as _rs1_ or _rs2_). +the recommended code sequence is: `DIV[U] rdq, rs1, rs2; REM[U] rdr,` +`rs1, rs2` (`rdq` cannot be the same as `rs1` or `rs2`). Microarchitectures can then fuse these into a single divide operation instead of performing two separate divides. DIVW and DIVUW are RV64 instructions that divide the lower 32 bits of -_rs1_ by the lower 32 bits of _rs2_, treating them as signed and -unsigned integers respectively, placing the 32-bit quotient in _rd_, +`rs1` by the lower 32 bits of `rs2`, treating them as signed and +unsigned integers, placing the 32-bit quotient in `rd`, sign-extended to 64 bits. REMW and REMUW are RV64 instructions that -provide the corresponding signed and unsigned remainder operations -respectively. Both REMW and REMUW always sign-extend the 32-bit result +provide the corresponding signed and unsigned remainder operations. Both +REMW and REMUW always sign-extend the 32-bit result to 64 bits, including on a divide by zero. (((MUL, div by zero))) @@ -103,7 +103,7 @@ overflow cannot occur. 
Overflow (signed only) |latexmath:[$x$] + latexmath:[$-2^{L-1}$] |0 + latexmath:[$-1$] |latexmath:[$2^{L}-1$] + - - |latexmath:[$x$] + + - |latexmath:[$x$] + - |latexmath:[$-1$] + latexmath:[$-2^{L-1}$] + |latexmath:[$x$] + @@ -113,13 +113,13 @@ latexmath:[$-1$] |latexmath:[$2^{L}-1$] + //|Overflow (signed only) |latexmath:[$-2^{L-1}$] |latexmath:[$-1$] |– |– |latexmath:[$-2^{L-1}$] |0 //|=== -[TIP] +[NOTE] ==== We considered raising exceptions on integer divide by zero, with these exceptions causing a trap in most execution environments. However, this would be the only arithmetic trap in the standard ISA (floating-point exceptions set flags and write default values, but do not cause traps) -and would require language implementors to interact with the execution +and would require language implementers to interact with the execution environment's trap handlers for this case. Further, where language standards mandate that a divide-by-zero exception must cause an immediate control flow change, only a single branch instruction needs to @@ -136,18 +136,18 @@ unsigned division circuit and specifying the same overflow result simplifies the hardware. ==== -=== Zmmul Extension, Version 1.0 +=== `Zmmul` Extension, Version 1.0 -The Zmmul extension implements the multiplication subset of the M +The `Zmmul` extension implements the multiplication subset of the M extension. It adds all of the instructions defined in <<Multiplication Operations>>, namely: MUL, MULH, MULHU, MULHSU, and (for RV64 only) MULW. The encodings are identical to those -of the corresponding M-extension instructions. M implies Zmmul. +of the corresponding M-extension instructions. `M` implies `Zmmul`. (((MUL, Zmmul))) [NOTE] ==== -The *Zmmul* extension enables low-cost implementations that require +The `Zmmul` extension enables low-cost implementations that require multiplication operations but not division. 
For many microcontroller applications, division operations are too infrequent to justify the cost of divider hardware. By contrast, multiplication operations are more @@ -156,4 +156,3 @@ Simple FPGA soft cores particularly benefit from eliminating division but retaining multiplication, since many FPGAs provide hardwired multipliers but require dividers be implemented in soft logic. ==== - diff --git a/src/machine.adoc b/src/machine.adoc index 8358279..79bdb82 100644 --- a/src/machine.adoc +++ b/src/machine.adoc @@ -16,7 +16,7 @@ In addition to the machine-level CSRs described in this section, M-mode code can access all CSRs at lower privilege levels. [[misa]] -==== Machine ISA (`misa`) Register +==== Machine ISA (`misa`) Register The `misa` CSR is a *WARL* read-write register reporting the ISA supported by the hart. This register must be readable in any implementation, but a value of zero can be returned to indicate the `misa` register has not been implemented, requiring that CPU capabilities be determined through a separate non-standard mechanism. @@ -39,7 +39,7 @@ less-privileged modes. 3 |32 + 64 + -128 +_Reserved_ |=== The `misa` CSR is MXLEN bits wide. @@ -53,16 +53,15 @@ knowing the register width (MXLEN) of the hart. The base width is given by __MXLEN=2^MXL+4^__. The base width can also be found if `misa` is zero, by placing the -immediate 4 in a register, then shifting the register left by 31 bits at -a time. If zero after one shift, then the hart is RV32. If zero after -two shifts, then the hart is RV64, else RV128. +immediate 2 in a register, then shifting the register left by 31 bits. +If zero, the hart is RV32, else it is RV64. ==== The Extensions field encodes the presence of the standard extensions, with a single bit per letter of the alphabet (bit 0 encodes presence of extension "A" , bit 1 encodes presence of extension "B", through to -bit 25 which encodes "Z"). 
The "I" bit will be set for RV32I, RV64I, -and RV128I base ISAs, and the "E" bit will be set for RV32E and RV64E. The +bit 25 which encodes "Z"). The "I" bit will be set for the RV32I and RV64I +base ISAs, and the "E" bit will be set for RV32E and RV64E. The Extensions field is a *WARL* field that can contain writable bits where the implementation allows the supported ISA to be modified. At reset, the Extensions field shall contain the maximal set of supported extensions, @@ -143,7 +142,7 @@ K + L + M + N + -O + +O + P + Q + R + @@ -163,7 +162,7 @@ RV32E/64E base ISA + Single-precision floating-point extension + _Reserved_ + Hypervisor extension + -RV32I/64I/128I base ISA + +RV32I/64I base ISA + _Reserved_ + _Reserved_ + _Reserved_ + @@ -183,18 +182,24 @@ _Reserved_ + _Reserved_ |=== -The design of the RV128I base ISA is not yet complete, and while much of -the remainder of this specification is expected to apply to RV128, this -version of the document focuses only on RV32 and RV64. +The "X" bit will be set if there are any non-standard extensions. + +When the "B" bit is 1, the implementation supports the instructions provided by the +Zba, Zbb, and Zbs extensions. When the "B" bit is 0, it indicates that the +implementation might not support one or more of the Zba, Zbb, or Zbs extensions. -The "U" and "S" bits will be set if there is support for user and -supervisor modes respectively. +When the "M" bit is 1, the implementation supports all multiply and +division instructions defined by the M extension. When the "M" bit +is 0, it indicates that the implementation might not support those +instructions. However if the Zmmul extension is supported then +the multiply instructions it specifies are supported irrespective +of the value of the "M" bit. -The "X" bit will be set if there are any non-standard extensions. +When the "S" bit is 1, the implementation supports supervisor mode. +When the "S" bit is 0, the implementation might not support supervisor mode. 
-When "B" bit is 1, the implementation supports the instructions provided by the -Zba, Zbb, and Zbs extensions. When "B" bit is 0, it indicates that the -implementation may not support one or more of the Zba, Zbb, or Zbs extensions. +When the "U" bit is 1, the implementation supports user mode. +When the "U" bit is 0, the implementation might not support user mode. [NOTE] ==== @@ -219,6 +224,8 @@ If an ISA feature _x_ depends on an ISA feature _y_, then attempting to enable feature _x_ but disable feature _y_ results in both features being disabled. For example, setting "F"=0 and "D"=1 results in both "F" and "D" being cleared. +Similarly, setting "U"=0 and "S"=1" results in both "U" and "S" being +cleared. An implementation may impose additional constraints on the collective setting of two or more `misa` fields, in which case they function @@ -239,7 +246,7 @@ corresponding feature is not implemented. This follows from the fact that, when a feature is not implemented, the corresponding opcodes and CSRs become reserved, not necessarily illegal. -==== Machine Vendor ID (`mvendorid`) Register +==== Machine Vendor ID (`mvendorid`) Register The `mvendorid` CSR is a 32-bit read-only register providing the JEDEC manufacturer ID of the provider of the core. This register must be @@ -251,7 +258,7 @@ implementation. //image::png/mvendorid.png[align="center"] .Vendor ID register (`mvendorid`) -include::images/bytefield/mvendorid.adoc[] +include::images/bytefield/mvendorid.edn[] JEDEC manufacturer IDs are ordinarily encoded as a sequence of one-byte continuation codes `0x7f`, terminated by a one-byte ID not equal to @@ -276,7 +283,7 @@ manufacturer ID standard. At time of writing, registering a manufacturer ID with JEDEC has a one-time cost of $500. ==== -==== Machine Architecture ID (`marchid`) Register +==== Machine Architecture ID (`marchid`) Register The `marchid` CSR is an MXLEN-bit read-only register encoding the base microarchitecture of the hart. 
This register must be readable in any @@ -285,8 +292,8 @@ is not implemented. The combination of `mvendorid` and `marchid` should uniquely identify the type of hart microarchitecture that is implemented. -.Machine Architecture ID (`marchid`) register -include::images/bytefield/marchid.adoc[] +.Machine Architecture ID (`marchid`) register +include::images/bytefield/marchid.edn[] Open-source project architecture IDs are allocated globally by RISC-V International, and have non-zero architecture IDs with a zero @@ -297,7 +304,7 @@ cannot contain zero in the remaining MXLEN-1 bits. [NOTE] ==== The intent is for the architecture ID to represent the microarchitecture -associated with the repo around which development occurs rather than a +associated with the project around which development occurs rather than a particular organization. Commercial fabrications of open-source designs should (and might be required by the license to) retain the original architecture ID. This will aid in reducing fragmentation and tool @@ -315,7 +322,7 @@ organization. The `misa` register also helps distinguish different variants of a design. ==== -==== Machine Implementation ID (`mimpid`) Register +==== Machine Implementation ID (`mimpid`) Register The `mimpid` CSR provides a unique encoding of the version of the processor implementation. This register must be readable in any @@ -323,8 +330,8 @@ implementation, but a value of 0 can be returned to indicate that the field is not implemented. The Implementation value should reflect the design of the RISC-V processor itself and not any surrounding system. -.Machine Implementation ID (`mimpid`) register -include::images/bytefield/mimpid.adoc[] +.Machine Implementation ID (`mimpid`) register +include::images/bytefield/mimpid.edn[] [NOTE] ==== @@ -336,7 +343,7 @@ most-significant nibble down) with subfields aligned on nibble boundaries to ease human readability. 
==== -==== Hart ID (`mhartid`) Register +==== Hart ID (`mhartid`) Register The `mhartid` CSR is an MXLEN-bit read-only register containing the integer ID of the hardware thread running the code. This register must @@ -345,8 +352,8 @@ numbered contiguously in a multiprocessor system, but at least one hart must have a hart ID of zero. Hart IDs must be unique within the execution environment. -.Hart ID (`mhartid`) register -include::images/bytefield/mhartid.adoc[] +.Hart ID (`mhartid`) register +include::images/bytefield/mhartid.edn[] [NOTE] ==== @@ -357,7 +364,7 @@ For efficiency, system implementers should aim to reduce the magnitude of the largest hart ID used in a system. ==== -==== Machine Status (`mstatus` and `mstatush`) Registers +==== Machine Status (`mstatus` and `mstatush`) Registers The `mstatus` register is an MXLEN-bit read/write register formatted as shown in <<mstatusreg-rv32>> for RV32 and @@ -368,93 +375,18 @@ S-level ISA. [[mstatusreg-rv32]] .Machine-mode status (`mstatus`) register for RV32 -[wavedrom, ,svg] -.... -{reg: [ - {bits: 1, name: 'WPRI'}, - {bits: 1, name: 'SIE'}, - {bits: 1, name: 'WPRI'}, - {bits: 1, name: 'MIE'}, - {bits: 1, name: 'WPRI'}, - {bits: 1, name: 'SPIE'}, - {bits: 1, name: 'UBE'}, - {bits: 1, name: 'MPIE'}, - {bits: 1, name: 'SPP'}, - {bits: 2, name: 'VS[1:0]'}, - {bits: 2, name: 'MPP[1:0]'}, - {bits: 2, name: 'FS[1:0]'}, - {bits: 2, name: 'XS[1:0]'}, - {bits: 1, name: 'MPRV'}, - {bits: 1, name: 'SUM'}, - {bits: 1, name: 'MXR'}, - {bits: 1, name: 'TVM'}, - {bits: 1, name: 'TW'}, - {bits: 1, name: 'TSR'}, - {bits: 1, name: 'SPELP'}, - {bits: 7, name: 'WPRI'}, - {bits: 1, name: 'SD'}, -], config:{lanes: 2, hspace:1024}} -.... +include::images/wavedrom/mstatusreg-rv321.edn[] [[mstatusreg]] .Machine-mode status (`mstatus`) register for RV64 -[wavedrom, ,svg] -.... 
-{reg: [ - {bits: 1, name: 'WPRI'}, - {bits: 1, name: 'SIE'}, - {bits: 1, name: 'WPRI'}, - {bits: 1, name: 'MIE'}, - {bits: 1, name: 'WPRI'}, - {bits: 1, name: 'SPIE'}, - {bits: 1, name: 'UBE'}, - {bits: 1, name: 'MPIE'}, - {bits: 1, name: 'SPP'}, - {bits: 2, name: 'VS[1:0]'}, - {bits: 2, name: 'MPP[1:0]'}, - {bits: 2, name: 'FS[1:0]'}, - {bits: 2, name: 'XS[1:0]'}, - {bits: 1, name: 'MPRV'}, - {bits: 1, name: 'SUM'}, - {bits: 1, name: 'MXR'}, - {bits: 1, name: 'TVM'}, - {bits: 1, name: 'TW'}, - {bits: 1, name: 'TSR'}, - {bits: 1, name: 'SPELP'}, - {bits: 8, name: 'WPRI'}, - {bits: 2, name: 'UXL[1:0]'}, - {bits: 2, name: 'SXL[1:0]'}, - {bits: 1, name: 'SBE'}, - {bits: 1, name: 'MBE'}, - {bits: 1, name: 'GVA'}, - {bits: 1, name: 'MPV'}, - {bits: 1, name: 'WPRI'}, - {bits: 1, name: 'MPELP'}, - {bits: 1, name: 'MDT'}, - {bits: 20, name: 'WPRI'}, - {bits: 1, name: 'SD'}, -], config:{lanes: 4, hspace:1024}} -.... +include::images/wavedrom/mstatusreg.edn[] For RV32 only, `mstatush` is a 32-bit read/write register formatted as shown in <<mstatushreg>>. Bits 30:4 of `mstatush` generally contain the same fields found in bits 62:36 of `mstatus` for RV64. Fields SD, SXL, and UXL do not exist in `mstatush`. [[mstatushreg]] .Additional machine-mode status (`mstatush`) register for RV32. -[wavedrom, ,svg] -.... -{reg: [ - {bits: 4, name: 'WPRI'}, - {bits: 1, name: 'SBE'}, - {bits: 1, name: 'MBE'}, - {bits: 1, name: 'GVA'}, - {bits: 1, name: 'MPV'}, - {bits: 1, name: 'WPRI'}, - {bits: 1, name: 'MPELP'}, - {bits: 1, name: 'MDT'}, - {bits: 21, name: 'WPRI'}, -], config:{lanes: 2, hspace:1024}} -.... +include::images/wavedrom/mstatushreg.edn[] [[privstack]] ===== Privilege and Global Interrupt-Enable Stack in `mstatus` register @@ -481,6 +413,7 @@ setting of the global __y__IE bit for the higher-privilege mode. Higher-privilege-level code can use separate per-interrupt enable bits to disable selected higher-privilege-mode interrupts before ceding control to a lower-privilege mode. 
+If supervisor mode is not implemented, then SIE and SPIE are read-only 0. [NOTE] ==== @@ -573,9 +506,11 @@ by the same write (For RV32, the `MDT` bit is in `mstatush` and the `MIE` bit in When a trap is to be taken into M-mode, if the `MDT` bit is currently 0, it is then set to 1, and the trap is delivered as expected. However, if `MDT` is -already set to 1, then this is an _unexpected trap_. Additionally, when the -Smrnmi extension is implemented, a trap that occurs when executing in M-mode -with the `mnstatus.NMIE` set to 0 is an _unexpected trap_. +already set to 1, then this is an _unexpected trap_. When the Smrnmi extension +is implemented, a trap caused by an RNMI is not considered an _unexpected trap_ +irrespective of the state of the `MDT` bit. A trap caused by an RNMI does not +set the `MDT` bit. However, a trap that occurs when executing in M-mode with +`mnstatus.NMIE` set to 0 is an _unexpected trap_. In the event of a _unexpected trap_, the handling is as follows: @@ -612,6 +547,10 @@ The `MRET` and `SRET` instructions, when executed in M-mode, set the `MDT` bit to 0. If the new privilege mode is U, VS, or VU, then `sstatus.SDT` is also set to 0. Additionally, if it is VU, then `vsstatus.SDT` is also set to 0. +The `MNRET` instruction, provided by the Smrnmi extension, sets the `MDT` bit to +0 if the new privilege mode is not M. If it is U, VS, or VU, then `sstatus.SDT` is +also set to 0. Additionally, if it is VU, then `vsstatus.SDT` is also set to 0. + [[xlen-control]] ===== Base ISA Control in `mstatus` Register @@ -656,6 +595,21 @@ always be a software bug, but machine operation is well-defined even in this case. ==== +Some HINT instructions are encoded as integer computational instructions that +overwrite their destination register with its current value, e.g., +`c.addi x8, 0`. 
+When such a HINT is executed with XLEN < MXLEN and bits MXLEN..XLEN of the +destination register not all equal to bit XLEN-1, it is implementation-defined +whether bits MXLEN..XLEN of the destination register are unchanged or are +overwritten with copies of bit XLEN-1. + +NOTE: This definition allows implementations to elide register write-back for +some HINTs, while allowing them to execute other HINTs in the same manner as +other integer computational instructions. +The implementation choice is observable only by privilege modes with an XLEN +setting greater than the current XLEN; it is invisible to the current +privilege mode. + ===== Memory Privilege in `mstatus` Register The MPRV (Modify PRiVilege) bit modifies the _effective privilege mode_, @@ -1098,8 +1052,7 @@ Dirty [width=75,align=center,float=center,cols="<,<,<,<,<"] |=== -5+^|Execute instruction that possibly modifies state, including -configuration +5+^|Execute instruction that possibly modifies state, including configuration |Action? + Next state @@ -1182,7 +1135,7 @@ additional microarchitectural bits might be maintained in the extension to further reduce context save and restore overhead. The SD bit is read-only and is set when either the FS, VS, or XS bits -encode a Dirty state (i.e., SD=((FS==11) OR (XS==11) OR (VS==11))). This +encode a Dirty state (i.e., `SD=(FS==0b11 OR XS==0b11 OR VS==0b11)`). This allows privileged code to quickly determine when no additional context save is required beyond the integer register set and `pc`. @@ -1216,7 +1169,7 @@ The Zicfilp extension adds the `SPELP` and `MPELP` fields that hold the previous * 0 - `NO_LP_EXPECTED` - no landing pad instruction expected. * 1 - `LP_EXPECTED` - a landing pad instruction is expected. 
-==== Machine Trap-Vector Base-Address (`mtvec`) Register +==== Machine Trap-Vector Base-Address (`mtvec`) Register The `mtvec` register is an MXLEN-bit *WARL* read/write register that holds trap vector configuration, consisting of a vector base address (BASE) @@ -1224,13 +1177,16 @@ and a vector mode (MODE). .Encoding of mtvec MODE field. -include::images/bytefield/mtvec.adoc[] +include::images/bytefield/mtvec.edn[] The `mtvec` register must always be implemented, but can contain a read-only value. If `mtvec` is writable, the set of values the register may hold can vary by implementation. The value in the BASE field must always be aligned on a 4-byte boundary, and the MODE setting may impose additional alignment constraints on the value in the BASE field. +Note that the CSR contains only bits XLEN-1 through 2 of the address BASE. +When used as an address, the lower two bits are filled with zeroes to obtain +an XLEN-bit address that is always aligned on a 4-byte boundary. [NOTE] ==== @@ -1281,7 +1237,7 @@ implemented without a hardware adder circuit. Reset and NMI vector locations are given in a platform specification. ==== -==== Machine Trap Delegation (`medeleg` and `mideleg`) Registers +==== Machine Trap Delegation (`medeleg` and `mideleg`) Registers By default, all traps at any privilege level are handled in machine mode, though a machine-mode handler can redirect traps back to the @@ -1352,7 +1308,7 @@ is clear, STIs can be taken in any mode and regardless of current mode will transfer control to M-mode. .Machine Exception Delegation (`medeleg`) register. -include::images/bytefield/medeleg.adoc[] +include::images/bytefield/medeleg.edn[] `medeleg` has a bit position allocated for every synchronous exception shown in <<mcauses>>, with the index of the @@ -1365,7 +1321,7 @@ that aliases bits 63:32 of `medeleg`. The `medelegh` register does not exist when XLEN=64. .Machine Interrupt Delegation (`mideleg`) Register. 
-include::images/bytefield/mideleg.adoc[] +include::images/bytefield/mideleg.edn[] `mideleg` holds trap delegation bits for individual interrupts, with the layout of bits matching those in the `mip` register (i.e., STIP @@ -1377,7 +1333,7 @@ corresponding `medeleg` bits should be read-only zero. In particular, The `medeleg`[16] is read-only zero as double trap is not delegatable. -==== Machine Interrupt (`mip` and `mie`) Registers +==== Machine Interrupt (`mip` and `mie`) Registers The `mip` register is an MXLEN-bit read/write register containing information on pending interrupts, while `mie` is the corresponding @@ -1391,10 +1347,10 @@ NOTE: Interrupts designated for platform use may be designated for custom use at the platform's discretion. .Machine Interrupt-Pending (`mip`) register. -include::images/bytefield/mideleg.adoc[] +include::images/bytefield/mideleg.edn[] .Machine Interrupt-Enable (`mie`) register -include::images/bytefield/mideleg.adoc[] +include::images/bytefield/mideleg.edn[] An interrupt _i_ will trap to M-mode (causing the privilege mode to change to M-mode) if all of the following are true: (a) either the @@ -1428,11 +1384,11 @@ formatted as shown in <<mipreg-standard>> and <<miereg-standard>> respectively. [[mipreg-standard]] .Standard portion (bits 15:0) of `mip`. -include::images/bytefield/mipreg-standard.adoc[] +include::images/bytefield/mipreg-standard.edn[] [[miereg-standard]] .Standard portion (bits 15:0) of `mie`. -include::images/bytefield/miereg-standard.adoc[] +include::images/bytefield/miereg-standard.edn[] [NOTE] ==== @@ -1440,7 +1396,7 @@ The machine-level interrupt registers handle a few root interrupt sources which are assigned a fixed service priority for simplicity, while separate external interrupt controllers can implement a more complex prioritization scheme over a much larger set of interrupts that -are then muxed into the machine-level interrupt sources. +are then multiplexed into the machine-level interrupt sources. 
''' @@ -1513,7 +1469,7 @@ software interrupts. SSIP is writable in `mip` and may also be set to 1 by a platform-specific interrupt controller. If the Sscofpmf extension is implemented, bits `mip`.LCOFIP and `mie`.LCOFIE -are the interrupt-pending and interrupt-enable bits for local counter-overflow +are the interrupt-pending and interrupt-enable bits for local-counter-overflow interrupts. LCOFIP is read-write in `mip` and reflects the occurrence of a local counter-overflow overflow interrupt request resulting from any of the @@ -1583,7 +1539,7 @@ implementation is to make both the counter and its corresponding event selector be read-only 0. .Hardware performance monitor counters. -include::images/bytefield/hpmevents.adoc[] +include::images/bytefield/hpmevents.edn[] The `mhpmcounters` are *WARL* registers that support up to 64 bits of precision on RV32 and RV64. @@ -1596,14 +1552,14 @@ only bits 63-32. The `mhpmevent__n__h` CSRs are provided only if the Sscofpmf extension is implemented. [[mcounteren]] -==== Machine Counter-Enable (`mcounteren`) Register +==== Machine Counter-Enable (`mcounteren`) Register The counter-enable `mcounteren` register is a 32-bit register that controls the availability of the hardware performance-monitoring counters to the next-lower privileged mode. .Counter-enable (`mcounteren`) register. -include::images/bytefield/counteren.adoc[] +include::images/bytefield/counteren.edn[] The settings in this register only control accessibility. The act of reading or writing this register does not affect the underlying @@ -1629,9 +1585,9 @@ counters, the counters can be directly exposed to lower privilege modes. The `cycle`, `instret`, and `hpmcountern` CSRs are read-only shadows of `mcycle`, `minstret`, and `mhpmcounter n`, respectively. The `time` CSR is a read-only shadow of the memory-mapped `mtime` register. 
-Analogously, on RV32I the `cycleh`, `instreth` and `hpmcounternh` CSRs +Analogously, when XLEN=32, the `cycleh`, `instreth` and `hpmcounternh` CSRs are read-only shadows of `mcycleh`, `minstreth` and `mhpmcounternh`, -respectively. On RV32I the `timeh` CSR is a read-only shadow of the +respectively. When XLEN=32, the `timeh` CSR is a read-only shadow of the upper 32 bits of the memory-mapped `mtime` register, while `time` shadows only the lower 32 bits of `mtime`. @@ -1648,10 +1604,10 @@ corresponding counter will cause an illegal-instruction exception when executing in a less-privileged mode. In harts without U-mode, the `mcounteren` register should not exist. -==== Machine Counter-Inhibit (`mcountinhibit`) Register +==== Machine Counter-Inhibit (`mcountinhibit`) Register .Counter-inhibit `mcountinhibit` register -include::images/bytefield/counterinh.adoc[] +include::images/bytefield/counterinh.edn[] The counter-inhibit register `mcountinhibit` is a 32-bit *WARL* register that controls which of the hardware performance-monitoring counters @@ -1682,7 +1638,7 @@ Because the `mtime` counter can be shared between multiple cores, it cannot be inhibited with the `mcountinhibit` mechanism. ==== -==== Machine Scratch (`mscratch`) Register +==== Machine Scratch (`mscratch`) Register The `mscratch` register is an MXLEN-bit read/write register dedicated for use by machine mode. Typically, it is used to hold a pointer to a @@ -1690,7 +1646,7 @@ machine-mode hart-local context space and swapped with a user register upon entry to an M-mode trap handler. .Machine-mode scratch register. -include::images/bytefield/mscratch.adoc[] +include::images/bytefield/mscratch.edn[] [NOTE] ==== @@ -1743,10 +1699,10 @@ though it may be explicitly written by software. [[mepcreg]] .Machine exception program counter register. 
-include::images/bytefield/mepcreg.adoc[] +include::images/bytefield/mepcreg.edn[] [[mcause]] -==== Machine Cause (`mcause`) Register +==== Machine Cause (`mcause`) Register The `mcause` register is an MXLEN-bit read-write register formatted as shown in <<mcausereg>>. When a trap is taken into @@ -1762,7 +1718,7 @@ the possible machine-level exception codes. The Exception Code is a [[mcausereg]] .Machine Cause (`mcause`) register. -include::images/bytefield/mcausereg.adoc[] +include::images/bytefield/mcausereg.edn[] Note that load and load-reserved instructions generate load exceptions, whereas store, store-conditional, and AMO instructions generate @@ -1796,7 +1752,7 @@ synchronous exceptions is implementation-defined. .Machine cause (`mcause`) register values after trap. [%autowidth,float="center",align="center",cols=">,>,<",options="header",] |=== -|Interrupt |Exception Code |Description +|Interrupt |Exception Code |Description |1 + 1 + 1 + @@ -1808,7 +1764,7 @@ synchronous exceptions is implementation-defined. |_Reserved_ + Supervisor software interrupt + _Reserved_ + -Machine software interrupt +Machine software interrupt |1 + 1 + @@ -1821,7 +1777,7 @@ Machine software interrupt |_Reserved_ + Supervisor timer interrupt + _Reserved_ + -Machine timer interrupt +Machine timer interrupt |1 + 1 + 1 + @@ -1833,7 +1789,7 @@ Machine timer interrupt |_Reserved_ + Supervisor external interrupt + _Reserved_ + -Machine external interrupt +Machine external interrupt |1 + 1 + 1 + @@ -1845,7 +1801,7 @@ Machine external interrupt |_Reserved_ + Counter-overflow interrupt + _Reserved_ + -_Designated for platform use_ +_Designated for platform use_ |0 + 0 + 0 + @@ -1867,6 +1823,9 @@ _Designated for platform use_ 0 + 0 + 0 + +0 + +0 + +0 + 0 |0 + 1 + @@ -1917,7 +1876,7 @@ _Reserved_ + _Designated for custom use_ + _Reserved_ + _Designated for custom use_ + -_Reserved_ +_Reserved_ |=== <<< @@ -1982,7 +1941,7 @@ instruction). 
*** -Instruction address misaligned exceptions are raised by control-flow +Instruction address-misaligned exceptions are raised by control-flow instructions with misaligned targets, rather than by the act of fetching an instruction. Therefore, these exceptions have lower priority than other instruction address exceptions. @@ -1990,7 +1949,7 @@ other instruction address exceptions. [NOTE] ==== -A Software Check exception is a synchronous exception that is triggered when +A software-check exception is a synchronous exception that is triggered when there are violations of checks and assertions defined by ISA extensions that aim to safeguard the integrity of software assets, including e.g. control-flow and memory-access constraints. When this exception is raised, the `__x__tval` @@ -1999,19 +1958,19 @@ that stipulated the exception be raised. The priority of this exception, relative to other synchronous exceptions, depends on the cause of this exception and is defined by the extension that stipulated the exception be raised. -A Hardware Error exception is a synchronous exception triggered when corrupted or +A hardware-error exception is a synchronous exception triggered when corrupted or uncorrectable data is accessed explicitly or implicitly by an instruction. In this context, "data" encompasses all types of information used within a RISC-V -hart. Upon a hardware error exception, the `__x__epc` register is set to the +hart. Upon a hardware-error exception, the `__x__epc` register is set to the address of the instruction that attempted to access corrupted data, while the `__x__tval` register is set either to 0 or to the virtual address of an instruction fetch, load, or store that attempted to access corrupted data. 
The -priority of Hardware Error exception is implementation-defined, but any given +priority of hardware-error exception is implementation-defined, but any given occurrence is generally expected to be recognized at the point in the overall priority order at which the hardware error is discovered. ==== -==== Machine Trap Value (`mtval`) Register +==== Machine Trap Value (`mtval`) Register The `mtval` register is an MXLEN-bit read-write register formatted as shown in <<mtvalreg>>. When a trap is taken into @@ -2037,7 +1996,7 @@ particularly those with hardware page-table walkers. [[mtvalreg]] .Machine Trap Value (`mtval`) register. -include::images/bytefield/mtvalreg.adoc[] +include::images/bytefield/mtvalreg.edn[] If `mtval` is written with a nonzero value when a misaligned load or @@ -2089,7 +2048,7 @@ two cases (or alternatively, the system configuration information can be interrogated to install the appropriate trap handling before runtime). ==== -On a trap caused by a software check exception, the `mtval` register holds +On a trap caused by a software-check exception, the `mtval` register holds the cause for the exception. The following encodings are defined: * 0 - No information provided. @@ -2108,7 +2067,7 @@ return the faulting instruction bits is implemented, `mtval` must also be able to hold all values less than 2^__N__^, where _N_ is the smaller of MXLEN and ILEN. -==== Machine Configuration Pointer (`mconfigptr`) Register +==== Machine Configuration Pointer (`mconfigptr`) Register The `mconfigptr` register is an MXLEN-bit read-only CSR, formatted as shown in <<mconfigptrreg>>, that holds the physical @@ -2118,7 +2077,7 @@ and their configuration. [[mconfigptrreg]] .Machine Configuration Pointer (`mconfigptr`) register. -include::images/bytefield/mconfigptrreg.adoc[] +include::images/bytefield/mconfigptrreg.edn[] The pointer alignment in bits must be no smaller than MXLEN: @@ -2145,35 +2104,17 @@ M-mode software towards the beginning of the boot process. 
==== [[sec:menvcfg]] -==== Machine Environment Configuration (`menvcfg`) Register +==== Machine Environment Configuration (`menvcfg`) Register The `menvcfg` CSR is a 64-bit read/write register, formatted as shown in <<menvcfgreg>>, that controls certain characteristics of the execution environment for modes less privileged than M. -[#menvcfgreg] +[[menvcfgreg]] .Machine environment configuration (`menvcfg`) register. -[wavedrom, ,svg] -.... -{reg: [ - {bits: 1, name: 'FIOM'}, - {bits: 1, name: 'WPRI'}, - {bits: 1, name: 'LPE'}, - {bits: 1, name: 'SSE'}, - {bits: 2, name: 'CBIE'}, - {bits: 1, name: 'CBCFE'}, - {bits: 1, name: 'CBZE'}, - {bits: 24, name: 'WPRI'}, - {bits: 2, name: 'PMM'}, - {bits: 25, name: 'WPRI'}, - {bits: 1, name: 'DTE'}, - {bits: 1, name: 'CDE'}, - {bits: 1, name: 'ADUE'}, - {bits: 1, name: 'PBMTE'}, - {bits: 1, name: 'STCE'}, -], config:{lanes: 4, hspace:1024}} -.... +include::images/wavedrom/menvcfgreg.edn[] + If bit FIOM (Fence of I/O implies Memory) is set to one in `menvcfg`, FENCE instructions executed in modes less privileged than M are modified @@ -2254,9 +2195,7 @@ The definition of the CBZE field is furnished by the Zicboz extension. The definitions of the CBCFE and CBIE fields are furnished by the Zicbom extension. -The definition of the PMM field will be furnished by the forthcoming -Smnpm extension. Its allocation within `menvcfg` may change prior to the -ratification of that extension. +The definition of the PMM field is furnished by the Smnpm extension. The Zicfilp extension adds the `LPE` field in `menvcfg`. When the `LPE` field is set to 1 and S-mode is implemented, the Zicfilp extension is enabled in S-mode. @@ -2275,9 +2214,11 @@ the following rules apply to privilege modes that are less than M: * 32-bit Zicfiss instructions will revert to their behavior as defined by Zimop. * 16-bit Zicfiss instructions will revert to their behavior as defined by Zcmop. * The `pte.xwr=010b` encoding in VS/S-stage page tables becomes reserved. 
-* The `henvcfg.SSE` and `senvcfg.SSE` fields will read as zero and are read-only. * `SSAMOSWAP.W/D` raises an illegal-instruction exception. +When `menvcfg.SSE` is 0, the `henvcfg.SSE` and `senvcfg.SSE` fields are +read-only zero. + The Ssdbltrp extension adds the double-trap-enable (`DTE`) field in `menvcfg`. When `menvcfg.DTE` is zero, the implementation behaves as though Ssdbltrp is not implemented. When Ssdbltrp is not implemented `sstatus.SDT`, `vsstatus.SDT`, and @@ -2290,38 +2231,22 @@ The `menvcfgh` register does not exist when XLEN=64. If U-mode is not supported, then registers `menvcfg` and `menvcfgh` do not exist. -==== Machine Security Configuration (`mseccfg`) Register +==== Machine Security Configuration (`mseccfg`) Register `mseccfg` is an optional 64-bit read/write register, formatted as shown in <<mseccfg>>, that controls security features. [[mseccfg]] .Machine security configuration (`mseccfg`) register. -[wavedrom, ,svg] -.... -{reg: [ - {bits: 1, name: 'MML'}, - {bits: 1, name: 'MMWP'}, - {bits: 1, name: 'RLB'}, - {bits: 5, name: 'WPRI'}, - {bits: 1, name: 'USEED'}, - {bits: 1, name: 'SSEED'}, - {bits: 1, name: 'MLPE'}, - {bits: 53, name: 'WPRI'}, -], config:{lanes: 4, hspace:1024}} -.... +include::images/wavedrom/mseccfg.edn[] -The definitions of the SSEED and USEED fields will be furnished by the -forthcoming entropy-source extension, Zkr. Their allocations within -`mseccfg` may change prior to the ratification of that extension. +The definitions of the SSEED and USEED fields are furnished by the +entropy-source extension, Zkr. -The definitions of the RLB, MMWP, and MML fields will be furnished by -the forthcoming PMP-enhancement extension, Smepmp. Their allocations -within `mseccfg` may change prior to the ratification of that extension. +The definitions of the RLB, MMWP, and MML fields are furnished by the +PMP-enhancement extension, Smepmp. -The definition of the PMM field will be furnished by the forthcoming -Smmpm extension. 
Its allocation within `mseccfg` may change prior to the -ratification of that extension. +The definition of the PMM field is furnished by the Smmpm extension. The Zicfilp extension adds the `MLPE` field in `mseccfg`. When `MLPE` field is 1, Zicfilp extension is enabled in M-mode. When the `MLPE` field is 0, the @@ -2337,7 +2262,7 @@ Register `mseccfgh` does not exist when XLEN=64. === Machine-Level Memory-Mapped Registers -==== Machine Timer (`mtime` and `mtimecmp`) Registers +==== Machine Timer (`mtime` and `mtimecmp`) Registers Platforms provide a real-time counter, exposed as a memory-mapped machine-mode read-write register, `mtime`. `mtime` must increment at @@ -2355,10 +2280,10 @@ writing `mtimecmp`). The interrupt will only be taken if interrupts are enabled and the MTIE bit is set in the `mie` register. .Machine time register (memory-mapped control register). -include::images/bytefield/mtime.adoc[] +include::images/bytefield/mtime.edn[] .Machine time compare register (memory-mapped control register). -include::images/bytefield/mtimecmp.adoc[] +include::images/bytefield/mtimecmp.edn[] [NOTE] ==== @@ -2385,8 +2310,9 @@ Simple fixed-frequency systems can use a single clock for both cycle counting and wall-clock time. ==== -Writes to `mtime` and `mtimecmp` are guaranteed to be reflected in MTIP -eventually, but not necessarily immediately. +If the result of the comparison between `mtime` and `mtimecmp` changes, it is +guaranteed to be reflected in MTIP eventually, but not necessarily +immediately. [NOTE] ==== @@ -2417,13 +2343,19 @@ For RV64, naturally aligned 64-bit memory accesses to the `mtime` and .... +The `time` CSR is a read-only shadow of the memory-mapped `mtime` register. +When XLEN=32, the `timeh` CSR is a read-only shadow of the upper 32 bits of the +memory-mapped `mtime` register, while `time` shadows only the lower 32 bits of +`mtime`. 
+When `mtime` changes, it is guaranteed to be reflected in `time` and `timeh` +eventually, but not necessarily immediately. === Machine-Mode Privileged Instructions ==== Environment Call and Breakpoint -include::images/wavedrom/mm-env-call.adoc[] +include::images/wavedrom/mm-env-call.edn[] The ECALL instruction is used to make a request to the supporting execution environment. When executed in U-mode, S-mode, or M-mode, it @@ -2463,7 +2395,7 @@ not increment the `minstret` CSR. Instructions to return from trap are encoded under the PRIV minor opcode. -include::images/wavedrom/trap-return.adoc[] +include::images/wavedrom/trap-return.edn[] To return after handling a trap, there are separate trap return instructions per privilege level, MRET and SRET. MRET is always @@ -2473,7 +2405,9 @@ also raise an illegal-instruction exception when TSR=1 in `mstatus`, as described in <<virt-control>>. An __x__RET instruction can be executed in privilege mode _x_ or higher, where executing a lower-privilege __x__RET instruction will pop the relevant lower-privilege -interrupt enable and privilege mode stack. In addition to manipulating +interrupt enable and privilege mode stack. Attempting to execute an __x__RET +instruction in a mode less privileged than _x_ will raise an +illegal-instruction exception. In addition to manipulating the privilege stack as described in <<privstack>>, __x__RET sets the `pc` to the value stored in the `__x__epc` register. @@ -2500,7 +2434,7 @@ privileged modes, and optionally available to U-mode. This instruction may raise an illegal-instruction exception when TW=1 in `mstatus`, as described in <<virt-control>>. -include::images/wavedrom/wfi.adoc[] +include::images/wavedrom/wfi.edn[] If an enabled interrupt is present or later becomes present while the hart is stalled, the interrupt trap will be taken on the following @@ -2544,7 +2478,7 @@ WFI if there was no actionable event. 
[NOTE] ==== -By allowing wakeup when interrupts are disabled, an alternate entry +By allowing wake-up when interrupts are disabled, an alternate entry point to an interrupt handler can be called that does not require saving the current context, as the current context can be saved or discarded before the WFI is executed. @@ -2576,7 +2510,7 @@ minimum required privilege mode, as do other SYSTEM instructions. [[customsys]] .SYSTEM instruction encodings designated for custom use. -include::images/bytefield/cust-sys-instr.adoc[] +include::images/bytefield/cust-sys-instr.edn[] [[reset]] === Reset @@ -2594,7 +2528,9 @@ the platform mandates a different reset value for some PMP registers’ A and L fields. If the hypervisor extension is implemented, the `hgatp`.MODE and `vsatp`.MODE fields are reset to 0. If the Smrnmi extension is implemented, the `mnstatus`.NMIE field is reset to 0. No - *WARL* field contains an illegal value. All other hart state is UNSPECIFIED. + *WARL* field contains an illegal value. If the Zicfilp extension is +implemented, the `mseccfg`.MLPE field is reset to 0. All other hart +state is UNSPECIFIED. The `mcause` values after reset have implementation-specific interpretation, but the value 0 should be returned on implementations @@ -2606,7 +2542,7 @@ most complete reset. ==== Some designs may have multiple causes of reset (e.g., power-on reset, external hard reset, brownout detected, watchdog timer elapse, -sleep-mode wakeup), which machine-mode software and debuggers may wish +sleep-mode wake-up), which machine-mode software and debuggers may wish to distinguish. `mcause` reset values may alias `mcause` values following synchronous @@ -2857,7 +2793,7 @@ Specific supported values for this PMA are represented by MAG__NN__, e.g., MAG16 indicates the misaligned atomicity granule is at least 16 bytes. 
The misaligned atomicity granule PMA applies only to AMOs, loads and stores -defined in the base ISAs, and loads and stores of no more than MXLEN bits +defined in the base ISAs, and loads and stores of no more than XLEN bits defined in the F, D, and Q extensions. For an instruction in that set, if all accessed bytes lie within the same misaligned atomicity granule, the instruction will not raise an exception for @@ -2898,7 +2834,9 @@ and I/O regions may be accessed with either _relaxed_ or _strong_ ordering. Accesses to an I/O region with relaxed ordering are generally observed by other harts and bus mastering devices in a manner similar to the ordering of accesses to an RVWMO memory region, as discussed in -Section A.4.2 in Volume I of this specification. By contrast, accesses +the I/O Ordering section in the RVWMO Explanatory Material appendix +of Volume I of this specification. +By contrast, accesses to an I/O region with strong ordering are generally observed by other harts and bus mastering devices in program order. @@ -3126,11 +3064,11 @@ entries 8-11 appear in `pmpcfg2`[31:0] for both RV32 and RV64. [[pmpcfg-rv32]] .RV32 PMP configuration CSR layout. -include::images/bytefield/pmp-rv32.adoc[] +include::images/bytefield/pmp-rv32.edn[] [[pmpcfg-rv64]] .RV64 PMP configuration CSR layout. -include::images/bytefield/pmp-rv64.adoc[] +include::images/bytefield/pmp-rv64.edn[] The PMP address registers are CSRs named `pmpaddr0`-`pmpaddr63`. Each @@ -3154,11 +3092,11 @@ the same limit. [[pmpaddr-rv32]] .PMP address register format, RV32. -include::images/bytefield/pmpaddr-rv32.adoc[] +include::images/bytefield/pmpaddr-rv32.edn[] [[pmpaddr-rv64]] .PMP address register format, RV64. -include::images/bytefield/pmpaddr-rv64.adoc[] +include::images/bytefield/pmpaddr-rv64.edn[] <<pmpcfg>> shows the layout of a PMP configuration register. 
The R, W, and X bits, when set, indicate that the PMP entry @@ -3168,7 +3106,7 @@ W, and X fields form a collective *WARL* field for which the combinations with R [[pmpcfg]] .PMP configuration register format. -include::images/bytefield/pmpcfg.adoc[] +include::images/bytefield/pmpcfg.edn[] Attempting to fetch an instruction from a PMP region that does not have @@ -3283,10 +3221,6 @@ back to NAPOT. Software may determine the PMP granularity by writing zero to `pmp0cfg`, then writing all ones to `pmpaddr0`, then reading back `pmpaddr0`. If _G_ is the index of the least-significant bit set, the PMP granularity is 2^G+2^ bytes. ==== -If the current XLEN is greater than MXLEN, the PMP address registers are -zero-extended from MXLEN to XLEN bits for the purposes of address -matching. - ===== Locking and Privilege Mode The L bit indicates that the PMP entry is locked, i.e., writes to the diff --git a/src/mm-eplan.adoc b/src/mm-eplan.adoc index d7bb870..7071b19 100644 --- a/src/mm-eplan.adoc +++ b/src/mm-eplan.adoc @@ -75,7 +75,7 @@ particular valid or invalid execution on the right. [.left] [%autowidth,float="center",align="center",cols="^,<,^,<",options="header"] !=== -2+!Hart 0 2+!Hart 1 +2+!Hart 0 2+!Hart 1 ! !⋮ ! !⋮ ! !li t1,1 ! !li t4,4 !(a) !sw t1,0(s0) !(e) !sw t4,0(s0) @@ -239,7 +239,7 @@ visible memory). Any other hart will therefore observe the load as performing before the store. Consider the <<litms_sb_forward>>. When running this program on an implementation with -store buffers, it is possible to arrive at the final outcome a0=1, `a1=0, a2=1, a3=0` as follows: +store buffers, it is possible to arrive at the final outcome `a0=1, a1=0, a2=1, a3=0` as follows: [[litms_sb_forward]] .A store buffer forwarding litmus test (outcome permitted) @@ -295,7 +295,7 @@ Call this "Rule X". Then we get the following: preceded (d), then (d) would be required to return the value 1. 
(This is a perfectly legal execution; it's just not the one in question) * (e) precedes (f): by rule X -* (f) precedes (h): by rule <<overlapping-ordering, 4]>> +* (f) precedes (h): by rule <<overlapping-ordering, 4>> * (h) precedes (a): by the load value axiom, as above. The global memory order must be a total order and cannot be cyclic, @@ -400,9 +400,9 @@ original hart holds the reservation. |(b) sd t1, 0(s0) |(b) sw t1, 4(s0) |(b) sw t1, 4(s0) |(b) sw t1, 4(s0) -|(c) sc.d t3, t2, 0(s0) |(c) sc.d t3, t2, 0(s0) |(c) sc.w t3, t2, 0(s0) |(c) addi s0, s0, 8 +|(c) sc.d t3, t2, 0(s0) |(c) sc.d t3, t2, 0(s0) |(c) sc.w t3, t2, 0(s0) |(c) addi s0, s0, 8 -|(d) sc.w t3, t2, 8(s0)||| +||||(d) sc.w t3, t2, 0(s0) |==== [[litmus_lrsdsc]] <<litmus_lrsdsc, Figure 4>>: In all four (independent) instances, the final store-conditional instruction is permitted but not guaranteed to succeed. @@ -608,7 +608,7 @@ balance between enforcing CoRR in all cases while simultaneously being weak enough to permit "RSW" and "fri-rfi" patterns that commonly appear in real microarchitectures. -There is one more overlapping-address rule: <<overlapping-ordering, +There is one more overlapping-address rule: <<overlapping-ordering, rule 3>> simply states that a value cannot be returned from an AMO or SC to a subsequent load until the AMO or SC has (in the case of the SC, successfully) performed globally. This @@ -671,8 +671,8 @@ memory model. Finally, we note that since RISC-V uses a multi-copy atomic memory model, programmers can reason about fences bits in a thread-local -manner. There is no complex notion of "fence cumulativity" as found in -memory models that are not multi-copy atomic. +manner. Fences in RISC-V are not cumulative, as they are in some +non-multi-copy-atomic memory models. [[sec:memory:acqrel]] ==== Explicit Synchronization (<<overlapping-ordering, Rules 5-8>>) @@ -786,8 +786,8 @@ operation due to the inherent data dependency. 
However, PPO rule 8 also applies even when the value being stored does not syntactically depend on the value returned by the paired LR. -Lastly, we note that just as with fences, programmers need not worry -about "cumulativity" when analyzing ordering annotations. +Lastly, we note that, as with fences, ordering annotations are +not cumulative. [[sec:memory:dependencies]] ==== Syntactic Dependencies (<<overlapping-ordering, Rules 9-11>>) @@ -941,7 +941,7 @@ no effect on the global memory order. !=== 4+!Initial values: 0(s0)=1; 0(s2)=1 4+! -2+^!Hart 0 2+^!Hart 1 +2+^!Hart 0 2+^!Hart 1 !(a) !ld a0,0(s0) !(e) !ld a3,0(s2) !(b) !lr a1,0(s1) !(f) !sd a3,0(s0) !(c) !sc a2,a0,0(s1) ! ! @@ -1023,7 +1023,7 @@ data for that store are known. Consider <<litmus_datarfi>> (f) cannot be executed until the data for (e) has been resolved, because (f) must return the value written by (e) (or by something even later in the global memory order), and the old value must not be clobbered by the -writeback of (e) before (d) has had a chance to perform. Therefore, (f) +write-back of (e) before (d) has had a chance to perform. Therefore, (f) will never perform before (d) has performed. @@ -1058,7 +1058,7 @@ then (f) would no longer be dependent on the data of (e) being resolved, and hence the dependency of (f) on (d), which produces the data for (e), would be broken. -Rule<<overlapping-ordering, 13>> makes a similar observation to the +Rule <<overlapping-ordering, 13>> makes a similar observation to the previous rule: a store cannot be performed at memory until all previous loads that might access the same address have themselves been performed. Such a load must appear to execute before the store, but it cannot do so @@ -1176,7 +1176,7 @@ I/O write to a device register, a FENCE W,O or stronger is needed. 
[.text-center,source%linenums,asm] ---- sd t0, 0(a0) -fence w,o +fence w,o sd a0, 0(a1) ---- @@ -1775,7 +1775,7 @@ would be compatible with the RVWMO memory model: * "J" JIT extension * Native encodings for load and store opcodes with _aq_ and _rl_ set * Fences limited to certain addresses -* Cache writeback/flush/invalidate/etc.instructions +* Cache write-back/flush/invalidate/etc.instructions [[discrepancies]] === Known Issues @@ -1803,7 +1803,7 @@ would be compatible with the RVWMO memory model: .Mixed-size discrepancy (permitted by axiomatic models, forbidden by operational model) [%autowidth,float="center",align="center",cols="^,<,^,<",options="header"] |=== -2+|Hart 0 2+^|Hart 1 +2+|Hart 0 2+^|Hart 1 2+|li t1, 1 2+^|li t1, 1 |(a) |lw a0,0(s0) |(d) |ld a1,0(s1) |(b) |fence rw,rw |(e) |lw a2,4(s1) @@ -1848,4 +1848,3 @@ enforce this ordering naturally. As such, even though this rule is not official, we recommend that implementers enforce it nevertheless in order to ensure forwards compatibility with the possible future addition of this rule to RVWMO. - diff --git a/src/mm-formal.adoc b/src/mm-formal.adoc index 9f2c942..3052fd6 100644 --- a/src/mm-formal.adoc +++ b/src/mm-formal.adoc @@ -8,11 +8,10 @@ discrepancies are unintended; the expectation is that the models describe exactly the same sets of legal behaviors. This appendix should be treated as commentary; all normative material is -provided in <<memorymodel, Chapter 17>> and in the rest of -the main body of the ISA specification. All currently known -discrepancies are listed in -<<discrepancies, Section A.7>>. Any other -discrepancies are unintentional. +provided in <<memorymodel>> and in the rest of +the main body of the ISA specification. +All currently known discrepancies are listed in <<discrepancies>>. +Any other discrepancies are unintentional. 
[[alloy]] === Formal Axiomatic Specification in Alloy @@ -243,7 +242,7 @@ pred restrict_to_current_encodings { // =Alloy shortcuts= pred acyclic[rel: Event->Event] { no iden & ^rel } pred total[rel: Event->Event, bag: Event] { - all disj e, e': bag | e->e' in rel + ~rel + all disj e, f: bag | e->f in rel + ~rel acyclic[rel] } .... @@ -256,7 +255,7 @@ input and simulates the execution of the test on top of the memory model. Memory models are written in the domain specific language Cat. This section provides two Cat memory model of RVWMO. The first model, <<herd2>>, follows the _global memory order_, -Chapter <<memorymodel>>, definition of RVWMO, as much +<<memorymodel>>, definition of RVWMO, as much as is possible for a Cat model. The second model, <<herd3>>, is an equivalent, more efficient, partial order based RVWMO model. @@ -286,7 +285,7 @@ let fence.tso = let f = fencerel(Fence.tso) in ([W];f;[W]) | ([R];f;[M]) -let fence = +let fence = fence.r.r | fence.r.w | fence.r.rw | fence.w.r | fence.w.w | fence.w.rw | fence.rw.r | fence.rw.w | fence.rw.rw | @@ -465,7 +464,7 @@ Model states: A model state consists of a shared memory and a tuple of hart stat ["ditaa",shadows=false, separation=false, fontsize: 14,float="center"] .... -+----------+ +---------+ ++----------+ +---------+ | Hart 0 | ... | Trace | +----------+ +---------+ ↑ ↓ ↑ ↓ @@ -597,7 +596,7 @@ continue executing. Transitions specific to `sc` instructions: [disc] -* <<early_sc_fail, Early sc fail>>: This causes the `sc` to fail, either a spontaneous fail or becauset is not paired with a program-order-previous `lr`. +* <<early_sc_fail, Early sc fail>>: This causes the `sc` to fail, either a spontaneous fail or because it is not paired with a program-order-previous `lr`. * <<paired_sc, Paired sc>>: This transition indicates the `sc` is paired with an `lr` and might succeed. 
@@ -736,7 +735,7 @@ register write by the most recent (in program order) instruction instance that can write that bit (or from the hart’s initial register state if there is no such write). Hence, it is essential to know the register write footprint of each instruction instance, which we -calculate when the instruction instance is created (see the <<fetch, Festch instruction>> action of +calculate when the instruction instance is created (see the <<fetch, Fetch instruction>> action of below). We ensure in the pseudocode that each instruction does at most one register write to each register bit, and also that it does not try to read a register value it just wrote. @@ -1049,10 +1048,10 @@ load is acquire-RCsc. ===== Satisfy memory load operation from memory For an instruction instance latexmath:[$i$] of a non-AMO load -instruction or an AMO instruction in the context of the <<do_amo, Saitsfy, commit and propagate operations of an AMO>> transition, +instruction or an AMO instruction in the context of the <<do_amo, Satisfy, commit and propagate operations of an AMO>> transition, any memory load operation latexmath:[$mlo$] in latexmath:[$i.\textit{mem\_loads}$] that has unsatisfied slices, can be -satisfied from memory if all the conditions of <sat_by_forwarding, Saitsfy memory load operation by forwarding from unpropagated stores>> are satisfied. Action: +satisfied from memory if all the conditions of <sat_by_forwarding, Satisfy memory load operation by forwarding from unpropagated stores>> are satisfied. Action: let latexmath:[$msoss$] be the memory store operation slices from memory covering the unsatisfied slices of latexmath:[$mlo$], and apply the action of <<do_amo, Satisfy memory operation by forwarding from unpropagates stores>>. @@ -1134,7 +1133,7 @@ Pending_mem_stores(_store_continuation_). 
===== Commit store instruction An uncommitted instruction instance latexmath:[$i$] of a non-`sc` store -instruction or an `sc` instruction in the context of the <<commit_sc, Commit and propagate store operation of an `sc`>> +instruction or an `sc` instruction in the context of the <<commit_sc, Commit and propagate store operation of an `sc`>> transition, in state Pending_mem_stores(_store_continuation_), can be committed (not to be confused with propagated) if: @@ -1259,7 +1258,7 @@ Plain(_store_continuation(false)_). For efficiency, the `rmem` tool allows this transition only when it is not possible to take the <<commit_sc, Commit and propagate store operation of an sc>> transition. This does not affect the set of allowed final states, but when explored interactively, if the `sc` -should fail one should use the <<early_sc_fail, Eaarly sc fail>> transition instead of waiting for this transition. +should fail one should use the <<early_sc_fail, Early sc fail>> transition instead of waiting for this transition. ==== [[complete_stores]] ===== Complete store operations @@ -1434,6 +1433,3 @@ accesses). memory is not involved in the transition. Instead, the model depends on an external oracle that provides an opcode when given a memory location. * The model does not cover exceptions, traps and interrupts. - - - diff --git a/src/mm-herd.adoc b/src/mm-herd.adoc index f1c0fd8..6eeffcd 100644 --- a/src/mm-herd.adoc +++ b/src/mm-herd.adoc @@ -37,7 +37,7 @@ let fence.tso = let f = fencerel(Fence.tso) in ([W];f;[W]) | ([R];f;[M]) -let fence = +let fence = fence.r.r | fence.r.w | fence.r.rw | fence.w.r | fence.w.w | fence.w.rw | fence.rw.r | fence.rw.w | fence.rw.rw | diff --git a/src/naming.adoc b/src/naming.adoc index 0aaa177..9b3f62d 100644 --- a/src/naming.adoc +++ b/src/naming.adoc @@ -17,7 +17,7 @@ The ISA naming strings are case insensitive. 
=== Base Integer ISA -RISC-V ISA strings begin with either RV32I, RV32E, RV64I, RV64E, or RV128I +RISC-V ISA strings begin with either RV32I, RV32E, RV64I, or RV64E, indicating the supported address space size in bits for the base integer ISA. @@ -45,42 +45,18 @@ Some ISA extensions depend on the presence of other extensions, e.g., may be implicit in the ISA name: for example, RV32IF is equivalent to RV32IFZicsr, and RV32ID is equivalent to RV32IFD and RV32IFDZicsr. -=== Version Numbers - -Recognizing that instruction sets may expand or alter over time, we -encode extension version numbers following the extension name. Version -numbers are divided into major and minor version numbers, separated by a -"p". If the minor version is "0", then "p0" can be omitted from -the version string. Changes in major version numbers imply a loss of -backwards compatibility, whereas changes in only the minor version -number must be backwards-compatible. For example, the original 64-bit -standard ISA defined in release 1.0 of this manual can be written in -full as "RV64I1p0M1p0A1p0F1p0D1p0", more concisely as -"RV64I1M1A1F1D1". - -We introduced the version numbering scheme with the second release. -Hence, we define the default version of a standard extension to be the -version present at that time, e.g., "RV32I" is equivalent to -"RV32I2". - === Underscores Underscores "_" may be used to separate ISA extensions to improve readability and to provide disambiguation, e.g., "RV32I2_M2_A2". -Because the "P" extension for Packed SIMD can be confused for the -decimal point in a version number, it must be preceded by an underscore -if it follows a number. For example, "rv32i2p2" means version 2.2 of -RV32I, whereas "rv32i2_p2" means version 2.0 of RV32I with version 2.0 -of the P extension. - === Additional Standard Unprivileged Extension Names -Standard unprivileged extensions can also be named using a single "Z" followed by -an alphabetical name and an optional version number. 
For example, -"Zifencei" names the instruction-fetch fence extension described in -<<zifencei>>; "Zifencei2" and -"Zifencei2p0" name version 2.0 of same. +Standard unprivileged extensions can also be named by using a single "Z" followed by an +alphanumeric name. The name must end with an alphabetical character. +The second letter from the end cannot be numeric if the +last letter is "p". For example, "Zifencei" names the instruction-fetch fence extension +described in <<zifencei>>. The first letter following the "Z" conventionally indicates the most closely related alphabetical extension category, IMAFDQLCBKJTPVH. For the @@ -88,54 +64,60 @@ closely related alphabetical extension category, IMAFDQLCBKJTPVH. For the indicates the extension is related to the "F" standard extension. If multiple "Z" extensions are named, they should be ordered first by category, then alphabetically within a category—for example, -"Zicsr_Zifencei_Zam". +"Zicsr_Zifencei_Ztso". All multi-letter extensions, including those with the "Z" prefix, must be separated from other multi-letter extensions by an underscore, e.g., "RV32IMACZicsr_Zifencei". -=== Supervisor-level Instruction-Set Extensions +=== Supervisor-level Instruction-Set Extension Names Standard extensions that extend the supervisor-level virtual-memory -architecture are prefixed with the letters "Sv", followed by an alphabetical -name and an optional version number, or by a numeric name with no version number. -Other standard extensions that extend -the supervisor-level architecture are prefixed with the letters "Ss", -followed by an alphabetical name and an optional version number. Such -extensions are defined in Volume II. +architecture are prefixed with the letters "Sv", followed by an alphanumeric +name. Other standard extensions that extend the supervisor-level architecture are +prefixed with the letters "Ss", followed by an alphanumeric name. The name +must end with an alphabetical character. 
The second letter from the end cannot +be numeric if the last letter is "p". These extensions are further defined in +Volume II. + +The extensions "sv32", "sv39", "sv48", and "sv59" were defined before the rule +against extension names ending in numbers was established. Standard supervisor-level extensions should be listed after standard -unprivileged extensions. If multiple supervisor-level extensions are -listed, they should be ordered alphabetically. +unprivileged extensions, and like other multi-letter extensions, must be +separated from other multi-letter extensions by an underscore. If multiple +supervisor-level extensions are listed, they should be ordered alphabetically. -=== Hypervisor-level Instruction-Set Extensions +=== Hypervisor-level Instruction-Set Extension Names Standard extensions that extend the hypervisor-level architecture are prefixed with the letters "Sh". If multiple hypervisor-level extensions are listed, they should be ordered alphabetically. -NOTE: Many augmentations to the hypervisor-level archtecture are more +NOTE: Many augmentations to the hypervisor-level architecture are more naturally defined as supervisor-level extensions, following the scheme described in the previous section. The "Sh" prefix is used by the few hypervisor-level extensions that have no supervisor-visible effects. -=== Machine-level Instruction-Set Extensions +=== Machine-level Instruction-Set Extension Names Standard machine-level instruction-set extensions are prefixed with the letters "Sm". Standard machine-level extensions should be listed after standard -lesser-privileged extensions. If multiple machine-level extensions are -listed, they should be ordered alphabetically. +lesser-privileged extensions, and like other multi-letter extensions, must be +separated from other multi-letter extensions by an underscore. If multiple +machine-level extensions are listed, they should be ordered alphabetically. 
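The name-ending rule added in the hunks above (a multi-letter extension name must end with an alphabetical character, and the second character from the end cannot be numeric if the last letter is "p") can be sketched as a small check. This is a minimal sketch under stated assumptions: the helper name `has_valid_name_ending` is hypothetical, it looks only at how the name ends, and it does not model the names that the text says predate the rule.

```python
def has_valid_name_ending(name: str) -> bool:
    """Check only the name-ending rule quoted in the diff above.

    Hypothetical helper for illustration; it does not validate prefixes,
    ordering, or underscores, and it ignores grandfathered names.
    """
    lowered = name.lower()  # ISA naming strings are case insensitive
    if not lowered or not (lowered.isascii() and lowered.isalnum()):
        return False
    # Rule 1: the name must end with an alphabetical character.
    if not lowered[-1].isalpha():
        return False
    # Rule 2: a trailing "p" must not be preceded by a digit, since
    # "<digit>p" would read as the start of a "major p minor" version.
    if lowered[-1] == "p" and len(lowered) >= 2 and lowered[-2].isdigit():
        return False
    return True
```

Under this sketch "Zifencei" and "Ssdef" pass, while a name such as "Zifencei2" is rejected because its trailing digit would be indistinguishable from a version number.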
=== Non-Standard Extension Names -Non-standard extensions are named using a single "X" followed by an -alphabetical name and an optional version number. For example, -"Xhwacha" names the Hwacha vector-fetch ISA extension; "Xhwacha2" -and "Xhwacha2p0" name version 2.0 of same. +Non-standard extensions are named by using a single "X" followed by the alphanumeric +name. The name must end with an alphabetic character. The +second letter from the end cannot be numeric if the last letter is +"p". For example, "Xhwacha" names the Hwacha vector-fetch ISA +extension. Non-standard extensions must be listed after all standard extensions, and, like other multi-letter extensions, must be separated from other multi-letter @@ -144,7 +126,35 @@ For example, an ISA with non-standard extensions Argle and Bargle may be named "RV64IZifencei_Xargle_Xbargle". If multiple non-standard extensions are listed, they should be ordered -alphabetically. +alphabetically. Like other multi-letter extensions, they should be +separated from other multi-letter extensions by an underscore. + +=== Version Numbers + +Recognizing that instruction sets may expand or alter over time, we +encode extension version numbers following the extension name. Version +numbers are divided into major and minor version numbers, separated by a +"p". If the minor version is "0", then "p0" can be omitted from +the version string. To avoid ambiguity, no extension name may end with a number +or a "p" preceded by a number. + +Because the "P" extension for Packed SIMD can be confused for the +decimal point in a version number, it must be preceded by an underscore +if it follows another extension with a version number. For example, "rv32i2p2" +means version 2.2 of RV32I, whereas "rv32i2_p2" means version 2.0 of RV32I with +version 2.0 of the P extension. + +Changes in major version numbers imply a loss of +backwards compatibility, whereas changes in only the minor version +number must be backwards-compatible. 
For example, the original 64-bit +standard ISA defined in release 1.0 of this manual can be written in +full as "RV64I1p0M1p0A1p0F1p0D1p0", more concisely as +"RV64I1M1A1F1D1". + +We introduced the version numbering scheme with the second release. +Hence, we define the default version of a standard extension to be the +version present at that time, e.g., "RV32I" is equivalent to +"RV32I2". === Subset Naming Convention @@ -198,6 +208,10 @@ e.g., RV32IMACV is legal, whereas RV32IMAVC is not. |Supervisor-level extension "def" |Ssdef | +3+|*Standard Hypervisor-Level Extensions* + +|Hypervisor-level extension "ghi" |Shghi | + 3+|*Standard Machine-Level Extensions* |Machine-level extension "jkl" |Smjkl | diff --git a/src/p-st-ext.adoc b/src/p-st-ext.adoc deleted file mode 100644 index fabd30b..0000000 --- a/src/p-st-ext.adoc +++ /dev/null @@ -1,11 +0,0 @@ -[[packedsimd]] -== "P" Extension for Packed-SIMD Instructions, Version 0.2 -[NOTE] -==== -Discussions at the 5th RISC-V workshop indicated a desire to drop this -packed-SIMD proposal for floating-point registers in favor of -standardizing on the V extension for large floating-point SIMD -operations. However, there was interest in packed-SIMD fixed-point -operations for use in the integer registers of small RISC-V -implementations. A task group is working to define the new P extension. -==== diff --git a/src/priv-cfi.adoc b/src/priv-cfi.adoc index 082ceb7..ae67046 100644 --- a/src/priv-cfi.adoc +++ b/src/priv-cfi.adoc @@ -13,7 +13,7 @@ details on these CFI capabilities and the associated Unprivileged ISA. This section specifies the Privileged ISA for the Zicfilp extension. -[[FCIFIACT]] +[[FCFIACT]] ==== Landing-Pad-Enabled (LPE) State The term `xLPE` is used to determine if forward-edge CFI using landing pads @@ -88,23 +88,19 @@ When a trap is taken into privilege mode `x`, the `__x__PELP` is set to `ELP` and `ELP` is set to `NO_LP_EXPECTED`. 
An `MRET` or `SRET` instruction is used to return from a trap in M-mode or -S-mode, respectively. When executing an `__x__RET` instruction, if `__x__PP` -holds the value `y`, then `ELP` is set to the value of `__x__PELP` if `__y__LPE` -is 1; otherwise, it is set to `NO_LP_EXPECTED`; `__x__PELP` is set to -`NO_LP_EXPECTED`. +S-mode, respectively. When executing an `__x__RET` instruction, if the new +privilege mode is `y`, then `ELP` is set to the value of `__x__PELP` if +`__y__LPE` (see <<FCFIACT>>) is 1; otherwise, it is set to `NO_LP_EXPECTED`; +`__x__PELP` is set to `NO_LP_EXPECTED`. Upon entry into Debug Mode, the `pelp` bit in `dcsr` is updated with the `ELP` at the privilege level the hart was previously in, and the `ELP` is set to -`NO_LP_EXPECTED`. When a hart resumes from Debug Mode, if `dcsr.prv` holds the -value `y`, then `ELP` is set to the value of `pelp` if `__y__LPE` is 1; -otherwise, it is set to `NO_LP_EXPECTED`. +`NO_LP_EXPECTED`. When a hart resumes from Debug Mode, if the new privilege mode +is `y`, then `ELP` is set to the value of `pelp` if `__y__LPE` (see <<FCFIACT>>) +is 1; otherwise, it is set to `NO_LP_EXPECTED`. -When the Smrnmi extension is implemented, a `MNPELP` field (bit 9) -is provided in the `mnstatus` CSR to hold the previous `ELP` state on a trap to -the RNMI handler. When a RNMI trap is delivered, the `MNPELP` is set to `ELP` -and `ELP` set to `NO_LP_EXPECTED`. Upon a `MNRET`, if the `mnstatus.MNPP` holds -the value `y`, then `ELP` is set to the value of `MNPELP` if `yLPE` is 1; -otherwise, it is set to `NO_LP_EXPECTED`. +See also <<rnmi>> for semantics added to the RNMI trap and the MNRET instruction +when this extension is implemented. [NOTE] ==== @@ -128,17 +124,17 @@ This section specifies the Privileged ISA for the Zicfiss extension. 
==== Shadow Stack Pointer (`ssp`) CSR access control Attempts to access the `ssp` CSR may result in either an illegal-instruction -exception or a virtual instruction exception, contingent upon the state of the +exception or a virtual-instruction exception, contingent upon the state of the *__x__*`envcfg.SSE` fields. The conditions are specified as follows: * If the privilege mode is less than M and `menvcfg.SSE` is 0, an illegal-instruction exception is raised. * Otherwise, if in U-mode and `senvcfg.SSE` is 0, an illegal-instruction exception is raised. -* Otherwise, if in VS-mode and `henvcfg.SSE` is 0, a virtual instruction +* Otherwise, if in VS-mode and `henvcfg.SSE` is 0, a virtual-instruction exception is raised. * Otherwise, if in VU-mode and either `henvcfg.SSE` or `senvcfg.SSE` is 0, - a virtual instruction exception is raised. + a virtual-instruction exception is raised. * Otherwise, the access is allowed. ==== Shadow-Stack-Enabled (SSE) State @@ -218,8 +214,10 @@ instruction will result in a store/AMO access-fault exception. Memory mapped as an SS page cannot be written to by instructions other than `SSAMOSWAP.W/D`, `SSPUSH`, and `C.SSPUSH`. Attempts will raise a store/AMO -access-fault exception. Implicit accesses, including instruction fetches to an SS -page, are not permitted. Such accesses will raise an access-fault exception +access-fault exception. Access to a SS page using _cache-block operation_ +(`CBO.*`) instructions is not permitted. Such accesses will raise a store/AMO +access-fault exception. Implicit accesses, including instruction fetches to an +SS page, are not permitted. Such accesses will raise an access-fault exception appropriate to the access type. However, the shadow stack is readable by all instructions that only load from memory. 
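The `ssp` CSR access-control conditions listed in the hunk above form an ordered decision chain, which can be sketched as follows. This is an illustrative sketch only: the `Mode` and `Outcome` enums and the function `ssp_access_outcome` are hypothetical names introduced for the example, and the three boolean parameters stand for the `SSE` fields of `menvcfg`, `senvcfg`, and `henvcfg`.

```python
from enum import Enum


class Mode(Enum):
    M = "M"
    S = "S"
    U = "U"
    VS = "VS"
    VU = "VU"


class Outcome(Enum):
    ALLOWED = "allowed"
    ILLEGAL_INSTRUCTION = "illegal-instruction exception"
    VIRTUAL_INSTRUCTION = "virtual-instruction exception"


def ssp_access_outcome(mode: Mode, menvcfg_sse: bool,
                       senvcfg_sse: bool, henvcfg_sse: bool) -> Outcome:
    """Evaluate the conditions in the order the text lists them."""
    # Any mode below M with menvcfg.SSE clear: illegal instruction.
    if mode is not Mode.M and not menvcfg_sse:
        return Outcome.ILLEGAL_INSTRUCTION
    # U-mode with senvcfg.SSE clear: illegal instruction.
    if mode is Mode.U and not senvcfg_sse:
        return Outcome.ILLEGAL_INSTRUCTION
    # VS-mode with henvcfg.SSE clear: virtual instruction.
    if mode is Mode.VS and not henvcfg_sse:
        return Outcome.VIRTUAL_INSTRUCTION
    # VU-mode with either henvcfg.SSE or senvcfg.SSE clear: virtual instruction.
    if mode is Mode.VU and not (henvcfg_sse and senvcfg_sse):
        return Outcome.VIRTUAL_INSTRUCTION
    return Outcome.ALLOWED
```

Because each condition begins with "Otherwise", the order of the checks matters: a VU-mode access with `menvcfg.SSE` clear reports an illegal-instruction exception, not a virtual-instruction exception.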
@@ -229,7 +227,7 @@ Stores to shadow stack pages by instructions other than `SSAMOSWAP`, `SSPUSH`, and `C.SSPUSH` will trigger a store/AMO access-fault exception, not a store/AMO page-fault exception, signaling a fatal error. A store/AMO page-fault suggests that the operating system could address and rectify the fault, which is not -feasible in this scenario. Hence, the page fault handler must decode the opcode +feasible in this scenario. Hence, the page-fault handler must decode the opcode of the faulting instruction to discern whether the fault was caused by a non-shadow-stack instruction writing to an SS page (a fatal condition) or by a shadow stack instruction to a non-resident page (a recoverable condition). The @@ -271,17 +269,6 @@ of as "store/AMO/SS" exceptions, indicating that the trapping instruction is either a store, an AMO, or a shadow stack instruction. ==== -[NOTE] -==== -The H (hypervisor) extension specifies that when a guest-page fault is caused by -an implicit memory access of VS-stage address translation, the reported -exception is either a load or store/AMO guest-page fault based not on the -original instruction type but rather on whether the memory access attempted for -VS-stage translation was a read or a write of memory. VS-stage address -translation can thus cause a shadow stack instruction to raise a load -guest-page-fault exception. -==== - Shadow stack instructions are restricted to accessing shadow stack (`pte.xwr=010b`) pages. Should a shadow stack instruction access a page that is not designated as a shadow stack page and is not marked as read-only @@ -294,7 +281,7 @@ store/AMO page-fault exception will be triggered. ==== Shadow stack loads and stores will trigger a store/AMO page-fault if the accessed page is read-only, to support copy-on-write (COW) of a shadow stack -page. If the page has been marked read-only for COW tracking, the page fault +page. 
If the page has been marked read-only for COW tracking, the page-fault handler responds by creating a copy of the page and updates the `pte.xwr` to `010b`, thereby designating each copy as a shadow stack page. Conversely, if the access targets a genuinely read-only page, the fault being reported as a @@ -331,9 +318,9 @@ that the instruction must not be emulated by a trap handler. ==== Correct execution of shadow stack instructions that access memory requires the -the accessed memory to be idempotent. If the memory referenced by +the accessed memory to be idempotent. If the memory referenced by `SSPUSH`/`C.SSPUSH`/`SSPOPCHK`/`C.SSPOPCHK`/`SSAMOSWAP.W/D` instructions is not -idempotent, then the instructions cause a store/AMO access-fault exception. +idempotent, then the instructions cause a store/AMO access-fault exception. [NOTE] ==== @@ -358,7 +345,7 @@ The G-stage address translation and protections remain unaffected by the Zicfiss extension. The `xwr == 010b` encoding in the G-stage PTE remains reserved. When G-stage page tables are active, the shadow stack instructions that access memory require the G-stage page table to have read-write permission for the accessed -memory; else a store/AMO guest-page fault exception is raised. +memory; else a store/AMO guest-page-fault exception is raised. [NOTE] ==== diff --git a/src/priv-csrs.adoc b/src/priv-csrs.adoc index 5104164..3c2a7ad 100644 --- a/src/priv-csrs.adoc +++ b/src/priv-csrs.adoc @@ -86,8 +86,8 @@ Note that not all registers are required on all implementations. 
[.monofont] |=== 3+^|CSR Address 2.2+|Hex 3.2+|Use and Accessibility -|[11:10] |[9:8] |[7:4] -8+|Unprivileged and User-Level CSRs +|[11:10] |[9:8] |[7:4] +8+|Unprivileged and User-Level CSRs m|00 m|00 m|XXXX 2+m| 0x000-0x0FF 3+|Standard read/write |`01` |`00` |`XXXX` 2+| `0x400-0x4FF` 3+|Standard read/write |`10` |`00` |`XXXX` 2+| `0x800-0x8FF` 3+|Custom read/write @@ -145,7 +145,7 @@ m|00 m|00 m|XXXX 2+m| 0x000-0x0FF 3+|Standard read/write `0x003` |URW + URW + -URW +URW |`fflags` + `frm` + `fcsr` @@ -153,16 +153,59 @@ URW Floating-Point Dynamic Rounding Mode. + Floating-Point Control and Status Register (`frm` +`fflags`). +4+^|Unprivileged Vector CSRs + +|`0x008` + +`0x009` + +`0x00A` + +`0x00F` + +`0xC20` + +`0xC21` + +`0xC22` +|URW + +URW + +URW + +URW + +URO + +URO + +URO +|`vstart` + +`vxsat` + +`vxrm` + +`vcsr` + +`vl` + +`vtype` + +`vlenb` +|Vector start position. + +Fixed-point accrued saturation flag. + +Fixed-point rounding mode. + +Vector control and status register. + +Vector length. + +Vector data type register. + +Vector register length in bytes. + 4+^|Unprivileged Zicfiss extension CSR |`0x011` + |URW + |`ssp` + |Shadow Stack Pointer. + +4+^|Unprivileged Entropy Source Extension CSR +|`0x015` + +|URW + +|`seed` + +|Seed for cryptographic random bit generators. + + +4+^|Unprivileged Zcmt Extension CSR +|`0x017` + +|URW + +|`jvt` + +|Table jump base vector and control register. + + 4+^|Unprivileged Counter/Timers |`0xC00` + -`0xC01` + +`0xC01` + `0xC02` + `0xC03` + `0xC04` + @@ -174,7 +217,7 @@ Floating-Point Control and Status Register (`frm` +`fflags`). 
`0xC83` + `0xC84` +   + -`0xC9F` +`0xC9F` |URO + URO + URO + @@ -188,20 +231,20 @@ URO + URO + URO +   + -URO +URO |`cycle` + -`time` + -`instret` + -`hpmcounter3` + -`hpmcounter4` + +`time` + +`instret` + +`hpmcounter3` + +`hpmcounter4` + ⋮ + -`hpmcounter31` + +`hpmcounter31` + `cycleh` + -`timeh` + -`instreth` + -`hpmcounter3h` + +`timeh` + +`instreth` + +`hpmcounter3h` + `hpmcounter4h` + -⋮ + +⋮ + `hpmcounter31h` |Cycle counter for RDCYCLE instruction. + Timer for RDTIME instruction. + @@ -238,7 +281,7 @@ SRW + SRW |`sstatus` + `sie` + -`stvec` + +`stvec` + `scounteren` |Supervisor status register. + Supervisor interrupt-enable register. + @@ -273,23 +316,65 @@ SRO `stval` + `sip` + `scountovf` -|Scratch register for supervisor trap handlers. + +|Supervisor scratch register. + Supervisor exception program counter. + Supervisor trap cause. + -Supervisor bad address or instruction. + +Supervisor trap value. + Supervisor interrupt pending. + Supervisor count overflow. +4+^|Supervisor Indirect + +|`0x150` + +`0x151` + +`0x152` + +`0x153` + +`0x155` + +`0x156` + +`0x157` +|SRW + +SRW + +SRW + +SRW + +SRW + +SRW + +SRW +|`siselect` + +`sireg` + +`sireg2` + +`sireg3` + +`sireg4` + +`sireg5` + +`sireg6` +|Supervisor indirect register select. + +Supervisor indirect register alias. + +Supervisor indirect register alias 2. + +Supervisor indirect register alias 3. + +Supervisor indirect register alias 4. + +Supervisor indirect register alias 5. + +Supervisor indirect register alias 6. + 4+^|Supervisor Protection and Translation |`0x180` |SRW |`satp` |Supervisor address translation and protection. +4+^|Supervisor Timer Compare + +|`0x14D` + +`0x15D` +|SRW + +SRW +|`stimecmp` + +`stimecmph` +|Supervisor timer compare. + +Upper 32 bits of `stimecmp`, RV32 only. + 4+^|Debug/Trace Registers |`0x5A8` |SRW |`scontext` |Supervisor-mode context register. -//4+^|Supervisor Resource Management Configuration -//|`0x181` |SRW |`srmcfg` |Supervisor Resource Management Configuration. 
+4+^|Supervisor Resource Management Configuration +|`0x181` |SRW |`srmcfg` |Supervisor Resource Management Configuration. 4+^|Supervisor State Enable Registers |`0x10C` + @@ -309,6 +394,20 @@ Supervisor count overflow. Supervisor State Enable 2 Register. + Supervisor State Enable 3 Register. +4+^|Supervisor Control Transfer Records Configuration +|`0x14E` + + `0x14F` + + `0x15F` +|SRW + + SRW + + SRW +|`sctrctl` + + `sctrstatus` + + `sctrdepth` +|Supervisor Control Transfer Records Control Register. + + Supervisor Control Transfer Records Status Register. + + Supervisor Control Transfer Records Depth Register. + |=== <<< @@ -327,18 +426,18 @@ Supervisor count overflow. `0x606` + `0x607` + `0x612` -|HRW + +|HRW + HRW + HRW + HRW + HRW + HRW + -HRW +HRW |`hstatus` + `hedeleg` + -`hideleg` + -`hie` + -`hcounteren` + +`hideleg` + +`hie` + +`hcounteren` + `hgeie` + `hedelegh` |Hypervisor status register. + @@ -355,7 +454,7 @@ Upper 32 bits of `hedeleg`, RV32 only. `0x644` + `0x645` + `0x64A` + -`0xE12` +`0xE12` |HRW + HRW + HRW + @@ -363,10 +462,10 @@ HRW + HRO |`htval` + `hip` + -`hvip` + +`hvip` + `htinst` + `hgeip` -|Hypervisor bad guest physical address. + +|Hypervisor trap value. + Hypervisor interrupt pending. + Hypervisor virtual interrupt pending. + Hypervisor trap instruction (transformed). + @@ -377,9 +476,9 @@ Hypervisor guest external interrupt pending. |`0x60A` + `0x61A` |HRW + -HRM +HRW |`henvcfg` + -`henvcfgh` +`henvcfgh` |Hypervisor environment configuration register. + Upper 32 bits of `henvcfg`, RV32 only. @@ -446,7 +545,7 @@ Upper 32 bits of `htimedelta`, RV32 only. `0x242` + `0x243` + `0x244` + -`0x280` +`0x280` |HRW + HRW + HRW + @@ -455,15 +554,15 @@ HRW + HRW + HRW + HRW + -HRW +HRW |`vsstatus` + `vsie` + -`vstvec` + +`vstvec` + `vsscratch` + `vsepc` + -`vscause` + +`vscause` + `vstval` + -`vsip` + +`vsip` + `vsatp` |Virtual supervisor status register. + Virtual supervisor interrupt-enable register. 
+ @@ -471,10 +570,58 @@ Virtual supervisor trap handler base address. + Virtual supervisor scratch register. + Virtual supervisor exception program counter. + Virtual supervisor trap cause. + -Virtual supervisor bad address or instruction. + +Virtual supervisor trap value. + Virtual supervisor interrupt pending. + Virtual supervisor address translation and protection. +4+^|Virtual Supervisor Indirect + +|`0x250` + +`0x251` + +`0x252` + +`0x253` + +`0x255` + +`0x256` + +`0x257` +|HRW + +HRW + +HRW + +HRW + +HRW + +HRW + +HRW +|`vsiselect` + +`vsireg` + +`vsireg2` + +`vsireg3` + +`vsireg4` + +`vsireg5` + +`vsireg6` +|Virtual supervisor indirect register select. + +Virtual supervisor indirect register alias. + +Virtual supervisor indirect register alias 2. + +Virtual supervisor indirect register alias 3. + +Virtual supervisor indirect register alias 4. + +Virtual supervisor indirect register alias 5. + +Virtual supervisor indirect register alias 6. + +4+^|Virtual Supervisor Timer Compare + +|`0x24D` + +`0x25D` +|HRW + +HRW +|`vstimecmp` + +`vstimecmph` +|Virtual supervisor timer compare. + +Upper 32 bits of `vstimecmp`, RV32 only. + +4+^|Virtual Supervisor Control Transfer Records Configuration +|`0x24E` +|HRW +|`vsctrctl` +|Virtual Supervisor Control Transfer Records Control Register. + |=== <<< @@ -527,9 +674,9 @@ MRW + MRW + MRW + MRW + -MRW +MRW |`mstatus` + -`misa` + +`misa` + `medeleg` + `mideleg` + `mie` + @@ -547,6 +694,25 @@ Machine counter enable. + Additional machine status register, RV32 only. + Upper 32 bits of `medeleg`, RV32 only. +4+^|Machine Counter Configuration + +|`0x321` + +`0x322` + +`0x721` + +`0x722` +|MRW + +MRW + +MRW + +MRW +|`mcyclecfg` + +`minstretcfg` + +`mcyclecfgh` + +`minstretcfgh` +|Machine cycle counter configuration register. + +Machine instret counter configuration register. + +Upper 32 bits of `mcyclecfg`, RV32 only. + +Upper 32 bits of `minstretcfg`, RV32 only. 
+ 4+^|Machine Trap Handling |`0x340` + @@ -555,7 +721,7 @@ Upper 32 bits of `medeleg`, RV32 only. `0x343` + `0x344` + `0x34A` + -`0x34B` +`0x34B` |MRW + MRW + MRW + @@ -569,27 +735,58 @@ MRW `mtval` + `mip` + `mtinst` + -`mtval2` -|Scratch register for machine trap handlers. + +`mtval2` +|Machine scratch register. + Machine exception program counter. + Machine trap cause. + -Machine bad address or instruction. + +Machine trap value. + Machine interrupt pending. + Machine trap instruction (transformed). + -Machine bad guest physical address. +Machine second trap value. + +4+^|Machine Indirect + +|`0x350` + +`0x351` + +`0x352` + +`0x353` + +`0x355` + +`0x356` + +`0x357` +|MRW + +MRW + +MRW + +MRW + +MRW + +MRW + +MRW +|`miselect` + +`mireg` + +`mireg2` + +`mireg3` + +`mireg4` + +`mireg5` + +`mireg6` +|Machine indirect register select. + +Machine indirect register alias. + +Machine indirect register alias 2. + +Machine indirect register alias 3. + +Machine indirect register alias 4. + +Machine indirect register alias 5. + +Machine indirect register alias 6. 4+^|Machine Configuration |`0x30A` + `0x31A` + `0x747` + -`0x757` +`0x757` |MRW + MRW + MRW + -MRW +MRW |`menvcfg` + -`menvcfgh` + +`menvcfgh` + `mseccfg` + `mseccfgh` |Machine environment configuration register. + @@ -626,7 +823,7 @@ MRW `pmpcfg2` + `pmpcfg3` + ⋯ + -`pmpcfg14` + +`pmpcfg14` + `pmpcfg15` + `pmpaddr0` + `pmpaddr1` + @@ -691,7 +888,7 @@ Physical memory protection address register. |`0x740` + `0x741` + `0x742` + -`0x744` +`0x744` |MRW + MRW + MRW + @@ -767,7 +964,7 @@ Upper 32 bits of `mhpmcounter31`, RV32 only. `0x724` +   + `0x73F` -|MRW + +|MRW + MRW + MRW +   + @@ -795,6 +992,11 @@ Upper 32 bits of `mhpmevent4`, RV32 only. +   + Upper 32 bits of `mhpmevent31`, RV32 only. +4+^|Machine Control Transfer Records Configuration +|`0x34E` +|MRW +|`mctrctl` +|Machine Control Transfer Records Control Register. 
4+^|Debug/Trace Registers (shared with Debug Mode) diff --git a/src/priv-preface.adoc b/src/priv-preface.adoc index 25712c5..69fc0a5 100644 --- a/src/priv-preface.adoc +++ b/src/priv-preface.adoc @@ -1,44 +1,233 @@ [colophon] -= Preface +== Preface +// Had to make the above a level 1 heading (two equals signs) to avoid error when building +// the ISA manual as a book with other "parts". This is opposite to what the adoc says to do +// but otherwise asciidoctor creates the error message: +// +// asciidoctor: ERROR: ext/riscv-isa-manual/src/priv-preface.adoc: line 2: invalid part, must have at least one section (e.g., chapter, appendix, etc.) +// +// See asciidoctor doc which seems wrong: https://docs.asciidoctor.org/asciidoc/latest/sections/colophon/ +[.big]*_Preface to Version 20250508_* -[.big]*_Preface to Version 20240528_* +This document describes the RISC-V privileged architecture. + +The ISA modules marked *Ratified* have been ratified at this time. The +modules marked _Frozen_ are not expected to change significantly before +being put up for ratification. The modules marked _Draft_ are expected +to change before ratification. 
+ +The document contains the following versions of the RISC-V ISA modules: + +[%autowidth,float="center",align="center",cols="^,<,^",options="header",] +|=== +|Module |Version |Status +|*Machine ISA* + +*Smstateen Extension* + +*Smcsrind/Sscsrind Extension* + +*Smepmp Extension* + +*Smcntrpmf Extension* + +*Smrnmi Extension* + +*Smcdeleg Extension* + +*Smdbltrp Extension* + +*Smctr* + +*Supervisor ISA* + +*Svade Extension* + +*Svnapot Extension* + +*Svpbmt Extension* + +*Svinval Extension* + +*Svadu Extension* + +*Svvptc* + +*Ssqosid* + +*Sstc Extension* + +*Sscofpmf Extension* + +*Ssdbltrp Extension* + +*Ssqosid Extension* + +*Hypervisor ISA* + +*Shlcofideleg Extension* + +*Svvptc Extension* + +*Pointer Masking* + +|*1.13* + +*1.0* + +*1.0* + +*1.0* + +*1.0* + +*1.0* + +*1.0* + +*1.0* + +*1.0* + +*1.13* + +*1.0* + +*1.0* + +*1.0* + +*1.0* + +*1.0* + +*1.0* + +*1.0* + +*1.0* + +*1.0* + +*1.0* + +*1.0* + +*1.0* + +*1.0* + +*1.0* + +*1.0* + +|*Ratified* + +*Ratified* + +*Ratified* + +*Ratified* + +*Ratified* + +*Ratified* + +*Ratified* + +*Ratified* + +*Ratified* + +*Ratified* + +*Ratified* + +*Ratified* + +*Ratified* + +*Ratified* + +*Ratified* + +*Ratified* + +*Ratified* + +*Ratified* + +*Ratified* + +*Ratified* + +*Ratified* + +*Ratified* + +*Ratified* + +*Ratified* + +*Ratified* +|=== + +The following changes have been made since version 20241101: + +* Addition of the Smctr Control Transfer Records extension. +* Addition of the Svvptc Extension for Obviating Memory-Management Instructions after Marking PTEs Valid. +* Addition of the Ssqosid Extension for Quality-of-Service Identifiers +* Addition of the Pointer Masking Extension + +[.big]*_Preface to Version 20241101_* + +This document describes the RISC-V privileged architecture. + +The ISA modules marked *Ratified* have been ratified at this time. The +modules marked _Frozen_ are not expected to change significantly before +being put up for ratification. 
The modules marked _Draft_ are expected +to change before ratification. + +The document contains the following versions of the RISC-V ISA modules: + +[%autowidth,float="center",align="center",cols="^,<,^",options="header",] +|=== +|Module |Version |Status +|*Machine ISA* + +*Smstateen Extension* + +*Smcsrind/Sscsrind Extension* + +*Smepmp Extension* + +*Smcntrpmf Extension* + +*Smrnmi Extension* + +*Smcdeleg Extension* + +*Smdbltrp Extension* + +*Supervisor ISA* + +*Svade Extension* + +*Svnapot Extension* + +*Svpbmt Extension* + +*Svinval Extension* + +*Svadu Extension* + +*Sstc Extension* + +*Sscofpmf Extension* + +*Ssdbltrp Extension* + +*Ssqosid Extension* + +*Hypervisor ISA* + +*Shlcofideleg Extension* + +*Svvptc Extension* + +|*1.13* + +*1.0* + +*1.0* + +*1.0* + +*1.0* + +*1.0* + +*1.0* + +*1.0* + +*1.13* + +*1.0* + +*1.0* + +*1.0* + +*1.0* + +*1.0* + +*1.0* + +*1.0* + +*1.0* + +*1.0* + +*1.0* + +*1.0* + +*1.0* + +|*Ratified* + +*Ratified* + +*Ratified* + +*Ratified* + +*Ratified* + +*Ratified* + +*Ratified* + +*Ratified* + +*Ratified* + +*Ratified* + +*Ratified* + +*Ratified* + +*Ratified* + +*Ratified* + +*Ratified* + +*Ratified* + +*Ratified* + +*Ratified* + +*Ratified* + +*Ratified* + +*Ratified* +|=== + +[.big]*_Preface to Version 20241017_* This document describes the RISC-V privileged architecture. 
This -release, version 20240528, contains the following versions of the RISC-V ISA +release, version 20241017, contains the following versions of the RISC-V ISA modules: [%autowidth,float="center",align="center",cols="^,<,^",options="header",] |=== |Module |Version |Status -|_Machine ISA_ + +|*Machine ISA* + *Smstateen Extension* + *Smcsrind/Sscsrind Extension* + *Smepmp* + -**Smcntrpmf* + +*Smcntrpmf* + *Smrnmi Extension* + *Smcdeleg* + -_Smdbltrp_ + -_Supervisor ISA_ + +*Smdbltrp* + +*Supervisor ISA* + *Svade Extension* + -*Svnapot Extension* + -*Svpbmt Extension* + -*Svinval Extension* + +*Svnapot Extension* + +*Svpbmt Extension* + +*Svinval Extension* + *Svadu Extension* + *Sstc* + *Sscofpmf* + -_Ssdbltrp_ + +*Ssdbltrp* + *Hypervisor ISA* + -_Shlcofideleg_ +*Shlcofideleg* + +*Svvptc* -|_1.13_ + +|*1.13* + +*1.0* + +*1.0* + *1.0* + *1.0* + *1.0* + *1.0* + *1.0* + +*1.13* + *1.0* + -_1.0_ + -_1.13_ + *1.0* + *1.0* + *1.0* + @@ -46,19 +235,17 @@ _1.13_ + *1.0* + *1.0* + *1.0* + -_1.0_ + *1.0* + -_0.1_ +*1.0* + +*1.0* -|_Draft_ + +|*Ratified* + *Ratified* + *Ratified* + *Ratified* + *Ratified* + *Ratified* + *Ratified* + -_Draft_ + -_Draft_ + *Ratified* + *Ratified* + *Ratified* + @@ -66,9 +253,12 @@ _Draft_ + *Ratified* + *Ratified* + *Ratified* + -_Draft_ + *Ratified* + -_Draft_ +*Ratified* + +*Ratified* + +*Ratified* + +*Ratified* + +*Ratified* |=== The following changes have been made since version 1.12 of the Machine and @@ -89,12 +279,13 @@ implemented. * Defined the misaligned atomicity granule PMA, superseding the proposed Zam extension. * Allocated interrupt 13 for Sscofpmf LCOFI interrupt. -* Defined hardware error and software check exception codes. +* Defined hardware-error and software-check exception codes. * Specified synchronization requirements when changing the PBMTE fields in `menvcfg` and `henvcfg`. -* Exposed count-overflow interrups to VS-mode via the Shlcofideleg extension. 
+* Exposed count-overflow interrupts to VS-mode via the Shlcofideleg extension. +* Relaxed behavior of some HINTs when MXLEN > XLEN. -Finally, the following clarifications and document improvments have been made +Finally, the following clarifications and document improvements have been made since the last document release: * Transliterated the document from LaTeX into AsciiDoc. @@ -112,6 +303,9 @@ be set to a nonzero value but sometimes not. * Clarified exception behavior of unimplemented or inaccessible CSRs. * Clarified that Svpbmt allows implementations to override additional PMAs. * Replaced the concept of vacant memory regions with inaccessible memory or I/O regions. +* Clarified that timer and count-overflow interrupts' arrival in + interrupt-pending registers is not immediate. +* Clarified that MXR affects only explicit memory accesses. [.big]*_Preface to Version 20211203_* @@ -123,7 +317,7 @@ ISA modules: |=== |Module |Version |Status |*Machine ISA* + -*Supervisor ISA* + +*Supervisor ISA* + *Svnapot Extension* + *Svpbmt Extension* + *Svinval Extension* + @@ -212,10 +406,10 @@ contains the following versions of the RISC-V ISA modules: |Module |Version |Status |*Machine ISA* + *Supervisor ISA* + -_Hypervisor ISA_ +_Hypervisor ISA_ |*1.11* + -*1.11* + -_0.3_ +*1.11* + +_0.3_ |*Ratified* + *Ratified* + _Draft_ diff --git a/src/q-st-ext.adoc b/src/q-st-ext.adoc index 3940ea7..b4c7502 100644 --- a/src/q-st-ext.adoc +++ b/src/q-st-ext.adoc @@ -17,7 +17,7 @@ value. New 128-bit variants of LOAD-FP and STORE-FP instructions are added, encoded with a new value for the funct3 width field. -include::images/wavedrom/quad-ls.adoc[] +include::images/wavedrom/quad-ls.edn[] [[quad-ls]] //.Quad-precision load and store @@ -47,7 +47,7 @@ The quad-precision floating-point computational instructions are defined analogously to their double-precision counterparts, but operate on quad-precision operands and produce quad-precision results. 
-include::images/wavedrom/quad-compute.adoc[] +include::images/wavedrom/quad-compute.edn[] [[quad-compute]] //.Quad-precision computational @@ -64,7 +64,7 @@ FCVT.WU.Q, FCVT.LU.Q, FCVT.Q.WU, and FCVT.Q.LU variants convert to or from unsigned integer values. FCVT.L[U].Q and FCVT.Q.L[U] are RV64-only instructions. Note FCVT.Q.L[U] always produces an exact result and is unaffected by rounding mode. -include::images/wavedrom/quad-cnvrt-mv.adoc[] +include::images/wavedrom/quad-cnvrt-mv.edn[] [[quad-cnvrt-mv]] //.Quad-precision convert and move @@ -76,15 +76,15 @@ single-precision floating-point number, or vice-versa, respectively. FCVT.D.Q or FCVT.Q.D converts a quad-precision floating-point number to a double-precision floating-point number, or vice-versa, respectively. -include::images/wavedrom/quad-cnvt-interchange.adoc[] -[[quad-convrt-interchange]] +include::images/wavedrom/quad-cnvt-interchange.edn[] +[[quad-convert-interchange]] //.Quad-precision convert and move interchangeably Floating-point to floating-point sign-injection instructions, FSGNJ.Q, FSGNJN.Q, and FSGNJX.Q are defined analogously to the double-precision sign-injection instruction. -include::images/wavedrom/quad-cnvrt-intch-xqqx.adoc[] +include::images/wavedrom/quad-cnvrt-intch-xqqx.edn[] [[quad-cnvrt-intch-xqqx]] //.Quad-precision convert and move interchangeably XQ-QX @@ -92,18 +92,13 @@ FMV.X.Q and FMV.Q.X instructions are not provided in RV32 or RV64, so quad-precision bit patterns must be moved to the integer registers via memory. -[NOTE] -==== -RV128 will support FMV.X.Q and FMV.Q.X in the Q extension. -==== - === Quad-Precision Floating-Point Compare Instructions The quad-precision floating-point compare instructions are defined analogously to their double-precision counterparts, but operate on quad-precision operands. 
-include::images/wavedrom/quad-float-compare.adoc[] +include::images/wavedrom/quad-float-compare.edn[] [[quad-float-compare]] //.Quad-precision floatinf-point compare @@ -113,8 +108,6 @@ The quad-precision floating-point classify instruction, FCLASS.Q, is defined analogously to its double-precision counterpart, but operates on quad-precision operands. -include::images/wavedrom/quad-float-clssfy.adoc[] +include::images/wavedrom/quad-float-clssfy.edn[] [[quad-float-clssfy]] //.Quad-precision floating point classify - - diff --git a/src/resources/riscv-spec.bib b/src/resources/riscv-spec.bib index 16e6b38..d5b77d3 100644 --- a/src/resources/riscv-spec.bib +++ b/src/resources/riscv-spec.bib @@ -34,7 +34,7 @@ pages = {138--152}, publisher = {Computer Science Press, Inc.}, address = {New York, NY, USA}, -} +} @inproceedings{Ungar:1984, author = {David Ungar and Ricki Blau and Peter Foley and Dain Samples @@ -44,7 +44,7 @@ address = {Ann Arbor, MI}, year = {1984}, pages = {188--197} -} +} @Article{spur-jsscc1989, author = {David D. Lee and Shing I. Kong and Mark D. Hill and @@ -140,52 +140,52 @@ month = {December}, year = 2012} -@ARTICLE{tremblay-vis-ieeemicro1996, -author={Tremblay, M. and O'Connor, J.M. and Narayanan, V. and Liang He}, -journal={IEEE Micro}, -title={{VIS} speeds new media processing}, -year={1996}, -month=AUG, -volume={16}, -number={4}, -pages={10 -20}, -keywords={3D graphics environments;RISC-style instructions;UltraSparc;VIS;Visual Instruction Set;media processing;media-processing algorithms;computer graphics;instruction sets;reduced instruction set computing;}, +@ARTICLE{tremblay-vis-ieeemicro1996, +author={Tremblay, M. and O'Connor, J.M. and Narayanan, V. 
and Liang He}, +journal={IEEE Micro}, +title={{VIS} speeds new media processing}, +year={1996}, +month=AUG, +volume={16}, +number={4}, +pages={10 -20}, +keywords={3D graphics environments;RISC-style instructions;UltraSparc;VIS;Visual Instruction Set;media processing;media-processing algorithms;computer graphics;instruction sets;reduced instruction set computing;}, ISSN={0272-1732},} -@ARTICLE{lee-max-ieeemicro1996, -author={Lee, R.B.}, -journal={IEEE Micro}, -title={Subword parallelism with {MAX-2}}, -year={1996}, -month=AUG, -volume={16}, -number={4}, -pages={51 -59}, -keywords={MAX-2;instruction extensions;media processing;parallel computation;subword parallelism;word-oriented general-purpose processor;instruction sets;multimedia computing;parallel processing;}, +@ARTICLE{lee-max-ieeemicro1996, +author={Lee, R.B.}, +journal={IEEE Micro}, +title={Subword parallelism with {MAX-2}}, +year={1996}, +month=AUG, +volume={16}, +number={4}, +pages={51 -59}, +keywords={MAX-2;instruction extensions;media processing;parallel computation;subword parallelism;word-oriented general-purpose processor;instruction sets;multimedia computing;parallel processing;}, ISSN={0272-1732},} -@ARTICLE{peleg-mmx-ieeemicro1996, -author={Peleg, A. and Weiser, U.}, -journal={IEEE Micro}, -title={{MMX} technology extension to the {Intel} architecture}, -year={1996}, -month=AUG, -volume={16}, -number={4}, -pages={42 -50}, +@ARTICLE{peleg-mmx-ieeemicro1996, +author={Peleg, A. and Weiser, U.}, +journal={IEEE Micro}, +title={{MMX} technology extension to the {Intel} architecture}, +year={1996}, +month=AUG, +volume={16}, +number={4}, +pages={42 -50}, keywords={Intel architecture;MMX;SIMD;communications;compatibility;multimedia;operating systems;microprocessor chips;parallel architectures;}, ISSN={0272-1732},} -@ARTICLE{raman-sse-ieeemicro2000, -author={Raman, S.K. and Pentkovski, V. 
and Keshava, J.}, -journal={IEEE Micro}, -title={Implementing streaming {SIMD} extensions on the {Pentium}-{III} processor }, -year={2000}, +@ARTICLE{raman-sse-ieeemicro2000, +author={Raman, S.K. and Pentkovski, V. and Keshava, J.}, +journal={IEEE Micro}, +title={Implementing streaming {SIMD} extensions on the {Pentium}-{III} processor }, +year={2000}, month=JUL/AUG, -volume={20}, -number={4}, -pages={47 -57}, -keywords={Internet;Pentium III developers;demanding multimedia;die size constraints;streaming SIMD extensions;instruction sets;microprocessor chips;}, +volume={20}, +number={4}, +pages={47 -57}, +keywords={Internet;Pentium III developers;demanding multimedia;die size constraints;streaming SIMD extensions;instruction sets;microprocessor chips;}, ISSN={0272-1732},} @misc{lomont-avx-irm2011, @@ -195,28 +195,28 @@ howpublished = {Intel White Paper}, year = {2011}, } -@ARTICLE{goodacre-armisa-computer2005, -author={Goodacre, J. and Sloss, A.N.}, -journal={Computer}, -title={Parallelism and the {ARM} instruction set architecture}, -year={2005}, +@ARTICLE{goodacre-armisa-computer2005, +author={Goodacre, J. 
and Sloss, A.N.}, +journal={Computer}, +title={Parallelism and the {ARM} instruction set architecture}, +year={2005}, month=JULY, -volume={38}, -number={7}, -pages={ 42 - 50}, -keywords={ ARM RISC processor; ARM chip design; ARM instruction set architecture; digital signal processor-like operations; exception handling; multiprocessing; reduced-instruction-set computing; subword parallelism; thread-level parallelism; variable execution time; instruction sets; microprocessor chips; parallel architectures; parallel programming; reduced instruction set computing;}, +volume={38}, +number={7}, +pages={ 42 - 50}, +keywords={ ARM RISC processor; ARM chip design; ARM instruction set architecture; digital signal processor-like operations; exception handling; multiprocessing; reduced-instruction-set computing; subword parallelism; thread-level parallelism; variable execution time; instruction sets; microprocessor chips; parallel architectures; parallel programming; reduced instruction set computing;}, ISSN={0018-9162},} -@ARTICLE{diefendorff-altivec-ieeemicro2000, -author={Diefendorff, K. and Dubey, P.K. and Hochsprung, R. and Scale, H.}, +@ARTICLE{diefendorff-altivec-ieeemicro2000, +author={Diefendorff, K. and Dubey, P.K. and Hochsprung, R. 
and Scale, H.}, journal={IEEE Micro}, -title={{AltiVec} extension to {PowerPC} accelerates media processing}, -year={2000}, +title={{AltiVec} extension to {PowerPC} accelerates media processing}, +year={2000}, month=MAR/APR, -volume={20}, -number={2}, -pages={85 -95}, -keywords={2D image processing;3D graphics;AltiVec extension;Apple G4;Hewlett-Packard added MAX;MDMX;MIPS architecture;MMX;Motorola's MPC 7400;PA-RISC architecture;PowerPC;PowerPC's AltiVec;SSE;Silicon Graphics;Sun enhanced Sparc;alias KNI;handwriting recognition;media mining;media processing;multimedia technologies;narrow/broadband signal processing;personal computing;digital signal processing chips;handwriting recognition;multimedia systems;parallel architectures;}, +volume={20}, +number={2}, +pages={85 -95}, +keywords={2D image processing;3D graphics;AltiVec extension;Apple G4;Hewlett-Packard added MAX;MDMX;MIPS architecture;MMX;Motorola's MPC 7400;PA-RISC architecture;PowerPC;PowerPC's AltiVec;SSE;Silicon Graphics;Sun enhanced Sparc;alias KNI;handwriting recognition;media mining;media processing;multimedia technologies;narrow/broadband signal processing;personal computing;digital signal processing chips;handwriting recognition;multimedia systems;parallel architectures;}, ISSN={0272-1732},} @misc{gwennap-mdmx-mpr1996, @@ -237,7 +237,7 @@ year = {1996}, pages = {12--25}, publisher = {IEEE Computer Society Press}, address = {Los Alamitos, CA, USA}, -} +} @InProceedings{tx2, author = {John M. Frankovich and H. 
Philip Peterson}, @@ -263,7 +263,7 @@ year = {1996}, series = {PACT '98}, year = {1998}, address = {Washington, DC, USA}, -} +} @inproceedings{Kim-micro2005, author = {Kim, Hyesoon and Mutlu, Onur and Stark, Jared and Patt, Yale N.}, @@ -273,16 +273,16 @@ year = {1996}, year = {2005}, location = {Barcelona, Spain}, pages = {43--54}, -} +} @INPROCEEDINGS{Gharachorloo90memoryconsistency, author = {Kourosh Gharachorloo and Daniel Lenoski and James Laudon and Phillip Gibbons and Anoop Gupta and John - Hennessy}, + Hennessy}, title = {Memory Consistency and Event Ordering in Scalable - Shared-Memory Multiprocessors}, + Shared-Memory Multiprocessors}, booktitle = {In Proceedings of the 17th Annual International - Symposium on Computer Architecture}, + Symposium on Computer Architecture}, year = {1990}, pages = {15--26} } @@ -297,7 +297,7 @@ year = {1996}, location = {Austin, Texas}, pages = {294--305}, publisher = {IEEE Computer Society}, -} +} @Misc{sparcieee1994, title = {{IEEE} Standard for a 32-bit microprocessor}, @@ -342,16 +342,16 @@ year = {1996}, numpages = {9}, publisher = {ACM}, address = {New York, NY, USA}, -} +} @ARTICLE{goldbergvm, -author={Goldberg, Robert P.}, -journal={Computer}, -title={Survey of virtual machine research}, -year={1974}, -month={June}, -volume={7}, -number={6}, +author={Goldberg, Robert P.}, +journal={Computer}, +title={Survey of virtual machine research}, +year={1974}, +month={June}, +volume={7}, +number={6}, pages={34-45} } @@ -381,7 +381,7 @@ pages={34-45} acmid = {844138}, publisher = {ACM}, address = {New York, NY, USA}, -} +} @Book{stretch, author = "Werner Buchholz", @@ -407,7 +407,7 @@ pages={34-45} year = {1965}, location = {San Francisco, California}, pages = {33--40} -} +} @InProceedings{jtseng:sbbci, author = {J. Tseng and K. 
Asanovi\'c}, @@ -480,22 +480,22 @@ pages={34-45} } -@inproceedings{lithe-pan-hotpar09, +@inproceedings{lithe-pan-hotpar09, author = {Heidi Pan and Benjamin Hindman and Krste Asanovi\'c}, title = {{Lithe}: Enabling Efficient Composition of Parallel Libraries}, booktitle = {Proceedings of the 1st USENIX Workshop on Hot Topics in Parallelism (HotPar~'09)}, month = {March}, year = {2009}, -address = {Berkeley, CA}} +address = {Berkeley, CA}} -@inproceedings{lithe-pan-pldi10, +@inproceedings{lithe-pan-pldi10, author = {Heidi Pan and Benjamin Hindman and Krste Asanovi\'c}, title = {Composing Parallel Software Efficiently with {Lithe}}, booktitle = {31st Conference on Programming Language Design and Implementation}, month = {June}, year = {2010}, -address = {Toronto, Canada}} +address = {Toronto, Canada}} @article{roux:hal-01091186, TITLE = {{Innocuous Double Rounding of Basic Arithmetic Operations}}, @@ -606,7 +606,7 @@ address = {Toronto, Canada}} } @misc{nist:fips:197, - author = {{NIST}}, + author = {{NIST}}, title = {{Advanced} {Encryption} {Standard} ({AES})}, howpublished = {Federal Information Processing Standards Publication FIPS 197}, @@ -650,7 +650,7 @@ address = {Toronto, Canada}} @Misc{gbt:sm4, title = {{GB}/{T} 32907-2016: {SM4} Block Cipher Algorithm}, - howpublished = {Also {GM}/{T} 0002-2012. Standardization Administration of China}, + howpublished = {Also {GM}/{T} 0002-2012. 
Standardization Administration of China}, url = {http://www.gmbz.org.cn/upload/2018-04-04/1522788048733065051.pdf}, month = {August}, year = {2016} @@ -659,7 +659,7 @@ address = {Toronto, Canada}} @Misc{iso:sm3, author = {ISO/IEC}, title = {IT Security techniques -- Hash-functions -- Part 3: - Dedicated hash-functions}, + Dedicated hash-functions}, howpublished = {{ISO}/{IEC} Standard 10118-3:2018}, year = {2018} } @@ -750,7 +750,7 @@ pages={109-136} % -% Block Cipher Specifiations +% Block Cipher Specifications % ----------------------------------------------------------------- @inproceedings{block:prince, @@ -884,683 +884,683 @@ pages={109-136} % ----------------------------------------------------------------- -@Misc{ AM17, - author = {{AMD}}, - title = {{AMD} Random Number Generator}, - howpublished = {AMD TechDocs}, - publisher = {Advanced Micro Devices}, - url = {https://www.amd.com/system/files/TechDocs/amd-random-number-generator.pdf}, - month = {June}, - year = {2017} -} - -@Misc{ AR17, - author = {{ARM}}, - title = {ARM TrustZone True Random Number Generator: Technical Reference Manual}, - howpublished = {ARM 100976\_0000\_00\_en (rev. r0p0)}, - publisher = {{ARM}}, - url = {http://infocenter.arm.com/help/index.jsp?topic=/com.arm.doc.100976_0000_00_en}, - month = {May}, - year = {2017} -} - -@Misc{ AR20, - author = {{ARM}}, - title = {Arm Architecture Registers: Armv8, for Armv8-A - architecture profile}, - howpublished = {ARM DDI 0595 (ID033020)}, - publisher = {{ARM}}, - url = {https://developer.arm.com/docs/ddi0595/g}, - month = {April}, - year = {2020} -} - -@Book{ An20, - author = {Ross J. 
Anderson}, - title = {Security engineering - a guide to building dependable - distributed systems {(3.} ed.)}, - publisher = {Wiley}, - isbn = {978-1-119-64278-7}, - url = {https://www.cl.cam.ac.uk/~rja14/book.html}, - month = {December}, - year = {2020} -} - -@Misc{ BS13, - author = {{BSI}}, - title = {Evaluation of random number generators}, - howpublished = {Version 0.10, BSI}, - url = {https://www.bsi.bund.de/SharedDocs/Downloads/DE/BSI/Zertifizierung/Interpretationen/AIS_20_AIS_31_Evaluation_of_random_number_generators_e.html}, - publisher = {BSI}, - month = {March}, - year = {2013} -} - -@Misc{ Ba20, - author = {Elaine Barker}, - title = {Recommendation for Key Management: Part 1 -- General}, - howpublished = {NIST Special Publication SP 0 Part 1, Revision 5}, - doi = {10.6028/NIST.SP.800-57pt1r5}, - publisher = {{NIST}}, - month = {May}, - year = {2020} -} - -@Article{ Ba86, - author = {Per Bak}, - title = {The Devil's Staircase}, - journal = {Phys. Today}, - volume = {39}, - number = {12}, - pages = {38--45}, - doi = {10.1063/1.881047}, - publisher = {{AIP}}, - month = {December}, - year = {1986} -} - -@Misc{ BaBa19, - author = {Elaine Barker and William Barker}, - title = {Recommendation for Key Management: Part 2 -- Best - Practices for Key Management Organizations}, - howpublished = {NIST Special Publication SP 800-57 Part 2, Revision 1}, - doi = {10.6028/NIST.SP.800-57pt2r1}, - publisher = {{NIST}}, - month = {May}, - year = {2019} -} - -@Misc{ BaDa15, - author = {Elaine Barker and Quynh Dang}, - title = {Recommendation for Key Management, Part 3: - Application-Specific Key Management Guidance}, - howpublished = {NIST Special Publication SP 800-57 Part 3, Revision 1}, - doi = {10.6028/NIST.SP.800-57pt3r1}, - publisher = {{NIST}}, - month = {January}, - year = {2015} -} - -@InProceedings{ BaFoKa:12, - author = {Romain Bardou and Riccardo Focardi and Yusuke Kawamoto and - Lorenzo Simionato and Graham Steel and Joe{-}Kai Tsay}, - title = {Efficient Padding 
Oracle Attacks on Cryptographic - Hardware}, - booktitle = {Advances in Cryptology - {CRYPTO} 2012 - 32nd Annual - Cryptology Conference, Santa Barbara, CA, USA, August - 19-23, 2012. Proceedings}, - pages = {608--625}, - crossref = {_SaCa12}, - doi = {10.1007/978-3-642-32009-5\_36}, - year = {2012} -} - -@Misc{ BaKe15, - author = {Elaine Barker and John Kelsey}, - title = {Recommendation for Random Number Generation Using - Deterministic Random Bit Generators}, - howpublished = {NIST Special Publication SP 800-90A Revision 1}, - doi = {10.6028/NIST.SP.800-90Ar1}, - month = {June}, - year = {2015} -} - -@Misc{ BaKeRo:21, - author = {Elaine Barker and John Kelsey and Allen Roginsky and - Meltem Sönmez Turan and Darryl Buller and Aaron Kaufer}, - title = {Recommendation for Random Bit Generator ({RBG}) - Constructions}, - howpublished = {Draft NIST Special Publication SP 800-90C}, - month = {March}, - year = {2021} -} - -@Article{ BaLuMi:11, - author = {Mathieu Baudet and David Lubicz and Julien Micolod and - Andr{\'{e}} Tassiaux}, - title = {On the Security of Oscillator-Based Random Number - Generators}, - journal = {J. Cryptology}, - volume = {24}, - number = {2}, - pages = {398--425}, - doi = {10.1007/s00145-010-9089-3}, - year = {2011} -} - -@Article{ BeRePa:14, - author = {Georg T. Becker and Francesco Regazzoni and Christof Paar - and Wayne P. Burleson}, - title = {Stealthy dopant-level hardware Trojans: extended version}, - journal = {J. 
Cryptographic Engineering}, - volume = {4}, - number = {1}, - pages = {19--31}, - publisher = {Springer}, - doi = {10.1007/s13389-013-0068-0}, - year = {2014} -} - -@Article{ Bl86, - author = {Manuel Blum}, - title = {Independent unbiased coin flips from a correlated biased - source -- A finite state Markov chain}, - journal = {Combinatorica}, - volume = {6}, - number = {2}, - pages = {97--108}, - doi = {10.1007/BF02579167}, - year = {1986} -} - -@Article{ BlBlSh86, - author = {Lenore Blum and Manuel Blum and Mike Shub}, - title = {A Simple Unpredictable Pseudo-Random Number Generator}, - journal = {{SIAM} J. Comput.}, - volume = {15}, - number = {2}, - pages = {364--383}, - doi = {10.1137/0215025}, - publisher = {{SIAM}}, - year = {1986} -} - -@InProceedings{ ChMaGa:16, - author = {Stephen Checkoway and Jacob Maskiewicz and Christina - Garman and Joshua Fried and Shaanan Cohney and Matthew - Green and Nadia Heninger and Ralf{-}Philipp Weinmann and - Eric Rescorla and Hovav Shacham}, - title = {A Systematic Analysis of the Juniper Dual {EC} Incident}, - booktitle = {Proceedings of the 2016 {ACM} {SIGSAC} Conference on - Computer and Communications Security, Vienna, Austria, - October 24-28, 2016}, - pages = {468--479}, - crossref = {_WeKaKr:16}, - doi = {10.1145/2976749.2978395}, - year = {2016} -} - -@Article{ ChMaGa:18, - author = {Stephen Checkoway and Jacob Maskiewicz and Christina - Garman and Joshua Fried and Shaanan Cohney and Matthew - Green and Nadia Heninger and Ralf{-}Philipp Weinmann and - Eric Rescorla and Hovav Shacham}, - title = {Where did {I} leave my keys?: lessons from the Juniper - Dual {EC} incident}, - journal = {Commun. 
{ACM}}, - volume = {61}, - number = {11}, - pages = {148--155}, - publisher = {{ACM}}, - doi = {10.1145/3266291}, - year = {2018} -} - -@Misc{ Cr17, - author = {Common Criteria}, - title = {Common Methodology for Information Technology Security - Evaluation: Evaluation methodology}, - howpublished = {Specification: Version 3.1 Revision 5}, - url = {https://commoncriteriaportal.org/cc/}, - month = {April}, - year = {2017} -} - -@Misc{ Da02, - author = {Robert B. Davies}, - title = {Exclusive OR (XOR) and hardware random number generators}, - howpublished = {Author-hosted manuscript}, - url = {http://www.robertnz.net/pdf/xor2.pdf}, - month = {February}, - year = {2002} -} - -@Book{ DaRo58, - author = {Wilbur B. Davenport Jr. and William L. Root}, - title = {An Introduction to the Theory of Random Signals and - Noise}, - url = {https://ieeexplore.ieee.org/servlet/opac?bknumber=5265617}, - pages = {401}, - publisher = {McGraw-Hill}, - year = {1958} -} - -@Article{ El72, - author = {Peter Elias}, - title = {The Efficient Construction of an Unbiased Random - Sequence}, - journal = {Ann. Math. Statist.}, - volume = {43}, - number = {3}, - pages = {865--870}, - doi = {10.1214/aoms/1177692552}, - publisher = {Institute of Mathematical Statistics}, - year = {1972} -} - -@InProceedings{ EvPo16, - author = {Dmitry Evtyushkin and Dmitry V. Ponomarev}, - title = {Covert Channels through Random Number Generator: - Mechanisms, Capacity Estimation and Mitigations}, - booktitle = {Proceedings of the 2016 {ACM} {SIGSAC} Conference on - Computer and Communications Security, Vienna, Austria, - October 24-28, 2016}, - pages = {843--857}, - crossref = {_WeKaKr:16}, - doi = {10.1145/2976749.2978374}, - year = {2016} -} - -@InProceedings{ Gr96, - author = {Lov K. 
Grover}, - title = {A Fast Quantum Mechanical Algorithm for Database Search}, - booktitle = {Proceedings of the Twenty-eighth Annual ACM Symposium on - Theory of Computing}, - series = {STOC '96}, - pages = {212--219}, - url = {http://arxiv.org/pdf/quant-ph/9605043}, - doi = {10.1145/237814.237866}, - publisher = {{ACM}}, - year = 1996 -} - -@InProceedings{ GrLaRo:16, - author = {Markus Grassl and Brandon Langenberg and Martin Roetteler - and Rainer Steinwandt}, - title = {Applying Grover's Algorithm to {AES:} Quantum Resource - Estimates}, - booktitle = {Post-Quantum Cryptography - 7th International Workshop, - PQCrypto 2016, Fukuoka, Japan, February 24-26, 2016, - Proceedings}, - pages = {29--43}, - crossref = {_Ta16}, - url = {https://arxiv.org/pdf/1512.04965.pdf}, - doi = {10.1007/978-3-319-29360-8\_3}, - year = {2016} -} - -@Misc{ HaKoMa12, - author = {Mike Hamburg and Paul Kocher and Mark E. Marson}, - title = {Analysis of Intel's Ivy Bridge Digital Random Number - Generator}, - howpublished = {Technical Report, Cryptography Research (Prepared for - Intel)}, - month = {March}, - year = {2012} -} - -@Article{ HaLe98, - author = {Ali Hajimiri and Thomas H. Lee}, - title = {A general theory of phase noise in electrical - oscillators}, - journal = {IEEE Journal of Solid-State Circuits}, - volume = {33}, - number = {2}, - pages = {179--194}, - publisher = {{IEEE}}, - doi = {10.1109/4.658619}, - year = {1998} -} - -@Article{ HaLiLe99, - author = {Ali Hajimiri and Sotirios Limotyrakis and Thomas H. 
Lee}, - title = {Jitter and phase noise in ring oscillators}, - journal = { {IEEE} Journal of Solid-State Circuits}, - volume = {34}, - number = {6}, - doi = {10.1109/4.766813}, - url = {https://authors.library.caltech.edu/4916/1/HAJieeejssc99a.pdf}, - pages = {790--804}, - month = {June}, - year = {1999} -} - -@Article{ HuHe20, - author = {Darren Hurley-Smith and Julio C\'esar Hern\'andez-Castro}, - title = {Quantum Leap and Crash: Searching and Finding Bias in - Quantum Random Number Generators}, - journal = {ACM Transactions on Privacy and Security}, - volume = {23}, - number = {3}, - pages = {1--25}, - doi = {10.1145/3403643}, - publisher = {{ACM}}, - month = {June}, - year = {2020} -} - -@TechReport{ IS16, - author = {{ISO}}, - type = {Standard}, - title = {Information technology -- Security techniques -- Testing - methods for the mitigation of non-invasive attack classes - against cryptographic modules}, - shorttitle = {{ISO}/{IEC} 17825:2016}, - language = {en}, - number = {ISO/IEC 17825:2016}, - institution = {International Organization for Standardization}, - year = {2016} -} - -@Misc{ IT19, - author = {ITU}, - title = {Quantum noise random number generator architecture}, - howpublished = {Recommendation ITU-T X.1702}, - url = {https://www.itu.int/rec/T-REC-X.1702-201911-I/en}, - publisher = {International Telecommunications Union}, - month = {November}, - year = {2019} -} - -@Misc{ In20, - author = {Intel}, - title = {Deep Dive: Special Register Buffer Data Sampling}, - url = {https://software.intel.com/security-software-guidance/insights/deep-dive-special-register-buffer-data-sampling}, - howpublished = {Intel Developer Zone}, - publisher = {Intel}, - month = {June}, - year = {2020} -} - -@Misc{ In20A, - author = {Intel}, - title = {{SRBDS} Mitigation Impact on Intel Secure Key}, - url = {https://software.intel.com/security-software-guidance/insights/srbds-mitigation-impact-intel-secure-key}, - howpublished = {Intel Developer Zone}, - publisher = 
{Intel}, - month = {June}, - year = {2020} -} - -@InProceedings{ JaNaRo:20, - author = {Samuel Jaques and Michael Naehrig and Martin Roetteler and - Fernando Virdia}, - title = {Implementing Grover Oracles for Quantum Key Search on - {AES} and LowMC}, - booktitle = {Advances in Cryptology - {EUROCRYPT} 2020 - 39th Annual - International Conference on the Theory and Applications of - Cryptographic Techniques, Zagreb, Croatia, May 10-14, 2020, - Proceedings, Part {II}}, - pages = {280--310}, - crossref = {_CaIs20}, - url = {https://arxiv.org/pdf/1910.01700.pdf}, - doi = {10.1007/978-3-030-45724-2\_10}, - year = {2020} -} - -@Article{ KaScVe13, - author = {Dusko Karaklajic and J{\"{o}}rn{-}Marc Schmidt and Ingrid - Verbauwhede}, - title = {Hardware Designer's Guide to Fault Attacks}, - journal = {{IEEE} Trans. Very Large Scale Integr. Syst.}, - volume = {21}, - number = {12}, - pages = {2295--2306}, - doi = {10.1109/TVLSI.2012.2231707}, - publisher = {IEEE}, - year = {2013} -} - -@Misc{ KiSc01, - author = {Wolfgang Killmann and Werner Schindler}, - title = {A Proposal for: Functionality classes and evaluation - methodology for true (physical) random number generators}, - howpublished = {AIS 31, Version 3.1, English Translation, BSI}, - url = {https://www.bsi.bund.de/SharedDocs/Downloads/DE/BSI/Zertifizierung/Interpretationen/AIS_31_Functionality_classes_evaluation_methodology_for_true_RNG_e.html}, - publisher = {BSI}, - month = {September}, - year = {2001} -} - -@Misc{ KiSc11, - author = {Wolfgang Killmann and Werner Schindler}, - title = {A Proposal for: Functionality classes for random number - generators}, - howpublished = {AIS 20 / AIS 31, Version 2.0, English Translation, BSI}, - url = {https://www.bsi.bund.de/SharedDocs/Downloads/DE/BSI/Zertifizierung/Interpretationen/AIS_31_Functionality_classes_for_random_number_generators_e.html}, - publisher = {BSI}, - month = {September}, - year = {2011} -} - -@Misc{ KoXiHu:21, - author = {Nick Kossifidis and Joe Xie and 
Bill Huffman and Allen - Baum and Greg Favor and Tariq Kurd and Fumio Arakawa}, - title = {{PMP} Enhancements for memory access and execution - prevention on Machine mode}, - howpublished = {Version 0.9.1 -- {RISC}-{V} {TEE} Task Group}, - month = {May}, - year = {2021} +@Misc{ AM17, + author = {{AMD}}, + title = {{AMD} Random Number Generator}, + howpublished = {AMD TechDocs}, + publisher = {Advanced Micro Devices}, + url = {https://www.amd.com/system/files/TechDocs/amd-random-number-generator.pdf}, + month = {June}, + year = {2017} +} + +@Misc{ AR17, + author = {{ARM}}, + title = {ARM TrustZone True Random Number Generator: Technical Reference Manual}, + howpublished = {ARM 100976\_0000\_00\_en (rev. r0p0)}, + publisher = {{ARM}}, + url = {http://infocenter.arm.com/help/index.jsp?topic=/com.arm.doc.100976_0000_00_en}, + month = {May}, + year = {2017} +} + +@Misc{ AR20, + author = {{ARM}}, + title = {Arm Architecture Registers: Armv8, for Armv8-A + architecture profile}, + howpublished = {ARM DDI 0595 (ID033020)}, + publisher = {{ARM}}, + url = {https://developer.arm.com/docs/ddi0595/g}, + month = {April}, + year = {2020} +} + +@Book{ An20, + author = {Ross J. 
Anderson}, + title = {Security engineering - a guide to building dependable + distributed systems {(3.} ed.)}, + publisher = {Wiley}, + isbn = {978-1-119-64278-7}, + url = {https://www.cl.cam.ac.uk/~rja14/book.html}, + month = {December}, + year = {2020} +} + +@Misc{ BS13, + author = {{BSI}}, + title = {Evaluation of random number generators}, + howpublished = {Version 0.10, BSI}, + url = {https://www.bsi.bund.de/SharedDocs/Downloads/DE/BSI/Zertifizierung/Interpretationen/AIS_20_AIS_31_Evaluation_of_random_number_generators_e.html}, + publisher = {BSI}, + month = {March}, + year = {2013} +} + +@Misc{ Ba20, + author = {Elaine Barker}, + title = {Recommendation for Key Management: Part 1 -- General}, + howpublished = {NIST Special Publication SP 800-57 Part 1, Revision 5}, + doi = {10.6028/NIST.SP.800-57pt1r5}, + publisher = {{NIST}}, + month = {May}, + year = {2020} +} + +@Article{ Ba86, + author = {Per Bak}, + title = {The Devil's Staircase}, + journal = {Phys. Today}, + volume = {39}, + number = {12}, + pages = {38--45}, + doi = {10.1063/1.881047}, + publisher = {{AIP}}, + month = {December}, + year = {1986} +} + +@Misc{ BaBa19, + author = {Elaine Barker and William Barker}, + title = {Recommendation for Key Management: Part 2 -- Best + Practices for Key Management Organizations}, + howpublished = {NIST Special Publication SP 800-57 Part 2, Revision 1}, + doi = {10.6028/NIST.SP.800-57pt2r1}, + publisher = {{NIST}}, + month = {May}, + year = {2019} +} + +@Misc{ BaDa15, + author = {Elaine Barker and Quynh Dang}, + title = {Recommendation for Key Management, Part 3: + Application-Specific Key Management Guidance}, + howpublished = {NIST Special Publication SP 800-57 Part 3, Revision 1}, + doi = {10.6028/NIST.SP.800-57pt3r1}, + publisher = {{NIST}}, + month = {January}, + year = {2015} +} + +@InProceedings{ BaFoKa:12, + author = {Romain Bardou and Riccardo Focardi and Yusuke Kawamoto and + Lorenzo Simionato and Graham Steel and Joe{-}Kai Tsay}, + title = {Efficient Padding
Oracle Attacks on Cryptographic + Hardware}, + booktitle = {Advances in Cryptology - {CRYPTO} 2012 - 32nd Annual + Cryptology Conference, Santa Barbara, CA, USA, August + 19-23, 2012. Proceedings}, + pages = {608--625}, + crossref = {_SaCa12}, + doi = {10.1007/978-3-642-32009-5\_36}, + year = {2012} +} + +@Misc{ BaKe15, + author = {Elaine Barker and John Kelsey}, + title = {Recommendation for Random Number Generation Using + Deterministic Random Bit Generators}, + howpublished = {NIST Special Publication SP 800-90A Revision 1}, + doi = {10.6028/NIST.SP.800-90Ar1}, + month = {June}, + year = {2015} +} + +@Misc{ BaKeRo:21, + author = {Elaine Barker and John Kelsey and Allen Roginsky and + Meltem Sönmez Turan and Darryl Buller and Aaron Kaufer}, + title = {Recommendation for Random Bit Generator ({RBG}) + Constructions}, + howpublished = {Draft NIST Special Publication SP 800-90C}, + month = {March}, + year = {2021} +} + +@Article{ BaLuMi:11, + author = {Mathieu Baudet and David Lubicz and Julien Micolod and + Andr{\'{e}} Tassiaux}, + title = {On the Security of Oscillator-Based Random Number + Generators}, + journal = {J. Cryptology}, + volume = {24}, + number = {2}, + pages = {398--425}, + doi = {10.1007/s00145-010-9089-3}, + year = {2011} +} + +@Article{ BeRePa:14, + author = {Georg T. Becker and Francesco Regazzoni and Christof Paar + and Wayne P. Burleson}, + title = {Stealthy dopant-level hardware Trojans: extended version}, + journal = {J. 
Cryptographic Engineering}, + volume = {4}, + number = {1}, + pages = {19--31}, + publisher = {Springer}, + doi = {10.1007/s13389-013-0068-0}, + year = {2014} +} + +@Article{ Bl86, + author = {Manuel Blum}, + title = {Independent unbiased coin flips from a correlated biased + source -- A finite state Markov chain}, + journal = {Combinatorica}, + volume = {6}, + number = {2}, + pages = {97--108}, + doi = {10.1007/BF02579167}, + year = {1986} +} + +@Article{ BlBlSh86, + author = {Lenore Blum and Manuel Blum and Mike Shub}, + title = {A Simple Unpredictable Pseudo-Random Number Generator}, + journal = {{SIAM} J. Comput.}, + volume = {15}, + number = {2}, + pages = {364--383}, + doi = {10.1137/0215025}, + publisher = {{SIAM}}, + year = {1986} +} + +@InProceedings{ ChMaGa:16, + author = {Stephen Checkoway and Jacob Maskiewicz and Christina + Garman and Joshua Fried and Shaanan Cohney and Matthew + Green and Nadia Heninger and Ralf{-}Philipp Weinmann and + Eric Rescorla and Hovav Shacham}, + title = {A Systematic Analysis of the Juniper Dual {EC} Incident}, + booktitle = {Proceedings of the 2016 {ACM} {SIGSAC} Conference on + Computer and Communications Security, Vienna, Austria, + October 24-28, 2016}, + pages = {468--479}, + crossref = {_WeKaKr:16}, + doi = {10.1145/2976749.2978395}, + year = {2016} +} + +@Article{ ChMaGa:18, + author = {Stephen Checkoway and Jacob Maskiewicz and Christina + Garman and Joshua Fried and Shaanan Cohney and Matthew + Green and Nadia Heninger and Ralf{-}Philipp Weinmann and + Eric Rescorla and Hovav Shacham}, + title = {Where did {I} leave my keys?: lessons from the Juniper + Dual {EC} incident}, + journal = {Commun. 
{ACM}}, + volume = {61}, + number = {11}, + pages = {148--155}, + publisher = {{ACM}}, + doi = {10.1145/3266291}, + year = {2018} +} + +@Misc{ Cr17, + author = {Common Criteria}, + title = {Common Methodology for Information Technology Security + Evaluation: Evaluation methodology}, + howpublished = {Specification: Version 3.1 Revision 5}, + url = {https://commoncriteriaportal.org/cc/}, + month = {April}, + year = {2017} +} + +@Misc{ Da02, + author = {Robert B. Davies}, + title = {Exclusive OR (XOR) and hardware random number generators}, + howpublished = {Author-hosted manuscript}, + url = {http://www.robertnz.net/pdf/xor2.pdf}, + month = {February}, + year = {2002} +} + +@Book{ DaRo58, + author = {Wilbur B. Davenport Jr. and William L. Root}, + title = {An Introduction to the Theory of Random Signals and + Noise}, + url = {https://ieeexplore.ieee.org/servlet/opac?bknumber=5265617}, + pages = {401}, + publisher = {McGraw-Hill}, + year = {1958} +} + +@Article{ El72, + author = {Peter Elias}, + title = {The Efficient Construction of an Unbiased Random + Sequence}, + journal = {Ann. Math. Statist.}, + volume = {43}, + number = {3}, + pages = {865--870}, + doi = {10.1214/aoms/1177692552}, + publisher = {Institute of Mathematical Statistics}, + year = {1972} +} + +@InProceedings{ EvPo16, + author = {Dmitry Evtyushkin and Dmitry V. Ponomarev}, + title = {Covert Channels through Random Number Generator: + Mechanisms, Capacity Estimation and Mitigations}, + booktitle = {Proceedings of the 2016 {ACM} {SIGSAC} Conference on + Computer and Communications Security, Vienna, Austria, + October 24-28, 2016}, + pages = {843--857}, + crossref = {_WeKaKr:16}, + doi = {10.1145/2976749.2978374}, + year = {2016} +} + +@InProceedings{ Gr96, + author = {Lov K. 
Grover}, + title = {A Fast Quantum Mechanical Algorithm for Database Search}, + booktitle = {Proceedings of the Twenty-eighth Annual ACM Symposium on + Theory of Computing}, + series = {STOC '96}, + pages = {212--219}, + url = {http://arxiv.org/pdf/quant-ph/9605043}, + doi = {10.1145/237814.237866}, + publisher = {{ACM}}, + year = 1996 +} + +@InProceedings{ GrLaRo:16, + author = {Markus Grassl and Brandon Langenberg and Martin Roetteler + and Rainer Steinwandt}, + title = {Applying Grover's Algorithm to {AES:} Quantum Resource + Estimates}, + booktitle = {Post-Quantum Cryptography - 7th International Workshop, + PQCrypto 2016, Fukuoka, Japan, February 24-26, 2016, + Proceedings}, + pages = {29--43}, + crossref = {_Ta16}, + url = {https://arxiv.org/pdf/1512.04965.pdf}, + doi = {10.1007/978-3-319-29360-8\_3}, + year = {2016} +} + +@Misc{ HaKoMa12, + author = {Mike Hamburg and Paul Kocher and Mark E. Marson}, + title = {Analysis of Intel's Ivy Bridge Digital Random Number + Generator}, + howpublished = {Technical Report, Cryptography Research (Prepared for + Intel)}, + month = {March}, + year = {2012} +} + +@Article{ HaLe98, + author = {Ali Hajimiri and Thomas H. Lee}, + title = {A general theory of phase noise in electrical + oscillators}, + journal = {IEEE Journal of Solid-State Circuits}, + volume = {33}, + number = {2}, + pages = {179--194}, + publisher = {{IEEE}}, + doi = {10.1109/4.658619}, + year = {1998} +} + +@Article{ HaLiLe99, + author = {Ali Hajimiri and Sotirios Limotyrakis and Thomas H. 
Lee}, + title = {Jitter and phase noise in ring oscillators}, + journal = { {IEEE} Journal of Solid-State Circuits}, + volume = {34}, + number = {6}, + doi = {10.1109/4.766813}, + url = {https://authors.library.caltech.edu/4916/1/HAJieeejssc99a.pdf}, + pages = {790--804}, + month = {June}, + year = {1999} +} + +@Article{ HuHe20, + author = {Darren Hurley-Smith and Julio C\'esar Hern\'andez-Castro}, + title = {Quantum Leap and Crash: Searching and Finding Bias in + Quantum Random Number Generators}, + journal = {ACM Transactions on Privacy and Security}, + volume = {23}, + number = {3}, + pages = {1--25}, + doi = {10.1145/3403643}, + publisher = {{ACM}}, + month = {June}, + year = {2020} +} + +@TechReport{ IS16, + author = {{ISO}}, + type = {Standard}, + title = {Information technology -- Security techniques -- Testing + methods for the mitigation of non-invasive attack classes + against cryptographic modules}, + shorttitle = {{ISO}/{IEC} 17825:2016}, + language = {en}, + number = {ISO/IEC 17825:2016}, + institution = {International Organization for Standardization}, + year = {2016} +} + +@Misc{ IT19, + author = {ITU}, + title = {Quantum noise random number generator architecture}, + howpublished = {Recommendation ITU-T X.1702}, + url = {https://www.itu.int/rec/T-REC-X.1702-201911-I/en}, + publisher = {International Telecommunications Union}, + month = {November}, + year = {2019} +} + +@Misc{ In20, + author = {Intel}, + title = {Deep Dive: Special Register Buffer Data Sampling}, + url = {https://software.intel.com/security-software-guidance/insights/deep-dive-special-register-buffer-data-sampling}, + howpublished = {Intel Developer Zone}, + publisher = {Intel}, + month = {June}, + year = {2020} +} + +@Misc{ In20A, + author = {Intel}, + title = {{SRBDS} Mitigation Impact on Intel Secure Key}, + url = {https://software.intel.com/security-software-guidance/insights/srbds-mitigation-impact-intel-secure-key}, + howpublished = {Intel Developer Zone}, + publisher = 
{Intel}, + month = {June}, + year = {2020} +} + +@InProceedings{ JaNaRo:20, + author = {Samuel Jaques and Michael Naehrig and Martin Roetteler and + Fernando Virdia}, + title = {Implementing Grover Oracles for Quantum Key Search on + {AES} and LowMC}, + booktitle = {Advances in Cryptology - {EUROCRYPT} 2020 - 39th Annual + International Conference on the Theory and Applications of + Cryptographic Techniques, Zagreb, Croatia, May 10-14, 2020, + Proceedings, Part {II}}, + pages = {280--310}, + crossref = {_CaIs20}, + url = {https://arxiv.org/pdf/1910.01700.pdf}, + doi = {10.1007/978-3-030-45724-2\_10}, + year = {2020} +} + +@Article{ KaScVe13, + author = {Dusko Karaklajic and J{\"{o}}rn{-}Marc Schmidt and Ingrid + Verbauwhede}, + title = {Hardware Designer's Guide to Fault Attacks}, + journal = {{IEEE} Trans. Very Large Scale Integr. Syst.}, + volume = {21}, + number = {12}, + pages = {2295--2306}, + doi = {10.1109/TVLSI.2012.2231707}, + publisher = {IEEE}, + year = {2013} +} + +@Misc{ KiSc01, + author = {Wolfgang Killmann and Werner Schindler}, + title = {A Proposal for: Functionality classes and evaluation + methodology for true (physical) random number generators}, + howpublished = {AIS 31, Version 3.1, English Translation, BSI}, + url = {https://www.bsi.bund.de/SharedDocs/Downloads/DE/BSI/Zertifizierung/Interpretationen/AIS_31_Functionality_classes_evaluation_methodology_for_true_RNG_e.html}, + publisher = {BSI}, + month = {September}, + year = {2001} +} + +@Misc{ KiSc11, + author = {Wolfgang Killmann and Werner Schindler}, + title = {A Proposal for: Functionality classes for random number + generators}, + howpublished = {AIS 20 / AIS 31, Version 2.0, English Translation, BSI}, + url = {https://www.bsi.bund.de/SharedDocs/Downloads/DE/BSI/Zertifizierung/Interpretationen/AIS_31_Functionality_classes_for_random_number_generators_e.html}, + publisher = {BSI}, + month = {September}, + year = {2011} +} + +@Misc{ KoXiHu:21, + author = {Nick Kossifidis and Joe Xie and 
Bill Huffman and Allen + Baum and Greg Favor and Tariq Kurd and Fumio Arakawa}, + title = {{PMP} Enhancements for memory access and execution + prevention on Machine mode}, + howpublished = {Version 0.9.1 -- {RISC}-{V} {TEE} Task Group}, + month = {May}, + year = {2021} } 10.1007/978-3-540-71039-4\_21 -@InProceedings{ La08, - author = {Patrick Lacharme}, - title = {Post-Processing Functions for a Biased Physical Random - Number Generator}, - booktitle = {Fast Software Encryption, 15th International Workshop, - {FSE} 2008, Lausanne, Switzerland, February 10-13, 2008, - Revised Selected Papers}, - pages = {334--342}, - crossref = {_Ny08}, - doi = {10.1007/978-3-540-71039-4\_21}, - year = {2008} -} - -@Article{ LiBaBo:13, - author = {John S. Liberty and Adrian Barrera and David W. Boerstler - and Thomas B. Chadwick and Scott R. Cottier and H. Peter - Hofstee and Julie A. Rosser and Marty L. Tsai}, - title = {True hardware random number generation implemented in the - 32-nm {SOI} {POWER7+} processor}, - journal = {{IBM} J. Res. Dev.}, - volume = {57}, - number = {6}, - doi = {10.1147/JRD.2013.2279599}, - year = {2013} -} - -@InProceedings{ MaMo09, - author = {A. Theodore Markettos and Simon W. Moore}, - title = {The Frequency Injection Attack on Ring-Oscillator-Based - True Random Number Generators}, - booktitle = {Cryptographic Hardware and Embedded Systems - {CHES} 2009, - 11th International Workshop, Lausanne, Switzerland, - September 6-9, 2009, Proceedings}, - pages = {317--331}, - crossref = {_ClGa09}, - doi = {10.1007/978-3-642-04138-9\_23}, - year = {2009} -} - -@Misc{ Me18, - author = {John P. 
Mechalas}, - title = {Intel Digital Random Number Generator (DRNG) Software - Implementation Guide}, - howpublished = {Intel Technical Report, Version 2.1}, - url = {https://software.intel.com/content/www/us/en/develop/articles/intel-digital-random-number-generator-drng-software-implementation-guide.html}, - month = {October}, - year = {2018} -} - -@InProceedings{ MoSuEi:20, - author = {Daniel Moghimi and Berk Sunar and Thomas Eisenbarth and - Nadia Heninger}, - title = {{TPM}-{FAIL}: {TPM} meets Timing and Lattice Attacks}, - booktitle = {29th {USENIX} Security Symposium ({USENIX} Security 20)}, - url = {https://www.usenix.org/conference/usenixsecurity20/presentation/moghimi-tpm}, - pages = {To appear}, - publisher = {{USENIX} Association}, - month = {August}, - year = {2020} -} - -@Misc{ Mu20, - author = {Stephan M\"uller}, - title = {Documentation and Analysis of the Linux Random Number - Generator, Version 3.6}, - howpublished = {Prepared for BSI by atsec information security GmbH}, - url = {https://www.bsi.bund.de/SharedDocs/Downloads/EN/BSI/Publications/Studies/LinuxRNG/LinuxRNG_EN.pdf}, - month = {April}, - year = {2020} -} - -@Misc{ NC20, - author = {NCSC}, - title = {Quantum security technologies}, - howpublished = {White paper, Version 1.0. 
National Cyber Security Centre - (UK).}, - url = {https://www.ncsc.gov.uk/whitepaper/quantum-security-technologies}, - month = {March}, - year = {2020} -} - -@Misc{ NI16, - author = {{NIST}}, - title = {Submission Requirements and Evaluation Criteria for the - Post-Quantum Cryptography Standardization Process}, - howpublished = {Official Call for Proposals, National Institute for - Standards and Technology}, - url = {http://csrc.nist.gov/groups/ST/post-quantum-crypto/documents/call-for-proposals-final-dec-2016.pdf}, - month = {December}, - year = 2016 -} - -@Misc{ NI19, - author = {{NIST}}, - title = {Security Requirements for Cryptographic Modules}, - howpublished = {Federal Information Processing Standards Publication FIPS - 140-3}, +@InProceedings{ La08, + author = {Patrick Lacharme}, + title = {Post-Processing Functions for a Biased Physical Random + Number Generator}, + booktitle = {Fast Software Encryption, 15th International Workshop, + {FSE} 2008, Lausanne, Switzerland, February 10-13, 2008, + Revised Selected Papers}, + pages = {334--342}, + crossref = {_Ny08}, + doi = {10.1007/978-3-540-71039-4\_21}, + year = {2008} +} + +@Article{ LiBaBo:13, + author = {John S. Liberty and Adrian Barrera and David W. Boerstler + and Thomas B. Chadwick and Scott R. Cottier and H. Peter + Hofstee and Julie A. Rosser and Marty L. Tsai}, + title = {True hardware random number generation implemented in the + 32-nm {SOI} {POWER7+} processor}, + journal = {{IBM} J. Res. Dev.}, + volume = {57}, + number = {6}, + doi = {10.1147/JRD.2013.2279599}, + year = {2013} +} + +@InProceedings{ MaMo09, + author = {A. Theodore Markettos and Simon W. 
Moore}, + title = {The Frequency Injection Attack on Ring-Oscillator-Based + True Random Number Generators}, + booktitle = {Cryptographic Hardware and Embedded Systems - {CHES} 2009, + 11th International Workshop, Lausanne, Switzerland, + September 6-9, 2009, Proceedings}, + pages = {317--331}, + crossref = {_ClGa09}, + doi = {10.1007/978-3-642-04138-9\_23}, + year = {2009} +} + +@Misc{ Me18, + author = {John P. Mechalas}, + title = {Intel Digital Random Number Generator (DRNG) Software + Implementation Guide}, + howpublished = {Intel Technical Report, Version 2.1}, + url = {https://software.intel.com/content/www/us/en/develop/articles/intel-digital-random-number-generator-drng-software-implementation-guide.html}, + month = {October}, + year = {2018} +} + +@InProceedings{ MoSuEi:20, + author = {Daniel Moghimi and Berk Sunar and Thomas Eisenbarth and + Nadia Heninger}, + title = {{TPM}-{FAIL}: {TPM} meets Timing and Lattice Attacks}, + booktitle = {29th {USENIX} Security Symposium ({USENIX} Security 20)}, + url = {https://www.usenix.org/conference/usenixsecurity20/presentation/moghimi-tpm}, + pages = {To appear}, + publisher = {{USENIX} Association}, + month = {August}, + year = {2020} +} + +@Misc{ Mu20, + author = {Stephan M\"uller}, + title = {Documentation and Analysis of the Linux Random Number + Generator, Version 3.6}, + howpublished = {Prepared for BSI by atsec information security GmbH}, + url = {https://www.bsi.bund.de/SharedDocs/Downloads/EN/BSI/Publications/Studies/LinuxRNG/LinuxRNG_EN.pdf}, + month = {April}, + year = {2020} +} + +@Misc{ NC20, + author = {NCSC}, + title = {Quantum security technologies}, + howpublished = {White paper, Version 1.0. 
National Cyber Security Centre + (UK).}, + url = {https://www.ncsc.gov.uk/whitepaper/quantum-security-technologies}, + month = {March}, + year = {2020} +} + +@Misc{ NI16, + author = {{NIST}}, + title = {Submission Requirements and Evaluation Criteria for the + Post-Quantum Cryptography Standardization Process}, + howpublished = {Official Call for Proposals, National Institute for + Standards and Technology}, + url = {http://csrc.nist.gov/groups/ST/post-quantum-crypto/documents/call-for-proposals-final-dec-2016.pdf}, + month = {December}, + year = 2016 +} + +@Misc{ NI19, + author = {{NIST}}, + title = {Security Requirements for Cryptographic Modules}, + howpublished = {Federal Information Processing Standards Publication FIPS + 140-3}, url = {https://doi.org/10.6028/NIST.FIPS.140-3}, - month = {March}, - year = {2019} -} - -@Misc{ NICC21, - author = {{NIST} and {CCCS}}, - title = {Implementation Guidance for {FIPS} 140-3 and the - Cryptographic Module Validation Program}, - howpublished = {CMVP}, - url = {https://csrc.nist.gov/CSRC/media/Projects/cryptographic-module-validation-program/documents/fips%20140-3/FIPS%20140-3%20IG.pdf}, - month = {May}, - year = {2021} -} - -@Misc{ NS15, - author = {{NSA}/{CSS}}, - title = {Commercial National Security Algorithm Suite}, - url = {https://apps.nsa.gov/iaarchive/programs/iad-initiatives/cnsa-suite.cfm}, - month = {August}, - year = 2015 -} - -@InCollection{ Ne51, - title = {Various Techniques Used in Connection with Random Digits}, - author = {von Neumann, John}, - booktitle = {Monte Carlo Method}, - editor = {Householder, A.~S. and Forsythe, G.~E. 
and Germond, - H.~H.}, - series = {National Bureau of Standards Applied Mathematics Series}, - volume = {12}, - chapter = {13}, - pages = {36--38}, - publisher = {US Government Printing Office}, - address = {Washington, DC}, - url = {https://mcnp.lanl.gov/pdf_files/nbs_vonneumann.pdf}, - year = {1951} -} - -@Misc{ Ra20, - author = {Rambus}, - title = {TRNG-IP-76 / EIP-76 Family of FIPS Approved True Random - Generators}, - howpublished = {Commercial Crypto IP. Formerly (2017) available from - Inside Secure.}, - url = {https://www.rambus.com/security/crypto-accelerator-hardware-cores/basic-crypto-blocks/trng-ip-76/}, - year = {2020} -} - -@InProceedings{ RaMiRa:21, - author = {Hany Ragab and Alyssa Milburn and Kaveh Razavi and Herbert - Bos and Cristiano Giuffrida}, - title = {CrossTalk : Speculative Data Leaks Across Cores Are Real}, - booktitle = {IEEE Symposium on Security \& Privacy 2021}, - url = {https://download.vusec.net/papers/crosstalk_sp21.pdf}, - pages = {To appear}, - publisher = {IEEE}, - month = {May}, - year = {2021} -} - -@Article{ Ri44, - author = {Stephen O. Rice}, - title = {Mathematical analysis of random noise (Parts I-II)}, - journal = {The Bell System Technical Journal}, - volume = {23}, - number = {3}, - pages = {282--332}, - doi = {10.1002/j.1538-7305.1944.tb00874.x}, - month = {July}, - year = {1944} -} - -@Article{ Ri45, - author = {Stephen O. 
Rice}, - title = {Mathematical analysis of random noise (Parts III-IV))}, - journal = {The Bell System Technical Journal}, - volume = {24}, - number = {1}, - pages = {46--156}, - doi = {10.1002/j.1538-7305.1945.tb00453.x}, - month = {January}, - year = {1945} -} - -@Misc{ RuSoNe:10, - author = {Andrew Rukhin and Juan Soto and James Nechvatal and Miles - Smid and Elaine Barker and Stefan Leigh and Mark Levenson - and Mark Vangel and David Banks and Alan Heckert and - JamesDray and San Vo}, - title = {A Statistical Test Suite for Random and Pseudorandom - Number Generators for Cryptographic Applications}, - doi = {10.6028/NIST.SP.800-22r1a}, - month = {April}, - year = {2010} -} - -@Misc{ Sa19, - author = {Jim Salter}, - title = {How a months-old {AMD} microcode bug destroyed my - weekend}, - howpublished = {Ars Technica}, - url = {https://arstechnica.com/gadgets/2019/10/how-a-months-old-amd-microcode-bug-destroyed-my-weekend/}, - month = {October}, - year = {2019} -} - -@InProceedings{ Sa20, - author = {Markku-Juhani O. Saarinen}, - title = {A Lightweight ISA Extension for {AES} and {SM4}}, - booktitle = {First International Workshop on Secure RISC-V Architecture - Design Exploration (SECRISC-V'20)}, - url = {https://arxiv.org/abs/2002.07041}, - publisher = {IEEE}, - month = {August}, - year = {2020} -} - - -@InProceedings{ SaNeMa20, - author = {Markku-Juhani O. Saarinen and G. 
Richard Newell and Ben - Marshall}, - title = {Building a Modern {TRNG}: An Entropy Source Interface for - {RISC}-{V}}, - booktitle = {4th Workshop on Attacks and Solutions in Hardware Security - (ASHES’20), November 13, 2020, Virtual Event, USA.}, - doi = {10.1145/3411504.3421212}, - publisher = {ACM}, - pages = {93--102}, - month = {November}, - year = 2020 + month = {March}, + year = {2019} +} + +@Misc{ NICC21, + author = {{NIST} and {CCCS}}, + title = {Implementation Guidance for {FIPS} 140-3 and the + Cryptographic Module Validation Program}, + howpublished = {CMVP}, + url = {https://csrc.nist.gov/CSRC/media/Projects/cryptographic-module-validation-program/documents/fips%20140-3/FIPS%20140-3%20IG.pdf}, + month = {May}, + year = {2021} +} + +@Misc{ NS15, + author = {{NSA}/{CSS}}, + title = {Commercial National Security Algorithm Suite}, + url = {https://apps.nsa.gov/iaarchive/programs/iad-initiatives/cnsa-suite.cfm}, + month = {August}, + year = 2015 +} + +@InCollection{ Ne51, + title = {Various Techniques Used in Connection with Random Digits}, + author = {von Neumann, John}, + booktitle = {Monte Carlo Method}, + editor = {Householder, A.~S. and Forsythe, G.~E. and Germond, + H.~H.}, + series = {National Bureau of Standards Applied Mathematics Series}, + volume = {12}, + chapter = {13}, + pages = {36--38}, + publisher = {US Government Printing Office}, + address = {Washington, DC}, + url = {https://mcnp.lanl.gov/pdf_files/nbs_vonneumann.pdf}, + year = {1951} +} + +@Misc{ Ra20, + author = {Rambus}, + title = {TRNG-IP-76 / EIP-76 Family of FIPS Approved True Random + Generators}, + howpublished = {Commercial Crypto IP. 
Formerly (2017) available from + Inside Secure.}, + url = {https://www.rambus.com/security/crypto-accelerator-hardware-cores/basic-crypto-blocks/trng-ip-76/}, + year = {2020} +} + +@InProceedings{ RaMiRa:21, + author = {Hany Ragab and Alyssa Milburn and Kaveh Razavi and Herbert + Bos and Cristiano Giuffrida}, + title = {CrossTalk : Speculative Data Leaks Across Cores Are Real}, + booktitle = {IEEE Symposium on Security \& Privacy 2021}, + url = {https://download.vusec.net/papers/crosstalk_sp21.pdf}, + pages = {To appear}, + publisher = {IEEE}, + month = {May}, + year = {2021} +} + +@Article{ Ri44, + author = {Stephen O. Rice}, + title = {Mathematical analysis of random noise (Parts I-II)}, + journal = {The Bell System Technical Journal}, + volume = {23}, + number = {3}, + pages = {282--332}, + doi = {10.1002/j.1538-7305.1944.tb00874.x}, + month = {July}, + year = {1944} +} + +@Article{ Ri45, + author = {Stephen O. Rice}, + title = {Mathematical analysis of random noise (Parts III-IV))}, + journal = {The Bell System Technical Journal}, + volume = {24}, + number = {1}, + pages = {46--156}, + doi = {10.1002/j.1538-7305.1945.tb00453.x}, + month = {January}, + year = {1945} +} + +@Misc{ RuSoNe:10, + author = {Andrew Rukhin and Juan Soto and James Nechvatal and Miles + Smid and Elaine Barker and Stefan Leigh and Mark Levenson + and Mark Vangel and David Banks and Alan Heckert and + JamesDray and San Vo}, + title = {A Statistical Test Suite for Random and Pseudorandom + Number Generators for Cryptographic Applications}, + doi = {10.6028/NIST.SP.800-22r1a}, + month = {April}, + year = {2010} +} + +@Misc{ Sa19, + author = {Jim Salter}, + title = {How a months-old {AMD} microcode bug destroyed my + weekend}, + howpublished = {Ars Technica}, + url = {https://arstechnica.com/gadgets/2019/10/how-a-months-old-amd-microcode-bug-destroyed-my-weekend/}, + month = {October}, + year = {2019} +} + +@InProceedings{ Sa20, + author = {Markku-Juhani O. 
Saarinen}, + title = {A Lightweight ISA Extension for {AES} and {SM4}}, + booktitle = {First International Workshop on Secure RISC-V Architecture + Design Exploration (SECRISC-V'20)}, + url = {https://arxiv.org/abs/2002.07041}, + publisher = {IEEE}, + month = {August}, + year = {2020} +} + + +@InProceedings{ SaNeMa20, + author = {Markku-Juhani O. Saarinen and G. Richard Newell and Ben + Marshall}, + title = {Building a Modern {TRNG}: An Entropy Source Interface for + {RISC}-{V}}, + booktitle = {4th Workshop on Attacks and Solutions in Hardware Security + (ASHES’20), November 13, 2020, Virtual Event, USA.}, + doi = {10.1145/3411504.3421212}, + publisher = {ACM}, + pages = {93--102}, + month = {November}, + year = 2020 } @Misc{ Sa21, @@ -1572,212 +1572,212 @@ pages={109-136} year = 2021 } -@Misc{ SaNeMa21, - author = {Markku-Juhani O. Saarinen and G. Richard Newell and Ben - Marshall}, - title = {Development of The {RISC}-{V} Entropy Source Interface}, - howpublished = {{IACR} ePrint 2020/866}, - url = {https://eprint.iacr.org/2029/866}, - publisher = {Submitted For Publication}, - month = {June}, - year = 2021 -} - -@Misc{ Sc99, - author = {Werner Schindler}, - title = {Functionality classes and evaluation methodology for - deterministic random number generators}, - howpublished = {AIS 20, Version 2.0, English Translation, BSI}, - publisher = {BSI}, - url = {https://www.bsi.bund.de/SharedDocs/Downloads/DE/BSI/Zertifizierung/Interpretationen/AIS_20_Functionality_Classes_Evaluation_Methodology_DRNG_e.html}, - month = {December}, - year = {1999} -} - -@InProceedings{ Sh94, - author = {Peter W. 
Shor}, - title = {Algorithms for quantum computation: Discrete logarithms - and factoring}, - booktitle = {35th Annual Symposium on Foundations of Computer Science, - Santa Fe, New Mexico, USA, 20-22 November 1994}, - pages = {124--134}, - publisher = {IEEE}, - doi = {10.1109/SFCS.1994.365700}, - url = {https://arxiv.org/abs/quant-ph/9508027}, - year = 1994 -} - -@Misc{ TG20, - author = {{RISC-V} {Crypto} {TG}}, - title = {RISC-V Cryptography Extensions}, - url = {https://github.com/riscv/riscv-crypto}, - howpublished = {Editor's location -- to be integrated with main - specifications}, - year = {2020} +@Misc{ SaNeMa21, + author = {Markku-Juhani O. Saarinen and G. Richard Newell and Ben + Marshall}, + title = {Development of The {RISC}-{V} Entropy Source Interface}, + howpublished = {{IACR} ePrint 2020/866}, + url = {https://eprint.iacr.org/2029/866}, + publisher = {Submitted For Publication}, + month = {June}, + year = 2021 +} + +@Misc{ Sc99, + author = {Werner Schindler}, + title = {Functionality classes and evaluation methodology for + deterministic random number generators}, + howpublished = {AIS 20, Version 2.0, English Translation, BSI}, + publisher = {BSI}, + url = {https://www.bsi.bund.de/SharedDocs/Downloads/DE/BSI/Zertifizierung/Interpretationen/AIS_20_Functionality_Classes_Evaluation_Methodology_DRNG_e.html}, + month = {December}, + year = {1999} +} + +@InProceedings{ Sh94, + author = {Peter W. 
Shor}, + title = {Algorithms for quantum computation: Discrete logarithms + and factoring}, + booktitle = {35th Annual Symposium on Foundations of Computer Science, + Santa Fe, New Mexico, USA, 20-22 November 1994}, + pages = {124--134}, + publisher = {IEEE}, + doi = {10.1109/SFCS.1994.365700}, + url = {https://arxiv.org/abs/quant-ph/9508027}, + year = 1994 +} + +@Misc{ TG20, + author = {{RISC-V} {Crypto} {TG}}, + title = {RISC-V Cryptography Extensions}, + url = {https://github.com/riscv/riscv-crypto}, + howpublished = {Editor's location -- to be integrated with main + specifications}, + year = {2020} } @Misc{TuBaKe:18, - author = {Meltem S\"onmez Turan and Elaine Barker and John Kelsey and - Kerry A. McKay and Mary L. Baish and Mike Boyle}, - title = {Recommendation for the Entropy Sources Used for Random Bit - Generation}, - howpublished = {NIST Special Publication SP 800-90B}, - doi = {10.6028/NIST.SP.800-90B}, - month = {January}, - year = {2018} -} - -@InProceedings{ VaDr10, - author = {Michal Varchola and Milos Drutarovsk{\'{y}}}, - title = {New High Entropy Element for {FPGA} Based True Random - Number Generators}, - booktitle = {Cryptographic Hardware and Embedded Systems, {CHES} 2010, - 12th International Workshop, Santa Barbara, CA, USA, August - 17-20, 2010. 
Proceedings}, - pages = {351--365}, - crossref = {_MaSt10}, - doi = {10.1007/978-3-642-15031-9\_24}, - year = {2010} -} - -@InProceedings{ VaFiAu:10, - author = {Boyan Valtchanov and Viktor Fischer and Alain Aubert and - Florent Bernard}, - title = {Characterization of randomness sources in ring - oscillator-based true random number generators in FPGAs}, - booktitle = {13th {IEEE} International Symposium on Design and - Diagnostics of Electronic Circuits and Systems, {DDECS} - 2010, Vienna, Austria, April 14-16, 2010}, - pages = {48--53}, - crossref = {_GrKoSt:10}, - doi = {10.1109/DDECS.2010.5491819}, - year = {2010} -} - -@Proceedings{ _CaIs20, - editor = {Anne Canteaut and Yuval Ishai}, - title = {Advances in Cryptology - {EUROCRYPT} 2020 - 39th Annual - International Conference on the Theory and Applications of - Cryptographic Techniques, Zagreb, Croatia, May 10-14, 2020, - Proceedings, Part {II}}, - series = {Lecture Notes in Computer Science}, - volume = {12106}, - publisher = {Springer}, - doi = {10.1007/978-3-030-45724-2}, - isbn = {978-3-030-45723-5}, - year = {2020} -} - -@Proceedings{ _ClGa09, - editor = {Christophe Clavier and Kris Gaj}, - title = {Cryptographic Hardware and Embedded Systems - {CHES} 2009, - 11th International Workshop, Lausanne, Switzerland, - September 6-9, 2009, Proceedings}, - series = {Lecture Notes in Computer Science}, - volume = {5747}, - publisher = {Springer}, - doi = {10.1007/978-3-642-04138-9}, - isbn = {978-3-642-04137-2}, - year = {2009} -} - -@Proceedings{ _GrKoSt:10, - editor = {Elena Gramatov{\'{a}} and Zdenek Kot{\'{a}}sek and Andreas - Steininger and Heinrich Theodor Vierhaus and Horst - Zimmermann}, - title = {13th {IEEE} International Symposium on Design and - Diagnostics of Electronic Circuits and Systems, {DDECS} - 2010, Vienna, Austria, April 14-16, 2010}, - publisher = {{IEEE} Computer Society}, - url = {https://ieeexplore.ieee.org/xpl/conhome/5484099/proceeding}, - isbn = {978-1-4244-6612-2}, - year = {2010} -} 
- -@Proceedings{ _MaSt10, - editor = {Stefan Mangard and Fran{\c{c}}ois{-}Xavier Standaert}, - title = {Cryptographic Hardware and Embedded Systems, {CHES} 2010, - 12th International Workshop, Santa Barbara, CA, USA, August - 17-20, 2010. Proceedings}, - series = {Lecture Notes in Computer Science}, - volume = {6225}, - publisher = {Springer}, - doi = {10.1007/978-3-642-15031-9}, - isbn = {978-3-642-15030-2}, - year = {2010} -} - -@Proceedings{ _Ny08, - editor = {Kaisa Nyberg}, - title = {Fast Software Encryption, 15th International Workshop, - {FSE} 2008, Lausanne, Switzerland, February 10-13, 2008, - Revised Selected Papers}, - series = {Lecture Notes in Computer Science}, - volume = {5086}, - publisher = {Springer}, - doi = {10.1007/978-3-540-71039-4}, - isbn = {978-3-540-71038-7}, - year = {2008} -} - -@Proceedings{ _SaCa12, - editor = {Reihaneh Safavi{-}Naini and Ran Canetti}, - title = {Advances in Cryptology - {CRYPTO} 2012 - 32nd Annual - Cryptology Conference, Santa Barbara, CA, USA, August - 19-23, 2012. 
Proceedings}, - series = {Lecture Notes in Computer Science}, - volume = {7417}, - publisher = {Springer}, - doi = {10.1007/978-3-642-32009-5}, - isbn = {978-3-642-32008-8}, - year = {2012} -} - -@Proceedings{ _Ta16, - editor = {Tsuyoshi Takagi}, - title = {Post-Quantum Cryptography - 7th International Workshop, - PQCrypto 2016, Fukuoka, Japan, February 24-26, 2016, - Proceedings}, - series = {Lecture Notes in Computer Science}, - volume = {9606}, - publisher = {Springer}, - doi = {10.1007/978-3-319-29360-8}, - isbn = {978-3-319-29359-2}, - year = {2016} -} - -@Book{ _WaAs19, - editor = {Andrew Waterman and Krste Asanovi\'c}, - title = {The {RISC}-{V} Instruction Set Manual, Volume I: - User-Level {ISA}}, - note = {Document Version 20191213}, - publisher = {RISC-V Foundation}, - url = {https://riscv.org/specifications/}, - month = {December}, - year = 2019 -} - -@Book{ _WaAs19A, - editor = {Andrew Waterman and Krste Asanovi\'c}, - title = {The {RISC}-{V} Instruction Set Manual, Volume II: - Privileged Architecture}, - note = {Document Version 20190608-Priv-MSU-Ratified}, - publisher = {RISC-V Foundation}, - url = {https://riscv.org/specifications/}, - month = {June}, - year = 2019 -} - -@Proceedings{ _WeKaKr:16, - editor = {Edgar R. Weippl and Stefan Katzenbeisser and Christopher - Kruegel and Andrew C. Myers and Shai Halevi}, - title = {Proceedings of the 2016 {ACM} {SIGSAC} Conference on - Computer and Communications Security, Vienna, Austria, - October 24-28, 2016}, - publisher = {{ACM}}, - url = {http://dl.acm.org/citation.cfm?id=2976749}, - isbn = {978-1-4503-4139-4}, - year = {2016} + author = {Meltem S\"onmez Turan and Elaine Barker and John Kelsey and + Kerry A. McKay and Mary L. 
Baish and Mike Boyle}, + title = {Recommendation for the Entropy Sources Used for Random Bit + Generation}, + howpublished = {NIST Special Publication SP 800-90B}, + doi = {10.6028/NIST.SP.800-90B}, + month = {January}, + year = {2018} +} + +@InProceedings{ VaDr10, + author = {Michal Varchola and Milos Drutarovsk{\'{y}}}, + title = {New High Entropy Element for {FPGA} Based True Random + Number Generators}, + booktitle = {Cryptographic Hardware and Embedded Systems, {CHES} 2010, + 12th International Workshop, Santa Barbara, CA, USA, August + 17-20, 2010. Proceedings}, + pages = {351--365}, + crossref = {_MaSt10}, + doi = {10.1007/978-3-642-15031-9\_24}, + year = {2010} +} + +@InProceedings{ VaFiAu:10, + author = {Boyan Valtchanov and Viktor Fischer and Alain Aubert and + Florent Bernard}, + title = {Characterization of randomness sources in ring + oscillator-based true random number generators in FPGAs}, + booktitle = {13th {IEEE} International Symposium on Design and + Diagnostics of Electronic Circuits and Systems, {DDECS} + 2010, Vienna, Austria, April 14-16, 2010}, + pages = {48--53}, + crossref = {_GrKoSt:10}, + doi = {10.1109/DDECS.2010.5491819}, + year = {2010} +} + +@Proceedings{ _CaIs20, + editor = {Anne Canteaut and Yuval Ishai}, + title = {Advances in Cryptology - {EUROCRYPT} 2020 - 39th Annual + International Conference on the Theory and Applications of + Cryptographic Techniques, Zagreb, Croatia, May 10-14, 2020, + Proceedings, Part {II}}, + series = {Lecture Notes in Computer Science}, + volume = {12106}, + publisher = {Springer}, + doi = {10.1007/978-3-030-45724-2}, + isbn = {978-3-030-45723-5}, + year = {2020} +} + +@Proceedings{ _ClGa09, + editor = {Christophe Clavier and Kris Gaj}, + title = {Cryptographic Hardware and Embedded Systems - {CHES} 2009, + 11th International Workshop, Lausanne, Switzerland, + September 6-9, 2009, Proceedings}, + series = {Lecture Notes in Computer Science}, + volume = {5747}, + publisher = {Springer}, + doi = 
{10.1007/978-3-642-04138-9}, + isbn = {978-3-642-04137-2}, + year = {2009} +} + +@Proceedings{ _GrKoSt:10, + editor = {Elena Gramatov{\'{a}} and Zdenek Kot{\'{a}}sek and Andreas + Steininger and Heinrich Theodor Vierhaus and Horst + Zimmermann}, + title = {13th {IEEE} International Symposium on Design and + Diagnostics of Electronic Circuits and Systems, {DDECS} + 2010, Vienna, Austria, April 14-16, 2010}, + publisher = {{IEEE} Computer Society}, + url = {https://ieeexplore.ieee.org/xpl/conhome/5484099/proceeding}, + isbn = {978-1-4244-6612-2}, + year = {2010} +} + +@Proceedings{ _MaSt10, + editor = {Stefan Mangard and Fran{\c{c}}ois{-}Xavier Standaert}, + title = {Cryptographic Hardware and Embedded Systems, {CHES} 2010, + 12th International Workshop, Santa Barbara, CA, USA, August + 17-20, 2010. Proceedings}, + series = {Lecture Notes in Computer Science}, + volume = {6225}, + publisher = {Springer}, + doi = {10.1007/978-3-642-15031-9}, + isbn = {978-3-642-15030-2}, + year = {2010} +} + +@Proceedings{ _Ny08, + editor = {Kaisa Nyberg}, + title = {Fast Software Encryption, 15th International Workshop, + {FSE} 2008, Lausanne, Switzerland, February 10-13, 2008, + Revised Selected Papers}, + series = {Lecture Notes in Computer Science}, + volume = {5086}, + publisher = {Springer}, + doi = {10.1007/978-3-540-71039-4}, + isbn = {978-3-540-71038-7}, + year = {2008} +} + +@Proceedings{ _SaCa12, + editor = {Reihaneh Safavi{-}Naini and Ran Canetti}, + title = {Advances in Cryptology - {CRYPTO} 2012 - 32nd Annual + Cryptology Conference, Santa Barbara, CA, USA, August + 19-23, 2012. 
Proceedings}, + series = {Lecture Notes in Computer Science}, + volume = {7417}, + publisher = {Springer}, + doi = {10.1007/978-3-642-32009-5}, + isbn = {978-3-642-32008-8}, + year = {2012} +} + +@Proceedings{ _Ta16, + editor = {Tsuyoshi Takagi}, + title = {Post-Quantum Cryptography - 7th International Workshop, + PQCrypto 2016, Fukuoka, Japan, February 24-26, 2016, + Proceedings}, + series = {Lecture Notes in Computer Science}, + volume = {9606}, + publisher = {Springer}, + doi = {10.1007/978-3-319-29360-8}, + isbn = {978-3-319-29359-2}, + year = {2016} +} + +@Book{ _WaAs19, + editor = {Andrew Waterman and Krste Asanovi\'c}, + title = {The {RISC}-{V} Instruction Set Manual, Volume I: + User-Level {ISA}}, + note = {Document Version 20191213}, + publisher = {RISC-V Foundation}, + url = {https://riscv.org/specifications/}, + month = {December}, + year = 2019 +} + +@Book{ _WaAs19A, + editor = {Andrew Waterman and Krste Asanovi\'c}, + title = {The {RISC}-{V} Instruction Set Manual, Volume II: + Privileged Architecture}, + note = {Document Version 20190608-Priv-MSU-Ratified}, + publisher = {RISC-V Foundation}, + url = {https://riscv.org/specifications/}, + month = {June}, + year = 2019 +} + +@Proceedings{ _WeKaKr:16, + editor = {Edgar R. Weippl and Stefan Katzenbeisser and Christopher + Kruegel and Andrew C. 
Myers and Shai Halevi}, + title = {Proceedings of the 2016 {ACM} {SIGSAC} Conference on + Computer and Communications Security, Vienna, Austria, + October 24-28, 2016}, + publisher = {{ACM}}, + url = {http://dl.acm.org/citation.cfm?id=2976749}, + isbn = {978-1-4503-4139-4}, + year = {2016} } @electronic{DEBUG_SPEC, title = {The RISC-V Debug Specification}, @@ -1785,3 +1785,20 @@ pages={109-136} year = {} } ~ +@article{HWASAN, + author = {Kostya Serebryany and + Evgenii Stepanov and + Aleksey Shlyapnikov and + Vlad Tsyrklevich and + Dmitry Vyukov}, + title = {Memory Tagging and how it improves {C/C++} memory safety}, + journal = {CoRR}, + volume = {abs/1802.09517}, + year = {2018}, + url = {http://arxiv.org/abs/1802.09517}, + eprinttype = {arXiv}, + eprint = {1802.09517}, + timestamp = {Mon, 13 Aug 2018 16:46:42 +0200}, + biburl = {https://dblp.org/rec/journals/corr/abs-1802-09517.bib}, + bibsource = {dblp computer science bibliography, https://dblp.org} +} diff --git a/src/riscv-privileged.adoc b/src/riscv-privileged.adoc index afe0883..78bdedf 100644 --- a/src/riscv-privileged.adoc +++ b/src/riscv-privileged.adoc @@ -1,9 +1,7 @@ -[[risc-v-isa]] +[[manual:priv,RISC-V ISA Manual Volume II: Privileged Architecture]] = The RISC-V Instruction Set Manual: Volume II: Privileged Architecture include::../docs-resources/global-config.adoc[] :description: Volume II - Privileged Architecture -:revnumber: 20240528 -//:revremark: Pre-release version //development: assume everything can change //stable: assume everything could change //frozen: of you implement this version you assume the risk that something might change because of the public review cycle but we expect little to no change. 
@@ -13,13 +11,15 @@ include::../docs-resources/global-config.adoc[] :appendix-caption: Appendix :imagesdir: ../docs-resources/images :title-logo-image: image:risc-v_logo.png["RISC-V International Logo",pdfwidth=3.25in,align=center] +ifdef::draft-watermark[] :page-background-image: image:draft.png[opacity=20%] +endif::[] //:title-page-background-image: none //:back-cover-image: image:backpage.png[opacity=25%] // Settings: :experimental: :reproducible: -:imagesoutdir: images +:imagesoutdir: {docdir}/../build/images-out :bibtex-file: src/resources/riscv-spec.bib :bibtex-order: alphabetical :bibtex-style: apa @@ -33,10 +33,7 @@ include::../docs-resources/global-config.adoc[] :sectnumlevels: 5 :toc: left :toclevels: 4 -:source-highlighter: pygments -ifdef::backend-pdf[] :source-highlighter: rouge -endif::[] :table-caption: Table :figure-caption: Figure :xrefstyle: short @@ -62,11 +59,10 @@ Avižienis, Jacob Bachmeyer, Allen J. Baum, Jonathan Behrens, Paolo Bonzini, Rus Christopher Celio, Chuanhua Chang, David Chisnall, Anthony Coulter, Palmer Dabbelt, Monte Dalrymple, Paul Donahue, Greg Favor, Dennis Ferguson, Marc Gauthier, Andy Glew, Gary Guo, Mike Frysinger, John Hauser, David Horner, Olof -Johansson, David Kruckemyer, Yunsup Lee, Daniel Lustig, Andrew Lutomirski, Prashanth Mundkur, -Jonathan Neuschäfer, Rishiyur +Johansson, David Kruckemyer, Yunsup Lee, Daniel Lustig, Andrew Lutomirski, Martin Maas, Prashanth Mundkur, Jonathan Neuschäfer, Rishiyur Nikhil, Stefan O'Rear, Albert Ou, John Ousterhout, David Patterson, Dmitri Pavlov, Kade Phillips, Josh Scheid, Colin Schmidt, Michael Taylor, Wesley Terpstra, Matt Thomas, Tommy Thorn, Ray -VanDeWalker, Megan Wachs, Steve Wallach, Andrew Waterman, Claire Wolf, +VanDeWalker, Megan Wachs, Steve Wallach, Andrew Waterman, Claire Wolf, Adam Zabrocki, and Reinoud Zandijk.._ _This document is released under a Creative Commons Attribution 4.0 International License._ @@ -87,12 +83,14 @@ include::smcntrpmf.adoc[] include::rnmi.adoc[] 
include::smcdeleg.adoc[] include::smdbltrp.adoc[] +include::smctr.adoc[] include::supervisor.adoc[] include::sstc.adoc[] include::sscofpmf.adoc[] include::hypervisor.adoc[] include::priv-cfi.adoc[] include::ssdbltrp.adoc[] +include::zpm.adoc[] include::priv-insns.adoc[] include::priv-history.adoc[] include::bibliography.adoc[] diff --git a/src/riscv-unprivileged.adoc b/src/riscv-unprivileged.adoc index a755403..8047aed 100644 --- a/src/riscv-unprivileged.adoc +++ b/src/riscv-unprivileged.adoc @@ -1,22 +1,22 @@ -[[risc-v-isa]] +[[manual:unpriv,RISC-V ISA Manual Volume I: Unprivileged Architecture]] = The RISC-V Instruction Set Manual Volume I: Unprivileged Architecture include::../docs-resources/global-config.adoc[] :description: Unprivileged Architecture -:revnumber: 20240411 -//:revremark: Pre-release version :colophon: :preface-title: Preamble :appendix-caption: Appendix :imagesdir: ../docs-resources/images :title-logo-image: image:risc-v_logo.png["RISC-V International Logo",pdfwidth=3.25in,align=center] +ifdef::draft-watermark[] :page-background-image: image:draft.png[opacity=20%] +endif::[] //:title-page-background-image: none //:back-cover-image: image:backpage.png[opacity=25%] :back-cover-image: image:riscv-horizontal-color.svg[opacity=25%] // Settings: :experimental: :reproducible: -:imagesoutdir: images +:imagesoutdir: {docdir}/../build/images-out :bibtex-file: src/resources/riscv-spec.bib :bibtex-order: alphabetical :bibtex-style: apa @@ -27,12 +27,10 @@ include::../docs-resources/global-config.adoc[] :example-caption: Example :listing-caption: Listing :sectnums: +:sectnumlevels: 5 :toc: left :toclevels: 5 -:source-highlighter: pygments -ifdef::backend-pdf[] :source-highlighter: rouge -endif::[] :table-caption: Table :figure-caption: Figure :xrefstyle: short @@ -86,6 +84,7 @@ Jan Gray, Gianluca Guida, Michael Hamburg, John Hauser, +Christian Herber, John Ingalls, David Horner, Bruce Hoult, @@ -107,6 +106,7 @@ Nathan Menhorn, Christoph Müllner, Joseph 
Myers, Vijayanand Nagarajan, +Torbjørn Viem Ness, Rishiyur Nikhil, Jonas Oberhauser, Stefan O'Rear, @@ -154,11 +154,11 @@ December 2019._ //the colophon allows for a section after the preamble that is part of the frontmatter and therefore not assigned a page number. include::colophon.adoc[] + include::intro.adoc[] include::rv32.adoc[] include::rv32e.adoc[] include::rv64.adoc[] -include::rv128.adoc[] include::zifencei.adoc[] include::zicsr.adoc[] include::counters.adoc[] @@ -184,23 +184,22 @@ include::zfinx.adoc[] include::c-st-ext.adoc[] include::zc.adoc[] include::b-st-ext.adoc[] -include::j-st-ext.adoc[] -include::p-st-ext.adoc[] include::v-st-ext.adoc[] include::scalar-crypto.adoc[] include::vector-crypto.adoc[] include::unpriv-cfi.adoc[] +include::zilsd.adoc[] include::rv-32-64g.adoc[] -include::extending.adoc[] include::naming.adoc[] -include::history.adoc[] include::mm-eplan.adoc[] include::mm-formal.adoc[] + //Appendices for Vector include::vector-examples.adoc[] include::calling-convention.adoc[] //include::fraclmul.adoc[] //End of Vector appendices + include::index.adoc[] // this is generated generated from index markers. include::bibliography.adoc[] diff --git a/src/rnmi.adoc b/src/rnmi.adoc index aef8e9d..b7468e8 100644 --- a/src/rnmi.adoc +++ b/src/rnmi.adoc @@ -1,11 +1,5 @@ [[rnmi]] -== "Smrnmi" Extension for Resumable Non-Maskable Interrupts, Version 0.5 - -[WARNING] -==== -*Warning! This frozen specification may change before being accepted as -standard by RISC-V International.* -==== +== "Smrnmi" Extension for Resumable Non-Maskable Interrupts, Version 1.0 The base machine-level architecture supports only unresumable non-maskable interrupts (UNMIs), where the NMI jumps to a handler in @@ -38,21 +32,21 @@ in `mtvec` as the RNMI exception trap handler. === RNMI CSRs -This proposal adds additional M-mode CSRs to enable a resumable +This extension adds additional M-mode CSRs to enable a resumable non-maskable interrupt (RNMI). 
.Resumable NMI scratch register `mnscratch` -include::images/bytefield/mnscratch.adoc[] +include::images/bytefield/mnscratch.edn[] The `mnscratch` CSR holds an MXLEN-bit read-write register which enables -the NMI trap handler to save and restore the context that was +the RNMI trap handler to save and restore the context that was interrupted. .Resumable NMI program counter `mnepc`. include::images/bytefield/mnepc.edn[] The `mnepc` CSR is an MXLEN-bit read-write register which on entry to -the NMI trap handler holds the PC of the instruction that took the +the RNMI trap handler holds the PC of the instruction that took the interrupt. The low bit of `mnepc` (`mnepc[0]`) is always zero. On implementations @@ -62,7 +56,7 @@ zero. If an implementation allows IALIGN to be either 16 or 32 (by changing CSR `misa`, for example), then, whenever IALIGN=32, bit `mnepc[1]` is masked on reads so that it appears to be 0. This masking occurs also for -the implicit read by the MRET instruction. Though masked, `mnepc[1]` +the implicit read by the MNRET instruction. Though masked, `mnepc[1]` remains writable when IALIGN=32. `mnepc` is a *WARL* register that must be able to hold all valid virtual @@ -74,10 +68,10 @@ of holding. .Resumable NMI cause `mncause`. include::images/bytefield/mncause.edn[] -The `mncause` CSR holds the reason for the NMI. -If the reason is an interrupt, bit MXLEN-1 is set to 1, and the NMI +The `mncause` CSR holds the reason for the RNMI. +If the reason is an interrupt, bit MXLEN-1 is set to 1, and the RNMI cause is encoded in the least-significant bits. -If the reason is an interrupt and NMI causes are not supported, bit MXLEN-1 is +If the reason is an interrupt and RNMI causes are not supported, bit MXLEN-1 is set to 1, and zero is written to the least-significant bits. 
If the reason is an exception within M-mode that results in a double trap as specified in the Smdbltrp extension, bit MXLEN-1 is set to 0 and the @@ -98,7 +92,7 @@ If the Zicfilp extension is implemented, `mnstatus` also holds the MNPELP field, which on entry to the RNMI trap handler holds the previous `ELP` state. When an RNMI trap is taken, MNPELP is set to `ELP` and `ELP` is set to 0. -`mnstatus` also holds the NMIE bit. When NMIE=1, nonmaskable interrupts +`mnstatus` also holds the NMIE bit. When NMIE=1, non-maskable interrupts are enabled. When NMIE=0, _all_ interrupts are disabled. When NMIE=0, the hart behaves as though `mstatus`.MPRV were clear, @@ -144,8 +138,8 @@ MNRET is an M-mode-only instruction that uses the values in `mnepc` and `mnstatus` to return to the program counter, privilege mode, and virtualization mode of the interrupted context. This instruction also sets `mnstatus`.NMIE. If MNRET changes the privilege mode to a mode less privileged than M, it also sets `mstatus`.MPRV to 0. -If the Zicfilp extension is implemented, then if `mnstatus`.MNPP holds the -value __y__, MNRET sets `ELP` to the logical AND of __y__LPE and `mnstatus`.MNPELP. +If the Zicfilp extension is implemented, then if the new privileged mode +is __y__, MNRET sets `ELP` to the logical AND of __y__LPE (see <<FCFIACT>>) and `mnstatus`.MNPELP. === RNMI Operation diff --git a/src/rv-32-64g.adoc b/src/rv-32-64g.adoc index 0464228..afc5d50 100644 --- a/src/rv-32-64g.adoc +++ b/src/rv-32-64g.adoc @@ -16,23 +16,19 @@ and RV64G. 
|=== |inst[4:2] .2+|000 .2+|001 .2+|010 .2+|011 .2+|100 .2+|101 .2+|110 .2+|111 (>32b) |inst[6:5] -|00 |LOAD |LOAD-FP |_custom-0_ |MISC-MEM |OP-IMM |AUIPC |OP-IMM-32 |48b -|01 |STORE |STORE-FP |_custom-1_ |AMO |OP |LUI |OP-32 |64b -|10 |MADD |MSUB |NMSUB |NMADD |OP-FP |OP-V |_custom-2/rv128_|48b -|11 |BRANCH |JALR |_reserved_ |JAL |SYSTEM |OP-VE |_custom-3/rv128_|≥80b +|00 |LOAD |LOAD-FP |_custom-0_ |MISC-MEM |OP-IMM |AUIPC |OP-IMM-32 |_reserved_ +|01 |STORE |STORE-FP |_custom-1_ |AMO |OP |LUI |OP-32 |_reserved_ +|10 |MADD |MSUB |NMSUB |NMADD |OP-FP |OP-V |_custom-2_ |_reserved_ +|11 |BRANCH |JALR |_reserved_ |JAL |SYSTEM |OP-VE |_custom-3_ |_reserved_ |=== <<opcodemap>> shows a map of the major opcodes for -RVG. Major opcodes with 3 or more lower bits set are reserved for -instruction lengths greater than 32 bits. Opcodes marked as _reserved_ +RVG. Opcodes marked as _reserved_ should be avoided for custom instruction-set extensions as they might be used by future standard extensions. Major opcodes marked as _custom-0_ -and _custom-1_ will be avoided by future standard extensions and are +through _custom-3_ will be avoided by future standard extensions and are recommended for use by custom instruction-set extensions within the base -32-bit instruction format. The opcodes marked _custom-2/rv128_ and -_custom-3/rv128_ are reserved for future use by RV128, but will -otherwise be avoided for standard extensions and so can also be used for -custom instruction-set extensions in RV32 and RV64. +32-bit instruction format. We believe RV32G and RV64G provide simple but complete instruction sets for a broad range of general-purpose computing. The optional compressed @@ -274,7 +270,7 @@ ISA. 
<<< [%autowidth.stretch,float="center",align="center",cols="^2m,^2m,^2m,^2m,<2m,>3m, <4m, >4m, <4m, >4m, <4m, >4m, <4m, >4m, <6m"] -|=== +|=== |31 |27 |26 |25 |24 | 20|19 | 15| 14 | 12|11 | 7|6 | 0| 4+^|funct7 2+^|rs2 2+^|rs1 2+^|funct3 2+^|rd 2+^|opcode <|R-type 2+^|rs3 2+^|funct2 2+^|rs2 2+^|rs1 2+^|funct3 2+^|rd 2+^|opcode <|R4-type @@ -322,12 +318,6 @@ ISA. 4+^|1101001 2+^|00010 2+^|rs1 2+^|rm 2+^|rd 2+^|1010011 <|FCVT.D.L 4+^|1101001 2+^|00011 2+^|rs1 2+^|rm 2+^|rd 2+^|1010011 <|FCVT.D.LU 4+^|1111001 2+^|00000 2+^|rs1 2+^|000 2+^|rd 2+^|1010011 <|FMV.D.X -15+^| - |31 |27 |26 |25 |24 | 20|19 | 15| 14 | 12|11 | 7|6 | 0| - 4+^|funct7 2+^|rs2 2+^|rs1 2+^|funct3 2+^|rd 2+^|opcode <|R-type - 2+^|rs3 2+^|funct2 2+^|rs2 2+^|rs1 2+^|funct3 2+^|rd 2+^|opcode <|R4-type - 6+^|imm[11:0] 2+^|rs1 2+^|funct3 2+^|rd 2+^|opcode <|I-type - 4+^|imm[11:5] 2+^|rs2 2+^|rs1 2+^|funct3 2+^|imm[4:0] 2+^|opcode <|S-type |=== <<< diff --git a/src/rv128.adoc b/src/rv128.adoc deleted file mode 100644 index 62af109..0000000 --- a/src/rv128.adoc +++ /dev/null @@ -1,76 +0,0 @@ -[[rv128]] -== RV128I Base Integer Instruction Set, Version 1.7 - -"There is only one mistake that can be made in computer design that is -difficult to recover from—not having enough address bits for memory -addressing and memory management." --- Bell and Strecker, ISCA-3, 1976. - -This chapter describes RV128I, a variant of the RISC-V ISA supporting a -flat 128-bit address space. The variant is a straightforward -extrapolation of the existing RV32I and RV64I designs. -(((RV128, design))) - -[TIP] -==== -The primary reason to extend integer register width is to support larger -address spaces. It is not clear when a flat address space larger than 64 -bits will be required. At the time of writing, the fastest supercomputer -in the world as measured by the Top500 benchmark had over 1PB of DRAM, and -would require over 50 bits of address space if all the DRAM resided in a -single address space. 
Some warehouse-scale computers already contain -even larger quantities of DRAM, and new dense solid-state non-volatile -memories and fast interconnect technologies might drive a demand for -even larger memory spaces. Exascale systems research is targeting 100PB memory -systems, which occupy 57 bits of address space. At historic rates of -growth, it is possible that greater than 64 bits of address space might -be required before 2030. -History suggests that whenever it becomes clear that more than 64 bits -of address space is needed, architects will repeat intensive debates -about alternatives to extending the address space, including -segmentation, 96-bit address spaces, and software workarounds, until, -finally, flat 128-bit address spaces will be adopted as the simplest and -best solution. -We have not frozen the RV128 spec at this time, as there might be need -to evolve the design based on actual usage of 128-bit address spaces. -==== -(((RV128, evolution))) -(((RV128I, as relates to RV64I))) - -RV128I builds upon RV64I in the same way RV64I builds upon RV32I, with -integer registers extended to 128 bits (i.e., XLEN=128). Most integer -computational instructions are unchanged as they are defined to operate -on XLEN bits. The RV64I "*W" integer instructions that operate on -32-bit values in the low bits of a register are retained but now sign -extend their results from bit 31 to bit 127. A new set of "*D" integer -instructions are added that operate on 64-bit values held in the low -bits of the 128-bit integer registers and sign extend their results from -bit 63 to bit 127. The "*D" instructions consume two major opcodes -(OP-IMM-64 and OP-64) in the standard 32-bit encoding. 
-(((RV128I, compatibility with RV64))) - -[NOTE] -==== -To improve compatibility with RV64, in a reverse of how RV32 to RV64 was -handled, we might change the decoding around to rename RV64I ADD as a -64-bit ADDD, and add a 128-bit ADDQ in what was previously the OP-64 -major opcode (now renamed the OP-128 major opcode). -==== - - -Shifts by an immediate (SLLI/SRLI/SRAI) are now encoded using the low 7 -bits of the I-immediate, and variable shifts (SLL/SRL/SRA) use the low 7 -bits of the shift amount source register. -(((RV128I, LOU))) - -A LDU (load double unsigned) instruction is added using the existing -LOAD major opcode, along with new LQ and SQ instructions to load and -store quadword values. SQ is added to the STORE major opcode, while LQ -is added to the MISC-MEM major opcode. - - -The floating-point instruction set is unchanged, although the 128-bit Q -floating-point extension can now support FMV.X.Q and FMV.Q.X -instructions, together with additional FCVT instructions to and from the -T (128-bit) integer format. - diff --git a/src/rv32.adoc b/src/rv32.adoc index 9714df4..15cc9fa 100644 --- a/src/rv32.adoc +++ b/src/rv32.adoc @@ -3,7 +3,7 @@ This chapter describes the RV32I base integer instruction set. -[TIP] +[NOTE] ==== RV32I was designed to be sufficient to form a compiler target and to support modern operating system environments. The ISA was also designed @@ -85,10 +85,7 @@ holds the address of the current instruction. 3+^| [.small]#x29# 3+^| [.small]#x30# 3+^| [.small]#x31# -3+^| [.small]#XLEN# -| [.small]#XLEN-1#| >| [.small]#0# -3+^| [.small]#pc# -3+^| [.small]#XLEN# +3+^| [.small]#pc# |=== [NOTE] ==== @@ -123,7 +120,7 @@ of loop unrolling, software pipelining, and cache tiling. For these reasons, we chose a conventional size of 32 integer registers for RV32I. 
Dynamic register usage tends to be dominated by a few -frequently accessed registers, and regfile implementations can be +frequently accessed registers, and register file implementations can be optimized to reduce access energy for the frequently accessed registers cite:[jtseng:sbbci]. The optional compressed 16-bit instruction format mostly only accesses 8 registers and hence can provide a dense instruction @@ -174,7 +171,7 @@ bits in the instruction and have been allocated to reduce hardware complexity. In particular, the sign bit for all immediates is always in bit 31 of the instruction to speed sign-extension circuitry. -include::images/wavedrom/instruction_formats.adoc[] +include::images/wavedrom/instruction-formats.edn[] [[base_instr,Base instruction formats]] RISC-V base instruction formats. Each immediate subfield is labeled with the bit position (imm[x]) in the immediate value being produced, rather than the bit position within the instruction's immediate field as is usually done. @@ -201,7 +198,7 @@ to keep the ISA as simple as possible. There are a further two variants of the instruction formats (B/J) based on the handling of immediates, as shown in <<baseinstformatsimm>>. -include::images/wavedrom/immediate_variants.adoc[] +include::images/wavedrom/immediate-variants.edn[] [[baseinstformatsimm,Base instruction formats immediate variants.]] //.RISC-V base instruction formats showing immediate variants. @@ -222,9 +219,19 @@ formats and with each other. <<immtypes>> shows the immediates produced by each of the base instruction formats, and is labeled to show which instruction bit (inst[_y_]) produces each bit of the immediate value. + [[immtypes, Immediate types]] -.Types of immediate produced by RISC-V instructions. 
-include::images/wavedrom/immediate.adoc[] +include::images/wavedrom/i-immediate.edn[] + +include::images/wavedrom/s-immediate.edn[] + +include::images/wavedrom/b-immediate.edn[] + +include::images/wavedrom/u-immediate.edn[] + +.Types of immediate produced by RISC-V instructions. +include::images/wavedrom/j-immediate.edn[] + The fields are labeled with the instruction bits used to construct their value. Sign extensions always uses inst[31]. @@ -240,8 +247,8 @@ branch and jump calculations and so would not benefit from keeping the location of immediate bits constant across types of instruction, we wanted to reduce the hardware cost of the simplest implementations. By rotating bits in the instruction encoding of B and J immediates instead -of using dynamic hardware muxes to multiply the immediate by 2, we -reduce instruction signal fanout and immediate mux costs by around a +of using dynamic hardware multiplexers to multiply the immediate by 2, we +reduce instruction signal fanout and immediate multiplexer costs by around a factor of 2. The scrambled immediate encoding will add negligible time to static or ahead-of-time compilation. For dynamic generation of instructions, there is some small additional overhead, but the most @@ -258,7 +265,7 @@ destination is register _rd_ for both register-immediate and register-register instructions. No integer computational instructions cause arithmetic exceptions. -[TIP] +[NOTE] ==== We did not include special instruction-set support for overflow checks on integer arithmetic operations in the base instruction set, as many @@ -291,7 +298,7 @@ comparing the results of ADD and ADDW on the operands. ==== Integer Register-Immediate Instructions -include::images/wavedrom/integer_computational.adoc[] +include::images/wavedrom/integer-computational.edn[] //.Integer Computational Instructions ADDI adds the sign-extended 12-bit immediate to register _rs1_. 
@@ -312,7 +319,7 @@ XOR on register _rs1_ and the sign-extended 12-bit immediate and place the result in _rd_. Note, XORI _rd, rs1, -1_ performs a bitwise logical inversion of register _rs1_ (assembler pseudoinstruction NOT _rd, rs_). -include::images/wavedrom/int-comp-slli-srli-srai.adoc[] +include::images/wavedrom/int-comp-slli-srli-srai.edn[] [[int-comp-slli-srli-srai]] //.Integer register-immediate, SLLI, SRLI, SRAI @@ -324,7 +331,7 @@ shifted into the lower bits); SRLI is a logical right shift (zeros are shifted into the upper bits); and SRAI is an arithmetic right shift (the original sign bit is copied into the vacated upper bits). -include::images/wavedrom/int-comp-lui-aiupc.adoc[] +include::images/wavedrom/int-comp-lui-aiupc.edn[] [[int-comp-lui-aiupc]] //.Integer register-immediate, U-immediate @@ -364,7 +371,7 @@ the _rs1_ and _rs2_ registers as source operands and write the result into register _rd_. The _funct7_ and _funct3_ fields select the type of operation. -include::images/wavedrom/int_reg-reg.adoc[] +include::images/wavedrom/int-reg-reg.edn[] [[int-reg-reg]] //.Integer register-register @@ -383,7 +390,7 @@ the lower 5 bits of register _rs2_. ==== NOP Instruction -include::images/wavedrom/nop.adoc[] +include::images/wavedrom/nop.edn[] [[nop]] //.NOP instructions @@ -444,7 +451,7 @@ than the regular link register. Plain unconditional jumps (assembler pseudoinstruction J) are encoded as a JAL with _rd_=`x0`. -include::images/wavedrom/ct-unconditional.adoc[] +include::images/wavedrom/ct-unconditional.edn[] [[ct-unconditional]] //.The unconditional-jump instruction, JAL @@ -456,7 +463,13 @@ instruction following the jump (`pc`+4) is written to register _rd_. Register `x0` can be used as the destination if the result is not required. -include::images/wavedrom/ct-unconditional-2.adoc[] +Plain unconditional indirect jumps (assembler pseudoinstruction JR) are +encoded as a JALR with _rd_=`x0`. 
+Procedure returns in the standard calling convention (assembler +pseudoinstruction RET) are encoded as a JALR with _rd_=`x0`, _rs1_=`x1`, and +_imm_=0. + +include::images/wavedrom/ct-unconditional-2.edn[] [[ct-unconditional-2]] //.The indirect unconditional-jump instruction, JALR @@ -550,7 +563,7 @@ is sign-extended and added to the address of the branch instruction to give the target address. The conditional branch range is ±4 KiB. -include::images/wavedrom/ct-conditional.adoc[] +include::images/wavedrom/ct-conditional.edn[] [[ct-conditional]] //.Conditional branches @@ -581,7 +594,7 @@ a conditional branch instruction with an always-true condition. RISC-V jumps are also PC-relative and support a much wider offset range than branches, and will not pollute conditional-branch prediction tables. -[TIP] +[NOTE] ==== The conditional branches were designed to include arithmetic comparison operations between two registers (as also done in PA-RISC, Xtensa, and @@ -666,7 +679,7 @@ even though the load value is discarded. The EEI will define whether the memory system is little-endian or big-endian. In RISC-V, endianness is byte-address invariant. -[TIP] +[NOTE] ==== In a system for which endianness is byte-address invariant, the following property holds: if a byte is stored to memory at some address @@ -686,7 +699,7 @@ significance. Loads similarly transfer the contents of the greater memory byte addresses to the less-significant register bytes. ==== -include::images/wavedrom/load_store.adoc[] +include::images/wavedrom/load-store.edn[] [[load-store,load and store]] //.Load and store instructions @@ -731,7 +744,7 @@ by address misalignment result in a contained trap (allowing software running inside the execution environment to handle the trap) or a fatal trap (terminating execution). 
-[TIP] +[NOTE] ==== Misaligned accesses are occasionally required when porting legacy code, and help performance on applications when using any form of packed-SIMD @@ -775,11 +788,11 @@ are aligned. [[fence]] === Memory Ordering Instructions -include::images/wavedrom/mem_order.adoc[] +include::images/wavedrom/mem-order.edn[] [[mem-order]] //.Memory ordering instructions -The FENCE instruction is used to order device I/O and memory accesses as +FENCE instructions are used to order device I/O and memory accesses as viewed by other RISC-V harts and external devices or coprocessors. Any combination of device input (I), device output (O), memory reads \(R), and memory writes (W) may be ordered with respect to any combination of @@ -789,9 +802,9 @@ any operation in the _predecessor_ set preceding the FENCE. <<memorymodel>> provides a precise description of the RISC-V memory consistency model. -The FENCE instruction also orders memory reads and writes made by the +FENCE instructions also order memory reads and writes made by the hart as observed by memory reads and writes made by an external device. -However, FENCE does not order observations of events made by an external +However, FENCE instructions do not order observations of events made by an external device using any other signaling mechanism. [NOTE] @@ -800,7 +813,7 @@ A device might observe an access to a memory location via some external communication mechanism, e.g., a memory-mapped control register that drives an interrupt signal to an interrupt controller. This communication is outside the scope of the FENCE ordering mechanism and -hence the FENCE instruction can provide no guarantee on when a change in +hence FENCE instructions can provide no guarantee on when a change in the interrupt signal is visible to the interrupt controller. Specific devices might provide additional ordering guarantees to reduce software overhead but those are outside the scope of the RISC-V memory model. 
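Reviewer aside on the FENCE hunks: the text touched here describes predecessor and successor sets built from device input (I), device output (O), memory reads (R), and memory writes (W), plus the _fm_ field that distinguishes a normal `FENCE` from `FENCE.TSO`. The sketch below shows how those fields could be packed into a 32-bit instruction word. It is a minimal illustration, not part of the commit. The field positions (_fm_ in bits 31:28, _pred_ in 27:24, _succ_ in 23:20, MISC-MEM opcode `0001111`) are taken from the base ISA encoding rather than from this diff, and the helper names `access_set`, `encode_fence`, and `encode_fence_tso` are hypothetical.

```python
# Hypothetical helpers for illustration only; names are not from the spec.

# Bit weights inside a 4-bit predecessor/successor set, ordered I, O, R, W.
_SET_BITS = {"I": 0b1000, "O": 0b0100, "R": 0b0010, "W": 0b0001}

# Major opcode shared by FENCE instructions (MISC-MEM).
MISC_MEM_OPCODE = 0b0001111


def access_set(letters: str) -> int:
    """Turn a string such as "RW" or "IORW" into a 4-bit set mask."""
    mask = 0
    for ch in letters.upper():
        mask |= _SET_BITS[ch]  # KeyError on anything other than I, O, R, W
    return mask


def encode_fence(pred: str, succ: str, fm: int = 0b0000) -> int:
    """Pack fm, pred, and succ into a FENCE word with rs1 = rd = x0."""
    if not 0 <= fm <= 0xF:
        raise ValueError("fm is a 4-bit field")
    return (
        (fm << 28)
        | (access_set(pred) << 24)
        | (access_set(succ) << 20)
        | MISC_MEM_OPCODE  # rs1, funct3, and rd are all zero
    )


def encode_fence_tso() -> int:
    """FENCE.TSO is a FENCE with fm=1000, pred=RW, succ=RW."""
    return encode_fence("RW", "RW", fm=0b1000)
```

Under these assumptions, `encode_fence("RW", "RW")` and `encode_fence_tso()` differ only in the top four bits, which is why an implementation that ignores _fm_ ends up executing `FENCE.TSO` as the stronger `FENCE RW,RW`.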
@@ -814,23 +827,23 @@ memory-mapped I/O devices will typically be accessed with uncached loads and stores that are ordered using the I and O bits rather than the R and W bits. Instruction-set extensions might also describe new I/O instructions that will also be ordered using the I and O bits in a -FENCE. +FENCE instruction. [[fm]] [float="center",align="center",cols="^1,^1,<3",options="header"] .Fence mode encoding |=== -|_fm_ field |Mnemonic |Meaning +|_fm_ field |Mnemonic suffix|Meaning |0000 |_none_ |Normal Fence -|1000 |TSO |With `FENCE RW,RW`: exclude write-to-read ordering; otherwise: _Reserved for future use._ -2+|_other_ |_Reserved for future use._ +|1000 |.TSO |With `FENCE RW,RW`: exclude write-to-read ordering; otherwise: _Reserved for future use._ +|_other_|_other_ |_Reserved for future use._ |=== -The fence mode field _fm_ defines the semantics of the `FENCE`. A `FENCE` -with _fm_=`0000` orders all memory operations in its predecessor set +The FENCE mode field _fm_ defines the semantics of the FENCE instruction. A `FENCE` +(with _fm_=`0000`) orders all memory operations in its predecessor set before all memory operations in its successor set. -The `FENCE.TSO` instruction is encoded as a `FENCE` instruction +A `FENCE.TSO` instruction is encoded as a FENCE instruction with _fm_=`1000`, _predecessor_=`RW`, and _successor_=`RW`. `FENCE.TSO` orders all load operations in its predecessor set before all memory operations in its successor set, and all store operations in its predecessor set @@ -840,20 +853,20 @@ store operations in the `FENCE.TSO's` predecessor set unordered with [NOTE] ==== -Because FENCE RW,RW imposes a superset of the orderings that FENCE.TSO -imposes, it is correct to ignore the _fm_ field and implement FENCE.TSO as FENCE RW,RW. +Because `FENCE RW,RW` imposes a superset of the orderings that `FENCE.TSO` +imposes, it is correct to ignore the _fm_ field and implement `FENCE.TSO` as `FENCE RW,RW`. 
==== -The unused fields in the `FENCE` instructions--_rs1_ and _rd_--are reserved +The unused fields in the FENCE instructions--_rs1_ and _rd_--are reserved for finer-grain fences in future extensions. For forward compatibility, base implementations shall ignore these fields, and standard software shall zero these fields. Likewise, many _fm_ and predecessor/successor -set settings in <<fm>> are also reserved for future use. +set settings are also reserved for future use. Base implementations shall treat all such reserved configurations as -normal fences with _fm_=0000, and standard software shall use only +`FENCE` instructions (with _fm_=`0000`), and standard software shall use only non-reserved configurations. -[TIP] +[NOTE] ==== We chose a relaxed memory model to allow high performance from simple machine implementations and from likely future coprocessor or @@ -862,9 +875,10 @@ ordering to avoid unnecessary serialization within a device-driver hart and also to support alternative non-memory paths to control added coprocessors or I/O devices. Simple implementations may additionally ignore the _predecessor_ and _successor_ fields and always execute a -conservative fence on all operations. +conservative FENCE on all operations. ==== +[[ecall-ebreak]] === Environment Call and Breakpoints `SYSTEM` instructions are used to access system functionality that might require privileged access and are encoded using the I-type instruction @@ -875,7 +889,7 @@ described in <<csrinsts>>, and the base unprivileged instructions are described in the following section. -[TIP] +[NOTE] ==== The SYSTEM instructions are defined to allow simpler implementations to always trap to a single software trap handler. More sophisticated @@ -883,7 +897,7 @@ implementations might execute more of each system instruction in hardware. 
==== -include::images/wavedrom/env_call-breakpoint.adoc[] +include::images/wavedrom/env-call-breakpoint.edn[] [[env-call]] //.Environment call and breakpoint instructions @@ -906,11 +920,11 @@ to reflect that they can be used more generally than to call a supervisor-level operating system or debugger. ==== -[TIP] +[NOTE] ==== EBREAK was primarily designed to be used by a debugger to cause execution to stop and fall back into the debugger. EBREAK is also used -by the standard gcc compiler to mark code paths that should not be +by the standard GCC compiler to mark code paths that should not be executed. Another use of EBREAK is to support "semihosting", where the execution @@ -924,7 +938,7 @@ to distinguish a semihosting EBREAK from a debugger inserted EBREAK. .... slli x0, x0, 0x1f # Entry NOP ebreak # Break to debugger - srai x0, x0, 7 # NOP encoding the semihosting call number 7 + srai x0, x0, 7 # Exit NOP .... Note that these three instructions must be 32-bit-wide instructions, @@ -958,7 +972,7 @@ performance counters. Implementations are always allowed to ignore the encoded hints. Most RV32I HINTs are encoded as integer computational instructions with -_rd_=x0. The other RV32I HINTs are encoded as FENCE instructions with +_rd_=`x0`. The other RV32I HINTs are encoded as FENCE instructions with a null predecessor or successor set and with _fm_=0. [NOTE] @@ -986,7 +1000,7 @@ HINT space is reserved for standard HINTs. The remainder of the HINT space is designated for custom HINTs: no standard HINTs will ever be defined in this subspace. -[TIP] +[NOTE] ==== We anticipate standard hints to eventually include memory-system spatial and temporal locality hints, branch prediction hints, thread-scheduling @@ -1017,11 +1031,15 @@ hints, security tags, and instrumentation flags for simulation/emulation. 
|ADD |_rd_=`x0`, _rs1_=`x0`, _rs2_≠``x2-x5`` | 28 -|ADD |_rd_=`x0`, _rs1_=`x0`, _rs2_=`x2-x5` |4|(_rs2_=`x2`) NTL.P1 + +|ADD |_rd_=`x0`, _rs1_=`x0`, _rs2_=`x2-x5` |4|(_rs2_=`x2`) NTL.P1 + (_rs2_=`x3`) NTL.PALL + (_rs2_=`x4`) NTL.S1 + (_rs2_=`x5`) NTL.ALL +|SLLI |_rd_=`x0`, _rs1_=`x0`, _shamt_=31 |1|Semihosting entry marker + +|SRAI |_rd_=`x0`, _rs1_=`x0`, _shamt_=7 |1|Semihosting exit marker + |SUB |_rd_=`x0` |latexmath:[$2^{10}$] .11+<.^m|_Designated for future standard use_ |AND |_rd_=`x0` |latexmath:[$2^{10}$] @@ -1046,20 +1064,25 @@ hints, security tags, and instrumentation flags for simulation/emulation. |FENCE |_rd_=_rs1_=`x0`, _fm_=0, _pred_=W, _succ_=0 |1 |PAUSE -4+| +4+| |SLTI |_rd_=`x0` |latexmath:[$2^{17}$] .7+<.^m|_Designated for custom use_ |SLTIU|_rd_=`x0` |latexmath:[$2^{17}$] -|SLLI |_rd_=`x0` |latexmath:[$2^{10}$] +|SLLI |_rd_=`x0`, and either _rs1_≠``x0`` or _shamt_≠31 |latexmath:[$2^{10}-1$] |SRLI |_rd_=`x0` |latexmath:[$2^{10}$] -|SRAI |_rd_=`x0` |latexmath:[$2^{10}$] +|SRAI |_rd_=`x0`, and either _rs1_≠``x0`` or _shamt_≠7 |latexmath:[$2^{10}-1$] |SLT |_rd_=`x0` |latexmath:[$2^{10}$] |SLTU |_rd_=`x0` |latexmath:[$2^{10}$] |=== +NOTE: `slli x0, x0, 0x1f` and `srai x0, x0, 7` were previously designated as +custom HINTs, but they have been appropriated for use in semihosting calls, as +described in <<ecall-ebreak>>. +To reflect their usage in practice, the base ISA spec has been changed to +designate them as standard HINTs. diff --git a/src/rv32e.adoc b/src/rv32e.adoc index c30b598..35c996f 100644 --- a/src/rv32e.adoc +++ b/src/rv32e.adoc @@ -22,7 +22,7 @@ RV64I are also compatible with RV32E and RV64E, respectively. RV32E and RV64E reduce the integer register count to 16 general-purpose registers, (`x0-x15`), where `x0` is a dedicated zero register. 
-[TIP] +[NOTE] ==== We have found that in the small RV32I core implementations, the upper 16 registers consume around one quarter of the total area of the core diff --git a/src/rv64.adoc b/src/rv64.adoc index 531158a..35dca3b 100644 --- a/src/rv64.adoc +++ b/src/rv64.adoc @@ -39,7 +39,7 @@ ensure reasonable performance for 32-bit values. ==== Integer Register-Immediate Instructions -include::images/wavedrom/rv64i-base-int.adoc[] +include::images/wavedrom/rv64i-base-int.edn[] [[rv64i-base-int]] //.RV64I register-immediate instructions @@ -50,7 +50,7 @@ immediate to register _rs1_ and produces the proper sign extension of a writes the sign extension of the lower 32 bits of register _rs1_ into register _rd_ (assembler pseudoinstruction SEXT.W). -include::images/wavedrom/rv64i-slli.adoc[] +include::images/wavedrom/rv64i-slli.edn[] [[rv64i-slli]] //.RV64I register-immediate (descr ADDIW) instructions @@ -67,7 +67,7 @@ copied into the vacated upper bits). (((RV64I, SRLIW))) (((RV64I, RV64I-only))) -include::images/wavedrom/rv64i-slliw.adoc[] +include::images/wavedrom/rv64i-slliw.edn[] [[rv64i-slliw]] SLLIW, SRLIW, and SRAIW are RV64I-only instructions that are analogously @@ -82,7 +82,7 @@ were defined to cause illegal-instruction exceptions, whereas now they are marked as reserved. This is a backwards-compatible change. ==== -include::images/wavedrom/rv64_lui-auipc.adoc[] +include::images/wavedrom/rv64-lui-auipc.edn[] [[rv64_lui-auipc]] //.RV64I register-immediate (descr) instructions @@ -108,7 +108,7 @@ with LD, AUIPC with JALR, etc. in RV64I is ==== Integer Register-Register Operations //this diagramdoesn't match the tex specification -include::images/wavedrom/rv64i_int-reg-reg.adoc[] +include::images/wavedrom/rv64i-int-reg-reg.edn[] [[int_reg-reg]] //.RV64I integer register-register instructions @@ -136,7 +136,7 @@ results to 64 bits. The shift amount is given by _rs2[4:0]_. RV64I extends the address space to 64 bits. 
The execution environment will define what portions of the address space are legal to access. -include::images/wavedrom/load_store.adoc[] +include::images/wavedrom/load-store.edn[] [[load_store]] //.Load and store instructions @@ -194,6 +194,10 @@ no standard HINTs will ever be defined in this subspace. (_rs2_=_x4_) NTL.S1 + (_rs2_=_x5_) NTL.ALL +|SLLI |_rd_=`x0`, _rs1_=`x0`, _shamt_=31 |1|Semihosting entry marker + +|SRAI |_rd_=`x0`, _rs1_=`x0`, _shamt_=7 |1|Semihosting exit marker + |SUB |_rd_=_x0_ |latexmath:[$2^{10}$] .16+.^| _Designated for future standard use_ |AND |_rd_=_x0_ |latexmath:[$2^{10}$] @@ -232,11 +236,11 @@ no standard HINTs will ever be defined in this subspace. |SLTIU |_rd_=_x0_ |latexmath:[$2^{17}$] -|SLLI |_rd_=_x0_ |latexmath:[$2^{11}$] +|SLLI |_rd_=`x0`, and either _rs1_≠``x0`` or _shamt_≠31 |latexmath:[$2^{11}-1$] -|SRLI |_rd_=_x0_ |latexmath:[$2^{11}$] +|SRLI |_rd_=`x0` |latexmath:[$2^{11}$] -|SRAI |_rd_=_x0_ |latexmath:[$2^{11}$] +|SRAI |_rd_=`x0`, and either _rs1_≠``x0`` or _shamt_≠7 |latexmath:[$2^{11}-1$] |SLLIW |_rd_=_x0_ |latexmath:[$2^{10}$] @@ -249,3 +253,8 @@ no standard HINTs will ever be defined in this subspace. |SLTU |_rd_=_x0_ |latexmath:[$2^{10}$] |=== +NOTE: `slli x0, x0, 0x1f` and `srai x0, x0, 7` were previously designated as +custom HINTs, but they have been appropriated for use in semihosting calls, as +described in <<ecall-ebreak>>. +To reflect their usage in practice, the base ISA spec has been changed to +designate them as standard HINTs. diff --git a/src/rvwmo.adoc b/src/rvwmo.adoc index d719a4e..70853e3 100644 --- a/src/rvwmo.adoc +++ b/src/rvwmo.adoc @@ -33,9 +33,9 @@ additional explanatory material. ==== This chapter defines the memory model for regular main memory operations. The interaction of the memory model with I/O memory, -instruction fetches, FENCE.I, page table walks, and SFENCE.VMA is not +instruction fetches, FENCE.I, page-table walks, and SFENCE.VMA is not (yet) formalized. 
Some or all of the above may be formalized in a future -revision of this specification. The RV128 base ISA and future ISA +revision of this specification. Future ISA extensions such as the V vector and J JIT extensions will need to be incorporated into a future revision as well. @@ -87,7 +87,7 @@ and a store operation simultaneously. [NOTE] ==== -Instructions in the RV128 base instruction set and in future ISA +Future ISA extensions such as *V* (vector) and *P* (SIMD) may give rise to multiple memory operations. However, the memory model for these extensions has not yet been formalized. @@ -503,25 +503,29 @@ register(s) to destination register(s) as specified |CSRRW‡ |_rs1_, _csr_^*^ | _rd_, _csr_ | |^*^unless _rd_=`x0` -|CSRRS‡ |_rs1_, _csr_ |_rd_ ^*^, _csr_ | |^*^unless _rs1_=`x0` +5+| ‡ carries a dependency from _rs1_ to _csr_ and from _csr_ to _rd_ -|CSRRC‡ |_rs1_, _csr_ |_rd_ ^*^, _csr_ | |^*^unless _rs1_=`x0` +|CSRRS‡ |_rs1_, _csr_ |_rd_, _csr_^*^ | |^*^unless _rs1_=`x0` -5+| ‡ carries a dependency from _rs1_ to _csr_ and from _csr_ to _rd_ +|CSRRC‡ |_rs1_, _csr_ |_rd_, _csr_^*^ | |^*^unless _rs1_=`x0` + +5+| ‡ carries a dependency from _csr_ and _rs1_ to _csr_ and from _csr_ to _rd_ |CSRRWI ‡ |_csr_ ^*^ |_rd_, _csr_ | |^*^unless _rd_=_x0_ +5+| ‡ carries a dependency from _csr_ to _rd_ + |CSRRSI ‡ |_csr_ |_rd_, _csr_^*^ | |^*^unless uimm[4:0]=0 |CSRRCI ‡ |_csr_ |_rd_, _csr_^*^ | |^*^unless uimm[4:0]=0 -5+| ‡ carries a dependency from _csr_ to _rd_ +5+| ‡ carries a dependency from _csr_ to _rd_ and _csr_ |=== .RV64I Base Integer Instruction Set [%autowidth.stretch,float="center",align="center",cols="<,<,<,<,<",options="header"] |=== -| |Source Registers |Destination Registers |Accumulating CSRs| +| |Source Registers |Destination Registers |Accumulating CSRs| |_LWU_ † |_rs1_ ^A^ |_rd_ | | @@ -778,7 +782,7 @@ register(s) to destination register(s) as specified |FCLASS.D |_rs1_ |_rd_ | | -|FCVT.W.D |_rs1_,^*^ |_rd_ |NV, NX |^*^if rm=111 +|FCVT.W.D |_rs1_, frm^*^ 
|_rd_ |NV, NX |^*^if rm=111 |FCVT.WU.D |_rs1_, frm^*^ |_rd_ |NV, NX |^*^if rm=111 @@ -807,4 +811,3 @@ register(s) to destination register(s) as specified |FMV.D.X |_rs1_ |_rd_ | | |=== - diff --git a/src/scalar-crypto.adoc b/src/scalar-crypto.adoc index b3de74a..d3634c4 100644 --- a/src/scalar-crypto.adoc +++ b/src/scalar-crypto.adoc @@ -1,3 +1,4 @@ +[[crypto_scalar_instructions]] == Cryptography Extensions: Scalar & Entropy Source Instructions, Version 1.0.1 === Changelog @@ -6,11 +7,11 @@ |=== | Version | Changes -| `v1.0.1` -| Fix typos to show that +| `v1.0.1` +| Fix typos to show that `c.srli`, `c.srai`, and `c.slli` are Zkt instructions in RV64. -| `v1.0.0` +| `v1.0.0` | Initial Release |=== @@ -33,12 +34,6 @@ This is found in <<crypto_scalar_es>>. It also contains a mechanism allowing core implementers to provide _"Constant Time Execution"_ guarantees in <<crypto_scalar_zkt>>. -A companion document _Volume II: Vector Instructions_, describes -instruction proposals which build on the RISC-V Vector Extension. -The Vector Cryptography extension is currently a work in progress -waiting for the base Vector extension to stabilise. -We expect to pick up this work in earnest in Q4-2021 or Q1-2022. - [[crypto_scalar_audience]] ==== Intended Audience @@ -104,7 +99,7 @@ but they are the ones we considered most while writing it. [[crypto_scalar_sail_specifications]] ==== Sail Specifications -RISC-V maintains a +RISC-V maintains a link:https://github.com/riscv/sail-riscv[formal model] of the ISA specification, implemented in the Sail ISA specification language @@ -124,12 +119,12 @@ calls to supporting functions which are too verbose to include directly in the specification. This supporting code is listed in <<crypto_scalar_appx_sail>>. -The -link:https://github.com/rems-project/sail/blob/sail2/manual.pdf[Sail Manual] +The +link:https://alasdair.github.io/manual.html[Sail Manual] is recommended reading in order to best understand the code snippets. 
Note that this document contains only a subset of the formal model: refer to -the formal model Github +the formal model GitHub link:https://github.com/riscv/sail-riscv[repository] for the complete model. @@ -150,7 +145,7 @@ policies: where recommended (but not required) instruction sequences for performing particular tasks are given as an example, such that both hardware and software implementers can optimise for only a single use-case. - + * The extension will be designed to support _existing_ standardised cryptographic constructs well. It will not try to support proposed standards, or cryptographic @@ -161,7 +156,7 @@ policies: standard extension. It is anticipated that the NIST Lightweight Cryptography contest and the NIST Post-Quantum Cryptography contest may be dealt with this way, depending on timescales. - + * Historically, there has been some discussion cite:[LSYRR:04] on how newly supported operations in general-purpose computing might @@ -169,7 +164,7 @@ policies: The standard will not try to anticipate new useful low-level operations which _may_ be useful as building blocks for future cryptographic constructs. - + * Regarding side-channel countermeasures: Where relevant, proposed instructions must aim to remove the possibility of any timing side-channels. @@ -210,119 +205,30 @@ protocols, while ShangMi ciphers are required for use in China. [[zbkb-sc,Zbkb-sc]] ==== `Zbkb` - Bitmanip instructions for Cryptography -These are a subset of the Bitmanipulation Extension `Zbb` which are -particularly useful for Cryptography. - -NOTE: Some of these instructions are defined in the first Bitmanip -ratification package, and some are not ( -<<insns-pack-sc,pack>>, -<<insns-packh-sc,packh>>, -<<insns-packw-sc,packw>>, -<<insns-brev8,brev8>>, -<<insns-zip-sc,zip>>, -<<insns-unzip-sc,unzip>>). -All of the instructions in <<zbkb-sc>> have their complete specification included -in this document, including those _not_ present in the initial -Bitmanip ratification package. 
-This is to make the present specification complete as a standalone document. -Inevitably there might be small divergences between the Bitmanip and -Scalar Cryptography specification documents as they move at different -paces. -When this happens, assume that the Bitmanip specification has the -most up-to-date version of Bitmanip instructions. -This is an unfortunate but necessary stop-gap while Scalar Cryptography -and Bitmanip are being rapidly iterated on prior to public review. - -[%header,cols="^1,^1,4,8"] -|=== -|RV32 -|RV64 -|Mnemonic -|Instruction - -| ✓ | ✓ | ror | <<insns-ror-sc>> -| ✓ | ✓ | rol | <<insns-rol-sc>> -| ✓ | ✓ | rori | <<insns-rori-sc>> -| | ✓ | rorw | <<insns-rorw-sc>> -| | ✓ | rolw | <<insns-rolw-sc>> -| | ✓ | roriw | <<insns-roriw-sc>> -| ✓ | ✓ | andn | <<insns-andn-sc>> -| ✓ | ✓ | orn | <<insns-orn-sc>> -| ✓ | ✓ | xnor | <<insns-xnor-sc>> -| ✓ | ✓ | pack | <<insns-pack-sc>> -| ✓ | ✓ | packh | <<insns-packh-sc>> -| | ✓ | packw | <<insns-packw-sc>> -| ✓ | ✓ | brev8 | <<insns-brev8>> -| ✓ | ✓ | rev8 | <<insns-rev8-sc>> -| ✓ | | zip | <<insns-zip-sc>> -| ✓ | | unzip | <<insns-unzip-sc>> -|=== +This extension contains bit-manipulation instructions that are particularly +useful for cryptography, most of which are also in the `Zbb` extension. +Please refer to <<b-st-ext.adoc#zbkb>>. [[zbkc-sc,Zbkc-sc]] ==== `Zbkc` - Carry-less multiply instructions Constant time carry-less multiply for Galois/Counter Mode. -These are separated from the <<zbkb-sc>> because they +These are separated from the <<b-st-ext.adoc#zbkb>> because they have a considerable implementation overhead which cannot be amortised across other instructions. -NOTE: These instructions are defined in the first Bitmanip -ratification package for the `Zbc` extension. -All of the instructions in <<zbkc-sc>> have their complete specification included -in this document, including those _not_ present in the initial -Bitmanip ratification package. 
-This is to make the present specification complete as a standalone document. -Inevitably there might be small divergences between the Bitmanip and -Scalar Cryptography specification documents as they move at different -paces. -When this happens, assume that the Bitmanip specification has the -most up-to-date version of Bitmanip instructions. -This is an unfortunate but necessary stop-gap while Scalar Cryptography -and Bitmanip are being rapidly iterated on prior to public review. - -[%header,cols="^1,^1,4,8"] -|=== -|RV32 -|RV64 -|Mnemonic -|Instruction - -| ✓ | ✓ | clmul | <<insns-clmul>> -| ✓ | ✓ | clmulh | <<insns-clmulh-sc>> -|=== +Please refer to <<b-st-ext.adoc#zbkc>>. [[zbkx-sc,Zbkx-sc]] ==== `Zbkx` - Crossbar permutation instructions These instructions are useful for implementing SBoxes in constant time, and potentially with DPA protections. -These are separated from the <<zbkb-sc>> because they +These are separated from the <<b-st-ext.adoc#zbkb>> because they have an implementation overhead which cannot be amortised across other instructions. -NOTE: All of these instructions are missing from the first Bitmanip -ratification package. -Hence, all of the instructions in <<zbkx-sc>> have their complete specification -included in this document. -This is to make the present specification complete as a standalone document. -Inevitably there might be small divergences between the Bitmanip and -Scalar Cryptography specification documents as they move at different -paces. -When this happens, assume that the Bitmanip specification has the -most up-to-date version of Bitmanip instructions. -This is an unfortunate but necessary stop-gap while Scalar Cryptography -and Bitmanip are being rapidly iterated on prior to public review. - -[%header,cols="^1,^1,4,8"] -|=== -|RV32 -|RV64 -|Mnemonic -|Instruction - -| ✓ | ✓ | xperm8 | <<insns-xperm8>> -| ✓ | ✓ | xperm4 | <<insns-xperm4>> -|=== +Please refer to <<b-st-ext.adoc#zbkx>>. 
[[zknd,Zknd]] ==== `Zknd` - NIST Suite: AES Decryption @@ -443,9 +349,9 @@ Instructions for accelerating the SM3 hash function. [[zkr,Zkr]] ==== `Zkr` - Entropy Source Extension -The entropy source extension defines the `seed` CSR at address `0x015`. +The entropy source extension defines the `seed` CSR at address `0x015`. This CSR provides up to 16 physical `entropy` bits that can be used to -seed cryptographic random bit generators. +seed cryptographic random bit generators. See <<crypto_scalar_es>> for the normative specification and access control notes. <<crypto_scalar_appx_es>> contains design rationale and further @@ -544,7 +450,7 @@ Encoding:: ]} .... -Description:: +Description:: This instruction sources a single byte from `rs2` according to `bs`. To this it applies the inverse AES SBox operation, and XOR's the result with `rs1`. @@ -606,7 +512,7 @@ Encoding:: ]} .... -Description:: +Description:: This instruction sources a single byte from `rs2` according to `bs`. To this it applies the inverse AES SBox operation, and a partial inverse MixColumn, before XOR'ing the result with `rs1`. @@ -669,7 +575,7 @@ Encoding:: ]} .... -Description:: +Description:: This instruction sources a single byte from `rs2` according to `bs`. To this it applies the forward AES SBox operation, before XOR'ing the result with `rs1`. @@ -731,7 +637,7 @@ Encoding:: ]} .... -Description:: +Description:: This instruction sources a single byte from `rs2` according to `bs`. To this it applies the forward AES SBox operation, and a partial forward MixColumn, before XOR'ing the result with `rs1`. @@ -794,7 +700,7 @@ Encoding:: ]} .... -Description:: +Description:: Uses the two 64-bit source registers to represent the entire AES state, and produces _half_ of the next round output, applying the Inverse ShiftRows and SubBytes steps. @@ -868,7 +774,7 @@ Encoding:: ]} .... 
-Description:: +Description:: Uses the two 64-bit source registers to represent the entire AES state, and produces _half_ of the next round output, applying the Inverse ShiftRows, SubBytes and MixColumns steps. @@ -943,7 +849,7 @@ Encoding:: ]} .... -Description:: +Description:: Uses the two 64-bit source registers to represent the entire AES state, and produces _half_ of the next round output, applying the ShiftRows and SubBytes steps. @@ -1017,7 +923,7 @@ Encoding:: ]} .... -Description:: +Description:: Uses the two 64-bit source registers to represent the entire AES state, and produces _half_ of the next round output, applying the ShiftRows, SubBytes and MixColumns steps. @@ -1093,7 +999,7 @@ Encoding:: ]} .... -Description:: +Description:: The instruction applies the inverse MixColumns transformation to two columns of the state array, packed into a single 64-bit register. @@ -1159,7 +1065,7 @@ Encoding:: ]} .... -Description:: +Description:: This instruction implements the rotation, SubBytes and Round Constant addition steps of the AES block cipher Key Schedule. This instruction must _always_ be implemented such that its execution @@ -1232,7 +1138,7 @@ Encoding:: ]} .... -Description:: +Description:: This instruction implements the additional XOR'ing of key words as part of the AES block cipher Key Schedule. This instruction must _always_ be implemented such that its execution @@ -1294,7 +1200,7 @@ Encoding:: ]} .... -Description:: +Description:: This instruction performs the bitwise logical AND operation between _rs1_ and the bitwise inversion of _rs2_. Operation:: @@ -1321,14 +1227,14 @@ Included in:: <<< -[#insns-brev8,reftext="Reverse bits in bytes"] +[#insns-brev8-sc,reftext="Reverse bits in bytes"] ==== brev8 Synopsis:: Reverse the bits in each byte of a source register. 
Mnemonic:: -brev8, _rd_, _rs_ +brev8 _rd_, _rs_ Encoding:: [wavedrom, , svg] @@ -1336,29 +1242,21 @@ Encoding:: {reg:[ { bits: 7, name: 0x13, attr: ['OP-IMM'] }, { bits: 5, name: 'rd' }, - { bits: 3, name: 0x65 }, + { bits: 3, name: 0x5 }, { bits: 5, name: 'rs' }, - { bits: 12, name: 0x687 }, + { bits: 12, name: 0x687 } ]} .... -Description:: +Description:: This instruction reverses the order of the bits in every byte of a register. -[NOTE] -==== -This instruction is a specific encoding of a more generic instruction which was originally -proposed as part of the RISC-V Bitmanip extension (grevi). Eventually, the more generic -instruction may be standardised. Until then, only the most common instances of it, such as -this, are being included in specifications. -==== - Operation:: [source,sail] -- result : xlenbits = EXTZ(0b0); foreach (i from 0 to sizeof(xlen) by 8) { -result[i+7..i] = reverse_bits_in_byte(X(rs1)[i+7..i]); + result[i+7..i] = reverse_bits_in_byte(X(rs1)[i+7..i]); }; X(rd) = result; -- @@ -1517,7 +1415,7 @@ Encoding:: ]} .... -Description:: +Description:: This instruction performs the bitwise logical OR operation between _rs1_ and the bitwise inversion of _rs2_. Operation:: @@ -1566,7 +1464,7 @@ Encoding:: ]} .... -Description:: +Description:: The pack instruction packs the XLEN/2-bit lower halves of _rs1_ and _rs2_ into _rd_, with _rs1_ in the lower half and _rs2_ in the upper half. @@ -1614,7 +1512,7 @@ Encoding:: ]} .... -Description:: +Description:: And the packh instruction packs the least-significant bytes of _rs1_ and _rs2_ into the 16 least-significant bits of _rd_, zero extending the rest of _rd_. @@ -1664,7 +1562,7 @@ Encoding:: ]} .... -Description:: +Description:: This instruction packs the low 16 bits of _rs1_ and _rs2_ into the 32 least-significant bits of _rd_, sign extending the 32-bit result to the rest of _rd_. @@ -1725,7 +1623,7 @@ Encoding (RV64):: ]} .... 
-Description:: +Description:: This instruction reverses the order of the bytes in _rs_. Operation:: @@ -1854,7 +1752,7 @@ Encoding:: Description:: This instruction performs a rotate left on the least-significant word of _rs1_ by the amount in least-significant 5 bits of _rs2_. -The resulting word value is sign-extended by copying bit 31 to all of the more-significant bits. +The resulting word value is sign-extended by copying bit 31 to all of the more-significant bits. Operation:: [source,sail] @@ -1972,7 +1870,7 @@ Encoding (RV64):: ]} .... -Description:: +Description:: This instruction performs a rotate right of _rs1_ by the amount in the least-significant log2(XLEN) bits of _shamt_. For RV32, the encodings corresponding to shamt[5]=1 are reserved. @@ -2027,7 +1925,7 @@ Encoding:: ]} .... -Description:: +Description:: This instruction performs a rotate right on the least-significant word of _rs1_ by the amount in the least-significant log2(XLEN) bits of _shamt_. @@ -2083,7 +1981,7 @@ Encoding:: ]} .... -Description:: +Description:: This instruction performs a rotate right on the least-significant word of _rs1_ by the amount in least-significant 5 bits of _rs2_. The resultant word is sign-extended by copying bit 31 to all of the more-significant bits. @@ -2119,7 +2017,7 @@ Included in:: Synopsis:: Implements the Sigma0 transformation function as used in -the SHA2-256 hash function cite:[nist:fips:180:4] (Section 4.1.2). +the SHA2-256 hash function cite:[nist:fips:180:4]. Mnemonic:: sha256sig0 rd, rs1 @@ -2138,13 +2036,13 @@ Encoding:: ]} .... -Description:: +Description:: This instruction is supported for both RV32 and RV64 base architectures. For RV32, the entire `XLEN` source register is operated on. For RV64, the low `32` bits of the source register are operated on, and the result sign extended to `XLEN` bits. 
Though named for SHA2-256, the instruction works for both the -SHA2-224 and SHA2-256 parameterisations as described in +SHA2-224 and SHA2-256 parameterizations as described in cite:[nist:fips:180:4]. This instruction must _always_ be implemented such that its execution latency does not depend on the data being operated on. @@ -2185,7 +2083,7 @@ Included in:: Synopsis:: Implements the Sigma1 transformation function as used in -the SHA2-256 hash function cite:[nist:fips:180:4] (Section 4.1.2). +the SHA2-256 hash function cite:[nist:fips:180:4]. Mnemonic:: sha256sig1 rd, rs1 @@ -2204,13 +2102,13 @@ Encoding:: ]} .... -Description:: +Description:: This instruction is supported for both RV32 and RV64 base architectures. For RV32, the entire `XLEN` source register is operated on. For RV64, the low `32` bits of the source register are operated on, and the result sign extended to `XLEN` bits. Though named for SHA2-256, the instruction works for both the -SHA2-224 and SHA2-256 parameterisations as described in +SHA2-224 and SHA2-256 parameterizations as described in cite:[nist:fips:180:4]. This instruction must _always_ be implemented such that its execution latency does not depend on the data being operated on. @@ -2251,7 +2149,7 @@ Included in:: Synopsis:: Implements the Sum0 transformation function as used in -the SHA2-256 hash function cite:[nist:fips:180:4] (Section 4.1.2). +the SHA2-256 hash function cite:[nist:fips:180:4]. Mnemonic:: sha256sum0 rd, rs1 @@ -2270,13 +2168,13 @@ Encoding:: ]} .... -Description:: +Description:: This instruction is supported for both RV32 and RV64 base architectures. For RV32, the entire `XLEN` source register is operated on. For RV64, the low `32` bits of the source register are operated on, and the result sign extended to `XLEN` bits. Though named for SHA2-256, the instruction works for both the -SHA2-224 and SHA2-256 parameterisations as described in +SHA2-224 and SHA2-256 parameterizations as described in cite:[nist:fips:180:4]. 
This instruction must _always_ be implemented such that its execution latency does not depend on the data being operated on. @@ -2317,7 +2215,7 @@ Included in:: Synopsis:: Implements the Sum1 transformation function as used in -the SHA2-256 hash function cite:[nist:fips:180:4] (Section 4.1.2). +the SHA2-256 hash function cite:[nist:fips:180:4]. Mnemonic:: sha256sum1 rd, rs1 @@ -2336,13 +2234,13 @@ Encoding:: ]} .... -Description:: +Description:: This instruction is supported for both RV32 and RV64 base architectures. For RV32, the entire `XLEN` source register is operated on. For RV64, the low `32` bits of the source register are operated on, and the result sign extended to `XLEN` bits. Though named for SHA2-256, the instruction works for both the -SHA2-224 and SHA2-256 parameterisations as described in +SHA2-224 and SHA2-256 parameterizations as described in cite:[nist:fips:180:4]. This instruction must _always_ be implemented such that its execution latency does not depend on the data being operated on. @@ -2383,7 +2281,7 @@ Included in:: Synopsis:: Implements the _high half_ of the Sigma0 transformation, as -used in the SHA2-512 hash function cite:[nist:fips:180:4] (Section 4.1.3). +used in the SHA2-512 hash function cite:[nist:fips:180:4]. Mnemonic:: sha512sig0h rd, rs1, rs2 @@ -2402,7 +2300,7 @@ Encoding:: ]} .... -Description:: +Description:: This instruction is implemented on RV32 only. Used to compute the Sigma0 transform of the SHA2-512 hash function in conjunction with the <<insns-sha512sig0l,`sha512sig0l`>> instruction. @@ -2411,14 +2309,14 @@ are each represented by two 32-bit registers. This instruction must _always_ be implemented such that its execution latency does not depend on the data being operated on. 
-[TIP] +[NOTE] .Note to software developers ==== The entire Sigma0 transform for SHA2-512 may be computed on RV32 using the following instruction sequence: - sha512sig0l t0, a0, a1 - sha512sig0h t1, a1, a0 + sha512sig0l t0, a0, a1 + sha512sig0h t1, a1, a0 ==== @@ -2457,7 +2355,7 @@ Included in:: Synopsis:: Implements the _low half_ of the Sigma0 transformation, as -used in the SHA2-512 hash function cite:[nist:fips:180:4] (Section 4.1.3). +used in the SHA2-512 hash function cite:[nist:fips:180:4]. Mnemonic:: sha512sig0l rd, rs1, rs2 @@ -2476,7 +2374,7 @@ Encoding:: ]} .... -Description:: +Description:: This instruction is implemented on RV32 only. Used to compute the Sigma0 transform of the SHA2-512 hash function in conjunction with the <<insns-sha512sig0h,`sha512sig0h`>> instruction. @@ -2485,14 +2383,14 @@ are each represented by two 32-bit registers. This instruction must _always_ be implemented such that its execution latency does not depend on the data being operated on. -[TIP] +[NOTE] .Note to software developers ==== The entire Sigma0 transform for SHA2-512 may be computed on RV32 using the following instruction sequence: - sha512sig0l t0, a0, a1 - sha512sig0h t1, a1, a0 + sha512sig0l t0, a0, a1 + sha512sig0h t1, a1, a0 ==== @@ -2531,7 +2429,7 @@ Included in:: Synopsis:: Implements the _high half_ of the Sigma1 transformation, as -used in the SHA2-512 hash function cite:[nist:fips:180:4] (Section 4.1.3). +used in the SHA2-512 hash function cite:[nist:fips:180:4]. Mnemonic:: sha512sig1h rd, rs1, rs2 @@ -2550,7 +2448,7 @@ Encoding:: ]} .... -Description:: +Description:: This instruction is implemented on RV32 only. Used to compute the Sigma1 transform of the SHA2-512 hash function in conjunction with the <<insns-sha512sig1l,`sha512sig1l`>> instruction. @@ -2559,13 +2457,13 @@ are each represented by two 32-bit registers. This instruction must _always_ be implemented such that its execution latency does not depend on the data being operated on. 
-[TIP] +[NOTE] .Note to software developers ==== The entire Sigma1 transform for SHA2-512 may be computed on RV32 using the following instruction sequence: - sha512sig1l t0, a0, a1 + sha512sig1l t0, a0, a1 sha512sig1h t1, a1, a0 ==== @@ -2605,7 +2503,7 @@ Included in:: Synopsis:: Implements the _low half_ of the Sigma1 transformation, as -used in the SHA2-512 hash function cite:[nist:fips:180:4] (Section 4.1.3). +used in the SHA2-512 hash function cite:[nist:fips:180:4]. Mnemonic:: sha512sig1l rd, rs1, rs2 @@ -2624,7 +2522,7 @@ Encoding:: ]} .... -Description:: +Description:: This instruction is implemented on RV32 only. Used to compute the Sigma1 transform of the SHA2-512 hash function in conjunction with the <<insns-sha512sig1h,`sha512sig1h`>> instruction. @@ -2633,13 +2531,13 @@ are each represented by two 32-bit registers. This instruction must _always_ be implemented such that its execution latency does not depend on the data being operated on. -[TIP] +[NOTE] .Note to software developers ==== The entire Sigma1 transform for SHA2-512 may be computed on RV32 using the following instruction sequence: - sha512sig1l t0, a0, a1 + sha512sig1l t0, a0, a1 sha512sig1h t1, a1, a0 ==== @@ -2679,7 +2577,7 @@ Included in:: Synopsis:: Implements the Sum0 transformation, as -used in the SHA2-512 hash function cite:[nist:fips:180:4] (Section 4.1.3). +used in the SHA2-512 hash function cite:[nist:fips:180:4]. Mnemonic:: sha512sum0r rd, rs1, rs2 @@ -2698,7 +2596,7 @@ Encoding:: ]} .... -Description:: +Description:: This instruction is implemented on RV32 only. Used to compute the Sum0 transform of the SHA2-512 hash function. The transform is a 64-bit to 64-bit function, so the input and output @@ -2706,14 +2604,14 @@ is represented by two 32-bit registers. This instruction must _always_ be implemented such that its execution latency does not depend on the data being operated on. 
-[TIP] +[NOTE] .Note to software developers ==== The entire Sum0 transform for SHA2-512 may be computed on RV32 using the following instruction sequence: - sha512sum0r t0, a0, a1 - sha512sum0r t1, a1, a0 + sha512sum0r t0, a0, a1 + sha512sum0r t1, a1, a0 Note the reversed source register ordering. ==== @@ -2753,7 +2651,7 @@ Included in:: Synopsis:: Implements the Sum1 transformation, as -used in the SHA2-512 hash function cite:[nist:fips:180:4] (Section 4.1.3). +used in the SHA2-512 hash function cite:[nist:fips:180:4]. Mnemonic:: sha512sum1r rd, rs1, rs2 @@ -2772,7 +2670,7 @@ Encoding:: ]} .... -Description:: +Description:: This instruction is implemented on RV32 only. Used to compute the Sum1 transform of the SHA2-512 hash function. The transform is a 64-bit to 64-bit function, so the input and output @@ -2780,14 +2678,14 @@ is represented by two 32-bit registers. This instruction must _always_ be implemented such that its execution latency does not depend on the data being operated on. -[TIP] +[NOTE] .Note to software developers ==== The entire Sum1 transform for SHA2-512 may be computed on RV32 using the following instruction sequence: - sha512sum1r t0, a0, a1 - sha512sum1r t1, a1, a0 + sha512sum1r t0, a0, a1 + sha512sum1r t1, a1, a0 Note the reversed source register ordering. ==== @@ -2827,7 +2725,7 @@ Included in:: Synopsis:: Implements the Sigma0 transformation function as used in -the SHA2-512 hash function cite:[nist:fips:180:4] (Section 4.1.3). +the SHA2-512 hash function cite:[nist:fips:180:4]. Mnemonic:: sha512sig0 rd, rs1 @@ -2846,7 +2744,7 @@ Encoding:: ]} .... -Description:: +Description:: This instruction is supported for the RV64 base architecture. It implements the Sigma0 transform of the SHA2-512 hash function. cite:[nist:fips:180:4]. @@ -2887,7 +2785,7 @@ Included in:: Synopsis:: Implements the Sigma1 transformation function as used in -the SHA2-512 hash function cite:[nist:fips:180:4] (Section 4.1.3). 
+the SHA2-512 hash function cite:[nist:fips:180:4]. Mnemonic:: sha512sig1 rd, rs1 @@ -2906,7 +2804,7 @@ Encoding:: ]} .... -Description:: +Description:: This instruction is supported for the RV64 base architecture. It implements the Sigma1 transform of the SHA2-512 hash function. cite:[nist:fips:180:4]. @@ -2947,7 +2845,7 @@ Included in:: Synopsis:: Implements the Sum0 transformation function as used in -the SHA2-512 hash function cite:[nist:fips:180:4] (Section 4.1.3). +the SHA2-512 hash function cite:[nist:fips:180:4]. Mnemonic:: sha512sum0 rd, rs1 @@ -2966,7 +2864,7 @@ Encoding:: ]} .... -Description:: +Description:: This instruction is supported for the RV64 base architecture. It implements the Sum0 transform of the SHA2-512 hash function. cite:[nist:fips:180:4]. @@ -3007,7 +2905,7 @@ Included in:: Synopsis:: Implements the Sum1 transformation function as used in -the SHA2-512 hash function cite:[nist:fips:180:4] (Section 4.1.3). +the SHA2-512 hash function cite:[nist:fips:180:4]. Mnemonic:: sha512sum1 rd, rs1 @@ -3026,7 +2924,7 @@ Encoding:: ]} .... -Description:: +Description:: This instruction is supported for the RV64 base architecture. It implements the Sum1 transform of the SHA2-512 hash function. cite:[nist:fips:180:4]. @@ -3086,7 +2984,7 @@ Encoding:: ]} .... -Description:: +Description:: This instruction is supported for the RV32 and RV64 base architectures. It implements the _P0_ transform of the SM3 hash function cite:[gbt:sm3,iso:sm3]. This instruction must _always_ be implemented such that its execution @@ -3150,7 +3048,7 @@ Encoding:: ]} .... -Description:: +Description:: This instruction is supported for the RV32 and RV64 base architectures. It implements the _P1_ transform of the SM3 hash function cite:[gbt:sm3,iso:sm3]. This instruction must _always_ be implemented such that its execution @@ -3214,7 +3112,7 @@ Encoding:: ]} .... 
-Description:: +Description:: Implements a T-tables in hardware style approach to accelerating the SM4 round function. A byte is extracted from `rs2` based on `bs`, to which the SBox and @@ -3283,7 +3181,7 @@ Encoding:: ]} .... -Description:: +Description:: Implements a T-tables in hardware style approach to accelerating the SM4 Key Schedule. A byte is extracted from `rs2` based on `bs`, to which the SBox and @@ -3331,7 +3229,8 @@ Included in:: ==== unzip Synopsis:: -Implements the inverse of the zip instruction. +Place odd and even bits of the source register into upper and lower halves of +the destination register, respectively. Mnemonic:: unzip _rd_, _rs_ @@ -3345,14 +3244,14 @@ Encoding:: {bits: 5, name: 'rd'}, {bits: 3, name: 0x5}, {bits: 5, name: 'rs1'}, -{bits: 5, name: 0x1f}, +{bits: 5, name: 0xf}, {bits: 7, name: 0x4}, ]} .... -Description:: -This instruction gathers bits from the high and low halves of the source -word into odd/even bit positions in the destination word. +Description:: +This instruction scatters all of the odd and even bits of a source word into +the high and low halves of a destination word. It is the inverse of the <<insns-zip-sc,zip>> instruction. This instruction is available only on RV32. @@ -3410,7 +3309,7 @@ Encoding:: ]} .... -Description:: +Description:: This instruction performs the bit-wise exclusive-NOR operation on _rs1_ and _rs2_. Operation:: @@ -3436,48 +3335,51 @@ Included in:: |=== <<< -[#insns-xperm8,reftext="Crossbar permutation (bytes)"] +[#insns-xperm8-sc,reftext="Crossbar permutation (bytes)"] ==== xperm8 Synopsis:: -Byte-wise lookup of indicies into a vector. +Byte-wise lookup of indices into a vector in registers. Mnemonic:: -xprem8 _rd_, _rs1_, _rs2_ +xperm8 _rd_, _rs1_, _rs2_ Encoding:: [wavedrom, , svg] .... 
{reg:[ - { bits: 2, name: 0x3 }, - { bits: 5, name: 0xC }, - { bits: 5, name: 'rd'}, - { bits: 3, name: 0x4 }, - { bits: 5, name: 'rs1' }, - { bits: 5, name: 'rs2' }, - { bits: 7, name: 0x14 }, +{bits: 2, name: 0x3}, +{bits: 5, name: 0xc}, +{bits: 5, name: 'rd'}, +{bits: 3, name: 0x4}, +{bits: 5, name: 'rs1'}, +{bits: 5, name: 'rs2'}, +{bits: 7, name: 0x14}, ]} .... -Description:: -The xperm8 instruction operates on bytes. The rs1 register contains a vector of XLEN/8 8-bit elements. The -rs2 register contains a vector of XLEN/8 8-bit indexes. The result is each element in rs2 replaced by the -indexed element in rs1, or zero if the index into rs2 is out of bounds. +Description:: +The xperm8 instruction operates on bytes. +The _rs1_ register contains a vector of XLEN/8 8-bit elements. +The _rs2_ register contains a vector of XLEN/8 8-bit indexes. +The result is each element in _rs2_ replaced by the indexed element in _rs1_, +or zero if the index into _rs2_ is out of bounds. Operation:: [source,sail] -- val xperm8_lookup : (bits(8), xlenbits) -> bits(8) function xperm8_lookup (idx, lut) = { -(lut >> (idx @ 0b000))[7..0] + (lut >> (idx @ 0b000))[7..0] } -function clause execute ( XPERM_8 (rs2,rs1,rd)) = { -result : xlenbits = EXTZ(0b0); -foreach(i from 0 to xlen by 8) { -result[i+7..i] = xperm8_lookup(X(rs2)[i+7..i], X(rs1)); -}; -X(rd) = result; -RETIRE_SUCCESS + +function clause execute ( XPERM8 (rs2,rs1,rd)) = { + result : xlenbits = EXTZ(0b0); + foreach(i from 0 to xlen by 8) { + result[i+7..i] = xperm8_lookup(X(rs2)[i+7..i], X(rs1)); + }; + X(rd) = result; + RETIRE_SUCCESS } -- @@ -3488,18 +3390,18 @@ Included in:: |Minimum version |Lifecycle state -|Zbkx (<<#zbkx-sc>>) -|v1.0.0-rc4 +|Zbkx (<<#zbkx>>) +|v1.0 |Ratified |=== <<< -[#insns-xperm4,reftext="Crossbar permutation (nibbles)"] +[#insns-xperm4-sc,reftext="Crossbar permutation (nibbles)"] ==== xperm4 Synopsis:: -Nibble-wise lookup of indicies into a vector. +Nibble-wise lookup of indices into a vector. 
Mnemonic:: xperm4 _rd_, _rs1_, _rs2_ @@ -3508,35 +3410,38 @@ Encoding:: [wavedrom, , svg] .... {reg:[ - { bits: 2, name: 0x3 }, - { bits: 5, name: 0xC }, - { bits: 5, name: 'rd'}, - { bits: 3, name: 0x2 }, - { bits: 5, name: 'rs1' }, - { bits: 5, name: 'rs2' }, - { bits: 7, name: 0x14 }, +{bits: 2, name: 0x3}, +{bits: 5, name: 0xc}, +{bits: 5, name: 'rd'}, +{bits: 3, name: 0x2}, +{bits: 5, name: 'rs1'}, +{bits: 5, name: 'rs2'}, +{bits: 7, name: 0x14}, ]} .... -Description:: -The xperm4 instruction operates on nibbles. The rs1 register contains a vector of XLEN/4 4-bit elements. -The rs2 register contains a vector of XLEN/4 4-bit indexes. The result is each element in rs2 replaced by the -indexed element in rs1, or zero if the index into rs2 is out of bounds. +Description:: +The xperm4 instruction operates on nibbles. +The _rs1_ register contains a vector of XLEN/4 4-bit elements. +The _rs2_ register contains a vector of XLEN/4 4-bit indexes. +The result is each element in _rs2_ replaced by the indexed element in _rs1_, +or zero if the index into _rs2_ is out of bounds. Operation:: [source,sail] -- val xperm4_lookup : (bits(4), xlenbits) -> bits(4) function xperm4_lookup (idx, lut) = { -(lut >> (idx @ 0b00))[3..0] + (lut >> (idx @ 0b00))[3..0] } -function clause execute ( XPERM_4 (rs2,rs1,rd)) = { -result : xlenbits = EXTZ(0b0); -foreach(i from 0 to xlen by 4) { -result[i+3..i] = xperm4_lookup(X(rs2)[i+3..i], X(rs1)); -}; -X(rd) = result; -RETIRE_SUCCESS + +function clause execute ( XPERM4 (rs2,rs1,rd)) = { + result : xlenbits = EXTZ(0b0); + foreach(i from 0 to xlen by 4) { + result[i+3..i] = xperm4_lookup(X(rs2)[i+3..i], X(rs1)); + }; + X(rd) = result; + RETIRE_SUCCESS } -- @@ -3547,8 +3452,8 @@ Included in:: |Minimum version |Lifecycle state -|Zbkx (<<#zbkx-sc>>) -|v1.0.0-rc4 +|Zbkx (<<#zbkx>>) +|v1.0 |Ratified |=== @@ -3558,8 +3463,8 @@ Included in:: ==== zip Synopsis:: -Gather odd and even bits of the source word into upper/lower halves of the -destination. 
+Interleave upper and lower halves of the source register into odd and even +bits of the destination register, respectively. Mnemonic:: zip _rd_, _rs_ @@ -3573,14 +3478,14 @@ Encoding:: {bits: 5, name: 'rd'}, {bits: 3, name: 0x1}, {bits: 5, name: 'rs1'}, -{bits: 5, name: 0x1e}, +{bits: 5, name: 0xf}, {bits: 7, name: 0x4}, ]} .... -Description:: -This instruction scatters all of the odd and even bits of a source word into -the high and low halves of a destination word. +Description:: +This instruction gathers bits from the high and low halves of the source +word into odd/even bit positions in the destination word. It is the inverse of the <<insns-unzip-sc,unzip>> instruction. This instruction is available only on RV32. @@ -3619,7 +3524,7 @@ Included in:: [[crypto_scalar_es]] === Entropy Source -The `seed` CSR provides an interface to a NIST SP 800-90B cite:[TuBaKe:18] +The `seed` CSR provides an interface to a NIST SP 800-90B cite:[TuBaKe:18] or BSI AIS-31 cite:[KiSc11] compliant physical Entropy Source (ES). An entropy source, by itself, is not a cryptographically secure Random @@ -3637,7 +3542,7 @@ detailed suggestions on how the entropy source output can be used. [[crypto_scalar_seed_csr]] ==== The `seed` CSR -`seed` is an unprivileged CSR located at address `0x015`. +`seed` is an unprivileged CSR located at address `0x015`. The 32-bit contents of `seed` are as follows: [%autowidth.stretch,cols="^,^,<",options="header",] @@ -3654,15 +3559,15 @@ The 32-bit contents of `seed` are as follows: |`15: 0` |`entropy` |16 bits of randomness, only when `OPST=ES16`. |======================================================================= -The `seed` CSR must be accessed with a read-write instruction. A read-only -instruction such as `CSRRS/CSRRC` with `rs1=x0` or `CSRRSI/CSRRCI` with -`uimm=0` will raise an illegal instruction exception. 
+Attempts to access the `seed` CSR using a read-only CSR-access instruction +(`CSRRS`/`CSRRC` with _rs1_=`x0` or `CSRRSI`/`CSRRCI` with _uimm_=0) raise an +illegal-instruction exception; any other CSR-access instruction may be used +to access `seed`. The write value (in `rs1` or `uimm`) must be ignored by implementations. The purpose of the write is to signal polling and flushing. -The instruction `csrrw rd, seed, x0` can be used for fetching seed status -and entropy values. It is available on both RV32 and RV64 base architectures -and will zero-extend the 32-bit word to XLEN bits. +Software normally uses the instruction `csrrw rd, seed, x0` to read the `seed` +CSR. Encoding:: [wavedrom, , svg] @@ -3677,7 +3582,7 @@ Encoding:: .... The `seed` CSR is also access controlled by execution mode, and attempted -read or write access will raise an illegal instruction exception outside M mode +read or write access will raise an illegal-instruction exception outside M mode unless access is explicitly granted. See <<crypto_scalar_es_access>> for more details. @@ -3756,18 +3661,18 @@ An implementation of the entropy source should meet at least one of the following requirements sets in order to be considered a secure and safe design: -* <<crypto_scalar_es_req_90b>>: A physical entropy source meeting - NIST SP 800-90B cite:[TuBaKe:18] criteria with evaluated min-entropy - of 192 bits for each 256 output bits (min-entropy rate 0.75). +* <<crypto_scalar_es_req_90b>>: A physical entropy source meeting + NIST SP 800-90B cite:[TuBaKe:18] criteria with evaluated min-entropy + of 192 bits for each 256 output bits (min-entropy rate 0.75). + +* <<crypto_scalar_es_req_ptg2>>: A physical entropy source meeting the + AIS-31 PTG.2 cite:[KiSc11] criteria, implying average Shannon entropy + rate 0.997. The source must also meet the NIST 800-90B + min-entropy rate 192/256 = 0.75. 
-* <<crypto_scalar_es_req_ptg2>>: A physical entropy source meeting the - AIS-31 PTG.2 cite:[KiSc11] criteria, implying average Shannon entropy - rate 0.997. The source must also meet the NIST 800-90B - min-entropy rate 192/256 = 0.75. - -* <<crypto_scalar_es_req_virt>>: A virtual entropy source is a DRBG - seeded from a physical entropy source. It must have at least a - 256-bit (Post-Quantum Category 5) internal security level. +* <<crypto_scalar_es_req_virt>>: A virtual entropy source is a DRBG + seeded from a physical entropy source. It must have at least a + 256-bit (Post-Quantum Category 5) internal security level. All implementations must signal initialization, test mode, and health alarms as required by respective standards. This may require the implementer @@ -3778,17 +3683,17 @@ an example of which is described in <<crypto_scalar_es_getnoise>> [[crypto_scalar_es_req_90b]] ===== NIST SP 800-90B / FIPS 140-3 Requirements -All NIST SP 800-90B cite:[TuBaKe:18] required components and health test -mechanisms must be implemented. +All NIST SP 800-90B cite:[TuBaKe:18] required components and health test +mechanisms must be implemented. The entropy requirement is satisfied if 128 bits of _full entropy_ can be obtained from each 256-bit (16*16 -bit) successful, but possibly non-consecutive `entropy` (ES16) output sequence using a vetted conditioning algorithm such as a cryptographic hash (See Section 3.1.5.1.1, SP 800-90B cite:[TuBaKe:18]). In practice, a min-entropy rate of 0.75 or larger is -required for this. +required for this. -Note that 128 bits of estimated input min-entropy does not yield 128 bits of +Note that 128 bits of estimated input min-entropy does not yield 128 bits of conditioned, full entropy in SP 800-90B/C evaluation. 
Instead, the implication is that every 256-bit sequence should have min-entropy of at least 128+64 = 192 bits, as discussed in SP 800-90C cite:[BaKeRo:21]; @@ -3809,7 +3714,7 @@ at least 512 `entropy` bits to initialize a DRBG that has 256-bit security. [[crypto_scalar_es_req_ptg2]] ===== BSI AIS-31 PTG.2 / Common Criteria Requirements -For alternative Common Criteria certification (or self-certification), +For alternative Common Criteria certification (or self-certification), AIS 31 PTG.2 class cite:[KiSc11] (Sect. 4.3.) required hardware components and mechanisms must be implemented. In addition to AIS-31 PTG.2 randomness requirements (Shannon entropy rate of @@ -3830,7 +3735,7 @@ A virtual source is not a physical entropy source but provides additional protection against covert channels, depletion attacks, and host identification in operating environments that can not be entirely trusted with direct access to a hardware resource. Despite limited trust, -implementors should try to guarantee that even such environments have +implementers should try to guarantee that even such environments have sufficient entropy available for secure cryptographic operations. A virtual source traps access to the `seed` CSR, emulates it, or @@ -3865,17 +3770,17 @@ available to other modes via the `mseccfg.sseed` and `mseccfg.useed` access control bits. `sseed` is bit `9` of and `useed` is bit `8` of the `mseccfg` CSR. Without the corresponding access control bit set to 1, any attempted -access to `seed` from U, S, or HS modes will raise an illegal instruction -exception. +access to `seed` from U, S, or HS modes will raise an illegal-instruction +exception. VS and VU modes are present in systems with Hypervisor (H) extension implemented. If desired, a hypervisor can emulate accesses to the seed CSR from a virtual machine. 
Attempted access to `seed` from virtual modes VS and VU always raises an exception; a read-only instruction causes an -illegal instruction exception, while a read-write instruction (that can -potentially be emulated) causes a virtual instruction exception only if +illegal-instruction exception, while a read-write instruction (that can +potentially be emulated) causes a virtual-instruction exception only if `mseccfg.sseed=1`. Note that `mseccfg.useed` has no effect on the exception -type for either VS or VU modes. +type for either VS or VU modes. .Entropy Source Access Control. @@ -3888,12 +3793,12 @@ type for either VS or VU modes. | `*` | The `seed` CSR is always available in machine mode as normal (with a CSR read-write instruction.) Attempted read without a write raises an -illegal instruction exception regardless of mode and access control bits. +illegal-instruction exception regardless of mode and access control bits. | U | `*` | `0` -| Any `seed` CSR access raises an illegal instruction exception. +| Any `seed` CSR access raises an illegal-instruction exception. | U | `*` @@ -3903,7 +3808,7 @@ illegal instruction exception regardless of mode and access control bits. | S/HS | `0` | `*` -| Any `seed` CSR access raises an illegal instruction exception. +| Any `seed` CSR access raises an illegal-instruction exception. | S/HS @@ -3914,13 +3819,13 @@ illegal instruction exception regardless of mode and access control bits. | VS/VU | `0` | `*` -| Any `seed` CSR access raises an illegal instruction exception. +| Any `seed` CSR access raises an illegal-instruction exception. | VS/VU | `1` | `*` -| A read-write `seed` access raises a virtual instruction exception, -while other access conditions raise an illegal instruction exception. +| A read-write `seed` access raises a virtual-instruction exception, +while other access conditions raise an illegal-instruction exception. 
|======================================================================= @@ -3929,10 +3834,10 @@ Systems should implement carefully considered access control policies from lower privilege modes to physical entropy sources. The system can trap attempted access to `seed` and feed a less privileged client _virtual entropy source_ data (<<crypto_scalar_es_req_virt>>) instead of -invoking an SP 800-90B (<<crypto_scalar_es_req_90b>>) or PTG.2 +invoking an SP 800-90B (<<crypto_scalar_es_req_90b>>) or PTG.2 (<<crypto_scalar_es_req_ptg2>>) _physical entropy source_. Emulated `seed` data generation is made with an appropriately seeded, secure software DRBG. -See <<crypto_scalar_appx_es_access>> for security considerations related +See <<crypto_scalar_appx_es_access>> for security considerations related to direct access to entropy sources. Implementations may implement `mseccfg` such that `[s,u]seed` is a read-only @@ -3995,10 +3900,10 @@ The Zkt extension explicitly states many of the common latency assumptions made by cryptography developers. Vendors do not have to implement all of the list's instructions to be Zkt -compliant; however, if they claim to have Zkt and implement any of the listed instructions, it must have data-independent latency. +compliant; however, if they claim to have Zkt and implement any of the listed instructions, it must have data-independent latency. -For example, many simple RV32I and RV64I cores (without Multiply, Compressed, -Bitmanip, or Cryptographic extensions) are technically compliant with Zkt. +For example, many simple RV32I and RV64I cores (without Multiply, Compressed, +Bitmanip, or Cryptographic extensions) are technically compliant with Zkt. A constant-time AES can be implemented on them using "bit-slice" techniques, but it will be excruciatingly slow when compared to implementation with AES instructions. 
There are no guarantees that even a bit-sliced cipher @@ -4059,14 +3964,14 @@ modifying any other variable/register turns that into a secret too. If a secret ends up in address calculation affecting a load or store, that is a violation. If a secret affects a branch's condition, that is also a violation. A secret variable location or register becomes a non-secret via -specific zeroization/sanitisation or by being declared ciphertext +specific zeroization/sanitisation or by being declared ciphertext (or otherwise no-longer-secret information). In essence, secrets can only "touch" instructions on the Zkt list while they are secrets. ==== Specific Instruction Rationale -* HINT instruction forms (typically encodings with `rd=x0`) are excluded from -the data-independent time requirement. +* HINT instruction forms (typically encodings with _rd_=`x0`) are excluded from +the data-independent time requirement. * Floating point (F, D, Q, L extensions) are currently excluded from the constant-time requirement as they have very few applications in standardised cryptography. We may consider adding floating point add, sub, multiply as a @@ -4111,7 +4016,7 @@ Rather, every one of these instructions that the core does implement must adhere to the requirements of `Zkt`. ==== -===== RVI (Base Instruction Set) +===== RVI (Base Instruction Set) Only basic arithmetic and `slt*` (for carry computations) are included. The data-independent timing requirement does not apply to HINT instruction @@ -4156,7 +4061,7 @@ encoding forms of these instructions. | | ✓ | sraw _rd_, _rs1_, _rs2_ | <<insns-sraw>> |=== -===== RVM (Multiply) +===== RVM (Multiply) Multiplication is included; division and remaindering excluded. @@ -4174,7 +4079,7 @@ Multiplication is included; division and remaindering excluded. | | ✓ | mulw _rd_, _rs1_, _rs2_ | <<insns-mulw>> |=== -===== RVC (Compressed) +===== RVC (Compressed) Same criteria as in RVI. Organised by quadrants. @@ -4203,7 +4108,7 @@ Same criteria as in RVI. 
Organised by quadrants. | ✓ | ✓ | c.add | <<insns-c_add>> |=== -===== RVK (Scalar Cryptography) +===== RVK (Scalar Cryptography) All K-specific instructions are included. Additionally, `seed` CSR latency should be independent of `ES16` state output @@ -4249,7 +4154,7 @@ See <<crypto_scalar_appx_es_access>>. |=== -===== RVB (Bitmanip) +===== RVB (Bitmanip) The <<zbkb-sc>>, <<zbkc-sc>> and <<zbkx-sc>> extensions are included in their entirety. @@ -4269,8 +4174,8 @@ specific instances of `grevi`, `shfli` and `unshfli` respectively. | ✓ | ✓ | clmul | <<insns-clmul-sc>> | ✓ | ✓ | clmulh | <<insns-clmulh-sc>> -| ✓ | ✓ | xperm4 | <<insns-xperm4>> -| ✓ | ✓ | xperm8 | <<insns-xperm8>> +| ✓ | ✓ | xperm4 | <<insns-xperm4-sc>> +| ✓ | ✓ | xperm8 | <<insns-xperm8-sc>> | ✓ | ✓ | ror | <<insns-ror-sc>> | ✓ | ✓ | rol | <<insns-rol-sc>> | ✓ | ✓ | rori | <<insns-rori-sc>> @@ -4283,7 +4188,7 @@ specific instances of `grevi`, `shfli` and `unshfli` respectively. | ✓ | ✓ | pack | <<insns-pack-sc>> | ✓ | ✓ | packh | <<insns-packh-sc>> | | ✓ | packw | <<insns-packw-sc>> -| ✓ | ✓ | brev8 | <<insns-brev8>> +| ✓ | ✓ | brev8 | <<insns-brev8-sc>> | ✓ | ✓ | rev8 | <<insns-rev8-sc>> | ✓ | | zip | <<insns-zip-sc>> | ✓ | | unzip | <<insns-unzip-sc>> @@ -4332,7 +4237,7 @@ contributed to the RISC-V cryptography extension. Many of the primitive operations used in symmetric key cryptography and cryptographic hash functions are well supported by the -RISC-V Bitmanip cite:[riscv:bitmanip:repo] extensions. +RISC-V Bitmanip extensions (see <<bits>>). NOTE: This section repeats much of the information in <<zbkb-sc>>, @@ -4362,8 +4267,7 @@ RV32, RV64: RV64 only: rori rd, rs1, imm roriw rd, rs1, imm ---- -See cite:[riscv:bitmanip:draft] (Section 3.1.1) for details of -these instructions. +See <<zbkb>> for details of these instructions. .Notes to software developers [NOTE,caption="SH"] @@ -4381,24 +4285,12 @@ class of block ciphers and stream ciphers. 
===== Bit & Byte Permutations ---- -RV32: - brev8 rd, rs1 // grevi rd, rs1, 7 - Reverse bits in bytes - rev8 rd, rs1 // grevi rd, rs1, 24 - Reverse bytes in 32-bit word - -RV64: - brev8 rd, rs1 // grevi rd, rs1, 7 - Reverse bits in bytes - rev8 rd, rs1 // grevi rd, rs1, 56 - Reverse bytes in 64-bit word +RV32, RV64: + brev8 rd, rs1 + rev8 rd, rs1 ---- -The scalar cryptography extension provides the following instructions for -manipulating the bit and byte endianness of data. -They are all parameterisations of the Generalised Reverse with Immediate -(`grevi` instruction. -The scalar cryptography extension requires _only_ the above instances -of `grevi` be implemented, which can be invoked via their pseudoinstructions. - -The full specification of the `grevi` instruction is available in -cite:[riscv:bitmanip:draft] (Section 2.2.2). +See <<zbkb>> for details of these instructions. .Notes to software developers [NOTE,caption="SH"] @@ -4411,19 +4303,11 @@ of Galois/Counter Mode (GCM) cite:[nist:gcm]. ---- RV32: - zip rd, rs1 // shfli rd, rs1, 15 - Bit interleave - unzip rd, rs1 // unshfli rd, rs1, 15 - Bit de-interleave + zip rd, rs1 + unzip rd, rs1 ---- -The `zip` and `unzip` pseudoinstructions are specific instances of -the more general `shfli` and `unshfli` instructions. -The scalar cryptography extension requires _only_ the above instances -of `[un]shfli` be implemented, which can be invoked via their -pseudoinstructions. -Only RV32 implementations require these instructions. - -The full specification of the `shfli` instruction is available in -cite:[riscv:bitmanip:draft] (Section 2.2.3). +See <<zbkb>> for details of these instructions. .Notes to software developers [NOTE,caption="SH"] @@ -4444,8 +4328,7 @@ RV32, RV64: clmulh rd, rs1, rs2 ---- -See cite:[riscv:bitmanip:draft] (Section 2.6) for details of -this instruction. +See <<zbkc>> for details of these instructions. 
See <<crypto_scalar_zkt>> for additional implementation requirements for these instructions, related to data independent execution latency. @@ -4469,8 +4352,7 @@ RV32, RV64: xnor rd, rs1, rs2 ---- -See cite:[riscv:bitmanip:draft] (Section 2.1.3) for details of -these instructions. +See <<zbkb>> for details of these instructions. These instructions are useful inside hash functions, block ciphers and for implementing software based side-channel countermeasures like masking. The `andn` instruction is also useful for constant time word-select @@ -4488,13 +4370,12 @@ Software based power/EM side-channel countermeasures based on masking. ===== Packing ---- -RV32, RV64: RV64: +RV32, RV64: RV64: pack rd, rs1, rs2 packw rd, rs1, rs2 packh rd, rs1, rs2 ---- -See cite:[riscv:bitmanip:draft] (Section 2.1.4) for details of -these instructions. +See <<zbkb>> for details of these instructions. .Notes to software developers [NOTE,caption="SH"] @@ -4517,8 +4398,7 @@ RV32, RV64: xperm8 rd, rs1, rs2 ---- -See cite:[riscv:bitmanip:draft] (Section 2.2.4) for a complete -description of this instruction. +See <<zbkx>> for a complete description of these instructions. The `xperm4` instruction operates on nibbles. `GPR[rs1]` contains a vector of `XLEN/4` 4-bit elements. @@ -4550,7 +4430,7 @@ Skinny, MANTIS cite:[block:skinny], Midori cite:[block:midori]. National ciphers using 8-bit SBoxes include: -Camellia cite:[block:camellia] (Japan), +Camellia cite:[block:camellia] (Japan), Aria cite:[block:aria] (Korea), AES cite:[nist:fips:197] (USA, Belgium), SM4 cite:[gbt:sm4] (China) @@ -4572,7 +4452,7 @@ self-certification, and implementation aspects of entropy sources. Hence we also discuss non-ISA system features that may be needed for cryptographic standards compliance and security testing. -==== Checklists for Design and Self-Certification +==== Checklists for Design and Self-Certification The security of cryptographic systems is based on secret bits and keys. 
These bits need to be random and originate from cryptographically secure @@ -4584,47 +4464,47 @@ designs, RISC-V expects that they behave in a compatible manner and do not create unnecessary security risks to users. Self-evaluation and testing following appropriate security standards is usually needed to achieve this. -* *ISA Architectural Tests.* Verify, to the extent possible, that RISC-V ISA - requirements in this specification are correctly implemented. This includes - the state transitions (<<crypto_scalar_es>> and - <<crypto_scalar_es_getnoise>>), access control - (<<crypto_scalar_es_access>>), and that `seed` ES16 `entropy` words - can only be read destructively. - The scope of RISC-V ISA architectural tests are those behaviors that - are independent of the physical entropy source details. A smoke test ES - module may be helpful in design phase. -* *Technical justification for entropy.* This may take the form of a - stochastic model or a heuristic argument that explains why the noise - source output is from a random, rather than pseudorandom (deterministic) - process, and is not easily predictable or externally observable. - A complete physical model is not necessary; research literature can be - cited. For example, one can show that a good ring oscillator noise derives - an amount of physical entropy from local, spontaneously occurring - Johnson-Nyquist thermal noise cite:[Sa21], and is therefore not merely - "random-looking". -* *Entropy Source Design Review.* An entropy source is more than a noise - source, and must have features such as health tests - (<<crypto_scalar_es_security_controls>>), - a conditioner (<<crypto_scalar_appx_es_intro-cond>>), and a security - boundary with clearly defined interfaces. One may tabulate the SHALL - statements of SP 800-90B cite:[TuBaKe:18], FIPS 140-3 Implementation - Guidance cite:[NICC21], AIS-31 cite:[KiSc11] or other standards being - used. 
Official and non-official checklist tables are available: - https://github.com/usnistgov/90B-Shall-Statements -* *Experimental Tests.* The raw noise source is subjected to entropy - estimation as defined in NIST 800-90B, Section 3 cite:[TuBaKe:18]. +* *ISA Architectural Tests.* Verify, to the extent possible, that RISC-V ISA + requirements in this specification are correctly implemented. This includes + the state transitions (<<crypto_scalar_es>> and + <<crypto_scalar_es_getnoise>>), access control + (<<crypto_scalar_es_access>>), and that `seed` ES16 `entropy` words + can only be read destructively. + The scope of RISC-V ISA architectural tests are those behaviors that + are independent of the physical entropy source details. A smoke test ES + module may be helpful in design phase. +* *Technical justification for entropy.* This may take the form of a + stochastic model or a heuristic argument that explains why the noise + source output is from a random, rather than pseudorandom (deterministic) + process, and is not easily predictable or externally observable. + A complete physical model is not necessary; research literature can be + cited. For example, one can show that a good ring oscillator noise derives + an amount of physical entropy from local, spontaneously occurring + Johnson-Nyquist thermal noise cite:[Sa21], and is therefore not merely + "random-looking". +* *Entropy Source Design Review.* An entropy source is more than a noise + source, and must have features such as health tests + (<<crypto_scalar_es_security_controls>>), + a conditioner (<<crypto_scalar_appx_es_intro-cond>>), and a security + boundary with clearly defined interfaces. One may tabulate the SHALL + statements of SP 800-90B cite:[TuBaKe:18], FIPS 140-3 Implementation + Guidance cite:[NICC21], AIS-31 cite:[KiSc11] or other standards being + used. 
Official and non-official checklist tables are available: + https://github.com/usnistgov/90B-Shall-Statements +* *Experimental Tests.* The raw noise source is subjected to entropy + estimation as defined in NIST 800-90B, Section 3 cite:[TuBaKe:18]. The interface described in <<crypto_scalar_es_getnoise>> can used be to record datasets for this purpose. One also needs to show experimentally that the conditioner and health test components work appropriately to meet the ES16 output entropy requirements of <<crypto_scalar_es_req>>. - For SP 800-90B, NIST has made a min-entropy estimation - package freely available: - https://github.com/usnistgov/SP800-90B_EntropyAssessment -* **Resilience.** Above physical engineering steps should consider the - operational environment of the device, which may be unexpected or - hostile (actively attempting to exploit vulnerabilities in the design). - -See <<crypto_scalar_appx_es_implementation>> for a discussion of various + For SP 800-90B, NIST has made a min-entropy estimation + package freely available: + https://github.com/usnistgov/SP800-90B_EntropyAssessment +* **Resilience.** Above physical engineering steps should consider the + operational environment of the device, which may be unexpected or + hostile (actively attempting to exploit vulnerabilities in the design). + +See <<crypto_scalar_appx_es_implementation>> for a discussion of various implementation options. NOTE: It is one of the goals of the RISC-V Entropy Source specification @@ -4633,7 +4513,7 @@ from a third party and integrated with a RISC-V processor design. Compared to older (FIPS 140-2) RNG and DRBG modules, an entropy source module may have a relatively small area (just a few thousand NAND2 gate equivalent). CMVP is introducing an "Entropy Source Validation Scope" which potentially -allows 90B validations to be re-used for different (FIPS 140-3) modules. +allows 90B validations to be reused for different (FIPS 140-3) modules. 
==== Standards and Terminology @@ -4664,18 +4544,18 @@ entropy source. ===== Entropy Source (ES) Entropy sources are built by sampling and processing data from a noise -source (<<crypto_scalar_appx_es_noise_sources>>). +source (<<crypto_scalar_appx_es_noise_sources>>). We will only consider physical sources of true randomness in this work. Since these are directly based on natural phenomena and are subject to -environmental conditions (which may be adversarial), they require features -that monitor the "health" and quality of those sources. +environmental conditions (which may be adversarial), they require features +that monitor the "health" and quality of those sources. The requirements for physical entropy sources are specified in NIST SP 800-90B cite:[TuBaKe:18] (<<crypto_scalar_es_req_90b>>) for U.S. Federal FIPS 140-3 cite:[NI19] evaluations and in BSI AIS-31 cite:[KiSc01,KiSc11] (<<crypto_scalar_es_req_ptg2>>) for high-security Common Criteria evaluations. -There is some divergence in the types of health tests and entropy metrics +There is some divergence in the types of health tests and entropy metrics mandated in these standards, and RISC-V enables support for both alternatives. [[crypto_scalar_appx_es_intro-cond]] @@ -4731,7 +4611,9 @@ refreshed (reseeded) for forward and backward security. ==== Specific Rationale and Considerations -===== (<<crypto_scalar_seed_csr>>) The `seed` CSR +===== The `seed` CSR + +See <<crypto_scalar_seed_csr>>. The interface was designed to be simple so that a vendor- and device-independent driver component (e.g., in Linux kernel, @@ -4751,12 +4633,12 @@ a write operation on this particular CSR. A blocking instruction may have been easier to use, but most users should be querying a (D)RBG instead of an entropy source. Without a polling-style mechanism, the entropy source could hang for -thousands of cycles under some circumstances. A `wfi` ot `pause` +thousands of cycles under some circumstances. 
A `wfi` or `pause` mechanism (at least potentially) allows energy-saving sleep on MCUs and context switching on higher-end CPUs. The reason for the particular `OPST = seed[31:0]` two-bit mechanism is to -provide redundancy. The "fault" bit combinations `11` (`DEAD`) and `00` +provide redundancy. The "fault" bit combinations `11` (`DEAD`) and `00` (`BIST`) are more likely for electrical reasons if feature discovery fails and the entropy source is actually not available. @@ -4767,13 +4649,15 @@ conditioning discussed in <<crypto_scalar_appx_es_crypto-cond>>), and the desire to have all of the bits "in the same place" on both RV32 and RV64 architectures for programming convenience. -===== (<<crypto_scalar_es_req_90b>>) NIST SP 800-90B +===== NIST SP 800-90B + +See <<crypto_scalar_es_req_90b>>. SP 800-90C cite:[BaKeRo:21] states that each conditioned block of n bits is required to have n+64 bits of input entropy to attain full entropy. Hence NIST SP 800-90B cite:[TuBaKe:18] min-entropy assessment must guarantee at least 128 + 64 = 192 bits input entropy per 256-bit block -( cite:[BaKeRo:21], Sections 4.1. and 4.3.2 ). +(cite:[BaKeRo:21], Sections 4.1. and 4.3.2). Only then a hashing of 16 * 16 = 256 bits from the entropy source will produce the desired 128 bits of full entropy. This follows from the specific requirements, threat model, and distinguishability proof @@ -4798,7 +4682,9 @@ Section 4.4 of cite:[TuBaKe:18]: the repetition count test and adaptive proportion test, or show that the same flaws will be detected by vendor-defined tests. -===== (<<crypto_scalar_es_req_ptg2>>) BSI AIS-31 +===== BSI AIS-31 + +See <<crypto_scalar_es_req_ptg2>>. PTG.2 is one of the security and functionality classes defined in BSI AIS 20/31 cite:[KiSc11]. The PTG.2 source requirements work as a @@ -4831,7 +4717,9 @@ PTG.2 modules built and certified to the AIS-31 standard can also meet the "full entropy" condition after 2:1 cryptographic conditioning, but not necessarily so. 
The technical validation process is somewhat different. -===== (<<crypto_scalar_es_req_virt>>) Virtual Sources +===== Virtual Sources + +<<crypto_scalar_es_req_virt>>. All sources that are not direct physical sources (meeting the SP 800-90B or the AIS-31 PTG.2 requirements) need to meet the security requirements @@ -4847,7 +4735,9 @@ standards and applications. The 256-bit requirement maps to in Suite B and the newer U.S. Government CNSA Suite cite:[NS15]. [[crypto_scalar_appx_es_access]] -===== (<<crypto_scalar_es_access>>) Security Considerations for Direct Hardware Access +===== Security Considerations for Direct Hardware Access + +<<crypto_scalar_es_access>>. The ISA implementation and system design must try to ensure that the hardware-software interface minimizes avenues for adversarial @@ -4855,7 +4745,7 @@ information flow even if not explicitly forbidden in the specification. For security, virtualization requires both conditioning and DRBG processing of physical entropy output. It is recommended if a single physical entropy -source is shared between multiple different virtual machnies or if the +source is shared between multiple different virtual machines or if the guest OS is untrusted. A virtual entropy source is significantly more resistant to depletion attacks and also lessens the risk from covert channels. @@ -5019,7 +4909,7 @@ To guarantee that no sensitive data is read twice and that different callers don’t get correlated output, it is required that hardware implements _wipe-on-read_ on the randomness pathway during each read (successful poll). For the same reasons, only complete and fully -processed random words shall be made available via `entropy` (ES16 status +processed random words shall be made available via `entropy` (ES16 status of `seed`). This also applies to the raw noise source. The raw source interface has @@ -5031,7 +4921,7 @@ operational at the same time. 
The noise source state shall be protected from adversarial knowledge or influence to the greatest extent possible. The methods used for this shall be documented, including a description of the -(conceptual) security boundarys role in protecting the noise source +(conceptual) security boundary's role in protecting the noise source from adversarial observation or influence. An entropy source is a singular resource, subject to depletion @@ -5049,7 +4939,7 @@ additional suggestions so that portable, vendor-independent middleware and kernel components can be created. The actual hardware implementation and certification are left to vendors and circuit designers; the discussion in this Section is purely informational. - + When considering implementation options and trade-offs, one must look at the entire information flow. @@ -5346,10 +5236,10 @@ information-theoretic assumptions only. Compliance testing, characterization, and configuration of entropy sources require access to raw, unconditioned noise samples. This conceptual test -interface is named GetNoise in Section 2.3.2 of NIST SP 800-90B +interface is named GetNoise in Section 2.3.2 of NIST SP 800-90B cite:[TuBaKe:18]. -Since this type of interface is both necessary for security testing +Since this type of interface is both necessary for security testing and also constitutes a potential backdoor to the cryptographic key generation process, we define a safety behavior that compliant implementations can have for temporarily disabling the entropy source `seed` CSR interface during @@ -5357,10 +5247,10 @@ test. In order for shared RISC-V self-certification scripts (and drivers) to accommodate the test interface in a secure fashion, we suggest that it is -implemented as a custom, M-mode only CSR, denoted here as `mnoise`. +implemented as a custom, M-mode only CSR, denoted here as `mnoise`. This non-normative interface is not intended to be used as a source of -randomness or for other production use. 
+randomness or for other production use. We define the semantics for single bit for this interface, `mnoise[31]`, which is named `NOISE_TEST`, which will affect the behavior of `seed` if implemented. @@ -5397,13 +5287,13 @@ security implications. ==== [[crypto_scalar_appx_materials]] -=== Supplementary Materials +=== Supplementary Materials While this document contains the specifications for the RISC-V cryptography extensions, numerous supplementary materials and example codes have also been developed. All of the materials related to the RISC-V Cryptography -extension live in a Github Repository, located at +extension live in a GitHub Repository, located at https://github.com/riscv/riscv-crypto * `doc/` @@ -5428,12 +5318,12 @@ https://github.com/riscv/riscv-crypto This section contains the supporting Sail code referenced by the instruction descriptions throughout the specification. The -link:https://github.com/rems-project/sail/blob/sail2/manual.pdf[Sail Manual] +link:https://alasdair.github.io/manual.html[Sail Manual] is recommended reading in order to best understand the supporting code. 
[source,sail] ---- -/* Auxiliary function for performing GF multiplicaiton */ +/* Auxiliary function for performing GF multiplication */ val xt2 : bits(8) -> bits(8) function xt2(x) = { (x << 1) ^ (if bit_to_bool(x[7]) then 0x1b else 0x00) @@ -5451,13 +5341,13 @@ function gfmul( x, y) = { (if bit_to_bool(y[3]) then xt2(xt2(xt2(x))) else 0x00) } -/* 8-bit to 32-bit partial AES Mix Colum - forwards */ +/* 8-bit to 32-bit partial AES Mix Column - forwards */ val aes_mixcolumn_byte_fwd : bits(8) -> bits(32) function aes_mixcolumn_byte_fwd(so) = { gfmul(so, 0x3) @ so @ so @ gfmul(so, 0x2) } -/* 8-bit to 32-bit partial AES Mix Colum - inverse*/ +/* 8-bit to 32-bit partial AES Mix Column - inverse*/ val aes_mixcolumn_byte_inv : bits(8) -> bits(32) function aes_mixcolumn_byte_inv(so) = { gfmul(so, 0xb) @ gfmul(so, 0xd) @ gfmul(so, 0x9) @ gfmul(so, 0xe) diff --git a/src/smcdeleg.adoc b/src/smcdeleg.adoc index fd0be2a..e9db74d 100644 --- a/src/smcdeleg.adoc +++ b/src/smcdeleg.adoc @@ -1,5 +1,5 @@ [[smcdeleg]] -== "Smcdeleg" Counter Delegation Extension, Version 1.0 +== "Smcdeleg/Ssccfg" Counter Delegation Extensions, Version 1.0 In modern “Rich OS” environments, hardware performance monitoring resources are managed by the kernel, kernel driver, and/or hypervisor. @@ -14,10 +14,10 @@ counter(s) . Context switch, between processes, threads, containers, or virtual machines -This extension provides a means for M-mode to allow writing select +These extensions provide a means for M-mode to allow writing select counters and event selectors from S/HS-mode. The purpose is to avert transitions to and from M-mode that add latency to these performance -critical supervisor/hypervisor code sections. This extension also +critical supervisor/hypervisor code sections. These extensions also defines one new CSR, scountinhibit. For a Machine-level environment, extension *Smcdeleg* (‘Sm’ for @@ -27,13 +27,14 @@ modifications for a hart, over all privilege levels. 
For a Supervisor-level environment, extension *Ssccfg* (‘Ss’ for Privileged architecture and Supervisor-level extension, ‘ccfg’ for Counter Configuration) provides access to delegated counters, and to new -supervisor-level state. +supervisor-level state. For a RISC-V hardware platform, Smcdeleg and +Ssccfg must always be implemented in tandem. === Counter Delegation The `mcounteren` register allows M-mode to provide the next-lower privilege mode with read access to select counters. When the Smcdeleg/Ssccfg -extension is enabled (`menvcfg`.CDE=1), it further allows M-mode to delegate select +extensions are enabled (`menvcfg`.CDE=1), it further allows M-mode to delegate select counters to S-mode. The `siselect` (and `vsiselect`) index range 0x40-0x5F is reserved for @@ -49,7 +50,7 @@ the table below. |=== |*`siselect` value* |*`sireg*` |*`sireg4`* |*`sireg2`* |*`sireg5`* |0x40 |`cycle`^1^ |`cycleh`^1^ |`cyclecfg`^14^ |`cyclecfgh`^14^ -|0x41 4+^|_See below_ +|0x41 4+^|_See below_ |0x42 |`instret`^1^ |`instreth`^1^ |`instretcfg`^14^ |`instretcfgh`^14^ |0x43 |`hpmcounter3`^2^ |`hpmcounter3h`^2^ |`hpmevent3`^2^ |`hpmevent3h`^23^ |… |… |… |… |… @@ -59,14 +60,15 @@ the table below. ^1^ Depends on Zicntr support + ^2^ Depends on Zihpm support + ^3^ Depends on Sscofpmf support + -^4^ Depends on Smcntrpmf support +^4^ Depends on Smcntrpmf support -[NOTE] -==== -`__hpmevent__i` _represents a subset of the state accessed by the_ `__mhpmevent__i` _register. Likewise, `cyclecfg` and `instretcfg` represent a subset of the state accessed by the `mcyclecfg` and `minstretcfg` registers, respectively. See below for subset details._ -==== +`hpmevent__i__` may represent a subset of the state accessed by the `mhpmevent__i__` register. Specifically, if Sscofpmf is implemented, event selector bit +62 (MINH) is read-only 0 when accessed through `sireg*`. 
-If extension Smstateen is implemented, refer to extension Smcsrind/Sscsrind (<<indirect-csr>>) for how setting bit 60 of CSR +Likewise, `cyclecfg` and `instretcfg` may represent a subset of the state accessed by the `mcyclecfg` and `minstretcfg` registers, respectively. If +Smcntrpmf is implemented, counter configuration register bit 62 (MINH) is read-only 0 when accessed through `sireg*`. + +If extension Smstateen is implemented, refer to extensions Smcsrind/Sscsrind (<<indirect-csr>>) for how setting bit 60 of CSR `mstateen0` to zero prevents access to registers `siselect`, `sireg*`, `vsiselect`, and `vsireg*` from privileged modes less privileged than M-mode, and likewise how setting bit 60 of `hstateen0` to zero prevents @@ -77,7 +79,7 @@ The remaining rules of this section apply only when access to a CSR is not blocked by `mstateen0`[60] = 0 or `hstateen0`[60] = 0. While the privilege mode is M or S and `siselect` holds a value in the -range 0x40-0x5F, illegal instruction exceptions are raised for the +range 0x40-0x5F, illegal-instruction exceptions are raised for the following cases: * attempts to access any `sireg*` when `menvcfg`.CDE = 0; @@ -95,25 +97,18 @@ For each `siselect` and `sireg*` combination defined in <<indirect-hpm-state-map further indicates the extensions upon which the underlying counter state depends. If any extension upon which the underlying state depends is not implemented, an attempt from M or S mode to access the given state -through `sireg*` raises an illegal instruction exception. +through `sireg*` raises an illegal-instruction exception. If the hypervisor (H) extension is also implemented, then as specified -by extension Smcsrind/Sscsrind, a virtual instruction exception is +by extensions Smcsrind/Sscsrind, a virtual-instruction exception is raised for attempts from VS-mode or VU-mode to directly access `vsiselect` or `vsireg*`, or attempts from VU-mode to access `siselect` or `sireg*`. 
Furthermore, while `vsiselect` holds a value in the range 0x40-0x5F: * An attempt to access any `vsireg*` from M or S mode raises an illegal instruction exception. -* An attempt from VS-mode to access any `sireg*` (really `vsireg*`) raises an illegal instruction exception if `menvcfg`.CDE = 0, or a virtual +* An attempt from VS-mode to access any `sireg*` (really `vsireg*`) raises an illegal-instruction exception if `menvcfg`.CDE = 0, or a virtual instruction exception if `menvcfg`.CDE = 1. -If Sscofpmf is implemented, `sireg2` and `sireg5` provide access only to a -subset of the event selector registers. Specifically, event selector bit -62 (MINH) is read-only 0 when accessed through `sireg*`. Similarly, if -Smcntrpmf is implemented, `sireg2` and `sireg5` provide access only to a -subset of the counter configuration registers. Counter configuration -register bit 62 (MINH) is read-only 0 when accessed through `sireg*`. - === Supervisor Counter Inhibit (`scountinhibit`) Register Smcdeleg/Ssccfg defines a new `scountinhibit` register, a masked alias of @@ -123,20 +118,20 @@ delegated to S-mode, the associated bits in `scountinhibit` are read-only zero. When `menvcfg`.CDE=0, attempts to access `scountinhibit` raise an illegal -instruction exception. When the Supervisor Counter Delegation extension +instruction exception. When Supervisor Counter Delegation is enabled, attempts to access `scountinhibit` from VS-mode or VU-mode -raise a virtual instruction exception. +raise a virtual-instruction exception. === Virtualizing `scountovf` For implementations that support Smcdeleg/Ssccfg, Sscofpmf, and the H -extension, when `menvcfg`.CDE=1, attempts to access `scountovf` from VS-mode -or VU-mode raise a virtual instruction exception. +extension, when `menvcfg`.CDE=1, attempts to read `scountovf` from VS-mode +or VU-mode raise a virtual-instruction exception. 
-=== Virtualizing Local Counter Overflow Interrupts +=== Virtualizing Local-Counter-Overflow Interrupts For implementations that support Smcdeleg, Sscofpmf, and Smaia, the -local counter overflow interrupt (LCOFI) bit (bit 13) in each of CSRs +local-counter-overflow interrupt (LCOFI) bit (bit 13) in each of CSRs `mvip` and `mvien` is implemented and writable. For implementations that support Smcdeleg/Ssccfg, Sscofpmf, @@ -145,10 +140,9 @@ and `hvien` is implemented and writable. [NOTE] ==== -_The `hvip` register is defined by the hypervisor (H) extension, while the `mvien` and `hvien` registers are defined by the Smaia/Ssaia extension._ +_The `hvip` register is defined by the hypervisor (H) extension, while the `mvien` and `hvien` registers are defined by the Smaia/Ssaia extensions._ _By virtue of implementing `hvip`.LCOFI, it is implicit that the LCOFI bit (bit 13) in each of `vsie` and `vsip` is also implemented._ _Requiring support for the LCOFI bits listed above ensures that virtual LCOFIs can be delivered to an OS running in S-mode, and to a guest OS running in VS-mode. It is optional whether the LCOFI bit (bit 13) in each of `mideleg` and `hideleg`, which allows all LCOFIs to be delegated to S-mode and VS-mode, respectively, is implemented and writable._ ==== - diff --git a/src/smcntrpmf.adoc b/src/smcntrpmf.adoc index ca87901..922df36 100644 --- a/src/smcntrpmf.adoc +++ b/src/smcntrpmf.adoc @@ -1,4 +1,5 @@ [[smcntrpmf]] + == "Smcntrpmf" Cycle and Instret Privilege Mode Filtering, Version 1.0 === Introduction @@ -14,12 +15,12 @@ Smcntrpmf remedies these issues by introducing privilege mode filtering for the ==== Machine Counter Configuration (`mcyclecfg`, `minstretcfg`) Registers -mcyclecfg and minstretcfg are 64-bit registers that configure privilege mode filtering for the cycle and instret counters, respectively. +mcyclecfg and minstretcfg are 64-bit registers that configure privilege mode filtering for the cycle and instret counters, respectively. 
[cols="^1,^1,^1,^1,^1,^1,^5",stripes=even,options="header"] |==== |63 |62 |61 |60 |59 |58 |57:0 -|0 |MINH |SINH |UINH |VSINH |VUINH |_WPRI_ +|0 |MINH |SINH |UINH |VSINH |VUINH |_WPRI_ |==== [cols="15%,85%",options="header"] @@ -34,19 +35,17 @@ mcyclecfg and minstretcfg are 64-bit registers that configure privilege mode fil When all __x__INH bits are zero, event counting is enabled in all modes. -For each bit in 61:58, if the associated privilege mode is not implemented, the bit is read-only zero. Bits 57:56 are reserved for possible future modes. +For each bit in 61:58, if the associated privilege mode is not implemented, the bit is read-only zero. For RV32, bits 63:32 of mcyclecfg can be accessed via the mcyclecfgh CSR, and bits 63:32 of minstretcfg can be accessed via the minstretcfgh CSR. -The CSR numbers are 0x321 for mcyclecfg, 0x322 for minstretcfg, 0x721 for mcyclecfgh, and 0x722 for minstretcfgh. - The content of these registers may be accessible from Supervisor level if the Smcdeleg/Ssccfg extensions are implemented. [NOTE] ==== The more natural CSR number for mcyclecfg would be 0x320, but that was allocated to mcountinhibit. -This register format matches that specified for programmable counters by Sscofpmf. The bit position for the OF bit (bit 63) is read-only 0, since these counters do not generate local counter overflow interrupts on overflow. +This register format matches that specified for programmable counters by Sscofpmf. The bit position for the OF bit (bit 63) is read-only 0, since these counters do not generate local-counter-overflow interrupts on overflow. ==== === Counter Behavior diff --git a/src/smctr.adoc b/src/smctr.adoc new file mode 100644 index 0000000..1082b23 --- /dev/null +++ b/src/smctr.adoc @@ -0,0 +1,760 @@ +[[smctr]] + +== "Smctr" Control Transfer Records Extension, Version 1.0 + +A method for recording control flow transfer history is valuable not only for performance profiling but also for debugging. 
Control flow transfers refer to jump instructions (including function calls and returns), taken branch instructions, traps, and trap returns. Profiling tools, such as Linux perf, collect control transfer history when sampling software execution, thereby enabling tools, like AutoFDO, to identify hot paths for optimization. + +Control flow trace capabilities offer very deep transfer history, but the volume of data produced can result in significant performance overheads due to memory bandwidth consumption, buffer management, and decoder overhead. The Control Transfer Records (CTR) extension provides a method to record a limited history in register-accessible internal chip storage, with the intent of dramatically reducing the performance overhead and complexity of collecting transfer history. + +CTR defines a circular (FIFO) buffer. Each buffer entry holds a record for a single recorded control flow transfer. The number of records that can be held in the buffer depends upon both the implementation (the maximum supported depth) and the CTR configuration (the software selected depth). + +Only qualified transfers are recorded. Qualified transfers are those that meet the filtering criteria, which include the privilege mode and the transfer type. + +Recorded transfers are inserted at the write pointer, which is then incremented, while older recorded transfers may be overwritten once the buffer is full. Or the user can enable RAS (Return Address Stack) emulation mode, where only function calls are recorded, and function returns pop the last call record. The source PC, target PC, and some optional metadata (transfer type, elapsed cycles) are stored for each recorded transfer. + +The CTR buffer is accessible through an indirect CSR interface, such that software can specify which logical entry in the buffer it wishes to read or write. Logical entry 0 always corresponds to the youngest recorded transfer, followed by entry 1 as the next youngest, and so on. 
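The logical-entry numbering described above can be illustrated with a small software model. This is only a sketch under stated assumptions, not the Smctr register interface: the class name `CtrBuffer`, its methods, and the way a popped slot is cleared are all hypothetical and chosen for illustration. It shows how a write pointer into a circular buffer maps logical entry 0 to the youngest record, and how a RAS-emulation-style pop could drop the most recent call record.

```python
class CtrBuffer:
    """Toy model of a circular transfer-record buffer (illustrative only)."""

    def __init__(self, depth):
        self.depth = depth              # software-selected depth (assumed > 0)
        self.entries = [None] * depth   # physical storage slots
        self.wrptr = 0                  # next physical slot to be written
        self.count = 0                  # number of valid records held

    def record(self, source_pc, target_pc):
        """Insert a record at the write pointer, then advance the pointer.

        Once the buffer is full, the oldest record is overwritten.
        """
        self.entries[self.wrptr] = (source_pc, target_pc)
        self.wrptr = (self.wrptr + 1) % self.depth
        self.count = min(self.count + 1, self.depth)

    def pop(self):
        """Assumed RAS-emulation behavior: a return removes the last call."""
        if self.count == 0:
            return None
        self.wrptr = (self.wrptr - 1) % self.depth
        popped = self.entries[self.wrptr]
        self.entries[self.wrptr] = None
        self.count -= 1
        return popped

    def logical(self, index):
        """Return logical entry `index`, where 0 is the youngest record."""
        if not 0 <= index < self.count:
            return None
        physical = (self.wrptr - 1 - index) % self.depth
        return self.entries[physical]
```

With a depth of 4 and five recorded transfers, logical entry 0 is the fifth transfer and logical entry 3 is the second, because the first has been overwritten.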
+ +The machine-level extension, *Smctr*, encompasses all newly added Control Status Registers (CSRs), instructions, and behavior modifications for a hart across all privilege levels. The corresponding supervisor-level extension, *Ssctr*, is essentially identical to Smctr, except that it excludes machine-level CSRs and behaviors not intended to be directly accessible at the supervisor level. + +Smctr and Ssctr depend on both the implementation of S-mode and the Sscsrind extension. + +=== CSRs + +==== Machine Control Transfer Records Control Register (`mctrctl`) + +The `mctrctl` register is a 64-bit read/write register that enables and configures the CTR capability. + +.Machine Control Transfer Records Control Register Format +[%unbreakable] +[wavedrom, , svg] +.... +{reg: [ + {bits: 1, name: 'U'}, + {bits: 1, name: 'S'}, + {bits: 1, name: 'M'}, + {bits: 4, name: '<i>WPRI</i>'}, + {bits: 1, name: 'RASEMU'}, + {bits: 1, name: 'STE'}, + {bits: 1, name: 'MTE'}, + {bits: 1, name: '<i>WPRI</i>'}, + {bits: 1, name: 'BPFRZ'}, + {bits: 1, name: 'LCOFIFRZ'}, + {bits: 20, name: '<i>WPRI</i>'}, + {bits: 1, name: 'EXCINH'}, + {bits: 1, name: 'INTRINH'}, + {bits: 1, name: 'TRETINH'}, + {bits: 1, name: 'NTBREN'}, + {bits: 1, name: 'TKBRINH'}, + {bits: 2, name: '<i>WPRI</i>'}, + {bits: 1, name: 'INDCALLINH'}, + {bits: 1, name: 'DIRCALLINH'}, + {bits: 1, name: 'INDJMPINH'}, + {bits: 1, name: 'DIRJMPINH'}, + {bits: 1, name: 'CORSWAPINH'}, + {bits: 1, name: 'RETINH'}, + {bits: 1, name: 'INDLJMPINH'}, + {bits: 1, name: 'DIRLJMPINH'}, + {bits: 12, name: '<i>WPRI</i>'}, + {bits: 4, name: 'Custom'}, +], config:{lanes: 8, hspace:1024}} +.... + +.Machine Control Transfer Records Control Register Field Definitions +[%unbreakable] +[width="100%",cols="20%,80%",options="header",] +|=== +|Field |Description +|M, S, U |Enable transfer recording in the selected privileged mode(s). + +|RASEMU |Enables RAS (Return Address Stack) Emulation Mode. See <<RAS (Return Address Stack) Emulation Mode>>. 
+ +|MTE |Enables recording of traps to M-mode when M=0. See <<External Traps>>. + +|STE |Enables recording of traps to S-mode when S=0. See <<External Traps>>. + +|BPFRZ |Set `sctrstatus`.FROZEN on a breakpoint exception that traps to M-mode or S-mode. See <<Freeze>>. + +|LCOFIFRZ |Set `sctrstatus`.FROZEN on local-counter-overflow interrupt (LCOFI) that traps to M-mode or S-mode. See <<Freeze>>. + +|EXCINH |Inhibit recording of exceptions. See <<Transfer Type Filtering>>. + +|INTRINH |Inhibit recording of interrupts. See <<Transfer Type Filtering>>. + +|TRETINH |Inhibit recording of trap returns. See <<Transfer Type Filtering>>. + +|NTBREN |Enable recording of not-taken branches. See <<Transfer Type Filtering>>. + +|TKBRINH |Inhibit recording of taken branches. See <<Transfer Type Filtering>>. + +|INDCALLINH |Inhibit recording of indirect calls. See <<Transfer Type Filtering>>. + +|DIRCALLINH |Inhibit recording of direct calls. See <<Transfer Type Filtering>>. + +|INDJMPINH |Inhibit recording of indirect jumps (without linkage). See <<Transfer Type Filtering>>. + +|DIRJMPINH |Inhibit recording of direct jumps (without linkage). See <<Transfer Type Filtering>>. + +|CORSWAPINH |Inhibit recording of co-routine swaps. See <<Transfer Type Filtering>>. + +|RETINH |Inhibit recording of function returns. See <<Transfer Type Filtering>>. + +|INDLJMPINH |Inhibit recording of other indirect jumps (with linkage). See <<Transfer Type Filtering>>. + +|DIRLJMPINH |Inhibit recording of other direct jumps (with linkage). See <<Transfer Type Filtering>>. +|Custom[3:0] | WARL bits designated for custom use. The value 0 must correspond to standard behavior. See <<Custom Extensions>>. +|=== + +[%unbreakable] +-- +All fields are optional except for M, S, U, and BPFRZ. All unimplemented fields are read-only 0, while all implemented fields are writable. If the Sscofpmf extension is implemented, LCOFIFRZ must be writable. 
+-- + +[NOTE] +==== +_Because the ROI of CTR is perceived to be low for RV32 implementations, CTR does not fully support RV32. While control flow transfers in RV32 can be recorded, RV32 cannot access_ `x__ctrctl__` _bits 63:32. A future extension could add support for RV32 by adding 3 new CSRs (`mctrctlh`, `sctrctlh`, and `vsctrctlh`) to provide this access._ +==== + +==== Supervisor Control Transfer Records Control Register (`sctrctl`) + +The `sctrctl` register provides supervisor mode access to a subset of `mctrctl`. + +Bits 2 and 9 in `sctrctl` are read-only 0. As a result, the M and MTE fields in `mctrctl` are not accessible through `sctrctl`. All other `mctrctl` fields are accessible through `sctrctl`. + +==== Virtual Supervisor Control Transfer Records Control Register (`vsctrctl`) + +If the H extension is implemented, the `vsctrctl` register is a 64-bit read/write register that is VS-mode's version of supervisor register `sctrctl`. When V=1, `vsctrctl` substitutes for the usual `sctrctl`, so instructions that normally read or modify `sctrctl` actually access `vsctrctl` instead. + +.Virtual Supervisor Control Transfer Records Control Register Format +[%unbreakable] +[wavedrom, , svg] +.... 
+{reg: [ + {bits: 1, name: 'U'}, + {bits: 1, name: 'S'}, + {bits: 5, name: '<i>WPRI</i>'}, + {bits: 1, name: 'RASEMU'}, + {bits: 1, name: 'STE'}, + {bits: 2, name: '<i>WPRI</i>'}, + {bits: 1, name: 'BPFRZ'}, + {bits: 1, name: 'LCOFIFRZ'}, + {bits: 20, name: '<i>WPRI</i>'}, + {bits: 1, name: 'EXCINH'}, + {bits: 1, name: 'INTRINH'}, + {bits: 1, name: 'TRETINH'}, + {bits: 1, name: 'NTBREN'}, + {bits: 1, name: 'TKBRINH'}, + {bits: 2, name: '<i>WPRI</i>'}, + {bits: 1, name: 'INDCALLINH'}, + {bits: 1, name: 'DIRCALLINH'}, + {bits: 1, name: 'INDJMPINH'}, + {bits: 1, name: 'DIRJMPINH'}, + {bits: 1, name: 'CORSWAPINH'}, + {bits: 1, name: 'RETINH'}, + {bits: 1, name: 'INDLJMPINH'}, + {bits: 1, name: 'DIRLJMPINH'}, + {bits: 12, name: '<i>WPRI</i>'}, + {bits: 4, name: 'Custom'}, +], config:{lanes: 8, hspace:1024}} +.... + +.Virtual Supervisor Control Transfer Records Control Register Field Definitions +[%unbreakable] +[width="100%",cols="20%,80%",options="header",] +|=== +|Field |Description +|S |Enable transfer recording in VS-mode. +|U |Enable transfer recording in VU-mode. +|STE |Enables recording of traps to VS-mode when S=0. See <<External Traps>>. +|BPFRZ |Set `sctrstatus`.FROZEN on a breakpoint exception that traps to VS-mode. See <<Freeze>>. +|LCOFIFRZ |Set `sctrstatus`.FROZEN on local-counter-overflow interrupt (LCOFI) that traps to VS-mode. See <<Freeze>>. +2+|Other field definitions match those of `sctrctl`. The optional fields implemented in `vsctrctl` should match those implemented in `sctrctl`. +|=== + +[NOTE] +[%unbreakable] +==== +_Unlike the CTR status register or the CTR entry registers, the CTR control register has a VS-mode version. 
This allows a guest to manage the CTR configuration directly, without requiring traps to HS-mode, while ensuring that the guest configuration (most notably the privilege mode enable bits) does not impact CTR behavior when V=0._ +==== + +==== Supervisor Control Transfer Records Depth Register (`sctrdepth`) + +The 32-bit `sctrdepth` register specifies the depth of the CTR buffer. + +.Supervisor Control Transfer Records Depth Register Format +[%unbreakable] +[wavedrom, , svg] +.... +{reg: [ + {bits: 3, name: 'DEPTH'}, + {bits: 29, name: '<i>WPRI</i>'}, +], config:{lanes: 1, hspace:1024}} +.... + +.Supervisor Control Transfer Records Depth Register Field Definitions +[%unbreakable] +[width="100%",cols="15%,85%",options="header",] +|=== +|Field |Description +|DEPTH |WARL field that selects the depth of the CTR buffer. Encodings: + +'000 - 16 + +'001 - 32 + +'010 - 64 + +'011 - 128 + +'100 - 256 + +'11x - reserved + +The depth of the CTR buffer dictates the number of entries to which the hardware records transfers. For a depth of N, the hardware records transfers to entries 0..N-1. All <<_entry_registers, Entry Registers>> read as '0' and are read-only when the selected entry is in the range N to 255. When the depth is increased, the newly accessible entries contain unspecified but legal values. + +It is implementation-specific which DEPTH value(s) are supported. +|=== + +Attempts to access `sctrdepth` from VS-mode or VU-mode raise a virtual-instruction exception, unless CTR state enable access restrictions apply. See <<State Enable Access Control>>. + +[NOTE] +[%unbreakable] +==== +_It is expected that operating systems (OSs) will access `sctrdepth` only at boot, to select the maximum supported depth value. More frequent accesses may result in reduced performance in virtualization scenarios, because each access from VS-mode incurs a trap._ + +_There may be scenarios where software chooses to operate on only a subset of the entries, to reduce overhead. 
In such cases tools may choose to read only the lower entries, and OSs may choose to save/restore only on the lower entries while using SCTRCLR to clear the others._ + +_The value in configurable depth lies in supporting VM migration. It is expected that a platform spec may specify that one or more CTR depth values must be supported. A hypervisor may wish to restrict guests to using one of these required depths, in order to ensure that such guests can be migrated to any system that complies with the platform spec. The trapping behavior specified for VS-mode accesses to `sctrdepth` ensures that the hypervisor can impose such restrictions._ +==== + +==== Supervisor Control Transfer Records Status Register (`sctrstatus`) + +The 32-bit `sctrstatus` register grants access to CTR status information and is updated by the hardware whenever CTR is active. CTR is active when the current privilege mode is enabled for recording and CTR is not frozen. + +.Supervisor Control Transfer Records Status Register Format +[%unbreakable] +[wavedrom, , svg] +.... +{reg: [ + {bits: 8, name: 'WRPTR'}, + {bits: 23, name: '<i>WPRI</i>'}, + {bits: 1, name: 'FROZEN'}, +], config:{lanes: 2, hspace:1024}} +.... + +.Supervisor Control Transfer Records Status Register Field Definitions +[%unbreakable] +[width="100%",cols="15%,85%",options="header",] +|=== +|Field |Description +|WRPTR |WARL field that indicates the physical CTR buffer entry to be written next. It is incremented after new transfers are recorded (see <<Behavior>>), though there are exceptions when `__x__ctrctl`.RASEMU=1, see <<RAS (Return Address Stack) Emulation Mode>>. For a given CTR depth (where depth = 2^(DEPTH+4)^), WRPTR wraps to 0 on an increment when the value matches depth-1, and to depth-1 on a decrement when the value is 0. Bits above those needed to represent depth-1 (e.g., bits 7:4 for a depth of 16) are read-only 0. On depth changes, WRPTR holds an unspecified but legal value. +|FROZEN |Inhibit transfer recording. 
See <<Freeze>>. +|=== + +Undefined bits in `sctrstatus` are WPRI. Status fields may be added by future extensions, +and software should ignore but preserve any fields that it does not recognize. Undefined bits must be implemented as read-only 0, unless a custom extension is implemented and enabled (see <<Custom Extensions>>). + +[NOTE] +[%unbreakable] +==== +_Logical entry 0, accessed via `sireg*` when `siselect`=0x200, is always the physical buffer entry preceding the WRPTR entry. More generally, the physical buffer entry Y associated with logical entry X (X < depth) can be determined using the formula Y = (WRPTR - X - 1) % depth, where depth = 2^(DEPTH+4)^. Logical entries >= depth are read-only 0._ +==== +[NOTE] +[%unbreakable] +==== +_Because the `sctrstatus` register is updated by hardware, writes should be performed with caution. If a multi-instruction read-modify-write to `sctrstatus` is performed while CTR is active, and between the read and write a qualified transfer or trap that causes CTR freeze completes, a hardware update could be lost. Software may wish to ensure that CTR is inactive before performing a read-modify-write, by ensuring that either `sctrstatus`.FROZEN=1, or that the current privilege mode is not enabled for recording._ + +_When restoring CTR state, `sctrstatus` should be written before CTR entry state is restored. This ensures that the software writes to logical CTR entries modify the proper physical entries._ +==== + +[NOTE] +[%unbreakable] +==== +_Exposing the WRPTR provides a more efficient means for synthesizing CTR entries. If a qualified control transfer is emulated, the emulator can simply increment the WRPTR, then write the synthesized record to logical entry 0. 
If a qualified function return is emulated while RASEMU=1, the emulator can clear `ctrsource`.V for logical entry 0, then decrement the WRPTR._ + +_Exposing the WRPTR may also allow support for Linux perf's https://lwn.net/Articles/802821[[.underline]#stack stitching#] capability._ +==== + +[NOTE] +[%unbreakable] +==== +_Smctr/Ssctr depends upon implementation of S-mode because much of CTR state is accessible only through S-mode CSRs. If, in the future, it becomes desirable to remove this dependency, an extension could add `mctrdepth` and `mctrstatus` CSRs that reflect the same state as `sctrdepth` and `sctrstatus`, respectively. Further, such an extension should make CTR entries accessible via `miselect`/`mireg*`. See <<Entry Registers>>._ +==== + +=== Entry Registers + +Control transfer records are stored in a CTR buffer, such that each buffer entry stores information about a single transfer. The CTR buffer entries are logically accessed via the indirect register access mechanism defined by the Sscsrind extension. The `siselect` index range 0x200 through 0x2FF is reserved for CTR logical entries 0 through 255. When `siselect` holds a value in this range, `sireg` provides access to `ctrsource`, `sireg2` provides access to `ctrtarget`, and `sireg3` provides access to `ctrdata`. `sireg4`, `sireg5`, and `sireg6` are read-only 0. + +When `vsiselect` holds a value in 0x200..0x2FF, the `vsireg*` registers provide access to the same CTR entry register state as the analogous `sireg*` registers. There is not a separate set of entry registers for V=1. + +See <<State Enable Access Control>> for cases where CTR accesses from S-mode and VS-mode may be restricted. + +==== Control Transfer Record Source Register (`ctrsource`) + +The `ctrsource` register contains the source program counter, which is the `pc` of the recorded control transfer instruction, or the epc of the recorded trap. 
The valid (V) bit is set by the hardware when a transfer is recorded in the selected CTR buffer entry, and implies that data in `ctrsource`, `ctrtarget`, and `ctrdata` is valid for this entry. + +`ctrsource` is an MXLEN-bit WARL register that must be able to hold all valid virtual or physical addresses that can serve as a `pc`. It need not be able to hold any invalid addresses; implementations may convert an invalid address into a valid address that the register is capable of holding. When XLEN < MXLEN, both explicit writes (by software) and implicit writes (for recorded transfers) will be zero-extended. + +.Control Transfer Record Source Register Format for MXLEN=64 +[%unbreakable] +[wavedrom, , svg] +.... +{reg: [ + {bits: 1, name: 'V'}, + {bits: 63, name: 'PC[63:1]'}, +], config:{lanes: 1, hspace: 1024}} +.... + +[NOTE] +[%unbreakable] +==== +_CTR entry registers are defined as MXLEN, despite the_ `x__ireg*__` _CSRs used to access them being XLEN, to ensure that entries recorded in RV64 are not truncated, as a result of CSR Width Modulation, on a transition to RV32._ +==== + +==== Control Transfer Record Target Register (`ctrtarget`) + +The `ctrtarget` register contains the target (destination) program counter +of the recorded transfer. The optional MISP bit is set by the hardware +when the recorded transfer is an instruction whose target or +taken/not-taken direction was mispredicted by the branch predictor. MISP +is read-only 0 when not implemented. + +`ctrtarget` is an MXLEN-bit WARL register that must be able to hold all valid virtual or physical addresses that can serve as a `pc`. It need not be able to hold any invalid addresses; implementations may convert an invalid address into a valid address that the register is capable of holding. When XLEN < MXLEN, both explicit writes (by software) and implicit writes (by recorded transfers) will be zero-extended. 
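To make the entry layout concrete, here is a minimal Python sketch that splits raw `ctrsource` and `ctrtarget` values into their fields, assuming MXLEN=64 and the layouts shown in the MXLEN=64 register format figures (bit 0 holds V or MISP, bits 63:1 hold PC[63:1]). The function names are hypothetical, and the sketch assumes that PC bit 0 is always 0, so the PC is recovered by clearing bit 0.

```python
MXLEN = 64
MASK = (1 << MXLEN) - 1


def decode_ctrsource(raw):
    """Split a raw ctrsource value into its valid bit and source PC."""
    raw &= MASK
    return {"valid": bool(raw & 1), "pc": raw & ~1 & MASK}


def decode_ctrtarget(raw):
    """Split a raw ctrtarget value into its MISP bit and target PC."""
    raw &= MASK
    return {"mispredicted": bool(raw & 1), "pc": raw & ~1 & MASK}


source = decode_ctrsource(0x80001235)  # V=1, PC=0x80001234
target = decode_ctrtarget(0x80002000)  # MISP=0, PC=0x80002000
```

Software reading entries through `sireg` and `sireg2` would typically check the valid bit first and ignore the remaining fields of an entry when it is clear.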
+ +.Control Transfer Record Target Register Format for MXLEN=64 +[%unbreakable] +[wavedrom, , svg] +.... +{reg: [ + {bits: 1, name: 'MISP'}, + {bits: 31, name: 'PC[31:1]'}, + {bits: 32, name: 'PC[63:32]'}, +], config:{lanes: 2, hspace: 1024}} +.... + +==== Control Transfer Record Metadata Register (`ctrdata`) + +The `ctrdata` register contains metadata for the recorded transfer. This +register must be implemented, though all fields within it are optional. +Unimplemented fields are read-only 0. `ctrdata` is a 64-bit register. + +.Control Transfer Record Metadata Register Format +[%unbreakable] +[wavedrom, , svg] +.... +{reg: [ + {bits: 4, name: 'TYPE'}, + {bits: 11, name: '<i>WPRI</i>'}, + {bits: 1, name: 'CCV'}, + {bits: 16, name: 'CC'}, + {bits: 32, name: '<i>WPRI</i>'}, +], config:{lanes: 2, hspace: 1024}} +.... + +.Control Transfer Record Metadata Register Field Definitions +[%unbreakable] +[width="100%",cols="15%,75%,10%",options="header",] +|=== +|Field |Description |Access +|TYPE[3:0] a| +Identifies the type of the control flow transfer recorded in the entry, using the encodings listed in xref:transfer-type-defs[xrefstyle=short]. Implementations that do not support this field will report 0. +|WARL + +|CCV |Cycle Count Valid. See <<Cycle Counting>>. |WARL + +|CC[15:0] |Cycle Count, composed of the Cycle Count Exponent (CCE, in +CC[15:12]) and Cycle Count Mantissa (CCM, in CC[11:0]). See +<<Cycle Counting>>. |WARL +|=== + +Undefined bits in `ctrdata` are WPRI. Undefined bits must be implemented as read-only 0, unless a <<_custom_extensions, custom extension>> is implemented and enabled. + +[NOTE] +[%unbreakable] +==== +_Like the <<_transfer_type_filtering, Transfer Type Filtering>> bits in `mctrctl`, the `ctrdata`.TYPE bits leverage the E-trace itype encodings._ +==== + +=== Instructions +==== Supervisor CTR Clear Instruction + +[wavedrom, ,svg] +.... 
+{reg: [ + {bits: 7, name: 'opcode', attr: ['7', 'SYSTEM'], type: 8}, + {bits: 5, name: 'rd', attr: ['5', '0'], type: 2}, + {bits: 3, name: 'funct3', attr: ['3', '0'], type: 8}, + {bits: 5, name: 'rs1', attr: ['5', '0'], type: 4}, + {bits: 12, name: 'funct12', attr: ['12', 'SCTRCLR (0x104)'], type: 8}, +]} +.... + +The SCTRCLR instruction performs the following operations: + +* Zeroes all CTR <<_entry_registers, Entry Registers>>, for all DEPTH values +* Zeroes the CTR cycle counter and CCV (see <<Cycle Counting>>) + +Any read of `ctrsource`, `ctrtarget`, or `ctrdata` that follows SCTRCLR and precedes the next qualified control transfer will return the value 0. Further, the first recorded transfer following SCTRCLR will have `ctrdata`.CCV=0. + +SCTRCLR raises an illegal-instruction exception in U-mode, and a virtual-instruction exception in VU-mode, unless CTR state enable access restrictions apply. See <<State Enable Access Control>>. + +=== State Enable Access Control + +When Smstateen is implemented, the `mstateen0`.CTR bit controls access to CTR register state from privilege modes less privileged than M-mode. When `mstateen0`.CTR=1, accesses to CTR register state behave as described in <<CSRs>> and <<Entry Registers>> above, while SCTRCLR behaves as described in <<Supervisor CTR Clear Instruction>>. When `mstateen0`.CTR=0 and the privilege mode is less privileged than M-mode, the following operations raise an illegal-instruction exception: + +* Attempts to access `sctrctl`, `vsctrctl`, `sctrdepth`, or `sctrstatus` +* Attempts to access `sireg*` when `siselect` is in 0x200..0x2FF, or `vsireg*` when `vsiselect` is in 0x200..0x2FF +* Execution of the SCTRCLR instruction + +When `mstateen0`.CTR=0, qualified control transfers executed in privilege modes less privileged than M-mode will continue to implicitly update entry registers and `sctrstatus`. 
+ +If the H extension is implemented and `mstateen0`.CTR=1, the `hstateen0`.CTR bit controls access to supervisor CTR state when V=1. This state includes `sctrctl` (really `vsctrctl`), `sctrstatus`, and `sireg*` (really `vsireg*`) when `siselect` (really `vsiselect`) is in 0x200..0x2FF. `hstateen0`.CTR is read-only 0 when `mstateen0`.CTR=0. + +When `mstateen0`.CTR=1 and `hstateen0`.CTR=1, VS-mode accesses to supervisor CTR state behave as described in <<CSRs>> and <<Entry Registers>> above, while SCTRCLR behaves as described in <<Supervisor CTR Clear Instruction>>. When `mstateen0`.CTR=1 and `hstateen0`.CTR=0, both VS-mode accesses to supervisor CTR state and VS-mode execution of SCTRCLR raise a virtual-instruction exception. + +[NOTE] +[%unbreakable] +==== +`__sctrdepth__` _is not included in the above list of supervisor CTR state controlled by `hstateen0`.CTR since accesses to `sctrdepth` from VS-mode raise a virtual-instruction exception regardless of the value of `hstateen0`.CTR._ +==== + +When `hstateen0`.CTR=0, qualified control transfers executed while V=1 will continue to implicitly update entry registers and `sctrstatus`. + +[NOTE] +[%unbreakable] +==== +_See <<indirect-csr>> for how bit 60 in `mstateen0` and `hstateen0` can also restrict access to `sireg*`/`siselect` and `vsireg*`/`vsiselect` from privilege modes less privileged than M-mode._ +==== + +[NOTE] +[%unbreakable] +==== +_Implementations that support Smctr/Ssctr but not Smstateen/Ssstateen may observe reduced performance. Because Smctr/Ssctr introduces a significant number of new CSRs, it is desirable to avoid save/restore of CTR state when possible. A hypervisor is likely to leverage State Enable to trap on the initial guest access to CTR state, delegating CTR and enabling save/restore of guest CTR state only once the guest has begun to use it. 
Without Smstateen/Ssstateen, a hypervisor is required to save/restore guest CTR state on every context switch._ +==== + +=== Behavior + +CTR records qualified control transfers. Control transfers are qualified if they meet the following criteria: + +* The current privilege mode is enabled +* The transfer type is not inhibited +* `sctrstatus`.FROZEN is not set +* The transfer completes/retires + +Such qualified transfers update the <<_entry_registers, Entry Registers>> at logical entry 0. As a result, older entries are pushed down the stack; the record previously in logical entry 0 moves to logical entry 1, the record in logical entry 1 moves to logical entry 2, and so on. If the CTR buffer is full, the oldest recorded entry (previously at entry depth-1) is lost. + +Recorded transfers will set the `ctrsource`.V bit to 1, and will update all implemented record fields. + +[NOTE] +[%unbreakable] +==== +_In order to collect accurate and representative performance profiles while using CTR, it is recommended that hardware recording of control transfers incurs no added performance overhead, e.g., in the form of retirement or instruction execution restrictions that are not present when CTR is not active._ +==== + +==== Privilege Mode Transitions + +Transfers that change the privilege mode are a special case. What is +recorded, if anything, depends on whether the source privilege mode +and/or target privilege mode are enabled for recording, and on the transfer type (trap +or trap return). + +Traps between enabled privilege modes are recorded as normal. Traps from a disabled privilege mode to an enabled privilege mode are partially recorded, such that the `ctrsource`.PC is 0. Traps from an enabled mode to a disabled mode, known as external traps, are not recorded by default. See <<External Traps>> for how they can be recorded. + +Trap returns have similar treatment. Trap returns between enabled privilege modes are recorded as normal. 
Trap returns from an enabled mode back to a disabled mode are partially recorded, such that `ctrtarget`.PC is 0. Trap returns from a disabled mode to an enabled mode are not recorded. + +[NOTE] +==== +_If privileged software is configuring CTR on behalf of less privileged software, it should ensure that its privilege mode enable bit (e.g., `sctrctl`.S for Supervisor software) is cleared before a trap return to the less privileged mode. Otherwise the trap return will be recorded, leaking the privileged source `pc`._ +==== + +Recording in Debug Mode is always inhibited. Transfers into and out of Debug Mode are never recorded. + +The table below provides details on recording of privilege mode transitions. Standard dependencies on FROZEN and transfer type inhibits also apply, but are not covered by the table. + +.Trap and Trap Return Recording +[%unbreakable] +[width="100%",cols="18%,17%,30%,35%",] +|=== +.2+|*Transfer Type* .2+| *Source Mode* 2+|*Target Mode* +|*Enabled* |*Disabled* +.2+|*Trap* |*Enabled* |Recorded. | External trap. Not recorded by default, but see <<External Traps>>. + +|*Disabled* |Recorded, `ctrsource`.PC is 0. |Not recorded. + +.2+|*Trap Return* |*Enabled* |Recorded. |Recorded, `ctrtarget`.PC is 0. + +|*Disabled* |Not recorded. |Not recorded. +|=== + +===== Virtualization Mode Transitions + +Transitions between VS/VU-mode and M/HS-mode are unique in that they effect a change in the active CTR control register, and hence the CTR configuration. What is recorded, if anything, on these virtualization mode transitions depends upon fields from both `[ms]ctrctl` and `vsctrctl`. 
+ +* `mctrctl`.M, `sctrctl`.S, and `vsctrctl`.{S,U} are used to determine whether the source and target modes are enabled; +* `mctrctl`.MTE, `sctrctl`.STE, and `vsctrctl`.STE are used to determine whether an external trap is recorded (see <<External Traps>>); +* `sctrctl`.LCOFIFRZ and `sctrctl`.BPFRZ determine whether CTR becomes frozen (see <<Freeze>>) +* For all other `__x__ctrctl` fields, the value in `vsctrctl` is used. + +[NOTE] +==== +_Consider an exception that traps from VU-mode to HS-mode, with `vsctrctl`.U=1 and `sctrctl`.S=1. Because both the source mode and target mode are enabled for recording, whether the trap is recorded then depends on the CTR configuration (e.g., the <<_transfer_type_filtering, transfer type filter>> bits) in `vsctrctl`, not in `sctrctl`._ +==== + +===== External Traps + +External traps are traps from a privilege mode enabled for CTR recording to a privilege mode that is not enabled for CTR recording. By default external traps are not recorded, but privileged software running in the target mode of the trap can opt-in to allowing CTR to record external traps into that mode. The `__x__ctrctl`.__x__TE bits allow M-mode, S-mode, and VS-mode to opt-in separately. + +External trap recording depends not only on the target mode, but on any intervening modes, which are modes that are more privileged than the source mode but less privileged than the target mode. Not only must the external trap enable bit for the target mode be set, but the external trap enable bit(s) for any intervening modes must also be set. See the table below for details. + +[NOTE] +[%unbreakable] +==== +_Requiring intervening modes to be enabled for external traps simplifies software management of CTR. Consider a scenario where S-mode software is configuring CTR for U-mode contexts A and B, such that external traps (to any mode) are enabled for A but not for B. 
When switching between the two contexts, S-mode can simply toggle `sctrctl`.STE, rather than requiring a trap to M-mode to additionally toggle `mctrctl`.MTE._ + +_This method does not provide the flexibility to record external traps to a more privileged mode but not to all intervening mode(s). Because it is expected that profiling tools generally wish to observe all external traps or none, this is not considered a meaningful limitation._ +==== + +.External Trap Enable Requirements +[%unbreakable] +[options="header", width="85%", cols="23%,23%,54%"] +|=== +|Source Mode |Target Mode |External Trap Enable(s) Required +.2+|U-mode | S-mode | `sctrctl`.STE +|M-mode | `mctrctl`.MTE, `sctrctl`.STE +|S-mode | M-mode | `mctrctl`.MTE +.3+|VU-mode | VS-mode | `vsctrctl`.STE +| HS-mode | `sctrctl`.STE, `vsctrctl`.STE +| M-mode | `mctrctl`.MTE, `sctrctl`.STE, `vsctrctl`.STE +.2+| VS-mode | HS-mode | `sctrctl`.STE +| M-mode | `mctrctl`.MTE, `sctrctl`.STE +|=== + +In records for external traps, the `ctrtarget`.PC is 0. + +[NOTE] +[%unbreakable] +==== +_No mechanism exists for recording external trap returns, because +the external trap record includes all relevant information, and gives +the trap handler (e.g., an emulator) the opportunity to modify the +record._ +==== + +[NOTE] +[%unbreakable] +==== +_Note that external trap recording does not depend on EXCINH/INTRINH. Thus, when external traps are enabled, both external interrupts and external exceptions are recorded._ + +_STE allows recording of traps from U-mode to S-mode as well as from VS/VU-mode to HS-mode. The hypervisor can flip `sctrctl`.STE before entering a guest if it wants different behavior for U-to-S vs VS/VU-to-HS._ +==== + +If external trap recording is implemented, `mctrctl`.MTE and `sctrctl`.STE must be implemented, while `vsctrctl`.STE must be implemented if the H extension is implemented. 
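The enable requirements in the External Trap Enable Requirements table can be expressed as a simple lookup. The Python sketch below is illustrative only: the names `REQUIRED_ENABLES` and `external_trap_recorded` are hypothetical, the mode and bit names are plain strings chosen for this example, and the sketch assumes that the trap has already been identified as external (source mode enabled, target mode disabled). It does not model FROZEN or RAS emulation mode.

```python
# (source mode, target mode) -> external trap enable bits that must all be set.
REQUIRED_ENABLES = {
    ("U", "S"): ("sctrctl.STE",),
    ("U", "M"): ("mctrctl.MTE", "sctrctl.STE"),
    ("S", "M"): ("mctrctl.MTE",),
    ("VU", "VS"): ("vsctrctl.STE",),
    ("VU", "HS"): ("sctrctl.STE", "vsctrctl.STE"),
    ("VU", "M"): ("mctrctl.MTE", "sctrctl.STE", "vsctrctl.STE"),
    ("VS", "HS"): ("sctrctl.STE",),
    ("VS", "M"): ("mctrctl.MTE", "sctrctl.STE"),
}


def external_trap_recorded(source, target, set_bits):
    """Return True if an external trap from source to target would be recorded.

    set_bits is the collection of external trap enable bits currently set to 1.
    """
    required = REQUIRED_ENABLES.get((source, target))
    if required is None:
        return False  # not a source/target pair listed in the table
    return all(bit in set_bits for bit in required)


# A U-mode to M-mode trap also needs the intervening S-mode enable.
only_mte = external_trap_recorded("U", "M", {"mctrctl.MTE"})
both_set = external_trap_recorded("U", "M", {"mctrctl.MTE", "sctrctl.STE"})
```

The first call evaluates to `False` and the second to `True`, which mirrors the rule that the enable bits of all intervening modes must be set in addition to the enable bit of the target mode.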
+ +==== Transfer Type Filtering + +Default CTR behavior, when all transfer type filter bits (`__x__ctrctl`[47:32]) are unimplemented or 0, is to record all control transfers within enabled privileged modes. By setting transfer type filter bits, software can opt out of recording select transfer types, or opt into recording non-default operations. All transfer type filter bits are optional. + +[NOTE] +[%unbreakable] +==== +_Because not-taken branches are not recorded by default, the polarity of the associated enable bit (NTBREN) is the opposite of other bits associated with transfer type filtering (TKBRINH, RETINH, etc). Non-default operations require opt-in rather than opt-out._ +==== + +The transfer type filter bits leverage the type definitions specified +in the +https://github.com/riscv-non-isa/riscv-trace-spec/releases/download/v2.0rc2/riscv-trace-spec.pdf[[.underline]#RISC-V Efficient Trace Spec v2.0#] (Table 4.4 and Section 4.1.1). For completeness, the definitions are reproduced below. + +[NOTE] +==== +_Here "indirect" is used interchangeably with "uninferrable", which is used in the trace spec. Both imply that the target of the jump is not encoded in the opcode._ +==== + +.Control Transfer Type Definitions +[#transfer-type-defs] +[%unbreakable] +[width="60%", cols="22%,78%", options="header",] +|=== +| Encoding | Transfer Type Name +| 0 | _Not used by CTR_ +| 1 | Exception +| 2 | Interrupt +| 3 | Trap return +| 4 | Not-taken branch +| 5 | Taken branch +| 6 | _reserved_ +| 7 | _reserved_ +| 8 | Indirect call +| 9 | Direct call +| 10 | Indirect jump (without linkage) +| 11 | Direct jump (without linkage) +| 12 | Co-routine swap +| 13 | Function return +| 14 | Other indirect jump (with linkage) +| 15 | Other direct jump (with linkage) +|=== + +Encodings 8 through 15 refer to various encodings of jump instructions. The types are distinguished as described below. 
+ +.Control Transfer Type Definitions +[%unbreakable] +[cols="37%,63%", options="header",] +|=== +| Transfer Type Name | Associated Opcodes +.3+| Indirect call | JALR _x1_, _rs_ where _rs_ != _x5_ +| JALR _x5_, _rs_ where _rs_ != _x1_ +| C.JALR _rs1_ where _rs1_ != _x5_ +.4+| Direct call | JAL _x1_ +| JAL _x5_ +| C.JAL +| CM.JALT _index_ +.2+| Indirect jump (without linkage) | JALR _x0_, _rs_ where _rs_ != (_x1_ or _x5_) +| C.JR _rs1_ where _rs1_ != (_x1_ or _x5_) +.3+| Direct jump (without linkage) | JAL _x0_ +| C.J +| CM.JT _index_ +.3+| Co-routine swap | JALR _x1_, _x5_ +| JALR _x5_, _x1_ +| C.JALR _x5_ +.3+| Function return | JALR _rd_, _rs_ where _rs_ == (_x1_ or _x5_) and _rd_ != (_x1_ or _x5_) +| C.JR _rs1_ where _rs1_ == (_x1_ or _x5_) +| CM.POPRET(Z) +| Other indirect jump (with linkage) | JALR _rd_, _rs_ where _rs_ != (_x1_ or _x5_) and _rd_ != (_x0_, _x1_, or _x5_) +| Other direct jump (with linkage) | JAL _rd_ where _rd_ != (_x0_, _x1_, or _x5_) +|=== + + +[NOTE] +[%unbreakable] +==== +_If implementation of any transfer type filter bit results in reduced software performance, perhaps due to additional retirement restrictions, it is strongly recommended that this reduced performance apply only when the bit is set. Alternatively, support for the bit may be omitted. Maintaining software performance for the default CTR configuration, when all transfer type bits are cleared, is recommended._ +==== + +==== Cycle Counting + +The `ctrdata` register may optionally include a count of CPU cycles elapsed since the prior CTR record. The elapsed cycle count value is represented by the CC field, which has a 12-bit mantissa component (Cycle Count Mantissa, or CCM) and a 4-bit exponent component (Cycle Count Exponent, or CCE). + +The elapsed cycle counter (CtrCycleCounter) increments at the same rate as the `mcycle` counter. Only cycles while CTR is active are counted, where active implies that the current privilege mode is enabled for recording and CTR is not frozen. 
The CC field is encoded such that CCE holds 0 if the CtrCycleCounter value is less than 4096, otherwise it holds the index of the most significant one bit in the CtrCycleCounter value, minus 11. When CCE is 0, CCM holds CtrCycleCounter bits 11:0. When CCE is nonzero, CCM holds CtrCycleCounter bits CCE+10:CCE-1, and the most significant one bit (at index CCE+11) is implied rather than stored. + +The elapsed cycle count can then be calculated by software using the following formula: + +[subs="specialchars,quotes"] +[%unbreakable] +---- +if (CCE==0): + return CCM +else: + return (2^12^ + CCM) << (CCE-1) +endif +---- + +The CtrCycleCounter is reset on writes to `__x__ctrctl`, and on execution of SCTRCLR, to ensure that any accumulated cycle counts do not persist across a context switch. + +An implementation that supports cycle counting must implement CCV and all +CCM bits, but may implement 0..4 exponent bits in CCE. Unimplemented CCE +bits are read-only 0. For implementations that support transfer type +filtering, it is recommended to implement at least 3 exponent bits. This +allows capturing the full latency of most functions, when recording only +calls and returns. + +The size of the CtrCycleCounter required to support each CCE width is given in the table below. + +.Cycle Counter Size Options +[%unbreakable] +[width="70%", cols="20%,38%,42%", options="header",] +|=== +| CCE bits | CtrCycleCounter bits | Max elapsed cycle value +| 0 | 12 | 4095 +| 1 | 13 | 8191 +| 2 | 15 | 32764 +| 3 | 19 | 524224 +| 4 | 27 | 134201344 +|=== + +[NOTE] +[%unbreakable] +==== +_When CCE>1, the granularity of the reported cycle count is reduced. For example, when CCE=3, the bottom 2 bits of the cycle counter are not reported, and thus the reported value increments only every 4 cycles. As a result, the reported value represents an undercount of elapsed cycles for most cases (when the unreported bits are non-zero). On average, the undercount will be (2^CCE-1^-1)/2. 
Software can reduce the average undercount to 0 by adding (2^CCE-1^-1)/2 to each computed cycle count value when CCE>1._ + +_Though this compressed method of representation results in some imprecision for larger cycle count values, it produces meaningful area savings, reducing storage per entry from 27 bits to 16._ +==== + +The CC value saturates when all implemented bits in CCM and CCE are 1. + +The CC value is valid only when the Cycle Count Valid (CCV) bit is set. If CCV=0, the CC value might not hold the correct count of elapsed active cycles since the last recorded transfer. The next record will have CCV=0 after a write to `__x__ctrctl`, or execution of SCTRCLR, since CtrCycleCounter is reset. CCV should additionally be cleared after any other implementation-specific scenarios where active cycles might not be counted in CtrCycleCounter. + +==== RAS (Return Address Stack) Emulation Mode + +When the optional `__x__ctrctl`.RASEMU bit is implemented and set to 1, transfer recording behavior is altered to emulate the behavior of a return-address stack (RAS). + +* Indirect and direct calls are recorded as normal +* Function returns pop the most recent call, by decrementing the WRPTR then invalidating the WRPTR entry (by setting ctrsource.V=0). As a result, logical entry 0 is invalidated and moves to logical entry depth-1, while logical entries 1..depth-1 move to 0..depth-2. +* Co-routine swaps affect both a return and a call. Logical entry 0 is +overwritten, and WRPTR is not modified. +* Other transfer types are inhibited +* Transfer type filtering bits (`__x__ctrctl`[47:32]) and external trap enable bits (`__x__ctrctl`.__x__TE) are ignored + +[NOTE] +[%unbreakable] +==== +_Profiling tools often collect call stacks along with each sample. Stack +walking, however, is a complex and often slow process that may require +recompilation (e.g., -fno-omit-frame-pointer) to work reliably. 
With RAS +emulation, tools can ask CTR hardware to save call stacks even for +unmodified code._ + +_CTR RAS emulation has limitations. The CTR buffer will contain only partial stacks in cases where the call stack depth was greater than the CTR depth, CTR recording was enabled at a lower point in the call stack than main(), or where the CTR buffer was cleared since main()._ + +_The CTR stack may be corrupted in cases where calls and returns are not symmetric, such as with stack unwinding (e.g., setjmp/longjmp, C++ exceptions), where stale call entries may be left on the CTR stack, or user stack switching, where calls from multiple stacks may be intermixed._ +==== + +[NOTE] +[%unbreakable] +==== +_As described in <<Cycle Counting>>, +when CCV=1, the CC field provides the elapsed cycles since the prior CTR +entry was recorded. This introduces implementation challenges when +RASEMU=1 because, for each recorded call, there may have been several +recorded calls (and returns which “popped” them) since the prior +remaining call entry was recorded (see <<RAS (Return Address Stack) Emulation Mode>>). The implication is that returns that +pop a call entry not only do not reset the cycle counter, but instead +add the CC field from the popped entry to the counter. For simplicity, +an implementation may opt to record CCV=0 for all calls, or those whose parent call was popped, when RASEMU=1._ +==== + +==== Freeze + +When `sctrstatus`.FROZEN=1, transfer recording is inhibited. This bit can be set by hardware, as described below, or by software. + +When `sctrctl`.LCOFIFRZ=1 and a local-counter-overflow interrupt +(LCOFI) traps (as a result of an HPM counter overflow) to M-mode or to S-mode, `sctrstatus`.FROZEN is set by hardware. This inhibits CTR recording until software clears FROZEN. The LCOFI trap itself is not recorded. 
+[NOTE] +[%unbreakable] +==== +_Freeze on LCOFI ensures that the execution path leading to the sampled +instruction (`__x__epc`) is preserved, and that the local-counter-overflow +interrupt (LCOFI) and associated Interrupt Service Routine (ISR) do not +displace any recorded transfer history state. It is the responsibility +of the ISR to clear FROZEN before __x__RET, if continued control transfer +recording is desired._ + +_LCOFI refers only to architectural traps directly caused by a local counter overflow. If a local-counter-overflow interrupt is recognized without a trap, FROZEN is not automatically set. For instance, no freeze occurs if the LCOFI is pended while interrupts are masked, and software recognizes the LCOFI (perhaps by reading `stopi` or `sip`) and clears `sip`.LCOFIP before the trap is raised. As a result, some or all CTR history may be overwritten while handling the LCOFI. Such cases are expected to be very rare; for most usages (e.g., application profiling) privilege mode filtering is sufficient to ensure that CTR updates are inhibited while interrupts are handled in a more privileged mode._ +==== +Similarly, on a breakpoint exception that traps to M-mode or S-mode with `sctrctl`.BPFRZ=1, FROZEN is set by hardware. The breakpoint exception itself is not recorded. + +[NOTE] +[%unbreakable] +==== +_Breakpoint exception refers to synchronous exceptions with a cause value of Breakpoint (3), regardless of source (ebreak, c.ebreak, Sdtrig); it does not include entry into Debug Mode, even in cores where this is implemented as an exception._ +==== + +If the H extension is implemented, freeze behavior for LCOFIs and breakpoint exceptions that trap to VS-mode is determined by the LCOFIFRZ and BPFRZ values, respectively, in `vsctrctl`. This includes virtual LCOFIs pended by a hypervisor. 
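The freeze conditions described above can be summarized with a small Python model. This is only a sketch: the function name `freeze_on_trap`, the string labels for trap causes and privilege modes, and the use of dictionaries to stand in for the `sctrctl` and `vsctrctl` fields are illustrative assumptions, not part of the CTR specification.

```python
def freeze_on_trap(cause, target_mode, sctrctl, vsctrctl=None):
    """Return True if hardware would set sctrstatus.FROZEN for this trap.

    cause       -- "lcofi" or "breakpoint" (hypothetical labels); any other
                   cause never freezes
    target_mode -- mode the trap is taken to: "M", "S", or "VS"
    sctrctl     -- dict with boolean "LCOFIFRZ" and "BPFRZ" entries
    vsctrctl    -- same shape, or None when the H extension is absent
    """
    if cause == "lcofi":
        field = "LCOFIFRZ"
    elif cause == "breakpoint":
        field = "BPFRZ"
    else:
        return False

    if target_mode in ("M", "S"):
        # Traps to M-mode or S-mode are governed by sctrctl.
        return bool(sctrctl.get(field, False))
    if target_mode == "VS" and vsctrctl is not None:
        # Traps to VS-mode are governed by vsctrctl.
        return bool(vsctrctl.get(field, False))
    return False
```

The model covers only architectural traps. An LCOFI that software recognizes and clears without taking a trap, or an entry into Debug Mode, never reaches this function and so never freezes.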
+ +[NOTE] +[%unbreakable] +==== +_When a guest uses the SBI Supervisor Software Events (SSE) extension, the LCOFI will trap to HS-mode, which will then invoke a registered VS-mode LCOFI handler routine. If `vsctrctl`.LCOFIFRZ=1, the HS-mode handler will need to emulate the freeze by setting `sctrstatus`.FROZEN=1 before invoking the registered handler routine._ +==== + + +=== Custom Extensions + +Any custom CTR extension must be associated with a non-zero value within the designated custom bits in `__x__ctrctl`. When the custom bits hold a non-zero value that enables a custom extension, the extension may alter standard CTR behavior, and may define new custom status fields within `sctrstatus` or the CTR <<_entry_registers, Entry Registers>>. All custom status fields, and standard status fields whose behavior is altered by the custom extension, must revert to standard behavior when the custom bits hold zero. This includes read-only 0 behavior for any bits undefined by any implemented standard extensions. diff --git a/src/smepmp.adoc b/src/smepmp.adoc index 0f602c5..a202532 100644 --- a/src/smepmp.adoc +++ b/src/smepmp.adoc @@ -13,16 +13,16 @@ Terms: * *PMP Entry*: A pair of ``pmpcfg[i]`` / ``pmpaddr[i]`` registers. * *PMP Rule*: The contents of a pmpcfg register and its associated pmpaddr register(s), that encode a valid protected physical memory region, where ``pmpcfg[i].A != OFF``, and if ``pmpcfg[i].A == TOR``, ``pmpaddr[i-1] < pmpaddr[i]``. -* *Ignored*: Any permissions set by a matching PMP rule are ignored, and _all_ accesses to the requested address range are allowed. -* *Enforced*: Only access types configured in the PMP rule matching the requested address range are allowed; failures will cause an access-fault exception. -* *Denied*: Any permissions set by a matching PMP rule are ignored, and _no_ accesses to the requested address range are allowed.; failures will cause an access-fault exception. 
+* *Ignored*: Any permissions set by a matching PMP rule are ignored, and _all_ accesses to the requested address range are allowed. +* *Enforced*: Only access types configured in the PMP rule matching the requested address range are allowed; failures will cause an access-fault exception. +* *Denied*: Any permissions set by a matching PMP rule are ignored, and _no_ accesses to the requested address range are allowed; failures will cause an access-fault exception. * *Locked*: A PMP rule/entry where the ``pmpcfg.L`` bit is set. * *PMP reset*: A reset process where all PMP settings of the hart, including locked rules/settings, are re-initialized to a set of safe defaults, before releasing the hart (back) to the firmware / OS / application. ==== ==== Threat model -However, there are no such mechanisms available on Machine mode in the current (v1.11) Privileged Spec. It is not possible for a PMP rule to be *enforced* only on non-Machine modes and *denied* on Machine mode, to only allow access to a memory region by less-privileged modes. it is only possible to have a *locked* rule that will be *enforced* on all modes, or a rule that will be *enforced* on non-Machine modes and be *ignored* by Machine mode. So for any physical memory region which is not protected with a Locked rule, Machine mode has unlimited access, including the ability to execute it. +However, there are no such mechanisms available on Machine mode in the current (v1.11) Privileged Spec. It is not possible for a PMP rule to be *enforced* only on non-Machine modes and *denied* on Machine mode, to only allow access to a memory region by less-privileged modes. It is only possible to have a *locked* rule that will be *enforced* on all modes, or a rule that will be *enforced* on non-Machine modes and be *ignored* by Machine mode. So for any physical memory region which is not protected with a Locked rule, Machine mode has unlimited access, including the ability to execute it.
Without being able to protect less-privileged modes from Machine mode, it is not possible to prevent the mentioned attack vector. This becomes even more important for RISC-V than on other architectures, since implementations are allowed where a hart only has Machine and User modes available, so the whole OS will run on Machine mode instead of the non-existent Supervisor mode. In such implementations the attack surface is greatly increased, and the same kind of attacks performed on Supervisor mode and mitigated through SMAP/SMEP, can be performed on Machine mode without any available mitigations. Even on implementations with Supervisor mode present attacks are still possible against the Firmware and/or the Secure Monitor running on Machine mode. @@ -42,7 +42,7 @@ Without being able to protect less-privileged modes from Machine mode, it is not Note that this feature is intended to be used as a debug mechanism, or as a temporary workaround during the boot process for simplifying software, and optimizing the allocation of memory and PMP rules. Using this functionality under normal operation, after the boot process is completed, should be avoided since it weakens the protection of _M-mode-only_ rules. Vendors who don’t need this functionality may hardwire this field to 0. ==== -. On ``mseccfg`` we introduce a field in bit 1 called *Machine-Mode alloWlist Policy (mseccfg.MMWP)*. This is a sticky bit, meaning that once set it cannot be unset until a *PMP reset*. When set it changes the default PMP policy for M-mode when accessing memory regions that don’t have a matching PMP rule, to *denied* instead of *ignored*. +. On ``mseccfg`` we introduce a field in bit 1 called *Machine-Mode Allowlist Policy (mseccfg.MMWP)*. This is a sticky bit, meaning that once set it cannot be unset until a *PMP reset*. When set it changes the default PMP policy for M-mode when accessing memory regions that don’t have a matching PMP rule, to *denied* instead of *ignored*. . 
On ``mseccfg`` we introduce a field in bit 0 called *Machine Mode Lockdown (mseccfg.MML)*. This is a sticky bit, meaning that once set it cannot be unset until a *PMP reset*. When ``mseccfg.MML`` is set the system's behavior changes in the following way: @@ -56,7 +56,7 @@ A _Shared-Region_ rule is *enforced* on all modes, with restrictions depending o + * A _Shared-Region_ rule where ``pmpcfg.L`` is not set can be used for sharing data between M-mode and S/U-mode, so is not executable. M-mode has read/write access to that region, and S/U-mode has read access if ``pmpcfg.X`` is not set, or read/write access if ``pmpcfg.X`` is set. + -* A _Shared-Region_ rule where ``pmpcfg.L`` is set can be used for sharing code between M-mode and S/U-mode, so is not writeable. Both M-mode and S/U-mode have execute access on the region, and M-mode also has read access if ``pmpcfg.X`` is set. The rule remains *locked* so that any further modifications to its associated configuration or address registers are ignored until a *PMP reset*, unless ``mseccfg.RLB`` is set. +* A _Shared-Region_ rule where ``pmpcfg.L`` is set can be used for sharing code between M-mode and S/U-mode, so is not writable. Both M-mode and S/U-mode have execute access on the region, and M-mode also has read access if ``pmpcfg.X`` is set. The rule remains *locked* so that any further modifications to its associated configuration or address registers are ignored until a *PMP reset*, unless ``mseccfg.RLB`` is set. + * The encoding ``pmpcfg.LRWX=1111`` can be used for sharing data between M-mode and S/U mode, where both modes only have read-only access to the region. The rule remains *locked* so that any further modifications to its associated configuration or address registers are ignored until a *PMP reset*, unless ``mseccfg.RLB`` is set. 
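The Shared-Region permissions listed above can be tabulated in a short Python sketch. It assumes ``mseccfg.MML`` is set, takes a ``pmpcfg`` byte with the R, W, X, and L bits at positions 0, 1, 2, and 7 as in the base PMP definition, and treats ``pmpcfg.RW=01`` (R clear, W set) as the Shared-Region encoding. The function name `shared_region_perms` and the string return format are hypothetical.

```python
PMP_R = 1 << 0
PMP_W = 1 << 1
PMP_X = 1 << 2
PMP_L = 1 << 7   # the address-matching (A) field in bits 3-4 is ignored here


def shared_region_perms(cfg):
    """Return (m_mode, su_mode) permission strings for a Shared-Region rule.

    Returns None when cfg does not use a Shared-Region encoding.
    Assumes mseccfg.MML is set.
    """
    r = bool(cfg & PMP_R)
    w = bool(cfg & PMP_W)
    x = bool(cfg & PMP_X)
    locked = bool(cfg & PMP_L)

    if locked and r and w and x:
        # pmpcfg.LRWX=1111: read-only data shared by both modes.
        return "r", "r"
    if r or not w:
        # Not the RW=01 encoding, so not a Shared-Region rule.
        return None
    if not locked:
        # Shared data: never executable; X grants S/U-mode write access.
        return "rw", ("rw" if x else "r")
    # Shared code: never writable; X grants M-mode read access.
    return ("rx" if x else "x"), "x"
```

Returning `None` for every other encoding keeps the sketch limited to the cases listed above; M-mode-only and S/U-mode-only rules are deliberately left out.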
@@ -113,12 +113,12 @@ Also when ``mseccfg.MML`` is set, according to 4b it’s not possible to add a _ + [WARNING] ==== -*Be aware that RLB introduces a security vulnerability if left set after the boot process is over and in general it should be used with caution, even when used temporarily.* Having editable PMP rules in M-mode gives a false sense of security since it only takes a few malicious instructions to lift any PMP restrictions this way. It doesn’t make sense to have a security control in place and leave it unprotected. Rule Locking Bypass is only meant as a way to optimize the allocation of PMP rules, catch errors durring debugging, and allow the bootrom/firmware to register executable _Shared-Region_ rules. If developers / vendors have no use for such functionality, they should never set ``mseccfg.RLB`` and if possible hard-wire it to 0. In any case *RLB should be disabled and locked as soon as possible*. +*Be aware that RLB introduces a security vulnerability if left set after the boot process is over and in general it should be used with caution, even when used temporarily.* Having editable PMP rules in M-mode gives a false sense of security since it only takes a few malicious instructions to lift any PMP restrictions this way. It doesn’t make sense to have a security control in place and leave it unprotected. Rule Locking Bypass is only meant as a way to optimize the allocation of PMP rules, catch errors during debugging, and allow the bootrom/firmware to register executable _Shared-Region_ rules. If developers / vendors have no use for such functionality, they should never set ``mseccfg.RLB`` and if possible hard-wire it to 0. In any case *RLB should be disabled and locked as soon as possible*. ==== + [NOTE] ==== -If ``mseccfg.RLB`` is not used and left unset, it wil be locked as soon as a PMP rule/entry with the ``pmpcfg.L`` bit set is configured. 
+If ``mseccfg.RLB`` is not used and left unset, it will be locked as soon as a PMP rule/entry with the ``pmpcfg.L`` bit set is configured. ==== + [IMPORTANT] @@ -145,13 +145,13 @@ In order to support zero-copy transfers between M-mode and S/U-mode we need to e Although it’s possible to use ``mstatus.MPRV`` in M-mode to read/write data on an _S/U-mode-only_ region using general purpose registers for copying, this will happen with S/U-mode permissions, honoring any MMU restrictions put in place by S-mode. Of course it’s still possible for M-mode to tamper with the page tables and / or add _S/U-mode-only_ rules and bypass the protections put in place by S-mode but if an attacker has managed to compromise M-mode to such extent, no security guarantees are possible in any way. *Also note that the threat model we present here assumes buggy software in M-mode, not compromised software*. We considered disabling ``mstatus.MPRV`` but it seemed too much and out of scope. ==== + -_Shared-region_ rules can be used both for zero-copy data transfers and for sharing code segments. The latter may be used for example to allow S/U-mode to execute code by the vendor, that makes use of some vendor-specific ISA extension, without having to go through the firmware with an ecall. This is similar to the vDSO approach followed on Linux, that allows userspace code to execute kernel code without having to perform a system call. +_Shared-region_ rules can be used both for zero-copy data transfers and for sharing code segments. The latter may be used for example to allow S/U-mode to execute code by the vendor, that makes use of some vendor-specific ISA extension, without having to go through the firmware with an ecall. This is similar to the vDSO approach followed on Linux, that allows user space code to execute kernel code without having to perform a system call. 
+ To make sure that shared data regions can’t be executed and shared code regions can’t be modified, the encoding changes the meaning of the ``pmpcfg.X`` bit. In case of shared data regions, with the exception of the ``pmpcfg.LRWX=1111`` encoding, the ``pmpcfg.X`` bit marks the capability of S/U-mode to write to that region, so it’s not possible to encode an executable shared data region. In case of shared code regions, the ``pmpcfg.X`` bit marks the capability of M-mode to read from that region, and since ``pmpcfg.RW=01`` is used for encoding the shared region, it’s not possible to encode a shared writable code region. + [NOTE] ==== -For adding _Shared-region_ rules with executable privileges to share code segments between M-mode and S/U-mode, ``mseccfg.RLB`` needs to be implemented, or else such rules can only be added together with ``mseccfg.MML`` being set on *PMP Reset*. That's because the reserved encoding ``pmpcfg.RW=01`` being used for _Shared-region_ rules is only defined when ``mseccfg.MML`` is set, and 4b prevents the adition of rules with executable privileges on M-mode after ``mseccfg.MML`` is set unless ``mseccfg.RLB`` is also set. +For adding _Shared-region_ rules with executable privileges to share code segments between M-mode and S/U-mode, ``mseccfg.RLB`` needs to be implemented, or else such rules can only be added together with ``mseccfg.MML`` being set on *PMP Reset*. That's because the reserved encoding ``pmpcfg.RW=01`` being used for _Shared-region_ rules is only defined when ``mseccfg.MML`` is set, and 4b prevents the addition of rules with executable privileges on M-mode after ``mseccfg.MML`` is set unless ``mseccfg.RLB`` is also set. ==== + [NOTE] @@ -168,4 +168,3 @@ For encoding Shared-region rules initially we used one of the two reserved bits .. In case ``mseccfg.MMWP`` is not set, M-mode can still access and execute any region not covered by a PMP rule.
Since we try to prevent M-mode from executing malicious code and since an attacker may manage to place code on some region not covered by PMP (e.g. a directly-addressable flash memory), we need to ensure that M-mode can only execute the code segments initialized during firmware / OS initialization. .. We are only using the encoding ``pmpcfg.RW=01`` together with ``mseccfg.MML``, if ``mseccfg.MML`` is not set the encoding remains usable for future use. - diff --git a/src/smstateen.adoc b/src/smstateen.adoc index c409a8e..2a810aa 100644 --- a/src/smstateen.adoc +++ b/src/smstateen.adoc @@ -59,9 +59,8 @@ level: And if the hypervisor extension is implemented, another set of CSRs is added: `hstateen0`, `hstateen1`, `hstateen2`, and `hstateen3`. -For RV32, the registers listed above are 32-bit, and for the machine-level and -hypervisor CSRs there is a corresponding set of high-half CSRs for the upper 32 -bits of each register: +For RV32, there are CSR addresses for accessing the upper 32 bits of +corresponding machine-level and hypervisor CSRs: `mstateen0h`, `mstateen1h`, `mstateen2h`, `mstateen3h`, `hstateen0h`, `hstateen1h`, `hstateen2h`, and `hstateen3h`. @@ -87,16 +86,16 @@ as with the `counteren` CSRs, when a `stateen` CSR prevents access to state by less-privileged levels, an attempt in one of those privilege modes to execute an instruction that would read or write the protected state raises an illegal instruction exception, or, if executing in VS or VU mode and the circumstances -for a virtual instruction exception apply, raises a virtual instruction -exception instead of an illegal instruction exception. +for a virtual-instruction exception apply, raises a virtual-instruction +exception instead of an illegal-instruction exception. When this extension is not implemented, all state added by an extension is accessible as defined by that extension. 
When a `stateen` CSR prevents access to state for a privilege mode, attempting to execute in that privilege mode an instruction that _implicitly_ updates the -state without reading it may or may not raise an illegal instruction or virtual -instruction exception. Such cases must be disambiguated by being explicitly +state without reading it may or may not raise an illegal-instruction or virtual-instruction +exception. Such cases must be disambiguated by being explicitly specified one way or the other. In some cases, the bits of the `stateen` CSRs will have a dual purpose as enables @@ -182,7 +181,9 @@ read-only). {bits: 1, name: 'C'}, {bits: 1, name: 'FCSR'}, {bits: 1, name: 'JVT'}, -{bits: 53, name: 'WPRI'}, +{bits: 51, name: 'WPRI'}, +{bits: 1, name: 'CTR'}, +{bits: 1, name: 'SRMCFG'}, {bits: 1, name: 'P1P13'}, {bits: 1, name: 'CONTEXT'}, {bits: 1, name: 'IMSIC'}, @@ -201,7 +202,9 @@ read-only). {bits: 1, name: 'C'}, {bits: 1, name: 'FCSR'}, {bits: 1, name: 'JVT'}, -{bits: 54, name: 'WPRI'}, +{bits: 51, name: 'WPRI'}, +{bits: 1, name: 'CTR'}, +{bits: 2, name: 'WPRI'}, {bits: 1, name: 'CONTEXT'}, {bits: 1, name: 'IMSIC'}, {bits: 1, name: 'AIA'}, @@ -223,8 +226,8 @@ read-only). ], config:{bits: 32, lanes: 2, hspace:1024}} .... -The C bit controls access to any and all custom state. This bit is not custom -state itself. The C bit of these registers is not custom state itself; it is a +The C bit controls access to any and all custom state. +The C bit of these registers is not custom state itself; it is a standard field of a standard CSR, either `mstateen0`, `hstateen0`, or `sstateen0`. @@ -243,8 +246,8 @@ the Zfinx and related extensions (Zdinx, etc.). Whenever `misa.F` = 1, FCSR bit of `mstateen0` is read-only zero (and hence read-only zero in `hstateen0` and `sstateen0` too). 
For convenience, when the `stateen` CSRs are implemented and `misa.F` = 0, then if the FCSR bit of a controlling `stateen0` CSR is zero, all -floating-point instructions cause an illegal instruction trap (or virtual -instruction trap, if relevant), as though they all access `fcsr`, regardless of +floating-point instructions cause an illegal-instruction exception (or virtual-instruction +exception, if relevant), as though they all access `fcsr`, regardless of whether they really do. The JVT bit controls access to the `jvt` CSR provided by the Zcmt extension. @@ -288,8 +291,8 @@ extension. The P1P13 bit in `mstateen0` controls access to the `hedelegh` introduced by Privileged Specification Version 1.13. -//The P1P14 bit in mstateen0 controls access to the srmcfg CSR introduced by -//Privileged Specification Version 1.14. +The SRMCFG bit in `mstateen0` controls access to the `srmcfg` CSR introduced by +the Ssqosid <<ssqosid>> extension. === Usage diff --git a/src/sscofpmf.adoc b/src/sscofpmf.adoc index 7e67a25..db1a45e 100644 --- a/src/sscofpmf.adoc +++ b/src/sscofpmf.adoc @@ -30,10 +30,10 @@ interrupt that is assigned to bit 13 in the mip/mie/sip/sie registers. 
The following bits are added to `mhpmevent`: -[cols="^1,^1,^1,^1,^1,^1",stripes=even,options="header"] +[cols="^1,^1,^1,^1,^1,^1,^1,^1",stripes=even,options="header"] |==== -|63 |62 |61 |60 |59 |58 -|OF |MINH |SINH |UINH |VSINH |VUINH +|63 |62 |61 |60 |59 |58 |57 |56 +|OF |MINH |SINH |UINH |VSINH |VUINH |_WPRI_ |_WPRI_ |==== [cols="15%,85%",options="header"] @@ -45,6 +45,8 @@ The following bits are added to `mhpmevent`: | UINH | If set, then counting of events in U-mode is inhibited | VSINH | If set, then counting of events in VS-mode is inhibited | VUINH | If set, then counting of events in VU-mode is inhibited +| _WPRI_ | Reserved +| _WPRI_ | Reserved |==== For each ``x``INH bit, if the associated privilege mode is not implemented, @@ -71,17 +73,20 @@ count overflow interrupt disable for the associated hpmcounter. Count overflow never results from writes to the mhpmcounter__n__ or mhpmevent__n__ registers, only from hardware increments of counter registers. -This "count overflow interrupt request" signal is treated as a standard local -interrupt that corresponds to bit 13 in the mip/mie/sip/sie registers. The -mip/sip LCOFIP and mie/sie LCOFIE bits are respectively the interrupt-pending -and interrupt-enable bits for this interrupt. ('LCOFI' represents 'Local Count -Overflow Interrupt'.) - -Generation of a "count overflow interrupt request" by an hpmcounter sets the -LCOFIP bit in the mip/sip registers and sets the associated OF bit. The mideleg -register controls the delegation of this interrupt to S-mode versus M-mode. The -LCOFIP bit is cleared by software before servicing the count overflow interrupt -resulting from one or more count overflows. +This count-overflow-interrupt-request signal is treated as a standard local +interrupt that corresponds to bit 13 in the `mip`/`mie`/`sip`/`sie` registers. +The `mip`/`sip` LCOFIP and `mie`/`sie` LCOFIE bits are, respectively, the +interrupt-pending and interrupt-enable bits for this interrupt. 
+('LCOFI' represents 'Local Count Overflow Interrupt'.) + +Generation of a count-overflow-interrupt request by an `hpmcounter` sets the +associated OF bit. +When an OF bit is set, it eventually, but not necessarily immediately, sets +the LCOFIP bit in the `mip`/`sip` registers. +The LCOFIP bit is cleared by software before servicing the count overflow +interrupt resulting from one or more count overflows. +The `mideleg` register controls the delegation of this interrupt to S-mode +versus M-mode. [NOTE] ==== diff --git a/src/ssdbltrp.adoc b/src/ssdbltrp.adoc index e2b1127..83a98bd 100644 --- a/src/ssdbltrp.adoc +++ b/src/ssdbltrp.adoc @@ -10,6 +10,6 @@ S/HS-mode. The Ssdbltrp extension adds the `menvcfg`.DTE (See <<sec:menvcfg>>) and the `sstatus`.SDT fields (See <<sstatus>>). If the hypervisor extension is additionally implemented, then the extension adds the `henvcfg`.DTE (See -<<sec:henvcfg>>) and the `vstatus`.SDT fields (See <<vstatus>>). +<<sec:henvcfg>>) and the `vsstatus`.SDT fields (See <<vsstatus>>). See <<supv-double-trap>> for the operational details. diff --git a/src/sstc.adoc b/src/sstc.adoc index 8198349..7b735da 100644 --- a/src/sstc.adoc +++ b/src/sstc.adoc @@ -17,17 +17,13 @@ overheads for emulating S/HS-mode timers and timer interrupt generation up in M-mode. Further, this extension adds a similar facility to the Hypervisor extension for VS-mode. -To make it easy to understand the deltas from the current Priv 1.11/1.12 specs, -this is written as the actual exact changes to be made to existing paragraphs -of Priv spec text (or additional paragraphs within the existing text). - The extension name is "Sstc" ('Ss' for Privileged arch and Supervisor-level extensions, and 'tc' for timecmp). This extension adds the S-level stimecmp CSR and the VS-level vstimecmp CSR. === Machine and Supervisor Level Additions -==== Supervisor Timer (`stimecmp`) Register +==== Supervisor Timer (`stimecmp`) Register This extension adds this new CSR. 
@@ -38,13 +34,15 @@ bits, while accesses to the `stimecmph` CSR access the high 32 bits of `stimecmp The CSR numbers for `stimecmp` / `stimecmph` are 0x14D / 0x15D (within the Supervisor Trap Setup block of CSRs). -A supervisor timer interrupt becomes pending - as reflected in the STIP bit in -the mip and sip registers - whenever time contains a value greater than or -equal to stimecmp, treating the values as unsigned integers. Writes to stimecmp -are guaranteed to be reflected in STIP eventually, but not necessarily -immediately. The interrupt remains posted until stimecmp becomes greater than -time - typically as a result of writing stimecmp. The interrupt will be taken -based on the standard interrupt enable and delegation rules. +A supervisor timer interrupt becomes pending, as reflected in the STIP bit in +the `mip` and `sip` registers whenever `time` contains a value greater than or +equal to `stimecmp`, treating the values as unsigned integers. +If the result of this comparison changes, it is guaranteed to be reflected in +STIP eventually, but not necessarily immediately. +The interrupt remains posted until `stimecmp` becomes greater than `time`, +typically as a result of writing `stimecmp`. +The interrupt will be taken based on the standard interrupt enable and +delegation rules. [NOTE] ==== @@ -66,7 +64,7 @@ existing S-mode software that uses this SEE facility, while new S-mode software takes advantage of stimecmp directly.) ==== -==== Machine Interrupt (`mip` and `mie`) Registers +==== Machine Interrupt (`mip` and `mie`) Registers This extension modifies the description of the STIP/STIE bits in these registers as follows: @@ -80,13 +78,13 @@ implemented, STIP is read-only in mip and reflects the supervisor-level timer interrupt signal resulting from stimecmp. This timer interrupt signal is cleared by writing `stimecmp` with a value greater than the current time value. 
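The supervisor timer pending condition described above amounts to an unsigned 64-bit comparison, which the following Python sketch models. The helper names (`stip_pending`, `vstip_pending`, `join_halves`) are hypothetical. `vstip_pending` anticipates the VS-level rule this document gives for `vstimecmp`, where (`time` + `htimedelta`) is truncated to 64 bits before the comparison.

```python
MASK64 = (1 << 64) - 1


def stip_pending(time, stimecmp):
    """STIP is pending while time >= stimecmp, compared as unsigned 64-bit values."""
    return (time & MASK64) >= (stimecmp & MASK64)


def vstip_pending(time, htimedelta, vstimecmp):
    """VSTIP is pending while (time + htimedelta), truncated to 64 bits, >= vstimecmp."""
    guest_time = (time + htimedelta) & MASK64
    return guest_time >= (vstimecmp & MASK64)


def join_halves(low, high):
    """RV32 view: stimecmp holds the low 32 bits and stimecmph the high 32 bits."""
    return ((high & 0xFFFFFFFF) << 32) | (low & 0xFFFFFFFF)
```

In this model, writing a `stimecmp` value greater than the current `time` makes `stip_pending` return `False`, which corresponds to clearing the timer interrupt signal. Real hardware is only required to reflect a change in the comparison result eventually, not immediately.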
-==== Supervisor Interrupt (`sip` and `sie`) Registers +==== Supervisor Interrupt (`sip` and `sie`) Registers This extension modifies the description of the STIP/STIE bits in these registers as follows: Bits `sip`.STIP and `sie`.STIE are the interrupt-pending and interrupt-enable bits -for supervisor level timer interrupts. If implemented, STIP is read-only in +for supervisor-level timer interrupts. If implemented, STIP is read-only in sip, and is either set and cleared by the execution environment (if `stimecmp` is not implemented), or reflects the timer interrupt signal resulting from `stimecmp` (if `stimecmp` is implemented). The `sip`.STIP bit, in response to timer @@ -94,14 +92,14 @@ interrupts generated by `stimecmp`, is set and cleared by writing `stimecmp` wit value that respectively is less than or equal to, or greater than, the current time value. -==== Machine Counter-Enable (`mcounteren`) Register +==== Machine Counter-Enable (`mcounteren`) Register This extension adds to the description of the TM bit in this register as follows: In addition, when the TM bit in the mcounteren register is clear, attempts to access the `stimecmp` or `vstimecmp` register while executing in a mode less -privileged than M will cause an illegal instruction exception. When this bit +privileged than M will cause an illegal-instruction exception. When this bit is set, access to the `stimecmp` or `vstimecmp` register is permitted in S-mode if implemented, and access to the `vstimecmp` register (via `stimecmp`) is permitted in VS-mode if implemented and not otherwise prevented by the TM bit in @@ -109,7 +107,7 @@ in VS-mode if implemented and not otherwise prevented by the TM bit in === Hypervisor Extension Additions -==== Virtual Supervisor Timer (`vstimecmp`) Register +==== Virtual Supervisor Timer (`vstimecmp`) Register This extension adds this new CSR. @@ -118,18 +116,20 @@ RV64 systems. 
In RV32 only, accesses to the `vstimecmp` CSR access the low 32 bits, while accesses to the `vstimecmph` CSR access the high 32 bits of vstimecmp. -The proposed CSR numbers for `vstimecmp` / `vstimecmph` are 0x24D / 0x25D (within +The CSR numbers for `vstimecmp` / `vstimecmph` are 0x24D / 0x25D (within the Virtual Supervisor Registers block of CSRs, and mirroring the CSR numbers for stimecmp/stimecmph). -A virtual supervisor timer interrupt becomes pending - as reflected in the -VSTIP bit in the `hip` register - whenever (`time` + `htimedelta`), truncated to 64 -bits, contains a value greater than or equal to `vstimecmp`, treating the values -as unsigned integers. Writes to `vstimecmp` and `htimedelta` are guaranteed to be -reflected in VSTIP eventually, but not necessarily immediately. The interrupt -remains posted until `vstimecmp` becomes greater than (`time` + `htimedelta`) - -typically as a result of writing `vstimecmp`. The interrupt will be taken based -on the standard interrupt enable and delegation rules while V=1. +A virtual supervisor timer interrupt becomes pending, as reflected in the +VSTIP bit in the `hip` register, whenever (`time` + `htimedelta`), truncated +to 64 bits, contains a value greater than or equal to `vstimecmp`, treating +the values as unsigned integers. +If the result of this comparison changes, it is guaranteed to be reflected in +VSTIP eventually, but not necessarily immediately. +The interrupt remains posted until `vstimecmp` becomes greater than (`time` ++ `htimedelta`), typically as a result of writing `vstimecmp`. +The interrupt will be taken based on the standard interrupt enable and +delegation rules while V=1. [NOTE] ==== @@ -141,7 +141,7 @@ ensures compatibility with existing guest VS-mode software that uses this SEE facility, while new VS-mode software takes advantage of vstimecmp directly.) 
==== -==== Hypervisor Interrupt (`hvip`, `hip`, and `hie`) Registers +==== Hypervisor Interrupt (`hvip`, `hip`, and `hie`) Registers This extension modifies the description of the VSTIP/VSTIE bits in the hip/hie registers as follows: @@ -155,14 +155,14 @@ timer interrupts generated by `vstimecmp`, is set and cleared by writing than, the current (`time` + `htimedelta`) value. The `hip`.VSTIP bit remains defined while V=0 as well as V=1. -==== Hypervisor Counter-Enable (`hcounteren`) Register +==== Hypervisor Counter-Enable (`hcounteren`) Register This extension adds to the description of the TM bit in this register as follows: In addition, when the TM bit in the `hcounteren` register is clear, attempts to access the `vstimecmp` register (via stimecmp) while executing in VS-mode will -cause a virtual instruction exception if the same bit in `mcounteren` is set. +cause a virtual-instruction exception if the same bit in `mcounteren` is set. When this bit and the same bit in `mcounteren` are both set, access to the `vstimecmp` register (if implemented) is permitted in VS-mode. @@ -177,11 +177,11 @@ enables `vstimecmp` for VS-mode. These STCE bits are WARL and are hard-wired to when this extension is not implemented. When this extension is implemented and STCE in `menvcfg` is zero, an attempt to access `stimecmp` or `vstimecmp` in a -mode other than M-mode raises an illegal instruction exception, STCE in `henvcfg` +mode other than M-mode raises an illegal-instruction exception, STCE in `henvcfg` is read-only zero, and STIP in `mip` and `sip` reverts to its defined behavior as if this extension is not implemented. Further, if the H extension is implemented, then hip.VSTIP also reverts its defined behavior as if this extension is not implemented. 
But when STCE in `menvcfg` is one and STCE in `henvcfg` is zero, an attempt to access -`stimecmp` (really `vstimecmp`) when V = 1 raises a virtual instruction exception, +`stimecmp` (really `vstimecmp`) when V = 1 raises a virtual-instruction exception, and VSTIP in hip reverts to its defined behavior as if this extension is not implemented. diff --git a/src/supervisor.adoc b/src/supervisor.adoc index daecbc2..e482d4b 100644 --- a/src/supervisor.adoc +++ b/src/supervisor.adoc @@ -36,7 +36,7 @@ supervisor-level CSR descriptions. ==== [[sstatus]] -==== Supervisor Status (`sstatus`) Register +==== Supervisor Status (`sstatus`) Register The `sstatus` register is an SXLEN-bit read/write register formatted as shown in <<sstatusreg-rv32>> when SXLEN=32 @@ -149,6 +149,20 @@ and load and store effective addresses are taken modulo latexmath:[$2^{\text{UXLEN}}$]. For example, when UXLEN=32 and SXLEN=64, user-mode memory accesses reference the lowest 4 GiB of the address space. +Some HINT instructions are encoded as integer computational instructions that +overwrite their destination register with its current value, e.g., +`c.addi x8, 0`. +When such a HINT is executed with XLEN < SXLEN and bits SXLEN..XLEN of the +destination register not all equal to bit XLEN-1, it is implementation-defined +whether bits SXLEN..XLEN of the destination register are unchanged or are +overwritten with copies of bit XLEN-1. + +NOTE: This definition allows implementations to elide register write-back for +some HINTs, while allowing them to execute other HINTs in the same manner as +other integer computational instructions. +The implementation choice is observable only by S-mode with SXLEN > UXLEN; it +is invisible to U-mode. + [[sum]] ===== Memory Privilege in `sstatus` Register @@ -178,7 +192,7 @@ code with SUM clear; the few code segments that should access user memory can temporarily set SUM. 
The SUM mechanism does not avail S-mode software of permission to -execute instructions in user code pages. Legitimate uses cases for +execute instructions in user code pages. Legitimate use cases for execution from user memory in supervisor context are rare in general and nonexistent in POSIX environments. However, bugs in supervisors that lead to arbitrary code execution are much easier to exploit if the @@ -262,10 +276,10 @@ trap that may occur during the tail phase, where it restores critical state to return from a trap. The consequence of this specification is that if a critical error condition was -caused by a guest page-fault, then the GPA will not be available in `mtval2` +caused by a guest-page fault, then the GPA will not be available in `mtval2` when the double trap is delivered to M-mode. This condition arises if the HS-mode invokes a hypervisor virtual-machine load or store instruction when -`SDT` is 1 and the instruction raises a guest page-fault. The use of such an +`SDT` is 1 and the instruction raises a guest-page fault. The use of such an instruction in this phase of trap handling is not common. However, not recording the GPA is considered benign because, if required, it can still be obtained -- albeit with added effort -- through the process of walking the page tables. @@ -283,7 +297,7 @@ Additionally, the implementation of an SSE protocol can be considered as an optional measure to aid in the recovery from such critical errors. ==== -==== Supervisor Trap Vector Base Address (`stvec`) Register +==== Supervisor Trap Vector Base Address (`stvec`) Register The `stvec` register is an SXLEN-bit read/write register that holds trap vector configuration, consisting of a vector base address (BASE) and a @@ -297,6 +311,9 @@ physical address, subject to the following alignment constraints: the address must be 4-byte aligned, and MODE settings other than Direct might impose additional alignment constraints on the value in the BASE field. 
+Note that the CSR contains only bits XLEN-1 through 2 of the address BASE. +When used as an address, the lower two bits are filled with zeroes to obtain +an XLEN-bit address that is always aligned on a 4-byte boundary. [[stvec-mode]] .Encoding of `stvec` MODE field. @@ -305,7 +322,7 @@ field. |Value |Name |Description |0 + 1 + -≥2 +≥2 |Direct + Vectored |All exceptions set `pc` to BASE. + @@ -324,7 +341,7 @@ supervisor-mode timer interrupt (see <<scauses>>) causes the `pc` to be set to BASE+`0x14`. Setting MODE=Vectored may impose a stricter alignment constraint on BASE. -==== Supervisor Interrupt (`sip` and `sie`) Registers +==== Supervisor Interrupt (`sip` and `sie`) Registers The `sip` register is an SXLEN-bit read/write register containing information on pending interrupts, while `sie` is the corresponding @@ -395,7 +412,7 @@ implemented, SSIP is writable in `sip` and may also be set to 1 by a platform-specific interrupt controller. If the Sscofpmf extension is implemented, bits `sip`.LCOFIP and `sie`.LCOFIE -are the interrupt-pending and interrupt-enable bits for local counter-overflow +are the interrupt-pending and interrupt-enable bits for local-counter-overflow interrupts. LCOFIP is read-write in `sip` and reflects the occurrence of a local counter-overflow overflow interrupt request resulting from any of the @@ -443,7 +460,7 @@ the counter values. The implementation must provide a facility for scheduling timer interrupts in terms of the real-time counter, `time`. -==== Counter-Enable (`scounteren`) Register +==== Counter-Enable (`scounteren`) Register .Counter-enable (`scounteren`) register include::images/bytefield/scounteren.edn[] @@ -471,13 +488,15 @@ access a counter if the corresponding bits in `scounteren` and `mcounteren` are both set. ==== -==== Supervisor Scratch (`sscratch`) Register +==== Supervisor Scratch (`sscratch`) Register The `sscratch` CSR is an SXLEN-bit read/write register, dedicated for use by the supervisor. 
Typically, `sscratch` is used to hold a pointer to the hart-local supervisor context while the hart is executing -user code. At the beginning of a trap handler, `sscratch` is swapped -with a user register to provide an initial working register. +user code. +At the beginning of a trap handler, software normally uses a CSRRW +instruction to swap `sscratch` with an integer register to obtain an +initial working register. .Supervisor Scratch Register include::images/bytefield/sscratch.edn[] @@ -509,7 +528,7 @@ though it may be explicitly written by software. include::images/bytefield/epcreg.edn[] [[scause]] -==== Supervisor Cause (`scause`) Register +==== Supervisor Cause (`scause`) Register The `scause` CSR is an SXLEN-bit read-write register formatted as shown in <<scausereg>>. When a trap is taken into @@ -563,7 +582,7 @@ Supervisor external interrupt + _Reserved_ + Counter-overflow interrupt + _Reserved_ + -_Designated for platform use_ +_Designated for platform use_ |0 + 0 + @@ -584,6 +603,9 @@ _Designated for platform use_ 0 + 0 + 0 + +0 + +0 + +0 + 0 |0 + 1 + @@ -630,7 +652,7 @@ _Reserved_ + _Designated for custom use_ + _Reserved_ + _Designated for custom use_ + -_Reserved_ +_Reserved_ |=== ==== Supervisor Trap Value (`stval`) Register @@ -678,7 +700,7 @@ will contain the shortest of: The value loaded into `stval` on an illegal-instruction exception is right-justified and all unused upper bits are cleared to zero. -On a trap caused by a software check exception, the `stval` register holds the +On a trap caused by a software-check exception, the `stval` register holds the cause for the exception. The following encodings are defined: * 0 - No information provided. @@ -697,7 +719,8 @@ instruction bits is implemented, `stval` must also be able to hold all values less than latexmath:[$2^N$], where latexmath:[$N$] is the smaller of SXLEN and ILEN. 
-==== Supervisor Environment Configuration (`senvcfg`) Register +[[sec:senvcfg]] +==== Supervisor Environment Configuration (`senvcfg`) Register The `senvcfg` CSR is an SXLEN-bit read/write register, formatted as shown in <<senvcfg>>, that controls certain @@ -728,7 +751,7 @@ characteristics of the U-mode execution environment. {bits: 1, name: 'FIOM'}, {bits: 1, name: 'WPRI'}, {bits: 1, name: 'LPE'}, - {bits: 1, name: 'WPRI'}, + {bits: 1, name: 'SSE'}, {bits: 2, name: 'CBIE'}, {bits: 1, name: 'CBCFE'}, {bits: 1, name: 'CBZE'}, @@ -801,17 +824,12 @@ to implement that we consider it worth supporting even if only rarely enabled. ==== -The definition of the CBZE field will be furnished by the forthcoming -Zicboz extension. Its allocation within `senvcfg` may change prior to -the ratification of that extension. +The definition of the CBZE field is furnished by the Zicboz extension. -The definitions of the CBCFE and CBIE fields will be furnished by the -forthcoming Zicbom extension. Their allocations within `senvcfg` may -change prior to the ratification of that extension. +The definitions of the CBCFE and CBIE fields are furnished by the Zicbom +extension. -The definition of the PMM field will be furnished by the forthcoming -Ssnpm extension. Its allocation within `senvcfg` may change prior to the -ratification of that extension. +The definition of the PMM field is furnished by the Ssnpm extension. The Zicfilp extension adds the `LPE` field in `senvcfg`. When the `LPE` field is set to 1, the Zicfilp extension is enabled in VU/U-mode. When the `LPE` field is @@ -829,7 +847,7 @@ rules apply: * 32-bit Zicfiss instructions will revert to their behavior as defined by Zimop. * 16-bit Zicfiss instructions will revert to their behavior as defined by Zcmop. * When `menvcfg.SSE` is one, `SSAMOSWAP.W/D` raises an illegal-instruction - exception in U-mode and a virtual instruction exception in VU-mode. + exception in U-mode and a virtual-instruction exception in VU-mode. 
[[satp]] ==== Supervisor Address Translation and Protection (`satp`) Register @@ -1096,13 +1114,6 @@ If the value held in _rs1_ is not a valid virtual address, then the SFENCE.VMA instruction has no effect. No exception is raised in this case. -When __rs2__≠``x0``, bits SXLEN-1:ASIDMAX of the value held -in _rs2_ are reserved for future standard use. Until their use is -defined by a standard extension, they should be zeroed by software and -ignored by current implementations. Furthermore, if -ASIDLEN<ASIDMAX, the implementation shall ignore bits -ASIDMAX-1:ASIDLEN of the value held in _rs2_. - [NOTE] ==== It is always legal to over-fence, e.g., by fencing only based on a @@ -1114,6 +1125,13 @@ choice not to raise an exception when an invalid virtual address is held in _rs1_ facilitates this type of simplification. ==== +When __rs2__≠``x0``, bits SXLEN-1:ASIDMAX of the value held +in _rs2_ are reserved for future standard use. Until their use is +defined by a standard extension, they should be zeroed by software and +ignored by current implementations. Furthermore, if +ASIDLEN<ASIDMAX, the implementation shall ignore bits +ASIDMAX-1:ASIDLEN of the value held in _rs2_. + An implicit read of the memory-management data structures may return any translation for an address that was valid at any time since the most recent SFENCE.VMA that subsumes that address. The ordering implied by @@ -1169,7 +1187,7 @@ without the need to execute an SFENCE.VMA instruction. Changing immediately, without the need to execute an SFENCE.VMA instruction. Likewise, changes to `satp`.ASID take effect immediately. -[TIP] +[NOTE] ==== The following common situations typically require executing an SFENCE.VMA instruction: @@ -1422,15 +1440,16 @@ Two schemes to manage the A and D bits are defined: accesses. 
+ + The PTE update must be atomic with respect to other accesses to the - PTE, and must atomically perform all tablewalk checks for that leaf + PTE, and must atomically perform all page-table walk checks for that leaf PTE as part of, and before, conditionally updating the PTE value. Updates of the A bit may be performed as a result of speculation, even if the associated memory access ultimately is not performed architecturally. However, updates to the D bit, resulting from an explicit store, must be exact (i.e., non-speculative), and observed in program order by the local hart. When two-stage address translation is - active, updates of the D bit in G-stage PTEs may be performed as a - result of speculative updates of the A bit in VS-stage PTEs. + + active, updates to the D bit in G-stage PTEs may be performed by an + implicit access to a VS-stage PTE, if the G-stage PTE provides write + permission, before any speculative access to the VS-stage PTE. + + The PTE update must appear in the global memory order before the memory access that caused the PTE update and before any subsequent @@ -1519,14 +1538,24 @@ A virtual address _va_ is translated into a physical address _pa_ as follows: . Let _a_ be ``satp``.__ppn__×PAGESIZE, and let __i__=LEVELS-1. (For Sv32, PAGESIZE=2^12^ and LEVELS=2.) The `satp` register must be _active_, i.e., the effective privilege mode must be S-mode or U-mode. + . Let _pte_ be the value of the PTE at address __a__+__va__.__vpn__[__i__]×PTESIZE. (For Sv32, PTESIZE=4.) If accessing _pte_ violates a PMA or PMP check, raise an access-fault exception corresponding to the original access type. + . If _pte_._v_=0, or if _pte_._r_=0 and _pte_._w_=1, or if any bits or encodings that are reserved for future standard use are set within _pte_, stop and raise a page-fault exception corresponding to the original access type. + . Otherwise, the PTE is valid. If __pte__.__r__=1 or __pte__.__x__=1, go to step 5. 
Otherwise, this PTE is a pointer to the next level of the page table. Let __i=i__-1. If __i__<0, stop and raise a page-fault exception corresponding to the original access type. Otherwise, let __a__=__pte__.__ppn__×PAGESIZE and go to step 2. -. A leaf PTE has been found. Determine if the requested memory access is -allowed by the _pte_._r_, _pte_._w_, _pte_._x_, and _pte_._u_ bits, given the current privilege mode and the value of the SUM and MXR fields of the `mstatus` register. If not, stop and raise a page-fault exception corresponding to the original access type. -. If _i>0_ and _pte_._ppn_[__i__-1:0] ≠ 0, this is a misaligned superpage; stop and raise a page-fault exception corresponding to the original access type. + +. A leaf PTE has been reached. If _i>0_ and _pte_._ppn_[__i__-1:0] ≠ 0, this is a misaligned superpage; stop and raise a page-fault exception corresponding to the original access type. + +. Determine if the requested memory access is allowed by the _pte_._u_ bit, given the current privilege mode and the value of the SUM and MXR fields of the *mstatus* register. If not, stop and raise a page-fault exception corresponding to the original access type. + +. Determine if the requested memory access is allowed by the _pte_._r_, _pte_._w_, and _pte_._x_ bits, given the Shadow Stack Memory Protection rules. If not, stop and raise an access-fault exception. + +. Determine if the requested memory access is allowed by the _pte_._r_, _pte_._w_, and _pte_._x_ bits. If not, stop and raise a page-fault exception corresponding to the original access type. + . If _pte_._a_=0, or if the original memory access is a store and _pte_._d_=0: + * If the Svade extension is implemented, stop and raise a page-fault exception corresponding to the original access type. * If a store to _pte_ would violate a PMA or PMP check, raise an access-fault exception corresponding to the original access @@ -1536,6 +1565,7 @@ type. 
** If the values match, set _pte_._a_ to 1 and, if the original memory access is a store, also set _pte_._d_ to 1. ** If the comparison fails, return to step 2. + . The translation is successful. The translated physical address is given as follows: * _pa.pgoff_ = _va.pgoff_. @@ -1565,7 +1595,7 @@ _global_ mapping. To ensure that implicit reads observe writes to the same memory locations, an SFENCE.VMA instruction must be executed after the writes to flush the relevant cached translations. -The address-translation cache cannot be used in step 7; accessed and +The address-translation cache cannot be used in step 9; accessed and dirty bits may only be updated in memory directly. [NOTE] ==== @@ -1625,7 +1655,7 @@ Although it would be uncommon to place page tables in non-idempotent memory, there is no explicit prohibition against doing so. Since the algorithm may only touch page tables reachable from the root page table indicated in `satp`, the range of addresses that an implementation's -page table walker will touch is fully under supervisor control. +page-table walker will touch is fully under supervisor control. *** @@ -1839,7 +1869,7 @@ translations with the same values for PTE bits 5–0. Such ranges must be of a naturally aligned power-of-2 (NAPOT) granularity larger than the base page size. -The Svnapot extension depends on Sv39. +The Svnapot extension depends on the Sv39 extension. [[ptenapot]] .Page table entry encodings when __pte__.N=1 @@ -1884,7 +1914,7 @@ __vpn__[__i__][__pte__.__napot_bits__-1:0]. If the encoding in _pte_ is reserved <<ptenapot>>, then a page-fault exception must be raised. 
* Implicit reads of NAPOT page table entries may create address-translation cache entries mapping -_a_ + _j_×PTESIZE to a copy of _pte_ in which _pte_._ppn_[_i_][_pte_.__napot_bits__-1:0] +_a_ + _j_×PTESIZE to a copy of _pte_ in which _pte_._ppn_[_i_][_pte_.__napot_bits__-1:0] is replaced by _vpn[i][pte.napot_bits_-1:0], for any or all _j_ such that __j__ >> __napot_bits__ = __vpn__[__i__] >> __napot_bits__, all for the address space identified in _satp_ as loaded by step 1. @@ -1980,7 +2010,7 @@ __ 1 + 2 + ... -|=== +|=== In such a case, an implementation may or may not support all options. The discoverability mechanism for this extension would be extended to @@ -2004,7 +2034,7 @@ the use of page-based memory types that override the PMA(s) for the associated memory pages. The encoding for the PBMT bits is captured in <<pbmt>>. -The Svpbmt extension depends on Sv39. +The Svpbmt extension depends on the Sv39 extension. [[pbmt]] .Encodings for PBMT field in Sv39, Sv48, and Sv57 PTEs. @@ -2119,6 +2149,11 @@ bits override the intermediate attributes to produce the final set of attributes used by accesses to the page in question. Otherwise, the intermediate attributes are used as the final set of attributes. +NOTE: These final attributes apply to implicit and explicit accesses that +are subject to both stages of address translation. +For accesses that are not subject to the first stage of address translation, +e.g. VS-stage page-table accesses, the intermediate attributes apply instead. + [[svinval]] == "Svinval" Extension for Fine-Grained Address-Translation Cache Invalidation, Version 1.0 @@ -2177,7 +2212,7 @@ or HS-mode when `mstatus`.TVM=1 also raises an illegal-instruction exception. An attempt to execute HINVAL.VVMA or HINVAL.GVMA in VS-mode or VU-mode, or to execute SINVAL.VMA in VU-mode, raises a virtual-instruction exception. When `hstatus`.VTVM=1, an attempt to execute -SINVAL.VMA in VS-mode also raises a virtual instruction exception. 
+SINVAL.VMA in VS-mode also raises a virtual-instruction exception. Attempting to execute SFENCE.W.INVAL or SFENCE.INVAL.IR in U-mode raises an illegal-instruction exception. @@ -2227,33 +2262,35 @@ exceptions when A/D bits need be set, instead takes effect. The Svade extension is also defined in <<translation>>. [[sec:svvptc]] -== "Svvptc" Extension for Eliding Memory-Management Fences on Making PTEs Valid, Version 1.0 +== "Svvptc" Extension for Obviating Memory-Management Instructions after Marking PTEs Valid, Version 1.0 -When the Svvptc extension is implemented, explicit stores that update the Valid -bit of leaf and/or non-leaf PTEs from 0 to 1 and are visible to a hart will -eventually become visible within a bounded timeframe to subsequent implicit +When the Svvptc extension is implemented, explicit stores by a hart that update +the Valid bit of leaf and/or non-leaf PTEs from 0 to 1 and are visible to a hart +will eventually become visible within a bounded timeframe to subsequent implicit accesses by that hart to such PTEs. [NOTE] ==== -Typically, PTEs are marked as Valid by the operating system following a -page-fault exception or during system calls for memory mapping. In such cases, -the trap handler commonly employs an `SRET` instruction to return from the trap. -When Svvptc is implemented, the stores it executes to change the Valid bit -of the PTEs from 0 to 1 then become visible to implicit references to those PTEs -within a bounded timeframe. This visibility pertains to the instructions like -the one causing the page fault or those accessing new memory regions. A -memory-management fence can be used to force immediate visibility of these PTE -updates to all implicit references associated with instructions following the -memory-management fence. However, when Svvptc is implemented, visibility (in a -bounded amount of time) is guaranteed and use of a memory-management fence is -not required in these scenarios. 
While this approach might lead to an occasional -gratuitous page-fault, the performance benefit of omitting the memory-management -fence instructions outweighs the occasional cost of a gratuitous page fault. -==== - -//// -[[sec:ssqosid]] +Svvptc relieves an operating system from executing certain memory-management +instructions, such as `SFENCE.VMA` or `SINVAL.VMA`, which would normally be used +to synchronize the hart's address-translation caches when a memory-resident PTE +is changed from Invalid to Valid. Synchronizing the hart's address-translation +caches with other forms of updates to a memory-resident PTE, including when a +PTE is changed from Valid to Invalid, requires the use of suitable +memory-management instructions. Svvptc guarantees that a change to a PTE from +Invalid to Valid is made visible within a bounded time, thereby making the +execution of these memory-management instructions redundant. The performance +benefit of eliding these instructions outweighs the cost of an occasional +gratuitous additional page fault that may occur. + +Depending on the microarchitecture, some possible ways to facilitate +implementation of Svvptc include: not having any address-translation caches, not +storing Invalid PTEs in the address-translation caches, automatically evicting +Invalid PTEs using a bounded timer, or making address-translation caches +coherent with store instructions that modify PTEs. +==== + +[[ssqosid]] == "Ssqosid" Extension for Quality-of-Service (QoS) Identifiers, Version 1.0 Quality of Service (QoS) is defined as the minimal end-to-end performance @@ -2335,11 +2372,11 @@ modes of software execution on that hart by default, but this behavior may be overridden by future extensions. If extension Smstateen is implemented together with Ssqosid, then Ssqosid also -requires the bit 55 in `mstateen0` introduced by Priv 1.14 to be implemented. 
If -bit 55 of `mstateen0` is 0, attempts to access `srmcfg` in privilege modes less -privileged than M-mode raise an illegal-instruction exception. If bit 55 of -`mstateen0` is 1 or if extension Smstateen is not implemented, attempts to -access `srmcfg` when `V=1` raise a virtual-instruction exception. +requires the SRMCFG bit in `mstateen0` to be implemented. +If `mstateen0`.SRMCFG is 0, attempts to access `srmcfg` in privilege modes +less privileged than M-mode raise an illegal-instruction exception. +If `mstateen0`.SRMCFG is 1 or if extension Smstateen is not implemented, +attempts to access `srmcfg` when `V=1` raise a virtual-instruction exception. [NOTE] ==== @@ -2375,4 +2412,3 @@ the new context, it switches to the new VM's `srmcfg`. The supervisor can also use a separate configuration for execution not to be attributed to either contexts. ==== -//// diff --git a/src/svgnam.def b/src/svgnam.def index a947636..dcd8115 100644 --- a/src/svgnam.def +++ b/src/svgnam.def @@ -5,17 +5,17 @@ %% The original source files were: %% %% xcolor.dtx (with options: `svgnames') -%% +%% %% IMPORTANT NOTICE: -%% +%% %% For the copyright see the source file. -%% +%% %% Any modified versions of this file must be renamed %% with new filenames distinct from svgnam.def. -%% +%% %% For distribution of the original source see the terms %% for copying and modification in the file xcolor.dtx. -%% +%% %% This generated file may be distributed as long as the %% original source files, as listed above, are part of the %% same distribution. (The sources need not necessarily be diff --git a/src/unpriv-cfi.adoc b/src/unpriv-cfi.adoc index 1615d62..0bec558 100644 --- a/src/unpriv-cfi.adoc +++ b/src/unpriv-cfi.adoc @@ -40,22 +40,22 @@ return from procedure if `rs1` is a conventional link register (i.e. `x1` or `x5`); else it is an indirect jump. The term _call_ is used to refer to a `JAL` or `JALR` instruction with a link -register as destination, i.e., `rd != x0`. 
Conventionally, the link register is +register as destination, i.e., _rd_≠`x0`. Conventionally, the link register is `x1` or `x5`. A _call_ using `JAL` or `C.JAL` is termed a direct call. A `C.JALR` expands to `JALR x1, 0(rs1)` and is a _call_. A _call_ using `JALR` or `C.JALR` is termed an _indirect-call_. -The term _return_ is used to refer to a `JALR` instruction with `rd == x0` and -with `rs1 == x1` or `rs1 == x5` and `rd == x0`. A `C.JR` instruction expands to -`JALR x0, 0(rs1)` and is a _return_ if `rs1 == x1` or `rs1 == x5`. +The term _return_ is used to refer to a `JALR` instruction with _rd_=`x0` and +with _rs1_=`x1` or _rs1_=`x5`. A `C.JR` instruction expands to +`JALR x0, 0(rs1)` and is a _return_ if _rs1_=`x1` or _rs1_=`x5`. -The term _indirect-jump_ is used to refer to a `JALR` instruction with `rd == x0` -and where the `rs1` is not `x1` or `x5` (i.e., not a return). A `C.JR` -instruction where `rs1` is not `x1` or `x5` (i.e., not a return) is an +The term _indirect-jump_ is used to refer to a `JALR` instruction with _rd_=`x0` +and where the _rs1_ is not `x1` or `x5` (i.e., not a return). A `C.JR` +instruction where _rs1_ is not `x1` or `x5` (i.e., not a return) is an _indirect-jump_. The Zicfiss and Zicfilp extensions build on these conventions and hints and -provide backward-edge and forward-edge control flow integrity respectively. +provide backward-edge and forward-edge control flow integrity respectively. The Unprivileged ISA for Zicfilp extension is specified in <<unpriv-forward>> and for the Unprivileged ISA for Zicfiss extension is specified in @@ -69,7 +69,7 @@ To enforce forward-edge control-flow integrity, the Zicfilp extension introduces a landing pad (`LPAD`) instruction. The `LPAD` instruction must be placed at the program locations that are valid targets of indirect jumps or calls. The `LPAD` instruction (See <<LP_INST>>) is encoded using the `AUIPC` major opcode with -`rd=x0`. +_rd_=`x0`. 
Compilers emit a landing pad instruction as the first instruction of an address-taken function, as well as at any indirect jump targets. A landing pad @@ -337,15 +337,15 @@ the shadow stack are compared. A mismatch of the two values is indicative of a subversion of the return address control variable and causes a software-check exception. -The Zicfiss instructions are encoded using a subset of May-Be-Operation -instructions defined by the Zimop and Zcmop extensions. This subset -of instructions revert to their Zimop/Zcmop defined behavior when the Zicfiss -extension is not implemented or if the extension has not been activated. A -program that is built with Zicfiss instructions can thus continue to operate -correctly, but without backward-edge control-flow integrity, on processors that -do not support the Zicfiss extension or if the Zicfiss extension is not active. -The Zicfiss extension may be activated for use individually and independently -for each privilege mode. +The Zicfiss instructions, except `SSAMOSWAP.W/D`, are encoded using a subset of +May-Be-Operation instructions defined by the Zimop and Zcmop extensions. +This subset of instructions revert to their Zimop/Zcmop defined behavior when +the Zicfiss extension is not implemented or if the extension has not been +activated. A program that is built with Zicfiss instructions can thus continue +to operate correctly, but without backward-edge control-flow integrity, on +processors that do not support the Zicfiss extension or if the Zicfiss extension +is not active. The Zicfiss extension may be activated for use individually and +independently for each privilege mode. 
 Compilers should flag each object file (for example, using flags in the ELF attributes) to indicate if the object file has been compiled with the Zicfiss
@@ -554,7 +554,8 @@ that uses shadow stacks is as follows:
         :
     ld x1,(sp)      # pop link register x1 from regular stack
     addi sp,sp,8
-    sspopchk x1     # fault if x1 not equal to shadow return address
+    sspopchk x1     # fault if x1 not equal to shadow
+                    # return address
     ret
 ----
@@ -655,7 +656,7 @@ register.
 ], config:{lanes: 1, hspace:1024}}
 ....
-Encoding `rd` as `x0` is not supported for `SSRDP`.
+Encoding _rd_ as `x0` is not supported for `SSRDP`.
 The operation of the `SSRDP` instructions is as follows:
@@ -709,7 +710,8 @@ stack instructions to unwind a shadow stack. This example assumes that the `setjmp` function itself does not push on to the shadow stack (being a leaf function, it is not required to).
-[listing]
+[source,c]
+----
 setjmp() {
     :
     :
@@ -740,7 +742,7 @@ longjmp() {
 back_cfi_not_active:
     :
 }
-====
+----
 <<<
@@ -772,14 +774,14 @@ data values.
 ----
     if privilege_mode != M && menvcfg.SSE == 0
         raise illegal-instruction exception
-    if S-mode not implemented
+    else if S-mode not implemented
         raise illegal-instruction exception
     else if privilege_mode == U && senvcfg.SSE == 0
         raise illegal-instruction exception
     else if privilege_mode == VS && henvcfg.SSE == 0
-        raise virtual instruction exception
+        raise virtual-instruction exception
     else if privilege_mode == VU && senvcfg.SSE == 0
-        raise virtual instruction exception
+        raise virtual-instruction exception
     else
         X(rd) = mem[X(rs1)]
         mem[X(rs1)] = X(rs2)
@@ -796,14 +798,14 @@ address in `rs1`.
 ----
     if privilege_mode != M && menvcfg.SSE == 0
         raise illegal-instruction exception
-    if S-mode not implemented
+    else if S-mode not implemented
         raise illegal-instruction exception
     else if privilege_mode == U && senvcfg.SSE == 0
         raise illegal-instruction exception
     else if privilege_mode == VS && henvcfg.SSE == 0
-        raise virtual instruction exception
+        raise virtual-instruction exception
     else if privilege_mode == VU && senvcfg.SSE == 0
-        raise virtual instruction exception
+        raise virtual-instruction exception
     else
         temp[31:0] = mem[X(rs1)]
         X(rd) = SignExtend(temp[31:0])
diff --git a/src/v-st-ext.adoc b/src/v-st-ext.adoc
index b8cd859..ea446b7 100644
--- a/src/v-st-ext.adoc
+++ b/src/v-st-ext.adoc
@@ -15,7 +15,7 @@ This spec includes the complete set of currently frozen vector instructions. Other instructions that have been considered during development but are not present in this document are not included in the review and ratification process, and may be completely revised or
-abandoned. Section <<sec-vector-extensions>> lists the standard
+abandoned. <<sec-vector-extensions>> lists the standard
 vector extensions and which instructions and element widths are supported by each extension.
@@ -27,7 +27,7 @@ Each hart supporting a vector extension defines two parameters:
 must be a power of 2.
 . The number of bits in a single vector register, _VLEN_ {ge} ELEN, which must be a power of 2, and must be no greater than 2^16^.
-Standard vector extensions (Section <<sec-vector-extensions>>) and
+Standard vector extensions (<<sec-vector-extensions>>) and
 architecture profiles may set further constraints on _ELEN_ and _VLEN_.
 NOTE: Future extensions may allow ELEN {gt} VLEN by holding one
@@ -65,7 +65,7 @@ base scalar RISC-V ISA.
|=== | Address | Privilege | Name | Description -| 0x008 | URW | vstart | Vector start position +| 0x008 | URW | vstart | Vector start element index | 0x009 | URW | vxsat | Fixed-Point Saturate Flag | 0x00A | URW | vxrm | Fixed-Point Rounding Mode | 0x00F | URW | vcsr | Vector control and status register @@ -156,7 +156,7 @@ The `vtype` register has five fields, `vill`, `vma`, `vta`, `vsew[2:0]`, and `vlmul[2:0]`. Bits `vtype[XLEN-2:8]` should be written with zero, and non-zero values in this field are reserved. -include::images/wavedrom/vtype-format.adoc[] +include::images/wavedrom/vtype-format.edn[] NOTE: A small implementation supporting ELEN=32 requires only seven bits of state in `vtype`: two bits for `ma` and `ta`, two bits for @@ -228,6 +228,7 @@ will run. The `vill` bit in `vtype` should be checked after setting code path should be provided if it is not. Alternatively, a profile can mandate the minimum SEW at each LMUL setting. +[[vector-register-grouping]] ===== Vector Register Grouping (`vlmul[2:0]`) Multiple vector registers can be grouped together, so that a single @@ -349,7 +350,7 @@ These two bits modify the behavior of destination tail elements and destination inactive masked-off elements respectively during the execution of vector instructions. The tail and inactive sets contain element positions that are not receiving new results during a vector -operation, as defined in Section <<sec-inactive-defs>>. +operation, as defined in <<sec-inactive-defs>>. All systems must support all four options: @@ -477,7 +478,7 @@ instruction variants. The `vl` register holds an unsigned integer specifying the number of elements to be updated with results from a vector instruction, as -further detailed in Section <<sec-inactive-defs>>. +further detailed in <<sec-inactive-defs>>. 
NOTE: The number of bits implemented in `vl` depends on the implementation's maximum vector length of the smallest supported @@ -501,7 +502,7 @@ settings which require them to be saved and restored. The _XLEN_-bit-wide read-write `vstart` CSR specifies the index of the first element to be executed by a vector instruction, as described in -Section <<sec-inactive-defs>>. +<<sec-inactive-defs>>. Normally, `vstart` is only written by hardware on a trap on a vector instruction, with the `vstart` value representing the element on which @@ -544,7 +545,7 @@ instruction exception as defined below. NOTE: Making `vstart` visible to unprivileged code supports user-level threading libraries. -Implementations are permitted to raise illegal instruction exceptions when +Implementations are permitted to raise illegal-instruction exceptions when attempting to execute a vector instruction with a value of `vstart` that the implementation can never produce when executing that same instruction with the same `vtype` setting. @@ -552,7 +553,7 @@ the same `vtype` setting. NOTE: For example, some implementations will never take interrupts during execution of a vector arithmetic instruction, instead waiting until the instruction completes to take the interrupt. Such implementations are -permitted to raise an illegal instruction exception when attempting to execute +permitted to raise an illegal-instruction exception when attempting to execute a vector arithmetic instruction when `vstart` is nonzero. NOTE: When migrating a software thread between two harts with @@ -591,7 +592,7 @@ mode as specified in the following table. 
| 0 | 0 | rnu | round-to-nearest-up (add +0.5 LSB) | `v[d-1]` | 0 | 1 | rne | round-to-nearest-even | `v[d-1] & (v[d-2:0]{ne}0 \| v[d])` -| 1 | 0 | rdn | round-down (truncate) | `0` +| 1 | 0 | rdn | round-down | `0` | 1 | 1 | rod | round-to-odd (OR bits into LSB, aka "jam") | `!v[d] & v[d-1:0]{ne}0` |=== @@ -815,7 +816,7 @@ The following example shows four different packed element widths (8b, 16b, 32b, 64b) in a VLEN=128b implementation. The vector register grouping factor (LMUL) is increased by the relative element size such that each group can hold the same number of vector elements (VLMAX=8 -in this example) to simplify stripmining code. +in this example) to simplify strip-mining code. ---- Example VLEN=128b, with SEW/LMUL=16 @@ -837,9 +838,9 @@ v4*n+3 7 6 The following table shows each possible constant SEW/LMUL operating point for loops with mixed-width operations. Each column represents a constant SEW/LMUL operating point. Entries in table are the LMUL -values that yield that column's SEW/LMUL value for the datawidth on -that row. In each column, an LMUL setting for a datawidth indicates -that it can be aligned with the other datawidths in the same column +values that yield that column's SEW/LMUL value for the data width on +that row. In each column, an LMUL setting for a data width indicates +that it can be aligned with the other data widths in the same column that also have an LMUL setting, such that all have the same VLMAX. |=== @@ -878,11 +879,11 @@ floating-point load/store 12-bit immediate field to provide further vector instruction encoding, with bit 25 holding the standard vector mask bit (see <<sec-vector-mask-encoding>>). 
-include::images/wavedrom/vmem-format.adoc[] +include::images/wavedrom/vmem-format.edn[] -include::images/wavedrom/valu-format.adoc[] +include::images/wavedrom/valu-format.edn[] -include::images/wavedrom/vcfg-format.adoc[] +include::images/wavedrom/vcfg-format.edn[] Vector instructions can have scalar or vector source operands and produce scalar or vector results, and most vector instructions can be @@ -955,12 +956,16 @@ A destination vector register group can overlap a source vector register group only if one of the following holds: - The destination EEW equals the source EEW. -- The destination EEW is smaller than the source EEW and the overlap is in - the lowest-numbered part of the source register group (e.g., when LMUL=1, +- The destination EEW is smaller than the source EEW, and the lowest-numbered + register in the destination vector register group is the same as the the + lowest-numbered register in the source vector register group. + (For example, when LMUL=1, `vnsrl.wi v0, v0, 3` is legal, but a destination of `v1` is not). - The destination EEW is greater than the source EEW, the source EMUL is - at least 1, and the overlap is in the highest-numbered part of the - destination register group (e.g., when LMUL=8, `vzext.vf4 v0, v6` is legal, + at least 1, and the highest-numbered register in the destination + vector register group is the same as the highest-numbered register in + the source vector register group. + (For example, when LMUL=8, `vzext.vf4 v0, v6` is legal, but a source of `v0`, `v2`, or `v4` is not). For the purpose of determining register group overlap constraints, @@ -992,8 +997,8 @@ Masking is supported on many vector instructions. Element operations that are masked off (inactive) never generate exceptions. The destination vector register elements corresponding to masked-off elements are handled with either a mask-undisturbed or mask-agnostic -policy depending on the setting of the `vma` bit in `vtype` (Section -<<sec-agnostic>>). 
+policy depending on the setting of the `vma` bit in `vtype` +(<<sec-agnostic>>). The mask value used to control execution of a masked vector instruction is always supplied by vector register `v0`. @@ -1016,7 +1021,7 @@ Other vector registers can be used to hold working mask values, and mask vector logical operations are provided to perform predicate calculations. [[sec-mask-vector-logical]] -As specified in Section <<sec-agnostic>>, mask destination values are +As specified in <<sec-agnostic>>, mask destination tail elements are always treated as tail-agnostic, regardless of the setting of `vta`. [[sec-vector-mask-encoding]] @@ -1120,7 +1125,7 @@ VLMAX in the source vector register group. === Configuration-Setting Instructions (`vsetvli`/`vsetivli`/`vsetvl`) One of the common approaches to handling a large number of elements is -"stripmining" where each iteration of a loop handles some number of elements, +"strip mining" where each iteration of a loop handles some number of elements, and the iterations continue until all elements have been processed. The RISC-V vector specification provides direct, portable support for this approach. The application specifies the total number of elements to be processed (the application vector length or AVL) as a @@ -1143,11 +1148,11 @@ their arguments, and write the new value of `vl` into `rd`. vsetvl rd, rs1, rs2 # rd = new vl, rs1 = AVL, rs2 = new vtype value ---- -include::images/wavedrom/vcfg-format.adoc[] +include::images/wavedrom/vcfg-format.edn[] ==== `vtype` encoding -include::images/wavedrom/vtype-format.adoc[] +include::images/wavedrom/vtype-format.edn[] The new `vtype` value is encoded in the immediate fields of `vsetvli` and `vsetivli`, and in the `rs2` register for `vsetvl`. @@ -1163,13 +1168,13 @@ and `vsetivli`, and in the `rs2` register for `vsetvl`. 
mf8 # LMUL=1/8 mf4 # LMUL=1/4 mf2 # LMUL=1/2 - m1 # LMUL=1, assumed if m setting absent + m1 # LMUL=1 m2 # LMUL=2 m4 # LMUL=4 m8 # LMUL=8 Examples: - vsetvli t0, a0, e8, ta, ma # SEW= 8, LMUL=1 + vsetvli t0, a0, e8, m1, ta, ma # SEW= 8, LMUL=1 vsetvli t0, a0, e8, m2, ta, ma # SEW= 8, LMUL=2 vsetvli t0, a0, e32, mf2, ta, ma # SEW=32, LMUL=1/2 ---- @@ -1216,37 +1221,37 @@ fields as follows: [%autowidth,float="center",align="center",options="header"] |=== | `rd` | `rs1` | AVL value | Effect on `vl` -| - | !x0 | Value in `x[rs1]` | Normal stripmining +| - | !x0 | Value in `x[rs1]` | Normal strip mining | !x0 | x0 | ~0 | Set `vl` to VLMAX | x0 | x0 | Value in `vl` register | Keep existing `vl` (of course, `vtype` may change) |=== -When `rs1` is not `x0`, the AVL is an unsigned integer held in the `x` -register specified by `rs1`, and the new `vl` value is also written to -the `x` register specified by `rd`. +When _rs1_ is not `x0`, the AVL is an unsigned integer held in the `x` +register specified by _rs1_, and the new `vl` value is also written to +the `x` register specified by _rd_. -When `rs1=x0` but `rd!=x0`, the maximum unsigned integer value (`~0`) +When _rs1_=`x0` but _rd_≠`x0`, the maximum unsigned integer value (`~0`) is used as the AVL, and the resulting VLMAX is written to `vl` and also to the `x` register specified by `rd`. -When `rs1=x0` and `rd=x0`, the instruction operates as if the current +When _rs1_=`x0` and _rd_=`x0`, the instructions operate as if the current vector length in `vl` is used as the AVL, and the resulting value is written to `vl`, but not to a destination register. This form can only be used when VLMAX and hence `vl` is not actually changed by the -new SEW/LMUL ratio. Use of the instruction with a new SEW/LMUL ratio +new SEW/LMUL ratio. Use of the instructions with a new SEW/LMUL ratio that would result in a change of VLMAX is reserved. -Use of the instruction is also reserved if `vill` was 1 beforehand. 
+Use of the instructions is also reserved if `vill` was 1 beforehand. Implementations may set `vill` in either case. NOTE: This last form of the instructions allows the `vtype` register to be changed while maintaining the current `vl`, provided VLMAX is not reduced. This design was chosen to ensure `vl` would always hold a legal value for current `vtype` setting. The current `vl` value can -be read from the `vl` CSR. The `vl` value could be reduced by this -instruction if the new SEW/LMUL ratio causes VLMAX to shrink, and so +be read from the `vl` CSR. The `vl` value could be reduced by these +instructions if the new SEW/LMUL ratio causes VLMAX to shrink, and so this case has been reserved as it is not clear this is a generally useful operation, and implementations can otherwise assume `vl` is not -changed by this instruction to optimize their microarchitecture. +changed by these instructions to optimize their microarchitecture. For the `vsetivli` instruction, the AVL is encoded as a 5-bit zero-extended immediate (0--31) in the `rs1` field. @@ -1256,7 +1261,7 @@ CSR immediate values. NOTE: The `vsetivli` instruction provides more compact code when the dimensions of vectors are small and known to fit inside the vector -registers, in which case there is no stripmining overhead. +registers, in which case there is no strip-mining overhead. [[constraints-on-setting-vl]] ==== Constraints on Setting `vl` @@ -1285,17 +1290,17 @@ vector lane utilization for `AVL > VLMAX`. For example, this permits an implementation to set `vl = ceil(AVL / 2)` for `VLMAX < AVL < 2*VLMAX` in order to evenly distribute work over the -last two iterations of a stripmine loop. -Requirement 2 ensures that the first stripmine iteration of reduction +last two iterations of a strip-mine loop. +Requirement 2 ensures that the first strip-mine iteration of reduction loops uses the largest vector length of all iterations, even in the case of `AVL < 2*VLMAX`. 
This allows software to avoid needing to explicitly calculate a running -maximum of vector lengths observed during a stripmined loop. +maximum of vector lengths observed during a strip-mined loop. Requirement 2 also allows an implementation to set vl to VLMAX for `VLMAX < AVL < 2*VLMAX` -- [[example-stripmine-sew]] -==== Example of stripmining and changes to SEW +==== Example of strip mining and changes to SEW The SEW and LMUL settings can be changed dynamically to provide high throughput on mixed-width operations in a single loop. @@ -1345,7 +1350,7 @@ floating-point load/store 12-bit immediate field to provide further vector instruction encoding, with bit 25 holding the standard vector mask bit (see <<sec-vector-mask-encoding>>). -include::images/wavedrom/vmem-format.adoc[] +include::images/wavedrom/vmem-format.edn[] [cols="4,12"] |=== @@ -1373,7 +1378,7 @@ SEW/LMUL to specify the data width. ==== Vector Load/Store Addressing Modes -The vector extension supports unit-stride, strided, and +The vector extension supports unit-stride, constant-stride, and indexed (scatter/gather) addressing modes. Vector load/store base registers and strides are taken from the GPR `x` registers. @@ -1383,7 +1388,7 @@ contents of the `x` register named in `rs1`. Vector unit-stride operations access elements stored contiguously in memory starting from the base effective address. -Vector constant-strided operations access the first memory element at the base +Vector constant-stride operations access the first memory element at the base effective address, and then access subsequent elements at address increments given by the byte offset contained in the `x` register specified by `rs2`. @@ -1408,9 +1413,9 @@ EEW. If the vector offset elements are narrower than XLEN, they are zero-extended to XLEN before adding to the base effective address. If the vector offset elements are wider than XLEN, the least-significant -XLEN bits are used in the address calculation. 
An implementation must -raise an illegal instruction exception if the EEW is not supported for -offset elements. +XLEN bits are used in the address calculation. +If the implementation does not support the EEW of the offset elements, +the instruction is reserved. NOTE: A profile may place an upper limit on the maximum supported index EEW (e.g., only up to XLEN) smaller than ELEN. @@ -1425,7 +1430,7 @@ field. | 0 | 0 | unit-stride | VLE<EEW> | 0 | 1 | indexed-unordered | VLUXEI<EEW> -| 1 | 0 | strided | VLSE<EEW> +| 1 | 0 | constant-stride | VLSE<EEW> | 1 | 1 | indexed-ordered | VLOXEI<EEW> |=== @@ -1436,7 +1441,7 @@ field. | 0 | 0 | unit-stride | VSE<EEW> | 0 | 1 | indexed-unordered | VSUXEI<EEW> -| 1 | 0 | strided | VSSE<EEW> +| 1 | 0 | constant-stride | VSSE<EEW> | 1 | 1 | indexed-ordered | VSOXEI<EEW> |=== @@ -1488,7 +1493,7 @@ regular vector loads and stores, `nf`=0, indicating that a single value is moved between a vector register group and memory at each element position. Larger values in the `nf` field are used to access multiple contiguous fields within a segment as described below in -Section <<sec-aos>>. +<<sec-aos>>. The `nf[2:0]` field also encodes the number of whole vector registers to transfer for the whole vector register load/store instructions. @@ -1513,8 +1518,7 @@ claimed by the standard scalar floating-point loads and stores. Implementations must provide vector loads and stores with EEWs corresponding to all supported SEW settings. Vector load/store -encodings for unsupported EEW widths must raise an illegal -instruction exception. +encodings for unsupported EEW widths are reserved. .Width encoding for vector loads and stores. [cols="5,1,1,1,1,>3,>3,>3,3"] @@ -1549,19 +1553,19 @@ currently reserved. 
==== Vector Unit-Stride Instructions ---- - # Vector unit-stride loads and stores +# Vector unit-stride loads and stores - # vd destination, rs1 base address, vm is mask encoding (v0.t or <missing>) - vle8.v vd, (rs1), vm # 8-bit unit-stride load - vle16.v vd, (rs1), vm # 16-bit unit-stride load - vle32.v vd, (rs1), vm # 32-bit unit-stride load - vle64.v vd, (rs1), vm # 64-bit unit-stride load +# vd destination, rs1 base address, vm is mask encoding (v0.t or <missing>) +vle8.v vd, (rs1), vm # 8-bit unit-stride load +vle16.v vd, (rs1), vm # 16-bit unit-stride load +vle32.v vd, (rs1), vm # 32-bit unit-stride load +vle64.v vd, (rs1), vm # 64-bit unit-stride load - # vs3 store data, rs1 base address, vm is mask encoding (v0.t or <missing>) - vse8.v vs3, (rs1), vm # 8-bit unit-stride store - vse16.v vs3, (rs1), vm # 16-bit unit-stride store - vse32.v vs3, (rs1), vm # 32-bit unit-stride store - vse64.v vs3, (rs1), vm # 64-bit unit-stride store +# vs3 store data, rs1 base address, vm is mask encoding (v0.t or <missing>) +vse8.v vs3, (rs1), vm # 8-bit unit-stride store +vse16.v vs3, (rs1), vm # 16-bit unit-stride store +vse32.v vs3, (rs1), vm # 32-bit unit-stride store +vse64.v vs3, (rs1), vm # 64-bit unit-stride store ---- Additional unit-stride mask load and store instructions are @@ -1572,11 +1576,11 @@ and the destination register is always written with a tail-agnostic policy. 
---- - # Vector unit-stride mask load - vlm.v vd, (rs1) # Load byte vector of length ceil(vl/8) +# Vector unit-stride mask load +vlm.v vd, (rs1) # Load byte vector of length ceil(vl/8) - # Vector unit-stride mask store - vsm.v vs3, (rs1) # Store byte vector of length ceil(vl/8) +# Vector unit-stride mask store +vsm.v vs3, (rs1) # Store byte vector of length ceil(vl/8) ---- `vlm.v` and `vsm.v` are encoded with the same `width[2:0]`=0 encoding as @@ -1599,27 +1603,27 @@ mechanism to use packed bit vectors in memory as mask values, and also reduce the cost of mask spill/fill by reducing need to change `vl`. -==== Vector Strided Instructions +==== Vector Constant-Stride Instructions ---- - # Vector strided loads and stores +# Vector constant-stride loads and stores - # vd destination, rs1 base address, rs2 byte stride - vlse8.v vd, (rs1), rs2, vm # 8-bit strided load - vlse16.v vd, (rs1), rs2, vm # 16-bit strided load - vlse32.v vd, (rs1), rs2, vm # 32-bit strided load - vlse64.v vd, (rs1), rs2, vm # 64-bit strided load +# vd destination, rs1 base address, rs2 byte constant-stride +vlse8.v vd, (rs1), rs2, vm # 8-bit constant-stride load +vlse16.v vd, (rs1), rs2, vm # 16-bit constant-stride load +vlse32.v vd, (rs1), rs2, vm # 32-bit constant-stride load +vlse64.v vd, (rs1), rs2, vm # 64-bit constant-stride load - # vs3 store data, rs1 base address, rs2 byte stride - vsse8.v vs3, (rs1), rs2, vm # 8-bit strided store - vsse16.v vs3, (rs1), rs2, vm # 16-bit strided store - vsse32.v vs3, (rs1), rs2, vm # 32-bit strided store - vsse64.v vs3, (rs1), rs2, vm # 64-bit strided store +# vs3 store data, rs1 base address, rs2 byte constant-stride +vsse8.v vs3, (rs1), rs2, vm # 8-bit constant-stride store +vsse16.v vs3, (rs1), rs2, vm # 16-bit constant-stride store +vsse32.v vs3, (rs1), rs2, vm # 32-bit constant-stride store +vsse64.v vs3, (rs1), rs2, vm # 64-bit constant-stride store ---- Negative and zero strides are supported. 
-Element accesses within a strided instruction are unordered with +Element accesses within a constant-stride instruction are unordered with respect to each other. When `rs2`=`x0`, then an implementation is allowed, but not required, @@ -1648,36 +1652,35 @@ address are required, then an ordered indexed operation can be used. ==== Vector Indexed Instructions ---- - # Vector indexed loads and stores +# Vector indexed loads and stores - # Vector indexed-unordered load instructions - # vd destination, rs1 base address, vs2 byte offsets - vluxei8.v vd, (rs1), vs2, vm # unordered 8-bit indexed load of SEW data - vluxei16.v vd, (rs1), vs2, vm # unordered 16-bit indexed load of SEW data - vluxei32.v vd, (rs1), vs2, vm # unordered 32-bit indexed load of SEW data - vluxei64.v vd, (rs1), vs2, vm # unordered 64-bit indexed load of SEW data +# Vector indexed-unordered load instructions +# vd destination, rs1 base address, vs2 byte offsets +vluxei8.v vd, (rs1), vs2, vm # unordered 8-bit indexed load of SEW data +vluxei16.v vd, (rs1), vs2, vm # unordered 16-bit indexed load of SEW data +vluxei32.v vd, (rs1), vs2, vm # unordered 32-bit indexed load of SEW data +vluxei64.v vd, (rs1), vs2, vm # unordered 64-bit indexed load of SEW data - # Vector indexed-ordered load instructions - # vd destination, rs1 base address, vs2 byte offsets - vloxei8.v vd, (rs1), vs2, vm # ordered 8-bit indexed load of SEW data - vloxei16.v vd, (rs1), vs2, vm # ordered 16-bit indexed load of SEW data - vloxei32.v vd, (rs1), vs2, vm # ordered 32-bit indexed load of SEW data - vloxei64.v vd, (rs1), vs2, vm # ordered 64-bit indexed load of SEW data +# Vector indexed-ordered load instructions +# vd destination, rs1 base address, vs2 byte offsets +vloxei8.v vd, (rs1), vs2, vm # ordered 8-bit indexed load of SEW data +vloxei16.v vd, (rs1), vs2, vm # ordered 16-bit indexed load of SEW data +vloxei32.v vd, (rs1), vs2, vm # ordered 32-bit indexed load of SEW data +vloxei64.v vd, (rs1), vs2, vm # ordered 64-bit 
indexed load of SEW data - # Vector indexed-unordered store instructions - # vs3 store data, rs1 base address, vs2 byte offsets - vsuxei8.v vs3, (rs1), vs2, vm # unordered 8-bit indexed store of SEW data - vsuxei16.v vs3, (rs1), vs2, vm # unordered 16-bit indexed store of SEW data - vsuxei32.v vs3, (rs1), vs2, vm # unordered 32-bit indexed store of SEW data - vsuxei64.v vs3, (rs1), vs2, vm # unordered 64-bit indexed store of SEW data - - # Vector indexed-ordered store instructions - # vs3 store data, rs1 base address, vs2 byte offsets - vsoxei8.v vs3, (rs1), vs2, vm # ordered 8-bit indexed store of SEW data - vsoxei16.v vs3, (rs1), vs2, vm # ordered 16-bit indexed store of SEW data - vsoxei32.v vs3, (rs1), vs2, vm # ordered 32-bit indexed store of SEW data - vsoxei64.v vs3, (rs1), vs2, vm # ordered 64-bit indexed store of SEW data +# Vector indexed-unordered store instructions +# vs3 store data, rs1 base address, vs2 byte offsets +vsuxei8.v vs3, (rs1), vs2, vm # unordered 8-bit indexed store of SEW data +vsuxei16.v vs3, (rs1), vs2, vm # unordered 16-bit indexed store of SEW data +vsuxei32.v vs3, (rs1), vs2, vm # unordered 32-bit indexed store of SEW data +vsuxei64.v vs3, (rs1), vs2, vm # unordered 64-bit indexed store of SEW data +# Vector indexed-ordered store instructions +# vs3 store data, rs1 base address, vs2 byte offsets +vsoxei8.v vs3, (rs1), vs2, vm # ordered 8-bit indexed store of SEW data +vsoxei16.v vs3, (rs1), vs2, vm # ordered 16-bit indexed store of SEW data +vsoxei32.v vs3, (rs1), vs2, vm # ordered 32-bit indexed store of SEW data +vsoxei64.v vs3, (rs1), vs2, vm # ordered 64-bit indexed store of SEW data ---- NOTE: The assembler syntax for indexed loads and stores uses @@ -1714,13 +1717,13 @@ operation will not be restarted due to a trap or vector-length trimming. 
---- - # Vector unit-stride fault-only-first loads +# Vector unit-stride fault-only-first loads - # vd destination, rs1 base address, vm is mask encoding (v0.t or <missing>) - vle8ff.v vd, (rs1), vm # 8-bit unit-stride fault-only-first load - vle16ff.v vd, (rs1), vm # 16-bit unit-stride fault-only-first load - vle32ff.v vd, (rs1), vm # 32-bit unit-stride fault-only-first load - vle64ff.v vd, (rs1), vm # 64-bit unit-stride fault-only-first load +# vd destination, rs1 base address, vm is mask encoding (v0.t or <missing>) +vle8ff.v vd, (rs1), vm # 8-bit unit-stride fault-only-first load +vle16ff.v vd, (rs1), vm # 16-bit unit-stride fault-only-first load +vle32ff.v vd, (rs1), vm # 32-bit unit-stride fault-only-first load +vle64ff.v vd, (rs1), vm # 64-bit unit-stride fault-only-first load ---- ---- @@ -1735,7 +1738,7 @@ versions only allow probing a region immediately contiguous to a known region, and so reduce the security impact when used in unprivileged code. However, code running in S-mode can establish arbitrary page translations that allow probing of random guest physical addresses -provided by a hypervisor. Strided and scatter/gather fault-only-first +provided by a hypervisor. Constant-stride and scatter/gather fault-only-first instructions are not provided due to lack of encoding space, but they can also represent a larger security hole, allowing even unprivileged software to easily check multiple random pages for accessibility @@ -1837,14 +1840,15 @@ The assembler prefixes `vlseg`/`vsseg` are used for unit-stride segment loads and stores respectively. ---- - # Format - vlseg<nf>e<eew>.v vd, (rs1), vm # Unit-stride segment load template - vsseg<nf>e<eew>.v vs3, (rs1), vm # Unit-stride segment store template +# Format +# In this syntax, <nf> equals NFIELDS and is an integer in the range [2, 8]. 
+vlseg<nf>e<eew>.v vd, (rs1), vm # Unit-stride segment load template +vsseg<nf>e<eew>.v vs3, (rs1), vm # Unit-stride segment store template - # Examples - vlseg8e8.v vd, (rs1), vm # Load eight vector registers with eight byte fields. +# Examples +vlseg8e8.v vd, (rs1), vm # Load eight vector registers with eight byte fields. - vsseg3e32.v vs3, (rs1), vm # Store packed vector of 3*4-byte segments from vs3,vs3+1,vs3+2 to memory +vsseg3e32.v vs3, (rs1), vm # Store packed vector of 3*4-byte segments from vs3,vs3+1,vs3+2 to memory ---- For loads, the `vd` register will hold the first field loaded from the @@ -1852,60 +1856,63 @@ segment. For stores, the `vs3` register is read to provide the first field to be stored to each segment. ---- - # Example 1 - # Memory structure holds packed RGB pixels (24-bit data structure, 8bpp) - vsetvli a1, t0, e8, ta, ma - vlseg3e8.v v8, (a0), vm - # v8 holds the red pixels - # v9 holds the green pixels - # v10 holds the blue pixels +# Example 1 +# Memory structure holds packed RGB pixels (24-bit data structure, 8bpp) +vsetvli a1, t0, e8, m1, ta, ma +vlseg3e8.v v8, (a0), vm +# v8 holds the red pixels +# v9 holds the green pixels +# v10 holds the blue pixels - # Example 2 - # Memory structure holds complex values, 32b for real and 32b for imaginary - vsetvli a1, t0, e32, ta, ma - vlseg2e32.v v8, (a0), vm - # v8 holds real - # v9 holds imaginary +# Example 2 +# Memory structure holds complex values, 32b for real and 32b for imaginary +vsetvli a1, t0, e32, m1, ta, ma +vlseg2e32.v v8, (a0), vm +# v8 holds real +# v9 holds imaginary ---- There are also fault-only-first versions of the unit-stride instructions. ---- - # Template for vector fault-only-first unit-stride segment loads. - vlseg<nf>e<eew>ff.v vd, (rs1), vm # Unit-stride fault-only-first segment loads +# Template for vector fault-only-first unit-stride segment loads. 
+vlseg<nf>e<eew>ff.v vd, (rs1), vm # Unit-stride fault-only-first segment loads ---- For fault-only-first segment loads, if an exception is detected partway -through accessing a segment, regardless of whether the element index is zero, -it is implementation-defined whether a subset of the segment is loaded. +through accessing the zeroth segment, the trap is taken. +If an exception is detected partway through accessing a subsequent segment, +`vl` is reduced to the index of that segment. +In both cases, it is implementation-defined whether a subset of the segment is +loaded. These instructions may overwrite destination vector register group elements past the point at which a trap is reported or past the point at which vector length is trimmed. -===== Vector Strided Segment Loads and Stores +===== Vector Constant-Stride Segment Loads and Stores -Vector strided segment loads and stores move contiguous segments where +Vector constant-stride segment loads and stores move contiguous segments where each segment is separated by the byte-stride offset given in the `rs2` GPR argument. NOTE: Negative and zero strides are supported. ---- - # Format - vlsseg<nf>e<eew>.v vd, (rs1), rs2, vm # Strided segment loads - vssseg<nf>e<eew>.v vs3, (rs1), rs2, vm # Strided segment stores +# Format +vlsseg<nf>e<eew>.v vd, (rs1), rs2, vm # Constant-stride segment loads +vssseg<nf>e<eew>.v vs3, (rs1), rs2, vm # Constant-stride segment stores - # Examples - vsetvli a1, t0, e8, ta, ma - vlsseg3e8.v v4, (x5), x6 # Load bytes at addresses x5+i*x6 into v4[i], - # and bytes at addresses x5+i*x6+1 into v5[i], - # and bytes at addresses x5+i*x6+2 into v6[i]. +# Examples +vsetvli a1, t0, e8, m1, ta, ma +vlsseg3e8.v v4, (x5), x6 # Load bytes at addresses x5+i*x6 into v4[i], + # and bytes at addresses x5+i*x6+1 into v5[i], + # and bytes at addresses x5+i*x6+2 into v6[i]. 
- # Examples - vsetvli a1, t0, e32, ta, ma - vssseg2e32.v v2, (x5), x6 # Store words from v2[i] to address x5+i*x6 - # and words from v3[i] to address x5+i*x6+4 +# Examples +vsetvli a1, t0, e32, m1, ta, ma +vssseg2e32.v v2, (x5), x6 # Store words from v2[i] to address x5+i*x6 + # and words from v3[i] to address x5+i*x6+4 ---- Accesses to the fields within each segment can occur in any order, @@ -1928,22 +1935,22 @@ EMUL=(EEW/SEW)*LMUL. The EMUL * NFIELDS {le} 8 constraint applies to the data vector register group. ---- - # Format - vluxseg<nf>ei<eew>.v vd, (rs1), vs2, vm # Indexed-unordered segment loads - vloxseg<nf>ei<eew>.v vd, (rs1), vs2, vm # Indexed-ordered segment loads - vsuxseg<nf>ei<eew>.v vs3, (rs1), vs2, vm # Indexed-unordered segment stores - vsoxseg<nf>ei<eew>.v vs3, (rs1), vs2, vm # Indexed-ordered segment stores +# Format +vluxseg<nf>ei<eew>.v vd, (rs1), vs2, vm # Indexed-unordered segment loads +vloxseg<nf>ei<eew>.v vd, (rs1), vs2, vm # Indexed-ordered segment loads +vsuxseg<nf>ei<eew>.v vs3, (rs1), vs2, vm # Indexed-unordered segment stores +vsoxseg<nf>ei<eew>.v vs3, (rs1), vs2, vm # Indexed-ordered segment stores - # Examples - vsetvli a1, t0, e8, ta, ma - vluxseg3ei8.v v4, (x5), v3 # Load bytes at addresses x5+v3[i] into v4[i], - # and bytes at addresses x5+v3[i]+1 into v5[i], - # and bytes at addresses x5+v3[i]+2 into v6[i]. +# Examples +vsetvli a1, t0, e8, m1, ta, ma +vluxseg3ei8.v v4, (x5), v3 # Load bytes at addresses x5+v3[i] into v4[i], + # and bytes at addresses x5+v3[i]+1 into v5[i], + # and bytes at addresses x5+v3[i]+2 into v6[i]. 
- # Examples - vsetvli a1, t0, e32, ta, ma - vsuxseg2ei32.v v2, (x5), v5 # Store words from v2[i] to address x5+v5[i] - # and words from v3[i] to address x5+v5[i]+4 +# Examples +vsetvli a1, t0, e32, m1, ta, ma +vsuxseg2ei32.v v2, (x5), v5 # Store words from v2[i] to address x5+v5[i] + # and words from v3[i] to address x5+v5[i]+4 ---- For vector indexed segment loads, the destination vector register @@ -2060,44 +2067,41 @@ environments can mandate the minimum alignment requirements to support an ABI. ---- - # Format of whole register load and store instructions. - vl1r.v v3, (a0) # Pseudoinstruction equal to vl1re8.v +# Format of whole register load and store instructions. +vl1r.v v3, (a0) # Pseudoinstruction equal to vl1re8.v - vl1re8.v v3, (a0) # Load v3 with VLEN/8 bytes held at address in a0 - vl1re16.v v3, (a0) # Load v3 with VLEN/16 halfwords held at address in a0 - vl1re32.v v3, (a0) # Load v3 with VLEN/32 words held at address in a0 - vl1re64.v v3, (a0) # Load v3 with VLEN/64 doublewords held at address in a0 +vl1re8.v v3, (a0) # Load v3 with VLEN/8 bytes held at address in a0 +vl1re16.v v3, (a0) # Load v3 with VLEN/16 halfwords held at address in a0 +vl1re32.v v3, (a0) # Load v3 with VLEN/32 words held at address in a0 +vl1re64.v v3, (a0) # Load v3 with VLEN/64 doublewords held at address in a0 - vl2r.v v2, (a0) # Pseudoinstruction equal to vl2re8.v +vl2r.v v2, (a0) # Pseudoinstruction equal to vl2re8.v - vl2re8.v v2, (a0) # Load v2-v3 with 2*VLEN/8 bytes from address in a0 - vl2re16.v v2, (a0) # Load v2-v3 with 2*VLEN/16 halfwords held at address in a0 - vl2re32.v v2, (a0) # Load v2-v3 with 2*VLEN/32 words held at address in a0 - vl2re64.v v2, (a0) # Load v2-v3 with 2*VLEN/64 doublewords held at address in a0 +vl2re8.v v2, (a0) # Load v2-v3 with 2*VLEN/8 bytes from address in a0 +vl2re16.v v2, (a0) # Load v2-v3 with 2*VLEN/16 halfwords held at address in a0 +vl2re32.v v2, (a0) # Load v2-v3 with 2*VLEN/32 words held at address in a0 +vl2re64.v v2, (a0) # 
Load v2-v3 with 2*VLEN/64 doublewords held at address in a0 - vl4r.v v4, (a0) # Pseudoinstruction equal to vl4re8.v +vl4r.v v4, (a0) # Pseudoinstruction equal to vl4re8.v - vl4re8.v v4, (a0) # Load v4-v7 with 4*VLEN/8 bytes from address in a0 - vl4re16.v v4, (a0) - vl4re32.v v4, (a0) - vl4re64.v v4, (a0) +vl4re8.v v4, (a0) # Load v4-v7 with 4*VLEN/8 bytes from address in a0 +vl4re16.v v4, (a0) +vl4re32.v v4, (a0) +vl4re64.v v4, (a0) - vl8r.v v8, (a0) # Pseudoinstruction equal to vl8re8.v +vl8r.v v8, (a0) # Pseudoinstruction equal to vl8re8.v - vl8re8.v v8, (a0) # Load v8-v15 with 8*VLEN/8 bytes from address in a0 - vl8re16.v v8, (a0) - vl8re32.v v8, (a0) - vl8re64.v v8, (a0) +vl8re8.v v8, (a0) # Load v8-v15 with 8*VLEN/8 bytes from address in a0 +vl8re16.v v8, (a0) +vl8re32.v v8, (a0) +vl8re64.v v8, (a0) - vs1r.v v3, (a1) # Store v3 to address in a1 - vs2r.v v2, (a1) # Store v2-v3 to address in a1 - vs4r.v v4, (a1) # Store v4-v7 to address in a1 - vs8r.v v8, (a1) # Store v8-v15 to address in a1 +vs1r.v v3, (a1) # Store v3 to address in a1 +vs2r.v v2, (a1) # Store v2-v3 to address in a1 +vs4r.v v4, (a1) # Store v4-v7 to address in a1 +vs8r.v v8, (a1) # Store v8-v15 to address in a1 ---- -NOTE: Implementations should raise illegal instruction exceptions on -`vl<nf>r` instructions for EEW values that are not supported. - NOTE: We have considered adding a whole register mask load instruction (`vl1rm.v`) but have decided to omit from initial extension. The primary purpose would be to inform the microarchitecture that the data @@ -2109,17 +2113,17 @@ following vector instruction needs a new SEW/LMUL. 
So, in best case only two instructions (of which only one performs vector operations) are needed to synthesize the effect of the dedicated instruction: ---- - csrr t0, vl # Save current vl (potentially not needed) - vsetvli t1, x0, e8, m8, ta, ma # Maximum VLMAX - vlm.v v0, (a0) # Load mask register - vsetvli x0, t0, <new type> # Restore vl (potentially already present) +csrr t0, vl # Save current vl (potentially not needed) +vsetvli t1, x0, e8, m8, ta, ma # Maximum VLMAX +vlm.v v0, (a0) # Load mask register +vsetvli x0, t0, <new type> # Restore vl (potentially already present) ---- === Vector Memory Alignment Constraints If an element accessed by a vector memory instruction is not naturally aligned to the size of the element, either the element is transferred -successfully or an address misaligned exception is raised on that +successfully or an address-misaligned exception is raised on that element. Support for misaligned vector memory accesses is independent of an @@ -2172,7 +2176,7 @@ The vector arithmetic instructions use a new major opcode (OP-V = 1010111~2~) which neighbors OP-FP. The three-bit `funct3` field is used to define sub-categories of vector instructions. -include::images/wavedrom/valu-format.adoc[] +include::images/wavedrom/valu-format.edn[] [[sec-arithmetic-encoding]] ==== Vector Arithmetic Instruction encoding @@ -2306,7 +2310,7 @@ The first vector register group operand can be either single or double-width. ---- -Assembly syntax pattern for vector widening arithmetic instructions +# Assembly syntax pattern for vector widening arithmetic instructions # Double-width result, two single-width sources: 2*SEW = SEW op SEW vwop.vv vd, vs2, vs1, vm # integer vector-vector vd[i] = vs2[i] op vs1[i] @@ -2325,7 +2329,7 @@ NOTE: The floating-point widening operations were changed to `vfw*` from `vwf*` to be more consistent with any scalar widening floating-point operations that will be written as `fw*`. 
-Widening instruction encodings must follow the constraints in Section +Widening instruction encodings must follow the constraints in <<sec-vec-operands>>. [[sec-narrowing]] @@ -2360,7 +2364,7 @@ vnop.wv vd, vs2, vs1, vm # integer vector-vector vd[i] = vs2[i] op vs1[i] vnop.wx vd, vs2, rs1, vm # integer vector-scalar vd[i] = vs2[i] op x[rs1] ---- -Narrowing instruction encodings must follow the constraints in Section +Narrowing instruction encodings must follow the constraints in <<sec-vec-operands>>. [[sec-vector-integer]] @@ -2469,7 +2473,7 @@ the second to generate the carry output (single bit encoded as a mask boolean). The carry inputs and outputs are represented using the mask register -layout as described in Section <<sec-mask-register-layout>>. Due to +layout as described in <<sec-mask-register-layout>>. Due to encoding constraints, the carry input must come from the implicit `v0` register, but carry outputs can be written to any vector register that respects the source/destination overlap restrictions. @@ -2482,9 +2486,9 @@ Encodings corresponding to the unmasked versions (`vm=1`) are reserved. `vmadc` and `vmsbc` add or subtract the source operands, optionally add the carry-in or subtract the borrow-in if masked (`vm=0`), and -write the result back to mask register `vd`. If unmasked (`vm=1`), -there is no carry-in or borrow-in. These instructions operate on and -write back all body elements, even if masked. Because these +write the resulting carry-out or borrow-out back to mask register `vd`. +If unmasked (`vm=1`), there is no carry-in or borrow-in. These instructions +operate on and write back all body elements, even if masked. Because these instructions produce a mask value, they always operate with a tail-agnostic policy. @@ -2526,10 +2530,10 @@ instructions with unchanged inputs, destructive accumulations will require an additional move to obtain correct results. 
----
- # Example multi-word arithmetic sequence, accumulating into v4
- vmadc.vvm v1, v4, v8, v0 # Get carry into temp register v1
- vadc.vvm v4, v4, v8, v0 # Calc new sum
- vmmv.m v0, v1 # Move temp carry into v0 for next word
+# Example multi-word arithmetic sequence, accumulating into v4
+vmadc.vvm v1, v4, v8, v0 # Get carry into temp register v1
+vadc.vvm v4, v4, v8, v0 # Calc new sum
+vmmv.m v0, v1 # Move temp carry into v0 for next word
----
The subtract with borrow instruction `vsbc` performs the equivalent
@@ -2537,27 +2541,27 @@ function to support long word arithmetic for subtraction. There are no subtract with immediate instructions.
----
- # Produce difference with borrow.
+# Produce difference with borrow.
- # vd[i] = vs2[i] - vs1[i] - v0.mask[i]
- vsbc.vvm vd, vs2, vs1, v0 # Vector-vector
+# vd[i] = vs2[i] - vs1[i] - v0.mask[i]
+vsbc.vvm vd, vs2, vs1, v0 # Vector-vector
- # vd[i] = vs2[i] - x[rs1] - v0.mask[i]
- vsbc.vxm vd, vs2, rs1, v0 # Vector-scalar
+# vd[i] = vs2[i] - x[rs1] - v0.mask[i]
+vsbc.vxm vd, vs2, rs1, v0 # Vector-scalar
- # Produce borrow out in mask register format
+# Produce borrow out in mask register format
- # vd.mask[i] = borrow_out(vs2[i] - vs1[i] - v0.mask[i])
- vmsbc.vvm vd, vs2, vs1, v0 # Vector-vector
+# vd.mask[i] = borrow_out(vs2[i] - vs1[i] - v0.mask[i])
+vmsbc.vvm vd, vs2, vs1, v0 # Vector-vector
- # vd.mask[i] = borrow_out(vs2[i] - x[rs1] - v0.mask[i])
- vmsbc.vxm vd, vs2, rs1, v0 # Vector-scalar
+# vd.mask[i] = borrow_out(vs2[i] - x[rs1] - v0.mask[i])
+vmsbc.vxm vd, vs2, rs1, v0 # Vector-scalar
- # vd.mask[i] = borrow_out(vs2[i] - vs1[i])
- vmsbc.vv vd, vs2, vs1 # Vector-vector, no borrow-in
+# vd.mask[i] = borrow_out(vs2[i] - vs1[i])
+vmsbc.vv vd, vs2, vs1 # Vector-vector, no borrow-in
- # vd.mask[i] = borrow_out(vs2[i] - x[rs1])
- vmsbc.vx vd, vs2, rs1 # Vector-scalar, no borrow-in
+# vd.mask[i] = borrow_out(vs2[i] - x[rs1])
+vmsbc.vx vd, vs2, rs1 # Vector-scalar, no borrow-in
----
For `vmsbc`, the borrow is defined
to be 1 iff the difference, prior to @@ -2650,7 +2654,7 @@ pseudoinstruction is provided `vncvt.x.x.w vd,vs,vm` = `vnsrl.wx vd,vs,x0,vm`. The following integer compare instructions write 1 to the destination mask register element if the comparison evaluates to true, and 0 otherwise. The destination mask vector is always held in a single -vector register, with a layout of elements as described in Section +vector register, with a layout of elements as described in <<sec-mask-register-layout>>. The destination mask vector register may be the same as the source vector mask register (`v0`). @@ -2807,9 +2811,9 @@ masked va >= x, any vd Compares effectively AND in the mask under a mask-undisturbed policy if the destination register is `v0`, e.g., ---- - # (a < b) && (b < c) in two instructions when mask-undisturbed - vmslt.vv v0, va, vb # All body elements written - vmslt.vv v0, vb, vc, v0.t # Only update at set mask +# (a < b) && (b < c) in two instructions when mask-undisturbed +vmslt.vv v0, va, vb # All body elements written +vmslt.vv v0, vb, vc, v0.t # Only update at set mask ---- Compares write mask registers, and so always operate under a @@ -2883,21 +2887,21 @@ standard scalar integer multiply/divides, with the same results for extreme inputs. ---- - # Unsigned divide. - vdivu.vv vd, vs2, vs1, vm # Vector-vector - vdivu.vx vd, vs2, rs1, vm # vector-scalar +# Unsigned divide. 
+vdivu.vv vd, vs2, vs1, vm # Vector-vector +vdivu.vx vd, vs2, rs1, vm # vector-scalar - # Signed divide - vdiv.vv vd, vs2, vs1, vm # Vector-vector - vdiv.vx vd, vs2, rs1, vm # vector-scalar +# Signed divide +vdiv.vv vd, vs2, vs1, vm # Vector-vector +vdiv.vx vd, vs2, rs1, vm # vector-scalar - # Unsigned remainder - vremu.vv vd, vs2, vs1, vm # Vector-vector - vremu.vx vd, vs2, rs1, vm # vector-scalar +# Unsigned remainder +vremu.vv vd, vs2, vs1, vm # Vector-vector +vremu.vx vd, vs2, rs1, vm # vector-scalar - # Signed remainder - vrem.vv vd, vs2, vs1, vm # Vector-vector - vrem.vx vd, vs2, rs1, vm # vector-scalar +# Signed remainder +vrem.vv vd, vs2, vs1, vm # Vector-vector +vrem.vx vd, vs2, rs1, vm # vector-scalar ---- NOTE: The decision to include integer divide and remainder was @@ -2944,16 +2948,16 @@ floating-point `fnmsub` instruction definition. Similarly for the ---- # Integer multiply-add, overwrite addend -vmacc.vv vd, vs1, vs2, vm # vd[i] = +(vs1[i] * vs2[i]) + vd[i] -vmacc.vx vd, rs1, vs2, vm # vd[i] = +(x[rs1] * vs2[i]) + vd[i] +vmacc.vv vd, vs1, vs2, vm # vd[i] = (vs1[i] * vs2[i]) + vd[i] +vmacc.vx vd, rs1, vs2, vm # vd[i] = (x[rs1] * vs2[i]) + vd[i] # Integer multiply-sub, overwrite minuend vnmsac.vv vd, vs1, vs2, vm # vd[i] = -(vs1[i] * vs2[i]) + vd[i] vnmsac.vx vd, rs1, vs2, vm # vd[i] = -(x[rs1] * vs2[i]) + vd[i] # Integer multiply-add, overwrite multiplicand -vmadd.vv vd, vs1, vs2, vm # vd[i] = (vs1[i] * vd[i]) + vs2[i] -vmadd.vx vd, rs1, vs2, vm # vd[i] = (x[rs1] * vd[i]) + vs2[i] +vmadd.vv vd, vs1, vs2, vm # vd[i] = (vs1[i] * vd[i]) + vs2[i] +vmadd.vx vd, rs1, vs2, vm # vd[i] = (x[rs1] * vd[i]) + vs2[i] # Integer multiply-sub, overwrite multiplicand vnmsub.vv vd, vs1, vs2, vm # vd[i] = -(vs1[i] * vd[i]) + vs2[i] @@ -2969,19 +2973,19 @@ multiply operands are supported. 
---- # Widening unsigned-integer multiply-add, overwrite addend -vwmaccu.vv vd, vs1, vs2, vm # vd[i] = +(vs1[i] * vs2[i]) + vd[i] -vwmaccu.vx vd, rs1, vs2, vm # vd[i] = +(x[rs1] * vs2[i]) + vd[i] +vwmaccu.vv vd, vs1, vs2, vm # vd[i] = (vs1[i] * vs2[i]) + vd[i] +vwmaccu.vx vd, rs1, vs2, vm # vd[i] = (x[rs1] * vs2[i]) + vd[i] # Widening signed-integer multiply-add, overwrite addend -vwmacc.vv vd, vs1, vs2, vm # vd[i] = +(vs1[i] * vs2[i]) + vd[i] -vwmacc.vx vd, rs1, vs2, vm # vd[i] = +(x[rs1] * vs2[i]) + vd[i] +vwmacc.vv vd, vs1, vs2, vm # vd[i] = (vs1[i] * vs2[i]) + vd[i] +vwmacc.vx vd, rs1, vs2, vm # vd[i] = (x[rs1] * vs2[i]) + vd[i] # Widening signed-unsigned-integer multiply-add, overwrite addend -vwmaccsu.vv vd, vs1, vs2, vm # vd[i] = +(signed(vs1[i]) * unsigned(vs2[i])) + vd[i] -vwmaccsu.vx vd, rs1, vs2, vm # vd[i] = +(signed(x[rs1]) * unsigned(vs2[i])) + vd[i] +vwmaccsu.vv vd, vs1, vs2, vm # vd[i] = (signed(vs1[i]) * unsigned(vs2[i])) + vd[i] +vwmaccsu.vx vd, rs1, vs2, vm # vd[i] = (signed(x[rs1]) * unsigned(vs2[i])) + vd[i] # Widening unsigned-signed-integer multiply-add, overwrite addend -vwmaccus.vx vd, rs1, vs2, vm # vd[i] = +(unsigned(x[rs1]) * signed(vs2[i])) + vd[i] +vwmaccus.vx vd, rs1, vs2, vm # vd[i] = (unsigned(x[rs1]) * signed(vs2[i])) + vd[i] ---- ==== Vector Integer Merge Instructions @@ -3039,7 +3043,7 @@ can shuffle the internal representation according to SEW. Implementations that do not internally reorganize data can dynamically elide this instruction, and treat as a NOP. -NOTE: The `vmv.v.v vd. vd` instruction is not a RISC-V HINT as a +NOTE: The `vmv.v.v vd, vd` instruction is not a RISC-V HINT as a tail-agnostic setting may cause an architectural state change on some implementations. @@ -3093,6 +3097,7 @@ vssub.vx vd, vs2, rs1, vm # vector-scalar The averaging add and subtract instructions right shift the result by one bit and round off the result according to the setting in `vxrm`. 
+Computation is performed in infinite precision before rounding and truncating. Both unsigned and signed versions are provided. For `vaaddu` and `vaadd` there can be no overflow in the result. For `vasub` and `vasubu`, overflow is ignored and the result wraps around. @@ -3132,7 +3137,7 @@ is set. ---- # Signed saturating and rounding fractional multiply -# See vxrm description for rounding calculation +# See vxrm description for rounding calculation vsmul.vv vd, vs2, vs1, vm # vd[i] = clip(roundoff_signed(vs2[i]*vs1[i], SEW-1)) vsmul.vx vd, vs2, rs1, vm # vd[i] = clip(roundoff_signed(vs2[i]*x[rs1], SEW-1)) ---- @@ -3188,14 +3193,14 @@ used to control the right shift amount, which provides the scaling. ---- # Narrowing unsigned clip # SEW 2*SEW SEW - vnclipu.wv vd, vs2, vs1, vm # vd[i] = clip(roundoff_unsigned(vs2[i], vs1[i])) - vnclipu.wx vd, vs2, rs1, vm # vd[i] = clip(roundoff_unsigned(vs2[i], x[rs1])) - vnclipu.wi vd, vs2, uimm, vm # vd[i] = clip(roundoff_unsigned(vs2[i], uimm)) +vnclipu.wv vd, vs2, vs1, vm # vd[i] = clip(roundoff_unsigned(vs2[i], vs1[i])) +vnclipu.wx vd, vs2, rs1, vm # vd[i] = clip(roundoff_unsigned(vs2[i], x[rs1])) +vnclipu.wi vd, vs2, uimm, vm # vd[i] = clip(roundoff_unsigned(vs2[i], uimm)) # Narrowing signed clip - vnclip.wv vd, vs2, vs1, vm # vd[i] = clip(roundoff_signed(vs2[i], vs1[i])) - vnclip.wx vd, vs2, rs1, vm # vd[i] = clip(roundoff_signed(vs2[i], x[rs1])) - vnclip.wi vd, vs2, uimm, vm # vd[i] = clip(roundoff_signed(vs2[i], uimm)) +vnclip.wv vd, vs2, vs1, vm # vd[i] = clip(roundoff_signed(vs2[i], vs1[i])) +vnclip.wx vd, vs2, rs1, vm # vd[i] = clip(roundoff_signed(vs2[i], x[rs1])) +vnclip.wi vd, vs2, uimm, vm # vd[i] = clip(roundoff_signed(vs2[i], uimm)) ---- For `vnclipu`/`vnclip`, the rounding mode is specified in the `vxrm` @@ -3211,7 +3216,7 @@ into an unsigned destination. 
A sequence of two vector instructions that first removes negative numbers by performing a max against 0 using `vmax` then clips the resulting unsigned value into the destination using `vnclipu` can be used if setting `vxsat` value for -negative numbers is not required. A `vsetvli` is required inbetween +negative numbers is not required. A `vsetvli` is required between these two instructions to change SEW. For `vnclip`, the shifted rounded source value is treated as a signed @@ -3246,7 +3251,7 @@ half-precision floating-point support. If the floating-point unit status field `mstatus.FS` is `Off` then any attempt to execute a vector floating-point instruction will raise an -illegal instruction exception. Any vector floating-point instruction +illegal-instruction exception. Any vector floating-point instruction that modifies any floating-point extension state (i.e., floating-point CSRs or `f` registers) must set `mstatus.FS` to `Dirty`. @@ -3254,7 +3259,7 @@ If the hypervisor extension is implemented and V=1, the `vsstatus.FS` field is additionally in effect for vector floating-point instructions. If `vsstatus.FS` or `mstatus.FS` is `Off` then any attempt to execute a vector floating-point instruction will raise an -illegal instruction exception. Any vector floating-point instruction +illegal-instruction exception. Any vector floating-point instruction that modifies any floating-point extension state (i.e., floating-point CSRs or `f` registers) must set both `mstatus.FS` and `vsstatus.FS` to `Dirty`. @@ -3262,7 +3267,7 @@ The vector floating-point instructions have the same behavior as the scalar floating-point instructions with regard to NaNs. Scalar values for floating-point vector-scalar operations are sourced -as described in Section <<sec-arithmetic-encoding>>. +as described in <<sec-arithmetic-encoding>>. ==== Vector Floating-Point Exception Flags @@ -3273,14 +3278,14 @@ elements do not set FP exception flags. 
==== Vector Single-Width Floating-Point Add/Subtract Instructions
----
- # Floating-point add
- vfadd.vv vd, vs2, vs1, vm # Vector-vector
- vfadd.vf vd, vs2, rs1, vm # vector-scalar
+# Floating-point add
+vfadd.vv vd, vs2, vs1, vm # Vector-vector
+vfadd.vf vd, vs2, rs1, vm # vector-scalar
- # Floating-point subtract
- vfsub.vv vd, vs2, vs1, vm # Vector-vector
- vfsub.vf vd, vs2, rs1, vm # Vector-scalar vd[i] = vs2[i] - f[rs1]
- vfrsub.vf vd, vs2, rs1, vm # Scalar-vector vd[i] = f[rs1] - vs2[i]
+# Floating-point subtract
+vfsub.vv vd, vs2, vs1, vm # Vector-vector
+vfsub.vf vd, vs2, rs1, vm # Vector-scalar vd[i] = vs2[i] - f[rs1]
+vfrsub.vf vd, vs2, rs1, vm # Scalar-vector vd[i] = f[rs1] - vs2[i]
----
==== Vector Widening Floating-Point Add/Subtract Instructions
@@ -3302,16 +3307,16 @@ vfwsub.wf vd, vs2, rs1, vm # vector-scalar
==== Vector Single-Width Floating-Point Multiply/Divide Instructions
----
- # Floating-point multiply
- vfmul.vv vd, vs2, vs1, vm # Vector-vector
- vfmul.vf vd, vs2, rs1, vm # vector-scalar
+# Floating-point multiply
+vfmul.vv vd, vs2, vs1, vm # Vector-vector
+vfmul.vf vd, vs2, rs1, vm # vector-scalar
- # Floating-point divide
- vfdiv.vv vd, vs2, vs1, vm # Vector-vector
- vfdiv.vf vd, vs2, rs1, vm # vector-scalar
+# Floating-point divide
+vfdiv.vv vd, vs2, vs1, vm # Vector-vector
+vfdiv.vf vd, vs2, rs1, vm # vector-scalar
- # Reverse floating-point divide vector = scalar / vector
- vfrdiv.vf vd, vs2, rs1, vm # scalar-vector, vd[i] = f[rs1]/vs2[i]
+# Reverse floating-point divide vector = scalar / vector
+vfrdiv.vf vd, vs2, rs1, vm # scalar-vector, vd[i] = f[rs1]/vs2[i]
----
==== Vector Widening Floating-Point Multiply
@@ -3330,32 +3335,32 @@ addend or the first multiplicand.
----
# FP multiply-accumulate, overwrites addend
-vfmacc.vv vd, vs1, vs2, vm # vd[i] = +(vs1[i] * vs2[i]) + vd[i]
-vfmacc.vf vd, rs1, vs2, vm # vd[i] = +(f[rs1] * vs2[i]) + vd[i]
+vfmacc.vv vd, vs1, vs2, vm # vd[i] = (vs1[i] * vs2[i]) + vd[i]
+vfmacc.vf vd, rs1, vs2, vm # vd[i] = (f[rs1] * vs2[i]) + vd[i]
# FP negate-(multiply-accumulate), overwrites subtrahend
vfnmacc.vv vd, vs1, vs2, vm # vd[i] = -(vs1[i] * vs2[i]) - vd[i]
vfnmacc.vf vd, rs1, vs2, vm # vd[i] = -(f[rs1] * vs2[i]) - vd[i]
# FP multiply-subtract-accumulator, overwrites subtrahend
-vfmsac.vv vd, vs1, vs2, vm # vd[i] = +(vs1[i] * vs2[i]) - vd[i]
-vfmsac.vf vd, rs1, vs2, vm # vd[i] = +(f[rs1] * vs2[i]) - vd[i]
+vfmsac.vv vd, vs1, vs2, vm # vd[i] = (vs1[i] * vs2[i]) - vd[i]
+vfmsac.vf vd, rs1, vs2, vm # vd[i] = (f[rs1] * vs2[i]) - vd[i]
# FP negate-(multiply-subtract-accumulator), overwrites minuend
vfnmsac.vv vd, vs1, vs2, vm # vd[i] = -(vs1[i] * vs2[i]) + vd[i]
vfnmsac.vf vd, rs1, vs2, vm # vd[i] = -(f[rs1] * vs2[i]) + vd[i]
# FP multiply-add, overwrites multiplicand
-vfmadd.vv vd, vs1, vs2, vm # vd[i] = +(vs1[i] * vd[i]) + vs2[i]
-vfmadd.vf vd, rs1, vs2, vm # vd[i] = +(f[rs1] * vd[i]) + vs2[i]
+vfmadd.vv vd, vs1, vs2, vm # vd[i] = (vs1[i] * vd[i]) + vs2[i]
+vfmadd.vf vd, rs1, vs2, vm # vd[i] = (f[rs1] * vd[i]) + vs2[i]
# FP negate-(multiply-add), overwrites multiplicand
vfnmadd.vv vd, vs1, vs2, vm # vd[i] = -(vs1[i] * vd[i]) - vs2[i]
vfnmadd.vf vd, rs1, vs2, vm # vd[i] = -(f[rs1] * vd[i]) - vs2[i]
# FP multiply-sub, overwrites multiplicand
-vfmsub.vv vd, vs1, vs2, vm # vd[i] = +(vs1[i] * vd[i]) - vs2[i]
-vfmsub.vf vd, rs1, vs2, vm # vd[i] = +(f[rs1] * vd[i]) - vs2[i]
+vfmsub.vv vd, vs1, vs2, vm # vd[i] = (vs1[i] * vd[i]) - vs2[i]
+vfmsub.vf vd, rs1, vs2, vm # vd[i] = (f[rs1] * vd[i]) - vs2[i]
# FP negate-(multiply-sub), overwrites multiplicand
vfnmsub.vv vd, vs1, vs2, vm # vd[i] = -(vs1[i] * vd[i]) + vs2[i]
@@ -3375,16 +3380,16 @@ all SEW wide, while the addend and destination is 2*SEW bits wide.
---- # FP widening multiply-accumulate, overwrites addend -vfwmacc.vv vd, vs1, vs2, vm # vd[i] = +(vs1[i] * vs2[i]) + vd[i] -vfwmacc.vf vd, rs1, vs2, vm # vd[i] = +(f[rs1] * vs2[i]) + vd[i] +vfwmacc.vv vd, vs1, vs2, vm # vd[i] = (vs1[i] * vs2[i]) + vd[i] +vfwmacc.vf vd, rs1, vs2, vm # vd[i] = (f[rs1] * vs2[i]) + vd[i] # FP widening negate-(multiply-accumulate), overwrites addend vfwnmacc.vv vd, vs1, vs2, vm # vd[i] = -(vs1[i] * vs2[i]) - vd[i] vfwnmacc.vf vd, rs1, vs2, vm # vd[i] = -(f[rs1] * vs2[i]) - vd[i] # FP widening multiply-subtract-accumulator, overwrites addend -vfwmsac.vv vd, vs1, vs2, vm # vd[i] = +(vs1[i] * vs2[i]) - vd[i] -vfwmsac.vf vd, rs1, vs2, vm # vd[i] = +(f[rs1] * vs2[i]) - vd[i] +vfwmsac.vv vd, vs1, vs2, vm # vd[i] = (vs1[i] * vs2[i]) - vd[i] +vfwmsac.vf vd, rs1, vs2, vm # vd[i] = (f[rs1] * vs2[i]) - vd[i] # FP widening negate-(multiply-subtract-accumulator), overwrites addend vfwnmsac.vv vd, vs1, vs2, vm # vd[i] = -(vs1[i] * vs2[i]) + vd[i] @@ -3396,15 +3401,15 @@ vfwnmsac.vf vd, rs1, vs2, vm # vd[i] = -(f[rs1] * vs2[i]) + vd[i] This is a unary vector-vector instruction. ---- - # Floating-point square root - vfsqrt.v vd, vs2, vm # Vector-vector square root +# Floating-point square root +vfsqrt.v vd, vs2, vm # Vector-vector square root ---- ==== Vector Floating-Point Reciprocal Square-Root Estimate Instruction ---- - # Floating-point reciprocal square-root estimate to 7 bits. - vfrsqrt7.v vd, vs2, vm +# Floating-point reciprocal square-root estimate to 7 bits. +vfrsqrt7.v vd, vs2, vm ---- This is a unary vector-vector instruction that returns an estimate of @@ -3460,7 +3465,7 @@ The following table gives the seven MSBs of the output significand as a function of the LSB of the normalized input exponent and the six MSBs of the normalized input significand; the other bits of the output significand are zero. 
-include::images/wavedrom/vfrsqrt7.adoc[] +include::images/wavedrom/vfrsqrt7.edn[] NOTE: For example, when SEW=32, vfrsqrt7(0x00718abc ({approx} 1.043e-38)) = 0x5f080000 ({approx} 9.800e18), and vfrsqrt7(0x7f765432 ({approx} 3.274e38)) = 0x1f820000 ({approx} 5.506e-20). @@ -3472,8 +3477,8 @@ with greater estimate accuracy. ==== Vector Floating-Point Reciprocal Estimate Instruction ---- - # Floating-point reciprocal estimate to 7 bits. - vfrec7.v vd, vs2, vm +# Floating-point reciprocal estimate to 7 bits. +vfrec7.v vd, vs2, vm ---- NOTE: An earlier draft version had used the assembler name `vfrece7` @@ -3547,7 +3552,7 @@ The following table gives the seven MSBs of the normalized output significand as a function of the seven MSBs of the normalized input significand; the other bits of the normalized output significand are zero. -include::images/wavedrom/vfrec7.adoc[] +include::images/wavedrom/vfrec7.edn[] If the normalized output exponent is 0 or -1, the result is subnormal: the output exponent is 0, and the output significand is given by concatenating @@ -3572,13 +3577,13 @@ in version 2.2 of the RISC-V F/D/Q extension: they perform the `minimumNumber` or `maximumNumber` operation on active elements. ---- - # Floating-point minimum - vfmin.vv vd, vs2, vs1, vm # Vector-vector - vfmin.vf vd, vs2, rs1, vm # vector-scalar +# Floating-point minimum +vfmin.vv vd, vs2, vs1, vm # Vector-vector +vfmin.vf vd, vs2, rs1, vm # vector-scalar - # Floating-point maximum - vfmax.vv vd, vs2, vs1, vm # Vector-vector - vfmax.vf vd, vs2, rs1, vm # vector-scalar +# Floating-point maximum +vfmax.vv vd, vs2, vs1, vm # Vector-vector +vfmax.vf vd, vs2, rs1, vm # vector-scalar ---- ==== Vector Floating-Point Sign-Injection Instructions @@ -3587,14 +3592,14 @@ Vector versions of the scalar sign-injection instructions. The result takes all bits except the sign bit from the vector `vs2` operands. 
----
- vfsgnj.vv vd, vs2, vs1, vm # Vector-vector
- vfsgnj.vf vd, vs2, rs1, vm # vector-scalar
+vfsgnj.vv vd, vs2, vs1, vm # Vector-vector
+vfsgnj.vf vd, vs2, rs1, vm # vector-scalar
- vfsgnjn.vv vd, vs2, vs1, vm # Vector-vector
- vfsgnjn.vf vd, vs2, rs1, vm # vector-scalar
+vfsgnjn.vv vd, vs2, vs1, vm # Vector-vector
+vfsgnjn.vf vd, vs2, rs1, vm # vector-scalar
- vfsgnjx.vv vd, vs2, vs1, vm # Vector-vector
- vfsgnjx.vf vd, vs2, rs1, vm # vector-scalar
+vfsgnjx.vv vd, vs2, vs1, vm # Vector-vector
+vfsgnjx.vf vd, vs2, rs1, vm # vector-scalar
----
NOTE: A vector of floating-point values can be negated using a
@@ -3611,7 +3616,7 @@ pseudoinstruction is provided: `vfabs.v vd,vs` = `vfsgnjx.vv vd,vs,vs`.
These vector FP compare instructions compare two source operands and write the comparison result to a mask register. The destination mask vector is always held in a single vector register, with a layout of
-elements as described in Section <<sec-mask-register-layout>>. The
+elements as described in <<sec-mask-register-layout>>. The
destination mask vector register may be the same as the source vector mask register (`v0`). Compares write mask registers, and so always operate under a tail-agnostic policy.
@@ -3626,27 +3631,27 @@ operand is NaN, whereas the other compares write 0 when either operand is NaN.
----
- # Compare equal
- vmfeq.vv vd, vs2, vs1, vm # Vector-vector
- vmfeq.vf vd, vs2, rs1, vm # vector-scalar
+# Compare equal
+vmfeq.vv vd, vs2, vs1, vm # Vector-vector
+vmfeq.vf vd, vs2, rs1, vm # vector-scalar
- # Compare not equal
- vmfne.vv vd, vs2, vs1, vm # Vector-vector
- vmfne.vf vd, vs2, rs1, vm # vector-scalar
+# Compare not equal
+vmfne.vv vd, vs2, vs1, vm # Vector-vector
+vmfne.vf vd, vs2, rs1, vm # vector-scalar
- # Compare less than
- vmflt.vv vd, vs2, vs1, vm # Vector-vector
- vmflt.vf vd, vs2, rs1, vm # vector-scalar
+# Compare less than
+vmflt.vv vd, vs2, vs1, vm # Vector-vector
+vmflt.vf vd, vs2, rs1, vm # vector-scalar
- # Compare less than or equal
- vmfle.vv vd, vs2, vs1, vm # Vector-vector
- vmfle.vf vd, vs2, rs1, vm # vector-scalar
+# Compare less than or equal
+vmfle.vv vd, vs2, vs1, vm # Vector-vector
+vmfle.vf vd, vs2, rs1, vm # vector-scalar
- # Compare greater than
- vmfgt.vf vd, vs2, rs1, vm # vector-scalar
+# Compare greater than
+vmfgt.vf vd, vs2, rs1, vm # vector-scalar
- # Compare greater than or equal
- vmfge.vf vd, vs2, rs1, vm # vector-scalar
+# Compare greater than or equal
+vmfge.vf vd, vs2, rs1, vm # vector-scalar
----
----
@@ -3675,11 +3680,11 @@ the comparand is a non-NaN constant, the middle two instructions can be omitted.
----
- # Example of implementing isgreater()
- vmfeq.vv v0, va, va # Only set where A is not NaN.
- vmfeq.vv v1, vb, vb # Only set where B is not NaN.
- vmand.mm v0, v0, v1 # Only set where A and B are ordered,
- vmfgt.vv v0, va, vb, v0.t # so only set flags on ordered values.
+# Example of implementing isgreater()
+vmfeq.vv v0, va, va # Only set where A is not NaN.
+vmfeq.vv v1, vb, vb # Only set where B is not NaN.
+vmand.mm v0, v0, v1 # Only set where A and B are ordered,
+vmfgt.vv v0, va, vb, v0.t # so only set flags on ordered values.
----
NOTE: In the above sequence, it is tempting to mask the second `vmfeq`
@@ -3694,7 +3699,7 @@ This is a unary vector-vector instruction that operates in the same way as the scalar classify instruction.
----
- vfclass.v vd, vs2, vm # Vector-vector
+vfclass.v vd, vs2, vm # Vector-vector
----
The 10-bit mask produced by this instruction is placed in the
@@ -3718,7 +3723,6 @@ register value is copied to the destination element.
vfmerge.vfm vd, vs2, rs1, v0 # vd[i] = v0.mask[i] ? f[rs1] : vs2[i]
----
-[[sec-vector-float-move]]
==== Vector Floating-Point Move Instruction
The vector floating-point move instruction __splats__ a floating-point
@@ -3885,15 +3889,15 @@ All operands and results of single-width reduction instructions have the same SEW width. Overflows wrap around on arithmetic sums.
----
- # Simple reductions, where [*] denotes all active elements:
- vredsum.vs vd, vs2, vs1, vm # vd[0] = sum( vs1[0] , vs2[*] )
- vredmaxu.vs vd, vs2, vs1, vm # vd[0] = maxu( vs1[0] , vs2[*] )
- vredmax.vs vd, vs2, vs1, vm # vd[0] = max( vs1[0] , vs2[*] )
- vredminu.vs vd, vs2, vs1, vm # vd[0] = minu( vs1[0] , vs2[*] )
- vredmin.vs vd, vs2, vs1, vm # vd[0] = min( vs1[0] , vs2[*] )
- vredand.vs vd, vs2, vs1, vm # vd[0] = and( vs1[0] , vs2[*] )
- vredor.vs vd, vs2, vs1, vm # vd[0] = or( vs1[0] , vs2[*] )
- vredxor.vs vd, vs2, vs1, vm # vd[0] = xor( vs1[0] , vs2[*] )
+# Simple reductions, where [*] denotes all active elements:
+vredsum.vs vd, vs2, vs1, vm # vd[0] = sum( vs1[0] , vs2[*] )
+vredmaxu.vs vd, vs2, vs1, vm # vd[0] = maxu( vs1[0] , vs2[*] )
+vredmax.vs vd, vs2, vs1, vm # vd[0] = max( vs1[0] , vs2[*] )
+vredminu.vs vd, vs2, vs1, vm # vd[0] = minu( vs1[0] , vs2[*] )
+vredmin.vs vd, vs2, vs1, vm # vd[0] = min( vs1[0] , vs2[*] )
+vredand.vs vd, vs2, vs1, vm # vd[0] = and( vs1[0] , vs2[*] )
+vredor.vs vd, vs2, vs1, vm # vd[0] = or( vs1[0] , vs2[*] )
+vredxor.vs vd, vs2, vs1, vm # vd[0] = xor( vs1[0] , vs2[*] )
----
[[sec-vector-integer-reduce-widen]]
@@ -3909,23 +3913,22 @@
elements before summing them. For both `vwredsumu.vs` and `vwredsum.vs`, overflows wrap around.
----
- # Unsigned sum reduction into double-width accumulator
- vwredsumu.vs vd, vs2, vs1, vm # 2*SEW = 2*SEW + sum(zero-extend(SEW))
+# Unsigned sum reduction into double-width accumulator
+vwredsumu.vs vd, vs2, vs1, vm # 2*SEW = 2*SEW + sum(zero-extend(SEW))
- # Signed sum reduction into double-width accumulator
- vwredsum.vs vd, vs2, vs1, vm # 2*SEW = 2*SEW + sum(sign-extend(SEW))
+# Signed sum reduction into double-width accumulator
+vwredsum.vs vd, vs2, vs1, vm # 2*SEW = 2*SEW + sum(sign-extend(SEW))
----
[[sec-vector-float-reduce]]
==== Vector Single-Width Floating-Point Reduction Instructions
----
- # Simple reductions.
- vfredosum.vs vd, vs2, vs1, vm # Ordered sum
- vfredusum.vs vd, vs2, vs1, vm # Unordered sum
- vfredmax.vs vd, vs2, vs1, vm # Maximum value
- vfredmin.vs vd, vs2, vs1, vm # Minimum value
-
+# Simple reductions.
+vfredosum.vs vd, vs2, vs1, vm # Ordered sum
+vfredusum.vs vd, vs2, vs1, vm # Unordered sum
+vfredmax.vs vd, vs2, vs1, vm # Maximum value
+vfredmin.vs vd, vs2, vs1, vm # Minimum value
----
NOTE: Older assembler mnemonic `vfredsum` is retained as alias for `vfredusum`.
@@ -3943,7 +3946,7 @@ where each addition operates identically to the scalar floating-point instructions in terms of raising exception flags and generating or propagating special values.
-NOTE: The ordered reduction supports compiler autovectorization, while
+NOTE: The ordered reduction supports compiler auto-vectorization, while
the unordered FP sum allows for faster implementations.
When the operation is masked (`vm=0`), the masked-off elements do not
@@ -3952,7 +3955,7 @@ affect the result or the exception flags.
NOTE: If no elements are active, no additions are performed, so the scalar in `vs1[0]` is simply copied to the destination register, without canonicalizing NaN values and without setting any exception flags.
This behavior preserves
-the handling of NaNs, exceptions, and rounding when autovectorizing a scalar
+the handling of NaNs, exceptions, and rounding when auto-vectorizing a scalar
summation loop.
===== Vector Unordered Single-Width Floating-Point Sum Reduction
@@ -4058,14 +4061,14 @@ Mask elements past `vl`, the tail elements, are always updated with a tail-agnostic policy.
----
- vmand.mm vd, vs2, vs1 # vd.mask[i] = vs2.mask[i] && vs1.mask[i]
- vmnand.mm vd, vs2, vs1 # vd.mask[i] = !(vs2.mask[i] && vs1.mask[i])
- vmandn.mm vd, vs2, vs1 # vd.mask[i] = vs2.mask[i] && !vs1.mask[i]
- vmxor.mm vd, vs2, vs1 # vd.mask[i] = vs2.mask[i] ^^ vs1.mask[i]
- vmor.mm vd, vs2, vs1 # vd.mask[i] = vs2.mask[i] || vs1.mask[i]
- vmnor.mm vd, vs2, vs1 # vd.mask[i] = !(vs2.mask[i] || vs1.mask[i])
- vmorn.mm vd, vs2, vs1 # vd.mask[i] = vs2.mask[i] || !vs1.mask[i]
- vmxnor.mm vd, vs2, vs1 # vd.mask[i] = !(vs2.mask[i] ^^ vs1.mask[i])
+vmand.mm vd, vs2, vs1 # vd.mask[i] = vs2.mask[i] && vs1.mask[i]
+vmnand.mm vd, vs2, vs1 # vd.mask[i] = !(vs2.mask[i] && vs1.mask[i])
+vmandn.mm vd, vs2, vs1 # vd.mask[i] = vs2.mask[i] && !vs1.mask[i]
+vmxor.mm vd, vs2, vs1 # vd.mask[i] = vs2.mask[i] ^^ vs1.mask[i]
+vmor.mm vd, vs2, vs1 # vd.mask[i] = vs2.mask[i] || vs1.mask[i]
+vmnor.mm vd, vs2, vs1 # vd.mask[i] = !(vs2.mask[i] || vs1.mask[i])
+vmorn.mm vd, vs2, vs1 # vd.mask[i] = vs2.mask[i] || !vs1.mask[i]
+vmxnor.mm vd, vs2, vs1 # vd.mask[i] = !(vs2.mask[i] ^^ vs1.mask[i])
----
NOTE: The previous assembler mnemonics `vmandnot` and `vmornot` have
@@ -4076,10 +4079,10 @@ mnemonics can be retained as assembler aliases for compatibility.
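As a reading aid, the eight mask-logical operations listed above can be modelled in a few lines of Python. This is only a sketch under stated assumptions: masks are represented as lists of 0/1 values covering the body elements, tail handling and `vstart` are not modelled, and the `MASK_OPS` table and `mask_op` helper are hypothetical names introduced here, not part of the specification.

```python
# Illustrative model of the mask-register logical instructions (not normative).
# Each mask is a list of 0/1 bits, one per body element; tail elements are ignored.
MASK_OPS = {
    "vmand.mm":  lambda a, b: a & b,        # vs2 && vs1
    "vmnand.mm": lambda a, b: 1 - (a & b),  # !(vs2 && vs1)
    "vmandn.mm": lambda a, b: a & (1 - b),  # vs2 && !vs1
    "vmxor.mm":  lambda a, b: a ^ b,        # vs2 ^^ vs1
    "vmor.mm":   lambda a, b: a | b,        # vs2 || vs1
    "vmnor.mm":  lambda a, b: 1 - (a | b),  # !(vs2 || vs1)
    "vmorn.mm":  lambda a, b: a | (1 - b),  # vs2 || !vs1
    "vmxnor.mm": lambda a, b: 1 - (a ^ b),  # !(vs2 ^^ vs1)
}


def mask_op(name, vs2, vs1):
    """Compute vd.mask[i] = op(vs2.mask[i], vs1.mask[i]) for every body element."""
    if len(vs2) != len(vs1):
        raise ValueError("mask operands must have the same length")
    op = MASK_OPS[name]
    return [op(a, b) for a, b in zip(vs2, vs1)]
```

For instance, `mask_op("vmandn.mm", [1, 1, 0, 0], [1, 0, 1, 0])` evaluates to `[0, 1, 0, 0]`, because only element 1 has its `vs2` bit set and its `vs1` bit clear.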
Several assembler pseudoinstructions are defined as shorthand for common uses of mask logical operations: ---- - vmmv.m vd, vs => vmand.mm vd, vs, vs # Copy mask register - vmclr.m vd => vmxor.mm vd, vd, vd # Clear mask register - vmset.m vd => vmxnor.mm vd, vd, vd # Set mask register - vmnot.m vd, vs => vmnand.mm vd, vs, vs # Invert bits +vmmv.m vd, vs => vmand.mm vd, vs, vs # Copy mask register +vmclr.m vd => vmxor.mm vd, vd, vd # Clear mask register +vmset.m vd => vmxnor.mm vd, vd, vd # Set mask register +vmnot.m vd, vs => vmnand.mm vd, vs, vs # Invert bits ---- NOTE: The `vmmv.m` instruction was previously called `vmcpy.m`, but @@ -4132,7 +4135,7 @@ use. ==== Vector count population in mask `vcpop.m` ---- - vcpop.m rd, vs2, vm +vcpop.m rd, vs2, vm ---- NOTE: This instruction previously had the assembler mnemonic `vpopc.m` @@ -4141,7 +4144,7 @@ assembler instruction alias `vpopc.m` is being retained for software compatibility. The source operand is a single vector register holding mask register -values as described in Section <<sec-mask-register-layout>>. +values as described in <<sec-mask-register-layout>>. The `vcpop.m` instruction counts the number of mask elements of the active elements of the vector source mask register that have the value @@ -4151,20 +4154,20 @@ The operation can be performed under a mask, in which case only the masked elements are counted. ---- - vcpop.m rd, vs2, v0.t # x[rd] = sum_i ( vs2.mask[i] && v0.mask[i] ) +vcpop.m rd, vs2, v0.t # x[rd] = sum_i ( vs2.mask[i] && v0.mask[i] ) ---- The `vcpop.m` instruction writes `x[rd]` even if `vl`=0 (with the value 0, since no mask elements are active). Traps on `vcpop.m` are always reported with a `vstart` of 0. The -`vcpop.m` instruction will raise an illegal instruction exception if +`vcpop.m` instruction will raise an illegal-instruction exception if `vstart` is non-zero. 
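The `vcpop.m` semantics quoted in this hunk (`x[rd] = sum_i ( vs2.mask[i] && v0.mask[i] )`, a result of 0 when `vl`=0, and an illegal-instruction exception for non-zero `vstart`) can be sketched as a small reference model. This is an illustrative sketch only, not part of the commit: the function name `vcpop_m`, its parameters, and the use of `ValueError` to stand in for the exception are all assumptions made for the example.

```python
def vcpop_m(vs2_mask, vl, v0_mask=None, vstart=0):
    """Hypothetical model of vcpop.m: count set bits of vs2 among active elements.

    vs2_mask and v0_mask are sequences of 0/1 mask bits; v0_mask=None models
    the unmasked form (vm=1).
    """
    if vstart != 0:
        # Stand-in for the illegal-instruction exception on non-zero vstart.
        raise ValueError("illegal-instruction: vcpop.m requires vstart == 0")
    count = 0
    for i in range(vl):
        # An element is active if the instruction is unmasked or v0.mask[i] is set.
        active = v0_mask is None or bool(v0_mask[i])
        if active and vs2_mask[i]:
            count += 1
    # With vl == 0 the loop body never runs, so 0 is returned.
    return count
```

For instance, `vcpop_m([1, 1, 0, 1], 4, v0_mask=[1, 0, 1, 1])` counts only elements 0 and 3, because element 1 is masked off and element 2 is clear in `vs2`.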
==== `vfirst` find-first-set mask bit ---- - vfirst.m rd, vs2, vm +vfirst.m rd, vs2, vm ---- The `vfirst` instruction finds the lowest-numbered active element of @@ -4180,7 +4183,7 @@ The `vfirst.m` instruction writes `x[rd]` even if `vl`=0 (with the value -1, since no mask elements are active). Traps on `vfirst` are always reported with a `vstart` of 0. The -`vfirst` instruction will raise an illegal instruction exception if +`vfirst` instruction will raise an illegal-instruction exception if `vstart` is non-zero. ==== `vmsbf.m` set-before-first mask bit @@ -4221,7 +4224,7 @@ The tail elements in the destination mask register are updated under a tail-agnostic policy. Traps on `vmsbf.m` are always reported with a `vstart` of 0. The -`vmsbf` instruction will raise an illegal instruction exception if +`vmsbf` instruction will raise an illegal-instruction exception if `vstart` is non-zero. The destination register cannot overlap the source register @@ -4257,7 +4260,7 @@ The tail elements in the destination mask register are updated under a tail-agnostic policy. Traps on `vmsif.m` are always reported with a `vstart` of 0. The -`vmsif` instruction will raise an illegal instruction exception if +`vmsif` instruction will raise an illegal-instruction exception if `vstart` is non-zero. The destination register cannot overlap the source register @@ -4294,7 +4297,7 @@ The tail elements in the destination mask register are updated under a tail-agnostic policy. Traps on `vmsof.m` are always reported with a `vstart` of 0. The -`vmsof` instruction will raise an illegal instruction exception if +`vmsof` instruction will raise an illegal-instruction exception if `vstart` is non-zero. The destination register cannot overlap the source register @@ -4346,7 +4349,7 @@ destination SEW, the least-significant SEW bits are retained. Traps on `viota.m` are always reported with a `vstart` of 0, and execution is always restarted from the beginning when resuming after a -trap handler. 
An illegal instruction exception is raised if `vstart` +trap handler. An illegal-instruction exception is raised if `vstart` is non-zero. The destination register group cannot overlap the source register @@ -4356,27 +4359,26 @@ The `viota.m` instruction can be combined with memory scatter instructions (indexed stores) to perform vector compress functions. ---- - # Compact non-zero elements from input memory array to output memory array - # - # size_t compact_non_zero(size_t n, const int* in, int* out) - # { - # size_t i; - # size_t count = 0; - # int *p = out; - # - # for (i=0; i<n; i++) - # { - # const int v = *in++; - # if (v != 0) - # *p++ = v; - # } - # - # return (size_t) (p - out); - # } - # - # a0 = n - # a1 = &in - # a2 = &out +# Compact non-zero elements from input memory array to output memory array +# +# size_t compact_non_zero(size_t n, const int* in, int* out) +# { +# size_t i; +# int *p = out; +# +# for (i=0; i<n; i++) +# { +# const int v = *in++; +# if (v != 0) +# *p++ = v; +# } +# +# return (size_t) (p - out); +# } +# +# a0 = n +# a1 = &in +# a2 = &out compact_non_zero: li a6, 0 # Clear count of non-zero elements @@ -4406,7 +4408,7 @@ The `vid.v` instruction writes each element's index to the destination vector register group, from 0 to `vl`-1. ---- - vid.v vd, vm # Write element ID to destination. +vid.v vd, vm # Write element ID to destination. ---- The instruction can be masked. Masking does not change the @@ -4459,6 +4461,7 @@ destination vector register group, regardless of `vstart`. The encodings corresponding to the masked versions (`vm=0`) of `vmv.x.s` and `vmv.s.x` are reserved. +[[sec-vector-float-move]] ==== Floating-Point Scalar Move Instructions The floating-point scalar read/write instructions transfer a single @@ -4513,11 +4516,11 @@ The slide instructions may be masked, with mask element _i_ controlling whether _destination_ element _i_ is written. The mask undisturbed/agnostic policy is followed for inactive elements. 
-===== Vector Slideup Instructions +===== Vector Slide-up Instructions ---- - vslideup.vx vd, vs2, rs1, vm # vd[i+x[rs1]] = vs2[i] - vslideup.vi vd, vs2, uimm, vm # vd[i+uimm] = vs2[i] +vslideup.vx vd, vs2, rs1, vm # vd[i+x[rs1]] = vs2[i] +vslideup.vi vd, vs2, uimm, vm # vd[i+uimm] = vs2[i] ---- For `vslideup`, the value in `vl` specifies the maximum number of destination @@ -4529,13 +4532,13 @@ Destination elements _OFFSET_ through `vl`-1 are written if unmasked and if _OFFSET_ < `vl`. ---- - vslideup behavior for destination elements (`vstart` < `vl`) +vslideup behavior for destination elements (`vstart` < `vl`) - OFFSET is amount to slideup, either from x register or a 5-bit immediate +OFFSET is amount to slideup, either from x register or a 5-bit immediate - 0 <= i < min(vl, max(vstart, OFFSET)) Unchanged - max(vstart, OFFSET) <= i < vl vd[i] = vs2[i-OFFSET] if v0.mask[i] enabled - vl <= i < VLMAX Follow tail policy + 0 <= i < min(vl, max(vstart, OFFSET)) Unchanged +max(vstart, OFFSET) <= i < vl vd[i] = vs2[i-OFFSET] if v0.mask[i] enabled + vl <= i < VLMAX Follow tail policy ---- The destination vector register group for `vslideup` cannot overlap @@ -4546,17 +4549,17 @@ NOTE: The non-overlap constraint avoids WAR hazards on the input vectors during execution, and enables restart with non-zero `vstart`. -===== Vector Slidedown Instructions +===== Vector Slide-down Instructions ---- - vslidedown.vx vd, vs2, rs1, vm # vd[i] = vs2[i+x[rs1]] - vslidedown.vi vd, vs2, uimm, vm # vd[i] = vs2[i+uimm] +vslidedown.vx vd, vs2, rs1, vm # vd[i] = vs2[i+x[rs1]] +vslidedown.vi vd, vs2, uimm, vm # vd[i] = vs2[i+uimm] ---- For `vslidedown`, the value in `vl` specifies the maximum number of destination elements that are written. The remaining elements past -`vl` are handled according to the current tail policy (Section -<<sec-agnostic>>). +`vl` are handled according to the current tail policy +(<<sec-agnostic>>). 
The start index (_OFFSET_) for the source can be either specified using an unsigned integer in the `x` register specified by `rs1`, or a @@ -4564,25 +4567,24 @@ using an unsigned integer in the `x` register specified by `rs1`, or a If XLEN > SEW, _OFFSET_ is _not_ truncated to SEW bits. ---- - vslidedown behavior for source elements for element i in slide (`vstart` < `vl`) - 0 <= i+OFFSET < VLMAX src[i] = vs2[i+OFFSET] - VLMAX <= i+OFFSET src[i] = 0 - - vslidedown behavior for destination element i in slide (`vstart` < `vl`) - 0 <= i < vstart Unchanged - vstart <= i < vl vd[i] = src[i] if v0.mask[i] enabled - vl <= i < VLMAX Follow tail policy +vslidedown behavior for source elements for element i in slide (`vstart` < `vl`) + 0 <= i+OFFSET < VLMAX src[i] = vs2[i+OFFSET] + VLMAX <= i+OFFSET src[i] = 0 +vslidedown behavior for destination element i in slide (`vstart` < `vl`) + 0 <= i < vstart Unchanged + vstart <= i < vl vd[i] = src[i] if v0.mask[i] enabled + vl <= i < VLMAX Follow tail policy ---- -===== Vector Slide1up +===== Vector Slide-1-up Variants of slide are provided that only move by one element but which also allow a scalar integer value to be inserted at the vacated element position. ---- - vslide1up.vx vd, vs2, rs1, vm # vd[0]=x[rs1], vd[i+1] = vs2[i] +vslide1up.vx vd, vs2, rs1, vm # vd[0]=x[rs1], vd[i+1] = vs2[i] ---- The `vslide1up` instruction places the `x` register argument at @@ -4598,17 +4600,17 @@ vector register group. The `vl` register specifies the maximum number of destination vector register elements updated with source values, and remaining elements -past `vl` are handled according to the current tail policy (Section -<<sec-agnostic>>). +past `vl` are handled according to the current tail policy +(<<sec-agnostic>>). 
---- - vslide1up behavior when vl > 0 +vslide1up behavior when vl > 0 - i < vstart unchanged - 0 = i = vstart vd[i] = x[rs1] if v0.mask[i] enabled - max(vstart, 1) <= i < vl vd[i] = vs2[i-1] if v0.mask[i] enabled - vl <= i < VLMAX Follow tail policy + i < vstart unchanged + 0 = i = vstart vd[i] = x[rs1] if v0.mask[i] enabled +max(vstart, 1) <= i < vl vd[i] = vs2[i-1] if v0.mask[i] enabled + vl <= i < VLMAX Follow tail policy ---- The `vslide1up` instruction requires that the destination vector @@ -4616,16 +4618,16 @@ register group does not overlap the source vector register group. Otherwise, the instruction encoding is reserved. [[sec-vfslide1up]] -===== Vector Floating-Point Slide1up Instruction +===== Vector Floating-Point Slide-1-up Instruction ---- - vfslide1up.vf vd, vs2, rs1, vm # vd[0]=f[rs1], vd[i+1] = vs2[i] +vfslide1up.vf vd, vs2, rs1, vm # vd[0]=f[rs1], vd[i+1] = vs2[i] ---- The `vfslide1up` instruction is defined analogously to `vslide1up`, but sources its scalar argument from an `f` register. -===== Vector Slide1down Instruction +===== Vector Slide-1-down Instruction The `vslide1down` instruction copies the first `vl`-1 active elements values from index _i_+1 in the source vector register group to index @@ -4633,11 +4635,11 @@ _i_ in the destination vector register group. The `vl` register specifies the maximum number of destination vector register elements written with source values, and remaining elements -past `vl` are handled according to the current tail policy (Section -<<sec-agnostic>>). +past `vl` are handled according to the current tail policy +(<<sec-agnostic>>). ---- - vslide1down.vx vd, vs2, rs1, vm # vd[i] = vs2[i+1], vd[vl-1]=x[rs1] +vslide1down.vx vd, vs2, rs1, vm # vd[i] = vs2[i+1], vd[vl-1]=x[rs1] ---- The `vslide1down` instruction places the `x` register argument at @@ -4649,12 +4651,12 @@ XLEN > SEW, the least-significant bits are copied over and the high SEW-XLEN bits are ignored. 
---- - vslide1down behavior +vslide1down behavior - i < vstart unchanged - vstart <= i < vl-1 vd[i] = vs2[i+1] if v0.mask[i] enabled - vstart <= i = vl-1 vd[vl-1] = x[rs1] if v0.mask[i] enabled - vl <= i < VLMAX Follow tail policy + i < vstart unchanged +vstart <= i < vl-1 vd[i] = vs2[i+1] if v0.mask[i] enabled +vstart <= i = vl-1 vd[vl-1] = x[rs1] if v0.mask[i] enabled + vl <= i < VLMAX Follow tail policy ---- NOTE: The `vslide1down` instruction can be used to load values into a @@ -4664,10 +4666,10 @@ contents of a vector register, albeit slowly, with multiple repeated `vslide1down` invocations. [[sec-vfslide1down]] -===== Vector Floating-Point Slide1down Instruction +===== Vector Floating-Point Slide-1-down Instruction ---- - vfslide1down.vf vd, vs2, rs1, vm # vd[i] = vs2[i+1], vd[vl-1]=f[rs1] +vfslide1down.vf vd, vs2, rs1, vm # vd[i] = vs2[i+1], vd[vl-1]=f[rs1] ---- The `vfslide1down` instruction is defined analogously to `vslide1down`, @@ -4682,7 +4684,7 @@ treated as unsigned integers. The source vector can be read at any index < VLMAX regardless of `vl`. The maximum number of elements to write to the destination register is given by `vl`, and the remaining elements past `vl` are handled according to the current tail policy -(Section <<sec-agnostic>>). The operation can be masked, and the mask +(<<sec-agnostic>>). The operation can be masked, and the mask undisturbed/agnostic policy is followed for inactive elements. ---- @@ -4729,27 +4731,27 @@ contiguous elements at the start of the destination vector register group. ---- - vcompress.vm vd, vs2, vs1 # Compress into vd elements of vs2 where vs1 is enabled +vcompress.vm vd, vs2, vs1 # Compress into vd elements of vs2 where vs1 is enabled ---- The vector mask register specified by `vs1` indicates which of the first `vl` elements of vector register group `vs2` should be extracted and packed into contiguous elements at the beginning of vector register `vd`. 
The remaining elements of `vd` are treated as tail -elements according to the current tail policy (Section -<<sec-agnostic>>). +elements according to the current tail policy +(<<sec-agnostic>>). ---- - Example use of vcompress instruction +Example use of vcompress instruction - 8 7 6 5 4 3 2 1 0 Element number +8 7 6 5 4 3 2 1 0 Element number - 1 1 0 1 0 0 1 0 1 v0 - 8 7 6 5 4 3 2 1 0 v1 - 1 2 3 4 5 6 7 8 9 v2 - vsetivli t0, 9, e8, m1, tu, ma - vcompress.vm v2, v1, v0 - 1 2 3 4 8 7 5 2 0 v2 +1 1 0 1 0 0 1 0 1 v0 +8 7 6 5 4 3 2 1 0 v1 +1 2 3 4 5 6 7 8 9 v2 + vsetivli t0, 9, e8, m1, tu, ma + vcompress.vm v2, v1, v0 +1 2 3 4 8 7 5 2 0 v2 ---- `vcompress` is encoded as an unmasked instruction (`vm=1`). The equivalent @@ -4761,7 +4763,7 @@ encoding is reserved. A trap on a `vcompress` instruction is always reported with a `vstart` of 0. Executing a `vcompress` instruction with a non-zero -`vstart` raises an illegal instruction exception. +`vstart` raises an illegal-instruction exception. NOTE: Although possible, `vcompress` is one of the more difficult instructions to restart with a non-zero `vstart`, so assumption is @@ -4775,30 +4777,30 @@ There is no inverse `vdecompress` provided, as this operation can be readily synthesized using iota and a masked vrgather: ---- - Desired functionality of 'vdecompress' - 7 6 5 4 3 2 1 0 # vid +Desired functionality of 'vdecompress' +7 6 5 4 3 2 1 0 # vid - e d c b a # packed vector of 5 elements - 1 0 0 1 1 1 0 1 # mask vector of 8 elements - p q r s t u v w # destination register before vdecompress + e d c b a # packed vector of 5 elements +1 0 0 1 1 1 0 1 # mask vector of 8 elements +p q r s t u v w # destination register before vdecompress - e q r d c b v a # result of vdecompress +e q r d c b v a # result of vdecompress ---- ---- - # v0 holds mask - # v1 holds packed data - # v11 holds input expanded vector and result - viota.m v10, v0 # Calc iota from mask in v0 - vrgather.vv v11, v1, v10, v0.t # Expand into destination +# v0 
holds mask +# v1 holds packed data +# v11 holds input expanded vector and result +viota.m v10, v0 # Calc iota from mask in v0 +vrgather.vv v11, v1, v10, v0.t # Expand into destination ---- ---- - p q r s t u v w # v11 destination register - e d c b a # v1 source vector - 1 0 0 1 1 1 0 1 # v0 mask vector +p q r s t u v w # v11 destination register + e d c b a # v1 source vector +1 0 0 1 1 1 0 1 # v0 mask vector - 4 4 4 3 2 1 1 0 # v10 result of viota.m - e q r d c b v a # v11 destination after vrgather using viota.m under mask +4 4 4 3 2 1 1 0 # v10 result of viota.m +e q r d c b v a # v11 destination after vrgather using viota.m under mask ---- ==== Whole Vector Register Move @@ -4810,7 +4812,7 @@ copy. The instructions operate as if EEW=SEW, EMUL = NREG, effective length `evl`= EMUL * VLEN/SEW. NOTE: These instructions are intended to aid compilers to shuffle -vector registers without needing to know or change `vl` or `vtype`. +vector registers without needing to know or change `vl`. NOTE: The usual property that no elements are written if `vstart` {ge} `vl` does not apply to these instructions. @@ -4838,12 +4840,12 @@ related `vmerge` encoding, and it is unlikely the `vsmul` instruction would benefit from an immediate form. ---- - vmv<nr>r.v vd, vs2 # General form +vmv<nr>r.v vd, vs2 # General form - vmv1r.v v1, v2 # Copy v1=v2 - vmv2r.v v10, v12 # Copy v10=v12; v11=v13 - vmv4r.v v4, v8 # Copy v4=v8; v5=v9; v6=v10; v7=v11 - vmv8r.v v0, v8 # Copy v0=v8; v1=v9; ...; v7=v15 +vmv1r.v v1, v2 # Copy v1=v2 +vmv2r.v v10, v12 # Copy v10=v12; v11=v13 +vmv4r.v v4, v8 # Copy v4=v8; v5=v9; v6=v10; v7=v11 +vmv8r.v v0, v8 # Copy v0=v8; v1=v9; ...; v7=v15 ---- The source and destination vector register numbers must be aligned @@ -4868,7 +4870,7 @@ scheme in the IBM 3090 vector facility. To ensure forward progress without the `vstart` CSR, implementations would have to guarantee an entire vector instruction can always complete atomically without generating a trap. 
This is particularly difficult to ensure in the -presence of strided or scatter/gather operations and demand-paged +presence of constant-stride or scatter/gather operations and demand-paged virtual memory. ==== Precise vector traps @@ -4990,10 +4992,6 @@ NOTE: Longer vector length extensions should follow the same pattern. NOTE: Every vector length extension effectively includes all shorter vector length extensions. -NOTE: The syntax for extension names is being revised, and these names -are subject to change. The trailing "b" will be required to -disambiguate numeric fields from version numbers. - NOTE: Explicit use of the Zvl32b extension string is not required for any standard vector extension as they all effectively mandate at least this minimum, but the string can be useful when stating hardware @@ -5034,14 +5032,14 @@ All Zve* extensions provide support for EEW of 8, 16, and 32, and Zve64* extensions also support EEW of 64. All Zve* extensions support the vector configuration instructions -(Section <<sec-vector-config>>). +(<<sec-vector-config>>). All Zve* extensions support all vector load and store instructions -(Section <<sec-vector-memory>>), except Zve64* extensions do not +(<<sec-vector-memory>>), except Zve64* extensions do not support EEW=64 for index values when XLEN=32. -All Zve* extensions support all vector integer instructions (Section -<<sec-vector-integer>>), except that the `vmulh` integer multiply +All Zve* extensions support all vector integer instructions +(<<sec-vector-integer>>), except that the `vmulh` integer multiply variants that return the high word of the product (`vmulh.vv`, `vmulh.vx`, `vmulhu.vv`, `vmulhu.vx`, `vmulhsu.vv`, `vmulhsu.vx`) are not included for EEW=64 in Zve64*. @@ -5057,27 +5055,27 @@ NOTE: As with `vmulh`, `vsmul` requires a large amount of additional logic, and 64-bit fixed-point multiplies are relatively rare. 
All Zve* extensions support all vector integer single-width and -widening reduction operations (Sections <<sec-vector-integer-reduce>>, +widening reduction operations (<<sec-vector-integer-reduce>>, <<sec-vector-integer-reduce-widen>>). -All Zve* extensions support all vector mask instructions (Section -<<sec-vector-mask>>). +All Zve* extensions support all vector mask instructions +(<<sec-vector-mask>>). All Zve* extensions support all vector permutation instructions -(Section <<sec-vector-permute>>), except that Zve32x and Zve64x +(<<sec-vector-permute>>), except that Zve32x and Zve64x do not include those with floating-point operands, and Zve64f does not include those with EEW=64 floating-point operands. The Zve32x extension depends on the Zicsr extension. The Zve32f and Zve64f extensions depend upon the F extension, and implement all -vector floating-point instructions (Section <<sec-vector-float>>) for +vector floating-point instructions (<<sec-vector-float>>) for floating-point operands with EEW=32. Vector single-width floating-point reduction operations (<<sec-vector-float-reduce>>) for EEW=32 are supported. The Zve64d extension depends upon the D extension, and implements all vector -floating-point instructions (Section <<sec-vector-float>>) for +floating-point instructions (<<sec-vector-float>>) for floating-point operands with EEW=32 or EEW=64 (including widening instructions and conversions between FP32 and FP64). Vector single-width floating-point reductions (<<sec-vector-float-reduce>>) @@ -5097,42 +5095,42 @@ The V vector extension has precise traps. The V vector extension depends upon the Zvl128b and Zve64d extensions. NOTE: The value of 128 was chosen as a compromise for application -processors. Providing a larger VLEN allows stripmining code to be +processors. Providing a larger VLEN allows strip-mining code to be elided in some cases for short vectors, but also increases the size of the minimum implementation. 
Note that larger LMUL can be used to -avoid stripmining for longer known-size application vectors at the +avoid strip mining for longer known-size application vectors at the cost of having fewer available vector register groups. For example, an LMUL of 8 allows vectors of up to sixteen 64-bit elements to be -processed without stripmining using four vector register groups. +processed without strip mining using four vector register groups. The V extension supports EEW of 8, 16, and 32, and 64. The V extension supports the vector configuration instructions -(Section <<sec-vector-config>>). +(<<sec-vector-config>>). The V extension supports all vector load and store instructions -(Section <<sec-vector-memory>>), except the V extension does not +(<<sec-vector-memory>>), except the V extension does not support EEW=64 for index values when XLEN=32. -The V extension supports all vector integer instructions (Section -<<sec-vector-integer>>). +The V extension supports all vector integer instructions +(<<sec-vector-integer>>). The V extension supports all vector fixed-point arithmetic instructions (<<sec-vector-fixed-point>>). The V extension supports all vector integer single-width and -widening reduction operations (Sections <<sec-vector-integer-reduce>>, +widening reduction operations (<<sec-vector-integer-reduce>>, <<sec-vector-integer-reduce-widen>>). -The V extension supports all vector mask instructions (Section -<<sec-vector-mask>>). +The V extension supports all vector mask instructions +(<<sec-vector-mask>>). -The V extension supports all vector permutation instructions (Section -<<sec-vector-permute>>). +The V extension supports all vector permutation instructions +(<<sec-vector-permute>>). 
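The arithmetic behind the note above (an LMUL of 8 giving vectors of up to sixteen 64-bit elements and four register groups at the V extension's minimum VLEN of 128) can be checked with a short sketch. It is not part of the commit; the helper names `vlmax` and `register_groups` are hypothetical, and only integer LMUL values are handled.

```python
NUM_VREGS = 32  # the vector register file holds 32 registers, v0 through v31


def vlmax(vlen, sew, lmul):
    """Elements per register group: VLMAX = LMUL * VLEN / SEW (integer LMUL only)."""
    return (vlen * lmul) // sew


def register_groups(lmul):
    """Number of aligned register groups available at a given integer LMUL."""
    return NUM_VREGS // lmul


# Minimum VLEN for the V extension (Zvl128b), 64-bit elements, LMUL=8.
elements = vlmax(vlen=128, sew=64, lmul=8)
groups = register_groups(8)
```

Here `elements` evaluates to 16 and `groups` to 4, matching the figures in the note.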
The V extension depends upon the F and D extensions, and implements all vector floating-point instructions -(Section <<sec-vector-float>>) for floating-point operands with EEW=32 +(<<sec-vector-float>>) for floating-point operands with EEW=32 or EEW=64 (including widening instructions and conversions between FP32 and FP64). Vector single-width floating-point reductions (<<sec-vector-float-reduce>>) for EEW=32 and EEW=64 are supported as @@ -5161,7 +5159,7 @@ The Zvfhmin extension depends on the Zve32f extension. The Zvfh extension provides support for vectors of IEEE 754-2008 binary16 values. -When the Zvfh extension is implemented, all instructions in Sections +When the Zvfh extension is implemented, all instructions in <<sec-vector-float>>, <<sec-vector-float-reduce>>, <<sec-vector-float-reduce-widen>>, <<sec-vector-float-move>>, <<sec-vfslide1up>>, and <<sec-vfslide1down>> @@ -5209,13 +5207,12 @@ straddling the individual vector registers in a vector register group. Non-POT EGS can also cause large increases in the lowest-common-multiple of element group sizes, which adds constraints to `vl` setting in order to avoid splitting an element group across -stripmine iterations in vector-length-agnostic code. +strip-mine iterations in vector-length-agnostic code. The element group size is statically encoded in the instruction, often implicitly as part of the opcode. -Executing a vector instruction with EGS > VLMAX causes an illegal -instruction exception to be raised. +Vector instructions with EGS > VLMAX are reserved. NOTE: The vector instructions in the base V vector ISA can be viewed as all having an element group size of 1 for all operands statically @@ -5247,8 +5244,8 @@ NOTE: As the base vector extension only has element group size of 1, this constraint is backwards-compatible. 
NOTE: This constraint prevents element groups being broken across -stripmining iterations in vector-length-agnostic code when a -VLMAX-size vector would otherwise be able to accomodate a whole number +strip-mining iterations in vector-length-agnostic code when a +VLMAX-size vector would otherwise be able to accommodate a whole number of element groups. NOTE: If EEW is encoded statically in the instruction, or if an @@ -5259,7 +5256,7 @@ instructions. NOTE: Additional constraints may be required for some element group instructions to ensure legal length values for all operands. -==== Determining EEW +==== Determining EEW The `vtype` SEW can be used to indicate or calculate the effective element size (EEW) of one or more operands of an element group @@ -5325,5 +5322,4 @@ the mask element group is set). === Vector Instruction Listing -include::images/wavedrom/v-inst-table.adoc[] - +include::images/wavedrom/v-inst-table.edn[] diff --git a/src/vector-crypto.adoc b/src/vector-crypto.adoc index 695a46a..61650ae 100644 --- a/src/vector-crypto.adoc +++ b/src/vector-crypto.adoc @@ -1,6 +1,6 @@ == Cryptography Extensions: Vector Instructions, Version 1.0 -This document describes the Vector Cryptography extensions to the +This document describes the Vector Cryptography extensions to the RISC-V Instruction Set Architecture. [[crypto_vector_introduction]] @@ -70,7 +70,7 @@ but they are the ones we considered most while writing it. [[crypto_vector_sail_specifications]] ==== Sail Specifications -RISC-V maintains a +RISC-V maintains a link:https://github.com/riscv/sail-riscv[formal model] of the ISA specification, implemented in the Sail ISA specification language @@ -110,15 +110,15 @@ This supporting code is listed in <<crypto_vector_appx_sail>>. -The -link:https://github.com/rems-project/sail/blob/sail2/manual.pdf[Sail Manual] +The +link:https://alasdair.github.io/manual.html[Sail Manual] is recommended reading in order to best understand the code snippets. 
Also, the link:https://github.com/billmcspadden-riscv/sail/blob/cookbook_br/cookbook/doc/TheSailCookbook_Complete.pdf[The Sail Programming Language: A Sail Cookbook] -is a good reference that is in the process of being written. +is a good reference. For the latest RISC-V Sail model, refer to -the formal model Github +the formal model GitHub link:https://github.com/riscv/sail-riscv[repository]. [[crypto_vector_policies]] @@ -138,7 +138,7 @@ policies: where recommended (but not required) instruction sequences for performing particular tasks are given as an example, such that both hardware and software implementers can optimize for only a single use-case. - + * The extensions will be designed to support _existing_ standardized cryptographic constructs well. It will not try to support proposed standards, or cryptographic @@ -147,7 +147,7 @@ policies: the RISC-V vector cryptographic extensions standardization will be dealt with by future RISC-V vector cryptographic standard extensions. - + * Historically, there has been some discussion cite:[LSYRR:04] on how newly supported operations in general-purpose computing might @@ -155,7 +155,7 @@ policies: The standard will not try to anticipate new useful low-level operations which _may_ be useful as building blocks for future cryptographic constructs. - + * Regarding side-channel countermeasures: Where relevant, proposed instructions must aim to remove the possibility of any timing side-channels. All instructions @@ -206,18 +206,18 @@ For all of the vector crypto instructions in this specification, `EEW`=`SEW`. [NOTE] ==== The required `SEW` for each cryptographic instruction was chosen to match what is -typically needed for other instructions when implementing the targeted algorithm. +typically needed for other instructions when implementing the targeted algorithm. ==== - A *Vector Element Group* is a vector of one or more element groups. -- A *Scalar Element Group* is a single element group. 
+- A *Scalar Element Group* is a single element group. Element groups can be formed across registers in implementations where `VLEN`< `EGW` by using an `LMUL`>1. [NOTE] ==== -Since the the *vector extension for application processors* requires a minimum of VLEN of 128, +Since the *vector extension for application processors* requires a minimum of VLEN of 128, at most such implementations would require LMUL=2 to form the largest element groups in this specification. However, implementations with a smaller VLEN, such as embedded designs, will requires a larger `LMUL` @@ -236,15 +236,20 @@ Likewise the starting element group is `vstart`/`EGS`. See <<crypto-vector-instruction-constraints>> for limitations on `vl` and `vstart` for vector crypto instructions. -// If this ratio is not an integer for a vector crypto instruction, an illegal instruction exception is raised. +// If this ratio is not an integer for a vector crypto instruction, an illegal-instruction exception is raised. -// Since `vstart` is expressed in elements, the starting element group is `vstart`/`EGS`. -// If this ratio is not an integer for a vector crypto instruction, an illegal instruction exception is raised. +// Since `vstart` is expressed in elements, the starting element group is `vstart`/`EGS`. +// If this ratio is not an integer for a vector crypto instruction, an illegal-instruction exception is raised. [[crypto-vector-instruction-constraints]] ==== Instruction Constraints + +All standard vector instruction constraints specified by RVV 1.0 apply to Vector Crypto instructions. +In addition to those constraints a few additional specific constraints are introduced. + The following is a quick reference for the various constraints of specific Vector Crypto instructions. 
+ vl and vstart constraints:: Since `vl` and `vstart` refer to elements, Vector Crypto instructions that use elements groups (See <<crypto-vector-element-groups>>) require that these values are an integer multiple of the @@ -255,33 +260,33 @@ Element Group Size (`EGS`). [%autowidth] [%header,cols="4,4"] |=== -| Instructions +| Instructions | EGS | vaes* | 4 | vsha2* | 4 | vg* | 4 -| vsm3* | 8 +| vsm3* | 8 | vsm4* | 4 |=== LMUL constraints:: For element-group instructions, `LMUL`*`VLEN` must always be at least as large as `EGW`, otherwise an -_illegal instruction exception_ is raised, even if `vl`=0. +_illegal-instruction exception_ is raised, even if `vl`=0. [%autowidth] [%header,cols="4,2,2"] |=== | Instructions -| SEW +| SEW | EGW | vaes* | 32 | 128 | vsha2* | 32 | 128 | vsha2* | 64 | 256 | vg* | 32 | 128 -| vsm3* | 32 | 256 +| vsm3* | 32 | 256 | vsm4* | 32 | 128 |=== @@ -294,7 +299,7 @@ all other `SEW` values are _reserved_. [%autowidth] [%header,cols="4,4"] |=== -| Instructions +| Instructions | Required SEW | vaes* | 32 @@ -308,8 +313,13 @@ all other `SEW` values are _reserved_. |=== -Source/Destination overlap constraints:: -Some Vector Crypto instructions have overlap constraints. Encodings that violate these constraints are _reserved_. +Vector/Scalar constraints:: +This specification defines new vector/scalar (.vs) instructions that uses *Scalar Element Groups*. The *Scalar Element Group* operand has `EMUL = ceil(EGW / VLEN)`. + +[NOTE] +==== +Scalar element group operands do not need to be aligned to LMUL for any implementation with VLEN >= EGW. +==== In the case of the `.vs` instructions defined in this specification, `vs2` holds a 128-bit scalar element group. For implementations with `VLEN` ≥ 128, `vs2` refers to a single register. Thus, the `vd` register group must not @@ -322,11 +332,11 @@ overlap this `vs2` register group. 
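The element-group constraints listed above can be sketched as a small checker: `vl` and `vstart` must be integer multiples of `EGS`, `LMUL`*`VLEN` must be at least `EGW`, and a scalar element group operand has `EMUL = ceil(EGW / VLEN)`. This is an illustrative sketch rather than anything from the commit. The function names and the `EGS` dictionary keys are hypothetical, the `EGS` values are copied from the table above, and `EGW` is taken as `EGS` times `SEW`, which matches the SEW/EGW table.

```python
from fractions import Fraction

# EGS per instruction family, copied from the table in this hunk.
EGS = {"vaes": 4, "vsha2": 4, "vg": 4, "vsm3": 8, "vsm4": 4}


def check_element_group_constraints(family, sew, vlen, lmul, vl, vstart=0):
    """Return a list of violated constraints (empty if none are violated).

    lmul may be an int or a Fraction such as Fraction(1, 2).
    """
    egs = EGS[family]
    egw = egs * sew  # assumption stated in the lead-in: EGW = EGS * SEW
    problems = []
    if vl % egs != 0:
        problems.append("vl is not a multiple of EGS")
    if vstart % egs != 0:
        problems.append("vstart is not a multiple of EGS")
    if Fraction(lmul) * vlen < egw:
        # Per the text above, this case applies even when vl == 0.
        problems.append("LMUL*VLEN is smaller than EGW")
    return problems


def scalar_group_emul(egw, vlen):
    """EMUL of a scalar element group operand: ceil(EGW / VLEN)."""
    return -(-egw // vlen)  # integer ceiling division
```

For example, `check_element_group_constraints("vsm3", 32, 128, 1, 8)` reports only the LMUL constraint, since `EGW` is 256 but `LMUL`*`VLEN` is 128, while `scalar_group_emul(128, 64)` gives 2.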
[%header,cols="4,4,4"] |=== | Instruction -| Register +| Register | Cannot Overlap | vaes*.vs | vs2 | vd -| vsm4r.vs | vs2 | vd +| vsm4r.vs | vs2 | vd | vsha2c[hl] | vs1, vs2 | vd | vsha2ms | vs1, vs2 | vd | vsm3me | vs2 | vd @@ -353,7 +363,6 @@ The Vector Crypto Extensions define Vector-Scalar instructions that are similar Vector Reduction Operations in that they get a scalar operand from a vector register. However, they differ in that they get a scalar element group (see <<crypto-vector-element-groups>>) -// link:https://github.com/riscv/riscv-v-spec/blob/master/element_groups.adoc[RISC-V Vector Element Groups]) from `vs2` and they return _vector_ results to `vd`, which is also a source vector operand. These Vector-Scalar crypto instructions also use the `.vs` suffix in their mnemonics. @@ -385,7 +394,7 @@ crypto instructions allow the round key to be specified as a scalar element grou // In the case of AES256 all-rounds instructions we need to provide two 128-bit keys; one is held in `vs1` and // the other is held in `vs2`. The 128-bit data to be processed is held in `vd`. // A vector-scalar form of this instruction looks different from the existing vector-scalar instructions in that -// both `vs1` and `vs2` are treated as scalar operands that apply to the vector operands of `vd`. +// both `vs1` and `vs2` are treated as scalar operands that apply to the vector operands of `vd`. // [NOTE] // ==== @@ -395,7 +404,7 @@ crypto instructions allow the round key to be specified as a scalar element grou // to use `vs1` for the key and `vs2` for the data. // In the case of vector-scalar instructions, the scalar key will be held in // element group 0 of `vs1` . This is done to remain consistent with the use of `vs1` for the scalar element in -// all of the existing vector-scalar operations as well as the vector reduction operations. +// all of the existing vector-scalar operations as well as the vector reduction operations. 
// ==== [[crypto-vector-software-portability]] @@ -435,7 +444,7 @@ libraries should also include more optimal code for these instructions when `VLE | LMUL | SHA-512 | vsha2* | 64 | vl/2 -| SM3 | vsm3* | 32 | vl/4 +| SM3 | vsm3* | 32 | vl/4 |=== // [NOTE] @@ -468,12 +477,13 @@ The section introduces all of the extensions in the Vector Cryptography Instruction Set Extension Specification. The <<zvknh,Zvknhb>> and <<zvbc>> Vector Crypto Extensions ---and accordingly the composite extensions <<Zvkn>> and <<Zvks>>-- -require a Zve64x base, -or application ("V") base Vector Extension. +--and accordingly the composite extensions <<Zvkn>>, <<Zvknc>>, <<Zvkng>>, and <<Zvksc>>-- +depend on Zve64x. + +All of the other Vector Crypto Extensions depend on `Zve32x`. + +Note: If `Zve32x` is supported then `Zvkb` or `Zvbb` provide support for EEW of 8, 16, and 32. If `Zve64x` is supported then `Zvkb` or `Zvbb` also add support for EEW 64. -All of the other Vector Crypto Extensions can be built -on _any_ embedded (Zve*) or application ("V") base Vector Extension. // See <<crypto-vector-element-groups>> for more details on vector element groups and the drawbacks of // small `VLEN` values. @@ -482,7 +492,7 @@ on _any_ embedded (Zve*) or application ("V") base Vector Extension. All _cryptography-specific_ instructions defined in this Vector Crypto specification (i.e., those in <<zvkned>>, <<zvknh,Zvknh[ab]>>, <<Zvkg>>, <<Zvksed>> and <<zvksh>> but _not_ <<zvbb>>,<<zvkb>>, or <<zvbc>>) shall be executed with data-independent execution latency as defined in the -link:https://github.com/riscv/riscv-crypto/releases/tag/v1.0.1-scalar[RISC-V Scalar Cryptography Extensions specification]. +<<#crypto_scalar_instructions,RISC-V Scalar Cryptography Extensions specification>>. It is important to note that the Vector Crypto instructions are independent of the implementation of the `Zkt` extension and do not require that `Zkt` is implemented. 
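The element-group constraints quoted in the Instruction Constraints hunk above (`vl` and `vstart` must be multiples of `EGS`, `LMUL`*`VLEN` must be at least `EGW`, and a scalar element group has `EMUL = ceil(EGW / VLEN)`) can be sketched as a small checker. This is an illustrative model only: the function names, the `family` keys, and the returned strings are assumptions made for this sketch and are not part of the specification. `EGW` is computed as `EGS * SEW`, which agrees with every row of the `EGW` table in the diff.

```python
from fractions import Fraction
import math

# Element Group Size per instruction family, copied from the EGS table in the diff.
EGS = {"vaes": 4, "vsha2": 4, "vg": 4, "vsm3": 8, "vsm4": 4}


def element_group_width(family, sew):
    """EGW in bits; EGS * SEW matches every row of the EGW table (e.g. vsm3: 8 * 32 = 256)."""
    return EGS[family] * sew


def scalar_group_emul(egw, vlen):
    """EMUL of a scalar element group operand: ceil(EGW / VLEN)."""
    return math.ceil(Fraction(egw, vlen))


def check_constraints(family, sew, vlen, lmul, vl, vstart):
    """Return a list of violated constraints (empty list means none were found).

    Hypothetical helper: it only reports which rule is violated and does not
    model how an implementation responds to the violation.
    """
    egs = EGS[family]
    egw = element_group_width(family, sew)
    problems = []
    # Checked regardless of vl: the diff states the exception is raised even if vl=0.
    if Fraction(lmul) * vlen < egw:
        problems.append("LMUL*VLEN < EGW")
    if vl % egs != 0:
        problems.append("vl not a multiple of EGS")
    if vstart % egs != 0:
        problems.append("vstart not a multiple of EGS")
    return problems
```

For example, under these assumptions a `vsm3*` instruction on a `VLEN`=128 implementation needs `LMUL`=2 before `check_constraints` reports no problems, which matches the note that such implementations require at most `LMUL`=2 for the largest element groups.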
@@ -550,7 +560,7 @@ These instructions are only defined for `SEW`=64. <<< [[zvkb,Zvkb]] -==== `Zvkb` - Vector Cryptography Bit-manipulation +==== `Zvkb` - Vector Cryptography Bit-manipulation Vector bit-manipulation instructions that are essential for implementing common cryptographic workloads securely & @@ -559,8 +569,8 @@ efficiently. [NOTE] ==== This Zvkb extension is a proper subset of the Zvbb extension. -Zvkb allows for vector crypto implementations without incuring -the the cost of implementing the additional bitmanip instructions +Zvkb allows for vector crypto implementations without incurring +the cost of implementing the additional bitmanip instructions in the Zvbb extension: vbrev.v, vclz.v, vctz.v, vcpop.v, and vwsll.[vv,vx,vi]. ==== @@ -630,7 +640,7 @@ Likewise, `vstart` must be a multiple of `EGS=4`. [[zvkned,Zvkned]] ==== `Zvkned` - NIST Suite: Vector AES Block Cipher -Instructions for accelerating +Instructions for accelerating encryption, decryption and key-schedule functions of the AES block cipher as defined in Federal Information Processing Standards Publication 197 @@ -650,8 +660,8 @@ from two or four registers by using an LMUL =2 and LMUL=4 respectively. To help avoid side-channel timing attacks, these instructions shall be implemented with data-independent timing. The number of element groups to be processed is `vl`/`EGS`. -`vl` must be set to the number of `SEW=32` elements to be processed and -therefore must be a multiple of `EGS=4`. + +`vl` must be set to the number of `SEW=32` elements to be processed and +therefore must be a multiple of `EGS=4`. + Likewise, `vstart` must be a multiple of `EGS=4`. [%autowidth] @@ -722,7 +732,7 @@ To help avoid side-channel timing attacks, these instructions shall be implement // ==== // It is recommended that implementations have VLEN≥128 for these instructions. // // Furthermore, for the best performance in SHA512, it is recommended that implementations have VLEN≥256. 
-// When VLEN<EGW, an appropriate LMUL needs to be used by software so that elements from the +// When VLEN<EGW, an appropriate LMUL needs to be used by software so that elements from the // specified register groups can be combined to form the full element group. // ==== @@ -749,7 +759,7 @@ Likewise, `vstart` must be a multiple of `EGS=4`. [[zvksed,Zvksed]] ==== `Zvksed` - ShangMi Suite: SM4 Block Cipher -Instructions for accelerating +Instructions for accelerating encryption, decryption and key-schedule functions of the SM4 block cipher. @@ -815,7 +825,7 @@ to provide all eight elements of the element group. // The instructions will be most efficient on implementations where `VLEN`≥256. // They will also provide substantial benefit on implementations where -// `VLEN`=128, but will require an `LMUL`>1 in order to combine elements +// `VLEN`=128, but will require an `LMUL`>1 in order to combine elements // within a register group to form the full element group. // Implementations with `VLEN`<128 might not be as efficient and should // consider the existing @@ -913,7 +923,7 @@ This extension is shorthand for the following set of other extensions: [NOTE] ==== This extension combines the NIST Algorithm Suite with the -GCM/GMAC extension to enable high-performace AES-GCM. +GCM/GMAC extension to enable high-performance AES-GCM. ==== <<< @@ -988,7 +998,7 @@ This extension is shorthand for the following set of other extensions: [NOTE] ==== This extension combines the ShangMi Algorithm Suite with the -GCM/GMAC extension to enable high-performace SM4-GCM. +GCM/GMAC extension to enable high-performance SM4-GCM. ==== <<< @@ -997,8 +1007,8 @@ GCM/GMAC extension to enable high-performace SM4-GCM. 
==== `Zvkt` - Vector Data-Independent Execution Latency The Zvkt extension requires all implemented instructions from the following list to be -executed with data-independent execution latency as defined in the -link:https://github.com/riscv/riscv-crypto/releases/tag/v1.0.1-scalar[RISC-V Scalar Cryptography Extensions specification]. +executed with data-independent execution latency as defined in the +<<#crypto_scalar_instructions,RISC-V Scalar Cryptography Extensions specification>>. Data-independent execution latency (DIEL) applies to all _data operands_ of an instruction, even those that are not a part of the body or that are inactive. However, DIEL does not apply @@ -1021,7 +1031,7 @@ values include cryptographic keys, plain text, and partially encrypted text. DIEL is not intended to keep software (and cryptographic algorithms contained therein) secret as it is assumed that an adversary would already know these. This is why DIEL doesn't apply to constants -embedded in instruction encodings. +embedded in instruction encodings. It is important that the _values_ of elements that are not in the body or that are masked off do not affect the execution latency of the instruction. Sometimes such elements contain data that @@ -1107,7 +1117,7 @@ proper subset of <<Zvbb>> - vmerge.v[ivx]m ===== permute -In the `.vv` and `.xv` forms of the `vragather[ei16]` instructions, +In the `.vv` and `.xv` forms of the `vrgather[ei16]` instructions, the values in `vs1` and `rs1` are used for control and therefore are exempt from DIEL. - vrgather.v[ivx] @@ -1133,7 +1143,7 @@ from DIEL. 
[NOTE] ==== The following instructions are not affected by Zvkt: - + - *All storage operations* - *All floating-point operations* - add/sub saturate @@ -1183,7 +1193,7 @@ Synopsis:: Vector AES final-round decryption Mnemonic:: -vaesdf.vv vd, vs2 + +vaesdf.vv vd, vs2 + vaesdf.vs vd, vs2 Encoding (Vector-Vector):: @@ -1225,7 +1235,7 @@ Arguments:: |Register |Direction |EGW -|EGS +|EGS |EEW |Definition @@ -1251,7 +1261,7 @@ Operation:: -- function clause execute (VAESDF(vs2, vd, suffix)) = { if(LMUL*VLEN < EGW) then { - handle_illegal(); // illegal instruction exception + handle_illegal(); // illegal-instruction exception RETIRE_FAIL } else { @@ -1326,7 +1336,7 @@ Arguments:: |Register |Direction |EGW -|EGS +|EGS |EEW |Definition @@ -1356,7 +1366,7 @@ Operation:: -- function clause execute (VAESDM(vs2, vd, suffix)) = { if(LMUL*VLEN < EGW) then { - handle_illegal(); // illegal instruction exception + handle_illegal(); // illegal-instruction exception RETIRE_FAIL } else { @@ -1432,16 +1442,16 @@ Arguments:: |Register |Direction |EGW -|EGS +|EGS |EEW |Definition | vd | input | 128 | 4 | 32 | round state -| vs2 | input | 128 | 4 | 32 | round key +| vs2 | input | 128 | 4 | 32 | round key | vd | output | 128 | 4 | 32 | new round state |=== -Description:: +Description:: A final-round encryption function of the AES block cipher is performed. The SubBytes and ShiftRows steps are applied to each round state element group from `vd`. @@ -1452,8 +1462,8 @@ This instruction must always be implemented such that its execution latency does on the data being operated upon. // // The number of element groups to be processed is `vl`/`EGS`. -// `vl` must be set to the number of `SEW=32` elements to be processed and -// therefore must be a multiple of `EGS=4`. + +// `vl` must be set to the number of `SEW=32` elements to be processed and +// therefore must be a multiple of `EGS=4`. + // Likewise, `vstart` must be a multiple of `EGS=4`. 
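The element-group iteration shared by the AES instructions (process groups `vstart`/`EGS` through `vl`/`EGS` - 1, taking the round key from group `i` in the `.vv` form and from group 0 in the `.vs` form) can be sketched as follows. This is a simplified model under stated assumptions: a register group is represented as a Python list of 128-bit integers, the helper names are hypothetical, and the round function is a plain XOR (the AddRoundKey step alone) standing in for a full AES round. Raising `ValueError` for a non-multiple `vl` or `vstart` is a modeling choice, not a statement about architectural behavior.

```python
EGS = 4  # elements per element group at SEW=32 (EGW=128)


def add_round_key(state, round_key):
    """Stand-in round function: AddRoundKey only (bitwise XOR of two 128-bit values)."""
    return state ^ round_key


def apply_per_element_group(vd, vs2, vl, vstart, suffix, round_fn=add_round_key):
    """Apply round_fn to each element group of vd, returning the new vd contents.

    vd, vs2 : lists of 128-bit integers, one entry per element group (assumed layout).
    suffix  : "vv" selects the key from group i, "vs" always selects group 0.
    """
    if vl % EGS != 0 or vstart % EGS != 0:
        raise ValueError("vl and vstart must be multiples of EGS")
    result = list(vd)
    for i in range(vstart // EGS, vl // EGS):
        key_index = i if suffix == "vv" else 0
        result[i] = round_fn(vd[i], vs2[key_index])
    return result
```

With `vd = [0x11, 0x22, 0x33]` and `vs2 = [0x0F, 0xF0, 0xFF]`, the `.vs` form XORs every group with `0x0F`, while the `.vv` form pairs each group with the key at the same index. Groups below `vstart`/`EGS` are left unchanged in both forms.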
@@ -1462,13 +1472,13 @@ Operation:: -- function clause execute (VAESEF(vs2, vd, suffix) = { if(LMUL*VLEN < EGW) then { - handle_illegal(); // illegal instruction exception + handle_illegal(); // illegal-instruction exception RETIRE_FAIL } else { eg_len = (vl/EGS) eg_start = (vstart/EGS) - + foreach (i from eg_start to eg_len-1) { let keyelem = if suffix == "vv" then i else 0; let state : bits(128) = get_velem(vd, EGW=128, i); @@ -1538,7 +1548,7 @@ Arguments:: |Register |Direction |EGW -|EGS +|EGS |EEW |Definition @@ -1558,8 +1568,8 @@ This instruction must always be implemented such that its execution latency does on the data being operated upon. // // The number of element groups to be processed is `vl`/`EGS`. -// `vl` must be set to the number of `SEW=32` elements to be processed and -// therefore must be a multiple of `EGS=4`. + +// `vl` must be set to the number of `SEW=32` elements to be processed and +// therefore must be a multiple of `EGS=4`. + // Likewise, `vstart` must be a multiple of `EGS=4`. Operation:: @@ -1567,13 +1577,13 @@ Operation:: -- function clause execute (VAESEM(vs2, vd, suffix)) = { if(LMUL*VLEN < EGW) then { - handle_illegal(); // illegal instruction exception + handle_illegal(); // illegal-instruction exception RETIRE_FAIL } else { eg_len = (vl/EGS) eg_start = (vstart/EGS) - + foreach (i from eg_start to eg_len-1) { let keyelem = if suffix == "vv" then i else 0; let state : bits(128) = get_velem(vd, EGW=128, i); @@ -1627,7 +1637,7 @@ Arguments:: |Register |Direction |EGW -|EGS +|EGS |EEW |Definition @@ -1636,15 +1646,15 @@ Arguments:: | Vd | output | 128 | 4 | 32 | Next round key |=== -Description:: +Description:: A single round of the forward AES-128 KeySchedule is performed. -// Within each element group, +// Within each element group, The next round key is generated word by word from the current round key element group in `vs2` and the immediately previous word of the -round key. 
The least significant word is generated using the most significant +round key. The least significant word is generated using the most significant word of the current round key as well as a round constant which is selected by -the round number. +the round number. The round number, which ranges from 1 to 10, comes from `uimm[3:0]`; `uimm[4]` is ignored. @@ -1659,7 +1669,7 @@ on the data being operated upon. [NOTE] ==== We chose to map out-of-range round numbers to in-range values as this allows the instruction's -behavior to be fully defined for all values of `uimm[4:0]` with minimal extra logic. +behavior to be fully defined for all values of `uimm[4:0]` with minimal extra logic. ==== // Each `EGW=128` element group next-round-key output is produced and is written to each `EGW=128` @@ -1668,8 +1678,8 @@ behavior to be fully defined for all values of `uimm[4:0]` with minimal extra lo // // The number of element groups to be processed is `vl`/`EGS`. -// `vl` must be set to the number of `SEW=32` elements to be processed and -// therefore must be a multiple of `EGS=4`. + +// `vl` must be set to the number of `SEW=32` elements to be processed and +// therefore must be a multiple of `EGS=4`. + // Likewise, `vstart` must be a multiple of `EGS=4`. @@ -1678,13 +1688,13 @@ Operation:: -- function clause execute (VAESKF1(rnd, vd, vs2)) = { if(LMUL*VLEN < EGW) then { - handle_illegal(); // illegal instruction exception + handle_illegal(); // illegal-instruction exception RETIRE_FAIL } else { // project out-of-range immediates onto in-range values if( (unsigned(rnd[3:0]) > 10) | (rnd[3:0] = 0)) then rnd[3] = ~rnd[3] - + eg_len = (vl/EGS) eg_start = (vstart/EGS) @@ -1743,7 +1753,7 @@ Arguments:: |Register |Direction |EGW -|EGS +|EGS |EEW |Definition @@ -1753,10 +1763,10 @@ Arguments:: | Vd | output | 128 | 4 | 32 | Next round key |=== -Description:: +Description:: A single round of the forward AES-256 KeySchedule is performed. 
-// Within each element group, +// Within each element group, The next round key is generated word by word from the previous round key element group in `vd` and the immediately previous word of the round key. The least significant word of the next round key is generated by @@ -1775,14 +1785,14 @@ on the data being operated upon. [NOTE] ==== We chose to map out-of-range round numbers to in-range values as this allows the instruction's -behavior to be fully defined for all values of `uimm[4:0]` with minimal extra logic. +behavior to be fully defined for all values of `uimm[4:0]` with minimal extra logic. ==== // // The number of element groups to be processed is `vl`/`EGS`. -// `vl` must be set to the number of `SEW=32` elements to be processed and -// therefore must be a multiple of `EGS=4`. + +// `vl` must be set to the number of `SEW=32` elements to be processed and +// therefore must be a multiple of `EGS=4`. + // Likewise, `vstart` must be a multiple of `EGS=4`. Operation:: @@ -1790,7 +1800,7 @@ Operation:: -- function clause execute (VAESKF2(rnd, vd, vs2)) = { if(LMUL*VLEN < EGW) then { - handle_illegal(); // illegal instruction exception + handle_illegal(); // illegal-instruction exception RETIRE_FAIL } else { @@ -1805,7 +1815,7 @@ function clause execute (VAESKF2(rnd, vd, vs2)) = { let RoundKeyB[3:0] : bits(32) = get_velem(vd, EGW=128, i); // Previous round key let w[0] : bits(32) = if (rnd[0]==1) then - aes_subword_fwd(CurrentRoundKey[3]) XOR RoundKeyB[0]; + aes_subword_fwd(CurrentRoundKey[3]) XOR RoundKeyB[0]; else aes_subword_fwd(aes_rotword(CurrentRoundKey[3])) XOR aes_decode_rcon((rnd>>1) - 1) XOR RoundKeyB[0]; w[1] : bits(32) = w[0] XOR RoundKeyB[1] @@ -1857,17 +1867,17 @@ Arguments:: |Register |Direction |EGW -|EGS +|EGS |EEW |Definition | vd | input | 128 | 4 | 32 | round state -| vs2 | input | 128 | 4 | 32 | round key +| vs2 | input | 128 | 4 | 32 | round key | vd | output | 128 | 4 | 32 | new round state |=== -Description:: -A round-0 AES block 
cipher operation is performed. This operation is used for both encryption and decryption. +Description:: +A round-0 AES block cipher operation is performed. This operation is used for both encryption and decryption. There is only a `.vs` form of the instruction. `Vs2` holds a @@ -1883,14 +1893,14 @@ on the data being operated upon. ==== This instruction is needed to avoid the need to "splat" a 128-bit vector register group when the round key is the same for all 128-bit "lanes". Such a splat would typically be implemented with a `vrgather` instruction which would hurt performance -in many implementations. +in many implementations. This instruction only exists in the `.vs` form because the `.vv` form would be identical to the `vxor.vv vd, vs2, vd` instruction. ==== // // The number of element groups to be processed is `vl`/`EGS`. -// `vl` must be set to the number of `SEW=32` elements to be processed and -// therefore must be a multiple of `EGS=4`. + +// `vl` must be set to the number of `SEW=32` elements to be processed and +// therefore must be a multiple of `EGS=4`. + // Likewise, `vstart` must be a multiple of `EGS=4`. 
Operation:: @@ -1898,13 +1908,13 @@ Operation:: -- function clause execute (VAESZ(vs2, vd) = { if(((vstart%EGS)<>0) | (LMUL*VLEN < EGW)) then { - handle_illegal(); // illegal instruction exception + handle_illegal(); // illegal-instruction exception RETIRE_FAIL } else { eg_len = (vl/EGS) eg_start = (vstart/EGS) - + foreach (i from eg_start to eg_len-1) { let state : bits(128) = get_velem(vd, EGW=128, i); let rkey : bits(128) = get_velem(vs2, EGW=128, 0); @@ -1970,7 +1980,7 @@ Vector-Vector Arguments:: | Vs1 | input | Op1 (to be inverted) | Vs2 | input | Op2 -| Vd | output | Result +| Vd | output | Result |=== Vector-Scalar Arguments:: @@ -1983,16 +1993,16 @@ Vector-Scalar Arguments:: |Definition | Rs1 | input | Op1 (to be inverted) -| Vs2 | input | Op2 -| Vd | output | Result +| Vs2 | input | Op2 +| Vd | output | Result |=== -Description:: +Description:: A bitwise _and-not_ operation is performed. Each bit of `Op1` is inverted and logically ANDed with the corresponding bits in `vs2`. In the vector-scalar version, `Op1` is the sign-extended or truncated value in scalar -register `rs1`. +register `rs1`. In the vector-vector version, `Op1` is `vs1`. // This instruction must always be implemented such that its execution latency does not depend @@ -2001,7 +2011,7 @@ In the vector-vector version, `Op1` is `vs1`. [NOTE] .Note on necessity of instruction ==== -This instruction is performance-critical to SHA3. Specifically, the Chi step of the FIPS 202 Keccak Permutation. +This instruction is performance-critical to SHA3. Specifically, the Chi step of the FIPS 202 Keccak Permutation. Emulating it via 2 instructions is expected to have significant performance impact. The `.vv` form of the instruction is what is needed for SHA3; the `.vx` form was added for completeness. ==== @@ -2140,7 +2150,7 @@ A bit reversal is performed on the bits of each byte. [NOTE] ==== This instruction is commonly used for GCM when the zvkg extension is not implemented. 
-This byte-wise instruction is defined for all SEWs to eliminate the need to change SEW when operating on wider elements. +This byte-wise instruction is defined for all SEWs to eliminate the need to change SEW when operating on wider elements. ==== Operation:: @@ -2222,7 +2232,7 @@ Arguments:: Description:: Produces the low half of 128-bit carry-less product. -Each 64-bit element in the `vs2` vector register is carry-less multiplied by +Each 64-bit element in the `vs2` vector register is carry-less multiplied by either each 64-bit element in `vs1` (vector-vector), or the 64-bit value from integer register `rs1` (vector-scalar). The result is the least significant 64 bits of the carry-less product. @@ -2323,10 +2333,10 @@ Arguments:: | Vd | output | carry-less product high |=== -Description:: +Description:: Produces the high half of 128-bit carry-less product. -Each 64-bit element in the `vs2` vector register is carry-less multiplied by +Each 64-bit element in the `vs2` vector register is carry-less multiplied by either each 64-bit element in `vs1` (vector-vector), or the 64-bit value from integer register `rs1` (vector-scalar). The result is the most significant 64 bits of the carry-less product. @@ -2565,7 +2575,7 @@ Encoding:: ]} .... Reserved Encodings:: -* `SEW` is any value other than 32 +* `SEW` is any value other than 32 Arguments:: @@ -2585,10 +2595,10 @@ Arguments:: | Vd | output | 128 | 4 | 32 | Partial-hash (Y~i+1~) |=== -Description:: +Description:: A single "iteration" of the GHASH~H~ algorithm is performed. -This instruction treats all of the inputs and outputs as 128-bit polynomials and +This instruction treats all of the inputs and outputs as 128-bit polynomials and performs operations over GF[2]. 
It produces the next partial hash (Y~i+1~) by adding the current partial hash (Y~i~) to the cipher text block (X~i~) and then multiplying (over GF(2^128^)) @@ -2604,8 +2614,8 @@ Y~i+1~ = ((Y~i~ ^ X~i~) · H) The NIST specification (see <<zvkg>>) orders the coefficients from left to right x~0~x~1~x~2~...x~127~ for a polynomial x~0~ + x~1~u +x~2~ u^2^ + ... + x~127~u^127^. This can be viewed as a collection of byte elements in memory with the byte containing the lowest coefficients (i.e., 0,1,2,3,4,5,6,7) -residing at the lowest memory address. Since the bits in the bytes are reversed, -This instruction internally performs bit swaps within bytes to put the bits in the standard ordering +residing at the lowest memory address. Since the bits in the bytes are reversed, +this instruction internally performs bit swaps within bytes to put the bits in the standard ordering (e.g., 7,6,5,4,3,2,1,0). This instruction must always be implemented such that its execution latency does not depend @@ -2622,7 +2632,7 @@ swap bit positions and therefore do not require any logic. ==== Since the same hash subkey `H` will typically be used repeatedly on a given message, a future extension might define a vector-scalar version of this instruction where -`vs2` is the scalar element group. This would help reduce register pressure when `LMUL` > 1. +`vs2` is the scalar element group. This would help reduce register pressure when `LMUL` > 1. ==== Operation:: @@ -2631,13 +2641,13 @@ Operation:: function clause execute (VGHSH(vs2, vs1, vd)) = { // operands are input with bits reversed in each byte if(LMUL*VLEN < EGW) then { - handle_illegal(); // illegal instruction exception + handle_illegal(); // illegal-instruction exception RETIRE_FAIL } else { eg_len = (vl/EGS) eg_start = (vstart/EGS) - + foreach (i from eg_start to eg_len-1) { let Y = (get_velem(vd,EGW=128,i)); // current partial-hash let X = get_velem(vs1,EGW=128,i); // block cipher output @@ -2693,7 +2703,7 @@ Encoding:: ]} .... 
Reserved Encodings:: -* `SEW` is any value other than 32 +* `SEW` is any value other than 32 Arguments:: @@ -2712,10 +2722,10 @@ Arguments:: | Vd | output | 128 | 4 | 32 | Product |=== -Description:: +Description:: A GHASH~H~ multiply is performed. -This instruction treats all of the inputs and outputs as 128-bit polynomials and +This instruction treats all of the inputs and outputs as 128-bit polynomials and performs operations over GF[2]. It produces the product over GF(2^128^) of the two 128-bit inputs. @@ -2725,7 +2735,7 @@ modulo GHASH's irreducible polynomial (x^128^ + x^7^ + x^2^ + x + 1). The NIST specification (see <<zvkg>>) orders the coefficients from left to right x~0~x~1~x~2~...x~127~ for a polynomial x~0~ + x~1~u +x~2~ u^2^ + ... + x~127~u^127^. This can be viewed as a collection of byte elements in memory with the byte containing the lowest coefficients (i.e., 0,1,2,3,4,5,6,7) -residing at the lowest memory address. Since the bits in the bytes are reversed, +residing at the lowest memory address. Since the bits in the bytes are reversed, This instruction internally performs bit swaps within bytes to put the bits in the standard ordering (e.g., 7,6,5,4,3,2,1,0). @@ -2743,7 +2753,7 @@ swap bit positions and therefore do not require any logic. ==== Since the same multiplicand will typically be used repeatedly on a given message, a future extension might define a vector-scalar version of this instruction where -`vs2` is the scalar element group. This would help reduce register pressure when `LMUL` > 1. +`vs2` is the scalar element group. This would help reduce register pressure when `LMUL` > 1. ==== [NOTE] @@ -2751,8 +2761,8 @@ a future extension might define a vector-scalar version of this instruction wher This instruction is identical to `vghsh.vv` with vs1=0. This instruction is often used in GHASH code. In some cases it is followed by an XOR to perform a multiply-add. 
Implementations may choose to fuse these -two instructions to improve performance on GHASH code that -doesn't use the add-multiply form of the `vghsh.vv` instruction. +two instructions to improve performance on GHASH code that +doesn't use the add-multiply form of the `vghsh.vv` instruction. ==== @@ -2762,13 +2772,13 @@ Operation:: function clause execute (VGMUL(vs2, vs1, vd)) = { // operands are input with bits reversed in each byte if(LMUL*VLEN < EGW) then { - handle_illegal(); // illegal instruction exception + handle_illegal(); // illegal-instruction exception RETIRE_FAIL } else { eg_len = (vl/EGS) eg_start = (vstart/EGS) - + foreach (i from eg_start to eg_len-1) { let Y = brev8(get_velem(vd,EGW=128,i)); // Multiplier let H = brev8(get_velem(vs2,EGW=128,i)); // Multiplicand @@ -2785,7 +2795,7 @@ function clause execute (VGMUL(vs2, vs1, vd)) = { } - let result = brev8(Z); + let result = brev8(Z); set_velem(vd, EGW=128, i, result); } RETIRE_SUCCESS @@ -2917,7 +2927,7 @@ Vector-Vector Arguments:: | Vs1 | input | Rotate amount | Vs2 | input | Data -| Vd | output | Rotated data +| Vd | output | Rotated data |=== Vector-Scalar Arguments:: @@ -2934,10 +2944,10 @@ Vector-Scalar Arguments:: | Vd | output | Rotated data |=== -Description:: +Description:: A bitwise left rotation is performed on each element of `vs2` -The elements in `vs2` are rotated left by the rotate amount specified by either +The elements in `vs2` are rotated left by the rotate amount specified by either the corresponding elements of `vs1` (vector-vector), or integer register `rs1` (vector-scalar). 
Only the low log2(`SEW`) bits of the rotate-amount value are used, all other @@ -2966,7 +2976,7 @@ function clause execute (VROL_VV(vs2, vs1, vd)) = { function clause execute (VROL_VX(vs2, rs1, vd)) = { foreach (i from vstart to vl - 1) { - set_velem(vd, EEW=SEW, i, + set_velem(vd, EEW=SEW, i, get_velem(vs2, i) <<< (X(rs1) & (SEW-1)) ) } @@ -3046,7 +3056,7 @@ Vector-Vector Arguments:: | Vs1 | input | Rotate amount | Vs2 | input | Data -| Vd | output | Rotated data +| Vd | output | Rotated data |=== Vector-Scalar/Immediate Arguments:: @@ -3064,10 +3074,10 @@ Vector-Scalar/Immediate Arguments:: |=== -Description:: +Description:: A bitwise right rotation is performed on each element of `vs2`. -The elements in `vs2` are rotated right by the rotate amount specified by either +The elements in `vs2` are rotated right by the rotate amount specified by either the corresponding elements of `vs1` (vector-vector), integer register `rs1` (vector-scalar), or an immediate value (vector-immediate). Only the low log2(`SEW`) bits of the rotate-amount value are used, all other @@ -3090,17 +3100,17 @@ function clause execute (VROR_VV(vs2, vs1, vd)) = { function clause execute (VROR_VX(vs2, rs1, vd)) = { foreach (i from vstart to vl - 1) { - set_velem(vd, EEW=SEW, i, + set_velem(vd, EEW=SEW, i, get_velem(vs2, i) >>> (X(rs1) & (SEW-1)) ) } RETIRE_SUCCESS } -function clause execute (VROR_VI(vs2, imm[5:0], vd)) = { +function clause execute (VROR_VI(vs2, uimm[5:0], vd)) = { foreach (i from vstart to vl - 1) { - set_velem(vd, EEW=SEW, i, - get_velem(vs2, i) >>> (imm[5:0] & (SEW-1)) + set_velem(vd, EEW=SEW, i, + get_velem(vs2, i) >>> (uimm[5:0] & (SEW-1)) ) } RETIRE_SUCCESS @@ -3190,7 +3200,7 @@ next state. // output is the new values of _a, b, e_ and _f_ after performing 2 rounds of the hash // computation. The new values, _c_, _d_, _g_, and _h_, are equal to the input values for _a_, _b_, // _e_, _f_ respectively. 
-// [TIP] +// [NOTE] // .Note to software developers // ==== // The MessageSchedplus constant input to this instruction is generated by Software @@ -3198,7 +3208,7 @@ next state. // round constant as defined in the NIST specification (see <<zvknh>>). // ==== -[TIP] +[NOTE] .Note to software developers ==== The NIST standard (see <<zvknh>>) requires the final hash to be in big-endian byte ordering @@ -3213,15 +3223,15 @@ The `vsha2ch` version of this instruction uses the two most significant message words from the element group in `vs1` while the `vsha2cl` version uses the two least significant message schedule words. Otherwise, these versions of the instruction are identical. -Having a high and low version of this instruction typically improves performance when +Having a high and low version of this instruction typically improves performance when interleaving independent hashing operations (i.e., when hashing several files at once). ==== -// [TIP] +// [NOTE] // .Note to software developers // ==== // These instructions take in two SEW words _W1_ and _W0_ which are the next two words of the message -// schedule incremented by the appropriate constant, +// schedule incremented by the appropriate constant, // and eight SEW word variables: _a_, _b_, _c_, _d_, _e_, _f_, _g,_ and _h_. The // output is the new values of _a, b, e_ and _f_ after performing 2 rounds of the hash // computation. The new values, _c_, _d_, _g_, and _h_, are equal to the input values for _a_, _b_, _e_, _f_ respectively. 
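The two-round compression that `vsha2c[hl]` performs on one element group can be modeled in software for `SEW`=32 (SHA-256). The sketch below is an assumption-laden illustration, not the specification's reference model: `two_rounds` mirrors the `{a, b, e, f}` / `{c, d, g, h}` split and the `W0`, `W1` inputs described for the instruction, while every other name (`sha256_one_block`, `first_primes`, `icbrt`) is a hypothetical helper. The round constants and initial hash values are derived from prime roots as FIPS 180-4 defines them rather than typed in, and the message words are assumed to have the round constant already added, as the notes to software developers describe.

```python
import hashlib
import math

MASK = 0xFFFFFFFF  # SEW=32 arithmetic is modulo 2**32


def rotr(x, n):
    return ((x >> n) | (x << (32 - n))) & MASK


def sum0(x): return rotr(x, 2) ^ rotr(x, 13) ^ rotr(x, 22)
def sum1(x): return rotr(x, 6) ^ rotr(x, 11) ^ rotr(x, 25)
def sig0(x): return rotr(x, 7) ^ rotr(x, 18) ^ (x >> 3)
def sig1(x): return rotr(x, 17) ^ rotr(x, 19) ^ (x >> 10)
def ch(x, y, z): return (x & y) ^ (~x & z)
def maj(x, y, z): return (x & y) ^ (x & z) ^ (y & z)


def two_rounds(abef, cdgh, w0, w1):
    """Two SHA-256 rounds; w0 and w1 already include their round constants.

    Returns the new (a, b, e, f). The new (c, d, g, h) equals the old (a, b, e, f).
    """
    a, b, e, f = abef
    c, d, g, h = cdgh
    for w in (w0, w1):
        t1 = (h + sum1(e) + ch(e, f, g) + w) & MASK
        t2 = (sum0(a) + maj(a, b, c)) & MASK
        h, g, f, e, d, c, b, a = g, f, e, (d + t1) & MASK, c, b, a, (t1 + t2) & MASK
    return (a, b, e, f)


def first_primes(n):
    found, cand = [], 2
    while len(found) < n:
        if all(cand % p for p in found):
            found.append(cand)
        cand += 1
    return found


def icbrt(n):
    """Integer cube root by binary search."""
    lo, hi = 0, 1 << ((n.bit_length() + 2) // 3 + 1)
    while lo < hi:
        mid = (lo + hi + 1) // 2
        if mid ** 3 <= n:
            lo = mid
        else:
            hi = mid - 1
    return lo


# Fractional bits of cube roots (K) and square roots (H0) of the first primes.
K = [icbrt(p << 96) & MASK for p in first_primes(64)]
H0 = [math.isqrt(p << 64) & MASK for p in first_primes(8)]


def sha256_one_block(msg):
    """SHA-256 of a message short enough (< 56 bytes) to fit one padded block."""
    assert len(msg) < 56
    block = msg + b"\x80" + b"\x00" * (55 - len(msg)) + (8 * len(msg)).to_bytes(8, "big")
    w = [int.from_bytes(block[i:i + 4], "big") for i in range(0, 64, 4)]
    for t in range(16, 64):
        w.append((sig1(w[t - 2]) + w[t - 7] + sig0(w[t - 15]) + w[t - 16]) & MASK)
    a, b, c, d, e, f, g, h = H0
    abef, cdgh = (a, b, e, f), (c, d, g, h)
    for t in range(0, 64, 2):
        new_abef = two_rounds(abef, cdgh, (w[t] + K[t]) & MASK, (w[t + 1] + K[t + 1]) & MASK)
        abef, cdgh = new_abef, abef  # old {a,b,e,f} becomes the next {c,d,g,h}
    a, b, e, f = abef
    c, d, g, h = cdgh
    final = [(x + y) & MASK for x, y in zip(H0, (a, b, c, d, e, f, g, h))]
    return b"".join(x.to_bytes(4, "big") for x in final)
```

Swapping the roles of the two state tuples after each call is what lets a sequence of such two-round steps cover all 64 rounds without extra data movement. The result can be compared against `hashlib.sha256` as a sanity check.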
@@ -3257,59 +3267,59 @@ Operation:: -- function clause execute (VSHA2c(vs2, vs1, vd)) = { if(LMUL*VLEN < EGW) then { - handle_illegal(); // illegal instruction exception + handle_illegal(); // illegal-instruction exception RETIRE_FAIL } else { eg_len = (vl/EGS) eg_start = (vstart/EGS) - + foreach (i from eg_start to eg_len-1) { - let {a @ b @ e @ f} : bits(4*SEW) = get_velem(vs2, 4*SEW, i); - let {c @ d @ g @ h} : bits(4*SEW) = get_velem(vd, 4*SEW, i); - let MessageShedPlusC[3:0] : bits(4*SEW) = get_velem(vs1, 4*SEW, i); - let {W1, W0} == VSHA2cl ? MessageSchedPlusC[1:0] : MessageSchedPlusC[3:2]; // l vs h difference is the words selected - - let T1 : bits(SEW) = h + sum1(e) + ch(e,f,g) + W0; - let T2 : bits(SEW) = sum0(a) + maj(a,b,c); - h = g; - g = f; - f = e; - e = d + T1; - d = c; - c = b; - b = a; - a = T1 + T2; - - - T1 = h + sum1(e) + ch(e,f,g) + W1; - T2 = sum0(a) + maj(a,b,c); - h = g; - g = f; - f = e; - e = d + T1; - d = c; - c = b; - b = a; - a = T1 + T2; - set_velem(vd, 4*SEW, i, {a @ b @ e @ f}); + let {a @ b @ e @ f} : bits(4*SEW) = get_velem(vs2, 4*SEW, i); + let {c @ d @ g @ h} : bits(4*SEW) = get_velem(vd, 4*SEW, i); + let MessageShedPlusC[3:0] : bits(4*SEW) = get_velem(vs1, 4*SEW, i); + let {W1, W0} == VSHA2cl ? 
MessageSchedPlusC[1:0] : MessageSchedPlusC[3:2]; // l vs h difference is the words selected + + let T1 : bits(SEW) = h + sum1(e) + ch(e,f,g) + W0; + let T2 : bits(SEW) = sum0(a) + maj(a,b,c); + h = g; + g = f; + f = e; + e = d + T1; + d = c; + c = b; + b = a; + a = T1 + T2; + + + T1 = h + sum1(e) + ch(e,f,g) + W1; + T2 = sum0(a) + maj(a,b,c); + h = g; + g = f; + f = e; + e = d + T1; + d = c; + c = b; + b = a; + a = T1 + T2; + set_velem(vd, 4*SEW, i, {a @ b @ e @ f}); } RETIRE_SUCCESS } } function sum0(x) = { - match SEW { - 32 => rotr(x,2) XOR rotr(x,13) XOR rotr(x,22), - 64 => rotr(x,28) XOR rotr(x,34) XOR rotr(x,39) - } + match SEW { + 32 => rotr(x,2) XOR rotr(x,13) XOR rotr(x,22), + 64 => rotr(x,28) XOR rotr(x,34) XOR rotr(x,39) + } } function sum1(x) = { - match SEW { - 32 => rotr(x,6) XOR rotr(x,11) XOR rotr(x,25), - 64 => rotr(x,14) XOR rotr(x,18) XOR rotr(x,41) - } + match SEW { + 32 => rotr(x,6) XOR rotr(x,11) XOR rotr(x,25), + 64 => rotr(x,14) XOR rotr(x,18) XOR rotr(x,41) + } } function ch(x, y, z) = ((x & y) ^ ((~x) & z)) @@ -3378,7 +3388,7 @@ Eleven of the last 16 `SEW`-sized message-schedule words from `vd` (oldest), `vs and `vs1` (most recent) are processed to produce the next 4 message-schedule words. -[TIP] +[NOTE] .Note to software developers ==== The first 16 SEW-sized words of the message schedule come from the _message block_ @@ -3389,7 +3399,7 @@ All of the subsequent message schedule words are produced by this instruction an therefore do not require an endian swap. ==== -[TIP] +[NOTE] .Note to software developers ==== Software is required to pack the words into element groups @@ -3402,7 +3412,7 @@ lower indices indicating older words. // ==== // Four `SEW` message schedule words are packed into each element group of the -// source and destination registers. From a vector register point of view, +// source and destination registers. 
From a vector register point of view, // the message schedule words are packed into the // element groups from the left to the right with the most significant word on the left // and the least significant word on the right. @@ -3419,7 +3429,7 @@ lower indices indicating older words. // {W~11~, W~10~, W~9~, W~4~} + // {W~15~, W~14~, W~13~, W~12~}` -[TIP] +[NOTE] .Note to software developers ==== The {W~11~, W~10~, W~9~, W~4~} element group can easily be formed by using a vector @@ -3464,7 +3474,7 @@ function clause execute (VSHA2ms(vs2, vs1, vd)) = { // SEW32 = SHA-256 // SEW64 = SHA-512 if(LMUL*VLEN < EGW) then { - handle_illegal(); // illegal instruction exception + handle_illegal(); // illegal-instruction exception RETIRE_FAIL } else { @@ -3475,7 +3485,7 @@ function clause execute (VSHA2ms(vs2, vs1, vd)) = { {W[3] @ W[2] @ W[1] @ W[0]} : bits(EGW) = get_velem(vd, EGW, i); {W[11] @ W[10] @ W[9] @ W[4]} : bits(EGW) = get_velem(vs2, EGW, i); {W[15] @ W[14] @ W[13] @ W[12]} : bits(EGW) = get_velem(vs1, EGW, i); - + W[16] = sig1(W[14]) + W[9] + sig0(W[1]) + W[0]; W[17] = sig1(W[15]) + W[10] + sig0(W[2]) + W[1]; W[18] = sig1(W[16]) + W[11] + sig0(W[3]) + W[2]; @@ -3488,17 +3498,17 @@ function clause execute (VSHA2ms(vs2, vs1, vd)) = { } function sig0(x) = { - match SEW { - 32 => (ROTR(x,7) XOR ROTR(x,18) XOR SHR(x,3)), - 64 => (ROTR(x,1) XOR ROTR(x,8) XOR SHR(x,7))); - } + match SEW { + 32 => (ROTR(x,7) XOR ROTR(x,18) XOR SHR(x,3)), + 64 => (ROTR(x,1) XOR ROTR(x,8) XOR SHR(x,7))); + } } function sig1(x) = { - match SEW { - 32 => (ROTR(x,17) XOR ROTR(x,19) XOR SHR(x,10), - 64 => ROTR(x,19) XOR ROTR(x,61) XOR SHR(x,6)); - } + match SEW { + 32 => (ROTR(x,17) XOR ROTR(x,19) XOR SHR(x,10), + 64 => ROTR(x,19) XOR ROTR(x,61) XOR SHR(x,6)); + } } function ROTR(x,n) = (x >> n) | (x << SEW - n) @@ -3613,13 +3623,13 @@ Operation:: -- function clause execute (VSM3C(rnds, vs2, vd)) = { if(LMUL*VLEN < EGW) then { - handle_illegal(); // illegal instruction exception + 
handle_illegal(); // illegal-instruction exception RETIRE_FAIL } else { eg_len = (vl/EGS) eg_start = (vstart/EGS) - + foreach (i from eg_start to eg_len-1) { // load state @@ -3767,7 +3777,7 @@ specification without requiring that software perform these swaps. // NOTE // ==== // For the best performance, it is recommended that implementations have VLEN≥256. -// When VLEN<EGW, an appropriate LMUL needs to be used by software so that elements from the +// When VLEN<EGW, an appropriate LMUL needs to be used by software so that elements from the // specified register groups can be combined to form the full element group. // ==== @@ -3788,17 +3798,17 @@ Operation:: -- function clause execute (VSM3ME(vs2, vs1)) = { if(LMUL*VLEN < EGW) then { - handle_illegal(); // illegal instruction exception + handle_illegal(); // illegal-instruction exception RETIRE_FAIL } else { eg_len = (vl/EGS) eg_start = (vstart/EGS) - + foreach (i from eg_start to eg_len-1) { let w[7:0] : bits(256) = get_velem(vs1, 256, i); let w[15:8] : bits(256) = get_velem(vs2, 256, i); - + // Byte Swap inputs from big-endian to little-endian let w15 = rev8(w[15]); let w14 = rev8(w[14]); @@ -3904,7 +3914,7 @@ Four rounds of the SM4 Key Expansion are performed. Four round keys are read in as a 4-element group from `vs2`. Each of the next four round keys are generated by iteratively XORing the last three round keys with a constant that is indexed by the Round Group Number, performing a byte-wise substitution, and then performing XORs between rotated versions of this value -and the corresponding current round key. +and the corresponding current round key. The Round group number (`rnd`) comes from `uimm[2:0]`; the bits in `uimm[4:3]` are ignored. 
Round group numbers range from 0 to 7 and indicate which @@ -3933,7 +3943,7 @@ the system parameters: FK[0:3] |constant | 0 | A3B1BAC6 -| 1 | 56AA3350 +| 1 | 56AA3350 | 2 | 677D9197 | 3 | B27022DC |=== @@ -3948,7 +3958,7 @@ the system parameters: FK[0:3] |constant | 0 | A3B1BAC6 -| 1 | 56AA3350 +| 1 | 56AA3350 | 2 | 677D9197 | 3 | B27022DC |=== @@ -3960,7 +3970,7 @@ the system parameters: FK[0:3] // The round keys are rK[0] to rK[31] // B = (rK[i-3] XOR rK[i-2] XOR rK[i-1] XOR CK[round]); + -// S = subBytes(B); + +// S = subBytes(B); + // rK[i]= rK[i-4] XOR S XOR ROTL13(S) XOR ROTR23(S); + // // The round constants and the S-box are described below and can be found at https://datatracker.ietf.org/doc/id/// draft-crypto-sm4-00 @@ -3973,7 +3983,7 @@ The round constants (CK) can be generated on the fly fairly cheaply. If the bytes of the constants are assigned an incrementing index from 0 to 127, the value of each byte is equal to its index multiplied by 7 modulo 256. Since the results are all limited to 8 bits, the modulo operation occurs for free: - B[n] = n + 2n + 4n; + B[n] = n + 2n + 4n; = 8n + ~n + 1; ==== @@ -3996,7 +4006,7 @@ Since the results are all limited to 8 bits, the modulo operation occurs for fre |constant | 0 | A3B1BAC6 -| 1 | 56AA3350 +| 1 | 56AA3350 | 2 | 677D9197 | 3 | B27022DC |=== @@ -4008,13 +4018,13 @@ Operation:: function clause execute (vsm4k(uimm, vs2)) = { if(LMUL*VLEN < EGW) then { - handle_illegal(); // illegal instruction exception + handle_illegal(); // illegal-instruction exception RETIRE_FAIL } else { eg_len = (vl/EGS) eg_start = (vstart/EGS) - + let B : bits(32) = 0; let S : bits(32) = 0; let rk4 : bits(32) = 0; @@ -4025,7 +4035,7 @@ function clause execute (vsm4k(uimm, vs2)) = { foreach (i from eg_start to eg_len-1) { let (rk3 @ rk2 @ rk1 @ rk0) : bits(128) = get_velem(vs2, 128, i); - + B = rk1 ^ rk2 ^ rk3 ^ ck(4 * rnd); S = sm4_subword(B); rk4 = ROUND_KEY(rk0, S); @@ -4054,14 +4064,14 @@ function ROUND_KEY(X, S) = ((X) ^ ((S) ^ 
ROL32((S), 13) ^ ROL32((S), 23))) // SM4 Constant Key (CK) let ck : list(bits(32)) = [| - 0x00070E15, 0x1C232A31, 0x383F464D, 0x545B6269, - 0x70777E85, 0x8C939AA1, 0xA8AFB6BD, 0xC4CBD2D9, - 0xE0E7EEF5, 0xFC030A11, 0x181F262D, 0x343B4249, - 0x50575E65, 0x6C737A81, 0x888F969D, 0xA4ABB2B9, - 0xC0C7CED5, 0xDCE3EAF1, 0xF8FF060D, 0x141B2229, - 0x30373E45, 0x4C535A61, 0x686F767D, 0x848B9299, - 0xA0A7AEB5, 0xBCC3CAD1, 0xD8DFE6ED, 0xF4FB0209, - 0x10171E25, 0x2C333A41, 0x484F565D, 0x646B7279 + 0x00070E15, 0x1C232A31, 0x383F464D, 0x545B6269, + 0x70777E85, 0x8C939AA1, 0xA8AFB6BD, 0xC4CBD2D9, + 0xE0E7EEF5, 0xFC030A11, 0x181F262D, 0x343B4249, + 0x50575E65, 0x6C737A81, 0x888F969D, 0xA4ABB2B9, + 0xC0C7CED5, 0xDCE3EAF1, 0xF8FF060D, 0x141B2229, + 0x30373E45, 0x4C535A61, 0x686F767D, 0x848B9299, + 0xA0A7AEB5, 0xBCC3CAD1, 0xD8DFE6ED, 0xF4FB0209, + 0x10171E25, 0x2C333A41, 0x484F565D, 0x646B7279 |] }; @@ -4139,10 +4149,10 @@ and the round keys are read from either the corresponding 4-element group in `vs2` (vector-vector form) or the scalar element group in `vs2` (vector-scalar form). The next four words of state are generated -by iteratively XORing the last three words of the state with +by iteratively XORing the last three words of the state with the corresponding round key, performing a byte-wise substitution, and then performing XORs between rotated -versions of this value and the corresponding current state. +versions of this value and the corresponding current state. 
[NOTE] ==== @@ -4167,13 +4177,13 @@ Operation:: -- function clause execute (VSM4R(vd, vs2)) = { if(LMUL*VLEN < EGW) then { - handle_illegal(); // illegal instruction exception + handle_illegal(); // illegal-instruction exception RETIRE_FAIL } else { eg_len = (vl/EGS) eg_start = (vstart/EGS) - + let B : bits(32) = 0; let S : bits(32) = 0; let rk0 : bits(32) = 0; @@ -4294,7 +4304,7 @@ Vector-Vector Arguments:: | Vs1 | input | Shift amount | Vs2 | input | Data -| Vd | output | Shifted data +| Vd | output | Shifted data |=== Vector-Scalar/Immediate Arguments:: @@ -4313,7 +4323,7 @@ Vector-Scalar/Immediate Arguments:: |=== -Description:: +Description:: A widening logical shift left is performed on each element of `vs2`. The elements in `vs2` are zero-extended to 2*`SEW` bits, then shifted left @@ -4337,7 +4347,7 @@ function clause execute (VWSLL_VV(vs2, vs1, vd)) = { function clause execute (VWSLL_VX(vs2, rs1, vd)) = { foreach (i from vstart to vl - 1) { - set_velem(vd, EEW=2*SEW, i, + set_velem(vd, EEW=2*SEW, i, get_velem(vs2, i) << (X(rs1) & ((2*SEW)-1)) ) } @@ -4346,7 +4356,7 @@ function clause execute (VWSLL_VX(vs2, rs1, vd)) = { function clause execute (VWSLL_VI(vs2, uimm[4:0], vd)) = { foreach (i from vstart to vl - 1) { - set_velem(vd, EEW=2*SEW, i, + set_velem(vd, EEW=2*SEW, i, get_velem(vs2, i) << (uimm[4:0] & ((2*SEW)-1)) ) } @@ -4383,23 +4393,23 @@ Crypto Vector instructions except Zvbb and Zvbc |=== 5+^| funct6 4+^| funct6 4+^| funct6 -|100000||||| 100000 |V| | vsm3me | 100000 | | | -| 100001 | | | | | 100001 |V| | vsm4k.vi | 100001 | | | -| 100010 | | | | | 100010 |V| | vaeskf1.vi | 100010 | | | +|100000||||| 100000 |V| | vsm3me | 100000 | | | +| 100001 | | | | | 100001 |V| | vsm4k.vi | 100001 | | | +| 100010 | | | | | 100010 |V| | vaeskf1.vi | 100010 | | | | 100011 | | | | | 100011 | | | | 100011 | | | -| 100100 | | | | | 100100 | | | | 100100 | | | +| 100100 | | | | | 100100 | | | | 100100 | | | | 100101 | | | | | 100101 | | | | 100101 | | | | 100110 | | 
| | | 100110 | | | | 100110 | | | -| 100111 | | | | | 100111 | | | | 100111 | | | +| 100111 | | | | | 100111 | | | | 100111 | | | | | | | | | | | | | | | | -| 101000 | | | | | 101000 |V| | *VAES.vv* | 101000 | | | -| 101001 | | | | | 101001 |V| | *VAES.vs* | 101001 | | | -| 101010 | | | | | 101010 |V| | vaeskf2.vi | 101010 | | | -| 101011 | | | | | 101011 |V| | vsm3c.vi | 101011 | | | -| 101100 | | | | | 101100 |V| | vghsh | 101100 | | | -| 101101 | | | | | 101101 |V| | vsha2ms | 101101 | | | -| 101110 | | | | | 101110 |V| | vsha2ch | 101110 | | | -| 101111 | | | | | 101111 |V| | vsha2cl | 101111 | | | +| 101000 | | | | | 101000 |V| | *VAES.vv* | 101000 | | | +| 101001 | | | | | 101001 |V| | *VAES.vs* | 101001 | | | +| 101010 | | | | | 101010 |V| | vaeskf2.vi | 101010 | | | +| 101011 | | | | | 101011 |V| | vsm3c.vi | 101011 | | | +| 101100 | | | | | 101100 |V| | vghsh | 101100 | | | +| 101101 | | | | | 101101 |V| | vsha2ms | 101101 | | | +| 101110 | | | | | 101110 |V| | vsha2ch | 101110 | | | +| 101111 | | | | | 101111 |V| | vsha2cl | 101111 | | | |=== <<< @@ -4556,12 +4566,12 @@ OP-V (0x57) This section contains the supporting Sail code referenced by the instruction descriptions throughout the specification. The -link:https://github.com/rems-project/sail/blob/sail2/manual.pdf[Sail Manual] +link:https://alasdair.github.io/manual.html[Sail Manual] is recommended reading in order to best understand the supporting code. 
[source,sail] ---- -/* Auxiliary function for performing GF multiplicaiton */ +/* Auxiliary function for performing GF multiplication */ val xt2 : bits(8) -> bits(8) function xt2(x) = { (x << 1) ^ (if bit_to_bool(x[7]) then 0x1b else 0x00) @@ -4579,13 +4589,13 @@ function gfmul( x, y) = { (if bit_to_bool(y[3]) then xt2(xt2(xt2(x))) else 0x00) } -/* 8-bit to 32-bit partial AES Mix Colum - forwards */ +/* 8-bit to 32-bit partial AES Mix Column - forwards */ val aes_mixcolumn_byte_fwd : bits(8) -> bits(32) function aes_mixcolumn_byte_fwd(so) = { gfmul(so, 0x3) @ so @ so @ gfmul(so, 0x2) } -/* 8-bit to 32-bit partial AES Mix Colum - inverse*/ +/* 8-bit to 32-bit partial AES Mix Column - inverse*/ val aes_mixcolumn_byte_inv : bits(8) -> bits(32) function aes_mixcolumn_byte_inv(so) = { gfmul(so, 0xb) @ gfmul(so, 0xd) @ gfmul(so, 0x9) @ gfmul(so, 0xe) @@ -4687,7 +4697,7 @@ let aes_sbox_fwd_table : list(bits(8)) = [| 0x87, 0xe9, 0xce, 0x55, 0x28, 0xdf, 0x8c, 0xa1, 0x89, 0x0d, 0xbf, 0xe6, 0x42, 0x68, 0x41, 0x99, 0x2d, 0x0f, 0xb0, 0x54, 0xbb, 0x16 |] - + let aes_sbox_inv_table : list(bits(8)) = [| 0x52, 0x09, 0x6a, 0xd5, 0x30, 0x36, 0xa5, 0x38, 0xbf, 0x40, 0xa3, 0x9e, 0x81, 0xf3, 0xd7, 0xfb, 0x7c, 0xe3, 0x39, 0x82, 0x9b, 0x2f, 0xff, 0x87, 0x34, 0x8e, @@ -4749,7 +4759,7 @@ function aes_subword_inv(x) = { aes_sbox_inv(x[31..24]) @ aes_sbox_inv(x[23..16]) @ aes_sbox_inv(x[15.. 8]) @ - aes_sbox_inv(x[ 7.. 0]) + aes_sbox_inv(x[ 7.. 0]) } /* Easy function to perform an SM4 SBox operation on 1 byte. */ @@ -4775,7 +4785,7 @@ function aes_apply_fwd_sbox_to_each_byte(x) = { } /* 64-bit to 64-bit function which applies the AES inverse sbox to each byte - * in a 64-bit word. + * in a 64-bit word. */ val aes_apply_inv_sbox_to_each_byte : bits(64) -> bits(64) function aes_apply_inv_sbox_to_each_byte(x) = { @@ -4820,16 +4830,16 @@ function aes_rv64_shiftrows_inv(rs2, rs1) = { getbyte(rs1, 0) } -/* 128-bit to 128-bit implementation of the forward AES ShiftRows transform. 
+/* 128-bit to 128-bit implementation of the forward AES ShiftRows transform. * Byte 0 of state is input column 0, bits 7..0. * Byte 5 of state is input column 1, bits 15..8. */ val aes_shift_rows_fwd : bits(128) -> bits(128) function aes_shift_rows_fwd(x) = { - let ic3 : bits(32) = aes_get_column(x, 3); - let ic2 : bits(32) = aes_get_column(x, 2); - let ic1 : bits(32) = aes_get_column(x, 1); - let ic0 : bits(32) = aes_get_column(x, 0); + let ic3 : bits(32) = aes_get_column(x, 3); + let ic2 : bits(32) = aes_get_column(x, 2); + let ic1 : bits(32) = aes_get_column(x, 1); + let ic0 : bits(32) = aes_get_column(x, 0); let oc0 : bits(32) = ic3[31..24] @ ic2[23..16] @ ic1[15.. 8] @ ic0[ 7.. 0]; let oc1 : bits(32) = ic0[31..24] @ ic3[23..16] @ ic2[15.. 8] @ ic1[ 7.. 0]; let oc2 : bits(32) = ic1[31..24] @ ic0[23..16] @ ic3[15.. 8] @ ic2[ 7.. 0]; @@ -4844,9 +4854,9 @@ function aes_shift_rows_fwd(x) = { val aes_shift_rows_inv : bits(128) -> bits(128) function aes_shift_rows_inv(x) = { let ic3 : bits(32) = aes_get_column(x, 3); /* In column 3 */ - let ic2 : bits(32) = aes_get_column(x, 2); - let ic1 : bits(32) = aes_get_column(x, 1); - let ic0 : bits(32) = aes_get_column(x, 0); + let ic2 : bits(32) = aes_get_column(x, 2); + let ic1 : bits(32) = aes_get_column(x, 1); + let ic0 : bits(32) = aes_get_column(x, 0); let oc0 : bits(32) = ic1[31..24] @ ic2[23..16] @ ic3[15.. 8] @ ic0[ 7.. 0]; let oc1 : bits(32) = ic2[31..24] @ ic3[23..16] @ ic0[15.. 8] @ ic1[ 7.. 0]; let oc2 : bits(32) = ic3[31..24] @ ic0[23..16] @ ic1[15.. 8] @ ic2[ 7.. 
0]; @@ -4917,7 +4927,7 @@ function aes_rotword(x) = { val brev : bits(SEW) -> bits(SEW) function brev(x) = { let output : bits(SEW) = 0; - foreach (i from 0 to SEW-8 by 8) + foreach (i from 0 to SEW-8 by 8) output[i+7..i] = reverse_bits_in_byte(input[i+7..i]); output /* Return Value */ } @@ -4925,7 +4935,7 @@ function brev(x) = { val reverse_bits_in_byte : bits(8) -> bits(8) function reverse_bits_in_byte(x) = { let output : bits(8) = 0; - foreach (i from 0 to 7) + foreach (i from 0 to 7) output[i] = x[7-i]); output /* Return Value */ } diff --git a/src/vector-examples.adoc b/src/vector-examples.adoc index 9e54acd..41d81ea 100644 --- a/src/vector-examples.adoc +++ b/src/vector-examples.adoc @@ -82,38 +82,38 @@ include::example/sgemm.S[lines=4..-1] # v1 = v1 / v2 to almost 23 bits of precision. vfrec7.v v3, v2 # Estimate 1/v2 - li t0, 0x40000000 -vmv.v.x v4, t0 # Splat 2.0 -vfnmsac.vv v4, v2, v3 # 2.0 - v2 * est(1/v2) -vfmul.vv v3, v3, v4 # Better estimate of 1/v2 -vmv.v.x v4, t0 # Splat 2.0 -vfnmsac.vv v4, v2, v3 # 2.0 - v2 * est(1/v2) -vfmul.vv v3, v3, v4 # Better estimate of 1/v2 + li t0, 0x3f800000 +vmv.v.x v4, t0 # Splat 1.0 +vfnmsac.vv v4, v2, v3 # 1.0 - v2 * est(1/v2) +vfmadd.vv v3, v4, v3 # Better estimate of 1/v2 +vmv.v.x v4, t0 # Splat 1.0 +vfnmsac.vv v4, v2, v3 # 1.0 - v2 * est(1/v2) +vfmadd.vv v3, v4, v3 # Better estimate of 1/v2 vfmul.vv v1, v1, v3 # Estimate of v1/v2 ---- === Square root approximation example ---- -# v1 = sqrt(v1) to almost 23 bits of precision. +# v1 = sqrt(v1) to more than 23 bits of precision. 
fmv.w.x ft0, x0 # Mask off zero inputs -vmfne.vf v0, v1, ft0 # to avoid div by zero -vfrsqrt7.v v2, v1, v0.t # Estimate 1/sqrt(x) -vmfne.vf v0, v2, ft0, v0.t # Additionally mask off +inf inputs - li t0, 0x40400000 -vmv.v.x v4, t0 # Splat 3.0 -vfmul.vv v3, v1, v2, v0.t # x * est -vfnmsub.vv v3, v2, v4, v0.t # - x * est * est + 3 -vfmul.vv v3, v3, v2, v0.t # est * (-x * est * est + 3) - li t0, 0x3f000000 - fmv.w.x ft0, t0 # 0.5 -vfmul.vf v2, v3, ft0, v0.t # Estimate to 14 bits -vfmul.vv v3, v1, v2, v0.t # x * est -vfnmsub.vv v3, v2, v4, v0.t # - x * est * est + 3 -vfmul.vv v3, v3, v2, v0.t # est * (-x * est * est + 3) -vfmul.vf v2, v3, ft0, v0.t # Estimate to 23 bits -vfmul.vv v1, v2, v1, v0.t # x * 1/sqrt(x) +vmfne.vf v0, v1, ft0 # to avoid DZ exception +vfrsqrt7.v v2, v1, v0.t # Estimate r ~= 1/sqrt(v1) +vmfne.vf v0, v2, ft0, v0.t # Mask off +inf to avoid NV + li t0, 0x3f800000 + fli.s ft0, 0.5 +vmv.v.x v5, t0 # Splat 1.0 +vfmul.vv v3, v1, v2, v0.t # t = v1 r +vfmul.vf v4, v2, ft0, v0.t # 0.5 r +vfmsub.vv v3, v2, v5, v0.t # t r - 1 +vfnmsac.vv v2, v3, v4, v0.t # r - (0.5 r) (t r - 1) + # Better estimate of 1/sqrt(v1) +vfmul.vv v1, v1, v2, v0.t # t = v1 r +vfmsub.vv v2, v1, v5, v0.t # t r - 1 +vfmul.vf v3, v1, ft0, v0.t # 0.5 t +vfnmsac.vv v1, v2, v3, v0.t # t - (0.5 t) (t r - 1) + # ~ sqrt(v1) to about 23.3 bits ---- === C standard library strcmp example diff --git a/src/zabha.adoc b/src/zabha.adoc index 26529a5..a802908 100644 --- a/src/zabha.adoc +++ b/src/zabha.adoc @@ -1,4 +1,4 @@ -== "Zabha" Extension for Byte and Halfword Atomic Memory Operations, Version 1.0.0 +== "Zabha" Extension for Byte and Halfword Atomic Memory Operations, Version 1.0 The A-extension offers atomic memory operation (AMO) instructions for _words_, _doublewords_, and _quadwords_ (only for `AMOCAS`). The absence of atomic @@ -36,7 +36,7 @@ instructions. If Zacas extension is also implemented, Zabha further provides the [wavedrom, zabha-ext-wavedrom-reg,svg] .... 
-{reg: [ +{reg: [ {bits: 7, name: 'opcode', attr:['AMO','AMO','AMO','AMO','AMO','AMO','AMO','AMO']}, {bits: 5, name: 'rd', attr:['dest','dest','dest','dest','dest','dest','dest','dest']}, {bits: 3, name: 'funct3', attr:['width=0/1','width=0/1','width=0/1','width=0/1','width=0/1','width=0/1','width=0/1','width=0/1']}, @@ -45,8 +45,8 @@ instructions. If Zacas extension is also implemented, Zabha further provides the {bits: 1, name: 'rl'}, {bits: 1, name: 'aq', attr:['ordering','ordering','ordering','ordering','ordering','ordering','ordering','ordering']}, {bits: 5, name: 'funct5', attr:['AMOSWAP.B/H','AMOADD.B/H','AMOAND.B/H','AMOOR.B/H','AMOXOR.B/H','AMOMAX[U].B/H','AMOMIN[U].B/H','AMOCAS.B/H']}, -], config:{lanes: 1, hspace:1024}} -.... +], config:{lanes: 1, hspace:1024}} +.... Byte and halfword AMOs always sign-extend the value placed in `rd`, and ignore the stem:[XLEN-1:2^{(width + 3)}] bits of the original value in `rs2`. The diff --git a/src/zacas.adoc b/src/zacas.adoc index 1dd5231..49c48cd 100644 --- a/src/zacas.adoc +++ b/src/zacas.adoc @@ -23,7 +23,7 @@ The Zacas extension depends upon the Zaamo extension. === Word/Doubleword/Quadword CAS (AMOCAS.W/D/Q) Instructions [wavedrom, , svg] -.... +.... {reg: [ {bits: 7, name: 'opcode', attr:'AMO'}, {bits: 5, name: 'rd', attr:'dest'}, @@ -140,12 +140,6 @@ and neither destination register is written. The operation performed by [NOTE] ==== -For a future RV128 extension, `AMOCAS.Q` would encode a single XLEN=128 register -in `rs2` and `rd`. -==== - -[NOTE] -==== Some algorithms may load the previous data value of a memory location into the register used as the compare data value source by a Zacas instruction. When using a Zacas instruction that uses a register pair to source the compare value, @@ -195,6 +189,8 @@ produced, the memory write access by an `AMOCAS.W/D/Q` instruction. An unsuccessful `AMOCAS.W/D/Q` may either not perform a memory write or may write back the old value loaded from memory. 
The memory write, if produced, does not have release semantics, regardless of `rl`. +Irrespective of whether a write is actually performed, the instruction is +treated as an AMO for the purposes of the RVWMO PPO rules. ==== An `AMOCAS.W/D/Q` instruction always requires write permissions. @@ -259,5 +255,5 @@ indicated by `AMOCASD` level support, the `AMOCAS.Q` instruction is supported. ==== `AMOCASW/D/Q` require `AMOArithmetic` level support as the `AMOCAS.W/D/Q` instructions require ability to perform an arithmetic comparison and a swap -operation. +operation. ==== diff --git a/src/zawrs.adoc b/src/zawrs.adoc index f6e5ddc..a8bc2d7 100644 --- a/src/zawrs.adoc +++ b/src/zawrs.adoc @@ -1,37 +1,37 @@ == "Zawrs" Extension for Wait-on-Reservation-Set instructions, Version 1.01 -The Zawrs extension defines a pair of instructions to be used in polling loops -that allows a core to enter a low-power state and wait on a store to a memory -location. Waiting for a memory location to be updated is a common pattern in +The Zawrs extension defines a pair of instructions to be used in polling loops +that allows a core to enter a low-power state and wait on a store to a memory +location. Waiting for a memory location to be updated is a common pattern in many use cases such as: . Contenders for a lock waiting for the lock variable to be updated. -. Consumers waiting on the tail of an empty queue for the producer to queue +. Consumers waiting on the tail of an empty queue for the producer to queue work/data. The producer may be code executing on a RISC-V hart, an accelerator device, an external I/O agent. -. Code waiting on a flag to be set in memory indicative of an event occurring. +. Code waiting on a flag to be set in memory indicative of an event occurring. 
For example, software on a RISC-V hart may wait on a "done" flag to be set in - memory by an accelerator device indicating completion of a job previously + memory by an accelerator device indicating completion of a job previously submitted to the device. Such use cases involve polling on memory locations, and such busy loops can be a wasteful expenditure of energy. To mitigate the wasteful looping in such usages, -a `WRS.NTO` (WRS-with-no-timeout) instruction is provided. Instead of polling +a `WRS.NTO` (WRS-with-no-timeout) instruction is provided. Instead of polling for a store to a specific memory location, software registers a reservation set -that includes all the bytes of the memory location using the `LR` instruction. -Then a subsequent `WRS.NTO` instruction would cause the hart to temporarily +that includes all the bytes of the memory location using the `LR` instruction. +Then a subsequent `WRS.NTO` instruction would cause the hart to temporarily stall execution in a low-power state until a store occurs to the reservation set or an interrupt is observed. Sometimes the program waiting on a memory update may also need to carry out a task at a future time or otherwise place an upper bound on the wait. To support -such use cases a second instruction `WRS.STO` (WRS-with-short-timeout) is -provided that works like `WRS.NTO` but bounds the stall duration to an -implementation-define short timeout such that the stall is terminated on the -timeout if no other conditions have occurred to terminate the stall. The -program using this instruction may then determine if its deadline has been +such use cases a second instruction `WRS.STO` (WRS-with-short-timeout) is +provided that works like `WRS.NTO` but bounds the stall duration to an +implementation-define short timeout such that the stall is terminated on the +timeout if no other conditions have occurred to terminate the stall. The +program using this instruction may then determine if its deadline has been reached. 
[NOTE] @@ -44,8 +44,8 @@ LR instruction, which is provided by the Zalrsc component of the A extension. The `WRS.NTO` and `WRS.STO` instructions cause the hart to temporarily stall execution in a low-power state as long as the reservation set is valid and no -pending interrupts, even if disabled, are observed. For `WRS.STO` the stall -duration is bounded by an implementation defined short timeout. These +pending interrupts, even if disabled, are observed. For `WRS.STO` the stall +duration is bounded by an implementation defined short timeout. These instructions are available in all privilege modes. [wavedrom, ,svg] @@ -63,12 +63,12 @@ instructions are available in all privilege modes. Hart execution may be stalled while the following conditions are all satisfied: [loweralpha] - . The reservation set is valid + . The reservation set is valid . If `WRS.STO`, a "short" duration since start of stall has not elapsed . No pending interrupt is observed (see the rules below) -While stalled, an implementation is permitted to occasionally terminate the -stall and complete execution for any reason. +While stalled, an implementation is permitted to occasionally terminate the +stall and complete execution for any reason. `WRS.NTO` and `WRS.STO` instructions follow the rules of the `WFI` instruction for resuming execution on a pending interrupt. @@ -76,28 +76,28 @@ for resuming execution on a pending interrupt. When the `TW` (Timeout Wait) bit in `mstatus` is set and `WRS.NTO` is executed in any privilege mode other than M mode, and it does not complete within an implementation-specific bounded time limit, the `WRS.NTO` instruction will cause -an illegal instruction exception. +an illegal-instruction exception. 
-When executing in VS or VU mode, if the `VTW` bit is set in `hstatus`, the -`TW` bit in `mstatus` is clear, and the `WRS.NTO` does not complete within an +When executing in VS or VU mode, if the `VTW` bit is set in `hstatus`, the +`TW` bit in `mstatus` is clear, and the `WRS.NTO` does not complete within an implementation-specific bounded time limit, the `WRS.NTO` instruction will cause -a virtual instruction exception. +a virtual-instruction exception. [NOTE] ==== -Since the `WRS.STO` and `WRS.NTO` instructions can complete execution for -reasons other than stores to the reservation set, software will likely need +Since the `WRS.STO` and `WRS.NTO` instructions can complete execution for +reasons other than stores to the reservation set, software will likely need a means of looping until the required stores have occurred. -The duration of a `WRS.STO` instruction's timeout may vary significantly within -and among implementations. In typical implementations this duration should be -roughly in the range of 10 to 100 times an on-chip cache miss latency or a +The duration of a `WRS.STO` instruction's timeout may vary significantly within +and among implementations. In typical implementations this duration should be +roughly in the range of 10 to 100 times an on-chip cache miss latency or a cacheless access to main memory. `WRS.NTO`, unlike `WFI`, is not specified to cause an illegal instruction exception if executed in U-mode when the governing `TW` bit is 0. `WFI` is typically not expected to be used in U-mode and on many systems may promptly -cause an illegal instruction exception if used at U-mode. Unlike `WFI`, +cause an illegal-instruction exception if used at U-mode. Unlike `WFI`, `WRS.NTO` is expected to be used by software in U-mode when waiting on memory but without a deadline for that wait. 
==== diff --git a/src/zc.adoc b/src/zc.adoc index fa10416..4fd303d 100644 --- a/src/zc.adoc +++ b/src/zc.adoc @@ -12,30 +12,30 @@ Zcm* all reuse the encodings for _c.fld_, _c.fsd_, _c.fldsp_, _c.fsdsp_. |==================================================================================== |Instruction |Zca |Zcf |Zcd |Zcb |Zcmp |Zcmt 7+|*The Zca extension is added as way to refer to instructions in the C extension that do not include the floating-point loads and stores* -|C excl. c.f* |yes | | | | | +|C excl. c.f* |yes | | | | | 7+|*The Zcf extension is added as a way to refer to compressed single-precision floating-point load/stores* |c.flw | |rv32 | | | | |c.flwsp | |rv32 | | | | |c.fsw | |rv32 | | | | |c.fswsp | |rv32 | | | | 7+|*The Zcd extension is added as a way to refer to compressed double-precision floating-point load/stores* -|c.fld | | |yes | | | -|c.fldsp | | |yes | | | -|c.fsd | | |yes | | | -|c.fsdsp | | |yes | | | +|c.fld | | |yes | | | +|c.fldsp | | |yes | | | +|c.fsd | | |yes | | | +|c.fsdsp | | |yes | | | 7+|*Simple operations for use on all architectures* -|c.lbu | | | |yes | | -|c.lh | | | |yes | | -|c.lhu | | | |yes | | -|c.sb | | | |yes | | -|c.sh | | | |yes | | -|c.zext.b | | | |yes | | -|c.sext.b | | | |yes | | -|c.zext.h | | | |yes | | -|c.sext.h | | | |yes | | -|c.zext.w | | | |yes | | -|c.mul | | | |yes | | -|c.not | | | |yes | | +|c.lbu | | | |yes | | +|c.lh | | | |yes | | +|c.lhu | | | |yes | | +|c.sb | | | |yes | | +|c.sh | | | |yes | | +|c.zext.b | | | |yes | | +|c.sext.b | | | |yes | | +|c.zext.h | | | |yes | | +|c.sext.h | | | |yes | | +|c.zext.w | | | |yes | | +|c.mul | | | |yes | | +|c.not | | | |yes | | 7+|*PUSH/POP and double move which overlap with _c.fsdsp_. Complex operations intended for embedded CPUs* |cm.push | | | | |yes | |cm.pop | | | | |yes | @@ -44,8 +44,8 @@ Zcm* all reuse the encodings for _c.fld_, _c.fsd_, _c.fldsp_, _c.fsdsp_. 
|cm.mva01s | | | | |yes | |cm.mvsa01 | | | | |yes | 7+|*Table jump which overlaps with _c.fsdsp_. Complex operations intended for embedded CPUs* -|cm.jt | | | | | |yes -|cm.jalt | | | | | |yes +|cm.jt | | | | | |yes +|cm.jalt | | | | | |yes |==================================================================================== [#C] @@ -84,7 +84,7 @@ Therefore common ISA strings can be updated as follows to include the relevant Z MISA.C is set if the following extensions are selected: * Zca and not F -* Zca, Zcf and F is specified (RV32 only) +* Zca, Zcf and F (but not D) is specified (RV32 only) * Zca, Zcf and Zcd if D is specified (RV32 only) ** this configuration excludes Zcmp, Zcmt * Zca, Zcd if D is specified (RV64 only) @@ -112,7 +112,7 @@ Zcf is only relevant to RV32, it cannot be specified for RV64. The Zcf extension depends on the <<Zca>> and F extensions. [reftext="Zcd"] -=== Zcd +=== Zcd Zcd is the existing set of compressed double precision floating point loads and stores: _c.fld_, _c.fldsp_, _c.fsd_, _c.fsdsp_. @@ -131,7 +131,7 @@ The Zcb extension depends on the <<Zca>> extension. As shown on the individual instruction pages, many of the instructions in Zcb depend upon another extension being implemented. For example, _c.mul_ is only implemented if M or Zmmul is implemented, and _c.sext.b_ is only implemented if Zbb is implemented. -The _c.mul_ encoding uses the CA register format along with other instructions such as _c.sub_, _c.xor_ etc. +The _c.mul_ encoding uses the CA register format along with other instructions such as _c.sub_, _c.xor_ etc. 
[NOTE] @@ -144,69 +144,69 @@ The _c.mul_ encoding uses the CA register format along with other instructions s |Mnemonic |Instruction -|yes -|yes +|yes +|yes |c.lbu _rd'_, uimm(_rs1'_) |<<#insns-c_lbu>> -|yes -|yes +|yes +|yes |c.lhu _rd'_, uimm(_rs1'_) |<<#insns-c_lhu>> -|yes -|yes +|yes +|yes |c.lh _rd'_, uimm(_rs1'_) |<<#insns-c_lh>> -|yes -|yes +|yes +|yes |c.sb _rs2'_, uimm(_rs1'_) |<<#insns-c_sb>> -|yes -|yes +|yes +|yes |c.sh _rs2'_, uimm(_rs1'_) |<<#insns-c_sh>> -|yes -|yes +|yes +|yes |c.zext.b _rsd'_ -|<<#insns-c_zext_b>> +|<<#insns-c_zext_b>> -|yes -|yes +|yes +|yes |c.sext.b _rsd'_ -|<<#insns-c_sext_b>> +|<<#insns-c_sext_b>> -|yes -|yes +|yes +|yes |c.zext.h _rsd'_ -|<<#insns-c_zext_h>> +|<<#insns-c_zext_h>> -|yes -|yes +|yes +|yes |c.sext.h _rsd'_ -|<<#insns-c_sext_h>> +|<<#insns-c_sext_h>> | -|yes +|yes |c.zext.w _rsd'_ -|<<#insns-c_zext_w>> +|<<#insns-c_zext_w>> -|yes -|yes +|yes +|yes |c.not _rsd'_ -|<<#insns-c_not>> +|<<#insns-c_not>> -|yes -|yes +|yes +|yes |c.mul _rsd'_, _rs2'_ -|<<#insns-c_mul>> +|<<#insns-c_mul>> |=== -<<< +<<< [#Zcmp] === Zcmp @@ -214,7 +214,7 @@ The _c.mul_ encoding uses the CA register format along with other instructions s The Zcmp extension is a set of instructions which may be executed as a series of existing 32-bit RISC-V instructions. This extension reuses some encodings from _c.fsdsp_. Therefore it is _incompatible_ with <<Zcd>>, - which is included when C and D extensions are both present. + which is included when C and D extensions are both present. NOTE: Zcmp is primarily targeted at embedded class CPUs due to implementation complexity. Additionally, it is not compatible with architecture class profiles. 
@@ -225,7 +225,7 @@ The PUSH/POP assembly syntax uses several variables, the meaning of which are: * _reg_list_ is a list containing 1 to 13 registers (ra and 0 to 12 s registers) ** valid values: {ra}, {ra, s0}, {ra, s0-s1}, {ra, s0-s2}, ..., {ra, s0-s8}, {ra, s0-s9}, {ra, s0-s11} ** note that {ra, s0-s10} is _not_ valid, giving 12 lists not 13 for better encoding -* _stack_adj_ is the total size of the stack frame. +* _stack_adj_ is the total size of the stack frame. ** valid values vary with register list length and the specific encoding, see the instruction pages for details. [%header,cols="^1,^1,4,8"] @@ -235,35 +235,35 @@ The PUSH/POP assembly syntax uses several variables, the meaning of which are: |Mnemonic |Instruction -|yes -|yes +|yes +|yes |cm.push _{reg_list}, -stack_adj_ -|<<#insns-cm_push>> +|<<#insns-cm_push>> -|yes -|yes +|yes +|yes |cm.pop _{reg_list}, stack_adj_ -|<<#insns-cm_pop>> +|<<#insns-cm_pop>> -|yes -|yes +|yes +|yes |cm.popret _{reg_list}, stack_adj_ -|<<#insns-cm_popret>> +|<<#insns-cm_popret>> -|yes -|yes +|yes +|yes |cm.popretz _{reg_list}, stack_adj_ -|<<#insns-cm_popretz>> +|<<#insns-cm_popretz>> -|yes -|yes +|yes +|yes |cm.mva01s _rs1', rs2'_ -|<<#insns-cm_mva01s>> +|<<#insns-cm_mva01s>> -|yes -|yes +|yes +|yes |cm.mvsa01 _r1s', r2s'_ -|<<#insns-cm_mvsa01>> +|<<#insns-cm_mvsa01>> |=== @@ -272,11 +272,11 @@ The PUSH/POP assembly syntax uses several variables, the meaning of which are: [#Zcmt] === Zcmt -Zcmt adds the table jump instructions and also adds the jvt CSR. The jvt CSR requires a +Zcmt adds the table jump instructions and also adds the jvt CSR. The jvt CSR requires a state enable if Smstateen is implemented. See <<csrs-jvt>> for details. This extension reuses some encodings from _c.fsdsp_. Therefore it is _incompatible_ with <<Zcd>>, - which is included when C and D extensions are both present. + which is included when C and D extensions are both present. 
NOTE: Zcmt is primarily targeted at embedded class CPUs due to implementation complexity. Additionally, it is not compatible with RVA profiles. @@ -289,15 +289,15 @@ The Zcmt extension depends on the <<Zca>> and Zicsr extensions. |Mnemonic |Instruction -|yes -|yes +|yes +|yes |cm.jt _index_ -|<<#insns-cm_jt>> +|<<#insns-cm_jt>> -|yes -|yes +|yes +|yes |cm.jalt _index_ -|<<#insns-cm_jalt>> +|<<#insns-cm_jalt>> |=== @@ -365,7 +365,7 @@ The immediate offset is formed as follows: Description: -This instruction loads a byte from the memory address formed by adding _rs1'_ to the zero extended immediate _uimm_. The resulting byte is zero extended to XLEN bits and is written to _rd'_. +This instruction loads a byte from the memory address formed by adding _rs1'_ to the zero extended immediate _uimm_. The resulting byte is zero extended to XLEN bits and is written to _rd'_. [NOTE] ==== @@ -425,7 +425,7 @@ The immediate offset is formed as follows: Description: -This instruction loads a halfword from the memory address formed by adding _rs1'_ to the zero extended immediate _uimm_. The resulting halfword is zero extended to XLEN bits and is written to _rd'_. +This instruction loads a halfword from the memory address formed by adding _rs1'_ to the zero extended immediate _uimm_. The resulting halfword is zero extended to XLEN bits and is written to _rd'_. [NOTE] ==== @@ -486,7 +486,7 @@ The immediate offset is formed as follows: Description: -This instruction loads a halfword from the memory address formed by adding _rs1'_ to the zero extended immediate _uimm_. The resulting halfword is sign extended to XLEN bits and is written to _rd'_. +This instruction loads a halfword from the memory address formed by adding _rs1'_ to the zero extended immediate _uimm_. The resulting halfword is sign extended to XLEN bits and is written to _rd'_. 
[NOTE] ==== @@ -546,7 +546,7 @@ The immediate offset is formed as follows: Description: -This instruction stores the least significant byte of _rs2'_ to the memory address formed by adding _rs1'_ to the zero extended immediate _uimm_. +This instruction stores the least significant byte of _rs2'_ to the memory address formed by adding _rs1'_ to the zero extended immediate _uimm_. [NOTE] ==== @@ -608,7 +608,7 @@ The immediate offset is formed as follows: Description: -This instruction stores the least significant halfword of _rs2'_ to the memory address formed by adding _rs1'_ to the zero extended immediate _uimm_. +This instruction stores the least significant halfword of _rs2'_ to the memory address formed by adding _rs1'_ to the zero extended immediate _uimm_. [NOTE] ==== @@ -659,13 +659,13 @@ Encoding (RV32, RV64): Description: -This instruction takes a single source/destination operand. +This instruction takes a single source/destination operand. It zero-extends the least-significant byte of the operand to XLEN bits by inserting zeros into all of the bits more significant than 7. [NOTE] ==== -_rd'/rs1'_ is from the standard 8-register set x8-x15. +_rd'/rs1'_ is from the standard 8-register set x8-x15. ==== Prerequisites: @@ -719,7 +719,7 @@ Encoding (RV32, RV64): Description: -This instruction takes a single source/destination operand. +This instruction takes a single source/destination operand. It sign-extends the least-significant byte in the operand to XLEN bits by copying the most-significant bit in the byte (i.e., bit 7) to all of the more-significant bits. @@ -775,7 +775,7 @@ Encoding (RV32, RV64): Description: -This instruction takes a single source/destination operand. +This instruction takes a single source/destination operand. It zero-extends the least-significant halfword of the operand to XLEN bits by inserting zeros into all of the bits more significant than 15. 
@@ -832,7 +832,7 @@ Encoding (RV32, RV64): Description: -This instruction takes a single source/destination operand. +This instruction takes a single source/destination operand. It sign-extends the least-significant halfword in the operand to XLEN bits by copying the most-significant bit in the halfword (i.e., bit 15) to all of the more-significant bits. @@ -889,7 +889,7 @@ Encoding (RV64): Description: -This instruction takes a single source/destination operand. +This instruction takes a single source/destination operand. It zero-extends the least-significant word of the operand to XLEN bits by inserting zeros into all of the bits more significant than 31. @@ -901,7 +901,7 @@ _rd'/rs1'_ is from the standard 8-register set x8-x15. Prerequisites: Zba is also required. - + 32-bit equivalent: [source,sail] @@ -1040,12 +1040,12 @@ X(rsdc) = result_wide[(sizeof(xlen) - 1) .. 0]; [#insns-pushpop,reftext="PUSH/POP Register Instructions"] === PUSH/POP register instructions -These instructions are collectively referred to as PUSH/POP: +These instructions are collectively referred to as PUSH/POP: -* <<#insns-cm_push>> -* <<#insns-cm_pop>> -* <<#insns-cm_popret>> -* <<#insns-cm_popretz>> +* <<#insns-cm_push>> +* <<#insns-cm_pop>> +* <<#insns-cm_popret>> +* <<#insns-cm_popretz>> The term PUSH refers to _cm.push_. @@ -1144,11 +1144,11 @@ with PUSH/POPRET this reduces to 10866: bcfa cm.popretz {ra,s0-s11}, 96 ---- -The prologue / epilogue reduce from 60-bytes in the original code, to 14-bytes with _-msave-restore_, -and to 4-bytes with PUSH and POPRET. -As well as reducing the code-size PUSH and POPRET eliminate the branches from -calling the millicode _save/restore_ routines and so may also perform better. - +The prologue / epilogue reduce from 60-bytes in the original code, to 14-bytes with _-msave-restore_, +and to 4-bytes with PUSH and POPRET. 
+As well as reducing the code-size PUSH and POPRET eliminate the branches from +calling the millicode _save/restore_ routines and so may also perform better. + [NOTE] ==== The calls to _<riscv_save_0>/<riscv_restore_0>_ become 64-bit when the target functions are out of the ±1MB range, increasing the prologue/epilogue size to 22-bytes. @@ -1163,7 +1163,7 @@ POP is typically used in tail-calling sequences where _ret_ is not used to retur ===== Stack pointer adjustment handling -The instructions all automatically adjust the stack pointer by enough to cover the memory required for the registers being saved or restored. +The instructions all automatically adjust the stack pointer by enough to cover the memory required for the registers being saved or restored. Additionally the _spimm_ field in the encoding allows the stack pointer to be adjusted in additional increments of 16-bytes. There is only a small restricted range available in the encoding; if the range is insufficient then a separate _c.addi16sp_ can be used to increase the range. @@ -1174,8 +1174,8 @@ There is no support for the _{ra, s0-s10}_ register list without also adding _s1 [#pushpop-idempotent-memory] ==== PUSH/POP Fault handling -Correct execution requires that _sp_ refers to idempotent memory (also see <<pushpop_non-idem-mem>>), because the core must be able to -handle traps detected during the sequence. +Correct execution requires that _sp_ refers to idempotent memory (also see <<pushpop_non-idem-mem>>), because the core must be able to +handle traps detected during the sequence. The entire PUSH/POP sequence is re-executed after returning from the trap handler, and multiple traps are possible during the sequence. If a trap occurs during the sequence then _xEPC_ is updated with the PC of the instruction, _xTVAL_ (if not read-only-zero) updated with the bad address if it was an access fault and _xCAUSE_ updated with the type of trap. 
@@ -1202,8 +1202,8 @@ If an implementation allows interrupts during the sequence, and the interrupt ha The stack pointer adjustment must only be committed only when it is certain that the entire PUSH instruction will commit. -Stores may also return imprecise faults from the bus. -It is platform defined whether the core implementation waits for the bus responses before continuing to the final stage of the sequence, +Stores may also return imprecise faults from the bus. +It is platform defined whether the core implementation waits for the bus responses before continuing to the final stage of the sequence, or handles errors responses after completing the PUSH instruction. <<< @@ -1219,19 +1219,19 @@ Appears to software as: [source,sail] ---- -# any bytes from sp-1 to sp-28 may be written multiple times before -# the instruction completes therefore these updates may be visible in +# any bytes from sp-1 to sp-28 may be written multiple times before +# the instruction completes therefore these updates may be visible in # the interrupt/exception handler below the stack pointer -sw s5, -4(sp) -sw s4, -8(sp) -sw s3,-12(sp) -sw s2,-16(sp) -sw s1,-20(sp) -sw s0,-24(sp) -sw ra,-28(sp) +sw s5, -4(sp) +sw s4, -8(sp) +sw s3,-12(sp) +sw s2,-16(sp) +sw s1,-20(sp) +sw s0,-24(sp) +sw ra,-28(sp) # this must only execute once, and will only execute after all stores -# completed without any precise faults, therefore this update is only +# completed without any precise faults, therefore this update is only # visible in the interrupt/exception handler if cm.push has completed addi sp, sp, -64 ---- @@ -1248,8 +1248,8 @@ From a software perspective the POP/POPRET sequence appears as: * An optional `li a0, 0` * An optional `ret` -If a trap occurs during the sequence, then any loads which were executed before the trap may update architectural state. -The loads will be re-executed once the trap handler completes, so the values will be overwritten. 
+If a trap occurs during the sequence, then any loads which were executed before the trap may update architectural state. +The loads will be re-executed once the trap handler completes, so the values will be overwritten. Therefore it is permitted for an implementation to update some of the destination registers before taking a fault. The optional `li a0, 0`, stack pointer adjustment and optional `ret` must only be committed only when it is certain that the entire POP/POPRET instruction will commit. @@ -1276,7 +1276,7 @@ lw s1, 20(sp) lw s0, 16(sp) lw ra, 12(sp) -# these must only execute once, will only execute after all loads +# these must only execute once, will only execute after all loads # complete successfully all instructions must execute atomically # therefore these updates are not visible in the interrupt/exception handler li a0, 0 @@ -1287,10 +1287,10 @@ ret [[pushpop_non-idem-mem,Non-idempotent memory handling]] ==== Non-idempotent memory handling -An implementation may have a requirement to issue a PUSH/POP instruction to non-idempotent memory. +An implementation may have a requirement to issue a PUSH/POP instruction to non-idempotent memory. -If the core implementation does not support PUSH/POP to non-idempotent memories, the core may use an idempotency PMA to detect it and take a -load (POP/POPRET) or store (PUSH) access fault exception in order to avoid unpredictable results. +If the core implementation does not support PUSH/POP to non-idempotent memories, the core may use an idempotency PMA to detect it and take a +load (POP/POPRET) or store (PUSH) access-fault exception in order to avoid unpredictable results. Software should only use these instructions on non-idempotent memory regions when software can tolerate the required memory accesses being issued repeatedly in the case that they cause exceptions. @@ -1299,7 +1299,7 @@ being issued repeatedly in the case that they cause exceptions. 
==== Example RV32I PUSH/POP sequences -The examples are included show the load/store series expansion and the stack adjustment. +The examples are included show the load/store series expansion and the stack adjustment. Examples of _cm.popret_ and _cm.popretz_ are not included, as the difference in the expanded sequence from _cm.pop_ is trivial in all cases. ===== cm.push {ra, s0-s2}, -64 @@ -1375,7 +1375,7 @@ addi sp, sp, 48; Encoding: _rlist_=9, _spimm_=2 -expands to: +expands to: [source,sail] ---- @@ -1406,11 +1406,11 @@ Encoding (RV32, RV64): [wavedrom, , svg] .... {reg:[ - { bits: 2, name: 0x2, attr: ['C2'] }, - { bits: 2, name: 'spimm\[5:4\]', attr: [] }, - { bits: 4, name: 'rlist', attr: [] }, - { bits: 5, name: 0x18, attr: [] }, - { bits: 3, name: 0x5, attr: ['FUNCT3'] }, + { bits: 2, name: 0x2, attr: ['C2'] }, + { bits: 2, name: 'spimm', attr: [] }, + { bits: 4, name: 'rlist', attr: [] }, + { bits: 5, name: 0x18, attr: [] }, + { bits: 3, name: 0x5, attr: ['FUNCT3'] }, ],config:{bits:16}} .... @@ -1439,7 +1439,7 @@ switch (rlist){ case 6: {reg_list="ra, s0-s1"; xreg_list="x1, x8-x9";} default: reserved(); } -stack_adj = stack_adj_base + spimm[5:4] * 16; +stack_adj = stack_adj_base + spimm * 16; ---- [source,sail] @@ -1462,7 +1462,7 @@ switch (rlist){ case 15: {reg_list="ra, s0-s11"; xreg_list="x1, x8-x9, x18-x27";} default: reserved(); } -stack_adj = stack_adj_base + spimm[5:4] * 16; +stack_adj = stack_adj_base + spimm * 16; ---- [source,sail] @@ -1523,8 +1523,8 @@ switch (rlist) { <<< Description: -This instruction pushes (stores) the registers in _reg_list_ to the memory below the stack pointer, -and then creates the stack frame by decrementing the stack pointer by _stack_adj_, +This instruction pushes (stores) the registers in _reg_list_ to the memory below the stack pointer, +and then creates the stack frame by decrementing the stack pointer by _stack_adj_, including any additional stack space requested by the value of _spimm_. 
@@ -1537,11 +1537,11 @@ For further information see <<insns-pushpop>>. Stack Adjustment Calculation: -_stack_adj_base_ is the minimum number of bytes, in multiples of 16-byte address increments, required to cover the registers in the list. +_stack_adj_base_ is the minimum number of bytes, in multiples of 16-byte address increments, required to cover the registers in the list. _spimm_ is the number of additional 16-byte address increments allocated for the stack frame. -The total stack adjustment represents the total size of the stack frame, which is _stack_adj_base_ added to _spimm_ scaled by 16, +The total stack adjustment represents the total size of the stack frame, which is _stack_adj_base_ added to _spimm_ scaled by 16, as defined above. Prerequisites: @@ -1601,11 +1601,11 @@ Encoding (RV32, RV64): [wavedrom, , svg] .... {reg:[ - { bits: 2, name: 0x2, attr: ['C2'] }, - { bits: 2, name: 'spimm\[5:4\]', attr: [] }, - { bits: 4, name: 'rlist', attr: [] }, - { bits: 5, name: 0x1a, attr: [] }, - { bits: 3, name: 0x5, attr: ['FUNCT3'] }, + { bits: 2, name: 0x2, attr: ['C2'] }, + { bits: 2, name: 'spimm', attr: [] }, + { bits: 4, name: 'rlist', attr: [] }, + { bits: 5, name: 0x1a, attr: [] }, + { bits: 3, name: 0x5, attr: ['FUNCT3'] }, ],config:{bits:16}} .... @@ -1634,7 +1634,7 @@ switch (rlist){ case 6: {reg_list="ra, s0-s1"; xreg_list="x1, x8-x9";} default: reserved(); } -stack_adj = stack_adj_base + spimm[5:4] * 16; +stack_adj = stack_adj_base + spimm * 16; ---- [source,sail] @@ -1657,7 +1657,7 @@ switch (rlist){ case 15: {reg_list="ra, s0-s11"; xreg_list="x1, x8-x9, x18-x27";} default: reserved(); } -stack_adj = stack_adj_base + spimm[5:4] * 16; +stack_adj = stack_adj_base + spimm * 16; ---- [source,sail] @@ -1719,8 +1719,8 @@ switch (rlist) { Description: -This instruction pops (loads) the registers in _reg_list_ from stack memory, -and then adjusts the stack pointer by _stack_adj_. 
+This instruction pops (loads) the registers in _reg_list_ from stack memory, +and then adjusts the stack pointer by _stack_adj_. [NOTE] ==== @@ -1731,11 +1731,11 @@ For further information see <<insns-pushpop>>. Stack Adjustment Calculation: -_stack_adj_base_ is the minimum number of bytes, in multiples of 16-byte address increments, required to cover the registers in the list. +_stack_adj_base_ is the minimum number of bytes, in multiples of 16-byte address increments, required to cover the registers in the list. _spimm_ is the number of additional 16-byte address increments allocated for the stack frame. -The total stack adjustment represents the total size of the stack frame, which is _stack_adj_base_ added to _spimm_ scaled by 16, +The total stack adjustment represents the total size of the stack frame, which is _stack_adj_base_ added to _spimm_ scaled by 16, as defined above. Prerequisites: @@ -1826,7 +1826,7 @@ switch (rlist){ case 6: {reg_list="ra, s0-s1"; xreg_list="x1, x8-x9";} default: reserved(); } -stack_adj = stack_adj_base + spimm[5:4] * 16; +stack_adj = stack_adj_base + spimm * 16; ---- [source,sail] @@ -1849,7 +1849,7 @@ switch (rlist){ case 15: {reg_list="ra, s0-s11"; xreg_list="x1, x8-x9, x18-x27";} default: reserved(); } -stack_adj = stack_adj_base + spimm[5:4] * 16; +stack_adj = stack_adj_base + spimm * 16; ---- [source,sail] @@ -1922,7 +1922,7 @@ For further information see <<insns-pushpop>>. Stack Adjustment Calculation: -_stack_adj_base_ is the minimum number of bytes, in multiples of 16-byte address increments, required to cover the registers in the list. +_stack_adj_base_ is the minimum number of bytes, in multiples of 16-byte address increments, required to cover the registers in the list. _spimm_ is the number of additional 16-byte address increments allocated for the stack frame. @@ -1993,11 +1993,11 @@ Encoding (RV32, RV64): [wavedrom, , svg] .... 
{reg:[ - { bits: 2, name: 0x2, attr: ['C2'] }, - { bits: 2, name: 'spimm\[5:4\]', attr: [] }, - { bits: 4, name: 'rlist', attr: [] }, - { bits: 5, name: 0x1e, attr: [] }, - { bits: 3, name: 0x5, attr: ['FUNCT3'] }, + { bits: 2, name: 0x2, attr: ['C2'] }, + { bits: 2, name: 'spimm', attr: [] }, + { bits: 4, name: 'rlist', attr: [] }, + { bits: 5, name: 0x1e, attr: [] }, + { bits: 3, name: 0x5, attr: ['FUNCT3'] }, ],config:{bits:16}} .... @@ -2026,7 +2026,7 @@ switch (rlist){ case 6: {reg_list="ra, s0-s1"; xreg_list="x1, x8-x9";} default: reserved(); } -stack_adj = stack_adj_base + spimm[5:4] * 16; +stack_adj = stack_adj_base + spimm * 16; ---- [source,sail] @@ -2049,7 +2049,7 @@ switch (rlist){ case 15: {reg_list="ra, s0-s11"; xreg_list="x1, x8-x9, x18-x27";} default: reserved(); } -stack_adj = stack_adj_base + spimm[5:4] * 16; +stack_adj = stack_adj_base + spimm * 16; ---- [source,sail] @@ -2122,7 +2122,7 @@ For further information see <<insns-pushpop>>. Stack Adjustment Calculation: -_stack_adj_base_ is the minimum number of bytes, in multiples of 16-byte address increments, required to cover the registers in the list. +_stack_adj_base_ is the minimum number of bytes, in multiples of 16-byte address increments, required to cover the registers in the list. _spimm_ is the number of additional 16-byte address increments allocated for the stack frame. @@ -2212,7 +2212,7 @@ Description: This instruction moves _a0_ into _r1s'_ and _a1_ into _r2s'_. _r1s'_ and _r2s'_ must be different. The execution is atomic, so it is not possible to observe state where only one of _r1s'_ or _r2s'_ has been updated. -The encoding uses _sreg_ number specifiers instead of _xreg_ number specifiers to save encoding space. +The encoding uses _sreg_ number specifiers instead of _xreg_ number specifiers to save encoding space. The mapping between them is specified in the pseudocode below. 
[NOTE] @@ -2277,10 +2277,10 @@ cm.mva01s r1s', r2s' ---- Description: -This instruction moves _r1s'_ into _a0_ and _r2s'_ into _a1_. +This instruction moves _r1s'_ into _a0_ and _r2s'_ into _a1_. The execution is atomic, so it is not possible to observe state where only one of _a0_ or _a1_ have been updated. -The encoding uses _sreg_ number specifiers instead of _xreg_ number specifiers to save encoding space. +The encoding uses _sreg_ number specifiers instead of _xreg_ number specifiers to save encoding space. The mapping between them is specified in the pseudocode below. [NOTE] @@ -2328,10 +2328,10 @@ This is used as a form of dictionary compression to reduce the code size of _jal Table jump allows the linker to replace the following instruction sequences with a _cm.jt_ or _cm.jalt_ encoding, and an entry in the table: -* 32-bit _j_ calls -* 32-bit _jal_ ra calls -* 64-bit _auipc+jr_ calls to fixed locations -* 64-bit _auipc+jalr ra_ calls to fixed locations +* 32-bit _j_ calls +* 32-bit _jal_ ra calls +* 64-bit _auipc+jr_ calls to fixed locations +* 64-bit _auipc+jalr ra_ calls to fixed locations ** The _auipc+jr/jalr_ sequence is used because the offset from the PC is out of the ±1MB range. If a return address stack is implemented, then as _cm.jalt_ is equivalent to _jal ra_, it pushes to the stack. @@ -2340,20 +2340,20 @@ If a return address stack is implemented, then as _cm.jalt_ is equivalent to _ja The base of the table is in the jvt CSR (see <<csrs-jvt>>), each table entry is XLEN bits. -If the same function is called with and without linking then it must have two entries in the table. +If the same function is called with and without linking then it must have two entries in the table. This is typically caused by the same function being called with and without tail calling. 
[#tablejump-fault-handling] ==== Table Jump Fault handling For a table jump instruction, the table entry that the instruction selects is considered an extension of the instruction itself. -Hence, the execution of a table jump instruction involves two instruction fetches, the first to read the instruction (_cm.jt_/_cm.jalt_) +Hence, the execution of a table jump instruction involves two instruction fetches, the first to read the instruction (_cm.jt_/_cm.jalt_) and the second to read from the jump vector table (JVT). Both instruction fetches are _implicit_ reads, and both require execute permission; read permission is irrelevant. It is recommended that the second fetch be ignored for hardware triggers and breakpoints. Memory writes to the jump vector table require an instruction barrier (_fence.i_) to guarantee that they are visible to the instruction fetch. -Multiple contexts may have different jump vector tables. JVT may be switched between them without an instruction barrier +Multiple contexts may have different jump vector tables. JVT may be switched between them without an instruction barrier if the tables have not been updated in memory since the last _fence.i_. If an exception occurs on either instruction fetch, xEPC is set to the PC of the table jump instruction, xCAUSE is set as expected for the type of fault and xTVAL (if not set to zero) contains the fetch address which caused the fault. @@ -2368,7 +2368,7 @@ Table jump base vector and control register Address: -0x0017 +0x017 Permissions: @@ -2399,10 +2399,11 @@ Description: The _jvt_ register is an XLEN-bit *WARL* read/write register that holds the jump table configuration, consisting of the jump table base address (BASE) and the jump table mode (MODE). If <<Zcmt>> is implemented then _jvt_ must also be implemented, but can contain a read-only value. If _jvt_ is writable, the set of values the register may hold can vary by implementation. 
The value in the BASE field must always be aligned on a 64-byte boundary. +Note that the CSR contains only bits XLEN-1 through 6 of the address _base_. When computing jump-table accesses, the lower six bits of _base_ are filled with zeroes to obtain an XLEN-bit jump-table base address _jvt.base_ that is always aligned on a 64-byte boundary. _jvt.base_ is a virtual address, whenever virtual memory is enabled. -The memory pointed to by _jvt.base_ is treated as instruction memory for the purpose of executing table jump instructions, implying execute access permission. +The memory pointed to by _jvt.base_ is treated as instruction memory for the purpose of executing table jump instructions, implying execute access permission. [#JVT-config-table] ._jvt.mode_ definition @@ -2413,7 +2414,7 @@ The memory pointed to by _jvt.base_ is treated as instruction memory for the pur | others | *reserved for future standard use* |============================================================================================= -_jvt.mode_ is a *WARL* field, so can only be programmed to modes which are implemented. Therefore the discovery mechanism is to +_jvt.mode_ is a *WARL* field, so can only be programmed to modes which are implemented. Therefore the discovery mechanism is to attempt to program different modes and read back the values to see which are available. Jump table mode _must_ be implemented. [NOTE] @@ -2423,11 +2424,11 @@ in future the RISC-V Unified Discovery method will report the available modes. Architectural State: -_jvt_ CSR adds architectural state to the system software context (such as an OS process), therefore must be saved/restored on context switches. +_jvt_ CSR adds architectural state to the system software context (such as an OS process), therefore must be saved/restored on context switches. State Enable: -If the Smstateen extension is implemented, then bit 2 in _mstateen0_, _sstateen0_, and _hstateen0_ is implemented. 
If bit 2 of a controlling _stateen0_ CSR is zero, then access to the _jvt_ CSR and execution of a _cm.jalt_ or _cm.jt_ instruction by a lower privilege level results in an Illegal Instruction trap (or, if appropriate, a Virtual Instruction trap). +If the Smstateen extension is implemented, then bit 2 in _mstateen0_, _sstateen0_, and _hstateen0_ is implemented. If bit 2 of a controlling _stateen0_ CSR is zero, then access to the _jvt_ CSR and execution of a _cm.jalt_ or _cm.jt_ instruction by a lower privilege level results in an illegal-instruction trap (or, if appropriate, a virtual-instruction trap). <<< [#insns-cm_jt,reftext="Jump via table"] @@ -2586,6 +2587,3 @@ target_address[XLEN-1:0] = InstMemory[table_address][XLEN-1:0]; jal ra, target_address[XLEN-1:0]&~0x1; ---- - - - diff --git a/src/zfa.adoc b/src/zfa.adoc index 942aeef..20223d8 100644 --- a/src/zfa.adoc +++ b/src/zfa.adoc @@ -57,13 +57,13 @@ like FMV.W.X, but with _rs2_=1. |31 |_Canonical NaN_ |`0` |`11111111` |`100...000` |=== -[TIP] +[NOTE] ==== The preferred assembly syntax for entries 1, 30, and 31 is `min`, `inf`, and `nan`, respectively. For entries 0 through 29 (including entry 1), the assembler will accept decimal constants in C-like syntax. ==== -[TIP] +[NOTE] ==== The set of 32 constants was chosen by examining floating-point libraries, including the C standard math library, and to optimize @@ -170,7 +170,7 @@ FCVT.W.D with the same input operand. This instruction is only provided if the D extension is implemented. It is encoded like FCVT.W.D, but with the rs2 field set to 8 and the _rm_ field set to 1 (RTZ). Other _rm_ values are _reserved_. -[TIP] +[NOTE] ==== The assembly syntax requires the RTZ rounding mode to be explicitly specified, i.e., `fcvtmod.w.d rd, rs1, rtz`. diff --git a/src/zfh.adoc b/src/zfh.adoc index ab30e3d..e363a1c 100644 --- a/src/zfh.adoc +++ b/src/zfh.adoc @@ -27,7 +27,7 @@ halflatexmath:[$+$]singlelatexmath:[$\rightarrow$]half. 
New 16-bit variants of LOAD-FP and STORE-FP instructions are added, encoded with a new value for the funct3 width field. -include::images/wavedrom/sp-load-store.adoc[] +include::images/wavedrom/sp-load-store.edn[] [[sp-load-store]] //.Half-precision load and store instructions @@ -58,9 +58,9 @@ The half-precision floating-point computational instructions are defined analogously to their single-precision counterparts, but operate on half-precision operands and produce half-precision results. -include::images/wavedrom/spfloat-zfh.adoc[] +include::images/wavedrom/spfloat-zfh.edn[] -include::images/wavedrom/spfloat2-zfh.adoc[] +include::images/wavedrom/spfloat2-zfh.edn[] === Half-Precision Conversion and Move Instructions @@ -75,7 +75,7 @@ FCVT.WU.H, FCVT.LU.H, FCVT.H.WU, and FCVT.H.LU variants convert to or from unsigned integer values. FCVT.L[U].H and FCVT.H.L[U] are RV64-only instructions. -include::images/wavedrom/half-prec-conv-and-mv.adoc[] +include::images/wavedrom/half-prec-conv-and-mv.edn[] [[half-prec-conv-and-mv]] New floating-point-to-floating-point conversion instructions are added. @@ -90,14 +90,14 @@ is present, FCVT.Q.H or FCVT.H.Q converts a half-precision floating-point number to a quad-precision floating-point number, or vice-versa, respectively. -include::images/wavedrom/half-prec-flpt-to-flpt-conv.adoc[] +include::images/wavedrom/half-prec-flpt-to-flpt-conv.edn[] [[half-prec-flpt-to-flpt-conv]] Floating-point to floating-point sign-injection instructions, FSGNJ.H, FSGNJN.H, and FSGNJX.H are defined analogously to the single-precision sign-injection instruction. -include::images/wavedrom/flt-to-flt-sgn-inj-instr.adoc[] +include::images/wavedrom/flt-to-flt-sgn-inj-instr.edn[] [[flt-to-flt-sgn-inj-instr]] Instructions are provided to move bit patterns between the @@ -113,7 +113,7 @@ floating-point register _rd_, NaN-boxing the result. 
FMV.X.H and FMV.H.X do not modify the bits being transferred; in particular, the payloads of non-canonical NaNs are preserved. -include::images/wavedrom/flt-pt-to-int-move.adoc[] +include::images/wavedrom/flt-pt-to-int-move.edn[] [[flt-pt-to-int-move]] === Half-Precision Floating-Point Compare Instructions @@ -122,7 +122,7 @@ The half-precision floating-point compare instructions are defined analogously to their single-precision counterparts, but operate on half-precision operands. -include::images/wavedrom/half-pr-flt-pt-compare.adoc[] +include::images/wavedrom/half-pr-flt-pt-compare.edn[] [[half-pr-flt-pt-compare]] === Half-Precision Floating-Point Classify Instruction @@ -131,7 +131,7 @@ The half-precision floating-point classify instruction, FCLASS.H, is defined analogously to its single-precision counterpart, but operates on half-precision operands. -include::images/wavedrom/half-pr-flt-pt-class.adoc[] +include::images/wavedrom/half-pr-flt-pt-class.edn[] [[half-pr-flt-class]] === "Zfhmin" Standard Extension for Minimal Half-Precision Floating-Point diff --git a/src/zfinx.adoc b/src/zfinx.adoc index 035222d..2b4119d 100644 --- a/src/zfinx.adoc +++ b/src/zfinx.adoc @@ -13,7 +13,7 @@ floating-point precisions. ==== The F extension uses separate `f` registers for floating-point computation, to reduce register pressure and simplify the provision of -register-file ports for wide superscalars. However, the additional of +register-file ports for wide superscalars. However, the additional architectural state increases the minimal implementation cost. By eliminating the `f` registers, the Zfinx extension substantially reduces the cost of simple RISC-V implementations with floating-point @@ -64,7 +64,7 @@ registers is compatible with the existing RV64 calling conventions, which leave === Zdinx The Zdinx extension provides analogous double-precision floating-point -instructions. The Zdinx extension requires the Zfinx extension. +instructions. 
The Zdinx extension depends upon the Zfinx extension. The Zdinx extension adds all of the instructions that the D extension adds, _except_ for the transfer instructions FLD, FSD, FMV.D.X, FMV.X.D, @@ -97,15 +97,16 @@ operand is zero—i.e., `x1` is not accessed. [NOTE] ==== -Load-pair and store-pair instructions are not provided, so transferring -double-precision operands in RV32Zdinx from or to memory requires two -loads or stores. Register moves need only a single FSGNJ.D instruction, -however. +Load-pair and store-pair instructions are contained in a separate extension +(see Section <<sec:zilsd,Extensions for Load/Store pair for RV32>>). +In case this is not available, transferring double-precision operands in +RV32Zdinx from or to memory requires two loads or stores. Register moves need +only a single FSGNJ.D instruction, however. ==== === Zhinx The Zhinx extension provides analogous half-precision floating-point -instructions. The Zhinx extension requires the Zfinx extension. +instructions. The Zhinx extension depends upon the Zfinx extension. The Zhinx extension adds all of the instructions that the Zfh extension adds, _except_ for the transfer instructions FLH, FSH, FMV.H.X, and @@ -120,7 +121,7 @@ number. The Zhinxmin extension provides minimal support for 16-bit half-precision floating-point instructions that operate on the `x` -registers. The Zhinxmin extension requires the Zfinx extension. +registers. The Zhinxmin extension depends upon the Zfinx extension. The Zhinxmin extension includes the following instructions from the Zhinx extension: FCVT.S.H and FCVT.H.S. 
If the Zdinx extension is diff --git a/src/zicond.adoc b/src/zicond.adoc index bd57878..8e413b8 100644 --- a/src/zicond.adoc +++ b/src/zicond.adoc @@ -41,7 +41,7 @@ The following instructions comprise the Zicond extension: [NOTE] ==== -Architecture Comment: defining additional comparisons, in addition to equal-to-zero and not-equal-to-zero, does not offer a benefit due to the lack of immediates or an additional register operand that the comparison takes place against. +Architecture Comment: defining additional comparisons, in addition to equal-to-zero and not-equal-to-zero, does not offer a benefit due to the lack of immediates or an additional register operand that the comparison takes place against. ==== Based on these two instructions, synthetic instructions (i.e., short instruction sequences) for the following *conditional arithmetic* operations are supported: @@ -221,4 +221,4 @@ or rd, rd, rtmp czero.nez rtmp, rs2, rc or rd, rd, rtmp -|===
\ No newline at end of file +|=== diff --git a/src/zicsr.adoc b/src/zicsr.adoc index 0e16de4..0bb448b 100644 --- a/src/zicsr.adoc +++ b/src/zicsr.adoc @@ -24,7 +24,7 @@ CSR specifier is encoded in the 12-bit _csr_ field of the instruction held in bits 31-20. The immediate forms use a 5-bit zero-extended immediate encoded in the _rs1_ field. -include::images/wavedrom/csr-instr.adoc[] +include::images/wavedrom/csr-instr.edn[] The CSRRW (Atomic Read/Write CSR) instruction atomically swaps values in the CSRs and integer registers. CSRRW reads the old value of the CSR, @@ -77,7 +77,7 @@ effects regardless of _rd_ and _rs1_ fields. .Conditions determining whether a CSR instruction reads or writes the specified CSR. [%autowidth,float="center",align="center",cols="<,^,^,^,^",options="header",] |=== -5+^|*Register operand* +5+^|*Register operand* |Instruction |_rd_ is `x0` |_rs1_ is `x0` |Reads CSR |Writes CSR |CSRRW |Yes |- |No |Yes @@ -88,7 +88,7 @@ effects regardless of _rd_ and _rs1_ fields. |CSRRS/CSRRC |- |No |Yes |Yes -5+^|*Immediate operand* +5+^|*Immediate operand* |Instruction |_rd_ is `x0` |__uimm__latexmath:[$=$]0 |Reads CSR |Writes CSR diff --git a/src/zifencei.adoc b/src/zifencei.adoc index 666effb..ebe9951 100644 --- a/src/zifencei.adoc +++ b/src/zifencei.adoc @@ -17,7 +17,7 @@ snooping/invalidation overhead by writing translated instructions to memory regions that are known not to reside in the I-cache. ==== ''' -[TIP] +[NOTE] ==== The FENCE.I instruction was designed to support a wide variety of implementations. A simple implementation can flush the local instruction @@ -61,7 +61,7 @@ given address specified in _rs1_, and/or allowing software to use an ABI that relies on machine-mode cache-maintenance operations. 
==== -include::images/wavedrom/zifencei-ff.adoc[] +include::images/wavedrom/zifencei-ff.edn[] [[zifencei-ff]] //.FENCE.I instruction (((FENCE.I, synchronization))) @@ -78,7 +78,7 @@ instruction memory visible to all RISC-V harts, the writing hart also has to execute a data FENCE before requesting that all remote RISC-V harts execute a FENCE.I. -The unused fields in the FENCE.I instruction, _imm[11:0]_, _rs1_, and +The unused fields in the FENCE.I instruction, _funct12_, _rs1_, and _rd_, are reserved for finer-grain fences in future extensions. For forward compatibility, base implementations shall ignore these fields, and standard software shall zero these fields. diff --git a/src/zihintntl.adoc b/src/zihintntl.adoc index 8e225cb..8ed948a 100644 --- a/src/zihintntl.adoc +++ b/src/zihintntl.adoc @@ -118,23 +118,23 @@ zeroed, but not allocated in L1 or L2. |=== | Memory hierarchy 4+| Recommended mapping of NTL + variant to actual cache level 4+| Recommended NTL variant for + -explicit cache management +explicit cache management | |P1 |PALL |S1 |ALL |L1 |L2 |L3 |L4/L5 9+^| Common Scenarios -| No caches 4+|--- 4+|none -|Private L1 only |L1 |L1 |L1 |L1| ALL |--- |--- |--- -|Private L1; shared L2 |L1 |L1 |L2 |L2 |P1|ALL|---|--- +| No caches 4+|--- 4+|none +|Private L1 only |L1 |L1 |L1 |L1| ALL |--- |--- |--- +|Private L1; shared L2 |L1 |L1 |L2 |L2 |P1|ALL|---|--- |Private L1; shared L2/L3 |L1 | L1 | L2 | L3 |P1 |S1 |ALL |--- |Private L1/L2 |L1 |L2 |L2 |L2 | P1 |ALL |--- |--- |Private L1/L2; shared L3 |L1 | L2 | L3 | L3 | P1 | PALL| ALL |--- |Private L1/L2; shared L3/L4 | L1 | L2| L3 | L4 | P1 | PALL | S1 | ALL 9+^| Uncommon Scenarios |Private L1/L2/L3; shared L4 | L1 | L3 |L4 |L4 |P1 |P1 |PALL |ALL -|Private L1; shared L2/L3/L4 |L1 | L1 |L2 |L4 |P1 |S1 |ALL |ALL -|Private L1/L2; shared L3/L4/L5 |L1 | L2 | L3 | L5 |P1 | PALL |S1 |ALL -|Private L1/L2/L3; shared L4/L5 |L1 |L3 |L4 |L5 |P1 |P1 |PALL |ALL +|Private L1; shared L2/L3/L4 |L1 | L1 |L2 |L4 |P1 |S1 |ALL |ALL 
+|Private L1/L2; shared L3/L4/L5 |L1 | L2 | L3 | L5 |P1 | PALL |S1 |ALL +|Private L1/L2/L3; shared L4/L5 |L1 |L3 |L4 |L5 |P1 |P1 |PALL |ALL |=== When an NTL instruction is applied to a prefetch hint in the Zicbop @@ -168,7 +168,7 @@ recommended to ignore the HINT in this case. ==== If an interrupt occurs between the execution of an NTL instruction and its target instruction, execution will normally resume at the target -instruction. That the NTL instruction is not reexecuted does not change +instruction. That the NTL instruction is not re-executed does not change the semantics of the program. Some implementations might prefer not to process the NTL instruction @@ -178,7 +178,7 @@ preferentially take the interrupt before the NTL, rather than between the NTL and the memory access. ==== ''' -[TIP] +[NOTE] ==== Since the NTL instructions are encoded as ADDs, they can be used within LR/SC loops without voiding the forward-progress guarantee. But, since diff --git a/src/zihintpause.adoc b/src/zihintpause.adoc index 9df71f3..7b98c21 100644 --- a/src/zihintpause.adoc +++ b/src/zihintpause.adoc @@ -40,7 +40,7 @@ performance. PAUSE is encoded as a FENCE instruction with _pred_=`W`, _succ_=`0`, _fm_=`0`, _rd_=`x0`, and _rs1_=`x0`. -//include::images/wavedrom/zihintpause-hint.adoc[] +//include::images/wavedrom/zihintpause-hint.edn[] //[zihintpause-hint] //.Zihintpause fence instructions @@ -61,4 +61,3 @@ The choice of a predecessor set of W is arbitrary, since the successor set is null. Other HINTs similar to PAUSE might be encoded with other predecessor sets. ==== - diff --git a/src/zilsd.adoc b/src/zilsd.adoc new file mode 100644 index 0000000..a7a3e6e --- /dev/null +++ b/src/zilsd.adoc @@ -0,0 +1,306 @@ +[[sec:zilsd]] +== "Zilsd", "Zclsd" Extensions for Load/Store pair for RV32, Version 1.0 + +The Zilsd & Zclsd extensions provide load/store pair instructions for RV32, reusing the existing RV64 doubleword load/store instruction encodings. 
+ +Operands containing `src` for store instructions and `dest` for load instructions are held in aligned `x`-register pairs, i.e., register numbers must be even. Use of misaligned (odd-numbered) registers for these operands is _reserved_. + +Regardless of endianness, the lower-numbered register holds the +low-order bits, and the higher-numbered register holds the high-order +bits: e.g., bits 31:0 of an operand in Zilsd might be held in register `x14`, with bits 63:32 of that operand held in `x15`. + +[[zilsd, Zilsd]] +=== Load/Store pair instructions (Zilsd) + +The Zilsd extension adds the following RV32-only instructions: + +[%header,cols="^1,^1,4,8"] +|=== +|RV32 +|RV64 +|Mnemonic +|Instruction + +|yes +|no +|ld rd, offset(rs1) +|<<#insns-ld>> + +|yes +|no +|sd rs2, offset(rs1) +|<<#insns-sd>> + +|=== + +As the access size is 64-bit, accesses are only considered naturally aligned for effective addresses that are a multiple of 8. +In this case, these instructions are guaranteed to not raise an address-misaligned exception. +Even if naturally aligned, the memory access might not be performed atomically. + +If the effective address is a multiple of 4, then each word access is required to be performed atomically. + +The following table summarizes the required behavior: + +[%header] +|=== +|Alignment |Word accesses guaranteed atomic? |Can cause misaligned trap? +|8B |yes |no +|4B not 8B |yes |yes +|else |no | yes +|=== + +To ensure resumable trap handling is possible for the load instructions, the base register must have its original value if a trap is taken. The other register in the pair can have been updated. +This affects x2 for the stack pointer relative instruction and rs1 otherwise. + +[NOTE] +==== +If an implementation performs a doubleword load access atomically and the register file implements write-back for even/odd register pairs, +the mentioned atomicity requirements are inherently fulfilled. 
+Otherwise, an implementation either needs to delay the write-back until the write can be performed atomically, +or order sequential writes to the registers to ensure the requirement above is satisfied. +==== + +[[zclsd, Zclsd]] +=== Compressed Load/Store pair instructions (Zclsd) + +Zclsd depends on Zilsd and Zca. It has overlapping encodings with Zcf and is thus incompatible with Zcf. + +Zclsd adds the following RV32-only instructions: + +[%header,cols="^1,^1,4,8"] +|=== +|RV32 +|RV64 +|Mnemonic +|Instruction + +|yes +|no +|c.ldsp rd, offset(sp) +|<<#insns-cldsp>> + +|yes +|no +|c.sdsp rs2, offset(sp) +|<<#insns-csdsp>> + +|yes +|no +|c.ld rd', offset(rs1') +|<<#insns-cld>> + +|yes +|no +|c.sd rs2', offset(rs1') +|<<#insns-csd>> + +|=== + +=== Use of x0 as operand + +LD instructions with destination `x0` are processed as any other load, but the result is discarded entirely and x1 is not written. +For C.LDSP, usage of `x0` as the destination is reserved. + +If using `x0` as `src` of SD or C.SDSP, the entire 64-bit operand is zero — i.e., register `x1` is not accessed. + +C.LD and C.SD instructions can only use `x8-15`. + +=== Exception Handling + +For the purposes of RVWMO and exception handling, LD and SD instructions are +considered to be misaligned loads and stores, with one additional constraint: +an LD or SD instruction whose effective address is a multiple of 4 gives rise +to two 4-byte memory operations. + +NOTE: This definition permits LD and SD instructions giving rise to exactly one +memory access, regardless of alignment. +If instructions with 4-byte-aligned effective address are decomposed +into two 32b operations, there is no constraint on the order in which the +operations are performed and each operation is guaranteed to be atomic. +These decomposed sequences are interruptible. +Exceptions might occur on subsequent operations, making the effects of previous +operations within the same instruction visible. 
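Illustrative sketch, not part of the patch: the Zilsd alignment table and the register-pair layout described in the added text can be restated as two small helpers. The function names `zilsd_alignment` and `split_pair` are hypothetical and chosen only for this example; the behavior is taken directly from the table (8-byte aligned, 4-byte but not 8-byte aligned, anything else) and from the rule that the lower-numbered register holds the low-order bits.

```python
def zilsd_alignment(addr: int) -> dict:
    """Classify an LD/SD effective address per the Zilsd alignment table."""
    if addr % 8 == 0:
        # Naturally aligned: each word access is atomic, no misaligned trap.
        return {"word_atomic": True, "may_trap_misaligned": False}
    if addr % 4 == 0:
        # Word aligned only: each word access is atomic, but a trap is allowed.
        return {"word_atomic": True, "may_trap_misaligned": True}
    # Any other alignment: no atomicity guarantee, trap is allowed.
    return {"word_atomic": False, "may_trap_misaligned": True}


def split_pair(value: int) -> tuple:
    """Split a 64-bit operand into (even register, odd register) contents."""
    low = value & 0xFFFFFFFF           # bits 31:0 go to the lower-numbered register
    high = (value >> 32) & 0xFFFFFFFF  # bits 63:32 go to the higher-numbered register
    return low, high
```

For instance, an operand held in `x14`/`x15` would place the first element of `split_pair` in `x14` and the second in `x15`, regardless of endianness.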
+ +NOTE: Software should make no assumptions about the number or order of +accesses these instructions might give rise to, beyond the 4-byte constraint +mentioned above. +For example, an interrupted store might overwrite the same bytes upon return +from the interrupt handler. + +<<< + +=== Instructions +[#insns-ld,reftext="Load doubleword to register pair, 32-bit encoding"] +==== ld + +Synopsis:: +Load doubleword to even/odd register pair, 32-bit encoding + +Mnemonic:: +ld rd, offset(rs1) + +Encoding (RV32):: +[wavedrom, ,svg] +.... +{reg: [ + {bits: 7, name: 0x3, attr: ['LOAD'], type: 8}, + {bits: 5, name: 'rd', attr: ['dest, dest[0]=0'], type: 2}, + {bits: 3, name: 0x3, attr: ['width=D'], type: 8}, + {bits: 5, name: 'rs1', attr: ['base'], type: 4}, + {bits: 12, name: 'imm[11:0]', attr: ['offset[11:0]'], type: 3}, +]} +.... + +Description:: +Loads a 64-bit value into registers `rd` and `rd+1`. +The effective address is obtained by adding register rs1 to the +sign-extended 12-bit offset. + +Included in: <<zilsd>> + +<<< + +[#insns-sd,reftext="Store doubleword from register pair, 32-bit encoding"] +==== sd + +Synopsis:: +Store doubleword from even/odd register pair, 32-bit encoding + +Mnemonic:: +sd rs2, offset(rs1) + +Encoding (RV32):: +[wavedrom, ,svg] +.... +{reg: [ + {bits: 7, name: 0x23, attr: ['STORE'], type: 8}, + {bits: 5, name: 'imm[4:0]', attr: ['offset[4:0]'], type: 3}, + {bits: 3, name: 0x3, attr: ['width=D'], type: 8}, + {bits: 5, name: 'rs1', attr: ['base'], type: 4}, + {bits: 5, name: 'rs2', attr: ['src, src[0]=0'], type: 4}, + {bits: 7, name: 'imm[11:5]', attr: ['offset[11:5]'], type: 3}, +]} +.... + +Description:: +Stores a 64-bit value from registers `rs2` and `rs2+1`. +The effective address is obtained by adding register rs1 to the +sign-extended 12-bit offset. 
+ +Included in: <<zilsd>> + +<<< + +[#insns-cldsp,reftext="Stack-pointer based load doubleword to register pair, 16-bit encoding"] +==== c.ldsp + +Synopsis:: +Stack-pointer based load doubleword to even/odd register pair, 16-bit encoding + +Mnemonic:: +c.ldsp rd, offset(sp) + +Encoding (RV32):: +[wavedrom, ,svg] +.... +{reg: [ + {bits: 2, name: 0x2, type: 8, attr: ['C2']}, + {bits: 5, name: 'imm', type: 3, attr: ['offset[4:3|8:6]']}, + {bits: 5, name: 'rd', type: 2, attr: ['dest≠0, dest[0]=0']}, + {bits: 1, name: 'imm', type: 3, attr: ['offset[5]']}, + {bits: 3, name: 0x3, type: 8, attr: ['C.LDSP']}, +], config: {bits: 16}} +.... + +Description:: +Loads stack-pointer relative 64-bit value into registers `rd'` and `rd'+1`. It computes its effective address by adding the zero-extended offset, scaled by 8, to the stack pointer, `x2`. It expands to `ld rd, offset(x2)`. C.LDSP is only valid when _rd_≠x0; the code points with _rd_=x0 are reserved. + +Included in: <<zclsd>> + +<<< + +[#insns-csdsp,reftext="Stack-pointer based store doubleword from register pair, 16-bit encoding"] +==== c.sdsp + +Synopsis:: +Stack-pointer based store doubleword from even/odd register pair, 16-bit encoding + +Mnemonic:: +c.sdsp rs2, offset(sp) + +Encoding (RV32):: +[wavedrom, ,svg] +.... +{reg: [ + {bits: 2, name: 0x2, type: 8, attr: ['C2']}, + {bits: 5, name: 'rs2', type: 4, attr: ['src, src[0]=0']}, + {bits: 6, name: 'imm', type: 3, attr: ['offset[5:3|8:6]']}, + {bits: 3, name: 0x7, type: 8, attr: ['C.SDSP']}, +], config: {bits: 16}} +.... + +Description:: +Stores a stack-pointer relative 64-bit value from registers `rs2'` and `rs2'+1`. It computes an effective address by adding the _zero_-extended offset, scaled by 8, to the stack pointer, `x2`. It expands to `sd rs2, offset(x2)`. 
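Illustrative sketch, not part of the patch: the C.SDSP immediate layout shown in the encoding diagram (`offset[5:3|8:6]` in the 6-bit field above `rs2`) can be decoded as below. The function name `decode_c_sdsp` is hypothetical, and the bit positions assume the usual reading of the diagram, i.e. the fields are listed from bit 0 upward, so `rs2` occupies bits 6:2 and the immediate occupies bits 12:7.

```python
def decode_c_sdsp(inst: int) -> tuple:
    """Return (rs2, byte offset) for a 16-bit C.SDSP encoding."""
    rs2 = (inst >> 2) & 0x1F        # source register pair base, bits 6:2
    imm = (inst >> 7) & 0x3F        # 6-bit immediate field, bits 12:7
    off_5_3 = (imm >> 3) & 0x7      # upper three field bits carry offset[5:3]
    off_8_6 = imm & 0x7             # lower three field bits carry offset[8:6]
    # The offset is zero-extended and already scaled by 8 (bits 2:0 are zero).
    offset = (off_5_3 << 3) | (off_8_6 << 6)
    return rs2, offset
```

Because only `offset[8:3]` is encoded, the reachable offsets are multiples of 8 from 0 to 504. An odd `rs2` value returned by this helper would correspond to a reserved encoding under the register-pair rule.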
+ +Included in: <<zclsd>> + +<<< + +[#insns-cld,reftext="Load doubleword to register pair, 16-bit encoding"] +==== c.ld + +Synopsis:: +Load doubleword to even/odd register pair, 16-bit encoding + +Mnemonic:: +c.ld rd', offset(rs1') + +Encoding (RV32):: +[wavedrom, ,svg] +.... +{reg: [ + {bits: 2, name: 0x0, type: 8, attr: ['C0']}, + {bits: 3, name: 'rd`', type: 2, attr: ['dest, dest[0]=0']}, + {bits: 2, name: 'imm', type: 3, attr: ['offset[7:6]']}, + {bits: 3, name: 'rs1`', type: 4, attr: ['base']}, + {bits: 3, name: 'imm', type: 3, attr: ['offset[5:3]']}, + {bits: 3, name: 0x3, type: 8, attr: ['C.LD']}, +], config: {bits: 16}} +.... + +Description:: +Loads a 64-bit value into registers `rd'` and `rd'+1`. +It computes an effective address by adding the zero-extended offset, scaled by 8, to the base address in register rs1'. + +Included in: <<zclsd>> + +<<< + +[#insns-csd,reftext="Store doubleword from register pair, 16-bit encoding"] +==== c.sd + +Synopsis:: +Store doubleword from even/odd register pair, 16-bit encoding + +Mnemonic:: +c.sd rs2', offset(rs1') + +Encoding (RV32):: +[wavedrom, ,svg] +.... +{reg: [ + {bits: 2, name: 0x0, type: 8, attr: ['C0']}, + {bits: 3, name: 'rs2`', type: 4, attr: ['src, src[0]=0']}, + {bits: 2, name: 'imm', type: 3, attr: ['offset[7:6]']}, + {bits: 3, name: 'rs1`', type: 4, attr: ['base']}, + {bits: 3, name: 'imm', type: 3, attr: ['offset[5:3]']}, + {bits: 3, name: 0x7, type: 8, attr: ['C.SD']}, +], config: {bits: 16}} +.... + +Description:: +Stores a 64-bit value from registers `rs2'` and `rs2'+1`. +It computes an effective address by adding the zero-extended offset, scaled by 8, to the base address in register rs1'. +It expands to `sd rs2', offset(rs1')`. + +Included in: <<zclsd>> diff --git a/src/zimop.adoc b/src/zimop.adoc index ab88a4a..307d9a1 100644 --- a/src/zimop.adoc +++ b/src/zimop.adoc @@ -32,7 +32,7 @@ Unless redefined by another extension, these instructions simply write 0 to `x[rd]`. 
Their encoding allows future extensions to define them to read `x[rs1]`, as well as write `x[rd]`. -include::images/wavedrom/mop-r.adoc[] +include::images/wavedrom/mop-r.edn[] [[mop-r]] The Zimop extension additionally defines 8 MOP instructions named @@ -41,7 +41,7 @@ Unless redefined by another extension, these instructions simply write 0 to `x[rd]`. Their encoding allows future extensions to define them to read `x[rs1]` and `x[rs2]`, as well as write `x[rd]`. -include::images/wavedrom/mop-rr.adoc[] +include::images/wavedrom/mop-rr.edn[] [[mop-rr]] NOTE: The recommended assembly syntax for MOP.R.__n__ is MOP.R.__n__ rd, rs1, @@ -74,9 +74,9 @@ are defined to _not_ write any register. Their encoding allows future extensions to define them to read register `x[__n__]`. -The Zcmop extension requires the Zca extension. +The Zcmop extension depends upon the Zca extension. -include::images/wavedrom/c-mop.adoc[] +include::images/wavedrom/c-mop.edn[] [[c-mop]] NOTE: Very few suitable 16-bit encoding spaces exist. This space was chosen diff --git a/src/zpm.adoc b/src/zpm.adoc new file mode 100644 index 0000000..a1bb3dd --- /dev/null +++ b/src/zpm.adoc @@ -0,0 +1,298 @@ +[[Zpm]] +== Pointer Masking Extensions, Version 1.0.0 + +=== Introduction + +RISC-V Pointer Masking (PM) is a feature that, when enabled, causes the CPU to ignore the upper bits of the effective address (these terms will be defined more precisely in the Background section). This allows these bits to be used in whichever way the application chooses. The version of the extension being described here specifically targets **tag checks**: When an address is accessed, the tag stored in the masked bits can be compared against a range-based tag. This is used for dynamic safety checkers such as HWASAN cite:[HWASAN]. Such tools can be applied in all privilege modes (U, S and M). + +HWASAN leverages tags in the upper bits of the address to identify memory errors such as use-after-free or buffer overflow errors. 
By storing a *pointer tag* in the upper bits of the address and checking it against a *memory tag* stored in a side table, it can identify whether a pointer is pointing to a valid location. Doing this without hardware support introduces significant overheads since the pointer tag needs to be manually removed for every conventional memory operation. Pointer masking support reduces these overheads. + +Pointer masking only adds the ability to ignore pointer tags during regular memory accesses. The tag checks themselves can be implemented in software or hardware. If implemented in software, pointer masking still provides performance benefits since non-checked accesses do not need to transform the address before every memory access. Hardware implementations are expected to provide even larger benefits due to performing tag checks out-of-band and hardening security guarantees derived from these checks. We anticipate that future extensions may build on pointer masking to support this functionality in hardware. + +It is worth mentioning that while HWASAN is the primary use-case for the current pointer masking extension, a number of other hardware/software features may be implemented leveraging Pointer Masking. Some of these use cases include sandboxing, object type checks and garbage collection bits in runtime systems. Note that the current version of the spec does not explicitly address these use cases, but future extensions may build on it to do so. + +While we describe the high-level concepts of pointer masking as if it was a single extension, it is, in reality, a family of extensions that implementations or profiles may choose to individually include or exclude (see <<_pointer_masking_extensions>>). + +=== Background + +==== Definitions + +We now define basic terms. Note that these rely on the definition of an “ignore” transformation, which is defined in <<sec-ignore-transform>>. 
+ +* **Effective address (as defined in the RISC-V Base ISA):** A load/store effective address sent to the memory subsystem (e.g., as generated during the execution of load/store instructions). This does not include addresses corresponding to implicit accesses, such as page-table walks. + +* **Masked bits:** The upper PMLEN bits of an address, where PMLEN is a configurable parameter. We will use PMLEN consistently throughout this document to refer to this parameter. + +* **Transformed address:** An effective address after the ignore transformation has been applied. + +* **Address translation mode:** The MODE of the currently active address translation scheme as defined in the RISC-V privileged specification. This could, for example, refer to Bare, Sv39, Sv48, and Sv57. In accordance with the privileged specification, non-Bare translation modes are referred to as virtual-memory schemes. For the purpose of this specification, M-mode translation is treated as equivalent to Bare. + +* **Address validity:** The RISC-V privileged spec defines validity of addresses based on the address translation mode that is currently in use (e.g., Sv57, Sv48, Sv39, etc.). For a virtual address to be valid, all bits in the unused portion of the address must be the same as the Most Significant Bit (MSB) of the used portion. For example, when page-based 48-bit virtual memory (Sv48) is used, load/store effective addresses, which are 64 bits, must have bits 63–48 all set to bit 47, or else a page-fault exception will occur. For physical addresses, validity means that bits XLEN-1 to PABITS are zero, where PABITS is the number of physical address bits supported by the processor. + +* **NVBITS:** The upper bits within a virtual address that have no effect on addressing memory and are only used for validity checks. These bits depend on the currently active address translation mode. For example, in Sv48, these are bits 63-48. 
+ +* **VBITS:** The bits within a virtual address that affect which memory is addressed. These are the bits of an address which are used to index into page tables. + +[[sec-ignore-transform]] +==== The “Ignore” Transformation + +The ignore transformation differs depending on whether it applies to a virtual or physical address. For virtual addresses, it replaces the upper PMLEN bits with the sign extension of the PMLEN+1st bit. + +[source] +."Ignore" Transformation for virtual addresses, expressed in Verilog code. +---- +transformed_effective_address = + {{PMLEN{effective_address[XLEN-PMLEN-1]}}, effective_address[XLEN-PMLEN-1:0]} +---- + +[NOTE] +==== +If PMLEN is less than or equal to NVBITS for the largest supported address translation mode on a given architecture, this is equivalent to ignoring a subset of NVBITS. This enables cheap implementations that modify validity checks in the CPU instead of performing the sign extension. +==== + +When applied to a physical address, including guest-physical addresses (i.e., all cases except when the active satp register's MODE field != Bare), the ignore transformation replaces the upper PMLEN bits with 0. This includes both the case of running in M-mode and running in other privilege modes with Bare address translation mode. + +[source] +."Ignore" Transformation for physical addresses, expressed in Verilog code. +---- +transformed_effective_address = + {{PMLEN{0}}, effective_address[XLEN-PMLEN-1:0]} +---- + +[NOTE] +==== +This definition is consistent with the way that RISC-V already handles physical and virtual addresses differently. While the unused upper bits of virtual addresses are the sign-extension of the used bits (see the definition of "address validity" in <<_definitions>>), the equivalent bits in physical addresses are zero-extended. This is necessary due to their interactions with other mechanisms such as Physical Memory Protection (PMP). 
+==== + +When pointer masking is enabled, the ignore transformation will be applied to every explicit memory access (e.g., loads/stores, atomic operations, and floating-point loads/stores). The transformation *does not* apply to implicit accesses such as page-table walks or instruction fetches. The set of accesses that pointer masking applies to is described in <<_memory_accesses_subject_to_pointer_masking>>. + +[WARNING] +==== +Pointer masking does not change the underlying address generation logic or permission checks. Under a fixed address translation mode, it is semantically equivalent to replacing a subset of instructions (e.g., loads and stores) with an instruction sequence that applies the ignore operation to the target address of this instruction and then applies the instruction to the transformed address. References to address translation and other implementation details in the text are primarily to explain design decisions and common implementation patterns. +==== + +Note that pointer masking is purely an arithmetic operation on the address that makes no assumption about the meaning of the addresses it is applied to. Pointer masking with the same value of PMLEN always has the same effect for the same type of address (virtual or physical). This ensures that code that relies on pointer masking does not need to be aware of the environment it runs in once pointer masking has been enabled, as long as the value of PMLEN and the type of address (virtual or physical) are known. For example, the same application or library code can run in user mode, supervisor mode or M-mode (with different address translation modes) without modification. + +[NOTE] +==== +A common scenario for such code is that addresses are generated by mmap system calls. This abstracts away the details of the underlying address translation mode from the application code. 
Software therefore needs to be aware of the value of PMLEN to ensure that its minimally required number of tag bits is supported. <<_determining_the_value_of_pmlen>> covers how this value is derived. +==== + +==== Example + +Table 1 shows an example of the pointer masking transformation on a virtual address when PM is enabled for RV64 under Sv57 (PMLEN=7). + +[%header, cols="25%,75%", options="header"] +.Example of PM address translation for RV64 under Sv57 +|=== +|Page-based profile|Sv57 on RV64 +|Effective Address |0xABFFFFFF12345678 + +NVBITS[1010101] VBITS[11111111111111111111111110001...000] +|PMLEN|7 +|Mask|0x01FFFFFFFFFFFFFF + +NVBITS[0000000] VBITS[11111111111111111111111111111...111] +|PMLEN+1st bit from the top (i.e., bit XLEN-PMLEN-1)|1 +|Transformed effective address |0xFFFFFFFF12345678 + +NVBITS[1111111] VBITS[11111111111111111111111110001...000] + +|=== + +If the address was a physical address rather than a virtual address with Sv57, the transformed address with PMLEN=7 would be 0x1FFFFFF12345678. + +==== Determining the Value of PMLEN + +From an implementation perspective, ignoring bits is deeply connected to the maximum virtual and physical address space supported by the processor (e.g., Bare, Sv48, Sv57). In particular, applying the above transformation is cheap if it covers only bits that are not used by **any** supported address translation mode (as it is equivalent to switching off validity checks). Masking NVBITS beyond those bits is more expensive as it requires ignoring them in the TLB tag, and even more expensive if the masked bits extend into the VBITS portion of the address (as it requires performing the actual sign extension). Similarly, when running in Bare or M mode, it is common for implementations to not use a particular number of bits at the top of the physical address range and fix them to zero. 
Applying the ignore transformation to those bits is cheap as well, since it will result in a valid physical address with all the upper bits fixed to 0. + +The current standard only supports PMLEN=XLEN-48 (i.e., PMLEN=16 in RV64) and PMLEN=XLEN-57 (i.e., PMLEN=7 in RV64). A setting has been reserved to potentially support other values of PMLEN in future standards. In such future standards, different supported values of PMLEN may be defined for each privilege mode (U/VU, S/HS, and M). + +[NOTE] +==== +Future versions of the pointer masking extension may introduce the ability to freely configure the value of PMLEN. The current extension does not define the behavior if PMLEN was different from the values defined above. In particular, there is no guarantee that a future pointer masking extension would define the ignore operation in the same way for those values of PMLEN. +==== + +==== Pointer Masking and Privilege Modes + +Pointer masking is controlled separately for different privilege modes. The subset of supported privilege modes is determined by the set of supported pointer masking extensions. Different privilege modes may have different pointer masking settings active simultaneously and the hardware will automatically apply the pointer masking settings of the currently active privilege mode. A privilege mode's pointer masking setting is configured by bits in configuration registers of the next-higher privilege mode. + +Note that the pointer masking setting that is applied only depends on the active privilege mode, not on the address that is being masked. Some operating systems (e.g., Linux) may use certain bits in the address to disambiguate between different types of addresses (e.g., kernel and user-mode addresses). Pointer masking _does not_ take these semantics into account and is purely an arithmetic operation on the address it is given. 
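Illustrative sketch, not part of the patch: since the ignore transformation is purely arithmetic, both variants defined in the added text can be written out in a few lines and checked against the worked Sv57 example (PMLEN=7). The names `ignore_virtual` and `ignore_physical` are hypothetical; XLEN is fixed at 64 here as an assumption of the example.

```python
XLEN = 64  # assumption for this sketch: RV64


def ignore_virtual(addr: int, pmlen: int) -> int:
    """Replace the upper PMLEN bits with copies of bit XLEN-PMLEN-1."""
    low_bits = XLEN - pmlen
    low = addr & ((1 << low_bits) - 1)       # keep bits XLEN-PMLEN-1 .. 0
    sign = (addr >> (low_bits - 1)) & 1      # bit XLEN-PMLEN-1
    upper = (((1 << pmlen) - 1) << low_bits) if sign else 0
    return upper | low


def ignore_physical(addr: int, pmlen: int) -> int:
    """Replace the upper PMLEN bits with zeros."""
    return addr & ((1 << (XLEN - pmlen)) - 1)
```

With the effective address from the example table, `ignore_virtual(0xABFFFFFF12345678, 7)` yields `0xFFFFFFFF12345678`, and `ignore_physical` on the same input yields `0x01FFFFFF12345678`, matching the values stated in the text.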
+ +[NOTE] +==== +Linux places kernel addresses in the upper half of the address space and user addresses in the lower half of the address space. As such, the MSB is often used to identify the type of a particular address. With pointer masking enabled, this role is now played by bit XLEN-PMLEN-1 and code that checks whether a pointer is a kernel or a user address needs to inspect this bit instead. For backward compatibility, it may be desirable that the MSB still indicates whether an address is a user or a kernel address. An operating system's ABI may mandate this, but it does not affect the pointer masking mechanism itself. For example, the Linux ABI may choose to mandate that the MSB is not used for tagging and replicates bit XLEN-PMLEN-1 (note that for such a mechanism to be secure, the kernel needs to check the MSB of any user mode-supplied address and ensure that this invariant holds before using it; alternatively, it can apply the transformation from Listing 1 or 2 to ensure that the MSB is set to the correct value). +==== + +==== Memory Accesses Subject to Pointer Masking + +Pointer masking applies to all explicit memory accesses. Currently, in the Base and Privileged ISAs, these are: + +* **Base Instruction Set**: LB, LH, LW, LBU, LHU, LWU, LD, SB, SH, SW, SD. +* **Atomics**: All instructions in RV32A and RV64A. +* **Floating Point**: FLW, FLD, FLQ, FSW, FSD, FSQ. +* **Compressed**: All instructions mapping to any of the above, and C.LWSP, C.LDSP, C.LQSP, C.FLWSP, C.FLDSP, C.SWSP, C.SDSP, C.SQSP, C.FSWSP, C.FSDSP. +* **Hypervisor Extension**: HLV.\*, HSV.* (in some cases; see <<_ssnpm>>). +* **Cache Management Operations**: All instructions in Zicbom, Zicbop and Zicboz. +* **Vector Extension**: All vector load and store instructions in the ratified RVV 1.0 spec. +* **Zicfiss Extension**: SSPUSH, C.SSPUSH, SSPOPCHK, C.SSPOPCHK, SSAMOSWAP.W/D. +* **Assorted**: FENCE, FENCE.I (if the currently unused address fields become enabled in the future). 
+ +[NOTE] +==== +This list will grow over time as new extensions introduce new instructions that perform explicit memory accesses. +==== + +For other extensions, pointer masking applies to all explicit memory accesses by default. Future extensions may add specific language to indicate whether particular accesses are or are not included in pointer masking. + +[NOTE] +==== +It is worth noting that pointer masking is not applied to `SFENCE.\*`, `HFENCE.*`, `SINVAL.\*`, or `HINVAL.*`. When such an operation is invoked, it is the responsibility of the software to provide the correct address. +==== + +MPRV and SPVP affect pointer masking as well, causing the pointer masking settings of the effective privilege mode to be applied. When MXR is in effect at the effective privilege mode where explicit memory access is performed, pointer masking does not apply. + +[NOTE] +==== +Note that this includes cases where page-based virtual memory is not in effect; i.e., although MXR has no effect on permissions checks when page-based virtual memory is not in effect, it is still used in determining whether or not pointer masking should be applied. +==== + +[NOTE] +==== +Cache Management Operations (CMOs) must respect and take into account pointer masking. Otherwise, a few serious security problems can appear, including: + +* CBO.ZERO may work as a STORE operation. If pointer masking is not respected, it would be possible to write to memory bypassing the mask enforcement. +* If CMOs did not respect pointer masking, it would be possible to weaponize this in a side-channel attack. For example, U-mode would be able to flush a physical address (without masking) that it should not be permitted to. +==== + +Pointer masking only applies to accesses generated by instructions on the CPU (including CPU extensions such as an FPU). E.g., it does not apply to accesses generated by page-table walks, the IOMMU, or devices. 
+ +[NOTE] +==== +Pointer Masking does not apply to DMA controllers and other devices. It is therefore the responsibility of the software to manually untag these addresses. +==== + +Misaligned accesses are supported, subject to the same limitations as in the absence of pointer masking. The behavior is identical to applying the pointer masking transformation to every constituent aligned memory access. In other words, the accessed bytes should be identical to the bytes that would be accessed if the pointer masking transformation was individually applied to every byte of the access without pointer masking. This ensures that both hardware implementations and emulation of misaligned accesses in M-mode behave the same way, and that the M-mode implementation is identical whether or not pointer masking is enabled (e.g., such an implementation may leverage MPRV to apply the correct privilege mode's pointer masking setting). + +No pointer masking operations are applied when software reads/writes to CSRs, including those meant to hold addresses. If software stores tagged addresses into such CSRs, data load or data store operations based on those addresses are subject to pointer masking only if they are explicit (<<_memory_accesses_subject_to_pointer_masking>>) and pointer masking is enabled for the privilege mode that performs the access. The implemented WARL width of CSRs is unaffected by pointer masking (e.g., if a CSR supports 52 bits of valid addresses and pointer masking is supported with PMLEN=16, the necessary number of WARL bits remains 52 independently of whether pointer masking is enabled or disabled). + +In contrast to software writes, pointer masking **is applied** for hardware writes to a CSR (e.g., when the hardware writes the transformed address to `stval` when taking an exception). Pointer masking is also applied to the memory access address when matching address triggers in debug. 
+ +For example, software is free to write a tagged or untagged address to `stvec`, but on trap delivery (e.g., due to an exception or interrupt), pointer masking **will not be applied** to the address of the trap handler. However, pointer masking **will be applied** by the hardware to any address written into `stval` when delivering an exception. + +[NOTE] +==== +The rationale for this choice is that delivering the additional bits may add overheads in some hardware implementations. Further, pointer masking is configured per privilege mode, so all trap handlers in supervisor mode would need to be careful to configure pointer masking the same way as user mode or manually unmask (which is expensive). +==== + +==== Pointer Masking Extensions + +Pointer masking refers to a number of separate extensions, all of which are privileged. This approach is used to capture optionality of pointer masking features. Profiles and implementations may choose to support an arbitrary subset of these extensions and must define valid ranges for their corresponding values of PMLEN. + +**Extensions**: + +* **Ssnpm**: A supervisor-level extension that provides pointer masking for the next lower privilege mode (U-mode), and for VS- and VU-modes if the H extension is present. +* **Smnpm**: A machine-level extension that provides pointer masking for the next lower privilege mode (S/HS if S-mode is implemented, or U-mode otherwise). +* **Smmpm**: A machine-level extension that provides pointer masking for M-mode. + +See <<_isa_extensions>> for details on how each of these extensions is configured. + +In addition, the pointer masking standard defines two extensions that describe an execution environment but have no bearing on hardware implementations. 
These extensions are intended to be used in profile specifications where a User profile or a Supervisor profile can only reference User level or Supervisor level pointer masking functionality, and not the associated CSR controls that exist at a higher privilege level (i.e., in the execution environment). + +* **Sspm**: An extension that indicates that there is pointer-masking support available in supervisor mode, with some facility provided in the supervisor execution environment to control pointer masking. +* **Supm**: An extension that indicates that there is pointer-masking support available in user mode, with some facility provided in the application execution environment to control pointer masking. + +The precise nature of these facilities is left to the respective execution environment. + +Pointer masking only applies to RV64. In RV32, trying to enable pointer masking will result in an illegal WARL write and not update the pointer masking configuration bits (see <<_isa_extensions>> for details). The same is the case on RV64 or larger systems when UXL/SXL/MXL is set to 1 for the corresponding privilege mode. Note that in RV32, the CSR bits introduced by pointer masking are still present, for compatibility between RV32 and larger systems with UXL/SXL/MXL set to 1. Setting UXL/SXL/MXL to 1 will clear the corresponding pointer masking configuration bits. + +[NOTE] +==== +Note that setting UXL/SXL/MXL to 1 and back to 0 does not preserve the previous values of the PMM bits. This includes the case of entering an RV32 virtual machine from an RV64 hypervisor and returning. +==== + +=== ISA Extensions + +This section describes the pointer masking extensions `Smmpm`, `Smnpm` and `Ssnpm`. All of these extensions are privileged ISA extensions and do not add any new CSRs. For the definitions of `Sspm` and `Supm`, see <<_pointer_masking_extensions>>. 
+ +[NOTE] +==== +Future extensions may introduce additional CSRs to allow different privilege modes to modify their own pointer masking settings. This may be required for future use cases in managed runtime systems that are not currently addressed as part of this extension. +==== + +Each extension introduces a 2-bit WARL field (`PMM`) that may take on the following values to set the pointer masking settings for a particular privilege mode. + +[%header, cols="25%,75%", options="header"] +.Possible values of `PMM` WARL field. +|=== +|Value|Description +|00|Pointer masking is disabled (PMLEN=0) +|01|Reserved +|10|Pointer masking is enabled with PMLEN=XLEN-57 (PMLEN=7 on RV64) +|11|Pointer masking is enabled with PMLEN=XLEN-48 (PMLEN=16 on RV64) +|=== + +All of these fields are read-only 0 on RV32 systems. + +==== Ssnpm + +`Ssnpm` adds a new 2-bit WARL field (`PMM`) to bits 33:32 of `senvcfg`. Setting `PMM` enables or disables pointer masking for the next lower privilege mode (U/VU mode), according to the values in Table 2. + +In systems where the H Extension is present, `Ssnpm` also adds a new 2-bit WARL field (`PMM`) to bits 33:32 of `henvcfg`. Setting `PMM` enables or disables pointer masking for VS-mode, according to the values in Table 2. Further, a 2-bit WARL field (`HUPMM`) is added to bits 49:48 of `hstatus`. Setting `hstatus.HUPMM` enables or disables pointer masking for `HLV.\*` and `HSV.*` instructions in U-mode, according to the values in Table 2, when their explicit memory access is performed as though in VU-mode. In HS- and M-modes, pointer masking for these instructions is enabled or disabled by `senvcfg.PMM`, when their explicit memory access is performed as though in VU-mode. Setting `henvcfg.PMM` enables or disables pointer masking for `HLV.\*` and `HSV.*` when their explicit memory access is performed as though in VS-mode. 
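The field positions and encodings described so far can be summarized in a small decoding sketch. This is an illustrative model only: the helper names (`pmm_field`, `pmlen_from_pmm`) and the constant names are hypothetical, and raising an error for the reserved encoding is a choice made for this sketch rather than behavior defined by the text (the fields are WARL, so how an implementation treats a write of the reserved value is not modeled here).

```python
# Encodings from the `PMM` table, for RV64; 0b01 is reserved.
PMM_TO_PMLEN = {0b00: 0, 0b10: 7, 0b11: 16}

# Bit positions taken from the text above.
ENVCFG_PMM_LSB = 32      # senvcfg.PMM and henvcfg.PMM occupy bits 33:32
HSTATUS_HUPMM_LSB = 48   # hstatus.HUPMM occupies bits 49:48


def pmm_field(csr_value: int, lsb: int) -> int:
    """Extract a 2-bit field starting at bit position `lsb`."""
    return (csr_value >> lsb) & 0b11


def pmlen_from_pmm(pmm: int) -> int:
    """Map a PMM encoding to PMLEN; the reserved encoding is rejected in this sketch."""
    if pmm not in PMM_TO_PMLEN:
        raise ValueError(f"reserved PMM encoding: {pmm:#04b}")
    return PMM_TO_PMLEN[pmm]


# Hypothetical CSR images: senvcfg with PMM=0b11, hstatus with HUPMM=0b10.
senvcfg = 0b11 << ENVCFG_PMM_LSB
hstatus = 0b10 << HSTATUS_HUPMM_LSB
u_mode_pmlen = pmlen_from_pmm(pmm_field(senvcfg, ENVCFG_PMM_LSB))       # 16
hlv_hsv_pmlen = pmlen_from_pmm(pmm_field(hstatus, HSTATUS_HUPMM_LSB))   # 7
```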
+ +[NOTE] +==== +The hypervisor should copy the value written to `senvcfg.PMM` by the guest to the `hstatus.HUPMM` field prior to invoking `HLV.\*` or `HSV.*` instructions in U-mode. +==== + +The memory accesses performed by the `HLVX.*` instructions are not subject to pointer masking. + +[NOTE] +==== +`HLVX.*` instructions, designed for emulating implicit access to fetch instructions from guest memory, perform memory accesses that are exempt from pointer masking to facilitate this emulation. For the same reason, pointer masking does not apply when MXR is set. +==== + +==== Smnpm + +`Smnpm` adds a new 2-bit WARL field (`PMM`) to bits 33:32 of `menvcfg`. Setting `PMM` enables or disables pointer masking for the next lower privilege mode (S-/HS-mode if S-mode is implemented, or U-mode otherwise), according to the values in Table 2. + +[NOTE] +==== +The type of address determines which type of pointer masking is applied. For example, when running with virtualization in VS/VU mode with `vsatp.MODE` = Bare, physical address pointer masking (zero extension) applies. +==== + +==== Smmpm + +`Smmpm` adds a new 2-bit WARL field (`PMM`) to bits 33:32 of `mseccfg`. The presence of `Smmpm` implies the presence of the `mseccfg` register, even if it would not otherwise be present. Setting `PMM` enables or disables pointer masking for M mode, according to the values in Table 2. + +==== Interaction with SFENCE.VMA + +Since pointer masking applies to the effective address only and does not affect any memory-management data structures, no SFENCE.VMA is required after enabling/disabling pointer masking. + +==== Interaction with Two-Stage Address Translation + +Guest physical addresses (GPAs) are 2 bits wider than the corresponding virtual address translation modes, resulting in additional address translation schemes Sv32x4, Sv39x4, Sv48x4 and Sv57x4 for translating guest physical addresses to supervisor physical addresses. 
When running with virtualization in VS/VU mode with `vsatp.MODE` = Bare, this means that those two bits may be subject to pointer masking, depending on `hgatp.MODE` and `senvcfg.PMM`/`henvcfg.PMM` (for VU/VS mode). If `vsatp.MODE` != Bare, this issue does *not* apply. + +[NOTE] +==== +An implementation could mask those two bits on the TLB access path, but this can have a significant timing impact. Alternatively, an implementation may choose to "waste" TLB capacity by having up to 4 duplicate entries for each page. In this case, the pointer masking operation can be applied on the TLB refill path, where it is unlikely to affect timing. To support this approach, some TLB entries need to be flushed when PMLEN changes in a way that may affect these duplicate entries. +==== + +To support implementations where (XLEN-PMLEN) can be less than the GPA width supported by `hgatp.MODE`, hypervisors should execute an `HFENCE.GVMA` with _rs1_=`x0` if `henvcfg.PMM` is changed from or to a value where (XLEN-PMLEN) is less than the GPA width supported by the `hgatp` translation mode of that guest. Specifically, these cases are: + +* `PMLEN=7` and `hgatp.MODE=sv57x4` +* `PMLEN=16` and `hgatp.MODE=sv57x4` +* `PMLEN=16` and `hgatp.MODE=sv48x4` + +[NOTE] +==== +`Smmpm` implementations need to satisfy max(largest supported virtual address size, largest supported supervisor physical address size) <= (XLEN - PMLEN) bits to avoid any masking logic on the TLB access path. +==== + +Implementations of an address-specific `HFENCE.GVMA` should either ignore the address argument or ignore the top masked GPA bits of entries when comparing for an address match. + +==== Number of Masked Bits + +As described in <<_determining_the_value_of_pmlen>>, the supported values of PMLEN may depend on the effective privilege mode. The current standard only defines PMLEN=XLEN-48 and PMLEN=XLEN-57, but this assumption may be relaxed in future extensions and profiles. 
Trying to enable pointer masking in an unsupported scenario represents an illegal write to the corresponding pointer masking enable bit and follows WARL semantics. Future profiles may choose to define certain combinations of privilege modes and supported values of PMLEN as mandatory. + +[NOTE] +==== +An option that was considered but discarded was to allow implementations to set PMLEN depending on the active addressing mode. For example, PMLEN could be set to 16 for Sv48 and to 25 for Sv39. However, having a single value of PMLEN (e.g., setting PMLEN to 16 for both Sv39 and Sv48 rather than 25) facilitates TLB implementations in designs that support Sv39 and Sv48 but not Sv57. 16 bits are sufficient for current pointer masking use cases but allow for a TLB implementation that matches against the same number of virtual tag bits independently of whether it is running with Sv39 or Sv48. However, if Sv57 is supported, tag matching may need to be conditional on the current address translation mode. +==== diff --git a/src/ztso-st-ext.adoc b/src/ztso-st-ext.adoc index f2ce6d1..356d6ff 100644 --- a/src/ztso-st-ext.adoc +++ b/src/ztso-st-ext.adoc @@ -1,5 +1,5 @@ [[ztso]] -== "Ztso" Extension for Total Store Ordering, Version 1.0 +== "Ztso" Extension for Total Store Ordering, Version 1.0 This chapter defines the "Ztso" extension for the RISC-V Total Store Ordering (RVTSO) memory consistency model. RVTSO is defined as a delta @@ -45,4 +45,3 @@ written assuming RVTSO will not run correctly on implementations not supporting Ztso. Binaries compiled to run only under Ztso should indicate as such via a flag in the binary, so that platforms which do not implement Ztso can simply refuse to run them. - |